API Reference

Functions

`convert(html, options?)`

Converts an HTML string to Markdown.

import { convert } from 'markdown-for-agents';

function convert(html: string, options?: ConvertOptions): ConvertResult;

Parameters:

html — the HTML string to convert
options — optional ConvertOptions

Returns: ConvertResult

Example:

const { markdown, tokenEstimate, contentHash } = convert('<h1>Hello</h1>', {
    extract: true,
    baseUrl: 'https://example.com'
});

`createRule(filter, replacement, priority?)`

Creates a conversion rule.

import { createRule } from 'markdown-for-agents';

function createRule(filter: string | string[] | ((node: Element) => boolean), replacement: (context: RuleContext) => string | null | undefined, priority?: number): Rule;

Parameters:

filter — tag name, array of tag names, or predicate function
replacement — function that returns the Markdown string, null to remove, or undefined to fall through
priority — rule priority (default: 100). Higher runs first.

Returns: Rule

`getDefaultRules()`

Returns the array of built-in conversion rules.

import { getDefaultRules } from 'markdown-for-agents';

function getDefaultRules(): Rule[];

The result is cached: subsequent calls return the same array instance, so copy it before modifying it.

`extractContent(document, options?)`

Prunes a parsed DOM tree in-place, removing non-content elements.

import { extractContent } from 'markdown-for-agents/extract';

function extractContent(document: Document, options?: ExtractOptions): void;

Parameters:

document — a domhandler Document (from htmlparser2)
options — optional ExtractOptions

This mutates the document. Stripped elements are removed from the tree.

`estimateTokens(text)`

Estimates token, character, and word counts for a string.

import { estimateTokens } from 'markdown-for-agents/tokens';

function estimateTokens(text: string): TokenEstimate;

Uses a heuristic of roughly 4 characters per token, so the result is an approximation rather than an exact count.
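To make the heuristic concrete, here is a minimal self-contained sketch of a 4-characters-per-token estimate. `roughTokenEstimate` is a hypothetical name, and rounding up is an assumption; the library's own rounding and word-splitting rules may differ.

```typescript
// Mirrors the documented TokenEstimate shape.
interface TokenEstimate {
    tokens: number;
    characters: number;
    words: number;
}

// Hypothetical helper: approximates tokens as characters / 4, rounded up.
// The rounding direction is an assumption, not the library's documented behavior.
function roughTokenEstimate(text: string): TokenEstimate {
    const characters = text.length;
    // Split on runs of whitespace and drop empty strings.
    const words = text.split(/\s+/).filter(Boolean).length;
    const tokens = Math.ceil(characters / 4);
    return { tokens, characters, words };
}

const estimate = roughTokenEstimate('Hello, Markdown world');
// estimate.characters is 21, estimate.words is 3, estimate.tokens is 6
```

An estimate like this is cheap enough to compute on every response, which is presumably why it is the default; swap in an exact tokenizer when the count drives billing or hard context limits.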

`markdownMiddleware(options?)`

Creates a Web Standard middleware that converts HTML responses to Markdown based on the Accept header.

import { markdownMiddleware } from '@markdown-for-agents/web';

function markdownMiddleware(options?: MiddlewareOptions): (request: Request, next: Handler) => Promise<Response>;

`markdown(options?)` (Express)

Creates an Express middleware for content negotiation.

import { markdown } from '@markdown-for-agents/express';

function markdown(options?: MiddlewareOptions): ExpressMiddleware;

The middleware intercepts res.send(). When the client sends Accept: text/markdown and the response is HTML, the body is converted to Markdown.
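The negotiation condition described above can be sketched as a pair of header checks. This is an illustration only: `acceptsMarkdown`, `isHtmlResponse`, and `shouldConvert` are hypothetical names, and whether the real middleware honors q-values or wildcards is an assumption not confirmed by this reference.

```typescript
// Hypothetical helper: true when the Accept header lists text/markdown
// with a non-zero q-value. Wildcards such as */* are deliberately ignored here.
function acceptsMarkdown(accept: string | undefined): boolean {
    if (!accept) return false;
    return accept.split(',').some(entry => {
        const [mediaType, ...params] = entry.split(';').map(part => part.trim().toLowerCase());
        if (mediaType !== 'text/markdown') return false;
        const q = params.find(p => p.startsWith('q='));
        return q === undefined || Number(q.slice(2)) > 0;
    });
}

// Hypothetical helper: true when the Content-Type media type is text/html.
function isHtmlResponse(contentType: string | undefined): boolean {
    if (!contentType) return false;
    const [mediaType = ''] = contentType.split(';');
    return mediaType.trim().toLowerCase() === 'text/html';
}

// Convert only when the client asked for Markdown and the body is HTML.
function shouldConvert(accept: string | undefined, contentType: string | undefined): boolean {
    return acceptsMarkdown(accept) && isHtmlResponse(contentType);
}
```

Under these assumptions, `shouldConvert('text/markdown', 'text/html; charset=utf-8')` is `true`, while a JSON response or a request that only accepts `text/html` passes through unchanged.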

`markdown(options?)` (Fastify)

Creates a Fastify plugin that registers an onSend hook for content negotiation.

import { markdown } from '@markdown-for-agents/fastify';

function markdown(options?: MiddlewareOptions): FastifyPlugin;

Register it with fastify.register(markdown()). The plugin intercepts HTML responses when the client sends Accept: text/markdown.

`markdown(options?)` (Hono)

Creates a Hono middleware for content negotiation.

import { markdown } from '@markdown-for-agents/hono';

function markdown(options?: MiddlewareOptions): MiddlewareHandler;

`withMarkdown(handler, options?)` (Next.js)

Wraps a Next.js route handler with Markdown content negotiation. Automatically includes nextImageRule to unwrap /_next/image optimization URLs. See the Next.js example for a complete working app with the proxy pattern.

import { withMarkdown } from '@markdown-for-agents/nextjs';

function withMarkdown(handler: NextMiddleware, options?: MiddlewareOptions): NextMiddleware;

`nextImageRule` (Next.js)

A conversion rule that extracts original image URLs from Next.js /_next/image optimization paths. Automatically included by withMarkdown, but can also be used standalone with the core convert function.

import { nextImageRule } from '@markdown-for-agents/nextjs';
import { convert } from 'markdown-for-agents';

// Standalone usage
const { markdown } = convert(html, { rules: [nextImageRule] });

Extracts the url query parameter from paths like /_next/image?url=%2Fphoto.png&w=640&q=75 and produces ![alt](/photo.png) instead of the optimized URL. Has priority 1 (higher than built-in rules).
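The URL unwrapping described above can be sketched without the library. `unwrapNextImageUrl` and `toMarkdownImage` are hypothetical helpers written for illustration; the actual rule operates on DOM elements and may handle more edge cases.

```typescript
// Hypothetical helper: returns the original image URL for a /_next/image
// path, or the input unchanged when it is not an optimization URL.
function unwrapNextImageUrl(src: string): string {
    const queryStart = src.indexOf('?');
    if (queryStart === -1) return src;
    const path = src.slice(0, queryStart);
    if (!path.endsWith('/_next/image')) return src;
    // URLSearchParams percent-decodes the value, so %2Fphoto.png becomes /photo.png.
    const original = new URLSearchParams(src.slice(queryStart + 1)).get('url');
    return original ?? src;
}

// Hypothetical helper: renders a Markdown image from alt text and a source URL.
function toMarkdownImage(alt: string, src: string): string {
    return `![${alt}](${unwrapNextImageUrl(src)})`;
}

const image = toMarkdownImage('alt', '/_next/image?url=%2Fphoto.png&w=640&q=75');
// image is "![alt](/photo.png)"
```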

Types

`ConvertOptions`

interface ConvertOptions {
    extract?: boolean | ExtractOptions;
    rules?: Rule[];
    baseUrl?: string;
    headingStyle?: 'atx' | 'setext';
    bulletChar?: '-' | '*' | '+';
    codeBlockStyle?: 'fenced' | 'indented';
    fenceChar?: '`' | '~';
    strongDelimiter?: '**' | '__';
    emDelimiter?: '*' | '_';
    linkStyle?: 'inlined' | 'referenced';
    deduplicate?: boolean | DeduplicateOptions;
    tokenCounter?: (text: string) => TokenEstimate;
    serverTiming?: boolean;
}

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `extract` | `boolean \| ExtractOptions` | `false` | Enable content extraction |
| `rules` | `Rule[]` | `[]` | Custom conversion rules |
| `baseUrl` | `string` | `""` | Base URL for resolving relative URLs |
| `headingStyle` | `"atx" \| "setext"` | `"atx"` | Heading format |
| `bulletChar` | `"-" \| "*" \| "+"` | `"-"` | Unordered list bullet |
| `codeBlockStyle` | `"fenced" \| "indented"` | `"fenced"` | Code block format |
| `fenceChar` | `` "`" \| "~" `` | `` "`" `` | Fence character |
| `strongDelimiter` | `"**" \| "__"` | `"**"` | Bold delimiter |
| `emDelimiter` | `"*" \| "_"` | `"*"` | Italic delimiter |
| `linkStyle` | `"inlined" \| "referenced"` | `"inlined"` | Link format |
| `deduplicate` | `boolean \| DeduplicateOptions` | `false` | Remove duplicate content blocks |
| `tokenCounter` | `(text: string) => TokenEstimate` | Built-in heuristic | Custom token counter (see below) |
| `serverTiming` | `boolean` | `false` | Measure conversion duration and return it in `ConvertResult` (see below) |
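As an illustration of what the `baseUrl` option in the table above is for, the sketch below resolves relative URLs with the standard `URL` constructor. `resolveAgainstBase` is a hypothetical helper, and treating an empty `baseUrl` as "leave URLs untouched" is an assumption about the default rather than documented behavior.

```typescript
// Hypothetical helper: resolves href against baseUrl using WHATWG URL rules.
function resolveAgainstBase(href: string, baseUrl: string): string {
    // Assumption: an empty base (the documented default) leaves URLs as written.
    if (baseUrl === '') return href;
    try {
        return new URL(href, baseUrl).href;
    } catch {
        // Unparsable input: keep the original text rather than throwing.
        return href;
    }
}

const absolute = resolveAgainstBase('/docs/intro', 'https://example.com');
// absolute is "https://example.com/docs/intro"
```

Already-absolute URLs pass through unchanged, because the `URL` constructor ignores the base when the first argument is absolute.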

`tokenCounter`

Replace the built-in heuristic (~4 characters per token) with an exact tokenizer. The function receives the final markdown string and must return a TokenEstimate.

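import { encoding_for_model } from 'tiktoken';
import { convert } from 'markdown-for-agents';

// Build the encoder once and reuse it across conversions.
const enc = encoding_for_model('gpt-4o');

const { tokenEstimate } = convert(html, {
    // Return exact token counts alongside the character and word counts.
    tokenCounter: text => ({
        tokens: enc.encode(text).length,
        characters: text.length,
        words: text.split(/\s+/).filter(Boolean).length
    })
});

// Release the encoder's WASM memory once it is no longer needed.
enc.free();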

When used with middleware, the custom counter's tokens value is used for the x-markdown-tokens response header.

`ConvertResult`

interface ConvertResult {
    markdown: string;
    tokenEstimate: TokenEstimate;
    contentHash: string;
    convertDuration?: number;
}

| Property | Type | Description |
| --- | --- | --- |
| `markdown` | `string` | The generated Markdown string |
| `tokenEstimate` | `TokenEstimate` | Token / character / word estimates |
| `contentHash` | `string` | Deterministic content hash of the Markdown output (FNV-1a, base36) |
| `convertDuration` | `number` | Conversion time in milliseconds (only present when `serverTiming` is `true`) |

The contentHash is useful as an ETag value or cache key — the same markdown always produces the same hash.
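For readers unfamiliar with FNV-1a, the sketch below shows one way such a hash could be computed and turned into an ETag. It assumes the 32-bit variant over UTF-16 code units; the reference does not state the hash width or input encoding, so the output is not guaranteed to match the library's `contentHash`. `fnv1aBase36` and `toEtag` are hypothetical names.

```typescript
// Hypothetical helper: 32-bit FNV-1a rendered in base36.
// Assumption: hashes UTF-16 code units rather than UTF-8 bytes.
function fnv1aBase36(text: string): string {
    let hash = 0x811c9dc5; // 32-bit FNV offset basis
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193); // multiply by the 32-bit FNV prime
    }
    // Reinterpret as unsigned before formatting so the result has no minus sign.
    return (hash >>> 0).toString(36);
}

// Hypothetical helper: wraps the hash in quotes, as strong ETags require.
function toEtag(markdown: string): string {
    return `"${fnv1aBase36(markdown)}"`;
}

const etagA = toEtag('# Hello');
const etagB = toEtag('# Hello');
// etagA === etagB, because the hash depends only on the input string
```

A non-cryptographic hash like this suits cache validation, where speed and determinism matter; it is not suitable for integrity or security checks.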

`Rule`

interface Rule {
    filter: string | string[] | ((node: Element) => boolean);
    replacement: (context: RuleContext) => string | null | undefined;
    priority?: number;
}

filter — determines which elements the rule applies to
replacement — produces the Markdown output. Return null to remove, undefined to fall through.
priority — higher priority rules are checked first. Default: 0 for built-in rules, 100 for createRule.
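The interplay of priority and the three return values can be illustrated with a simplified dispatcher. This is a sketch of the semantics described above, not the library's implementation: `SketchNode`, `SketchRule`, and `applyRules` are hypothetical stand-ins, and real rules receive a domhandler `Element` inside a full `RuleContext`.

```typescript
// Simplified stand-in for a DOM element.
interface SketchNode {
    name: string;
    text: string;
}

// Simplified stand-in for Rule; replacement takes the node directly.
interface SketchRule {
    filter: string | string[] | ((node: SketchNode) => boolean);
    replacement: (node: SketchNode) => string | null | undefined;
    priority?: number;
}

function matches(rule: SketchRule, node: SketchNode): boolean {
    if (typeof rule.filter === 'function') return rule.filter(node);
    if (Array.isArray(rule.filter)) return rule.filter.includes(node.name);
    return rule.filter === node.name;
}

function applyRules(rules: SketchRule[], node: SketchNode): string {
    // Copy before sorting so the caller's array is untouched; higher priority first.
    const ordered = [...rules].sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0));
    for (const rule of ordered) {
        if (!matches(rule, node)) continue;
        const result = rule.replacement(node);
        if (result === null) return ''; // null: remove the element
        if (result !== undefined) return result; // string: use it as the output
        // undefined: fall through to the next matching rule
    }
    return node.text; // no rule produced output; pass the text through
}

const sampleRules: SketchRule[] = [
    { filter: 'em', replacement: n => `*${n.text}*`, priority: 0 },
    { filter: 'em', replacement: n => (n.text === 'skip' ? undefined : `_${n.text}_`), priority: 100 },
    { filter: ['script', 'style'], replacement: () => null }
];

const custom = applyRules(sampleRules, { name: 'em', text: 'hi' }); // "_hi_"
const fallback = applyRules(sampleRules, { name: 'em', text: 'skip' }); // "*skip*"
```

The second call shows why `undefined` is useful: a high-priority custom rule can handle only the cases it cares about and defer everything else to the lower-priority rule.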

`RuleContext`

interface RuleContext {
    node: Element;
    parent: Element | Document | null;
    convertChildren: (node: Element | Document) => string;
    options: ResolvedOptions;
    listDepth: number;
    insidePre: boolean;
    insideTable: boolean;
    siblingIndex: number;
}

| Property | Type | Description |
| --- | --- | --- |
| `node` | `Element` | The current DOM element |
| `parent` | `Element \| Document \| null` | Parent node |
| `convertChildren` | `(node) => string` | Recursively convert children |
| `options` | `ResolvedOptions` | Resolved converter options |
| `listDepth` | `number` | Current list nesting depth (0 = not in list) |
| `insidePre` | `boolean` | Whether inside a `<pre>` element |
| `insideTable` | `boolean` | Whether inside a `<table>` element |
| `siblingIndex` | `number` | Index of this node among its parent's children |

`ExtractOptions`

interface ExtractOptions {
    stripTags?: string[];
    stripClasses?: (string | RegExp)[];
    stripRoles?: string[];
    stripIds?: (string | RegExp)[];
    keepHeader?: boolean;
    keepFooter?: boolean;
    keepNav?: boolean;
}

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `stripTags` | `string[]` | `[]` | Additional tags to strip |
| `stripClasses` | `(string \| RegExp)[]` | `[]` | Additional class patterns to strip |
| `stripRoles` | `string[]` | `[]` | Additional ARIA roles to strip |
| `stripIds` | `(string \| RegExp)[]` | `[]` | Additional ID patterns to strip |
| `keepHeader` | `boolean` | `false` | Keep `<header>` elements |
| `keepFooter` | `boolean` | `false` | Keep `<footer>` elements |
| `keepNav` | `boolean` | `false` | Keep `<nav>` elements |
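The `(string | RegExp)[]` pattern type used by `stripClasses` and `stripIds` can be illustrated with a small matcher. The matching semantics here are an assumption: strings are compared against whole class names and regular expressions are tested against each class, which may not be how the library matches. `matchesAnyPattern` and `shouldStripByClass` are hypothetical helpers.

```typescript
// Hypothetical helper: true when value matches any string (exactly) or RegExp.
function matchesAnyPattern(value: string, patterns: (string | RegExp)[]): boolean {
    return patterns.some(pattern => {
        if (typeof pattern === 'string') return pattern === value;
        pattern.lastIndex = 0; // guard against stateful /g or /y patterns
        return pattern.test(value);
    });
}

// Hypothetical helper: checks each class in a class attribute against the patterns.
function shouldStripByClass(classAttr: string, stripClasses: (string | RegExp)[]): boolean {
    return classAttr
        .split(/\s+/)
        .filter(Boolean)
        .some(cls => matchesAnyPattern(cls, stripClasses));
}

const stripClasses: (string | RegExp)[] = ['sidebar', /^ad-/];
const stripped = shouldStripByClass('card ad-banner', stripClasses); // true
const kept = shouldStripByClass('article-body', stripClasses); // false
```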

`DeduplicateOptions`

interface DeduplicateOptions {
    minLength?: number;
}

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `minLength` | `number` | `10` | Minimum block length (in characters) eligible for deduplication |

Blocks shorter than minLength are always kept, which protects separators (---), short headings, and formatting elements. Lower it to catch short repeated phrases like "Read more"; raise it for more conservative deduplication.
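The effect of `minLength` can be seen in a minimal deduplication sketch. It assumes blocks are separated by blank lines and compared by exact text after trimming; the library's block splitting and comparison may differ. `deduplicateBlocks` is a hypothetical name.

```typescript
// Hypothetical helper: drops repeated blocks of at least minLength characters.
// Assumption: blocks are separated by one or more blank lines.
function deduplicateBlocks(markdown: string, minLength = 10): string {
    const seen = new Set<string>();
    const kept: string[] = [];
    for (const block of markdown.split(/\n{2,}/)) {
        const key = block.trim();
        if (key.length >= minLength) {
            if (seen.has(key)) continue; // later duplicate of an eligible block: drop it
            seen.add(key);
        }
        // Blocks shorter than minLength are always kept.
        kept.push(block);
    }
    return kept.join('\n\n');
}

const page = [
    'Read more', '---', 'A longer paragraph.',
    'Read more', '---', 'A longer paragraph.'
].join('\n\n');

const conservative = deduplicateBlocks(page); // default minLength of 10
const aggressive = deduplicateBlocks(page, 5);
```

With the default, only the repeated 19-character paragraph is removed, because "Read more" (9 characters) and `---` fall under the threshold. Lowering `minLength` to 5 also removes the second "Read more" while still keeping both separators.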

`TokenEstimate`

interface TokenEstimate {
    tokens: number;
    characters: number;
    words: number;
}

`MiddlewareOptions`

interface MiddlewareOptions extends ConvertOptions {
    tokenHeader?: string;
    timingHeader?: string;
    contentSignal?: ContentSignalOptions;
}

Extends ConvertOptions with:

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| `tokenHeader` | `string` | `"x-markdown-tokens"` | Response header name for token count |
| `timingHeader` | `string` | `"x-markdown-timing"` | Response header name for the CDN-safe timing duplicate |
| `contentSignal` | `ContentSignalOptions` | - | Publisher consent signals for the `content-signal` header |

When serverTiming is true (inherited from ConvertOptions), the middleware sets both a Server-Timing header and an x-markdown-timing header containing the mfa.convert duration. The x-markdown-timing header carries the same value but survives CDN caching (some CDNs strip Server-Timing from cached responses). The Next.js middleware additionally includes the mfa.fetch duration for the proxy self-fetch.
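To show what such a header pair might look like, the sketch below formats durations using the standard Server-Timing syntax (`name;dur=milliseconds`) and duplicates the value under a second header name. The exact string the middleware emits, including its precision, is an assumption; `formatServerTiming` and `timingHeaders` are hypothetical helpers.

```typescript
// Hypothetical helper: formats [name, milliseconds] pairs as a Server-Timing value.
// Assumption: one decimal place of precision.
function formatServerTiming(metrics: Array<[string, number]>): string {
    return metrics.map(([metric, ms]) => `${metric};dur=${ms.toFixed(1)}`).join(', ');
}

// Hypothetical helper: returns the same value under both header names.
function timingHeaders(
    metrics: Array<[string, number]>,
    timingHeader = 'x-markdown-timing'
): Record<string, string> {
    const value = formatServerTiming(metrics);
    return { 'server-timing': value, [timingHeader]: value };
}

const headers = timingHeaders([
    ['mfa.fetch', 3],
    ['mfa.convert', 12.5]
]);
// headers['server-timing'] is "mfa.fetch;dur=3.0, mfa.convert;dur=12.5"
```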

`ResolvedOptions`

type ResolvedOptions = Required<Omit<ConvertOptions, 'extract' | 'rules'>> & {
    extract: boolean | ExtractOptions;
    rules: Rule[];
};

The fully resolved options object with all defaults applied. This is what rules receive in their context.
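The `Required<Omit<...>>` pattern above can be tried on a trimmed-down options type. The sketch uses a hypothetical `SketchOptions` with three of the documented options and their documented defaults, plus a `string[]` standing in for `Rule[]`; `resolveOptions` is an illustrative helper, not the library's resolver.

```typescript
// Trimmed-down stand-in for ConvertOptions.
interface SketchOptions {
    headingStyle?: 'atx' | 'setext';
    bulletChar?: '-' | '*' | '+';
    linkStyle?: 'inlined' | 'referenced';
    rules?: string[]; // stand-in for Rule[]
}

// Same shape as the documented type: every option required, rules re-added explicitly.
type SketchResolved = Required<Omit<SketchOptions, 'rules'>> & { rules: string[] };

// Hypothetical resolver: fills each missing (or undefined) option with its default.
function resolveOptions(options: SketchOptions = {}): SketchResolved {
    return {
        headingStyle: options.headingStyle ?? 'atx',
        bulletChar: options.bulletChar ?? '-',
        linkStyle: options.linkStyle ?? 'inlined',
        rules: options.rules ?? []
    };
}

const resolved = resolveOptions({ bulletChar: '*' });
// resolved.bulletChar is "*"; resolved.headingStyle is "atx"
```

Because every property of the resolved type is required, a rule reading `options.bulletChar` never needs to handle `undefined`.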
