API Reference
Functions
convert(html, options?)
Converts an HTML string to Markdown.
import { convert } from 'markdown-for-agents';
function convert(html: string, options?: ConvertOptions): ConvertResult;
Parameters:
- html: the HTML string to convert
- options: optional ConvertOptions
Returns: ConvertResult
Example:
const { markdown, tokenEstimate, contentHash } = convert('<h1>Hello</h1>', {
  extract: true,
  baseUrl: 'https://example.com'
});
createRule(filter, replacement, priority?)
Creates a conversion rule.
import { createRule } from 'markdown-for-agents';
function createRule(filter: string | string[] | ((node: Element) => boolean), replacement: (context: RuleContext) => string | null | undefined, priority?: number): Rule;
Parameters:
- filter: tag name, array of tag names, or predicate function
- replacement: function that returns the Markdown string, null to remove the element, or undefined to fall through
- priority: rule priority (default: 100). Higher-priority rules run first.
Returns: Rule
getDefaultRules()
Returns the array of built-in conversion rules.
import { getDefaultRules } from 'markdown-for-agents';
function getDefaultRules(): Rule[];
The result is cached; subsequent calls return the same array.
extractContent(document, options?)
Prunes a parsed DOM tree in-place, removing non-content elements.
import { extractContent } from 'markdown-for-agents/extract';
function extractContent(document: Document, options?: ExtractOptions): void;
Parameters:
- document: a domhandler Document (from htmlparser2)
- options: optional ExtractOptions
This mutates the document. Stripped elements are removed from the tree.
estimateTokens(text)
Estimates token, character, and word counts for a string.
import { estimateTokens } from 'markdown-for-agents/tokens';
function estimateTokens(text: string): TokenEstimate;
Uses a heuristic of roughly 4 characters per token.
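As a rough illustration of the 4-characters-per-token heuristic, the sketch below computes the three counts by hand. `roughEstimate` is a hypothetical stand-in, not the library's implementation; rounding up and splitting words on whitespace are assumptions.

```typescript
// Hypothetical stand-in for illustration. The library's exact rounding
// and word-splitting rules are assumed, not taken from its source.
interface TokenEstimate {
  tokens: number;
  characters: number;
  words: number;
}

function roughEstimate(text: string): TokenEstimate {
  const characters = text.length;
  return {
    // Assumption: about one token per 4 characters, rounded up.
    tokens: Math.ceil(characters / 4),
    characters,
    // Assumption: words are runs of non-whitespace characters.
    words: text.split(/\s+/).filter(Boolean).length
  };
}
```

A heuristic like this is cheap and dependency-free, which suits budgeting; when exact counts matter, a real tokenizer is the better choice.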
markdownMiddleware(options?)
Creates a Web Standard middleware that converts HTML responses to Markdown based on the Accept header.
import { markdownMiddleware } from '@markdown-for-agents/web';
function markdownMiddleware(options?: MiddlewareOptions): (request: Request, next: Handler) => Promise<Response>;
markdown(options?) (Express)
Creates an Express middleware for content negotiation.
import { markdown } from '@markdown-for-agents/express';
function markdown(options?: MiddlewareOptions): ExpressMiddleware;
The middleware intercepts res.send(). When the client sends Accept: text/markdown and the response is HTML, the body is converted to Markdown.
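The negotiation check described above can be sketched as a small predicate. `shouldConvert` is a hypothetical helper; how the real middleware parses the Accept header (q-values, wildcards) is not documented here, so this version uses a plain media-type match.

```typescript
// Illustrative only: decides whether a response body should be converted.
// Assumption: a simple media-type match, ignoring q-values and wildcards.
function shouldConvert(acceptHeader: string | null, contentType: string | null): boolean {
  if (!acceptHeader || !contentType) return false;
  // Strip parameters such as ";q=0.8" and normalise case.
  const accepted = acceptHeader
    .split(',')
    .map(part => (part.split(';')[0] ?? '').trim().toLowerCase());
  const isHtml = contentType.toLowerCase().startsWith('text/html');
  return isHtml && accepted.indexOf('text/markdown') !== -1;
}
```

Checking both the request's Accept header and the response's content type keeps JSON and binary responses untouched even when an agent asks for Markdown.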
markdown(options?) (Fastify)
Creates a Fastify plugin that registers an onSend hook for content negotiation.
import { markdown } from '@markdown-for-agents/fastify';
function markdown(options?: MiddlewareOptions): FastifyPlugin;
Register it with fastify.register(markdown()). The plugin intercepts HTML responses when the client sends Accept: text/markdown.
markdown(options?) (Hono)
Creates a Hono middleware for content negotiation.
import { markdown } from '@markdown-for-agents/hono';
function markdown(options?: MiddlewareOptions): MiddlewareHandler;
withMarkdown(handler, options?) (Next.js)
Wraps a Next.js route handler with Markdown content negotiation. Automatically includes nextImageRule to unwrap /_next/image optimization URLs. See the Next.js example for a complete working app with the proxy pattern.
import { withMarkdown } from '@markdown-for-agents/nextjs';
function withMarkdown(handler: NextMiddleware, options?: MiddlewareOptions): NextMiddleware;
nextImageRule (Next.js)
A conversion rule that extracts original image URLs from Next.js /_next/image optimization paths. Automatically included by withMarkdown, but can also be used standalone with the core convert function.
import { nextImageRule } from '@markdown-for-agents/nextjs';
import { convert } from 'markdown-for-agents';
// Standalone usage
const { markdown } = convert(html, { rules: [nextImageRule] });
Extracts the url query parameter from paths like /_next/image?url=%2Fphoto.png&w=640&q=75 and produces a Markdown image that points at the original URL (/photo.png here) instead of the optimized URL. Has priority 1 (higher than built-in rules).
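To make the URL unwrapping concrete, here is a minimal sketch of the extraction step on its own. `unwrapNextImage` is a hypothetical helper, not part of the package, and the placeholder base URL exists only so the standard URL parser can handle relative paths.

```typescript
// Hypothetical helper mirroring the described behaviour: returns the
// original image URL for /_next/image paths, or the input unchanged.
function unwrapNextImage(src: string): string {
  // Assumption: a throwaway base lets the parser accept relative paths.
  const parsed = new URL(src, 'http://placeholder.invalid');
  if (parsed.pathname !== '/_next/image') return src;
  // searchParams.get() percent-decodes the value (%2F becomes /).
  return parsed.searchParams.get('url') ?? src;
}
```

Sources that are not optimizer paths pass through untouched, so a helper like this is safe to apply to every image.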
Types
ConvertOptions
interface ConvertOptions {
  extract?: boolean | ExtractOptions;
  rules?: Rule[];
  baseUrl?: string;
  headingStyle?: 'atx' | 'setext';
  bulletChar?: '-' | '*' | '+';
  codeBlockStyle?: 'fenced' | 'indented';
  fenceChar?: '`' | '~';
  strongDelimiter?: '**' | '__';
  emDelimiter?: '*' | '_';
  linkStyle?: 'inlined' | 'referenced';
  deduplicate?: boolean | DeduplicateOptions;
  tokenCounter?: (text: string) => TokenEstimate;
  serverTiming?: boolean;
}
| Property | Type | Default | Description |
|---|---|---|---|
| extract | boolean \| ExtractOptions | false | Enable content extraction |
| rules | Rule[] | [] | Custom conversion rules |
| baseUrl | string | "" | Base URL for resolving relative URLs |
| headingStyle | "atx" \| "setext" | "atx" | Heading format |
| bulletChar | "-" \| "*" \| "+" | "-" | Unordered list bullet |
| codeBlockStyle | "fenced" \| "indented" | "fenced" | Code block format |
| fenceChar | `` "`" \| "~" `` | `` "`" `` | Fence character |
| strongDelimiter | "**" \| "__" | "**" | Bold delimiter |
| emDelimiter | "*" \| "_" | "*" | Italic delimiter |
| linkStyle | "inlined" \| "referenced" | "inlined" | Link format |
| deduplicate | boolean \| DeduplicateOptions | false | Remove duplicate content blocks |
| tokenCounter | (text: string) => TokenEstimate | Built-in heuristic | Custom token counter (see below) |
| serverTiming | boolean | false | Measure conversion duration and return it in ConvertResult (see below) |
tokenCounter
Replace the built-in heuristic (~4 characters per token) with an exact tokenizer. The function receives the final markdown string and must return a TokenEstimate.
import { encoding_for_model } from 'tiktoken';
import { convert } from 'markdown-for-agents';

// Create the encoder once and reuse it across conversions.
const enc = encoding_for_model('gpt-4o');

// `html` is the HTML string to convert.
const { tokenEstimate } = convert(html, {
  tokenCounter: text => ({
    tokens: enc.encode(text).length,
    characters: text.length,
    words: text.split(/\s+/).filter(Boolean).length
  })
});
When used with middleware, the custom counter's tokens value is used for the x-markdown-tokens response header.
ConvertResult
interface ConvertResult {
  markdown: string;
  tokenEstimate: TokenEstimate;
  contentHash: string;
  convertDuration?: number;
}
| Property | Type | Description |
|---|---|---|
| markdown | string | The generated markdown string |
| tokenEstimate | TokenEstimate | Token / character / word estimates |
| contentHash | string | Deterministic content hash of the markdown output (FNV-1a, base36) |
| convertDuration | number | Conversion time in milliseconds (only present when serverTiming is true) |
The contentHash is useful as an ETag value or cache key — the same markdown always produces the same hash.
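For readers unfamiliar with FNV-1a, the following sketch shows how a deterministic base36 hash of this kind can be derived and turned into an ETag. The 32-bit width and UTF-8 byte encoding are assumptions, and `fnv1a32`, `hashToBase36`, and `toETag` are hypothetical names; the library's actual hash may differ in both respects.

```typescript
// Sketch of a 32-bit FNV-1a hash. Bit width and UTF-8 encoding are
// assumptions; the library's implementation may not match.
function fnv1a32(text: string): number {
  const bytes = new TextEncoder().encode(text);
  let hash = 0x811c9dc5; // 32-bit FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i] ?? 0;
    // Multiply by the FNV prime 16777619, keeping the low 32 bits.
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // reinterpret as unsigned
}

function hashToBase36(text: string): string {
  return fnv1a32(text).toString(36);
}

// An ETag value is an opaque string wrapped in double quotes.
function toETag(text: string): string {
  return `"${hashToBase36(text)}"`;
}
```

Because the hash depends only on the Markdown bytes, two conversions that yield identical output share a cache key even if the source HTML differed.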
Rule
interface Rule {
  filter: string | string[] | ((node: Element) => boolean);
  replacement: (context: RuleContext) => string | null | undefined;
  priority?: number;
}
- filter: determines which elements the rule applies to
- replacement: produces the Markdown output. Return null to remove the element, undefined to fall through.
- priority: higher-priority rules are checked first. Default: 0 for built-in rules, 100 for createRule.
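The interplay of priority, null, and undefined is easier to see in a small dispatch loop. The sketch below uses simplified local types (`SketchRule` matches on a tag name rather than a real Element) and an assumed selection order; it illustrates the documented semantics, not the converter's actual code.

```typescript
// Simplified local types; the real Element and RuleContext are richer.
type Replacement = string | null | undefined;

interface SketchRule {
  filter: string | string[] | ((tag: string) => boolean);
  replacement: (tag: string) => Replacement;
  priority?: number;
}

function matches(rule: SketchRule, tag: string): boolean {
  if (typeof rule.filter === 'function') return rule.filter(tag);
  if (Array.isArray(rule.filter)) return rule.filter.indexOf(tag) !== -1;
  return rule.filter === tag;
}

// Assumed dispatch: try rules from highest to lowest priority. A string
// or null result is final; undefined hands over to the next rule.
function applyRules(rules: SketchRule[], tag: string): string | null {
  const ordered = rules.slice().sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0));
  for (const rule of ordered) {
    if (!matches(rule, tag)) continue;
    const result = rule.replacement(tag);
    if (result !== undefined) return result;
  }
  return null; // assumption: no rule produced output, so emit nothing
}
```

Returning undefined lets a high-priority custom rule inspect an element and decline it, so the built-in rule still runs for the cases the custom rule does not care about.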
RuleContext
interface RuleContext {
  node: Element;
  parent: Element | Document | null;
  convertChildren: (node: Element | Document) => string;
  options: ResolvedOptions;
  listDepth: number;
  insidePre: boolean;
  insideTable: boolean;
  siblingIndex: number;
}
| Property | Type | Description |
|---|---|---|
| node | Element | The current DOM element |
| parent | Element \| Document \| null | Parent node |
| convertChildren | (node) => string | Recursively convert children |
| options | ResolvedOptions | Resolved converter options |
| listDepth | number | Current list nesting depth (0 = not in list) |
| insidePre | boolean | Whether inside a <pre> element |
| insideTable | boolean | Whether inside a <table> element |
| siblingIndex | number | Index of this node among its parent's children |
ExtractOptions
interface ExtractOptions {
  stripTags?: string[];
  stripClasses?: (string | RegExp)[];
  stripRoles?: string[];
  stripIds?: (string | RegExp)[];
  keepHeader?: boolean;
  keepFooter?: boolean;
  keepNav?: boolean;
}
| Property | Type | Default | Description |
|---|---|---|---|
| stripTags | string[] | [] | Additional tags to strip |
| stripClasses | (string \| RegExp)[] | [] | Additional class patterns to strip |
| stripRoles | string[] | [] | Additional ARIA roles to strip |
| stripIds | (string \| RegExp)[] | [] | Additional ID patterns to strip |
| keepHeader | boolean | false | Keep <header> elements |
| keepFooter | boolean | false | Keep <footer> elements |
| keepNav | boolean | false | Keep <nav> elements |
DeduplicateOptions
interface DeduplicateOptions {
  minLength?: number;
}
| Property | Type | Default | Description |
|---|---|---|---|
| minLength | number | 10 | Minimum block length (in characters) eligible for deduplication |
Blocks shorter than minLength are always kept, which protects separators (---), short headings, and formatting elements. Lower it to catch short repeated phrases like "Read more"; raise it for more conservative deduplication.
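The effect of minLength can be illustrated with a short sketch. `dedupeBlocks` is a hypothetical function that assumes blocks are separated by blank lines and compared after trimming; the library's real block detection may work differently.

```typescript
// Sketch only. Assumptions: blocks are separated by blank lines and
// compared after trimming; the first occurrence of a block is kept.
function dedupeBlocks(markdown: string, minLength = 10): string {
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const block of markdown.split(/\n{2,}/)) {
    const key = block.trim();
    // Blocks shorter than minLength (separators, short headings)
    // are never treated as duplicates.
    if (key.length >= minLength) {
      if (seen.has(key)) continue;
      seen.add(key);
    }
    kept.push(block);
  }
  return kept.join('\n\n');
}
```

With the default of 10, a repeated "Read more" (9 characters) survives; passing a lower value such as 5 would collapse the repeats.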
TokenEstimate
interface TokenEstimate {
  tokens: number;
  characters: number;
  words: number;
}
MiddlewareOptions
interface MiddlewareOptions extends ConvertOptions {
  tokenHeader?: string;
  timingHeader?: string;
  contentSignal?: ContentSignalOptions;
}
Extends ConvertOptions with:
| Property | Type | Default | Description |
|---|---|---|---|
| tokenHeader | string | "x-markdown-tokens" | Response header name for token count |
| timingHeader | string | "x-markdown-timing" | Response header name for the CDN-safe timing duplicate |
| contentSignal | ContentSignalOptions | - | Publisher consent signals for the content-signal header |
When serverTiming is true (inherited from ConvertOptions), middleware sets both a Server-Timing header and an x-markdown-timing header with mfa.convert duration. The x-markdown-timing header carries the same value but survives CDN caching (some CDNs strip Server-Timing from cached responses). The Next.js middleware additionally includes mfa.fetch duration for the proxy self-fetch.
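As a sketch of the dual-header behaviour, the helper below builds both headers from one measured duration. The metric syntax follows the standard Server-Timing format (name;dur=milliseconds); the exact string the middleware emits, the one-decimal rounding, and the `timingHeaders` name are assumptions.

```typescript
// Sketch: builds both timing headers from a measured duration.
// Assumption: the value uses Server-Timing metric syntax and is
// rounded to one decimal place; the middleware's output may differ.
function timingHeaders(
  convertMs: number,
  timingHeader = 'x-markdown-timing'
): Record<string, string> {
  const value = `mfa.convert;dur=${convertMs.toFixed(1)}`;
  return {
    'Server-Timing': value,
    // Duplicate under a custom name so it survives CDNs that strip
    // Server-Timing from cached responses.
    [timingHeader]: value
  };
}
```

Keeping the two values identical means tooling that understands Server-Timing can parse either header.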
ResolvedOptions
type ResolvedOptions = Required<Omit<ConvertOptions, 'extract' | 'rules'>> & {
  extract: boolean | ExtractOptions;
  rules: Rule[];
};
The fully resolved options object with all defaults applied. This is what rules receive in their context.
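One plausible way to picture option resolution is a spread of caller options over a defaults object. The sketch below covers only three options, with default values taken from the ConvertOptions table; `SketchOptions`, `DEFAULTS`, and `resolveOptions` are hypothetical names and the merge strategy is an assumption.

```typescript
// Reduced option set for illustration. Defaults come from the
// ConvertOptions table; the merge strategy itself is an assumption.
interface SketchOptions {
  headingStyle?: 'atx' | 'setext';
  bulletChar?: '-' | '*' | '+';
  baseUrl?: string;
}

const DEFAULTS: Required<SketchOptions> = {
  headingStyle: 'atx',
  bulletChar: '-',
  baseUrl: ''
};

function resolveOptions(options: SketchOptions = {}): Required<SketchOptions> {
  // Later spreads win, so caller-supplied values override defaults.
  return { ...DEFAULTS, ...options };
}
```

Because every field is filled in after resolution, a rule can read a value such as options.bulletChar without checking for undefined.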