API Reference

Functions

convert(html, options?)

Converts an HTML string to Markdown.

ts
import { convert } from 'markdown-for-agents';

function convert(html: string, options?: ConvertOptions): ConvertResult;

Parameters:

  • html — the HTML string to convert
  • options — optional ConvertOptions

Returns: ConvertResult

Example:

ts
const { markdown, tokenEstimate, contentHash } = convert('<h1>Hello</h1>', {
    extract: true,
    baseUrl: 'https://example.com'
});

createRule(filter, replacement, priority?)

Creates a conversion rule.

ts
import { createRule } from 'markdown-for-agents';

function createRule(
    filter: string | string[] | ((node: Element) => boolean),
    replacement: (context: RuleContext) => string | null | undefined,
    priority?: number
): Rule;

Parameters:

  • filter — tag name, array of tag names, or predicate function
  • replacement — function that returns the Markdown string, null to remove, or undefined to fall through
  • priority — rule priority (default: 100). Higher runs first.

Returns: Rule


getDefaultRules()

Returns the array of built-in conversion rules.

ts
import { getDefaultRules } from 'markdown-for-agents';

function getDefaultRules(): Rule[];

The result is cached — subsequent calls return the same array.


extractContent(document, options?)

Prunes a parsed DOM tree in-place, removing non-content elements.

ts
import { extractContent } from 'markdown-for-agents/extract';

function extractContent(document: Document, options?: ExtractOptions): void;

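Parameters:

  • document: the parsed DOM tree to prune
  • options: optional ExtractOptions

This mutates the document in place and returns nothing. Stripped elements are removed from the tree.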


estimateTokens(text)

Estimates token, character, and word counts for a string.

ts
import { estimateTokens } from 'markdown-for-agents/tokens';

function estimateTokens(text: string): TokenEstimate;

Uses a heuristic of roughly 4 characters per token.
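
As a rough illustration of that heuristic, here is a minimal sketch. It is not the library's implementation: the name estimateTokensSketch is hypothetical, and rounding up and whitespace-based word counting are assumptions.

```typescript
// Shape mirrors the TokenEstimate interface documented on this page.
interface TokenEstimate {
    tokens: number;
    characters: number;
    words: number;
}

// Hypothetical re-implementation for illustration only.
function estimateTokensSketch(text: string): TokenEstimate {
    const characters = text.length;
    // Words are counted as whitespace-separated runs (an assumption).
    const words = text.split(/\s+/).filter(Boolean).length;
    // Roughly 4 characters per token; rounding up is an assumption.
    const tokens = Math.ceil(characters / 4);
    return { tokens, characters, words };
}

const estimate = estimateTokensSketch('# Hello\n\nConverted from HTML.');
// 29 characters, 5 words, ceil(29 / 4) = 8 tokens
```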


markdownMiddleware(options?)

Creates a Web Standard middleware that converts HTML responses to Markdown based on the Accept header.

ts
import { markdownMiddleware } from '@markdown-for-agents/web';

function markdownMiddleware(options?: MiddlewareOptions): (request: Request, next: Handler) => Promise<Response>;

markdown(options?) (Express)

Creates an Express middleware for content negotiation.

ts
import { markdown } from '@markdown-for-agents/express';

function markdown(options?: MiddlewareOptions): ExpressMiddleware;

The middleware intercepts res.send(). When the client sends Accept: text/markdown and the response is HTML, the body is converted to Markdown.
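
The negotiation check described above could look roughly like the sketch below. The helper names acceptsMarkdown and shouldConvert are hypothetical, and the handling of q-values is an assumption; the page does not specify how the middleware parses the Accept header.

```typescript
// Hypothetical helper: true when Accept lists text/markdown with a non-zero q-value.
function acceptsMarkdown(acceptHeader: string | null | undefined): boolean {
    if (!acceptHeader) return false;
    return acceptHeader.split(',').some(entry => {
        const parts = entry.split(';').map(part => part.trim().toLowerCase());
        if (parts[0] !== 'text/markdown') return false;
        const q = parts.slice(1).find(param => param.startsWith('q='));
        // No q parameter means the default weight of 1.
        return q === undefined || Number(q.slice(2)) > 0;
    });
}

// Hypothetical helper: convert only when the client asks for Markdown
// and the response body is HTML.
function shouldConvert(
    acceptHeader: string | null | undefined,
    contentType: string | null | undefined
): boolean {
    const isHtml = (contentType ?? '').trim().toLowerCase().startsWith('text/html');
    return isHtml && acceptsMarkdown(acceptHeader);
}
```

With this sketch, a request sending Accept: text/markdown for a text/html response is converted, while a browser request sending Accept: text/html passes through untouched.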


markdown(options?) (Fastify)

Creates a Fastify plugin that registers an onSend hook for content negotiation.

ts
import { markdown } from '@markdown-for-agents/fastify';

function markdown(options?: MiddlewareOptions): FastifyPlugin;

Register it with fastify.register(markdown()). The plugin intercepts HTML responses when the client sends Accept: text/markdown.


markdown(options?) (Hono)

Creates a Hono middleware for content negotiation.

ts
import { markdown } from '@markdown-for-agents/hono';

function markdown(options?: MiddlewareOptions): MiddlewareHandler;

withMarkdown(handler, options?) (Next.js)

Wraps a Next.js middleware function (typed as NextMiddleware in the signature below) with Markdown content negotiation. Automatically includes nextImageRule to unwrap /_next/image optimization URLs. See the Next.js example for a complete working app with the proxy pattern.

ts
import { withMarkdown } from '@markdown-for-agents/nextjs';

function withMarkdown(handler: NextMiddleware, options?: MiddlewareOptions): NextMiddleware;

nextImageRule (Next.js)

A conversion rule that extracts original image URLs from Next.js /_next/image optimization paths. Automatically included by withMarkdown, but can also be used standalone with the core convert function.

ts
import { nextImageRule } from '@markdown-for-agents/nextjs';
import { convert } from 'markdown-for-agents';

// Standalone usage
const { markdown } = convert(html, { rules: [nextImageRule] });

Extracts the url query parameter from paths like /_next/image?url=%2Fphoto.png&w=640&q=75 and produces ![alt](/photo.png) instead of the optimized URL. The rule has priority 1, which is higher than the built-in rules (priority 0), so it is checked before them.
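
The URL unwrapping step can be sketched as follows. This is an illustration, not the rule's source: unwrapNextImageUrl is a hypothetical helper, and returning the input unchanged for non-matching URLs is an assumption.

```typescript
// Hypothetical helper: returns the original image path, or the input unchanged
// when it is not a /_next/image URL or carries no url parameter.
function unwrapNextImageUrl(src: string): string {
    const queryStart = src.indexOf('?');
    if (queryStart === -1) return src;
    if (!src.slice(0, queryStart).endsWith('/_next/image')) return src;

    for (const pair of src.slice(queryStart + 1).split('&')) {
        const eq = pair.indexOf('=');
        if (eq !== -1 && pair.slice(0, eq) === 'url') {
            // The original path is percent-encoded inside the query string.
            return decodeURIComponent(pair.slice(eq + 1));
        }
    }
    return src;
}

const original = unwrapNextImageUrl('/_next/image?url=%2Fphoto.png&w=640&q=75');
// original === '/photo.png'
```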


Types

ConvertOptions

ts
interface ConvertOptions {
    extract?: boolean | ExtractOptions;
    rules?: Rule[];
    baseUrl?: string;
    headingStyle?: 'atx' | 'setext';
    bulletChar?: '-' | '*' | '+';
    codeBlockStyle?: 'fenced' | 'indented';
    fenceChar?: '`' | '~';
    strongDelimiter?: '**' | '__';
    emDelimiter?: '*' | '_';
    linkStyle?: 'inlined' | 'referenced';
    deduplicate?: boolean | DeduplicateOptions;
    tokenCounter?: (text: string) => TokenEstimate;
    serverTiming?: boolean;
}

Property         Type                             Default             Description
extract          boolean | ExtractOptions         false               Enable content extraction
rules            Rule[]                           []                  Custom conversion rules
baseUrl          string                           ""                  Base URL for resolving relative URLs
headingStyle     "atx" | "setext"                 "atx"               Heading format
bulletChar       "-" | "*" | "+"                  "-"                 Unordered list bullet
codeBlockStyle   "fenced" | "indented"            "fenced"            Code block format
fenceChar        "`" | "~"                        "`"                 Fence character
strongDelimiter  "**" | "__"                      "**"                Bold delimiter
emDelimiter      "*" | "_"                        "*"                 Italic delimiter
linkStyle        "inlined" | "referenced"         "inlined"           Link format
deduplicate      boolean | DeduplicateOptions     false               Remove duplicate content blocks
tokenCounter     (text: string) => TokenEstimate  Built-in heuristic  Custom token counter (see below)
serverTiming     boolean                          false               Measure conversion duration and return it in ConvertResult (see below)

tokenCounter

Replace the built-in heuristic (~4 characters per token) with an exact tokenizer. The function receives the final markdown string and must return a TokenEstimate.

ts
import { encoding_for_model } from 'tiktoken';

const enc = encoding_for_model('gpt-4o');

const { tokenEstimate } = convert(html, {
    tokenCounter: text => ({
        tokens: enc.encode(text).length,
        characters: text.length,
        words: text.split(/\s+/).filter(Boolean).length
    })
});

When used with middleware, the custom counter's tokens value is used for the x-markdown-tokens response header.


ConvertResult

ts
interface ConvertResult {
    markdown: string;
    tokenEstimate: TokenEstimate;
    contentHash: string;
    convertDuration?: number;
}

Property         Type           Description
markdown         string         The generated markdown string
tokenEstimate    TokenEstimate  Token / character / word estimates
contentHash      string         Deterministic content hash of the markdown output (FNV-1a, base36)
convertDuration  number         Conversion time in milliseconds (only present when serverTiming is true)

The contentHash is useful as an ETag value or cache key — the same markdown always produces the same hash.
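
For readers unfamiliar with FNV-1a, the sketch below shows one common variant. The page only states "FNV-1a, base36"; the 32-bit width, hashing UTF-16 code units, and the name fnv1aBase36 are assumptions, so its output may not match the library's contentHash.

```typescript
// 32-bit FNV-1a over UTF-16 code units, rendered in base36 (assumed variant).
function fnv1aBase36(text: string): string {
    let hash = 0x811c9dc5; // 32-bit FNV offset basis
    for (let i = 0; i < text.length; i++) {
        hash ^= text.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193); // 32-bit FNV prime, wrapping multiply
    }
    // Force an unsigned value before formatting.
    return (hash >>> 0).toString(36);
}

// The same input always yields the same hash, so it can serve as an ETag.
const etag = `"${fnv1aBase36('# Hello')}"`;
```

Because the hash depends only on the Markdown output, two conversions that produce identical Markdown can share a cache entry even if the source HTML differed.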


Rule

ts
interface Rule {
    filter: string | string[] | ((node: Element) => boolean);
    replacement: (context: RuleContext) => string | null | undefined;
    priority?: number;
}
  • filter — determines which elements the rule applies to
  • replacement — produces the Markdown output. Return null to remove, undefined to fall through.
  • priority — higher priority rules are checked first. Default: 0 for built-in rules, 100 for createRule.
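
The interaction between priority, null, and undefined can be modeled with a toy dispatcher. Everything here (ToyNode, ToyRule, applyRules) is hypothetical and much simpler than the real converter; treating a missing priority as 0 and falling back to the node's text are assumptions.

```typescript
// Toy stand-ins for the documented Rule shape; a node is just a tag and its text.
interface ToyNode {
    tag: string;
    text: string;
}

interface ToyRule {
    filter: string | string[] | ((node: ToyNode) => boolean);
    replacement: (node: ToyNode) => string | null | undefined;
    priority?: number;
}

function matches(rule: ToyRule, node: ToyNode): boolean {
    const filter = rule.filter;
    if (typeof filter === 'function') return filter(node);
    return Array.isArray(filter) ? filter.indexOf(node.tag) !== -1 : filter === node.tag;
}

// Try rules from highest to lowest priority.
// undefined falls through to the next matching rule; null removes the node.
function applyRules(rules: ToyRule[], node: ToyNode): string {
    const ordered = rules.slice().sort((a, b) => (b.priority ?? 0) - (a.priority ?? 0));
    for (const rule of ordered) {
        if (!matches(rule, node)) continue;
        const output = rule.replacement(node);
        if (output === null) return '';
        if (output !== undefined) return output;
    }
    return node.text; // assumed fallback when no rule produces output
}

const toyRules: ToyRule[] = [
    { filter: 'em', replacement: node => `*${node.text}*` },
    { filter: ['em', 'i'], priority: 100, replacement: node => (node.text === '' ? undefined : `_${node.text}_`) },
    { filter: node => node.tag === 'script', priority: 100, replacement: () => null }
];
```

In this model the priority 100 rule handles non-empty em elements, an empty em falls through to the priority 0 rule, and script elements are dropped.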

RuleContext

ts
interface RuleContext {
    node: Element;
    parent: Element | Document | null;
    convertChildren: (node: Element | Document) => string;
    options: ResolvedOptions;
    listDepth: number;
    insidePre: boolean;
    insideTable: boolean;
    siblingIndex: number;
}

Property         Type                       Description
node             Element                    The current DOM element
parent           Element | Document | null  Parent node
convertChildren  (node) => string           Recursively convert children
options          ResolvedOptions            Resolved converter options
listDepth        number                     Current list nesting depth (0 = not in list)
insidePre        boolean                    Whether inside a <pre> element
insideTable      boolean                    Whether inside a <table> element
siblingIndex     number                     Index of this node among its parent's children

ExtractOptions

ts
interface ExtractOptions {
    stripTags?: string[];
    stripClasses?: (string | RegExp)[];
    stripRoles?: string[];
    stripIds?: (string | RegExp)[];
    keepHeader?: boolean;
    keepFooter?: boolean;
    keepNav?: boolean;
}

Property      Type                 Default  Description
stripTags     string[]             []       Additional tags to strip
stripClasses  (string | RegExp)[]  []       Additional class patterns to strip
stripRoles    string[]             []       Additional ARIA roles to strip
stripIds      (string | RegExp)[]  []       Additional ID patterns to strip
keepHeader    boolean              false    Keep <header> elements
keepFooter    boolean              false    Keep <footer> elements
keepNav       boolean              false    Keep <nav> elements
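
To make the in-place pruning and the keep flags concrete, here is a toy model on a simplified tree. ToyElement and pruneInPlace are hypothetical; the sketch only covers stripTags and the three keep flags, and it assumes header, footer, and nav are stripped unless kept. The library's full built-in strip list is not shown on this page.

```typescript
// Simplified element: a tag name plus child elements.
interface ToyElement {
    tag: string;
    children: ToyElement[];
}

interface ToyExtractOptions {
    stripTags?: string[];
    keepHeader?: boolean;
    keepFooter?: boolean;
    keepNav?: boolean;
}

// Hypothetical in-place prune: removes matching elements from the tree it is given.
function pruneInPlace(root: ToyElement, options: ToyExtractOptions = {}): void {
    const strip = (options.stripTags ?? []).slice();
    if (!options.keepHeader) strip.push('header');
    if (!options.keepFooter) strip.push('footer');
    if (!options.keepNav) strip.push('nav');

    // Drop stripped children, then recurse into the survivors.
    root.children = root.children.filter(child => strip.indexOf(child.tag) === -1);
    root.children.forEach(child => pruneInPlace(child, options));
}

const page: ToyElement = {
    tag: 'body',
    children: [
        { tag: 'nav', children: [] },
        { tag: 'main', children: [{ tag: 'aside', children: [] }, { tag: 'p', children: [] }] },
        { tag: 'footer', children: [] }
    ]
};

pruneInPlace(page, { stripTags: ['aside'], keepFooter: true });
// page now contains main (with only p) and footer; nav and aside are gone.
```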

DeduplicateOptions

ts
interface DeduplicateOptions {
    minLength?: number;
}

Property   Type    Default  Description
minLength  number  10       Minimum block length (in characters) eligible for deduplication

Blocks shorter than minLength are always kept, which protects separators (---), short headings, and formatting elements. Lower it to catch short repeated phrases like "Read more"; raise it for more conservative deduplication.
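
The effect of minLength can be seen in a small model of block deduplication. This is a sketch, not the library's algorithm: deduplicateBlocks is a hypothetical name, and splitting on blank lines and comparing exact trimmed text are assumptions.

```typescript
// Hypothetical block-level deduplication.
// Blocks are separated by blank lines and compared by exact trimmed text.
function deduplicateBlocks(markdown: string, minLength = 10): string {
    const seen = new Set<string>();
    return markdown
        .split(/\n{2,}/)
        .filter(block => {
            const key = block.trim();
            if (key.length < minLength) return true; // short blocks are always kept
            if (seen.has(key)) return false; // repeated long block: drop it
            seen.add(key);
            return true;
        })
        .join('\n\n');
}

const input = 'Read more about it\n\n---\n\nIntro paragraph here.\n\n---\n\nRead more about it';

const conservative = deduplicateBlocks(input); // keeps both --- separators
const aggressive = deduplicateBlocks(input, 3); // also drops the second ---
```

With the default of 10, only the repeated "Read more about it" block is removed. Lowering minLength to 3 makes the three-character separator eligible too, which is usually not what you want.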


TokenEstimate

ts
interface TokenEstimate {
    tokens: number;
    characters: number;
    words: number;
}

MiddlewareOptions

ts
interface MiddlewareOptions extends ConvertOptions {
    tokenHeader?: string;
    timingHeader?: string;
    contentSignal?: ContentSignalOptions;
}

Extends ConvertOptions with:

PropertyTypeDefaultDescription
tokenHeaderstring"x-markdown-tokens"Response header name for token count
timingHeaderstring"x-markdown-timing"Response header name for the CDN-safe timing duplicate
contentSignalContentSignalOptions-Publisher consent signals for the content-signal header

When serverTiming is true (inherited from ConvertOptions), the middleware sets both a Server-Timing header and an x-markdown-timing header (or the name given in timingHeader) carrying the mfa.convert duration. The second header carries the same value but survives CDN caching, since some CDNs strip Server-Timing from cached responses. The Next.js middleware additionally includes an mfa.fetch duration for the proxy self-fetch.
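
As an illustration of the header pair, the sketch below builds both headers from a measured duration. The helper timingHeaders is hypothetical, and the exact value format (one decimal place, no description field) is an assumption; only the metric name mfa.convert and the header names come from this page.

```typescript
// Hypothetical helper: builds both timing headers from a duration in milliseconds.
function timingHeaders(convertMs: number, timingHeader = 'x-markdown-timing'): Record<string, string> {
    // Server-Timing entries take the form "<metric>;dur=<milliseconds>".
    const value = `mfa.convert;dur=${convertMs.toFixed(1)}`;
    return {
        'Server-Timing': value,
        [timingHeader]: value // duplicate that survives CDNs stripping Server-Timing
    };
}

const headers = timingHeaders(12.5);
// headers['Server-Timing'] and headers['x-markdown-timing'] are both 'mfa.convert;dur=12.5'
```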


ResolvedOptions

ts
type ResolvedOptions = Required<Omit<ConvertOptions, 'extract' | 'rules'>> & {
    extract: boolean | ExtractOptions;
    rules: Rule[];
};

The fully resolved options object with all defaults applied. This is what rules receive in their context.
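
Resolving options amounts to layering caller-supplied values over defaults. The sketch below shows the idea on a subset of the options; SketchOptions and resolveOptions are hypothetical names, the default values are copied from the ConvertOptions table on this page, and the merge strategy is an assumption.

```typescript
// A subset of ConvertOptions, for illustration only.
interface SketchOptions {
    baseUrl?: string;
    headingStyle?: 'atx' | 'setext';
    bulletChar?: '-' | '*' | '+';
    linkStyle?: 'inlined' | 'referenced';
}

// Defaults as listed in the ConvertOptions table.
const DEFAULTS: Required<SketchOptions> = {
    baseUrl: '',
    headingStyle: 'atx',
    bulletChar: '-',
    linkStyle: 'inlined'
};

// Hypothetical resolver: caller-supplied values override the defaults.
function resolveOptions(options: SketchOptions = {}): Required<SketchOptions> {
    return { ...DEFAULTS, ...options };
}

const resolved = resolveOptions({ bulletChar: '*' });
// resolved.bulletChar is '*'; every other field holds its default.
```

A rule can therefore read a value such as context.options.bulletChar without checking for undefined.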

Released under the MIT License.