Conversion

Seven endpoints that convert PDF, DOCX, XLSX, PPTX, CSV, and HTML files, plus images via OCR, to clean markdown.

For RAG ingest, knowledge-base pipelines, and document-processing SaaS.

Endpoints in this bundle

Each endpoint is independently callable. Bundle membership is for discovery only — you do not need to opt in.

| Method | Path | Credits | Summary |
| --- | --- | --- | --- |
| POST | /v1/convert/pdf-to-markdown | 1 | PDF to markdown. |
| POST | /v1/convert/docx-to-markdown | 1 | Word .docx to markdown. |
| POST | /v1/convert/xlsx-to-markdown | 1 | Excel .xlsx to markdown. |
| POST | /v1/convert/pptx-to-markdown | 1 | PowerPoint .pptx to markdown. |
| POST | /v1/convert/csv-to-markdown | 1 | CSV to markdown table. |
| POST | /v1/convert/html-to-markdown | 1 | HTML to markdown (clean). |
| POST | /v1/convert/ocr | 1 | Image OCR. |

Recipe

Mixed-format knowledge-base ingest

  1. Inspect the file's MIME type and route to the corresponding /v1/convert/* endpoint.
  2. For scanned PDFs where pdf-to-markdown returns empty or very low text density, fall back to /v1/convert/ocr page by page.
  3. Post-process the returned markdown into chunks of your preferred size, embed, and write to your store.
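
The three steps above can be sketched as small, network-free helpers. This is a minimal illustration, not our reference implementation: the MIME-to-endpoint mapping, the `min_chars_per_page` threshold, the paragraph-based chunking, and the function names are all assumptions, and the page count is something you supply yourself.

```python
from typing import List, Optional

# MIME type -> endpoint path. The paths come from the endpoint table above;
# the MIME mapping itself is an assumption for illustration.
ENDPOINTS = {
    "application/pdf": "/v1/convert/pdf-to-markdown",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
        "/v1/convert/docx-to-markdown",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet":
        "/v1/convert/xlsx-to-markdown",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation":
        "/v1/convert/pptx-to-markdown",
    "text/csv": "/v1/convert/csv-to-markdown",
    "text/html": "/v1/convert/html-to-markdown",
}


def route(mime_type: str) -> Optional[str]:
    """Step 1: pick the endpoint path for a MIME type, or None if unsupported."""
    mime = mime_type.split(";", 1)[0].strip().lower()  # drop "; charset=..." parameters
    if mime.startswith("image/"):
        return "/v1/convert/ocr"
    return ENDPOINTS.get(mime)


def needs_ocr(markdown: str, pages: int, min_chars_per_page: int = 50) -> bool:
    """Step 2: treat a PDF as scanned when it yields little text per page.

    The threshold is a hypothetical default; tune it on your own documents.
    """
    visible = sum(1 for ch in markdown if not ch.isspace())
    return visible < min_chars_per_page * max(pages, 1)


def chunk(markdown: str, max_chars: int = 1000) -> List[str]:
    """Step 3: pack blank-line-separated paragraphs into chunks of about max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in (p.strip() for p in markdown.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

In a pipeline, you might call `route()` on each file's MIME type, run `needs_ocr()` on the markdown that pdf-to-markdown returns, and pass the final markdown through `chunk()` before embedding.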

Sample code

Try a request

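curl

```bash
curl -X POST https://api.ollagraph.com/v1/convert/pdf-to-markdown \
  -H "Authorization: Bearer $OLLAGRAPH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://example.com/whitepaper.pdf"}'
```

python

```python
import os

import httpx

# Ask the API to fetch the PDF from a URL and convert it to markdown.
r = httpx.post(
    "https://api.ollagraph.com/v1/convert/pdf-to-markdown",
    headers={"Authorization": f"Bearer {os.environ['OLLAGRAPH_API_KEY']}"},
    json={"url": "https://example.com/whitepaper.pdf"},
    timeout=120.0,  # allow time for the fetch and the conversion
)
r.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(r.json())
```

node

```javascript
// Top-level await requires an ES module (for example, a .mjs file).
const res = await fetch("https://api.ollagraph.com/v1/convert/pdf-to-markdown", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.OLLAGRAPH_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://example.com/whitepaper.pdf" }),
});
// fetch does not reject on HTTP errors, so check the status explicitly.
if (!res.ok) throw new Error(`Conversion failed: HTTP ${res.status}`);
console.log(await res.json());
```

FAQ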

Conversion bundle FAQ

Can I send the file body inline or only a URL?

Both. Pass `url` and we fetch the file, or pass `file_b64` with the file inline as base64. Inline bodies are capped at 25 MB; URL fetches are capped at 100 MB.
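
As a rough sketch of choosing between the two fields, the helper below builds the JSON body for either case. The function name `build_payload` is hypothetical, and the size check is an assumption: it reads "25 MB" conservatively as 25,000,000 bytes of raw file data, since whether the cap applies before or after base64 encoding is not stated here.

```python
import base64
from typing import Optional

# Assumption: the inline cap is checked against raw bytes, with MB = 10**6 bytes.
INLINE_LIMIT_BYTES = 25_000_000


def build_payload(url: Optional[str] = None, data: Optional[bytes] = None) -> dict:
    """Return the request body: {"url": ...} or {"file_b64": ...}."""
    if (url is None) == (data is None):
        raise ValueError("pass exactly one of url or data")
    if url is not None:
        return {"url": url}
    if len(data) > INLINE_LIMIT_BYTES:
        raise ValueError("file exceeds the inline cap; host it and pass url instead")
    # JSON needs text, so decode the base64 bytes to an ASCII string.
    return {"file_b64": base64.b64encode(data).decode("ascii")}
```

The returned dict can be passed as the `json=` argument in the Python sample above.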

Does pdf-to-markdown preserve tables?

Yes, as markdown table syntax when the layout is recoverable. Complex multi-column layouts with merged cells degrade to plain text rows.
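
To spot documents whose tables degraded to plain text, you can count the tables that survived in the returned markdown. The sketch below assumes the output uses pipe-style tables with a `| --- |` delimiter row under the header; `count_tables` is a hypothetical helper, not part of the API.

```python
import re

# Matches a delimiter row such as "| --- | :---: |" that follows a table header.
DELIMITER_ROW = re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$")


def count_tables(markdown: str) -> int:
    """Count pipe tables by finding header lines followed by a delimiter row."""
    lines = markdown.splitlines()
    count = 0
    for prev, line in zip(lines, lines[1:]):
        # Requiring a pipe on both lines keeps a bare "---" rule from matching.
        if "|" in prev and "|" in line and DELIMITER_ROW.match(line):
            count += 1
    return count
```

If a PDF you know contains tables comes back with a count of zero, its layout was probably too complex to recover.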

What OCR engine backs /convert/ocr?

Tesseract 5 in our standard tier. Layout-aware OCR (Docling) is in pilot on a separate worker — contact us for access.

Ship with the Conversion bundle.

1,000 credits on signup. No card. Every endpoint in this bundle is live from minute one.
