I just saved 70,000 Claude tokens with one Python tool.

When I first started using Claude, I uploaded PDFs, images, and plenty of other files as reference material to get better prompts.

But what I didn't notice was that uploading two or more 40-page PDFs would eat up a whole session's tokens in about 10 minutes. The work was easy: no coding, just document writing in Markdown. So why was it taking this many tokens, I wondered. The same Claude that can build an app in 2 hours while using only 10 percent of my tokens was running out in 10 minutes.

That's when I discovered Microsoft's MarkItDown. The problem it targets is simple to state: on one side you've got PDFs, Word docs, PowerPoints, spreadsheets, audio files, and images; on the other side, a large language model that wants clean, structured text.

Microsoft's MarkItDown is a direct answer to that problem. With over 123,000 GitHub stars and 8,300 forks, it has clearly struck a nerve, at least among those who know about it. Here, I explore what it does, how it works, and why it's become a go-to tool for AI developers.

What Is MarkItDown?

MarkItDown is a lightweight, open-source Python library built by Microsoft’s AutoGen team. Its singular purpose: convert virtually any file format into clean, well-structured Markdown, the format that LLMs understand best.

The project’s README puts it plainly: it is “meant to be consumed by text analysis tools,” not necessarily for human-facing document conversion. This is infrastructure for AI pipelines, not a pretty PDF viewer.

Why Markdown, Specifically?

This is actually the most thoughtful part of the project’s design philosophy.

Markdown sits in a sweet spot: it's almost plain text (so it stays lightweight and token-efficient), but it still carries structural meaning: headings, lists, tables, links, code blocks. LLMs like GPT-4o have seen a great deal of Markdown-formatted content in training, which means it is a format they handle especially well.

Compare this with raw HTML (bloated with tags), PDFs converted to plain text (structure lost, tables mangled), or JSON (verbose, not prose-friendly). Markdown is the Goldilocks format for feeding documents to language models, and it might cut your Claude usage too, along with giving you better answers, better context, and better-informed decisions.
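To make the "bloated with tags" point concrete, here is a minimal sketch that compares the same tiny document written as HTML and as Markdown. The sample text is made up, and the 4-characters-per-token rule is only a rough assumption, not a real tokenizer, so treat the numbers as an illustration rather than a measurement.

```python
# The same small document expressed as HTML and as Markdown (made-up sample).
HTML_DOC = (
    "<html><body>"
    "<h1>Quarterly Report</h1>"
    "<p>Revenue grew in <strong>Q3</strong>.</p>"
    "<ul><li>North: up</li><li>South: flat</li></ul>"
    "</body></html>"
)

MARKDOWN_DOC = (
    "# Quarterly Report\n\n"
    "Revenue grew in **Q3**.\n\n"
    "- North: up\n"
    "- South: flat\n"
)


def rough_token_estimate(text: str) -> int:
    """Rough proxy: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


html_tokens = rough_token_estimate(HTML_DOC)
md_tokens = rough_token_estimate(MARKDOWN_DOC)
print(f"HTML: ~{html_tokens} tokens, Markdown: ~{md_tokens} tokens")
```

For real numbers you would swap the heuristic for the tokenizer of the model you actually use.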

What Can It Convert?

A wide range of formats:

Office documents: PDF, Word (.docx), Excel (.xlsx/.xls), PowerPoint (.pptx)
Web formats: HTML, XML, JSON, CSV
Media: Images (with EXIF metadata extraction and OCR support), Audio (EXIF metadata + speech transcription)
Other: YouTube URLs (pulls transcriptions), EPubs, ZIP files (iterates over contents), Outlook messages

For images and audio, MarkItDown can optionally hook into an LLM (like GPT-4o) to generate richer descriptions — not just metadata, but actual semantic content. For example, drop in a diagram-heavy slide deck and get back a Markdown document with image descriptions generated by a vision model.

How Does It Actually Work?

Architecture

MarkItDown uses a converter-based plugin architecture. Each file format has a dedicated converter class that knows how to extract and structure that format’s content. When you call md.convert("file.pdf"), the library:

1. Detects the file type (via extension or MIME type)
2. Selects the appropriate converter
3. Runs the conversion, preserving structure where possible
4. Returns a DocumentConverterResult object with a text_content attribute
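The detect, select, convert, return flow above can be sketched in a few lines. This is a toy illustration of converter dispatch, not MarkItDown's actual internals: the class names (MiniConverter, CsvConverter, PlainTextConverter, Result) are hypothetical, and it only looks at file extensions.

```python
import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Result:
    """Hypothetical stand-in for a conversion result."""
    text_content: str


class PlainTextConverter:
    extensions = {".txt"}

    def convert(self, path: Path) -> Result:
        return Result(path.read_text(encoding="utf-8"))


class CsvConverter:
    extensions = {".csv"}

    def convert(self, path: Path) -> Result:
        with path.open(newline="", encoding="utf-8") as handle:
            rows = [row for row in csv.reader(handle) if row]
        if not rows:
            return Result("")
        header, body = rows[0], rows[1:]
        # Build a Markdown table: header row, separator row, then data rows.
        lines = ["| " + " | ".join(header) + " |"]
        lines.append("| " + " | ".join("---" for _ in header) + " |")
        lines += ["| " + " | ".join(row) + " |" for row in body]
        return Result("\n".join(lines))


class MiniConverter:
    """Toy dispatcher: pick a converter by file extension, then run it."""

    def __init__(self, converters):
        self._converters = list(converters)

    def convert(self, source: str) -> Result:
        path = Path(source)
        extension = path.suffix.lower()           # step 1: detect the type
        for converter in self._converters:        # step 2: select a converter
            if extension in converter.extensions:
                return converter.convert(path)    # steps 3 and 4: convert, return
        raise ValueError(f"No converter registered for {extension!r}")
```

Adding a new format in this sketch means writing one more small class and registering it, which is the appeal of the converter-per-format design.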

The library is intentionally modular — you install only the dependencies you need.

Installation

# Full installation (all formats)
pip install 'markitdown[all]'
# Selective installation
pip install 'markitdown[pdf, docx, pptx]'

Basic Python Usage

from markitdown import MarkItDown
md = MarkItDown()
result = md.convert("report.pdf")
print(result.text_content)

Four lines. That's it. The result is clean Markdown. Now you can hand that Markdown to your LLM, where every token costs you.
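If you have a whole folder of reference files, the same call can be looped. The helper below is a sketch under assumptions: convert_folder and its suffixes default are names I made up, and it accepts any object with a convert(path) method whose result has a text_content attribute, which is the shape of the MarkItDown usage shown above.

```python
from pathlib import Path


def convert_folder(converter, src_dir: str, out_dir: str,
                   suffixes=(".pdf", ".docx", ".pptx")) -> list[Path]:
    """Convert each matching file in src_dir and write <name>.md into out_dir.

    `converter` is any object exposing .convert(path) -> result with a
    .text_content attribute (for example, a MarkItDown() instance).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(src_dir).iterdir()):
        if src.is_file() and src.suffix.lower() in suffixes:
            result = converter.convert(str(src))
            target = out / (src.stem + ".md")
            target.write_text(result.text_content, encoding="utf-8")
            written.append(target)
    return written
```

With the library installed, calling convert_folder(MarkItDown(), "refs", "refs_md") would leave one .md file per document, ready to paste or upload instead of the originals.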

With LLM Vision (for images)

from markitdown import MarkItDown
from openai import OpenAI
client = OpenAI()
md = MarkItDown(llm_client=client, llm_model="gpt-4o")
result = md.convert("diagram.jpg")
print(result.text_content)

When an LLM client is provided, image files are sent to the vision model for description generation. This turns a JPEG of a flowchart into a textual description of what the flowchart shows. This makes it enormously useful for RAG pipelines.

Command Line

# Basic conversion
markitdown path-to-file.pdf > document.md
# With output file
markitdown path-to-file.pdf -o document.md
# With Azure Document Intelligence
markitdown path-to-file.pdf -o document.md -d -e "<your_docintel_endpoint>"

The Plugin System

MarkItDown supports third-party plugins, which are disabled by default. Some extensions already exist, the most notable being markitdown-ocr, which adds OCR support to PDF, DOCX, PPTX, and XLSX files by extracting text from embedded images using LLM vision.

# In your shell:
pip install markitdown-ocr
pip install openai
# Then, in Python, use it normally with enable_plugins=True
from markitdown import MarkItDown
from openai import OpenAI
md = MarkItDown(enable_plugins=True, llm_client=OpenAI(), llm_model="gpt-4o")
result = md.convert("scanned_invoice.pdf")

This is a significant capability for enterprise documents where important data lives inside embedded charts or scanned pages.

And trust me, this is still not the most impressive thing here.

Azure Document Intelligence Integration

For high-stakes, production use cases — legal documents, financial reports, forms — MarkItDown integrates with Microsoft’s Azure Document Intelligence service, which uses specialized ML models trained on document understanding tasks.

from markitdown import MarkItDown
md = MarkItDown(docintel_endpoint="<your_endpoint>")
result = md.convert("complex-form.pdf")

THIS is the premium option: better layout preservation, better table extraction, better handling of complex multi-column PDFs. It trades simplicity for accuracy.

Now, where could you use it? (newbie version)

1. RAG Pipeline Preprocessing: Before embedding documents into a vector database, convert everything to Markdown. Consistent structure means better chunking, better embeddings, better retrieval.

2. LLM Context Injection: Need to pass a 50-page Word document to an LLM for summarization or Q&A? Convert to Markdown first for token efficiency and structural clarity.

3. Multi-Modal Document Understanding: Combine MarkItDown's conversion with a vision LLM to extract meaning from image-heavy slide decks or scanned documents, without building a custom pipeline from scratch.

4. Audio Transcription + Analysis: Feed audio files into MarkItDown, get back a Markdown transcript, then analyze, summarize, or query it.

5. YouTube Video Analysis: Pass a YouTube URL, get back the video's transcript as Markdown. Useful for research, content repurposing, or training data collection.
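For the RAG case in item 1, the reason Markdown helps chunking is that headings give you natural split points. Here is a minimal sketch of heading-based chunking; split_by_headings is a hypothetical helper, and it deliberately ignores edge cases such as "#" lines inside fenced code blocks.

```python
import re

# ATX-style Markdown headings: one to six '#' characters, then a space.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> list[dict]:
    """Split Markdown into chunks, starting a new chunk at every heading."""
    chunks = []
    heading, lines = None, []

    def flush():
        text = "\n".join(lines).strip()
        # Keep a chunk if it has a heading or any non-blank text.
        if heading is not None or text:
            chunks.append({"heading": heading, "text": text})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            heading, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Each chunk keeps its heading alongside its text, so you can store the heading as metadata next to the embedding.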

Security Considerations (so, you don’t put sensitive tokens inside it)

The team is refreshingly candid about this: MarkItDown runs with the privileges of the current process. If you’re building a server-side application where users upload files, you need to sanitize inputs carefully.

The recommendation is to use the narrowest API method that fits your use case:

convert_local() for local files only
convert_stream() for maximum control
convert_response() when you're managing the HTTP fetch yourself

Don’t blindly pass user-controlled input to the general convert() method in a production environment.
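As one way to act on that advice, the sketch below checks that a user-supplied file name stays inside an upload directory before handing it to convert_local(). The helper names (resolve_upload, convert_upload) and the upload directory layout are assumptions for illustration, and this covers only path traversal, not every risk of parsing untrusted files. It needs Python 3.9+ for Path.is_relative_to.

```python
from pathlib import Path


def resolve_upload(upload_dir: str, user_supplied_name: str) -> Path:
    """Return the resolved path only if it is a file inside upload_dir."""
    base = Path(upload_dir).resolve()
    candidate = (base / user_supplied_name).resolve()
    # Rejects "../" tricks and absolute paths that point outside the base.
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes the upload directory")
    if not candidate.is_file():
        raise FileNotFoundError(candidate.name)
    return candidate


def convert_upload(converter, upload_dir: str, user_supplied_name: str) -> str:
    """Convert a vetted local file using the narrow convert_local() method.

    `converter` is expected to be a MarkItDown() instance (or anything with
    a compatible convert_local method).
    """
    path = resolve_upload(upload_dir, user_supplied_name)
    return converter.convert_local(str(path)).text_content
```

Because a string like a URL simply fails the file check here, user input never reaches a code path that fetches remote content.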

How It Compares

The README itself compares MarkItDown to textract, the previous go-to Python library for text extraction. The key differentiator: MarkItDown preserves document structure as Markdown (headings, tables, lists, links), while textract focuses on raw text extraction. For LLM pipelines, structure is often as valuable as the text itself.

The Bigger Picture

MarkItDown is, at its core, a document ingestion layer — the unsexy but essential piece of infrastructure that sits between the messy real world (PDFs, PowerPoints, audio recordings) and the LLM-powered applications we’re building.

The fact that it’s built by the same team behind Microsoft’s AutoGen framework makes sense: AutoGen is about multi-agent AI workflows, and agents need to read documents. MarkItDown is the reading module.

With 123k stars and growing, the community has clearly recognized what this tool is: not glamorous, but indispensable.

Getting Started

pip install 'markitdown[all]'

Then:

from markitdown import MarkItDown
md = MarkItDown()
print(md.convert("any_file.pdf").text_content)

The GitHub repo is at github.com/microsoft/markitdown. It’s MIT licensed, actively maintained, and has a healthy community of contributors.

If you’re building anything that involves feeding documents to LLMs — RAG systems, document Q&A, AI agents — MarkItDown belongs in your stack.
