AI tool comparison
MarkItDown vs qmd
Which one should you ship with? Here is a side-by-side comparison of the panel verdict, pricing, reviewer split, and community vote.
Developer Tools
MarkItDown
Convert any file to Markdown — PDFs, Office docs, audio, images
Panel ship: 75%
Community: n/a
Entry: Paid
MarkItDown is Microsoft's open-source Python utility for converting a wide range of file formats into clean, LLM-friendly Markdown. Through one consistent interface it handles PDFs, Word documents, PowerPoint presentations, Excel spreadsheets, HTML, CSV, JSON, XML, ZIP archives, images (with optional vision-model descriptions), audio files (with transcription), YouTube URLs, and EPUB files. The design philosophy is LLM-first: rather than reproducing the original formatting for human readers, MarkItDown preserves document structure (headings, lists, tables, links) in a form that language models parse easily. It integrates with OpenAI-compatible vision clients for image descriptions and supports speech transcription for audio content. With 108k+ GitHub stars, and still gaining nearly 2,000 per day, MarkItDown has become a common document ingestion layer for AI pipelines. As agents increasingly need to process real-world enterprise documents, this kind of conversion utility becomes critical infrastructure: it turns messy business files into clean inputs that Claude or GPT-4o can reason about without wasting tokens on formatting artifacts.
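For a sense of what that single interface looks like in practice, here is a minimal sketch. It assumes the `markitdown` package is installed (for example with `pip install 'markitdown[all]'`) and relies on the `MarkItDown().convert(...)` call and `text_content` attribute shown in the project's README. The `convert_to_markdown` helper and its `converter` parameter are hypothetical names introduced here for illustration, not part of the library.

```python
def convert_to_markdown(path, converter=None):
    """Return the Markdown text for one file.

    `converter` is a hypothetical injection point: any object exposing
    convert(path) -> result with a .text_content attribute will work.
    """
    if converter is None:
        # Imported lazily so the helper can also be used with a stub converter.
        from markitdown import MarkItDown

        # Per the README, image descriptions need an LLM client, e.g.
        # MarkItDown(llm_client=client, llm_model="gpt-4o"); omitted here.
        converter = MarkItDown()
    result = converter.convert(str(path))
    return result.text_content


if __name__ == "__main__":
    import sys

    # Usage: python convert.py report.pdf > report.md
    print(convert_to_markdown(sys.argv[1]))
```

Output quality still depends on the input, so it is worth spot-checking the Markdown for complex PDFs before feeding it to a model.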
Developer Tools
qmd
Local doc search engine with BM25 + vectors + LLM re-ranking — by Shopify's CEO
Panel ship: 50%
Community: n/a
Entry: Free
qmd is a lightweight local search engine built by Tobi Luetke, CEO of Shopify, for indexing and querying personal knowledge bases, documentation, and meeting notes entirely offline. It combines three retrieval stages in a single pipeline: BM25 full-text search for exact keyword matches, vector semantic search via ONNX-based embeddings, and LLM re-ranking using GGUF models through node-llama-cpp. All three stages run locally with no cloud dependency. The tool ships in several deployment modes: a CLI for ad-hoc queries, a Node.js library for programmatic use, an HTTP service for local API access, and, most useful for AI workflows, a native MCP server that lets Claude Code, Cursor, and similar editors query your local knowledge base directly during coding sessions. The hybrid approach is meant to cover both exact-match queries ("find the exact error message from last week's standup notes") and conceptual ones ("what was our decision about the auth architecture"). What makes qmd notable beyond its technical approach is its provenance: Luetke shipped it as a personal tool he actually uses, not as a startup product. The GitHub history shows active iteration, and he has been discussing it on X. It is a credible signal of where pragmatic, AI-augmented knowledge management is heading for technical users who prefer local-first tools.
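To make the hybrid idea concrete, here is a small Python sketch of the first two stages: BM25 keyword scoring, vector similarity, and a merge of the two rankings. It is not qmd's code. The fusion method (reciprocal rank fusion), the toy hand-written vectors standing in for real embeddings, and every function name are assumptions chosen for illustration. The LLM re-ranking stage is left out.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def ranking(scores):
    """Indices of documents with a positive score, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in order if scores[i] > 0]


def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) across the input rankings."""
    fused = Counter()
    for r in rankings:
        for rank, doc_id in enumerate(r, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [d for d, _ in sorted(fused.items(), key=lambda kv: kv[1], reverse=True)]


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=3):
    """Fuse the keyword ranking and the vector ranking into one result list."""
    keyword = ranking(bm25_scores(query, docs))
    semantic = ranking([cosine(query_vec, v) for v in doc_vecs])
    return rrf([keyword, semantic])[:top_k]
```

A document that only one retriever finds still makes the fused list, while one that both retrievers rank highly rises to the top. That is the sense in which each layer catches what the others miss.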
Reviewer scorecard
“MarkItDown solves the boring-but-critical problem of getting messy enterprise docs into LLM-friendly formats. The breadth of format support—PDF, PowerPoint, Excel, YouTube URLs, audio—means one library covers your whole intake pipeline. 108k stars is the market's verdict.”
“Hybrid BM25 + vector + LLM re-rank is the right architecture for personal knowledge search — each layer catches what the others miss. The MCP server mode is genuinely useful: being able to ask Claude Code 'what did we decide about X last month' against my own notes changes the workflow. MIT licensed and from someone who ships real products.”
“Output quality varies wildly by format. Complex PDFs with multi-column layouts, tables, and embedded images still produce garbled Markdown. It's great for clean docs but 'any file' is aspirational—you'll spend time post-processing anything messy. Microsoft started this, then moved on; community maintenance is mixed.”
“This is a well-executed weekend project, not a production tool. It requires GGUF models and manual embedding setup — a meaningful friction barrier for non-technical users. The 'built by a CEO' narrative drives GitHub stars more than the technical differentiation. Obsidian with a local AI plugin gets you here with better UX.”
“Every enterprise AI pipeline needs a document ingestion layer. MarkItDown becoming a standard here signals we've moved past 'can LLMs reason?' to 'can LLMs process the full enterprise data stack?' That's a meaningful maturation point for production AI.”
“The pattern here — local hybrid retrieval as an MCP server feeding into AI coding agents — will be ubiquitous in two years. Today it's a technical power-user tool; tomorrow it's how everyone's AI assistant knows the institutional context behind the code. qmd is an early, clean implementation of that pattern.”
“Drop in a PDF, a PowerPoint deck, even a YouTube URL and get clean Markdown back for your AI workflows. No more copy-pasting reference materials into prompts. This single utility has quietly made AI-assisted research dramatically less painful.”
“I manage a lot of notes, references, and creative briefs, but the setup friction here — GGUF models, CLI configuration — makes this inaccessible for most creators. The concept is great; the UX needs a front-end before it reaches beyond developers.”