A focused pipeline that parses medical guidelines (PDF/HTML) into structured JSON for downstream clinical retrieval-augmented generation (RAG) or summarization. It implements models, parsers, normalization utils, and a CLI to ingest ...
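As a rough illustration of how models, a parser, a normalization util, and a CLI can fit together, here is a minimal sketch for the HTML path only. It uses nothing but the Python standard library (`html.parser`, `dataclasses`, `json`, `unicodedata`). Every name in it (`Section`, `Guideline`, `normalize_text`, `GuidelineHTMLParser`, `parse_html`, `to_json`, `guideline_parse.py`) is hypothetical and is not taken from the project. The mapping of `<h1>` to the title, `<h2>`/`<h3>` to section headings, and `<p>` to body text is likewise an assumption. PDF input would need a separate text-extraction step, which is not shown.

```python
from __future__ import annotations

import json
import re
import sys
import unicodedata
from dataclasses import asdict, dataclass, field
from html.parser import HTMLParser


# Hypothetical data model: a guideline is a title plus ordered sections.
@dataclass
class Section:
    heading: str
    paragraphs: list[str] = field(default_factory=list)


@dataclass
class Guideline:
    title: str
    sections: list[Section] = field(default_factory=list)


def normalize_text(text: str) -> str:
    """NFKC-normalize (e.g. non-breaking space -> space) and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


class GuidelineHTMLParser(HTMLParser):
    """Assumed mapping: <h1> -> title, <h2>/<h3> -> section heading, <p> -> paragraph."""

    HEADINGS = {"h2", "h3"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &nbsp; etc. into characters
        self.title = ""
        self.sections: list[Section] = []
        self._tag: str | None = None  # block tag whose text is being buffered
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Inline tags such as <b> are ignored; their text still lands in the buffer.
        if tag in {"h1", "p"} | self.HEADINGS:
            self._tag, self._buf = tag, []

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = normalize_text("".join(self._buf))
        self._tag = None
        if not text:
            return  # skip empty blocks
        if tag == "h1":
            self.title = text
        elif tag in self.HEADINGS:
            self.sections.append(Section(heading=text))
        else:
            # Paragraphs before any heading go into a default "Preamble" section.
            if not self.sections:
                self.sections.append(Section(heading="Preamble"))
            self.sections[-1].paragraphs.append(text)


def parse_html(html: str) -> Guideline:
    parser = GuidelineHTMLParser()
    parser.feed(html)
    parser.close()  # flush any buffered data
    return Guideline(title=parser.title, sections=parser.sections)


def to_json(guideline: Guideline) -> str:
    return json.dumps(asdict(guideline), ensure_ascii=False, indent=2)


def main(argv: list[str]) -> int:
    """Tiny CLI: guideline_parse.py FILE.html -> JSON on stdout."""
    if len(argv) != 2:
        print("usage: guideline_parse.py FILE.html", file=sys.stderr)
        return 2
    with open(argv[1], encoding="utf-8") as fh:
        print(to_json(parse_html(fh.read())))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Keeping the models as plain dataclasses means `dataclasses.asdict` gives a JSON-ready dict with no extra serialization code. A real pipeline would probably add schema validation and richer structure, such as recommendation strength and evidence grades. Those fields are not modeled here.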
Today, more than 2.5 trillion PDFs float in the ether. But will the format survive the AI revolution? PDFs still have drawbacks, not least that their text and layout are hard to extract reliably as structured data.