Why is AI so bad at reading PDFs? | The Verge
Summary
The article explores why AI models struggle to parse PDFs, covering the limitations of current models and the tools startups are building to make the information in these documents more accessible.
Why It Matters
Despite advancements in AI, the inability to effectively read and extract information from PDFs remains a significant barrier to utilizing vast amounts of data. This issue impacts various sectors, including legal and governmental, where document accessibility is crucial for transparency and efficiency.
Key Takeaways
- AI struggles with PDF parsing due to the format's design for visual fidelity rather than machine readability.
- Current AI models often confuse text elements, leading to inaccurate information extraction.
- Innovative startups are developing solutions to enhance PDF data accessibility, demonstrating potential for significant improvements in document management.
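To make the first two takeaways concrete: a PDF page is essentially a list of drawing instructions that place glyphs at coordinates, and nothing in the format requires those instructions to appear in reading order. The Python sketch below uses a hypothetical, hand-written content stream for a two-column page. It parses only the simple `x y Td (text) Tj` pattern, which is a deliberate simplification: real files also use `TJ` arrays, text matrices, escaped strings, and custom font encodings. It then compares three orderings of the same text. The `gutter_x` split point is an assumption supplied by hand. Real extraction tools have to infer layout, and that is where errors creep in.

```python
import re
from typing import List, Tuple

# Hypothetical content stream for a two-column page. Each BT ... ET block
# starts a fresh text object, so its single Td sets an absolute position
# (in points, origin at the bottom-left). The producer happened to write
# the right column before the left one, which the format allows.
CONTENT_STREAM = """
BT /F1 12 Tf 320 700 Td (Right column, line 1.) Tj ET
BT /F1 12 Tf 320 686 Td (Right column, line 2.) Tj ET
BT /F1 12 Tf 72 700 Td (Left column, line 1.) Tj ET
BT /F1 12 Tf 72 686 Td (Left column, line 2.) Tj ET
"""

Fragment = Tuple[float, float, str]  # (x, y, text)

# Matches only "x y Td (text) Tj"; ignores TJ arrays, Tm, escapes, encodings.
PATTERN = re.compile(r"(-?[\d.]+)\s+(-?[\d.]+)\s+Td\s+\((.*?)\)\s*Tj")


def extract_fragments(stream: str) -> List[Fragment]:
    """Pull positioned text fragments out of the stream, in stream order."""
    return [(float(x), float(y), text) for x, y, text in PATTERN.findall(stream)]


def stream_order(frags: List[Fragment]) -> List[str]:
    """Text in the order the producer wrote it, not the order a human reads."""
    return [text for _, _, text in frags]


def naive_top_to_bottom(frags: List[Fragment]) -> List[str]:
    """Sort by descending y (PDF y grows upward), then x. Blind to columns."""
    return [text for _, _, text in sorted(frags, key=lambda f: (-f[1], f[0]))]


def column_aware(frags: List[Fragment], gutter_x: float) -> List[str]:
    """Read the left column fully, then the right, split at an assumed gutter."""
    left = [f for f in frags if f[0] < gutter_x]
    right = [f for f in frags if f[0] >= gutter_x]
    return naive_top_to_bottom(left) + naive_top_to_bottom(right)


frags = extract_fragments(CONTENT_STREAM)
print(stream_order(frags))                   # right column comes out first
print(naive_top_to_bottom(frags))            # lines from both columns interleave
print(column_aware(frags, gutter_x=200.0))   # correct only because we knew the gutter
```

The naive top-to-bottom sort stitches "Left column, line 1." directly to "Right column, line 1.", the kind of confusion between text elements the summary describes. The column-aware version fixes this page only because the gutter position was given. On real documents with tables, footnotes, headers, and scanned images, that structure has to be recovered from the visual layout alone.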
How many AIs does it take to read a PDF?
One of the humblest and most ubiquitous file formats is stumping the world’s most advanced models.
by Josh Dzieza
Feb 23, 2026, 11:00 AM UTC
Kristen Radtke / The Verge
Josh Dzieza is an investigations editor and feature writer covering technology and the people who make, use, and are affected by it. Since joining The Verge in 2014, he has won a Loeb Award for feature writing, among others.
Last November, the House Oversight Committee had just released 20,000 pages of documents from the estate of Jeffrey Epstein, and Luke Igel and some friends were clicking around, trying to follow the threads of conversation through garbled email threads and a PDF viewer that was, frankly, “gross.” In the coming months, the Department of Justice would release its own batches of files, more than three million of them. Again, all PDFs.
This was a problem. While the Department of Justice had run optical character recognition over the text, it was not very good, Igel said, rendering the files more or less unsearchable.
“There was no interface the government put out that allowed you to actually see any sort of summary of things like flights, things like calendar events, things like text messages. There was no real index. You just had to get lucky and hope that the document ID that you were looking at contains what you’re looking for,” said Igel, cofounder of the AI video editing startup Kino. What if, Igel thought, they built a Gmail clone ...