PDF text scanning

Extract embedded text from uploaded PDFs and store it as post meta so the document’s contents become searchable in WordPress.

How it works

PDF text extraction starts when a PDF is uploaded to the media library. ClassifAI hands the file to Azure AI Vision’s Read API, which returns the embedded text page by page; for PDFs generated as scans of printed pages, the API also runs OCR. The extracted text is stored as post meta on the attachment, which means standard WordPress search, REST queries, and search plugins like ElasticPress can index it without any extra configuration. Extraction runs as a background job because the Read API is asynchronous: the PDF appears in the library immediately, and the text fills in once the API has finished.
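
The submit-then-poll flow described above can be sketched roughly as follows. This is an illustrative Python sketch, not ClassifAI's own code (the plugin itself is written in PHP). It assumes the status payload follows the Read 3.2 response shape (`status`, `analyzeResult.readResults[].lines[].text`). The helper names, the `fetch_status` callable, and the `extracted_pdf_text` meta key are hypothetical, and the network call is replaced by canned responses so the example runs offline.

```python
import time
from typing import Any, Callable, Dict


def flatten_read_result(result: Dict[str, Any]) -> str:
    """Join recognized lines page by page, with a blank line between pages."""
    pages = result.get("analyzeResult", {}).get("readResults", [])
    page_texts = []
    for page in sorted(pages, key=lambda p: p.get("page", 0)):
        lines = [line["text"] for line in page.get("lines", [])]
        page_texts.append("\n".join(lines))
    return "\n\n".join(page_texts)


def poll_until_done(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status until the operation succeeds, fails, or times out."""
    for _ in range(max_attempts):
        result = fetch_status()
        status = result.get("status")
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("Read operation failed")
        sleep(interval)  # still "notStarted" or "running": wait and retry
    raise TimeoutError("Read operation did not finish in time")


def demo() -> Dict[str, str]:
    # Canned responses standing in for repeated GETs on the operation URL.
    responses = iter([
        {"status": "running"},
        {
            "status": "succeeded",
            "analyzeResult": {
                "readResults": [
                    {"page": 1, "lines": [{"text": "Annual report"}]},
                    {"page": 2, "lines": [{"text": "Budget summary"}]},
                ]
            },
        },
    ])
    result = poll_until_done(lambda: next(responses), sleep=lambda _: None)
    # Hypothetical meta key; the real key is whatever the feature is configured with.
    return {"extracted_pdf_text": flatten_read_result(result)}


if __name__ == "__main__":
    print(demo()["extracted_pdf_text"])
```

Passing `fetch_status` and `sleep` in as parameters keeps the polling logic testable without a network connection. In a real background job, `fetch_status` would issue the HTTP request against the operation URL returned when the PDF was submitted.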

Configuration

  • Meta key used to store the extracted text on the attachment.
  • Allowed roles and an allowed-users list for granular access control.
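
As a rough illustration of how a role list and a user list might combine, the sketch below grants access when a user is named explicitly or holds at least one allowed role. The "either one is enough" rule, the `FeatureAccess` class, and its field names are assumptions made for this example, not ClassifAI's documented behavior.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureAccess:
    """Hypothetical access settings for a single feature."""
    allowed_roles: Set[str] = field(default_factory=set)
    allowed_user_ids: Set[int] = field(default_factory=set)

    def permits(self, user_id: int, user_roles: Set[str]) -> bool:
        # Assumption: an explicit user entry OR any matching role grants access.
        if user_id in self.allowed_user_ids:
            return True
        return bool(self.allowed_roles & user_roles)


if __name__ == "__main__":
    access = FeatureAccess(allowed_roles={"administrator", "editor"},
                           allowed_user_ids={42})
    print(access.permits(7, {"editor"}))       # role matches
    print(access.permits(42, {"subscriber"}))  # user is listed explicitly
    print(access.permits(9, {"subscriber"}))   # neither applies
```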

Providers

PDF text scanning is the only ClassifAI feature dedicated to documents (rather than images, audio, or post content) and currently has a single supported provider:

  • Microsoft Azure AI Vision — the only supported backend exposing a production-grade Read API for PDFs.

Use cases

  • Whitepaper and report libraries where the document is the publication.
  • Public-records sites surfacing meeting minutes, budgets, and filings.
  • Academic and research archives where searchable full text is the difference between findable and lost.

From the WordPress experts at Fueled, formerly 10up.

We’ve been delivering enterprise-grade digital work on WordPress since 2011, building and growing sites for global newsrooms, Fortune 500 marketing teams, ambitious startups, and public-sector clients. Our team helps lead the official WordPress Core AI team and has led and contributed to multiple WordPress core releases.

We also partner directly with organizations to build with AI and bring it into their digital products and marketing — on or off WordPress.

  • 15+ years building enterprise WordPress, since 2011
  • 1M+ active installs across plugins authored by our team
  • Core co-leads of the official WordPress AI Team
  • Multiple WordPress core releases led and contributed to