pdfium provides idiomatic R bindings to Google’s PDFium engine — the same library that powers Chrome’s PDF viewer. It has two halves:

  • a read surface that exposes vector-path geometry (stroke, fill, Bezier control points, transformation matrices) alongside text, fonts, images, annotations, form fields, attachments, signatures, structure tree, and rendering. The path geometry in particular is something no other CRAN package surfaces today.
  • a mutation surface (opt-in via readwrite = TRUE) that lets you rotate / reorder / merge pages, draw fresh page objects, create and edit annotations, fill form fields, and add file attachments — then save the result.

What it is for

  • Auditing PDF figures (which lines, which colors, which fonts).
  • Extracting curves from regulatory filings and scientific publications.
  • Building PDF normalization pipelines that need geometry, not just text.
  • Filling AcroForm fields programmatically and flattening the result for downstream tooling.
  • Authoring PDFs programmatically from vector graphics, JPEG images, text in the 14 standard fonts or any TrueType / Type1 typeface, and annotations (think: figure callouts, table reports, annotated source documents). Two gaps remain in v0.1.0: /Info-dict writes and on-save encryption. Both need upstream PDFium changes that we’ve proposed but Google hasn’t shipped yet.
  • Anything you’d otherwise drop into Python with pypdfium2.

See vignette("mutating-pdfs") for a walkthrough of the writer surface, and vignette("comparison") for how pdfium lines up against pdftools, qpdf, magick, tabulizer, and staplr.

Status

First CRAN release (0.1.0). The public API is documented on the pkgdown site, and the test suite covers 100% of the R code; architectural decisions for the release are recorded under dev/decisions/.

Installation

pdfium downloads its libpdfium binary from bblanchon/pdfium-binaries at install time. The pinned version lives in tools/pdfium-version.txt. If your install runs without internet access, set PDFIUM_OFFLINE=1 and place the matching tarball under inst/pdfium-binaries/ before installing.

# Release version (once on CRAN):
install.packages("pdfium")

# Development version:
remotes::install_github("humanpred/rpdfium")
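
For the offline case described above, a minimal sketch of how an install might be scripted. The helper offline_install_ready() and the src_dir path are hypothetical, not part of pdfium. The check only confirms that the version file and at least one file under inst/pdfium-binaries/ are present; it does not verify that the tarball matches the pinned version.

```r
# Hypothetical helper (not part of pdfium): is an unpacked source tree
# plausibly ready for an offline install?
offline_install_ready <- function(src_dir) {
  version_file <- file.path(src_dir, "tools", "pdfium-version.txt")
  binaries_dir <- file.path(src_dir, "inst", "pdfium-binaries")
  # Presence check only; does not confirm the tarball matches the pinned version.
  file.exists(version_file) && length(list.files(binaries_dir)) > 0
}

# Assumption: a local checkout or unpacked source tarball at this path.
src_dir <- "rpdfium"

if (offline_install_ready(src_dir)) {
  # The install subprocess inherits this environment variable.
  Sys.setenv(PDFIUM_OFFLINE = "1")
  install.packages(src_dir, repos = NULL, type = "source")
}
```

If the source tree or tarball is missing, the sketch does nothing rather than attempting a download.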

Example

library(pdfium)

doc <- pdf_doc_open(system.file("extdata", "fixtures", "minimal.pdf",
  package = "pdfium"
))
pdf_page_count(doc)
pdf_doc_close(doc)
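
In longer scripts it can help to guarantee that the document is closed even when an error occurs partway through. The sketch below wraps the three calls from the example in a small helper using base R's on.exit(); with_pdf_doc() is a hypothetical name for illustration, not a function exported by pdfium.

```r
# Hypothetical helper (not part of pdfium): open a document, run fn on it,
# and close it afterwards, whether fn succeeds or throws.
with_pdf_doc <- function(path, fn) {
  doc <- pdfium::pdf_doc_open(path)
  on.exit(pdfium::pdf_doc_close(doc), add = TRUE)
  fn(doc)
}

# Only run the usage example when pdfium is installed.
if (requireNamespace("pdfium", quietly = TRUE)) {
  fixture <- system.file("extdata", "fixtures", "minimal.pdf",
    package = "pdfium"
  )
  n_pages <- with_pdf_doc(fixture, pdfium::pdf_page_count)
  print(n_pages)
}
```

The value returned by fn is computed before the document is closed, so fn should return plain R values rather than objects that still refer to the open document.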

More examples ship in the vignettes (vignette("getting-started", package = "pdfium"), etc.) and on the pkgdown site.

License

pdfium is MIT-licensed. The bundled libpdfium binary is BSD-3-Clause and is not distributed in the source tarball — see LICENSE.md and dev/decisions/ADR-003-binary-distribution.md.