
Returns a single-row tibble that aggregates the most-asked-for facts about a PDF document: file path, page count, Info-dictionary metadata, structural feature flags (forms, attachments, bookmarks, signatures, JavaScript, tagged-PDF), counts for each of those feature groups, encryption state, and the file-ID tuple. Designed to replace the eight or so individual reader calls that users typically chain together when triaging a PDF.

Usage

pdf_doc_summary(doc, password = NULL)

Arguments

doc

A pdfium_doc from pdf_doc_open(), or a character path.

password

Optional password for encrypted PDFs when doc is a path. Ignored when doc is an open pdfium_doc.
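The password rule above amounts to a simple branch on the type of doc. The snippet is a minimal, hypothetical sketch of that rule, not the package's implementation: resolve_doc() and the stub opener fake_open() are illustrative names, and the stub stands in for whatever the package uses to open a path.

```r
# Hypothetical sketch of the documented password rule; not the
# package's implementation. `opener` stands in for whatever opens a path.
resolve_doc <- function(doc, password = NULL, opener) {
  if (is.character(doc)) {
    # A path: the password is forwarded so encrypted files can be opened.
    opener(doc, password)
  } else {
    # An already-open document: the password argument is ignored.
    doc
  }
}

# Stub opener that just records what it was given.
fake_open <- function(path, password) list(path = path, password = password)

resolve_doc("locked.pdf", password = "s3cret", opener = fake_open)$password
#> [1] "s3cret"
```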

Value

A one-row tibble.

Details

Each column either reports the value of an existing reader or is a length() over the matching pdfium_*_list. No new C-side work is involved; the function is purely an R-side aggregation. See Columns below for the source reader behind each entry.
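Because every count column is just length() of a list, the aggregation pattern is easy to sketch. This is a minimal base-R illustration under stated assumptions: list_bookmarks() and list_attachments() are hypothetical stubs returning canned lists in place of the package's per-feature readers, and a plain data.frame stands in for the tibble.

```r
# Hypothetical stand-ins for per-feature list readers; the real
# readers take an open document and return richer per-row data.
list_bookmarks   <- function(doc) list("Intro", "Methods")
list_attachments <- function(doc) list()

# Assemble a one-row summary: each count is length() of a list,
# which returns an integer, matching the <int> count columns.
summarise_counts <- function(doc) {
  data.frame(
    bookmark_count   = length(list_bookmarks(doc)),
    attachment_count = length(list_attachments(doc))
  )
}

summarise_counts(doc = NULL)
```

Keeping the summary as a thin layer over existing readers means any count can be cross-checked by calling the corresponding reader directly.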

Columns

See also

pdf_doc_info() for the Info-dictionary subset alone, and the per-feature readers listed under Columns for richer per-row data.

Examples

fixture <- system.file("extdata", "fixtures", "annotated.pdf",
  package = "pdfium"
)
if (nzchar(fixture)) pdf_doc_summary(fixture)
#> # A tibble: 1 × 27
#>   path    page_count file_version title author subject keywords creator producer
#>   <chr>        <int>        <int> <chr> <chr>  <chr>   <chr>    <chr>   <chr>   
#> 1 /home/…          1           14 ""    ""     ""      ""       ""      ""      
#> # ℹ 18 more variables: creation_date <chr>, mod_date <chr>, trapped <chr>,
#> #   creation_date_parsed <dttm>, mod_date_parsed <dttm>, is_tagged <lgl>,
#> #   is_encrypted <lgl>, security_revision <int>, xref_valid <lgl>,
#> #   bookmark_count <int>, attachment_count <int>, signature_count <int>,
#> #   form_field_count <int>, javascript_count <int>, named_dest_count <int>,
#> #   has_page_labels <lgl>, file_id_permanent <chr>, file_id_changing <chr>
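Since the result is a single row, summaries for several files can be stacked and filtered during triage. A minimal sketch, assuming a hypothetical helper flag_for_review() and an ad hoc definition of "needs review" (encrypted, or carrying JavaScript or attachments). The mock rows reuse column names from the output above but are invented, not read from real files.

```r
# Hypothetical triage helper: flag rows whose summary suggests a PDF
# needs a closer look. Column names match pdf_doc_summary() output.
flag_for_review <- function(summary) {
  summary$needs_review <- summary$is_encrypted |
    summary$javascript_count > 0L |
    summary$attachment_count > 0L
  summary
}

# Illustrative rows shaped like pdf_doc_summary() output (not real files).
mock <- data.frame(
  path             = c("a.pdf", "b.pdf"),
  is_encrypted     = c(FALSE, TRUE),
  javascript_count = c(0L, 0L),
  attachment_count = c(0L, 2L)
)
flag_for_review(mock)$needs_review
#> [1] FALSE  TRUE

# With pdfium installed, the helper could be applied to real summaries,
# assuming rbind() stacks the one-row tibbles (dplyr::bind_rows() also works):
# paths <- c(...)  # your PDF paths
# flag_for_review(do.call(rbind, lapply(paths, pdf_doc_summary)))
```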