Returns a single-row tibble that aggregates the most-asked-for facts about a PDF document: file path, page count, Info-dictionary metadata, structural feature flags (forms, attachments, bookmarks, signatures, JavaScript, tagged-PDF), counts for each of those feature groups, encryption state, and the file-ID tuple. Designed to replace the eight-or-so individual calls users typically chain together when triaging a PDF.
Arguments
- doc
A pdfium_doc from pdf_doc_open(), or a character path.
- password
Optional password for encrypted PDFs when doc is a path. Ignored when doc is an open pdfium_doc.
Details
Each column either exposes the value of an existing reader or is the
length() of the matching pdfium_*_list. No new C-side work is involved;
the function is purely an R-side aggregation. See Columns below for the
source reader for each entry.
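The aggregation pattern described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the `readers` list and its contents are made-up stand-ins for the outputs of the per-feature readers, so the snippet runs without pdfium installed.

```r
# Stand-in reader outputs (hypothetical values). In the package, these
# would come from the per-feature readers named under Columns.
readers <- list(
  bookmark   = list("Chapter 1", "Chapter 2"),
  attachment = list(),
  signature  = list()
)

# One length() per reader; vapply() guarantees an integer result,
# matching the <int> count columns in the example output.
counts <- vapply(readers, length, integer(1))

# Assemble a single-row data frame with one *_count column per reader.
summary_row <- as.data.frame(
  as.list(setNames(counts, paste0(names(counts), "_count")))
)
summary_row
```

An empty reader result contributes a count of zero rather than NA, which is consistent with the behaviour documented for the count columns.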
Columns
- path: character; canonical path the doc was opened from, or "<raw bytes>" for in-memory loads.
- page_count, file_version: from pdf_doc_info().
- title, author, subject, keywords, creator, producer, creation_date, mod_date, trapped: from pdf_doc_info(); missing entries appear as "".
- creation_date_parsed, mod_date_parsed: POSIXct (UTC), NA when the source date is empty or unparseable. From pdf_parse_date().
- is_tagged: from pdf_doc_is_tagged().
- is_encrypted: TRUE when pdf_doc_security() returns a non-NA revision; FALSE otherwise.
- security_revision: from pdf_doc_security(); NA for unencrypted PDFs.
- xref_valid: from pdf_doc_xref_valid().
- bookmark_count, attachment_count, signature_count, form_field_count, javascript_count, named_dest_count: length() of pdf_doc_bookmarks(), pdf_attachments(), pdf_signatures(), pdf_form_fields(), pdf_doc_javascript(), and pdf_doc_named_dests() respectively. Zero when the document has none of the corresponding entries.
- has_page_labels: TRUE when pdf_page_labels() returns non-NA strings.
- file_id_permanent, file_id_changing: from pdf_doc_file_id(); UTF-8 hex strings or NA.
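Two of the columns are derived rather than copied: is_encrypted follows from security_revision, and the *_parsed dates follow from the raw date strings. The sketch below illustrates both rules in base R. `parse_pdf_date_sketch()` is a hypothetical helper written for this example, not pdf_parse_date() itself; it assumes the common `D:YYYYMMDDHHmmSS` layout, ignores any timezone offset, and treats partial dates as unparseable.

```r
# Hypothetical stand-in for the documented date rule:
# POSIXct in UTC, NA when the source string is empty or unparseable.
parse_pdf_date_sketch <- function(x) {
  digits <- substr(sub("^D:", "", x), 1, 14)  # drop "D:" prefix, keep 14 digits
  as.POSIXct(digits, format = "%Y%m%d%H%M%S", tz = "UTC")
}

parse_pdf_date_sketch("D:20240131120000Z")  # a valid timestamp in UTC
parse_pdf_date_sketch("")                   # NA: empty source date

# is_encrypted rule: TRUE exactly when the security revision is not NA.
security_revision <- NA_integer_            # made-up value for an unencrypted PDF
is_encrypted <- !is.na(security_revision)
is_encrypted
```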
See also
pdf_doc_info() for the Info-dictionary subset alone, and
the per-feature readers listed under Columns for richer
per-row data.
Examples
fixture <- system.file("extdata", "fixtures", "annotated.pdf",
package = "pdfium"
)
if (nzchar(fixture)) pdf_doc_summary(fixture)
#> # A tibble: 1 × 27
#> path page_count file_version title author subject keywords creator producer
#> <chr> <int> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 /home/… 1 14 "" "" "" "" "" ""
#> # ℹ 18 more variables: creation_date <chr>, mod_date <chr>, trapped <chr>,
#> # creation_date_parsed <dttm>, mod_date_parsed <dttm>, is_tagged <lgl>,
#> # is_encrypted <lgl>, security_revision <int>, xref_valid <lgl>,
#> # bookmark_count <int>, attachment_count <int>, signature_count <int>,
#> # form_field_count <int>, javascript_count <int>, named_dest_count <int>,
#> # has_page_labels <lgl>, file_id_permanent <chr>, file_id_changing <chr>
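For triage, the one-row summary can be reduced to a short list of features worth a closer look. The sketch below is one possible approach and is not part of the package: `risky_features()` is a hypothetical helper, and `demo` is a hand-built row with made-up values standing in for pdf_doc_summary() output so that the snippet runs without pdfium.

```r
# Hypothetical helper: names of the risk-relevant features present in a
# one-row summary. isTRUE()/isFALSE() keep NA values from raising flags.
risky_features <- function(s) {
  flags <- c(
    encrypted   = isTRUE(s$is_encrypted),
    javascript  = isTRUE(s$javascript_count > 0),
    attachments = isTRUE(s$attachment_count > 0),
    bad_xref    = isFALSE(s$xref_valid)
  )
  names(flags)[flags]
}

# Made-up values using column names from the Columns section.
demo <- data.frame(
  is_encrypted     = FALSE,
  javascript_count = 2L,
  attachment_count = 0L,
  xref_valid       = TRUE
)
risky_features(demo)
#> [1] "javascript"
```

The same helper should accept the tibble returned by pdf_doc_summary(), since it only uses `$` access on the documented column names.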