Documents

pdf_doc_open()
Open a PDF document
pdf_doc_close()
Close a PDF document
pdf_page_count()
Count pages in a PDF document
pdf_doc_info()
Document-level metadata for a PDF
pdf_doc_meta()
Read one entry from a PDF's Info dictionary
pdf_parse_date()
Parse a PDF date string into POSIXct
pdf_doc_text()
Read every page's text in one call
pdf_doc_fonts()
Document-level rollup of every embedded / referenced font
pdf_doc_file_id()
Read the document's file identifier from its trailer
pdf_doc_page_mode()
Read the document's PageMode entry from its catalog
pdf_doc_permissions()
Permission flags from a PDF's encryption dictionary
pdf_doc_user_permissions()
User-level document permissions
pdf_doc_security()
Document security handler revision
pdf_doc_xref_valid()
Cross-reference table validity flag
pdf_doc_trailer_ends()
Byte offsets of every %%EOF trailer marker
pdf_doc_is_tagged()
Is the document marked as tagged?
pdf_doc_javascript()
Enumerate document-level JavaScript actions
pdf_doc_focusable_subtypes()
Annotation subtypes registered as keyboard-focusable
pdf_doc_viewer_preferences()
Read the document's viewer preferences
pdf_doc_viewer_preference_by_name()
Look up a /ViewerPreferences name-typed entry by key
pdf_doc_named_dests()
Enumerate the document's named destinations
pdf_doc_named_dest_by_name()
Resolve a named destination by name
pdf_doc_bookmarks()
Read the bookmark outline (table of contents) of a PDF
pdf_doc_bookmark_find()
Find a bookmark by its title
pdf_page_label()
Read the logical page label of a PDF page
pdf_page_labels()
Read every page's logical label in one call

Attachments

pdf_attachments()
List the files attached to a PDF document
pdf_attachment_data()
Read the raw bytes of an embedded file attachment
pdf_attachment_dict_value()
Look up an attachment-dictionary entry by key

Signatures

pdf_signatures()
List the digital signatures attached to a PDF document
pdf_signature_contents()
Read the raw bytes of a PDF signature's contents blob
pdf_signature_byte_range()
Read the signed byte ranges of a PDF signature

Pages

pdf_page_load()
Load a single page from an open PDF document
pdf_page_close()
Close a page handle
pdf_page_size()
Page dimensions in PDF points
pdf_page_rotation()
Page rotation in degrees
pdf_page_box()
Read a page's bounding box
pdf_page_links()
List the clickable links on a page
pdf_link_at_point()
Hit-test for the link annotation under a point
pdf_link_annot_at_point()
Hit-test for a link annotation, returning its annotation index
pdf_form_field_at_point()
Form-field hit-test for a point
pdf_page_actions()
Page additional actions (open / close handlers)
pdf_page_thumbnail()
Page embedded thumbnail
pdf_text_weblinks()
Auto-detected web links in a page's text

Annotations and form fields

pdf_annotations()
List the annotations on a PDF page
pdf_annot_at()
Construct a pdfium_annot handle for one annotation
as_pdfium_annot_list()
Coerce input to a pdfium_annot_list
as_tibble(<pdfium_annot_list>)
Tibble view of a pdfium_annot_list
pdf_annot_subtype()
Annotation subtype (string)
pdf_annot_subtype_code()
Annotation subtype code (integer enum)
pdf_annot_flags()
Annotation flag bitmask
pdf_annot_flags_decoded()
Annotation flags decoded as named logicals
pdf_annot_bounds()
Annotation bounding rectangle
pdf_annot_contents()
Annotation /Contents text
pdf_annot_title()
Annotation /T title (author) text
pdf_annot_subject()
Annotation /Subj subject text
pdf_annot_color()
Annotation /C colour (RGBA, 0..1)
pdf_annot_interior_color()
Annotation /IC interior colour (RGBA, 0..1)
pdf_annot_border_width()
Annotation border width
pdf_annot_font_size()
Annotation font size (FreeText / Widget subtypes)
pdf_annot_font_color()
Annotation font colour (RGB, 0..1)
pdf_annot_dict_value()
Read an annotation-dict entry by key
pdf_annot_appearance()
Appearance-stream string for an annotation
pdf_form_fields()
Enumerate AcroForm fields across the whole document
as_pdfium_form_field_list()
Coerce input to a pdfium_form_field_list
as_tibble(<pdfium_form_field_list>)
Tibble view of a pdfium_form_field_list
pdf_form_field_type()
Form-field type (string)
pdf_form_field_type_code()
Form-field type code (integer enum)
pdf_form_field_page_num()
Form-field page number

Page objects

pdf_page_objects()
Enumerate the objects on a page
pdf_obj_type()
Report the type of a page object
pdf_obj_bounds()
Axis-aligned bounding box of a page object
pdf_obj_rotated_bounds()
Rotated bounding quadpoints of a page object
pdf_obj_matrix()
Transformation matrix of a page object
pdf_obj_has_transparency()
Does a page object use alpha blending?
pdf_obj_is_active()
Active flag of a page object
pdf_obj_marks()
Content marks attached to a page object
pdf_obj_marked_content_id()
Direct marked-content ID for a page object

Paths

pdf_path_segments()
Path segments of a path page-object
pdf_path_stroke()
Stroke style of a path page-object
pdf_path_fill()
Fill color of a path page-object
pdf_path_dash()
Dash pattern of a path page-object
pdf_path_line_cap()
Stroke line-cap style of a path page-object
pdf_path_line_join()
Stroke line-join style of a path page-object
pdf_path_draw_mode()
Path draw mode (fill rule + stroke flag)

Text

pdf_text_font_size()
Font size of a text page-object
pdf_text_content()
Text content of a text page-object
pdf_text_runs()
Extract every text run on a page
pdf_text_font()
Font metadata of a text page-object
pdf_text_font_metrics()
Font ascent and descent for a text page-object's font
pdf_text_chars()
Per-character text extraction
pdf_text_colors()
Per-character fill and stroke colors and text-index mapping
pdf_text_render_mode()
Text-rendering mode of a text page-object
pdf_text_search()
Find every occurrence of a query string in a PDF
pdf_text_char_at_point()
Locate the character index nearest an (x, y) point on a page
pdf_text_index_from_char() pdf_text_char_from_text_index()
Map between PDFium's "all characters" and "extractable text" indices
pdf_text_char_obj_index()
Reverse-map a character index to its page-object index
pdf_text_obj_rendered_bitmap()
Rendered bitmap of a single text page-object
pdf_glyph_path()
Glyph outline for a single glyph in a text page-object's font
pdf_glyph_width()
Width of a glyph in a text page-object's font

Rendering

pdf_render_page()
Render a PDF page to a bitmap
pdf_render_page_with_matrix()
Render a PDF page with an arbitrary affine transformation
pdf_render_to_png()
Render a PDF page directly to a PNG file
plot(<pdfium_bitmap>)
Plot a pdfium_bitmap
as.raster(<pdfium_bitmap>)
Convert a pdfium_bitmap to base R's "raster" (character hex)
as.array(<pdfium_bitmap>)
Convert a pdfium_bitmap to a 3D RGBA array of doubles in 0..1
as.matrix(<pdfium_bitmap>)
Convert a pdfium_bitmap to a hex-color matrix

Images

pdf_image_info()
Inspect metadata for an embedded image
pdf_image_size()
Pixel size of an embedded image
pdf_image_bitmap()
Decoded image bitmap
pdf_image_rendered()
Rendered image bitmap (page CTM applied)
pdf_image_data()
Raw bytes of an embedded image stream
pdf_image_filters()
Filter chain for an embedded image stream
pdf_image_icc_profile()
Decoded ICC color profile bytes for an embedded image

Form XObjects

pdf_form_objects()
List the page objects nested inside a Form XObject

Clip paths

pdf_obj_clip_path()
Get the clip path attached to a page object
pdf_clip_path_count()
Count sub-paths in a clip path
pdf_clip_path_segments()
Read all segments of a clip path as a tibble

Structure tree (tagged PDF / accessibility)

pdf_structure_tree()
Read the tagged-PDF structure tree for a page

One-call extraction

pdf_extract_paths()
Extract all path geometry on a page into a single tibble

Document creation and serialisation

pdf_doc_new()
Create a new, empty PDF document
pdf_save()
Save a PDF document to disk
pdf_save_to_raw()
Save a PDF document to a raw vector

Structural mutation

The functions in this section require a document opened with readwrite = TRUE or created with pdf_doc_new(). See ADRs 011-018 for the writer-surface conventions.

pdf_page_new()
Add a new blank page
pdf_page_delete()
Delete a page from the document
pdf_pages_reorder()
Reorder pages
pdf_docs_merge()
Merge documents into a new PDF
pdf_n_up()
Combine N pages of a document onto a single page
pdf_page_set_rotation()
Set a page's rotation
pdf_page_set_box()
Set one of a page's named bounding boxes
pdf_doc_set_language()
Set the document's declared language