One-call helper that opens a document (or accepts an already-open
one), enumerates every path object on the requested page, and
returns a tibble with one row per path segment carrying both the
geometry and the containing path's stroke / fill style and
bounding box. This is the function kmextract consumes via the
pdfium_native backend.
Arguments
- doc
Either a character scalar path to a PDF file, or an already-open
pdfium_doc returned by pdf_doc_open(). When doc is a character path, the
document is opened and closed internally.
- page_num
One-based page index (default 1).
- password
Optional password for encrypted PDFs when doc is a path. Ignored when
doc is already an open pdfium_doc.
Details
Returned tibble
Each row describes one path-segment operator (a moveto,
lineto, or bezierto), in the order PDFium emits them:
Path identity & segment geometry:
- path_index - 1-based index of the parent path object on the page
- segment_index - 1-based segment index within the path
- segment_type - "moveto", "lineto", "bezierto", or "unknown"
- x, y - the segment's anchor / endpoint in PDF points
- close_figure - logical, segment closes the current subpath
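As a rough sketch of how the identity and geometry columns might be consumed, the snippet below numbers the subpaths inside each path. It runs on a small hand-built data frame standing in for the real return value (all values are invented), and it assumes that every moveto starts a new subpath, which is the usual PDF path convention but is not stated in this page. split_subpaths() is a hypothetical helper, not part of the package.

```r
# Mock rows mimicking the documented columns; the values are invented.
segs <- data.frame(
  path_index    = c(1L, 1L, 1L, 1L, 2L, 2L),
  segment_index = c(1L, 2L, 3L, 4L, 1L, 2L),
  segment_type  = c("moveto", "lineto", "lineto", "moveto", "moveto", "lineto"),
  x             = c(0, 10, 10, 20, 5, 15),
  y             = c(0, 0, 10, 20, 5, 5),
  close_figure  = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number the subpaths within each path, assuming
# (not confirmed by the package docs) that each "moveto" opens a new one.
split_subpaths <- function(segs) {
  # Restore emission order in case the rows were shuffled upstream.
  segs <- segs[order(segs$path_index, segs$segment_index), ]
  is_start <- segs$segment_type == "moveto"
  # ave() runs cumsum() separately inside each path_index group.
  segs$subpath_index <- ave(as.integer(is_start), segs$path_index, FUN = cumsum)
  segs
}

out <- split_subpaths(segs)
out[, c("path_index", "segment_index", "segment_type", "subpath_index")]
```

Grouping on path_index and subpath_index afterwards gives one polyline (or curve) per group, which is typically the unit a downstream consumer wants to draw or measure.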
Style (constant across all rows of one path):
- stroke_red, stroke_green, stroke_blue, stroke_alpha - 0-255 channels; NA if no stroke
- stroke_width - PDF points; NA if no stroke
- fill_red, fill_green, fill_blue, fill_alpha - 0-255 channels; NA if no fill
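Because the style channels are 0-255 values that may be NA, turning them into colour strings needs a small guard. The sketch below shows one way to do that with base R's grDevices::rgb() on an invented data frame that only mimics the stroke columns. channels_to_hex() is a hypothetical helper, not something the package provides.

```r
# Mock style columns for three paths; the values are invented.
style <- data.frame(
  path_index   = 1:3,
  stroke_red   = c(255, NA, 0),
  stroke_green = c(0, NA, 128),
  stroke_blue  = c(0, NA, 255),
  stroke_alpha = c(255, NA, 128)
)

# Hypothetical helper: build "#RRGGBBAA" strings and keep NA wherever any
# channel is missing. The NA rows are filtered out before calling rgb()
# so that it only ever sees valid 0-255 values.
channels_to_hex <- function(r, g, b, a) {
  hex <- rep(NA_character_, length(r))
  ok <- !(is.na(r) | is.na(g) | is.na(b) | is.na(a))
  hex[ok] <- grDevices::rgb(r[ok], g[ok], b[ok], a[ok], maxColorValue = 255)
  hex
}

style$stroke_hex <- with(
  style,
  channels_to_hex(stroke_red, stroke_green, stroke_blue, stroke_alpha)
)
style[, c("path_index", "stroke_hex")]
```

The same helper would apply unchanged to the fill_* columns, since they follow the same 0-255 / NA convention.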
Path bounding box (constant across rows of one path):
- bounds_left, bounds_bottom, bounds_right, bounds_top - PDF points
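Since the bounding box repeats on every row of a path, a per-path summary only needs the distinct rows. The following is a minimal sketch on invented data that collapses the segment rows to one row per path and derives a width and height. The width and height columns are illustrative additions, not columns the function returns.

```r
# Mock rows: three segments for path 1 and two for path 2 (invented values).
segs <- data.frame(
  path_index    = c(1L, 1L, 1L, 2L, 2L),
  bounds_left   = c(10, 10, 10, 50, 50),
  bounds_bottom = c(20, 20, 20, 60, 60),
  bounds_right  = c(40, 40, 40, 55, 55),
  bounds_top    = c(30, 30, 30, 90, 90)
)

bbox_cols <- c("path_index", "bounds_left", "bounds_bottom",
               "bounds_right", "bounds_top")

# The bounds are constant within a path, so unique() leaves one row per path.
boxes <- unique(segs[, bbox_cols])
rownames(boxes) <- NULL

# Illustrative derived columns, in PDF points.
boxes$width  <- boxes$bounds_right - boxes$bounds_left
boxes$height <- boxes$bounds_top - boxes$bounds_bottom
boxes
```

A table like this is convenient for coarse filtering, for example discarding hairline rules or page-sized background rectangles before looking at individual segments.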
Attributes
- page_size - named numeric c(width, height) of the page in PDF points, from pdf_page_size()
- page_rotation - integer in {0, 90, 180, 270}, from pdf_page_rotation()
- text_runs - tibble with one row per text object on the page, the output of pdf_text_runs()
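One plausible use of the page_size attribute is converting PDF coordinates (origin at the bottom-left) to a top-left origin, as image and plotting tools often expect. The sketch below attaches mock attributes to an invented data frame. It assumes the page_size names are literally "width" and "height", and it only handles unrotated pages, so treat it as an illustration, not a general transform.

```r
# Mock result: two points on a US Letter page (612 x 792 pt), invented values.
paths <- data.frame(x = c(72, 540), y = c(72, 720))
attr(paths, "page_size") <- c(width = 612, height = 792)
attr(paths, "page_rotation") <- 0L

page_size <- attr(paths, "page_size")
rotation  <- attr(paths, "page_rotation")

# This sketch deliberately refuses rotated pages; 90/180/270 would need
# an additional coordinate swap that is not shown here.
stopifnot(rotation == 0L)

# Flip the y axis: distance from the top edge instead of the bottom edge.
paths$y_from_top <- unname(page_size["height"]) - paths$y
paths
```

Reading the attributes once up front, as above, is a reasonable precaution because some data-frame operations can drop custom attributes.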
Examples
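# Locate the bundled fixture; system.file() returns "" if it is missing.
fixture <- system.file("extdata", "fixtures", "shapes.pdf",
  package = "pdfium"
)
if (nzchar(fixture)) {
  paths <- pdf_extract_paths(fixture, page_num = 1)
  head(paths)
  attr(paths, "page_size")
  # Only the last expression of the if block is auto-printed, so the
  # output below is the text_runs tibble.
  attr(paths, "text_runs")
}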
#> # A tibble: 1 × 13
#>   obj_index bounds_left bounds_bottom bounds_right bounds_top font_size text
#>       <int>       <dbl>         <dbl>        <dbl>      <dbl>     <dbl> <chr>
#> 1         5        129.          103.         159.       114.         1 Hello
#> # ℹ 6 more variables: font_base_name <chr>, font_family <chr>,
#> #   font_weight <int>, font_italic_angle <int>, font_is_embedded <lgl>,
#> #   font_flags <int>