Returns one tibble row per content mark on the page object. Content
marks are the tagged-PDF mechanism that links a piece of page content
(a path, a text run, an image, ...) to a structure element in
pdf_structure_tree(). Wraps FPDFPageObj_CountMarks,
FPDFPageObj_GetMark, and the FPDFPageObjMark_* functions GetName,
CountParams, GetParamKey, GetParamValueType, and the
GetParamIntValue / GetParamStringValue /
GetParamBlobValue accessors.
Arguments
- obj
A pdfium_obj from pdf_page_objects().
Value
A tibble with columns:
- mark_index
integer - 1-based position in the object's mark stack.
- name
character - the mark name (BDC tag).
- params
list-column - a named list of the mark's parameter values. Values are typed in R: numeric for FPDF_OBJECT_NUMBER, character for _STRING / _NAME, raw vectors for blobs.
Returns a 0-row tibble of the same schema when the object has no marks (typical for content from untagged PDFs).
Details
Each mark carries a name (typically the structural type or BDC
tag — e.g. "P", "Span", "Artifact") and zero or more
parameters as key/value pairs. The most common parameter is
MCID (an integer linking the object to a structure tree
element's marked-content reference).
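As a rough illustration of working with the params list-column, the sketch below pulls MCID out of each mark. It does not call this function: `marks` is a hand-built base data frame standing in for the documented return value, the parameter values (including the Lang key) are made up, and `get_mcid` is a hypothetical helper, not part of the package.

```r
# Hand-built stand-in for the documented return value (illustrative data only).
marks <- data.frame(
  mark_index = 1:3,
  name = c("P", "Span", "Artifact"),
  stringsAsFactors = FALSE
)
# params is a list-column: one named list per mark, possibly empty.
marks$params <- list(
  list(MCID = 4),
  list(MCID = 5, Lang = "en-US"),
  list()
)

# Hypothetical helper: return a mark's MCID as an integer, or NA if absent.
# [[ ]] is used instead of $ to avoid partial name matching.
get_mcid <- function(p) {
  if (is.null(p[["MCID"]])) NA_integer_ else as.integer(p[["MCID"]])
}
marks$mcid <- vapply(marks$params, get_mcid, integer(1))

# Keep only the marks that carry an MCID, i.e. that can be matched
# against a structure tree element.
linked <- marks[!is.na(marks$mcid), c("mark_index", "name", "mcid")]
```

Because vapply() returns a zero-length integer vector for an empty list, the same code should also run on the 0-row result returned for objects with no marks.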
See also
pdf_structure_tree() for the structure-tree side of
the same linkage; pdf_obj_type().