Returns one tibble row per accessibility structure element
associated with the given page, walking PDFium's view of the
document's /StructTreeRoot depth-first. Each row carries the
element's structural type (e.g. "P", "H1", "Span",
"Figure"), its title / language / alternative text, the
marked-content ID linking it to a page-content tag, and the
tree shape (parent_index + level).
Arguments
- page
A pdfium_page from pdf_page_load(), or a pdfium_doc.
- page_num
One-based page index. Only used when page is a pdfium_doc. Ignored otherwise.
Value
A tibble with columns:
- element_index: integer - 1-based pre-order position in the page's tree walk.
- parent_index: integer - the element_index of the parent element; 0 for top-level entries (children of the page's structure-tree root).
- level: integer - 1-based nesting depth.
- type: character - the structural element type (/S), UTF-8. Common values follow the PDF spec's standard structure types (e.g. "Document", "Sect", "P", "H1", "Span", "Figure", "Table").
- title: character - the element's /T title (often empty).
- lang: character - the element's /Lang IETF BCP 47 code (e.g. "en", "fr"); empty when none is set.
- alt_text: character - the element's /Alt alternative text for assistive technology; empty when none is set.
- actual_text: character - the element's /ActualText replacement text; empty when none is set.
- id: character - the element's /ID string (often empty).
- mcid: integer - the first marked-content ID associated with the element (whether direct /K N or via the first /MCR child); NA when the element has no marked content of its own (typical for container elements like Document / Sect).
- mcid_count: integer - how many marked-content IDs the element references; 0 for elements without content, 1 for the simple /K N case, >1 for elements that span several content tags.
- obj_type: character - the element's /Type entry (typically "StructElem"; empty when not set).
- attributes: list-column - a named list of the element's structural attributes (PDF spec table 354+), with R-typed values: logical for /O /Layout /BBox-like booleans, numeric for /RowSpan / /ColSpan / /StartIndent, character for /Placement / /TextAlign / /Lang-style names. Empty list when the element has no /A attribute objects. Aggregated across all attribute dictionaries on the element (PDF's nested attribute-class layout is flattened to a single namespace).
Returns a 0-row tibble of the same schema when the page has no associated structure tree (typical for untagged PDFs).
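The parent_index and level columns are enough to rebuild each element's position in the tree. The sketch below shows one way to do that. It does not call this function: `tree` is a hand-built base data frame that mimics a few of the documented columns, and `element_path()` is a hypothetical helper introduced only for illustration.

```r
# Stand-in for the return value: hand-built rows that mimic a few of the
# documented columns. This is illustrative data, not real PDFium output.
tree <- data.frame(
  element_index = 1:5,
  parent_index  = c(0L, 1L, 2L, 2L, 1L),
  level         = c(1L, 2L, 3L, 3L, 2L),
  type          = c("Document", "Sect", "H1", "P", "Figure"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow parent_index up to the root (0) and join the
# element types into a path such as "Document/Sect/P".
element_path <- function(tree, i) {
  parts <- character(0)
  while (i != 0L) {
    row <- tree[tree$element_index == i, ]
    parts <- c(row$type, parts)
    i <- row$parent_index
  }
  paste(parts, collapse = "/")
}

tree$path <- vapply(
  tree$element_index,
  function(i) element_path(tree, i),
  character(1)
)

print(tree[, c("element_index", "level", "path")])
```

In this stand-in data the number of path components equals level for every row, which is a quick consistency check on the two tree-shape columns.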
Details
Most PDFs are not tagged; for those, this function returns a
0-row tibble. Tagging is required for high-quality
accessibility, screen-reader support, and PDF/UA conformance.
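As a rough illustration of how the result might feed an accessibility check, the sketch below flags Figure elements whose alt_text is empty and confirms that the same filter is safe on the 0-row result returned for untagged pages. Both data frames are hand-built stand-ins for the documented schema, and `figures_missing_alt()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: keep only Figure elements with empty alt_text.
figures_missing_alt <- function(tree) {
  tree[tree$type == "Figure" & !nzchar(tree$alt_text), , drop = FALSE]
}

# Stand-in for a tagged page (illustrative data only).
tagged <- data.frame(
  element_index = 1:4,
  type          = c("Document", "P", "Figure", "Figure"),
  alt_text      = c("", "", "", "Company logo"),
  stringsAsFactors = FALSE
)

# Stand-in for an untagged page: same columns, zero rows.
untagged <- tagged[0, ]

flagged <- figures_missing_alt(tagged)
print(flagged)

# The filter returns zero rows, not an error, for the untagged case.
print(nrow(figures_missing_alt(untagged)))
```

Because empty strings rather than NA mark a missing /Alt entry in the documented schema, nzchar() is used here instead of is.na().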
Wraps FPDF_StructTree_GetForPage, FPDF_StructTree_*Children,
FPDF_StructElement_GetType / GetTitle / GetLang /
GetAltText / GetActualText / GetID /
GetMarkedContentID.
See also
pdf_doc_is_tagged() for a fast yes/no check at the
document level.