Glyph outline for a single glyph in a text page-object's font
Source: R/glyph_paths.R
pdf_glyph_path() returns the path segments of the glyph rendered at font_size
in PDF user-space points. Typical uses are listed under Details.
Value
A tibble with one row per glyph-path segment:
segment_index: integer, 1-based.
segment_type: character, one of "moveto", "lineto", "bezierto", or "unknown".
x, y: numeric, point coordinates in PDF user-space points (the glyph's local coordinate system, scaled to the requested font_size).
close_figure: logical, TRUE if this segment closes the current sub-path.

Returns an empty tibble when PDFium reports no glyph outline.
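As a minimal sketch of working with this return value, the following builds a small data frame by hand with the documented columns and assigns each segment to a sub-path. The coordinates are invented for illustration and are not real PDFium output. The sketch also assumes that every sub-path starts with a "moveto" segment, which is conventional for path data but is not stated on this page.

```r
# Hand-built stand-in for the return value; coordinates are invented.
segs <- data.frame(
  segment_index = 1:7,
  segment_type  = c("moveto", "lineto", "lineto", "lineto",
                    "moveto", "lineto", "lineto"),
  x = c(0, 10, 10, 0, 3, 7, 5),
  y = c(0, 0, 10, 10, 3, 3, 7),
  close_figure = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Assumption: each sub-path begins with a "moveto" segment.
segs$subpath <- cumsum(segs$segment_type == "moveto")

# One data frame per sub-path (for example an outer contour and a hole).
subpaths <- split(segs, segs$subpath)
n_segments <- vapply(subpaths, nrow, integer(1))
```

Splitting on "moveto" rather than on close_figure keeps open sub-paths intact, should a font produce any.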
Details
Typical uses:
Reconstructing difficult character mappings: render the glyph at the character's reported Unicode code point and compare it to a reference rendering of that code point to see whether the font actually draws what its ToUnicode CMap claims.
Visualising the glyphs PDFium picked when extracting text.
Computing exact glyph silhouettes for layout or collision detection, beyond what bounding boxes give you.
Wraps FPDFTextObj_GetFont -> FPDFFont_GetGlyphPath ->
FPDFGlyphPath_CountGlyphSegments /
FPDFGlyphPath_GetGlyphPathSegment.
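For the silhouette use case, curved segments usually need to be flattened into straight lines before polygon tests can be applied. The base R sketch below does this for a single sub-path. It assumes that "bezierto" rows arrive in groups of three (two control points followed by the end point of a cubic Bézier curve), which this page does not confirm. flatten_subpath(), cubic_bezier() and n_steps are hypothetical names, and the input coordinates are invented.

```r
# Evaluate a cubic Bezier curve.
# ctrl: 4 x 2 matrix of control points; returns a length(t) x 2 matrix.
cubic_bezier <- function(ctrl, t) {
  w <- cbind((1 - t)^3, 3 * (1 - t)^2 * t, 3 * (1 - t) * t^2, t^3)
  w %*% ctrl
}

# Turn one sub-path into a matrix of polygon vertices.
flatten_subpath <- function(segs, n_steps = 8) {
  pts <- matrix(numeric(0), ncol = 2)
  i <- 1
  while (i <= nrow(segs)) {
    if (segs$segment_type[i] == "bezierto") {
      # Assumption: rows i, i+1, i+2 are control 1, control 2, end point.
      stopifnot(i + 2 <= nrow(segs), nrow(pts) > 0)
      ctrl <- rbind(pts[nrow(pts), ],
                    as.matrix(segs[i:(i + 2), c("x", "y")]))
      t <- seq(0, 1, length.out = n_steps + 1)[-1]  # skip the start point
      pts <- rbind(pts, cubic_bezier(ctrl, t))
      i <- i + 3
    } else {
      # Any other segment type contributes its point unchanged.
      pts <- rbind(pts, c(segs$x[i], segs$y[i]))
      i <- i + 1
    }
  }
  dimnames(pts) <- list(NULL, c("x", "y"))
  pts
}

# Invented sub-path: two straight edges joined by one curve.
arc <- data.frame(
  segment_type = c("moveto", "lineto", "bezierto", "bezierto", "bezierto"),
  x = c(0, 10, 10, 5, 0),
  y = c(0, 0, 5, 10, 10),
  stringsAsFactors = FALSE
)
pts  <- flatten_subpath(arc, n_steps = 4)
bbox <- apply(pts, 2, range)  # row 1 = minima, row 2 = maxima
```

A larger n_steps gives a closer approximation of the curve at the cost of more vertices; the right value depends on the tolerance of the downstream collision test.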
Glyph code interpretation
glyph_code is the font's glyph identifier, not the Unicode
code point, though the two often coincide. By font type:
TrueType fonts with /Identity-H encoding (most modern embedded CID-keyed fonts): glyph code equals Unicode code point. Pass chars$codepoint from pdf_text_chars().
TrueType fonts with a cmap (e.g. WinAnsi or MacRoman encoding): glyph code is the encoded character code in the PDF stream, not the Unicode value. The Unicode <-> glyph map is opaque through the public PDFium API.
Type 1 fonts: glyph code is the encoding-specific character code (1-byte for almost all PDF Type 1 fonts).
If the path comes back empty, the glyph code likely doesn't map
to a glyph in this font's encoding. Try the character code
from the source content stream (visible in tools like pdfinfo -text) instead.
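The fallback of trying a second glyph code when the first yields no outline could be wrapped in a small helper. This is a sketch only: first_nonempty_path() is a hypothetical name, and get_path stands in for a call such as function(code) pdf_glyph_path(text_obj, code). A stub is used here so the snippet runs without a PDF, and the candidate codes are invented.

```r
# Try candidate glyph codes in order and return the first non-empty outline.
first_nonempty_path <- function(get_path, candidates) {
  for (code in candidates) {
    path <- get_path(code)
    if (nrow(path) > 0) {
      return(list(glyph_code = code, path = path))
    }
  }
  NULL  # no candidate produced an outline
}

# Stub standing in for the real lookup: only code 65 has an outline here.
stub_get_path <- function(code) {
  if (code == 65L) {
    data.frame(segment_index = 1L, segment_type = "moveto",
               x = 0, y = 0, close_figure = FALSE)
  } else {
    data.frame()
  }
}

# For example: the Unicode code point first, then the character code
# taken from the content stream.
hit <- first_nonempty_path(stub_get_path, c(913L, 65L))
```

Returning the matching code alongside the path records which interpretation of the glyph code the font actually accepted.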
See also
pdf_glyph_width(), pdf_text_font_metrics(),
pdf_text_chars() for the per-character readout that drives
most "investigate this glyph" workflows,
pdf_text_obj_rendered_bitmap() when you want the rendered
pixels instead of the outline.
Examples
if (FALSE) { # \dontrun{
doc <- pdf_doc_open("weird-font.pdf")
page <- pdf_page_load(doc, 1)
text_obj <- Filter(\(o) o$type == "text", pdf_page_objects(page))[[1]]
# First visible character on the page:
chars <- pdf_text_chars(page)
first <- chars[!chars$is_generated, ][1, ]
pdf_glyph_path(text_obj, first$codepoint)
} # }