Returns one row per URL that PDFium's web-link detector finds in
the page's extracted text. Detected patterns include http://...,
https://..., www.example.com, and mailto:user@host. Wraps
FPDFLink_LoadWebLinks plus FPDFLink_GetURL,
FPDFLink_GetTextRange, FPDFLink_CountRects, and
FPDFLink_GetRect.
Arguments
- page
A pdfium_page from pdf_page_load(), or a pdfium_doc.
- page_num
One-based page index. Only used when page is a pdfium_doc. Ignored otherwise.
Value
A tibble with one row per detected URL and columns:
- url (character): the matched URL string. UTF-8.
- start_char (integer): 0-based character offset of the URL on the page's text page.
- char_count (integer): number of characters in the matched span.
- left, bottom, right, top (numeric): axis-aligned union of the URL's per-line rectangles in PDF user-space points. NA when PDFium reports no bounds.
Returns a 0-row tibble of the same schema when no URLs are detected.
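Because the empty result keeps the full schema, downstream code does not need a special case for pages without URLs. The sketch below illustrates this with a hand-built data frame standing in for the return value. The rows are invented, and a plain data.frame is assumed to be a close enough substitute for the tibble in this example.

```r
# Stand-in for a pdf_text_weblinks() result; every value here is invented.
links <- data.frame(
  url        = c("https://example.com", "www.example.org"),
  start_char = c(10L, 250L),
  char_count = c(19L, 15L),
  left       = c(72, NA),
  bottom     = c(700, NA),
  right      = c(180, NA),
  top        = c(712, NA),
  stringsAsFactors = FALSE
)

# Keep only the URLs for which bounds were reported (non-NA rectangle).
with_bounds <- links[!is.na(links$left), , drop = FALSE]

# A 0-row result has the same columns, so the same filter runs unchanged.
empty <- links[0, , drop = FALSE]
empty_with_bounds <- empty[!is.na(empty$left), , drop = FALSE]
```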
Details
This is distinct from pdf_page_links(), which enumerates the
clickable link annotations declared by the PDF author. Use
pdf_text_weblinks() when the URL appears as plain text on the
page (no link annotation), and pdf_page_links() when you want
the explicit clickable regions.
Multi-line URLs produce one row whose bounding box is the
axis-aligned union of every contributing line's rectangle. If you
need a rectangle per line, pair start_char and char_count with
pdf_text_chars() over start_char:(start_char + char_count - 1L).
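The two calculations described above can be sketched in base R. The rectangles and offsets are made-up values, not output from the package, and the sketch does not call pdf_text_chars(), whose return schema is not described here.

```r
# Hypothetical per-line rectangles for one URL that wraps across two lines,
# in PDF user-space points (y grows upward, so top > bottom).
line_rects <- data.frame(
  left   = c(72, 72),
  bottom = c(700, 686),
  right  = c(540, 210),
  top    = c(712, 698)
)

# Axis-aligned union: the smallest rectangle enclosing every line rectangle.
union_rect <- c(
  left   = min(line_rects$left),
  bottom = min(line_rects$bottom),
  right  = max(line_rects$right),
  top    = max(line_rects$top)
)

# 0-based character offsets covered by a match (invented start and length).
start_char <- 120L
char_count <- 35L
char_idx <- start_char:(start_char + char_count - 1L)
```

Here union_rect spans both lines, and char_idx holds the 35 offsets from 120 to 154 that would be used to select the matching characters.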
See also
pdf_page_links() for link annotations,
pdf_text_search() for arbitrary string search.