Extract an embedded image to a file, picking a sensible format
Source: R/images.R
pdf_image_extract.Rd
Convenience helper over pdf_image_filters(), pdf_image_data(),
and pdf_image_bitmap(). Inspects the image's filter chain and
picks an on-disk format, as listed under Details.
Value
Invisibly returns the path written, with the chosen extension applied. The output format can be retrieved by inspecting the file extension on the returned string.
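As a minimal sketch of reading the format back from the return value, the extension can be pulled off the path with tools::file_ext() from base R's tools package. The path below is a made-up stand-in for what pdf_image_extract() might return, not real output.

```r
# Hypothetical return value; a real call would yield a tempfile-style path
# with the extension chosen by pdf_image_extract().
out <- file.path("some", "dir", "file1a2b3c.jpg")

# Extract the extension without the leading dot.
ext <- tools::file_ext(out)

# Map the extension to a human-readable format name.
fmt <- switch(ext,
  jpg = "JPEG",
  jp2 = "JPEG 2000",
  png = "PNG",
  "unknown"
)
fmt
```

Because the value is returned invisibly, assign it to a variable (or wrap the call in parentheses) to see it at the console.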
Details
- DCTDecode → write the raw embedded bytes as .jpg.
- JPXDecode → write the raw embedded bytes as .jp2.
- CCITTFaxDecode / JBIG2Decode / RunLengthDecode / FlateDecode / LZWDecode / ASCII85Decode / ASCIIHexDecode chains, or no filter → rasterize via pdf_image_bitmap() and write as a PNG using png::writePNG(). PNG round-trips the alpha channel when present.
Mirrors pypdfium2's PdfImage.extract() convenience.
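The mapping from filter chain to extension can be sketched as a small standalone function. This is an illustration only, not the package's implementation: choose_extension() is a hypothetical name, the filter chain is assumed to arrive as a character vector of PDF filter names, and the sketch assumes the last filter in the chain decides the format.

```r
# Hypothetical helper, not part of the package: map a filter chain
# (character vector of PDF filter names) to an output file extension.
choose_extension <- function(filters) {
  # No filter at all: the image is rasterized and written as PNG.
  if (length(filters) == 0L) {
    return("png")
  }
  # Assumption: the last filter in the chain determines the format.
  last <- filters[[length(filters)]]
  switch(last,
    DCTDecode = "jpg", # raw JPEG bytes can be written as-is
    JPXDecode = "jp2", # raw JPEG 2000 bytes can be written as-is
    "png"              # everything else is rasterized to PNG
  )
}

choose_extension("DCTDecode")    # "jpg"
choose_extension("JPXDecode")    # "jp2"
choose_extension("FlateDecode")  # "png"
choose_extension(character(0))   # "png"
```

Passing the image formats through untouched avoids a lossy decode and re-encode of JPEG and JPEG 2000 data; the remaining filters have no standalone file format that image viewers read directly, which is why they fall back to a decoded PNG.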
See also
pdf_image_data() to get the raw bytes directly,
pdf_image_bitmap() for the decoded pixel matrix.
Examples
# Locate the bundled fixture; system.file() returns "" if it is missing.
fixture <- system.file("extdata", "fixtures", "image.pdf",
  package = "pdfium"
)
if (nzchar(fixture)) {
  doc <- pdf_doc_open(fixture)
  page <- pdf_page_load(doc, 1L)
  # Keep only the image objects on the page.
  imgs <- Filter(function(o) o$type == "image", pdf_page_objects(page))
  if (length(imgs) > 0L) {
    # The extension is chosen from the filter chain; pass a stem.
    out <- pdf_image_extract(imgs[[1L]], tempfile())
    # Print explicitly: values inside a nested block are not auto-printed.
    print(basename(out))
  }
  pdf_page_close(page)
  pdf_doc_close(doc)
}