Problem/Motivation

smalot/pdfparser (the library the module uses) defaults ignoreEncryption to false. Many PDFs, including non-password-protected ones from tools like Adobe Acrobat, include an /Encrypt dictionary in their trailer for owner-only DRM (print restrictions, copy restrictions, etc.). The library sees this and throws an exception with the message "Secured pdf file are currently not supported." The module catches the exception, logs it to watchdog, and surfaces it as the generic parse error:

`Failed to parse PDF at public://filename.pdf: Secured pdf file are currently not supported.`

Steps to reproduce

Install the module, upload a PDF with an /Encrypt dictionary, and see an error message in dblog like "Failed to parse PDF at public://file.pdf: Secured pdf file are currently not supported."
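To confirm that a test file will actually trigger the error, a rough check for the /Encrypt key can help. The following is a minimal sketch, not part of the module or of smalot/pdfparser: the function names are hypothetical, and the byte search is a heuristic that assumes the key appears literally in the trailer or cross-reference stream dictionary. It could give a false positive if the same bytes happen to appear elsewhere in the file.

```python
from pathlib import Path


def has_encrypt_entry(pdf_bytes: bytes) -> bool:
    """Heuristic: report whether the raw PDF bytes contain an /Encrypt key.

    This does not parse the PDF. It only looks for the literal name,
    which is how the key is written in a trailer dictionary.
    """
    return b"/Encrypt" in pdf_bytes


def file_has_encrypt_entry(path: str) -> bool:
    """Hypothetical convenience wrapper that reads a file from disk."""
    return has_encrypt_entry(Path(path).read_bytes())
```

Calling `file_has_encrypt_entry("file.pdf")` on a candidate upload (the path is a placeholder) gives a quick yes/no before running it through the module.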

Proposed resolution

Set Parser to ignore encryption, or add a config setting to optionally ignore encryption.

File: pdfa11y-ignore-encryption.patch
Size: 474 bytes
Author: adambraun

Issue fork pdfa11y-3584726

Comments

adambraun created an issue.

joshuami made their first commit to this issue’s fork.

joshuami commented:

Assigned: Unassigned » joshuami

Seems reasonable. I'll take a look.

joshuami commented:

Assigned: joshuami » Unassigned
Status: Active » Needs review

@adambraun, I created a configuration option for ignoring encryption. Let me know if you think this will meet your need. I'm kinda on the fence about whether to make it ignore by default.

joshuami commented:

Status: Needs review » Postponed (maintainer needs more info)

After a bit of deeper investigation, smalot/pdfparser's setIgnoreEncryption() only skips the encryption check. It does not actually decrypt the content. Per the PDF spec (ISO 32000), even owner-password-only PDFs encrypt the content bytes (with a key derived from the empty user password). Compliant readers decrypt transparently, but smalot/pdfparser does not implement decryption. I tested this by creating a PDF with macOS Preview that sets an owner password and allows copying and printing. The resulting PDF was definitely still encrypted.

This means enabling ignoreEncryption allows parsing to proceed, but all checks run against garbled data, producing unreliable results. For example, the document title check can pass falsely because it sees encrypted bytes, and the tagging check will always fail even if the document is correctly tagged. So ignoring encryption really breaks two of the most important tests this module provides.
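The false pass on the title can be illustrated with a toy example. This is only a sketch under stated assumptions: it uses plain RC4 (one of the ciphers PDF encryption can use) with a made-up key rather than the spec's real key derivation, and `naive_title_check` is a hypothetical stand-in, not the module's actual check.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4 stream cipher; encrypting and decrypting are the same operation."""
    # Key-scheduling: permute the state array using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Keystream generation, XORed with the input bytes.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def naive_title_check(raw_title: bytes) -> bool:
    """Hypothetical check: passes whenever the title is non-empty."""
    return len(raw_title) > 0


title = b"Annual Accessibility Report"
key = b"hypothetical-key"  # stand-in for the key a real reader would derive

stored = rc4(key, title)   # what sits in the file when strings are encrypted
print(naive_title_check(stored))   # the check passes on ciphertext
print(stored == title)             # but the bytes are not the real title
print(rc4(key, stored) == title)   # only decryption recovers it
```

A parser that skips the encryption check but never decrypts hands `stored` to the check, so a "title present" result says nothing about whether the title is meaningful.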

This is an upstream limitation tracked at smalot/pdfparser#653. We can revisit if smalot/pdfparser supports actual decryption.

If I have this wrong and you can point me to a PDF on a reputable site (perhaps a government or education website) that has DRM but not encryption that I can download and inspect, I'm totally willing to reopen this and try to implement a configuration option.