Volunteer testers for OCRmyPDF install instructions under Cygwin?
OCRmyPDF is a command-line utility that takes an image-only PDF,
performs OCR and adds a text layer to the PDF, allowing it to be
searched. It is written in Python and C++, and on Linux is installed
via the Python 'pip' installer.
I tried installing it under Cygwin64 but ran into a compiler error
while building a dependency, pikepdf. This turned out to be fixable
by a single CFLAGS change (from -std=c++14 to -std=gnu++14), which the
maintainer of pikepdf (and OCRmyPDF) graciously fast-tracked.
Note: You may get a warning about the version of pip that came
with Cygwin being out of date. It is not required, but if you want
you can update pip to the latest version with
    pip3 install --upgrade pip
But note that if you do this the command name will now be just
'pip' instead of 'pip3'.
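One way to sidestep the pip/pip3 naming question in scripts is to run pip as a module of the Python interpreter. The sketch below assumes the pip module is importable by the interpreter running it; the helper name pip_command is hypothetical and only for illustration.

```python
import sys


def pip_command(extra_args):
    """Build a pip invocation that does not depend on the 'pip'/'pip3' name.

    Runs pip as a module of the current interpreter (assumption: pip is
    installed for this interpreter).
    """
    return [sys.executable, "-m", "pip"] + list(extra_args)


if __name__ == "__main__":
    # Print the command rather than run it, so nothing is changed by accident.
    print(" ".join(pip_command(["install", "--upgrade", "pip"])))
```

Passing the returned list to subprocess.run() would perform the upgrade; from the shell, the equivalent form is "python3 -m pip install --upgrade pip".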
There is one optional dependency, "unpaper", that is currently not
available under Cygwin. Without it, certain options such as --clean
will produce an error message. However, the OCR-to-text-layer
functionality is available. I'll take a look at building a Cygwin
version of unpaper.
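Until unpaper is available, a wrapper script could check for it before asking for --clean. This is only a rough sketch: the function name build_ocrmypdf_args and the file names are made up for illustration, and it assumes both ocrmypdf and unpaper would be found through PATH.

```python
import shutil


def build_ocrmypdf_args(input_pdf, output_pdf, want_clean=False,
                        which=shutil.which):
    """Return an ocrmypdf argument list, dropping --clean if unpaper is absent.

    'which' is injectable so the lookup can be faked in tests.
    """
    args = ["ocrmypdf"]
    if want_clean:
        if which("unpaper") is None:
            # unpaper is optional; without it --clean produces an error.
            print("unpaper not found; running without --clean")
        else:
            args.append("--clean")
    args += [input_pdf, output_pdf]
    return args


if __name__ == "__main__":
    # Hypothetical file names; prints the command instead of running it.
    print(build_ocrmypdf_args("scan.pdf", "scan-ocr.pdf", want_clean=True))
```

The resulting list can be handed to subprocess.run(args, check=True) to do the actual OCR run.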
I've tried this in a clean, minimal Cygwin install but would like to
get confirmation from a few other people before submitting this to the
OCRmyPDF maintainer for inclusion in their install instructions.
Is there anyone with interest in OCRmyPDF willing to try these
instructions and report back? Off-list is fine if that would be off-