IMO *.pdf is a curse!
You need a powerful machine to render the pages fast enough
to step through them without losing concentration while trying
to absorb the contents.
I always do pdf2ascii and work with the text version when I can.
Currently I've got some PDFs of Dykstra's code and 'formulas' on typed
pages, which are of course just graphical images.
To 'transpose' it I'd need 2 computers or screens: one to view the
pdf & one to enter the text manually.
Apparently most OCR programs are intended for use with images scanned
from paper? They should be capable of processing PDF page images too?
A process which does:
  FOR pages first to last DO
    save pdf-image;
    OCR pdf-image to text
  END,
would be useful.
Is this feasible ?
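
The loop above can probably be done with existing command-line tools. Here is a minimal sketch in Python, assuming Poppler's `pdfinfo` and `pdftoppm` and the `tesseract` OCR program are installed; none of these are named in the post, and the function names and the `ocr_out` directory are hypothetical. How well OCR copes with old typed pages and formulas is something I can't promise.

```python
import re
import subprocess
from pathlib import Path


def parse_page_count(pdfinfo_output):
    """Extract N from the 'Pages: N' line that pdfinfo prints."""
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line in pdfinfo output")
    return int(match.group(1))


def render_command(pdf, page, prefix, dpi=300):
    """Build the pdftoppm command that saves one page as <prefix>.png.

    Assumption: Poppler's pdftoppm; -singlefile stops it from appending
    a page number to the output name.
    """
    return ["pdftoppm", "-r", str(dpi), "-png", "-singlefile",
            "-f", str(page), "-l", str(page), pdf, prefix]


def ocr_command(image, out_base):
    """Build the tesseract command that writes <out_base>.txt (assumed tool)."""
    return ["tesseract", image, out_base]


def pdf_to_text(pdf, out_dir="ocr_out"):
    """FOR pages first to last DO save pdf-image; OCR pdf-image to text END."""
    Path(out_dir).mkdir(exist_ok=True)
    info = subprocess.run(["pdfinfo", pdf], capture_output=True,
                          text=True, check=True)
    for page in range(1, parse_page_count(info.stdout) + 1):
        base = str(Path(out_dir) / f"page{page:04d}")
        subprocess.run(render_command(pdf, page, base), check=True)    # save pdf-image
        subprocess.run(ocr_command(base + ".png", base), check=True)   # OCR to text
```

Calling `pdf_to_text("some.pdf")` should leave one `pageNNNN.png` and one `pageNNNN.txt` per page in `ocr_out`, so only one screen is needed to proofread the text against the image.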
Thanks for any feedback,
== Chris Glur.