TIFF / TIF support
Hi,
I am playing around with this superb piece of software :-). Today it is about TIF versus PDF for scanned documents.
How about extended support for TIFF? Is anything planned?
I am thinking of:
(more important): Make text searchable. I mean the text stored within the TIFF files, by OCR for example, when I scan them with the MS Office Document Imaging software. The same thing is already possible with PDF.
(less important, but it would be nice): Multipage TIFF browsing (add back/forward buttons to view pages within UR)
kinook
01-05-2007, 02:29 PM
That hasn't been requested before. We'll add it to the list.
Hi there,
Nice to hear that - I have been searching extensively for TIFF software other than MODI (Microsoft Office Document Imaging) that can handle the text layer MODI produces via OCR, or that can convert the files to PDF while preserving the text layer, and found NOTHING. Seems like TIFF is going to be a dead end for document handling :-(
If anybody felt discouraged from importing OCR-ed TIF files into UR because of my "text is not searchable" statement above: that was just wrong!
I had forgotten to set the proper option in UR for indexing .tif files, which makes the text stored within them searchable:
I added ";.tif" under Tools | Options | Import | Keywording - and now every word within my TIFFs is searchable...
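For anyone curious where that searchable text might live inside the file: the sketch below walks the pages of a classic TIFF and reads one private tag per page. It is an illustration only, not how UR indexes files. The tag number 37679 (listed by ExifTool as MSDocumentText), the latin-1 decoding, and the function names `iter_ifds` and `ocr_text` are all assumptions, not something confirmed in this thread.

```python
import struct

# Size in bytes of one value for each TIFF 6.0 field type (1..12).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1,
              7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}

# Assumption: private tag in which MODI is believed to store plain OCR text.
MODI_TEXT_TAG = 37679


def iter_ifds(data):
    """Yield one {tag: raw_bytes} dict per image file directory (page)."""
    if data[:2] == b"II":
        endian = "<"   # little-endian file
    elif data[:2] == b"MM":
        endian = ">"   # big-endian file
    else:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    seen = set()       # guards against a looping directory chain
    while offset and offset not in seen:
        seen.add(offset)
        (count,) = struct.unpack_from(endian + "H", data, offset)
        tags = {}
        for i in range(count):
            entry = offset + 2 + 12 * i
            tag, ftype, n = struct.unpack_from(endian + "HHI", data, entry)
            size = TYPE_SIZES.get(ftype, 1) * n
            if size <= 4:
                # Small values are stored inline in the entry itself.
                tags[tag] = data[entry + 8:entry + 8 + size]
            else:
                # Larger values are stored elsewhere; the entry holds an offset.
                (pos,) = struct.unpack_from(endian + "I", data, entry + 8)
                tags[tag] = data[pos:pos + size]
        yield tags
        (offset,) = struct.unpack_from(endian + "I", data,
                                       offset + 2 + 12 * count)


def ocr_text(data):
    """Return the text of every page that carries the assumed MODI text tag."""
    return [tags[MODI_TEXT_TAG].rstrip(b"\x00").decode("latin-1")
            for tags in iter_ifds(data) if MODI_TEXT_TAG in tags]
```

Usage would be something like `ocr_text(open("scan.tif", "rb").read())`, where `scan.tif` is a placeholder file name. If the text really is stored as plain bytes like this, it would explain why a generic keyword indexer can find it without knowing anything about OCR.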
kinook
01-15-2007, 09:18 AM
Very interesting. Could you ZIP and post or send a couple such TIF files and also indicate which OCR product+version you used to create them? Thanks.
Originally posted by kinook
Very interesting. Could you ZIP and post or send a couple such TIF files and also indicate which OCR product+version you used to create them? Thanks.
There are just two small TIFF examples in the attachment, because of file size limits. The physical scans and ALL OCR were done directly from within Microsoft Office Document Imaging (MODI, MSPVIEW.EXE), which is part of Microsoft Office - no other OCR software was used. (I think I had to install MODI manually after the first automatic install of MS Office; it was nested somewhere in "Office Tools".) I have also managed to easily convert different file formats to TIFF by printing to the "Microsoft Office Document Image Writer" print driver, which should be installed automatically with MODI (provided it is not a 64-bit Windows system, grrrr). Let me know if I can help with anything more.
The OCR engine of MODI is nothing special; I think it uses something like the one in ScanSoft TextBridge. File size is somewhat bigger and text recognition definitely inferior to PDFs made with good OCR software. But it makes it easy to digitize paper and - provided MS Office is already installed - it is free, with character recognition good enough to produce plenty of keywords for each file. The downside of these TIFFs shows up when you want to convert them to other formats like PDF AND preserve the OCR-ed text, or when you want to view or copy the 'text layer' with software other than MODI. I think that software has yet to be written...
Now to other good news (...if there is anyone else mad enough to use multipage tiffs within UR??):
There is at least one way to view multipage TIFFs within UR:
Similar to viewing PDFs, TIFFs can be viewed within UR via the internal browser together with 'alternatiff', a TIFF-viewing browser plugin:
- install the plugin into Internet Explorer from alternatiff.com. It has to be registered for full functionality, but it is free. It can be tried out on a test page at alternatiff.com.
- in UR, set .tif (and .tiff) to be viewed in the web browser via "Tools | Options | Browser | File extensions to display in internal browser view"
- delete these extensions from the image viewer options in "Tools | Options | Documents | File extensions to display in image viewer"
Functionality is similar to a PDF reader - well, not quite yet, and as I am not paid by alternatiff.com I will stop my praise here ;-)
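As background on what "multipage" means in a TIFF: each page is normally one image file directory (IFD), and the directories are chained by offsets. A viewer with back/forward buttons has to follow that chain. The Python sketch below only counts the directories; the function name `count_tiff_pages` is made up for illustration, and it assumes every directory in the chain is a page (some files also chain thumbnails this way).

```python
import struct


def count_tiff_pages(data):
    """Count the directories in the top-level IFD chain of a classic TIFF."""
    endian = {b"II": "<", b"MM": ">"}.get(data[:2])
    if endian is None:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    pages = 0
    seen = set()  # guards against a looping directory chain
    while offset and offset not in seen:
        seen.add(offset)
        # Each directory: 2-byte entry count, 12 bytes per entry,
        # then a 4-byte offset to the next directory (0 means last page).
        (entries,) = struct.unpack_from(endian + "H", data, offset)
        (offset,) = struct.unpack_from(endian + "I", data,
                                       offset + 2 + 12 * entries)
        pages += 1
    return pages
```

Calling `count_tiff_pages(open("scan.tif", "rb").read())` on a scan (file name is a placeholder) is a quick way to check whether a file really contains more than one page before blaming the viewer.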