#1
German language / Umlaute
Hello,
I am new to UR and just taking my first steps, and I have a question that I could not answer from the documentation or the forum. Most of the documents I use are in German, so they contain many special characters such as Ä, ä, Ö, etc. (Umlaute). Not only do these characters fail to appear in the normal text display of the items, but words containing them are also ignored when the automatic keywords are built. As a result, searching can sometimes become pointless. Is there a way to tell UR to handle these characters like normal ones?
Thanks for your help, Peter.
#2
I have now tested this and found that there is no problem with German Umlaute in item titles or in plain text files, .htm files, or Word documents imported into Ultra Recall.
The Umlaute are displayed in PDF files but cannot be searched, not even with wildcards such as "?" and "*". Searching within the PDF document using the PDF reader's own search function works well.
In the Item Keyword window, the Umlaute are missing from every word in the keyword list that should contain them, but as far as I have seen only for PDF documents and not for the other document types mentioned above. For example, "Küche" is found with the search value "Küche" in all documents except PDFs. In a PDF it is found with "Kche" but not with "K?che".
Hartmut
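The "Kche" versus "K?che" result above is consistent with the Umlaut being dropped entirely during keywording, so that the stored keyword is one character shorter than the wildcard pattern expects. A minimal Python sketch of that idea follows. The helper `drop_non_ascii` is hypothetical, and Python's glob-style `fnmatchcase` only stands in for Ultra Recall's wildcard matching, whose exact rules are not described in this thread.

```python
from fnmatch import fnmatchcase


def drop_non_ascii(word: str) -> str:
    """Hypothetical stand-in for a keyworder that loses non-ASCII characters."""
    return "".join(ch for ch in word if ord(ch) < 128)


# The keyword as it might be stored for a PDF (assumption: the "ü" is simply lost).
keyword = drop_non_ascii("Küche")
print(keyword)

# A literal search for the shortened word matches the stored keyword.
print(fnmatchcase(keyword, "Kche"))

# "?" must match exactly one character, but the stored keyword has
# nothing between "K" and "che", so this pattern does not match.
print(fnmatchcase(keyword, "K?che"))
```

If this is what happens, a single-character wildcard cannot work around the missing Umlaut; only the shortened spelling matches.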
#3
Hello Hartmut,
Thanks for your reply. You are right; I will have to use the search without Umlaute.
#4
There did seem to be a problem with capturing accented Latin characters when keywording PDF documents. The main download at http://www.kinook.com/Download/UltraRecallProEval.exe has been updated with a fix for this problem (UltraRecall.exe 3.5.3.1 in Help | About | Install Info after installing). You will need to re-import or synchronize (Item | Synchronize on the menu) PDF documents after installing so that they are re-keyworded.
#5
Thank you for your prompt attention.
Hartmut
#6
Hello to all at Kinook,
This was in fact the fastest answer I have ever received for any problem with any kind of software! Great!
I installed the new version, and it is shown in 'About | Install Info', but now, instead of being left out, the special characters are replaced with even stranger ones: "AbkŸrzungen" instead of 'Abkürzungen', "PortrÅ*t" instead of 'Porträt', "der Gro§e" instead of 'der Große'. Maybe I get these results because my XP is running with the scheme "German / Germany"?
Best regards, Peter.
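The replacement characters reported above look like a code page mismatch. As an unconfirmed guess, not something established in this thread, the Python sketch below encodes the words as Mac OS Roman and then decodes the same bytes as Windows-1252. The helper name `misdecode` is hypothetical, and the choice of these two code pages is an assumption.

```python
def misdecode(text: str) -> str:
    """Encode as Mac OS Roman, then wrongly decode the bytes as Windows-1252.

    Assumption: this pair of code pages is only a guess at what happens
    inside the PDF text extraction; the thread does not confirm it.
    """
    return text.encode("mac_roman").decode("cp1252")


for word in ("Abkürzungen", "der Große", "Porträt"):
    print(word, "->", misdecode(word))
```

Under this assumption, "Abkürzungen" becomes "AbkŸrzungen" and "der Große" becomes "der Gro§e", as reported. The "ä" in "Porträt" comes out as "Š", which does not exactly match the text quoted in the post, so this is a hint about where to look rather than a diagnosis.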
#7
Apparently. It works OK in our testing here on English XP configured for the German locale, but we don't have a German XP to test with. You might be able to temporarily change to the English locale when importing. Or you can get back the old behavior by unzipping the attached zip file, double-clicking the .reg file in it, and restarting UR.
#8
OK, I am back to the old behaviour. Will there be a fix for this?
#9
We will report the problem to the vendor of the PDF component.
Also, please ZIP and send a couple of problem PDF files to support@kinook.com so we can verify whether the problem is specific to your files. Thanks.
#10
I have the German XP and, as far as I can see, don't have a problem with PDFs.
Peter, did you follow these instructions from Kinook: "You will need to re-import or synchronize (Item | Synchronize on the menu) PDF documents after installing to re-keyword."? I searched for PDF, selected everything in the search results window, and used Item | Synchronize.
Hartmut
#11
I have tried re-importing and synchronizing. I have now also reinstalled UR (the version mentioned above), but the problem is still there. Please find a page attached for testing.
Best regards, Peter.
#12
I just tested with PDF2TXT V3.2 and it worked fine.
#13
I have now found a few PDFs in which the keywording is sometimes correct ("möglich") and sometimes wrong ("mglich") within the same document. I now suspect that it might have something to do with the fonts. For files that use only fonts that Reader lists as type '1' (embedded), the keywording is always wrong. For files that additionally use fonts listed as 'TrueType', the results are mixed. Maybe this is the right track for finding the error?
#14
Please ZIP and send a .urd file containing all problem PDFs imported (stored) to support@kinook.com. Thanks.
#15
It seems that our licensed version of the PDF2TXT component has some issues. We are trying to get a working version of the licensed component from the vendor.