Bloating Behavior despite Linking Files
tfjern
03-03-2010, 11:38 AM
Greetings! I am linking (not storing) hundreds of files, mostly pdfs, to a main UltraRecall (mega-dump) database. Even though every file is linked (not stored, to repeat), the UR database is already over 100 megabytes and grows with each pdf I link. It now appears to grow by about one megabyte per linked file; 50 megabytes ago the increments were much smaller.
This is very strange and deeply disturbing behavior, indeed. I plan on linking many more files, so I am very worried that I will soon have a UR file of well over half a gigabyte, when presumably everything will start to slow down considerably (and I will still have many pdfs to link).
This defeats the purpose of linking files: namely, to have links to all my pdfs in a manageable UR database. I compact the database regularly, but the bloating seems to be progressing linked file by linked file. Any suggestions? To ask again: Why is my database now well over 100 megabytes when all the pdfs are linked, not stored?
quant
03-03-2010, 11:55 AM
because they are indexed
if you don't want pdfs indexed, remove ".pdf" from tools->options->import->file extensions to parse text
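To make "because they are indexed" a little more concrete, here is a toy model of a text index over linked files. It is only a sketch: the function name, the paths, and the dictionary layout are made up for illustration and are not UltraRecall's actual storage format. The point is that the files stay on disk, but every distinct word found in each file adds an entry to the database.

```python
import re

def index_linked_file(index, path, text):
    """Record each distinct word of a linked file; the file itself stays on disk."""
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        index.setdefault(word, set()).add(path)

# Hypothetical linked files and their extracted text.
index = {}
index_linked_file(index, "C:/docs/a.pdf", "Quarterly budget forecast for 2010")
index_linked_file(index, "C:/docs/b.pdf", "Budget survey results")

# Total (word, file) entries the index now has to store.
entries = sum(len(paths) for paths in index.values())
print(len(index), "distinct words,", entries, "index entries")
```

In this model the index grows with the amount of text in each file, not with the number of links, which would be consistent with large pdfs adding more to the database than small ones.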
tfjern
03-03-2010, 06:33 PM
Yes, for future pdfs, fine, but what about the currently bloated index? It still shows 160 megabytes (even after compacting). Will I be stuck with this bloated index forever? Is there an option hidden somewhere where the index can be removed or reduced? And what happens when pdfs, etc., are no longer indexed (and the index removed, if possible)? Does the database grind to a screeching halt? There doesn't seem to be any mention in the "help" file. Presumably searches will be slower, but how much slower? Be nice to know. Do I have to test things, first? Trial and error? Live and learn?
Lesson of the Month (slightly off-topic): as almost everyone with a gmail account already knows, Google surprised us by suddenly introducing into our gmail accounts a new feature: buzz. It proved to be a blunder of blunders (at least the way it was first introduced; now the feature is more or less "fixed," but still a source of numerous lawsuits, one reason it was so quickly tweaked). When asked how Google could blunder so spectacularly, some of the technicians involved in creating buzz replied: "well, we tested it in-house, and there were no problems. We didn't foresee any of these problems." Most software developers have great difficulty putting themselves in their customers' shoes.
Please always give detailed, simple explanations, or don't give them at all.
$bill
03-03-2010, 11:41 PM
Oh my, I am ever so sorry that you are having difficulty, and more so that it has caused you to experience such hostility that you snapped at quant, one of our most valuable forum contributors who didn't realize that his clue did not match your degree of cluelessness....but on to address your difficulties....
Seems you must have overlooked the very important topic-- KEYWORDS (TAGGING)-- which was addressed in the Getting Started, Basic Concepts portion of the help file. Specifically KEYWORDS are AUTO-GENERATED from file type PDF (in the Professional edition only). Now since KEYWORDS (TAGS) are only used by searches to "accurately and efficiently locate info items in the database"--if such is not your intent, feel free to do as quant suggests and disable the keyword collection from the pdf's. Be aware that UltraRecall does not search information that lies outside of the Info Database, so searching will be limited to the attributes like title, date, URL, etc.
Now as to "will I be stuck with this bloated index forever?"...It would be easiest to change your definition of "bloated" if searching is useful to you. How about changing the import setting (as quant suggested), deleting the items, compacting and linking them again. If you have "full-text search enhancements enabled", I don't know any other way. If not, you can use the Item|Keyword dialog to delete the keywords. If the pdf's are small and the size of the Info Base is rising excessively, consider sending a copy of the database to kinook for evaluation.
Originally posted by tfjern
Please always give detailed, simple explanations, or don't give them at all.
Readability Flesch-Kincaid Grade Level 11.2<too high!>, Reading Ease 61.1
tfjern
03-04-2010, 04:25 AM
Thanks, $Bill -- it's a little clearer now what the problem is. It looks like I will have to start over again from scratch, re-linking all the linked pdfs. So be it.
kinook
03-04-2010, 09:19 AM
Originally posted by tfjern
Yes, for future pdfs, fine, but what about the currently bloated index? It still shows 160 megabytes (even after compacting). Will I be stuck with this bloated index forever? Is there an option hidden somewhere where the index can be removed or reduced?
One option would be to create a new database and re-import the files.
Otherwise, if FTS is disabled (last option in compact/repair is unchecked):
1) Perform an advanced search of URL matches wildcard *.pdf
2) Select all search results.
3) Open the Item Keywords dialog (Item | Keywords) and Delete All auto-generated keywords.
4) Compact the database (Tools | Compact & Repair).
If FTS is enabled (last option in compact/repair is checked), use the SQLite console (http://www.kinook.com/Forum/showthread.php?threadid=2825) to perform the following statements:
DELETE FROM ftsItem;
VACUUM;
This will remove all text indexing from all items in the database and then compact it.
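For anyone who wants to see the effect of those two statements before running them on a real database, here is a minimal sketch using Python's sqlite3 module on a throwaway file. The table name ftsItem comes from the statements above; the Item table, the column names, and the sample text are assumptions made up for the demonstration and do not reflect UltraRecall's real schema.

```python
import os
import sqlite3
import tempfile

def build_demo_db(path, rows=2000):
    """Create a toy database with a stand-in ftsItem table (hypothetical schema)."""
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE Item (id INTEGER PRIMARY KEY, url TEXT)")
    conn.execute("CREATE TABLE ftsItem (itemId INTEGER, content TEXT)")
    conn.execute("BEGIN")
    for i in range(rows):
        conn.execute("INSERT INTO Item (url) VALUES (?)", (f"file{i}.pdf",))
        conn.execute("INSERT INTO ftsItem VALUES (?, ?)",
                     (i, "extracted pdf text " * 50))
    conn.execute("COMMIT")
    conn.close()

def drop_text_index(path):
    """Run the two statements from the post: delete the index rows, then compact."""
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("DELETE FROM ftsItem")
    conn.execute("VACUUM")  # must run outside a transaction
    conn.close()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    build_demo_db(path)
    before = os.path.getsize(path)
    drop_text_index(path)
    after = os.path.getsize(path)
    conn = sqlite3.connect(path)
    remaining_items = conn.execute("SELECT COUNT(*) FROM Item").fetchone()[0]
    conn.close()

print("size before:", before, "bytes; size after:", after, "bytes")
print("items still present:", remaining_items)
```

The DELETE alone only marks pages as free inside the file; it is the VACUUM that rewrites the file and returns the space, which is why the two statements are given together.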
Originally posted by tfjern
And what happens when pdfs, etc., are no longer indexed (and the index removed, if possible)? Does the database grind to a screeching halt? There doesn't seem to be any mention in the "help" file. Presumably searches will be slower, but how much slower? Be nice to know. Do I have to test things, first? Trial and error? Live and learn?
Nothing special happens. Searches may be faster, but you won't be able to search on text in imported PDF files.
tfjern
03-04-2010, 10:58 AM
Thanks, Kinook, for the detailed and reasonably clear instructions.
After 1) performing an advanced search of URL matches wildcard *.pdf, 2) selecting all search results (1027 items selected), and 3) opening the Item Keywords dialog (Item | Keywords), all I get in Auto-generated keywords is one pdf (one count). That's it.
Then why is the file so large? I guess I will take your advice and start over with a new database, and then re-link all these files. Bummer!
kinook
03-04-2010, 11:06 AM
With multiple selection, only common keywords (those found in all selected items) are listed in the Item Keywords dialog, but Delete All will delete all keywords for all selected items. Have you tried deleting all and compacting? If the file is still large afterwards, please send the info requested here:
http://www.kinook.com/Forum/showthread.php?threadid=3038
Thanks.
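The multiple-selection behavior described above (the dialog lists only keywords shared by every selected item, while Delete All removes every keyword from every selected item) can be pictured with plain sets. This is only an illustration with made-up item names and keywords, not a description of how UltraRecall implements the dialog.

```python
# Hypothetical items and their auto-generated keywords.
keywords_by_item = {
    "report-a.pdf": {"pdf", "budget", "forecast"},
    "report-b.pdf": {"pdf", "survey", "results"},
    "report-c.pdf": {"pdf", "minutes"},
}

selected = list(keywords_by_item)

# What the dialog would list: keywords common to ALL selected items.
common = set.intersection(*(keywords_by_item[name] for name in selected))

# What Delete All would remove: every keyword on ANY selected item.
all_keywords = set.union(*(keywords_by_item[name] for name in selected))

print(sorted(common))     # ['pdf']
print(len(all_keywords))  # 6
```

So seeing a single keyword listed for 1027 selected items does not mean only one keyword is stored; it means only one keyword is shared by all of them.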