Kinook Software Forum

#1
03-03-2010, 10:38 AM
tfjern
Registered User
Join Date: 10-09-2007
Posts: 132
Bloating Behavior Despite Linking Files

Greetings! I am linking (not storing) hundreds of files, mostly PDFs, to a main Ultra Recall (mega-dump) database. Even though all the files are linked, not stored, the UR database, now over 100 megabytes, grows with every PDF I link. At this point each newly linked file seems to add about one megabyte; 50 megabytes ago the increments were much smaller.

This is very strange and deeply disturbing behavior indeed. I plan on linking many more files, so I am very worried that I will soon have a UR file of well over half a gigabyte, at which point presumably everything will start to slow down considerably (and I will still have many PDFs left to link).

This defeats the purpose of linking files, namely to have links to all my PDFs in a manageable UR database. I compact the database regularly, but it keeps growing, linked file by linked file. Any suggestions? To ask again: why is my database now well over 100 megabytes when all the PDFs are linked, not stored?

#2
03-03-2010, 10:55 AM
quant
Registered User
Join Date: 11-30-2006
Posts: 967
because they are indexed

If you don't want PDFs indexed, remove ".pdf" from Tools -> Options -> Import -> File extensions to parse text.

#3
03-03-2010, 05:33 PM
tfjern
Registered User
Join Date: 10-09-2007
Posts: 132
Kinookish Response (not a compliment)

Yes, for future pdfs, fine, but what about the currently bloated index? It still shows 160 megabytes (even after compacting). Will I be stuck with this bloated index forever? Is there an option hidden somewhere where the index can be removed or reduced? And what happens when pdfs, etc., are no longer indexed (and the index removed, if possible)? Does the database grind to a screeching halt? There doesn't seem to be any mention in the "help" file. Presumably searches will be slower, but how much slower? Be nice to know. Do I have to test things, first? Trial and error? Live and learn?

Lesson of the Month (slightly off-topic): as almost everyone with a Gmail account already knows, Google surprised us by suddenly introducing a new feature into our Gmail accounts: Buzz. It proved to be a blunder of blunders (at least the way it was first introduced; the feature is now more or less "fixed," but it is still the subject of numerous lawsuits, one reason it was tweaked so quickly). When asked how Google could blunder so spectacularly, some of the technicians involved in creating Buzz replied: "Well, we tested it in-house, and there were no problems. We didn't foresee any of these problems." Most software developers have great difficulty putting themselves in their customers' shoes.

Please always give detailed, simple explanations, or don't give them at all.

#4
03-03-2010, 10:41 PM
$bill
Registered User
Join Date: 09-14-2006
Posts: 210
Oh my, I am ever so sorry that you are having difficulty, and more so that it has left you feeling so hostile that you snapped at quant, one of our most valuable forum contributors, who didn't realize that his clue did not match your degree of cluelessness... but on to your difficulties.

It seems you overlooked a very important topic, KEYWORDS (TAGGING), which is covered in the Getting Started, Basic Concepts portion of the help file. Specifically, KEYWORDS are AUTO-GENERATED from PDF files (in the Professional edition only). Since KEYWORDS (TAGS) are used only by searches to "accurately and efficiently locate info items in the database," if that is not your intent, feel free to do as quant suggests and disable keyword collection for PDFs. Be aware that Ultra Recall does not search information that lies outside the Info Database, so searching will then be limited to attributes like title, date, URL, etc.

Now, as to "will I be stuck with this bloated index forever?": if searching is useful to you, it would be easiest to change your definition of "bloated." Otherwise, how about changing the import setting (as quant suggested), deleting the items, compacting, and linking them again? If you have "full-text search enhancements" enabled, I don't know any other way. If not, you can use the Item | Keywords dialog to delete the keywords. If the PDFs are small and the size of the Info Base is rising excessively, consider sending a copy of the database to Kinook for evaluation.

Quote:
Originally posted by tfjern
Please always give detailed, simple explanations, or don't give them at all.
Readability Flesch-Kincaid Grade Level 11.2<too high!>, Reading Ease 61.1

#5
03-04-2010, 03:25 AM
tfjern
Registered User
Join Date: 10-09-2007
Posts: 132
Much help

Thanks, $bill. It's a little clearer now what the problem is. It looks like I will have to start over from scratch, re-linking all the PDFs. So be it.

#6
03-04-2010, 08:19 AM
kinook
Administrator
Join Date: 03-06-2001
Location: Colorado
Posts: 6,003
Re: Kinookish Response (not a compliment)

Quote:
Originally posted by tfjern
Yes, for future pdfs, fine, but what about the currently bloated index? It still shows 160 megabytes (even after compacting). Will I be stuck with this bloated index forever? Is there an option hidden somewhere where the index can be removed or reduced?
One option would be to create a new database and re-import the files.

Otherwise, if FTS (full-text search) is disabled (the last option in Compact & Repair is unchecked):
1) Perform an advanced search of URL matches wildcard *.pdf
2) Select all search results.
3) Open the Item Keywords dialog (Item | Keywords) and Delete All auto-generated keywords.
4) Compact the database (Tools | Compact & Repair).
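If the file still looks large after step 4, it can help to check how much of it is free space versus real data. The thread indicates the database is an SQLite file (hence the SQLite console mentioned below), so a minimal Python sketch with the standard sqlite3 module can read SQLite's own page counters. This assumes Ultra Recall is closed and you know the path to the database file; the function name space_report is hypothetical. Free pages are what compacting (VACUUM) can hand back to the disk; used pages stay.

```python
import sqlite3
from pathlib import Path


def space_report(db_path: str) -> dict:
    """Split an SQLite file's size into used and free (reclaimable) bytes.

    Free pages sit on SQLite's freelist; compacting (VACUUM) returns them
    to the disk. Used pages are what remains after compacting.
    """
    # Open read-only through a file: URI so the check cannot change the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
        free_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]
    finally:
        conn.close()
    return {
        "total_bytes": page_size * page_count,
        "used_bytes": page_size * (page_count - free_pages),
        "free_bytes": page_size * free_pages,
    }
```

If free_bytes is close to zero right after a compact, the size is real data (for example the text index), not leftover slack that another compact would remove.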

If FTS is enabled (the last option in Compact & Repair is checked), use the SQLite console to execute the following statements:

DELETE FROM ftsItem;
VACUUM;

This will remove all text indexing from all items in the database and compact.
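For anyone who would rather script the two statements above than type them into a console, here is a minimal Python sketch using the standard sqlite3 module. It assumes Ultra Recall is closed, that the table is named ftsItem as stated above, and that a stock SQLite build can open that table; if Ultra Recall registers its own full-text tokenizer, a plain SQLite build may refuse to touch the table, and the SQLite console kinook mentions is the safer route. The function name drop_text_index and its backup_path parameter are hypothetical. It copies the file first, because the DELETE throws away all text indexing.

```python
import os
import shutil
import sqlite3


def drop_text_index(db_path: str, backup_path: str) -> tuple[int, int]:
    """Clear the full-text index table and compact the file.

    Runs the same two statements as above (DELETE FROM ftsItem; VACUUM;)
    and returns the file size in bytes before and after.
    """
    # Keep a copy first: the DELETE cannot be undone.
    shutil.copy2(db_path, backup_path)
    before = os.path.getsize(db_path)
    # isolation_level=None means autocommit, so the DELETE is committed
    # before VACUUM runs (VACUUM cannot run inside an open transaction).
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("DELETE FROM ftsItem")  # table name from the post above
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(db_path)
```

Comparing the two returned sizes shows how much of the file the text index was occupying.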

Quote:
And what happens when pdfs, etc., are no longer indexed (and the index removed, if possible)? Does the database grind to a screeching halt? There doesn't seem to be any mention in the "help" file. Presumably searches will be slower, but how much slower? Be nice to know. Do I have to test things, first? Trial and error? Live and learn?
Nothing special happens. Searches may be faster, but you won't be able to search on text in imported PDF files.

#7
03-04-2010, 09:58 AM
tfjern
Registered User
Join Date: 10-09-2007
Posts: 132
OK, but...

Thanks, Kinook, for the detailed and reasonably clear instructions.

After 1) performing an advanced search of URL matches wildcard *.pdf, 2) selecting all search results (1027 items selected), and 3) opening the Item Keywords dialog (Item | Keywords), all I see under Auto-generated keywords is a single entry, "pdf" (count 1). That's it.

Then why is the file so large? I guess I will take your advice and start over with a new database, and then re-link all these files. Bummer!

#8
03-04-2010, 10:06 AM
kinook
Administrator
Join Date: 03-06-2001
Location: Colorado
Posts: 6,003
With multiple selection, only common keywords (those found in all selected items) are listed in the Item Keywords dialog, but Delete All will delete all keywords for all selected items. Have you tried deleting all and compacting? If the file is still large afterwards, please send the info requested here:
http://www.kinook.com/Forum/showthre...?threadid=3038

Thanks.
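This likely explains why the dialog showed just one entry with 1027 items selected: for a multiple selection it lists only the keywords every item shares, while Delete All acts on every keyword of every item. Below is a toy Python model of those two behaviors as kinook describes them; the keyword sets and function names are made up, and this is not how Ultra Recall actually stores keywords.

```python
def dialog_view(selected: dict[str, set[str]]) -> set[str]:
    """Keywords listed for a multiple selection: only those shared by all items."""
    keyword_sets = list(selected.values())
    # set.intersection(*sets) returns a new set, so later edits to the
    # items do not change what was "displayed".
    return set.intersection(*keyword_sets) if keyword_sets else set()


def delete_all(selected: dict[str, set[str]]) -> set[str]:
    """'Delete All': remove every keyword from every selected item.

    Returns everything that was removed (the union of all keyword sets),
    which can be far more than dialog_view() showed.
    """
    removed = set().union(*selected.values())
    for keywords in selected.values():
        keywords.clear()
    return removed


if __name__ == "__main__":
    # Made-up keyword sets for three linked PDFs.
    items = {
        "a.pdf": {"pdf", "budget", "2009"},
        "b.pdf": {"pdf", "invoice"},
        "c.pdf": {"pdf", "contract", "2009"},
    }
    print(sorted(dialog_view(items)))  # ['pdf']
    print(sorted(delete_all(items)))   # ['2009', 'budget', 'contract', 'invoice', 'pdf']
```

In this model the dialog shows one keyword while Delete All removes five: the displayed list is an intersection, the deletion a union.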

All times are GMT -5.

Copyright © 1999-2023 Kinook Software, Inc.