Kinook Software Forum

  #1  
02-16-2005, 03:51 AM
Yanni
Registered User
 
Join Date: 02-16-2005
Posts: 6
A feasible way to get full-text search in UR

Keyword indexing is undoubtedly the way to go for fast searches. At times, however, one needs a phrase search; anticipating that need and inserting a specific phrase as a keyword is often not practical.

SUGGESTION: As soon as the user types a second word in the search field, UR switches to phrase search. It finds the documents that contain the word that occurs in the fewest documents and does a full-text search only on those documents. Although slower than UR's normal keyword search, this will still be much faster than a full-text search on all documents.

EXAMPLE: I type the term "unlimited possibilities." UR knows that the keyword "unlimited" is found in 40 documents while "possibilities" occurs in some 200 documents. So it starts a full-text search for "unlimited possibilities" on the 40 documents that contain "unlimited." (Or, if time-effective, the total length of the documents that contain each word can be the factor that decides which documents are searched.) Using wildcards or regular expressions would of course make the process a bit more complicated, but it would still be faster than a brute-force full-text search.
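
A minimal Python sketch of the idea above, for illustration only. It assumes a simple in-memory keyword index mapping each word to the set of documents that contain it; the names `tokenize`, `build_keyword_index` and `phrase_search` are hypothetical and say nothing about how UR stores its index internally.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_keyword_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def phrase_search(phrase, docs, index):
    """Scan only the documents that contain the rarest word of the phrase."""
    words = tokenize(phrase)
    if not words:
        return []
    # The word found in the fewest documents gives the smallest candidate set.
    rarest = min(words, key=lambda w: len(index.get(w, ())))
    candidates = index.get(rarest, set())
    # Pad with spaces so that only whole-word matches count.
    needle = " " + " ".join(words) + " "
    hits = []
    for doc_id in sorted(candidates):
        haystack = " " + " ".join(tokenize(docs[doc_id])) + " "
        if needle in haystack:
            hits.append(doc_id)
    return hits


# Hypothetical sample data, not taken from any real UR database.
docs = {
    1: "Unlimited possibilities await.",
    2: "The possibilities are not unlimited.",
    3: "Many possibilities here.",
}
index = build_keyword_index(docs)
print(phrase_search("unlimited possibilities", docs, index))
```

Here "unlimited" appears in two documents and "possibilities" in three, so only the two documents containing "unlimited" are scanned in full.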
  #2  
02-17-2005, 08:31 AM
kevina
Registered User
 
Join Date: 03-26-2003
Posts: 825
Your suggestion is a good one, and while simpler than implementing a complete full-text index, it will be a significant change that will require some research to test and implement.

This will be put on the list of things to do for a future release of Ultra Recall.
  #3  
02-17-2005, 09:23 AM
ExtraLean
Registered User
 
Join Date: 01-19-2005
Posts: 46
Quote:
Originally posted by kevina:
"This will be put on the list of things to do for a future release of Ultra Recall."
Thanks for agreeing to look into this. As much as I like UR, the searching capability is probably the area that needs the most work, IMO. It does no good to build up a wealth of knowledge if it is hard to find it later. I sorely miss having the capability to do a full-text search!
  #4  
01-24-2007, 06:04 PM
danson
Registered User
 
Join Date: 01-10-2006
Posts: 96
I bet there is some way to make this even cleverer -

Can you think of some kind of data structure that allows you to index not only which words occur in which documents but also each word's offset from the beginning of the document?

I suppose the current index looks like:

WORD DOCUMENT-ID
================
wordA: 2 5 9 1 3
wordB: 2 12 99 293

You could update the index to show not just which documents the word lies in but also its position:

wordA: 2(4) 5(29)...
wordB: 2(5) 12(23)...

So wordA occurs in document 2, offset 4 and document 5, offset 29.

Then searching for the phrase "wordA wordB" would simply be a case of finding the documents that contain both words and checking whether their offsets differ by 1 (or perhaps by a little more, with some tolerance factor).

That final comparison can probably also be optimised with the right algorithm.
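
As a rough Python sketch of that idea (an illustration under my own assumptions, not a description of UR's actual index): offsets are counted in words rather than characters, and `build_positional_index`, `phrase_search_positional` and the `tolerance` parameter are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_positional_index(docs):
    """Map word -> {document id -> list of word offsets in that document}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for offset, word in enumerate(tokenize(text)):
            index[word][doc_id].append(offset)
    return index


def phrase_search_positional(phrase, index, tolerance=0):
    """Find documents where the words occur in order, with at most
    `tolerance` other words between neighbouring phrase words."""
    words = tokenize(phrase)
    if not words or any(w not in index for w in words):
        return []
    # Only documents that contain every word can match.
    common = set(index[words[0]])
    for w in words[1:]:
        common &= set(index[w])
    hits = []
    for doc_id in sorted(common):
        # Offsets at which the phrase matched so far can end.
        ends = set(index[words[0]][doc_id])
        for w in words[1:]:
            ends = {
                p
                for p in index[w][doc_id]
                if any(p - gap - 1 in ends for gap in range(tolerance + 1))
            }
            if not ends:
                break
        if ends:
            hits.append(doc_id)
    return hits


# Hypothetical sample data; tokens are lowercased, so "wordA" is stored as "worda".
docs = {
    2: "a b wordA wordB",
    5: "wordB then wordA",
    9: "wordA x wordB",
}
index = build_positional_index(docs)
print(phrase_search_positional("wordA wordB", index))
print(phrase_search_positional("wordA wordB", index, tolerance=1))
```

With `tolerance=0` only adjacent words count as a phrase; a larger value allows that many intervening words. No document text is read at search time, only the stored offsets.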

Perhaps though you do something much more clever already...

Daniel
Copyright © 1999-2023 Kinook Software, Inc.