Searching across linked databases

tfjern

05-16-2008, 11:32 PM

I searched the forums, but couldn't find an answer to this question: if two or more databases can be linked, is it possible to search (say for specific words) across databases?

ashwken

05-17-2008, 12:16 AM

I'm pretty sure that it can't be done yet.

See Road Map - Approximate Priority for Future Releases, towards bottom of list:

http://www.kinook.com/Forum/showthread.php?threadid=3204

tfjern

05-17-2008, 04:19 AM

Thanks for the link. I guess that would be Multi-DB searches, which is scheduled for "future releases." Judging from the usual wait between releases, this would probably mean years from now. Am I right, kinook?

armsys

05-17-2008, 04:54 AM

Hi tfjern,
Have you ever thought of the cons of multiple URD search?
Instead of getting some 200 hits, you may wait for 2,000 hits or more, resulting in more noise than high-value matches. Just my 2 cents.
Armstrong

tfjern

05-17-2008, 06:00 AM

The search function in Ultra Recall is, well, less than robust (a topic that has been discussed several times before). I've tested almost every similar program, from AskSam to OneNote, and abandoned them out of disappointment (e.g., AskSam too buggy, OneNote too slow, TheBrain -- too huh?, etc.).

But I'm still with Ultra Recall, despite its less than spectacular search function (I'm tiptoeing because I like the product, and I don't want to focus too much on a negative, but Ultra RECALL is designed to RECALL stored data, isn't it?).

I realize that I could end up with thousands of hits when searching over multiple databases, but how else am I supposed to find something in a huge pile of information that I've accumulated? You may ask, why use multiple databases in the first place? Just dump everything into one? But I can foresee that I am going to reach a point where I have too much data stored in one file. It would be nice to be able to link several databases with a global search function. I realize there are several third-party software options available, but...

Anyway, Ultra Recall is about as good a product as we are going to get for some time to come, so I for one am going to support it as much as I can, and hope that a global search function will be included in future versions. I guess it's already on the road map.

Say, isn't 3.5 supposed to be out by now?

ashwken

05-17-2008, 08:12 AM

Originally posted by tfjern
Say, isn't 3.5 supposed to be out by now?
Send support an email requesting the url for the beta download, latest beta 05-15-2008 - not sure when it's moving out of beta.

I also work with many UR databases, but these are somewhat segregated by topic or focus so there's little overlap.

For those instances where you have the need to search more than one db perhaps the overlap is an indication of common ground between the databases.

You could always do a periodic merge of the different databases if you find yourself needing to constantly search the same ones individually.

'Course none of this compares to your original request.

quant

05-17-2008, 08:24 AM

Originally posted by ashwken
For those instances where you have the need to search more than one db perhaps the overlap is an indication of common ground between the databases.
I'm with ashwken on this. It's kind of contradictory - you create more db's because you think they should be separate, and then you are trying to search and find something in more than one db on a given topic, or whatever?

I'd prefer if the contents of UR files were searchable by (let's face it, much more powerful) third party tools ...

tfjern

05-17-2008, 09:06 AM

Well, in the real world things aren't as meticulously organized as the planet some beings apparently inhabit, and a global search engine could be a useful tool in UR (why else would it be on the road map?).

There is so much information to store, and so little time to organize it, only the obsessively organized or well-disciplined can keep everything together. So for us lesser mortals we need all the help we can get, so could you give us some slack for once?

Geek said he was with ashwken on this, but ashwken himself works with "many UR databases." So what is the "this" that Geek is with?

armsys

05-17-2008, 11:05 AM

Geek said he was with ashwken on this, but ashwken himself works with "many UR databases." So what is the "this" that Geek is with?
Ashwken is the known UR guru enjoying multiple URDs.
tfjern, perhaps you misunderstood Quant's intent. Quant just illuminates the view that Ashwken isn't a big fan of multi URD search. But I could be wrong.
Armstrong

ashwken

05-17-2008, 02:41 PM

Originally posted by armsys
Ashwken is the known UR guru enjoying multiple URDs.
...that Ashwken isn't a big fan of multi URD search.
Thanks for the compliment, but it's way overstated.

Search across Multiple URDs would be beneficial in many circumstances.

Part of the reason I end up with multiple URDs is that I tend to push the database aspect of UR (probably) more than the authors intended.

For example, I have one URD for cataloging music - granted there are many good music catalog programs available, but none that allowed me the depth and customization that UR allows. 'Course there are some features available in specific music catalog applications that I can't duplicate in UR, but I've been able to accept these deficiencies and found workarounds.

Another example is a URD for cataloging and researching the works of a specific author, and the spin-offs that his work inspired.

An example of overlap could be found in a URD I have for Current Events, and a separate URD for Run Up to War. Both of these URDs are more research oriented, as opposed to cataloging, and as such could benefit from multiple db search. An argument could be made to merge these two databases since there is some intersection of purpose, but on the other hand there is enough difference to keep them separate (Current Events is more general, where Run Up to War is more focused).

The flexibility that UR allows is both powerful and frustrating, and I really enjoy learning how others are putting the program to use.

quant

05-17-2008, 04:26 PM

Originally posted by ashwken
An example of overlap could be found in a URD I have for Current Events, and a separate URD for Run Up to War. Both of these URDs are more research oriented, as opposed to cataloging, and as such could benefit from multiple db search. An argument could be made to merge these two databases since there is some intersection of purpose, but on the other hand there is enough difference to keep them separate (Current Events is more general, where Run Up to War is more focused)
ok, just for the sake of discussing different approaches:

So the databases could be merged to a single one (where one can easily limit searches to only a part of database if desired). And with favourites and hoisting, it could really feel like you have two databases in a single file. Plus, if there are some things which are common, these could be connected (multiparenting) and promote understanding.

You say there is enough difference to keep them separate. Why would this "enough difference" imply keeping them separate? Are there completely different templates used for the similar things that would cause confusion?
Or is there a speed issue? DB file too big for OS to handle? Or some other compelling reasons? ;-)

armsys

05-17-2008, 08:54 PM

Hi tfjern,
If you have to deal with a huge number of URDs, you may find the Database toolbar (hidden by default) immensely useful. It's more efficient than pressing F6.
Armstrong

armsys

05-17-2008, 10:50 PM

Hi tfjern,
For multiple URDs, you can hyperlink any items such as folders, appointments, contacts and documents across multiple URDs. For details, see http://www.kinook.com/UltraRecall/Manual/internallinking.htm.
Armstrong

tfjern

05-18-2008, 12:02 AM

Thanks, armsys. Finally, some constructive help.

ashwken

05-18-2008, 02:49 AM

Originally posted by quant
ok, just for the sake of discussing different approaches:

So the databases could be merged to a single one (where one can easily limit searches to only a part of database if desired). And with favourites and hoisting, it could really feel like you have two databases in a single file. Plus, if there are some things which are common, these could be connected (multiparenting) and promote understanding.

You say there is enough difference to keep them separate. Why would this "enough difference" imply keeping them separate? Are there completely different templates used for the similar things that would cause confusion?

Or is there a speed issue? DB file too big for OS to handle? Or some other compelling reasons? ;-)
Thanks for forcing me to think this thing thru.

The example that I used was not a good example in support of needing Search across Multiple DB, but a good example of poor initial design. Yes, these two databases will be merged at some point.

Part of what I was trying to show (poorly) were some of the "pitfalls" that reveal themselves down the road. You start out thinking that two things are dissimilar enough to warrant separate databases, but over time and with the assemblage of data the similarities begin to show - as evidenced by the requirement to search more than one db for a thing.

Let's try a different case for Multiple db Search.

I've got a db for Lead Tracking which contains default Contact records and history of interaction.

I've got another db for Transactions Listing / Sales Tracking which contains custom Contact and other forms.

Although both databases have the common element of being related to my work (and being somewhat Contact centric), are they dissimilar enough in purpose to warrant separation?

In the Lead Tracking db the history of interaction consists largely of email (and attachments plus other contact events), and this db is getting pretty big.

A Lead (Contact record and history) can advance to the state of being a Customer or Client (enter into a Listing or Sale Transaction), and also become a Prospect for a new Transaction at a later date.

This seems to indicate that a Lead should remain in the Lead Tracking db and not be physically moved simply because it has entered into a Transaction.

But at some point you are going to want to see the complete picture of your dealings with a Contact, the complete picture resides in two separate databases.

You can create a link (copy w/url, paste to rtf) between the corresponding Contact records upon the first instance of a Lead taking part in a Transaction - each record would require a link to the other.

Would the result of a Search across both databases yield a better picture?

Maybe hashing this stuff out will help identify db design considerations.

quant

05-18-2008, 03:34 AM

Originally posted by ashwken
Let's try a different case for Multiple db Search.

I've got a db for Lead Tracking which contains default Contact records and history of interaction.

I've got another db for Transactions Listing / Sales Tracking which contains custom Contact and other forms.

Although both databases have the common element of being related to my work (and being somewhat Contact centric), are they dissimilar enough in purpose to warrant separation?
I'm not sure I was convinced. What is the decisive "thing" in this case that made you split it into two databases as opposed to just being two different directories in the same db?

Originally posted by ashwken
You can create a link (copy w/url, paste to rtf) between the corresponding Contact records upon the first instance of a Lead taking part in a Transaction - each record would require a link to the other.

I probably don't understand all details, but from the database design point of view, all this seems to me like a perfect example for just having different tables in the same database. Plus in 3.5 beta, the transaction table (transaction template) could have a contact as attribute (Info Item attribute), nice.

tfjern

05-18-2008, 10:46 AM

After reading ashwken's post, it occurred to me once again that there are basically two groups of people using Ultra Recall, and the rest spread out in between: a) power users like ashwken, who frown on those who commit database blunders such as duplicating data, and b) haphazard, bumbling users like me, who use Ultra Recall merely to store assorted data as it is collected ad hoc (web pages, emails, pdf files, xps files, docs, you name it, I store it, and rarely have time or energy to label entries with attributes), ala Evernote, AskSam, OneNote, or the multitude of other such user-friendly programs available.

What a waste, you may say. I realize I'm not getting as much as I could from using Ultra Recall, but still it's better than the above-mentioned programs, that is, it works quite well for the purposes I need, minus a global search engine, but this deficiency can be handled by third-party software.

StephenUK

05-18-2008, 05:54 PM

It seems to me that the request to search across several databases is well made.

I agree that different databases should ideally contain different types of information. But sometimes one later on forgets where one put something, or indeed, simply mis-files.

The ability to search all databases would not mean having to do so all the time! The default would presumably be only to search the current database (or branch of a database), but with the further ability to make a search of all (or perhaps specified) databases by eg ticking a box.

Yes, those searches of all databases would generate more noise, but they would only be made where necessary and the noise would be a small price to pay when wondering in which database one had placed something.

InfoSelect allows searching of several databases when needed and it is a very useful facility. Certain databases or parts of databases can be excluded from a search to keep down noise.

ashwken

05-18-2008, 08:33 PM

Originally posted by tfjern
After reading ashwken's post, it occurred to me once again that there are basically two groups of people using Ultra Recall, and the rest spread out in between: a) power users like ashwken, who frown on those who commit database blunders such as duplicating data, and b) haphazard, bumbling users like me, ...
tfjern,

My apologies for coming across in this manner, it was never my intention to insult or offend you. My intention has been to highlight the struggle that many of us face when using this program.

This is an imperfect medium for communication.

ashwken

05-19-2008, 01:11 AM

Originally posted by quant
I'm not sure I was convinced. What is the decisive "thing" in this case that made you split it into two databases as opposed to just being two different directories in the same db?

Part of it is that the databases have been created and developed at different times, evolving under my learning curve (of both UR and my workflow) and the evolving capabilities of UR.

Another part of it stems from the (growing) size of the Lead Tracking db, and later the Transactions db (may be a workflow problem in both cases). Size relates to both operational performance (possibly subjective) and backup (originally restricted to CD media, change to other media is now possible).

The Lead Tracking db uses the default UR Contact record to accept a Contact from Outlook. An Outlook record is built for each Lead that comes in. The record in Outlook needs to exist so that I can email from the Multiple Listing Service (MLS) program. This Outlook record is pretty much the master.

The Lead Tracking db has various folders that represent different stages of interaction with the Contact. For example, when initially sending a Contact record from Outlook to UR, there are a number of target folders that correspond to the source type of the Lead (a number of specific pubs, our website, other), the Default Child Item for each of these folders contains distinct default data.

This Default Child Item is a Folder that is entitled with the name of the Contact, the contact record is moved to a child of this Contact Name folder. Its Default Child Item is a Contact Event Folder which will contain the various things that constitute the Event (emails and such, or the Detail Pane could be text notes of the event with no children).

\Lead Source Type\new Contact Item
\Lead Source Type\Contact Name Folder

\Lead Source Type\Contact Name\Contact Item
\Lead Source Type\Contact Name\(date) Event
\Lead Source Type\Contact Name\(date) Event\email

The continued growth of this db is directly related to my workflow of moving the relevant emails from Outlook into UR. In many instances these emails contain sizable attachments (images, pdfs, ...). This may be a case of improper workflow (design decision) - perhaps it would be better to manage these emails within Outlook (moving them to a separate pst), then link/sync them from UR (which is the more reliable format, pst or urd).

Another area that may need to be re-examined is the use of the Contact Name folder. This was originally utilized to track user defined Attributes in UR, it now may be the case where these Attributes can be created in Outlook and "carried thru" to UR, eliminating the need for the Contact Name Folder. Or maybe I'm not fully utilizing Outlook Categories and how they translate to UR keywords.

After the initial response (Contact Event) to the new Lead, the Contact Name folder is moved to another Folder that corresponds to a level of activity - Lead Processed (waiting for some response), Link Accessed (a weblink within an email has been accessed), Working (on-going interaction), Inactive (Removed-Bad Info).

\Lead Source Type\

\Lead - Processed\Contact Name\
\Lead - Link Accessed\Contact Name\
\Lead - Working\Contact Name\
\Lead - Inactive\Contact Name\

This part of the workflow seems to be a toss-up between physically managing the different aspects of Lead interaction (represented by the folder structure), or "flattening-out" the tree and managing the Lead interaction with a UR Attribute and pre-defined Searches (hoisting, favorites).
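
To make the "flattened" alternative concrete, here is a minimal Python sketch. It is not UR's data model or API: the `Lead` class, the `stage` field and the helper names are hypothetical, and the stage names are simply taken from the folder list above. The idea is that a lead changes stage by updating one attribute, and a pre-defined search is just a filter on that attribute.

```python
from dataclasses import dataclass

# Stage names copied from the folder list above; everything else is hypothetical.
STAGES = ("Processed", "Link Accessed", "Working", "Inactive")


@dataclass
class Lead:
    name: str
    source: str               # stands in for the "Lead Source Type" folder
    stage: str = "Processed"  # stands in for a user-defined attribute


def move_to_stage(lead, stage):
    """Change a lead's stage by updating an attribute instead of moving a folder."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    lead.stage = stage


def saved_search(leads, stage):
    """Stand-in for a pre-defined search: names of all leads in one stage."""
    return [lead.name for lead in leads if lead.stage == stage]


leads = [Lead("Contact A", "website"), Lead("Contact B", "other")]
move_to_stage(leads[1], "Working")
print(saved_search(leads, "Working"))
```

The trade-off is the one described above: the folder version shows the stage in the tree at a glance, while the attribute version keeps each contact in one place and leaves the grouping to searches.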

==========

The Transaction (Listing / Sales) Tracking db takes a slightly different tack in dealing with Contacts. Here I'm having to use a Custom Contact Form due to needing more Contact fields than are recognized by UR from Outlook. And I'm having to create individual Contact records for each person. This is a modeling requirement - a person can enter into a Transaction as an individual or as part of a group (depends on how the Deed is made out). I've also got a custom Contact Form for Vendors.

For a Listing there are various supporting docs (emails, pdfs, word docs,...) that are linked or stored in the db. A Listing also requires additional tables (tree branch) that deal with Adv Tracking (both specific to a Listing and generic for the Firm), Lockbox Tracking, and Signage.

For a Sale there are the various supporting docs and such.

This db is also growing in size due to the storage of emails (and their attachments), again could be a workflow problem.

Listing and Sale transactions remain in the db even after completion.

Originally posted by quant
I probably don't understand all details, but from the database design point of view, all this seems to me like a perfect example for just having different tables in the same database.
This is probably more detail than you wanted, but talking this thing thru may be of benefit. I can see where you are coming from, unfortunately the choices we make are based on our understandings at the time. I'm not sure if I've made a convincing case or simply revealed areas in my work that need addressing.

HalfCyborg

08-05-2009, 01:59 PM

I'd just like to add my personal bump (+1) to this feature request.

I ALWAYS have UR open when I'm browsing the web. If I see anything of interest that I might like to refer to later (or maybe just read later), I click on the UR icon in the browser to import the page into UR.

I start a new database about every three months, because on a new, empty database, the imports are virtually instantaneous. Once the database gets up around 300-500 MB, the imports get really, really slow.

I've got some fantastic reference articles stored up over the last couple of years (mostly programming topics), and I would LOVE to be able to search them all at the same time. Instead, I find myself saying, "Now what month was I reading about that, I know I saved it in one of these databases..."

Another reason to create multiple databases for "bulk data" like this is to facilitate differential backups. Only my "current" database needs regular backups, the "legacy" databases from previous months have long since been burned to DVDs. If I kept everything in one DB, it would currently be over 12GB in size (which I'd have to back up every day - bleech). If I could only search them without opening them one-by-one...
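
For illustration only, this is roughly what "search them all at once" could look like from the outside. The sketch assumes each database is a SQLite file containing a hypothetical `items(title, body)` table; the thread does not describe UR's actual file format or schema, so the table, the column names and the demo file names are all made up.

```python
import os
import sqlite3
import tempfile


def search_databases(paths, term):
    """Run the same substring query against each file and merge the hits."""
    pattern = f"%{term}%"
    hits = []
    for path in paths:
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute(
                "SELECT title FROM items WHERE title LIKE ? OR body LIKE ?",
                (pattern, pattern),
            ).fetchall()
        finally:
            conn.close()
        # Remember which file each hit came from.
        hits.extend((path, title) for (title,) in rows)
    return hits


def make_demo_db(path, rows):
    """Create a throwaway database with the hypothetical items table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE items (title TEXT, body TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    conn.commit()
    conn.close()


tmp = tempfile.mkdtemp()
q1 = os.path.join(tmp, "2009-q1.db")
q2 = os.path.join(tmp, "2009-q2.db")
make_demo_db(q1, [("Operator overloading", "C++ notes")])
make_demo_db(q2, [("Backups", "burn legacy files to DVD"),
                  ("Templates", "operator overloading in templates")])

results = search_databases([q1, q2], "operator overloading")
for path, title in results:
    print(os.path.basename(path), "-", title)
```

Each file is opened only for the duration of its query, so the quarterly databases stay separate for backup purposes while the results come back as one list.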

quant

08-05-2009, 02:20 PM

Originally posted by HalfCyborg
I'd just like to add my personal bump (+1) to this feature request.

I ALWAYS have UR open when I'm browsing the web. If I see anything of interest that I might like to refer to later (or maybe just read later), I click on the UR icon in the browser to import the page into UR.

I start a new database about every three months, because on a new, empty database, the imports are virtually instantaneous. Once the database gets up around 300-500 MB, the imports get really, really slow.

I've got some fantastic reference articles stored up over the last couple of years (mostly programming topics), and I would LOVE to be able to search them all at the same time. Instead, I find myself saying, "Now what month was I reading about that, I know I saved it in one of these databases..."

Another reason to create multiple databases for "bulk data" like this is to facilitate differential backups. Only my "current" database needs regular backups, the "legacy" databases from previous months have long since been burned to DVDs. If I kept everything in one DB, it would currently be over 12GB in size (which I'd have to back up every day - bleech). If I could only search them without opening them one-by-one...

I'm still not convinced :)
You can keep everything but the latest one in a single DB (which you wouldn't need to back up every day) ... and you'd need to search only two DBs, the current one, and your "legacy" one, and only in the case that you are sure that it's not in the current one ... ;-)

What we need, IMHO, is to be able to run external searches on the UR databases. Did you ever try to find something in your 12GB repository? Say "operator overloading" ... you'd need to play a long time to really find what you want, because you could either search for the exact phrase, or for single keywords. But these keywords could be very far from each other, and you'd get items that are not what you really want. Also, UR's search won't recognize how many times the keywords are found, so when you get 100 hits, you don't know which one is most likely the one you are looking for ... and many many other features that proper search programs offer.

That's the reason why non-essential stuff is only linked to UR in my case. I'm 100000% sure you'd do better if all your websites that you accumulated are just in the standard Windows directories that are indexed with one of the powerful search index engines that UR cannot compete with.
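
As a rough illustration of the two ranking signals mentioned above (how close together the keywords are, and how often they occur), here is a small Python sketch. It is a toy scorer over in-memory strings, not how UR or any particular desktop search engine works; the function names and sample documents are invented for the example.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_span(tokens, keywords):
    """Length in tokens of the shortest window containing every keyword, or None."""
    needed = set(keywords)
    last_seen = {}
    best = None
    for i, tok in enumerate(tokens):
        if tok in needed:
            last_seen[tok] = i
            if len(last_seen) == len(needed):
                # Shortest window ending here starts at the oldest last-seen keyword.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


def rank(docs, query):
    """Order documents by keyword proximity first, then by total keyword count."""
    keywords = tokenize(query)
    scored = []
    for name, text in docs.items():
        tokens = tokenize(text)
        span = min_span(tokens, keywords)
        if span is None:
            continue  # skip documents that lack one of the keywords
        count = sum(tokens.count(k) for k in keywords)
        scored.append((span, -count, name))
    return [name for _, _, name in sorted(scored)]


docs = {
    "a": "Operator overloading in C++ lets you redefine operator behaviour.",
    "b": "The operator called support about overloading the network switch.",
    "c": "Backup notes",
}
ranking = rank(docs, "operator overloading")
print(ranking)
```

Document "b" contains both words but far apart, so it sorts below "a", where they are adjacent; "c" contains neither and is dropped.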

mikeg

08-08-2009, 11:17 AM

Sorry I don't have time at the moment to read this thread as carefully as it warrants or to make an equally considered response, but I wanted to jump in with my own desire for searching across multiple databases.

In my case, since I deal with computing topics both inside and outside my job I've settled on having one database for work and another for home. There is inevitable overlap. However, with no easy way that I'm aware of to keep the two synchronized, and with some home material being inappropriate for work anyway, I now maintain a home and work URD which have an estimated 30% overlap in types of information, but less than 1% exact match on specific information items.

Anyway, for me it is simpler to back up the latest home URD to my work machine and vice versa, then open and search across both as needed.

As a DBA and data architect, I'm familiar with the need to avoid redundant data and redundant databases, but, as tfjern and others have expressed in their own words, a database designed to store ad hoc information and free users from the rigid confines of modeling and normalization would benefit from cross-domain search capability. Cross domain/database search increases opportunity for browsing information that is hopefully not redundant, but is tightly or loosly related.

Now this can get kind of scary with gigantic databases and expanding into the realm of desktop search engines if the user's goal is to store most or all content (ALL documents, etc.) within UR and expect a search in UR to find everything on the computer. I avoid huge databases (and search results) by using UR for storing important ad hoc information scraps (from notes, Web pages, docs, emails, etc.), but only linking to larger information stores such as folders containing related, but numerous and bulky items. This system depends in part on a habit of maintaining reasonably organized folder structures outside of UR as well as inside.

Search across multiple open UR databases would just be another tool to help fill the gaps and make life easier while allowing users to work more the way they want... Hope I didn't drift off point here by skimming through too quickly.

wordmuse

08-19-2009, 02:01 PM

Jumping in where both angels and demons fear to tiptoe... :)

I don't see even the slightest logic problem related to being able to search across databases. That is, if the search function enables you with something like this:

a - search current database only
b - search all open databases
c - search all URDs in this folder
d - search all URDs in this and all subfolders
e - search all URDs wherever they may be accessed
f - search the following databases (and use your mouse to pick multiple databases to search)

Seems to me that if I had all of these options (are there any I'm missing?), then I could do what is available to me now, and much more.

At least that's the way I see it. Am I missing something?
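
A minimal sketch of how the scope options listed above might map onto a list of files to search. This is purely illustrative: the function name, the scope labels and the parameters are hypothetical, and it assumes the databases are files with a `.urd` extension. Option e ("wherever they may be accessed") is left out because the thread does not say how such databases would be located.

```python
import tempfile
from pathlib import Path


def resolve_scope(scope, current=None, open_dbs=(), folder=None, picked=()):
    """Return the database files that a search with the given scope would cover."""
    if scope == "current":   # option a
        return [Path(current)]
    if scope == "open":      # option b
        return [Path(p) for p in open_dbs]
    if scope == "folder":    # option c: this folder only
        return sorted(Path(folder).glob("*.urd"))
    if scope == "tree":      # option d: this folder and all subfolders
        return sorted(Path(folder).rglob("*.urd"))
    if scope == "picked":    # option f: an explicit selection
        return [Path(p) for p in picked]
    raise ValueError(f"unknown scope: {scope}")


# Throwaway folder layout to try the folder-based scopes.
root = Path(tempfile.mkdtemp())
(root / "sub").mkdir()
for rel in ("a.urd", "sub/b.urd", "notes.txt"):
    (root / rel).touch()

folder_hits = [p.name for p in resolve_scope("folder", folder=root)]
tree_hits = [p.name for p in resolve_scope("tree", folder=root)]
print(folder_hits, tree_hits)
```

Keeping the scope resolution separate from the search itself means the default (current database only) stays as cheap as it is today, and the wider scopes only cost more when they are explicitly chosen.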

mikeg

08-19-2009, 10:33 PM

I haven't put nearly as much thought into this as others may have, but this looks like a great cross-URD search functionality list to me. :)

p.s. Small edit on my previous post: "... is tightly or loosly related" should read "... could be tightly or loosely related". Also, cross-database search would be the best way to spot redundant information so it can be pruned or not according to preference. Number of databases should also be according to preference within the limits of the architecture.

Relaxed rules are appropriate for this kind of unstructured data. Now if the discussion is about nodes of structured data that should be normalized, such as address info, that's another story...