"Search", "Replace", and SQL

Spliff · #1 05-20-2023, 06:49 AM

Aside - DB Design - from the user's perspective

For about 400,000 items (totalling about 10 GB) spread over about 30 UR DBs, I had went down to about 8 or 9, in order to lessen the "management fuss" for this corpus, so on level 1, I then had the former "finegrain" DBs, and the former level 1 of the latter had become level 2 on the "compound" DBs (source item always considered level 0).

The fuss with externally changing the db header, after every "compact and repair", for SQLite access, was lessened indeed, but my spreading about 30 "entities" over about 10 DBs - instead of either 30 (which multiplies the header fuss) or then just 1 (which technically is not possible with SQLite) - quickly proved a conceptional nightmare, since the "combining" up to 5 "entities" (i.e. former distinct DBs) into 1 DB just allowed for combining for the most likely "combined needs", whilst again and again, stuff needed in connection with DB *a" was "buried" into e.g. "c.x" (instead of just being available in db "x".

Thus, it's became clear as day that if your stuff's too much for 1 db, it's advisable to divide it in a way that any "main stuff" is combinable, on an equal 1-to-1 level, to other "main stuff", and that in case multiplies DBs, no way around it - and so the special UR header becomes a real nuisance, but which for the time being, is unavoidable; the user should NOT align their "workflow design" decisions in order to avoid the necessary external header change. (Perhaps code could be added at the end of the internal "compact and repair code", in order to reset the header again?) (So I'm back to about 30 DBs, which means that additional, technical management efforts are "maximized", in a certain way, partly due to that "bloody" header which prevents SQL access...)

Search / Replace Scope

Search scope in UR is very good (albeit not perfect), since both "whole db" and "this subtree" are available (and obviously, the former is technically contained in the latter, but its distinct availability on the user side spares the user to first navigate to the source item).

On the other hand, SQLite in itself (i.e. its FTS functionality) does not provide a real "global" search, i.e. over all, or better then, select DBs within a given folder, and so it has not been a surprise to me that another Windows "PIM" I had used before switching to UR, and which had used a "non-standard" db as backend, but with search over several DBs at the same, has LOST this functionality when lately it switched to the "PIM" SQLite backend - the ante-SQLite versions of PIM in question were very unstable in my personal experience, so I had to find something better, and indeed, UR is perfectly stable indeed - ; if we were to get such "search over several DBs", the developer would have to code the necessary FTS triggering even for non-loaded DBs, and then the gathering (combining, sorting, display) of th FTS data; obviously, the user should be able to "list" the DBs to be searched, scopes being "this subtree", "whole tree" (i.e. current db), "all currently loaded DBs", and "these DBs", from then a multi-select list...

The irony here being that porting an existant db to Postgres (in which all the data would be in just one db again, and then the Postgres FTS would indeed need also some scope "level-1-subtrees from list", beyond "whole db" and "current subtree", i.e. some easily available and combinable "tree-level-1 title = "" search attribute) would probably less work than the above...

Search-and-Replace (SR)

For the scope, see above, but, just as in other Windows "PIMs" relying on SQLite, SR (beyond the current content and notes fields, functionality being provided by MS via the editor component), is absent.

Thus, as already discussed, "content" has to be SR'red by scripting, invoking the items from the search result table 1-by-1, for an external rtf editor to do the necessary changes, then paste back the updated content into the UR content pane (from which then UR will correctly and automatically update in its own system when the script navigates to the "next" item in the result table).

As for "title changes", i.e. changes in item titles, it's much easier, but it's not as obvious as it seems, since if you change the title in the "item" table only, you will have left out two other tables only, and this will create chaos (which will not even be dissolved by a "default compact and repair", but only by a "complete" one, i.e. by first deleting the FTS corpus, then rebuilding it from scratch (as explained recently here, by doing TWO C&R in a row, first without fts, THEN only WITH fts again).

And this means that a (hopefully) correct "title change by SQL" for UR will have to be done as follows (and considering that "replace" IS case-sensitive whilst "like" is NOT, and pay special attention even to the "like"-'%oldstring%' in case of diacritics, and according to which way your fts (registry setting) has processed those!):

(EDIT and reformulated: I forgot to mention that before updating, you obviously should do a
select * from item where itemtitle like '%oldstring%', and in order to verify that the update will not do any unwanted changes (by changing substrings in words / phrases you would like to remain untouched), verify visually and, in case such "over-replacing" lurks, do:
1) verify the select-results count (provided by sql)
2) count the unwanted results in 1)
3) eliminate the unwanted results (1-by-1 if needed, always verifying the resulting counts), by enlarging the "where" clause with additional "NOT" conditions:
where itemtitle like '%oldstring%' AND itemtitle NOT like '%someOtherSubstring%' and itemtitle NOT like '%EvenSomeOtherSubstring%' etc )
4) adjust the updates' "where" conditions (below) accordingly
Doing that, to exclude some e.g. 5 unwanted updates from e.g. 100 in total, is less work than doing 100 replacements manually, obviously. End of Edit.)

update item set itemtitle = replace(itemtitle, 'oldstring', 'newstring') where itemtitle like '%oldstring%';
update ftsitem set title = replace(title, 'oldstring', 'newstring') where title like '%oldstring%';
update ftsitem_content set c0title = replace(c0title, 'oldstring', 'newstring') where c0title like '%oldstring%';
-- and/or probably:
update item set itemtitle = replace(itemtitle, 'Oldstring', 'Newstring') where itemtitle like '%oldstring%';
update ftsitem set title = replace(title, 'Oldstring', 'Newstring') where title like '%oldstring%';
update ftsitem_content set c0title = replace(c0title, 'Oldstring', 'Newstring') where c0title like '%oldstring%';
-- etc in case of more variants...

No need to say that if you only update the "item" table, then search (in UR), you will find the renamed titles by searching for the old titles. (only) ;-)

EDIT:

It should be noted that single quotes (') are presumed to be escaped by another single quote each, whilst for a per cent sign (%) in one of the strings, there seem to arise real problems, see https://sqlite.org/forum/info/846632411f3fd1d2 (I didn't try either.) This might help then: https://www.b4x.com/android/forum/th...e-query.56769/

And I forgot to mention: It's obviously devoid of sense to try to edit the "content" and "note" columns in table ftsitem, ditto for "c1content" and "c2note" in table ftsitem_content, since the "real" content of both is in the respective blobs, and next time you'd load the items, the blob content (which will not have been changed) will overwrite the content of these fields; this is obvious for the content fields, but also applies to the note fields, so any programmatical changes (i.e. by external SQL) should be made to "user attributes", not into the items' "Note" - that's the price for that field being rtf, too.

CORRECTION:

It's perfectly possible to use UR's own editor to do the necessary S&R, for every "found" item, switching forth and back between the editor pane and the search result list, for "regular" S&R, and provided that you DE-select, in Tools-Options-Search, or in Tools-QuickOptions, "Search:HighlightMatches"; I just forgot, and certainly don't want to complicate things.

The use of an external editor for that is just necessary for S&R which involves finding, and processing, newlines, tabs or other "complications", and where the MS rtf editor's functionality isn't sufficient. (So for "simple" S&R, you establish a search result table for the "oldstring", then switch back and forth between the "hits" and the content field (in which the - under the above condition, editable) respective "hit" content is displayed, and trigger (by ^h) the "local" S&R, with the given oldstring/newstring, etc. settings which will you have entered for a first, "manual" S&R, for the first item in the list. This is not elegant, but it works, provided you implement enough waiting time in-between, for loading the "next" items, and then also for saving them before loading the next one.

kinook · #2 05-21-2023, 11:46 AM

In the latest download (v6.3.0.1), compacting the DB will write

SQLite format 3

header instead of

Ultra Recall DB

Spliff · #3 05-22-2023, 09:05 AM

Fantastic! I had thought that was possible indeed... ;-)

Thank you so very much, Kyle, I'm absolutely delighted!