Sorting, etc. by Notes column? And user keywords in general, and in the Notes column

Spliff · #1 05-14-2022, 03:49 PM

1)

Nobody's happy with the current user keywords management: Having to open a dialog for entering a(nother) user keyword is one thing (better would be an additional pane, with the list, its entries to be checked / unchecked then), but having to open (and then to close again) a dialog for KNOWING which user keywords have (already) been assigned to the item in question, is simply not realistic.

Hence fellow users - the "user keyword" threads in this forum abound, and I have read them all (!), during the last days - trying to use "better", i.e. more practical, "user keywords", "abusing" UR's full-text search for that, and putting their "user keywords" into the content field.

Unfortunately, you, dear Kyle, have excluded most "special non-char chars" from full-text search, and it has to be said - this is a very important remark to retain, in order to prevent users' big deceptions - that the respective help item, "Auto-Generated Keywords", does NOT apply to possible special-chars in full-text search;

as the help item correctly states, "Auto-generated keywords are displayed only if full-text search enhancements are disabled.", and then, logically, even these remarks there,

"· Alphabet characters and $ _ @ characters are included.
· # - + % & characters are included if not the beginning of a word.
· . and : characters are included if not at beginning or end of a word."

unfortunately do NOT apply, for their alleged "positive (!) exceptional" side, so virtually ANY special, non-char character is excluded as user-side "user keyword" marker.

I finally found ¹ though, e.g. ¹a, ¹af, ¹afk (so that a search for ¹a would also find the other ones if you want it that way, i.e. "hierarchical tagging"), ¹whatever, so I have found a way, at last, to do "user keywords" which would be visible, and you would probably put them into your content pane.

(You can access ¹ by Alt-185 or Alt-0185 in the content pane, but only by 0185 in the search pane, so you would be well advised to assign the ¹, by Alt-0185, to one key or one key combination.)

Then, you also could - especially if you use these "user keywords" often or even systematically - put them into the Notes pane, always displaying that pane, e.g. quite flat, showing just one line, above the content pane.

Anyway, the ¹ "user keyword" marker (there might be others, just none of the "regular" ones that would come to mind) seems to be a big step forward; search would be without problem, general search including the Notes column, specific "Notes" search being available by "Advanced", but the ¹ sufficiently "standing out" in order for no "Advanced Search" being even necessary, wherever you put your "user keywords" marked that way.

=====OH!=====

I just had to discover, to my dismay, that, whilst even a single ¹ will be found (so that I had hoped it could be used within a search for "has ANY user keyword", searching for it also finds all occ of "1", and thus, yes, if as a user keyword, you just use ¹something without a space between the ¹ and something, and if you always use a space between numbers and text, it is possible to do your user keywords this way indeed (but without having the possibility to search for "any user keyword")...

but it's just another big deception re UR user keywording; Kyle, you really should allow for some start char for USER "user keyword" marking, be it the "dead" sign or something like that, we could perhaps discuss this again, in order to prevent unwanted results indeed, as with @/. and so on, but then, a LEADING dot, i.e.
.keyword
with space or newline or tab before could be a brilliant solution, since after all, only Windows' .NET would then be an unwanted hit in case.

I am absolutely sure that EVERY fellow user will want such a user keyword solution, since opening and closing a dialog, in order to just see your item's keywords, is surreal.

At least, ¹something, as a user keyword, is visually much more pleasing than 1something...

Remember, if your "marker plus user keyword" is not put into the full-text index, it's not worth anything, and that's our problem here which only Kyle could resolve! ;-)

2)

For newspaper (sites') clippings, I have got a special problem: Within every one of my respective "politics" categories (i.e. sub-trees), I want to order the items by publication date; those publication dates are, most of the time, within the first 300 or 400 chars of the clippings, and almost every time, they are the very first calendar date within them.

Unfortunately, they come, from multiple countries and multiple newspaper (sites), in multiple formats, so that any scripting for their normalization is error-prone and will not find (i.e. then treat) all occurrences, and I would have to check, afterwards, in order to rectify manually whenever my script didn't do it right (e.g. for dates in the form 04/05/2021, my script can also check which newspaper, so - just an example -, it knows it's May, when the clipping is from the Figaro, but April, from Daily Mail).

(UR items have a "creation date", but that being just quite "near" the publication date, I want the date I create to be precise, all the more so since I "follow" multiple, tinier sites just once every some weeks, then all the clippings getting the same creation date.)

UR comes with the table "Item", with ItemID, ItemTitle, and other columns; it comes with the table "ftsItem_content", with the columns docid, c0title, c1content and c2note; I have checked about 20 items for correspondence between ItemID (in Item) and docid (in ftsItem_content) (comparing their titles), and I HOPE (?) they will ALWAYS be identical in UR?

This is not assured per se, since docid is SQLite rowid, and those rowids can change, in generic SQLite, whilst ItemIDs are invariable once-and-for-all, so I hope the two IDs will never go apart within UR? (Of course, the user should refrain from external, rowid-changing SQL commands...)

Thus, in theory, I would be able to run an external script, onto a partial sql-to-csv dump, analyzing the c1content, and working on the c2note, then visually check the csv, manually adjust the erroneous note dates, and read back the corrected csv values into the UR SQLite file, externally. There are two problems then, one just technical, for me, whilst the other one possibly can't be overcome.

3)

There is the table ItemLink, with the column ChildPosition; I will have to look again into the tables, since the same item, if transcluded, will have different positions as child of different parent items, but once I will have understood where the "current" item position is listed - that should be some cross-table - I can then script the respective ITEM sorted orders in the UR tree (which remains to be done indeed after 2), and which is the reason for 2));

as for the DATA sort, that's easy to be done in any scripting language, once the dates are normalized, into the form 2021-12-31, and from there, you can sort the ChildPosition for the ItemID (=rowid) beneath its respective parent item.

4)

If I'm not mistaken though, external changes to the fts-table fields for content and note will NOT be preserved in UR? Since it seems that their unique use in UR is for full-text search('s index maintenance), and that their COMBINED, FORMATTED content is stored within the item's (unique?!) blob, and upon displaying any item, in UR, what we see on screen, is retrieved NOT from those columns, but from the item's blob column, and then, by a UR routine, distributed into those two panes?

This would mean that my external changes to these fields are lost upon loading the item in question, since their - externally changed - content is overwritten upon loading the item, or at least when UR leaves the item (and updates the full-text index).

5)

This would also mean that any SQLite3.exe changes into these fts-table columns (i.e. content and note), made in-between (i.e. when the file is run in UR, see my current thread upon SQLite3 use and problems), would also be discarded, probably not upon the "Refresh all" (shift-f5), but - since there would now an asynchronity between their (plain text) content and what would be their plain-text "expected" content, according to the (formatted, common) blob content - at the very next "loading" of the item in question, be it at load time, be it at save time for that item.

All this because the (common?!) blob, with its formatted content, always takes precedence over the (non-formatted and otherwise deemed identical) content of those text-only columns.

(It goes without saying that in all this, I assume that we can't, externally, fiddle around with the blobs; this would be technically possible, for an expert, but obviously is not realistic in our context.)

6)

I suppose that users with NEW DBs could quite easily overcome most of above-described problems by NOT using - as I have always done - the generic UR template, but some template with additional user attributes, and thus such a "user text attribute" always being visible on-screen; perhaps even sorting by the data in such a user attribute, from within UR, would be possible, even automatically, for some DB (or sub-hierarchy in a DB)?

7)

Could you give some hint how I would REPLACE all my items in such a way?

a) First creating a new DEFAULT template, with such an additional field (default for the db in question of course, not for all DBs); then saving the db; then externally (i.e. by SQLite3.exe / frontend) linking all my current items to this new default template, instead of their current one; then re-opening the db in UR: This should not crash I hope, but just show my current items, with that additional user attribute field empty? This would leave alone all the existing data, especially the existing blobs. or:

b) Create a new db, with, as default template, that "enhanced" template; then import my current db. Here again, my current data should be displayed then as it is now, just with those new, user attribute fields left empty. I'd prefer solution a) though, since that would prevent shuffling around gigabytes of data, with its inherent risks (and problems which might not become obvious but later on in case).

kinook · #2 05-17-2022, 10:33 PM

With FTS enabled, there isn't really such as thing as auto-generated keywords. Instead, SQLite's full text search indexing is used for searching of text content. "We" are not excluding anything, this is implemented by SQLite. The alternative is to not use FTS and utilize UR's auto-generated keywords.

And user-defined keywords are not restricted to alphanumeric characters, and can contain symbols, punctuation, etc.

Spliff · #3 05-18-2022, 03:14 AM

Thank you, Kyle, for clarifying SQLite's FTS is the culprit, I hadn't thought of this since in UR, there is a STOP WORD LIST: colum Keyword in table Keyword, which from its content, does NOT contain keywords, but stopwords, i.e. terms which are EXCLUDED from keywords, obviously?

a) Since I can't access that list (except from external front-ends), am I right that with FTS enabled, this stopword list (table) will never used by UR?

b) BUT from all what I assume (not: know), FTS is way better / faster than the "old" alternative, keyword management?

c) Ideally, I might thus find some special char that SQLite's FTS then has not excluded from its indexing (just Alt-0185 wasn't it)? (Technically, it would be possible - considering the user just uses (e.g.) Arial regular, bold and italics fonts - to replace a "char char" (i.e. not - excluded - "symbol char" within those fonts by a special sign (a big dot or square or something), to "deceive" the SQLite FTS...).

The SORT BY... problem

1) The only "sort children by" is (the not-so-obvious) "sort siblings", alphanumerically: This remains persistent, for those "siblings" and any new one later on, i.e. for the parent item's child items, as long as the user doesn't manually moves any item in there, from one sibling position to another.

2) Unfortunately, this is NOT able to recognize numbers or simili-numbers (e.g. calendar dates, e.g. in the form 2021-12-31) if they are BEHIND some text, so if you want to sort your parent item's child items by date, you would have to put the date in FRONT of the title, which is absolutely awful, whilst putting the date at the END of the title would be an acceptable solution!

Is the above here in (2) correct? In Tools-Options-Trees, there is "Alphabetically sort when adding children to new items", but this option will then (presumably) work upon ALL parent items in ALL databases, from then on, and will thus destroy any manual sorting anywhere, with any addition wherever - thus I suppose that almost nobody will ever "check" this (highly destructive) option.

I tried the assertion in (2), created some parent with the siblings:

Item 22
Item 12
Item 202
Item 3
Item 31

Then rightclicked and "sort siblings", result:

Item 12
Item 202
Item 22
Item 3
Item 31

Then, just for the fun of it, I renamed Item 22 to Itemsomething 22, and the item, as expected now, was automatically shifted from third position to last one.

Thus, my assertion above should be right:
- Automatic siblings sorting is strictly alphabetic, not alphanumeric, and ( logically, then, of course: )
- It doesn't identify numbers as such, and doesn't discard, for sorting, any text before the number, so the user can't expect some "ideal" sort in the way of

Item 3
Item 12
Itemsomething 22
Item 31
Item 202

3) Above, I had said that I often put press clippings, from several weeks, into some inbox, in bulk, and thus, they all get the same "creation date", whilst there is more or less disorder in their respective (but quite similar) publication dates; inadvertently, I had left out the much more important problem:

When then, I distribute those "inbox" items, into more appropriate "boxes" (i.e. sub-tree "headings"), I do this "chaotically", from often quite large "inboxes", and afterwards moving items into more appropriate "boxes" again, and this way of processing things will then create total chaos for publication dates, i.e. in the end, not even the publications from some specific week end up, within their final multiple, fine-grained "destinations", near to each other.

Hence the NEED for automatic sibling sorting of SOME kind, and in just SOME database(s), but there then for all parent items, e.g. by some setting as the aforementioned ("Alphabetically sort when adding children to new items"), or even for ALL items, BUT just DB-specific, or even (for users who use one db for "all"), specific for just one higher-up parent-item of a specific, in case quite big / deep sub-tree.

4) I haven't delved into title automation, but it seems whilst titles can automatically get the item creation date, I would have to continue to solve my publication date then by scripting indeed, since - very understandably - there is no AI implemented; on the other hand, external fiddling with UR's item titles is very easy, so the only user "problem" would be to get the (additional, here: publication) data into (i.e. at the end of) the titles, and then, they could visually check, change, etc., without creating db problems by that, since even externally, they would just work on ItemTitle in table Item, and as for FTS, if needed, they would just trigger a new index.

5) Many users, over the years (see those threads), have been asking for "tree children sorting by some other attribute", and I understand that just enhancing the already implemented abc-sort of the titles itself would be quite easy:

- manual sort as default as it is now

- automatic sort for all descendants of specific items, as further attribute values of those items (0 = no sort):

- this automatic sort attribute with two alternatives: 1 = alphanumeric (instead of alphabetic), and 2 = "precedence by 'numeric' sort, for numbers at the end of the title (after a space)" ('numeric' would include digit-strings, even separated by spaces and other special chars like "_", "-" and the like)

For 1 and 2 would then some easy regex code be needed, but I don't think this will slow down the tree display in any recognizable way, i.e. beyond some milliseconds.

6) MY MISTAKE, SORRY:

( other assumptions about Item Attributes pane EDITED out here: )

I was WRONG by asserting that the native UR "user keywords" would only be displayed by their dialogue: When there are such user keywords, then they appear in that pane, so the user will not have to open the dialogue but for setting more of them, not just for checking.