Export problems

Spliff · #1 06-24-2021, 10:56 AM

In another thread, I had said:

"In fact, UR's XML output - its RTF output seems, unfortunately, be limited to just ONE item every time (?) - seems very powerful, but from my (intermediate) findings, there is NO way to get FORMATTED XML output just into ONE file, so any XML export-to-file will need quite heavy scripting in order to then combine the UR-produced multiple XML files (which seems to be perfectly possible, it's just that up to now - using UR about 3 months now -, I haven't had the courage and the necessity to look further into this problem)?"

kinook had answered:

"You can export multiple rich text items by multi-selecting the items.
You can export XML -> HTML multiple items into a single file. Multi-select, or check the 'Export child items (recursive)' option and select an HTML conversion option.
https://kinook.com/UltraRecall/Manual/xmlexport.htm"

So we have got two problems here:

RTF export: You're right, multiple selection will do it! (I had, in vain, assumed that parent item plus its child items would be processed, but multiple selection will provide much more precise selecting, so in some cases, it's a little bit cumbersome, but overall, it's the better solution indeed (in the absence of more fine-tuned options being available).)

XML export: No way to get it into just ONE export file, it always creates a folder, even when, as target, I mention a file. (I have tried again today, but had tried for hours some weeks before, and after reading and re-reading the help file.)

Let's say I have item A with 5 child items aa, ab... ae (= 6 items in total). File - Export - "Documents and item rich text to HTML" ("Use saved settings: Default") - Next:

Then I get "Folder to export selected items to": "d:\downloads\try.xml" (i.e. a file name) - "Export child items (recursive)" = checked, and "Save settings as: Default"; "Finish" >

I get the folder (!) "try.xml", which contains the folders "Export_Icons" and "Export_StoredContent" (in there: 6 folders named after the respective IDs, here 41523...51528, each with a file named "RespectiveTitleOfItem.html") and the files index.html and toc.html.

Now "File - Export - Items to an XML (OPML) file" > "Export - Select Attributes": "Item Details (RTF)" (I never found "Item TITLE" in that list!) > Next >

"File to export..." "d:\downloads\trynone.xml" ("Export child items (recursive)" checked) "HTML: None"
> I get trynone.xml which is a container / reference file for the folder "trynone_StoredContent", and then as above

Ditto but "HTML: Dynamic Outline" (into "d:\downloads\trydynamic.xml")
> I get the file Export.html which is a container / reference file for the folder trydynamic_StoredContent

Ditto, but "HTML: HTML" (into d:\downloads\tryhtml.xml)
> I get the folder tryhtml_StoredContent, and as above again.

Thus, I am not able currently to get a set of UR items (here: 1 parent with its 5 child items) into ONE XML (!) file, and probably the OPML format is natively an XML format which creates multiple folders, multiple files, as an exchange format with other outliners.

What I'm after though, is export into ONE XML file, like the XML files read into, or written by, e.g. screenplay applications, like Final Draft, Fade In, etc.

By external scripting, it would certainly be possible to get all that XML data spread over multiple folders and multiple files into just one XML file (which then might be necessary to be further adjusted, externally, in order to make it 100 p.c. compatible with the expected target XML format), but I had hoped UR had such an export format "many items to one xml file", as it has got for the .rtf format indeed, as you very kindly explained to me?

EDIT: So I'm speaking of strictly FLAT export, not any hierarchical one, just the title of item 1, formatted content of item 1, title of item 2, content of item 2, and so on, in tree order, and with no regards to possible indentation of the items within the tree. (And some "new page" indicator between the pages but which should not need to be explicit, since after all the title lines' (but as said, the title attribute is missing from the above set of attributes to include) XML formatting should be sufficient as "new page" indicator.)

EDIT 2: I forgot to mention the "Static Outline" alternative above, but in my tries, any content formatting with it is lost then, so that would not do it for me.

EDIT 3: It should also be noted that any script which tries to gather the different contents from the different folders and files, into one file, would need to check the "container / reference" file, for the ORDER in which the multiple xml files are to be processed then, since it's obvious those folders can not be processed in numeric order, since numeric order is, grosso modo, determined by the respective creation dates of the multiple items, not by their tree order. Thus, the script would have to look up every line in the reference file, get the corresponding ID number, then get the data out of the specific folder. It's evident that NOT creating all those numerically-named folders to begin with would greatly ease export into one target file, but perhaps it's possible (and with preserving content formatting) after all?

Spliff · #2 06-26-2021, 08:48 AM

Is there any way I could get the ItemID info, and/or the Flag info (in any form)

- into the exported content files (i.e. into the ItemTitle.html files within the ID-number-titled export sub-folders)

- into the export "reference files" ("Export.html" or "toc.html", in case)?

As it is today - I analyzed, for hours again, the various UR export alternatives - even with the "Flag" and "ItemID" entries checked ("on") in the "Export - Select Attributes" dialogue, NO trace of those (ID and Flag, e.g. the internal UR database flag code) was to be found in either of those files ("reference" file or specific content file) for any item. I tried Html - Html, Html - Static, Html - Dynamic...

- I would need the "flag" info (ideally already within the "reference" file, but then at least within the specific content file) in order to identify, in further processing, which child items of a given parent item (from which the "export" is triggered) will ultimately be imported into the target application.

- I would need the ID info, within the "reference file" already, in order to clearly distinguish specific content files where those have got identical titles

- For these identifications, it is true that the "reference files" mentioned above (i.e. "Export.html" or "toc.html") mention, for every item, its "tree number" (i.e. 1 for the parent item, then 1.1, 1.2., 1.2.1, etc.), but these "indentation numbers" (or whatever we call them) will not help either for the identification of the items since those numbers are NOT replicated within the respective content .html files.

Or then, worded otherwise:

Why do I find neither ItemID nor "Flag" (in whatever form) within the export (UR names its export sub-folders by the ItemIDs anyway, whether I check "ItemID" in the "Export - Select Attributes" list or not), even when I specifically check (i.e. activate) those attributes in the above-mentioned dialogue?

In which format is that info (flag ideally within the "reference file", but then at least within the respective content file, in order for my script to decide "item is to be included into target file import y/n"; ID also in the "reference file", in order to distinguish two or more content files with the same title) written into the export - IF I check it in the "Export - Select Attributes" dialogue?

EDIT: I obviously also checked index.html, mktree.js, mktree.css and other "reference" files for those info where available, but to no avail in every case.

Spliff · #3 06-27-2021, 05:59 AM

The best you could get, according to me (ideas welcome):

In UR: File - Export - Items to XML OPML - Attributes Check All - File to export selected item attributes/ data to: "yourpath\none.xml" (export child items recursive = checked, html = none)

Open the none.xml file in some good editor (wordwrap yes or no) and, if you do not need the indentation levels (you could identify them by counting the tabs) and want a better overview before running your scripts, regex-replace "\t{1,}<" by "<".
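For those who prefer to do that replacement in a script rather than in an editor, here is a minimal Python sketch of the same regex. The sample string is made up for illustration; it is not real UR output.

```python
import re

def strip_leading_tabs(xml_text: str) -> str:
    """Remove every run of tabs that directly precedes a '<' (the indentation)."""
    return re.sub(r"\t+<", "<", xml_text)

# Made-up sample, only to show the effect of the replacement.
sample = "<opml>\n\t<body>\n\t\t<outline text=\"A\"/>\n\t</body>\n</opml>"
flat = strip_leading_tabs(sample)
print(flat)
```

Note that "\t+" is equivalent to "\t{1,}". If you need the indentation levels later, count the tabs before running this, since they are gone afterwards.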

Scrape the file for all "<outlinecreated" (= outer loop; do NOT search for "between <outlinecreated and </outline" instead), then (= inner loop) for the next-following "text" = item title and "flag" (value may be empty) and "item ID".

("item text" then lists the content NOT formatted; "item details RTF" is redundant for our use; the respective values may be empty in case the item has got no content (but just a title); do NOT search for "</outline": if you need indentation values, count the tabs instead (instead of deleting them, that is).)

This will give you the IDs, the titles and the flags (or an empty value for the flag); from that, you can select or deselect the IDs to be processed or not; then you create an xml file from what you will have got.
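As an alternative to scraping by string search, the same loop could probably be done with an XML parser, which walks the outline elements in tree order and gives the nesting depth for free. This is only a sketch: the attribute names "Flag" and "ItemID" and the sample data are assumptions, so check the real names in your own none.xml and pass them in.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the attribute names Flag and ItemID are assumptions.
SAMPLE = """<opml version="1.0"><body>
<outline text="Parent A" Flag="" ItemID="1001">
  <outline text="Child aa" Flag="1" ItemID="1002"/>
  <outline text="Child ab" Flag="" ItemID="1003"/>
</outline>
</body></opml>"""

def list_items(opml_text, flag_attr="Flag", id_attr="ItemID"):
    """Return (depth, item_id, title, flag) tuples in tree order."""
    root = ET.fromstring(opml_text)
    items = []

    def walk(node, depth):
        # Direct <outline> children only, so depth mirrors the tree level.
        for child in node.findall("outline"):
            items.append((depth,
                          child.get(id_attr, ""),
                          child.get("text", ""),
                          child.get(flag_attr, "")))
            walk(child, depth + 1)

    body = root.find("body")
    walk(body if body is not None else root, 1)
    return items

items = list_items(SAMPLE)
for depth, item_id, title, flag in items:
    print(depth, item_id, title, repr(flag))
```

For a real export you would read the file with ET.parse(path).getroot() instead of ET.fromstring. A missing attribute comes back as an empty string here, which matches the "value may be empty" case for the flag.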

Then you will need the formatted contents, after the respective titles (and in case there is such content, i.e. IF the ID has got a corresponding sub-sub-folder within the "yourpath\none_StoredContent" sub-folder).

If that is the case for the ID in question, you then process the html body within the .html file within that sub-sub-folder, deleting the unnecessary html codes and converting the needed (for formatting) html codes to their xml equivalents, then you insert the product after the respective title in your xml file.
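A rough sketch of that html-body step, using only Python's standard library. The mapping from html tags to "bold", "italic" and "underline" is purely hypothetical; replace it with whatever your target XML format expects. Everything outside the body and every tag not in the mapping is dropped.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical mapping from html formatting tags to target XML tags.
TAG_MAP = {"b": "bold", "strong": "bold", "i": "italic", "em": "italic", "u": "underline"}

class BodyToXml(HTMLParser):
    """Keep text and mapped formatting tags inside <body>, drop the rest."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body and tag in TAG_MAP:
            self.parts.append(f"<{TAG_MAP[tag]}>")
        elif self.in_body and tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body and tag in TAG_MAP:
            self.parts.append(f"</{TAG_MAP[tag]}>")
        elif self.in_body and tag == "p":
            self.parts.append("\n")  # paragraph end becomes a newline

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(escape(data))  # re-escape &, <, > for XML

def html_body_to_xml(html_text):
    parser = BodyToXml()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts).strip()

sample = "<html><head><title>x</title></head><body><p>Plain <b>bold</b> &amp; <i>it</i></p></body></html>"
converted = html_body_to_xml(sample)
print(converted)
```

This does not repair mis-nested tags, and it would also keep the text of any script or style block that sits inside the body, so the result should be checked against real UR html before relying on it.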

Thus, UR will natively create 1-file-XML export, but just of the un-formatted kind, as far as the item contents are concerned (So this is similar, in a way, to CSV export, and it's also similar to some un-formatted flat html export I got in some way from my previous tries here.), and in order to preserve the content text formattings in your final output, it seems you have to go the way I have described here. (?)

Alternatively, you could, additionally to your UR content's native rtf formattings, insert "markdown" (just google "markdown") or similar codes, which then would be preserved into UR export's text-only xml format, and thus could be further processed within the UR export .xml file, in order to match the requested xml target format, i.e. without you having to write the script accessing, analyzing and transforming the formatted UR-created .html files(' contents).

EDIT: You could further drill down the content processing by inserting code characters, e.g. AFTER the content part to be exported from the respective item, and/or paragraph lead characters which would indicate that the paragraph is not to be exported (i.e. natively to the UR export bodies yes, but then to be discarded by the scripts you'll need anyway).

That being said, I would have preferred UR doing formatted (!) xml export natively, it would have been so much easier; here again, I think it's a little bit unfortunate to spare the user frequent (i.e. up to "yearly" even) paid updates (i.e. version numbers before the dot), for a program that almost ANY of its users will use as their main PC application, almost any minute their PC is "on", in an age (i.e. the "Twenty-Twenties") where much lesser applications have - even successfully for many of them - forced subscriptions upon their users. Don't get me wrong here, subscriptions are utterly nasty, even if they allow you to continue to access your "stuff" after the subscription may have ended (for whatever reason), but I'd be happy for UR to develop from "good", "very good even, considering the competition" to something really smooth... and I'd be eager to pay for every substantial update ($50 every 18 months or so would be a steal indeed)... as my findings show, there would be so much room for further, active development... PAID development that is.

((My html syntax checker virtually runs amok upon UR's html output... which seems to be from 2006? (UR's html output... not my syntax checker)... and many a user could very probably do with something much more modern in that field as well?))

Spliff · #4 07-07-2021, 03:38 AM

So I had a thorough look into the above, and wrote the necessary script, not yet for transfer into the target xml format, but for the export from the above UR export (XML OPML, all attributes, children recursive, html "None" (which is not true since all the html folders and files will be created by this, additionally to the text-only xml file)).

As mentioned above, you do a loop which delimits and processes the items, one by one, then deciding for them, e.g. by the flag number, if they are to be included into the final output or not, and if yes, you also analyze the Item Text data, for further discarding parts into that, e.g. "comment lines" (starting with a special character), or discarding anything within the Item Text data below some "code", e.g. a separator line, between "text to be exported" and data which is not; also, within the Item Text, you may "code" some special data, again with leading special characters, to be then deleted from the "text" part, but to be written into special variables for further use.

Then (i.e. if you hadn't discarded the item altogether), you write those variables into the target variable (append), e.g. like

|tThe item Title
|iThe item ID if needed, etc.
|oSome Other data retrieved from the Item Text
|cThe item text / Content

and finally you write this variable into a file, or process it the way you need, replacing the above codes (|o and the like) by the corresponding, needed XML or other notation.
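To illustrate that last replacement step, here is a small Python sketch that turns such a coded buffer into XML. The element names (items, item, title, id, other, content) are hypothetical placeholders, and the sketch assumes that no content line itself starts with one of the codes.

```python
from xml.sax.saxutils import escape

# Code letter -> hypothetical target XML tag.
CODES = {"t": "title", "i": "id", "o": "other", "c": "content"}

def codes_to_xml(buffer):
    items, current, last = [], None, None
    for line in buffer.splitlines():
        if len(line) >= 2 and line[0] == "|" and line[1] in CODES:
            key, value = line[1], line[2:]
            if key == "t":            # a title line opens a new item
                current = {}
                items.append(current)
            if current is None:       # field before any title: ignore
                continue
            current[key] = value
            last = key
        elif current is not None and last is not None:
            # Uncoded line: continuation of the previous (multi-line) field.
            current[last] += "\n" + line
    out = ["<items>"]
    for item in items:
        out.append("  <item>")
        for key, tag in CODES.items():
            if key in item:
                out.append(f"    <{tag}>{escape(item[key])}</{tag}>")
        out.append("  </item>")
    out.append("</items>")
    return "\n".join(out)

sample = "|tFirst\n|i1002\n|cLine one\nLine two & more\n|tSecond\n|cText"
xml_out = codes_to_xml(sample)
print(xml_out)
```

Newlines and blank lines inside the content survive, since every uncoded line is appended to the field before it.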

I can confirm that this works as expected, i.e. this UR export is reliable, including correct rendering of newlines and blanklines, etc., so yes, you can use this as output from the above-mentioned markdown and similar; this is a very positive finding since just for feeding the search index, UR's redundant text-only content storage would NOT have needed to preserve the correct newlines, etc.

On the other hand, the above-mentioned UR export produces a quite incredible overhead, since all the described selection work is just done in my script, AFTER UR export, and if I only want to use 10 p.c. of my items within the target application (by discarding about 90 p.c. of them via their flag number), UR produces, within the xml file alone, tenfold what I need, since there is no possibility to discard flag numbers within the UR export dialogue already, and worse, whilst I don't even need the html folders and files even for my 10 p.c. of items, UR, at every single export as described above, creates them though within its "raw" export, i.e. I then have to delete not just 100, but 1,000 newly created html folders and files.

Thus, I would suggest to implement another variant in that XML export dialogue, and which would be very easy, since it would be just a subset of the current "Html: none" variant, but without the html indeed, this time. (As said, the additional html files are needed if you want to preserve text formatting, and are not needed if not.)
_______________

For my means, and abhorring the extent of the overhead I got, I then tried csv export, just for ItemTitle, Flag and ItemText, and here again, I got some overhead since here again, I just can make my selection, via Flag, after the export (i.e. I get tenfold the number of csv "records" I really need), and here, the Flag NAME is exported, not the flag number, but whatever, I'll rename my flags to 1-character "names", then select by those.

I don't like the fact that the csv export is strict csv, i.e. with commata as only possible field separators, and thus "" as field starters and endings, with "" for " within text - having had the possibility to choose from tab or | or such as field separator would have come so much more handy -, BUT here again, I can confirm that the export is without fault, including preservation of newlines, blanklines and all, and thankfully, the Text/Content field is the last one (Title - Flag - Text), which facilitates the visual checking in some (good) text editor (e.g. EmEditor), and there are also specialized csv editors like Ron's Editor - I checked the UR csv output in both, and it's without fault.

(Btw, the "indent level" here is just that, a single number, it's not in the form 1, 1.1, 1.2 ... 18.3.45 or such, but for your individual use case, it may be of help indeed, since it preserves thus what the leading tabs are within the above-described xml export.)

Thus, it's obvious that if you don't have a need for preserving text formatting, you will use csv export instead of xml export, thus getting incredibly less overhead, and in both cases, you will have to recreate the necessary xml (or other) notation for your target application from scratch anyway.

EDIT: It just occurred to me that the "Indent Level" info is extremely helpful indeed: Instead of the need to "flag" even the child items of "not to be exported" parent items, or then to have to flag all the "exportable" items as "exportable", you just "flag" as "non-exportable" the concerned sub-sub-trees' parent items, and then in your "export-from-export" script, for every "non-exportable" item, you get the indent level, then check the following items for "indent level number greater than that given number", and while "yes", you discard those as well. This is a big facilitation for your general work as well, so yes, the "indent level" attribute is to be considered a core element in UR export.
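A sketch of that sub-tree pruning in Python, assuming the items arrive in tree order as (indent level, flag, title) and that the "non-exportable" flag has been renamed to the 1-character name "x" (both are assumptions for the example):

```python
def prune_subtrees(rows, skip_flag="x"):
    """Drop every item flagged skip_flag together with all of its descendants.

    rows: iterable of (indent_level, flag, title) in tree order.
    """
    kept = []
    skip_below = None  # indent level of the flagged parent currently being skipped
    for level, flag, title in rows:
        if skip_below is not None:
            if level > skip_below:
                continue           # deeper than the flagged parent: discard
            skip_below = None      # same level or higher up: sub-tree has ended
        if flag == skip_flag:
            skip_below = level     # discard this item and start skipping below it
            continue
        kept.append((level, flag, title))
    return kept

rows = [
    (1, "", "A"),
    (2, "x", "aa"), (3, "", "aa1"), (3, "", "aa2"),
    (2, "", "ab"),
    (2, "x", "ac"),
    (2, "x", "ad"), (3, "", "ad1"),
    (1, "", "B"),
]
kept = prune_subtrees(rows)
print(kept)
```

Only the flagged parents need a flag; their children are dropped because their indent level is greater than the parent's.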

Spliff · #5 07-08-2021, 06:19 AM

I should add that, whilst the above-mentioned tools correctly analyze UR's csv export (which implies that it's absolutely correct), for a layman, core csv analysis (i.e. "element retrieval" from csv which uses commata as field delimiter) is a nightmare, as soon as you also have double-quotes within Item Title and Item Text in this UR use case; "easy" "how-to-do's" from the web are unreliable, and whilst there exist "libraries" (e.g. "Ron's" above seems to use "CellParserSeparatorWrappedRes" (whatever that may be)), you're left alone in and with your tries to write a - reliable in any situation! - script.

(Of course, you could use some csv tool (which renders it all correctly, as said above), then use that tool to shift the field separator "comma" to some of your liking (e.g. "|" (which may occur in some third-party files though but which you could replace by "-" in those texts, possibly)), then only do the necessary scripting, using that third-party tool's output, but that's not how I see things, i.e. that's not an acceptable workflow for me.)

Thus, I came up with two solutions for me, you could make up others.

First, do a dummy csv export, with "Export attribute names as first row" checked = "yes", in order to get UR csv's export attributes order; it seems to be

"Keywords (user-defined)","Item Title","Date Created","Date Modified","Date Accessed",Flag,ItemID,"Parent Title","Date Deleted","Original Parent","Template Item","Access Count","Web Site",Company,"Indent Level",Lineage,"Item Text","Begin Date","Begin Time",Reminder,"Pending Reminder","End Date","End Time",Recurring,"Original Begin",Location,"Has Reminder","Date Completed","Billing Info",Mileage,"Completed %","Work Actual","Work Total",Priority,"Search Locked","Search TitleOnly","Search Whole Words","Search Manual Keyword Only","Tree Order","Object Exists",Attachments,"Phone (Fax)","Phone (Work)","Email (Home)","Phone (Mobile)","Due Date","Phone (Home)","Postal Code",State,Address,"Email (Work)","Middle Name",City,"Item Notes",BCC,CC,"Sync Date","Document Size",Encoding,Country,Newsgroups,"First Name","Last Name","Default Child Template","Message Date",URL,To,From,"Camera Model","Date Picture Taken",Artist,"Album Title",Author

in my tries and is different from the attributes order within the export dialogue.

Then, decide which attributes will never occur in your exports, which means that they will produce just a comma in UR's export file, and create groups of such empty output fields, which will then serve you as field indicators for the "real" fields.

In my case, needing Item Title, Flag, Indent Level (!) and Item Text (but NOT using user-defined keywords), I export:

"Keywords (user-defined)","Item Title",Flag,"Web Site",Company,"Indent Level","Item Text", this way getting distinct field, and new-item indicators (the user-defined keywords attribute giving me a leading comma (i.e. on position 1) as new-indicator).

If your user-defined keywords are NOT empty, you could use "Keywords (user-defined)","Item Title",Flag,"Web Site",Company,"Indent Level","Item Text",Address,City,Country instead (i.e. if you don't use those attributes for real data somewhere), which would give you four (if "Text" is empty) or at least 3 (if it's not) trailing commata, and from there, you would not have a "begin item", but an equally distinct "end item" indicator, which is as good for then writing a reliable analysis-and-process script.

This being said, and except for the nuisance of it also producing (hundreds, in case) unwanted html folders and files (in the current use case of just wanting plain text export (with markdown codings or not)), UR's above-described xml export, whilst producing less-human-readable output on first sight, is distinct from start-on, and without the need of csv analysis tools or the need of creating a bunch of empty attributes, in order to distinguish the exported attributes from each other.

From the above, I would kindly suggest:

- to create a csv output format where the user could enter their special character which will then serve as field separator

- to create a subset of the xml "html: none" output where "html: none" will be what it already promises: just the xml file as the only output

- to create several export "situations", "scenarios", since currently, for perfect csv output, you would need to select a specific attributes group, which may be considered to be a little bit "very specific", and then, for "ordinary", "regular" output, you will have to destroy that attributes selection, in order to enter another one, and so on again, so some presets, perhaps 6 or so, to freely choose from, would come really handy, or then, distinct presets for the different output formats, i.e. one for csv, a different one for xml, and so on.

The above does in no way imply you couldn't create perfect output from UR even today, it's just a little bit cumbersome currently to do so.

kinook · #6 07-21-2021, 08:03 AM

In a CSV file, two double quote characters are used to escape a literal double quote character in the content. If another delimiter was used, the same requirement would apply for that character.
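As an illustration of that quoting rule, here is a minimal sketch with Python's standard csv module, which handles doubled quotes, embedded commas and embedded newlines by itself. The sample data is made up and only uses four of the attribute names listed earlier in the thread; it assumes the export was made with "Export attribute names as first row" checked.

```python
import csv
import io

# Made-up sample in the same style as the export: header row first,
# doubled quotes for a literal quote, newlines inside a quoted field.
SAMPLE = (
    '"Item Title",Flag,"Indent Level","Item Text"\r\n'
    '"He said ""hi"", twice",x,1,"line one\r\n\r\nline three, with comma"\r\n'
    '"Second item",,2,\r\n'
)

def read_export(text):
    """Parse CSV text into one dict per item, keyed by attribute name."""
    # newline="" keeps line breaks inside quoted fields untouched.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)

rows = read_export(SAMPLE)
for row in rows:
    print(row["Item Title"], "|", row["Flag"], "|", row["Indent Level"])
```

For a real export file you would pass open(path, newline="", encoding=...) to csv.DictReader instead of the StringIO; the right encoding depends on the file and is not known from this thread. Because the fields are addressed by name, empty fields such as a missing flag simply come back as empty strings.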

For XML export, to prevent exporting of secondary files, uncheck these attributes:

Icon
Item Details (RTF)
Item Notes
Item Text

The export wizard supports "saved settings," and you can create any number of these for your customized export options. You select previously saved settings on the Output Type page and create saved settings on the Select Destination page.

https://kinook.com/UltraRecall/Manua...outputtype.htm

https://kinook.com/UltraRecall/Manua...estination.htm