newbie questions [Archive] - Kinook Software Forum

mlwang

04-27-2011, 03:50 PM

Hi,

I'm new to Ultra Recall and so far the experience has been pretty good. There are nevertheless a few small issues that I can't seem to figure out:

1. Is there a way to export an item to a web page (.html file) with the source URL and user-defined keywords included (at the bottom of the page)?

I currently have stuff (accumulated over many years) in the Firefox Scrapbook extension, Evernote, OneNote, and Surfulater, and I'm trying to find a new home to consolidate that scattered data. When it comes to getting data out, I'm very thankful that Surfulater lets me export with tags (keywords) appended at the end of an article (an item in UR lingo).

Like most info collection/notebook users, I don't want to be locked in and prefer to be able to get my stuff out when necessary. While Ultra Recall does have extensive export functions, I can't seem to find a way to export to .html (or rich text) files with the associated keywords and URLs, which makes the exported files much less useful.

Exporting to XML does provide options to include URLs and keywords, and I've tried the VB script in the FAQ that turns the .xml file into an HTML file. However, the keywords and URLs end up in a toc.html, not in the individual items.

There are two main reasons why I prefer to have keywords stored with the individual items:

1) so I can rearrange/rename the articles without being tied to a specific toc.html.

2) so a local file indexer (in my case, Windows Search and Archivarius 3000) could point me to a file directly, not a toc.

2. Is there a way to tell Ultra Recall to put the source URL at the bottom of the captured page?

When I capture a web page (or part of it) with UR (using the Firefox extension), I would prefer to have the source URL readily visible at the end of the page.

Currently I use a customized layout with the Item Attributes Window docked at the bottom, and then customize the window to show only the Keywords and URL. It's only a partial solution, because ...

1) I have to unhide other attributes if I want to see them, and going back and forth is a bit cumbersome;

2) I can't click on the URL to launch it in the browser; and

3) when opening the document externally (in a browser), and you decide you want to see the source page, you have to come back to UR to click "Click here to open linked document". Wouldn't it be easier if one could just click on a link at the bottom to launch it without leaving the browser?

Looks like this is turning into a long post. I'll stop here and post the rest in a separate post.

Thanks for listening.

mlwang

04-27-2011, 04:01 PM

Forgot to mention, I ran into a bug when testing the VB script (ConvertToHTML.vbs) that would turn export.xml into toc.html.

It seems the part of code for "check for non-ASCII charcters" doesn't work. (see screenshot below)

It worked after I replaced that part with simply "Unicode = True".
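The script's own check isn't reproduced in this thread, so here is only a rough Python illustration of what such a check usually does (the function names are hypothetical). A non-ASCII test just looks for any code point above 127. Forcing Unicode output regardless, as with "Unicode = True", is a safe simplification because a Unicode encoding can represent plain ASCII text too; the sketch uses UTF-8.

```python
def contains_non_ascii(text: str) -> bool:
    """True if any character lies outside 7-bit ASCII (code point > 127)."""
    return any(ord(ch) > 127 for ch in text)


def output_encoding(text: str, force_unicode: bool = True) -> str:
    """Pick an encoding for a generated HTML file (hypothetical helper).

    force_unicode=True mirrors hard-coding "Unicode = True": detection is
    skipped, which is safe because a Unicode encoding also covers ASCII.
    """
    if force_unicode or contains_non_ascii(text):
        return "utf-8"
    return "ascii"


print(contains_non_ascii("Ultra Recall"))                  # False
print(contains_non_ascii("縄文時代"))                       # True
print(output_encoding("plain text", force_unicode=False))  # ascii
print(output_encoding("縄文時代", force_unicode=False))     # utf-8
```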

mlwang

04-27-2011, 04:30 PM

A couple of Unicode-related oddities:

1. There seem to be issues with searching Chinese/Japanese text. Some terms can be found straight up, some can only be found with an asterisk at the end, and still others need to be sandwiched between a pair of asterisks. Is this a known issue?

2. When capturing a selected part of a Japanese Wikipedia page using the Firefox extension, the result shows mixed fonts (screenshot 1 below).

There's no such problem when capturing the whole page, nor when doing the same thing (capturing a partial page) with IE8.

I'm using English Windows 7 x64 SP1.

mlwang

04-27-2011, 04:34 PM

The same text when captured properly should be like this:

mlwang

04-27-2011, 04:44 PM

One last question (for now): Ultra Recall seems quite slow at launch, even when no .urd file is loaded. I know this is subjective, but it's significantly slower than any of the others (OneNote, Evernote, Surfulater, MyInfo) here. In fact, it's slower than Firefox 4 with tons of extensions installed.

The performance is quite OK after launch (I don't have a big database yet), so it's not too big an issue. I'm just wondering: am I doing anything wrong?

kinook

04-28-2011, 10:42 AM

Originally posted by mlwang
1. Is there a way to export an item to a web page (.html file) with the source URL and user-defined keywords included (at the bottom of the page)?

No.

2. Is there a way to tell Ultra Recall to put the source URL at the bottom of the captured page?

http://www.kinook.com/Forum/showthread.php?threadid=4126

kinook

04-28-2011, 10:46 AM

Originally posted by mlwang
Forgot to mention, I ran into a bug when testing the VB script (ConvertToHTML.vbs) that would turn export.xml into toc.html.

It seems the part of code for "check for non-ASCII charcters" doesn't work. (see screenshot below)

It worked after I replaced that part with simply "Unicode = True".

If you're referring to http://www.kinook.com/Forum/showthread.php?threadid=2104, that has been superseded by the built-in HTML/XML export options (File | Export).

kinook

04-28-2011, 10:56 AM

Originally posted by mlwang
A couple of Unicode-related oddities:

1. There seem to be issues with searching Chinese/Japanese text. Some terms can be found straight up, some can only be found with an asterisk at the end, and still others need to be sandwiched between a pair of asterisks. Is this a known issue?

No.

2. When capturing a selected part of a Japanese Wikipedia page using the Firefox extension, the result shows mixed fonts (screenshot 1 below).

There's no such problem when capturing the whole page, nor when doing the same thing (capturing a partial page) with IE8.

I'm using English Windows 7 x64 SP1.

There's probably an issue with the encoding Firefox uses when copying to the clipboard.

kinook

04-28-2011, 10:59 AM

Originally posted by mlwang
One last question (for now): Ultra Recall seems quite slow at launch, even when no .urd file is loaded. I know this is subjective, but it's significantly slower than any of the others (OneNote, Evernote, Surfulater, MyInfo) here. In fact, it's slower than Firefox 4 with tons of extensions installed.

The performance is quite OK after launch (I don't have a big database yet), so it's not too big an issue. I'm just wondering: am I doing anything wrong?

Testing on an i7 860 Intel CPU machine with Win7 x64 SP1, startup time in our tests is under 2 seconds (with or without a 250MB .urd file opening). That seems pretty fast, but if you leave UR running all the time, startup time may not be a huge issue.

mlwang

04-28-2011, 12:06 PM

Originally posted by kinook
http://www.kinook.com/Forum/showthread.php?threadid=4126
Thanks for the pointer. I'll look into it, though please consider adding the ability to export into .html files with keywords and URLs appended in a future release. Thanks.

mlwang

04-28-2011, 12:14 PM

Originally posted by kinook
If you're referring to http://www.kinook.com/Forum/showthread.php?threadid=2104, that has been superseded by the built-in HTML/XML export options (File | Export).
No, I was referring to http://www.kinook.com/Forum/showthread.php?s=&threadid=2054, though I did try the one you mentioned as well.

The built-in HTML export option is nice, except it doesn't have keywords and URLs, so I thought I could use the VBS in the FAQ to turn exported XML into html.

It works, but the keywords and URLs end up only in the toc, as mentioned in my first post. Anyway, I'll try the custom form or see if I can write my own script to handle this.
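For illustration, here is a rough Python sketch of the kind of script described here. It assumes each exported item is an `<item>` element with `<title>`, `<url>`, `<keywords>` and `<content>` children; those names are hypothetical and the real Ultra Recall XML export layout may differ. It builds one standalone HTML page per item with the source URL and keywords as a footer, so the metadata travels with the file rather than living in a separate toc.html.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical export layout; the real Ultra Recall XML schema may differ.
SAMPLE = """<export>
  <item>
    <title>Jomon period</title>
    <url>http://example.com/wiki?page=jomon&amp;lang=ja</url>
    <keywords>history; japan</keywords>
    <content>&lt;p&gt;Captured text&lt;/p&gt;</content>
  </item>
</export>"""


def item_to_html(item: ET.Element) -> str:
    """Return a standalone page with the source URL and keywords as a footer."""
    title = item.findtext("title", default="")
    url = item.findtext("url", default="")
    keywords = item.findtext("keywords", default="")
    body = item.findtext("content", default="")  # assumed to already be HTML markup
    footer = (
        "<hr>\n"
        f'<p>Source: <a href="{html.escape(url)}">{html.escape(url)}</a></p>\n'
        f"<p>Keywords: {html.escape(keywords)}</p>\n"
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n{footer}</body></html>\n"
    )


root = ET.fromstring(SAMPLE)
for item in root.iter("item"):
    # A real script would write each page to its own .html file (UTF-8).
    print(item_to_html(item))
```

Because the footer is plain HTML inside the page body, a desktop indexer sees the keywords and URL in the same file as the article, and the link is clickable in any browser.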

mlwang

04-28-2011, 01:10 PM

Originally posted by kinook
Testing on an i7 860 Intel CPU machine with Win7 x64 SP1, startup time in our tests is under 2 seconds (with or without a 250MB .urd file opening). That seems pretty fast, but if you leave UR running all the time, startup time may not be a huge issue.
2 seconds? Then something is indeed wrong here. I timed it just now, and it took 9.5 seconds from clicking the pinned taskbar icon until I saw the program window (with no .urd file loaded at launch).

My system is an i5-2400 with plenty of RAM, and my system drive is an Intel 25-M SSD. Before testing, I disabled all programs that load at startup and rebooted the system. I also launched UR once (so the program should already be in the system cache), closed it, and then relaunched it and timed the second launch.

I know an i5 is no i7, but it's a Sandy Bridge CPU (so it should be a little faster than a previous-generation i5), on a P67 board. 9.5 seconds to launch doesn't seem right, does it? In comparison, Firefox 4 (a program notorious for slow starts) with dozens of extensions and 4 tabs open took less than 2 seconds to launch. Any suggestions on where I might look?

The Unicode issues will take more time to test. It's past 2 AM here and I'm going to bed. I'll report back tomorrow.

Thanks for your prompt replies to all my questions.

kinook

04-28-2011, 01:16 PM

Originally posted by mlwang
2 seconds? Then there's indeed something wrong here. Timed it just now, and it took 9.5 seconds after I hit the pinned icon on the taskbar for me to see the program window (with no .urd file loaded at launch).

That does seem slow. I'm not sure what could cause that. Some things to try: http://www.kinook.com/Forum/showthread.php?threadid=4041

mlwang

04-28-2011, 08:03 PM

Originally posted by kinook
Some things to try: http://www.kinook.com/Forum/showthread.php?threadid=4041
8.5 seconds in Safe Mode. Uninstalled and reinstalled UR. No difference.

Just to be clear (I thought it would be apparent, but anyway), I'm using the TRIAL version of UR Pro 4.2a. Would this make any difference?

mlwang

04-28-2011, 09:39 PM

Tried it just now on my aging and underpowered notebook: 4+ seconds for the first launch, 3 seconds for a relaunch. Obviously something is wrong with my desktop, so I took the drastic step of imaging my system and reinstalling Windows, installing UR before anything else. Now the launch takes less than a second (I couldn't time it properly with a stopwatch because my hand was too slow).

Sorry for the false alarm. Now off to finish rebuilding the system.

mlwang

04-30-2011, 03:42 AM

It turned out that reinstalling Windows didn't really solve the issue. Ultra Recall was quick to launch when Win 7 was fresh, but the slow launch I observed over the last few days soon crept back. After much trial and error, I've finally pinned down the cause, sort of.

It's a combination of Intel's RST driver ver. 10 and Realtek's LAN driver ver. 7 that causes UR's launch delay. I've tried 3 different versions of RST, including the current official version (10.1.0.1008), a newer WHQL version, and the newest beta (the latter two from station-drivers.com). I've also tried two different versions of the Realtek LAN driver, including the one on Asrock's web site (my motherboard is an Asrock P67 Extreme 6) and the newest official release from Realtek's web site. The results were all the same. I couldn't try older versions of RST; they refused to install on a P67 board.

No wonder I saw the problem soon after I started rebuilding the system; the drivers are among the first things installed on a new system.

Now, I can't pin the blame squarely on either driver, because installing either one alone causes no problem. The serious delay at UR launch appears only after both drivers have been installed.

For the moment, I've removed the RST driver since I would lose my network connection without the LAN driver. Windows 7's native AHCI driver seems to work ok, though it's generally recommended to go with the RST driver when using an intel SSD on an intel chipset-based motherboard.

I do hope someone with similar hardware will test this, thanks.

mlwang

04-30-2011, 07:55 PM

Originally posted by kinook
There's probably an issue with the encoding Firefox uses when copying to the clipboard.
Firefox doesn't specify "lang=ja" as IE does. What Firefox does is correct, IMHO, for mixed-language web pages. Japanese wikipedia uses Japanese as the main language, of course, but it also often contains text in English, Chinese, and other languages.

Anyway, got it working by tweaking IE's font settings. Didn't realize Ultra Recall is so intricately related to IE.

mlwang

04-30-2011, 08:06 PM

Originally posted by kinook
http://www.kinook.com/Forum/showthread.php?threadid=4126
Tried the custom Form method as described in the linked thread. It's not intuitive, but the instructions are detailed enough to follow step by step. Much appreciated.

Taking a look at the list of needs mentioned in my original post, however,

1) I have to unhide other attributes if I want to see them, and going back and forth is a bit cumbersome;

2) I can't click on the URL to launch it in the browser; and

3) when opening the document externally (in a browser), and you decide you want to see the source page, you have to come back to UR to click "Click here to open linked document". Wouldn't it be easier if one could just click on a link at the bottom to launch it without leaving the browser?
and it should be apparent that the 3rd need can't be satisfied by this method, and the second is only partially met. It would be much easier to be able to just click on a link to open the source URL in a browser.

Anyway, thanks for the great trick. Please consider my request as a feature wish for future releases. Thanks.

mlwang

04-30-2011, 09:36 PM

Originally posted by kinook
Originally posted by mlwang
A couple of unicode-related oddities:

1. There seem to be issues with searching Chinese/Japanese text. Some terms can be found straight up, some can only be found with an asterisk at the end, and still others need to be sandwiched between a pair of asterisks. Is this a known issue?
No.
Could you please copy the following text into UR as a text item (I think a web page item will work as well, but I tested it as a text item)?

最前面加上一些與日文漢字不同的中文字，如「營、雜、舊」等等。And then some English text.

縄文時代

縄文時代（じょうもんじだい）は、年代でいうと今から約1万6,500年前（紀元前145世紀）から約3,000年前（紀元前10世紀）、地質年代では更新世末期から完新世にかけてA列島で発展した時代であり、世界史では中石器時代ないし新石器時代に相当する時代である。旧石器時代と縄文時代の違いは、土器の出現や竪穴住居の普及、貝塚の形式などがあげられる。草創期・早期・前期・中期・後期・晩期の6期に区分される。この頃の日本列島人は縄文式土器を作り、早期以降定住化が進んで主に竪穴式住居に住んだ。弓矢を用いた狩猟、貝塚に見られる漁労、植物の採集などで生活を営み、打製石器、磨製石器、骨角器などを用いた。

日本の歴史（にほんのれきし）、日本史（にほんし）とは、日本または日本列島における歴史、国史（National History）のこと。本項では日本の歴史を概観する。

各時代の詳細は、各時代区分項目（各節の冒頭のリンク先）を参照されたい。

前幾天看了報導才發現這次出宏碁報告的麥格理 "產業" 分析師竟然是我過去在宏碁的同事張家福
當年他還只是一個剛出社會歷練的台大畢業生，想不到10年後卻由他對宏碁發出重重的一擊

Now, with full text search enhancement enabled,
1) the term "產業" (which means "industry") can be found straight up,
2) the term "日本" (Japan) can be found with an asterisk appended at its tail, and
3) the term "日文" (Japanese the language) can be found only when sandwiched by a pair of asterisks (all searched without the quotation marks).

The 3rd scenario applies to most of the Chinese or Japanese terms in my tests.

With full text search enhancement disabled and the "Match whole words" option unchecked, both "產業" and "日本" can be found straight up, while "日文" can only be found with ONE asterisk at the end.

Now, one related question: why can't "Match whole words" be disabled temporarily (thus working without the full text search enhancement)? It isn't practical to re-compact a large database between searches.

kinook

05-02-2011, 09:57 AM

For #1, the characters on both sides are word separators, for #2, only the character on the left side is a word separator, and for #3, the characters on both sides are not word separators, and * is required to match in the middle of words.
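The explanation above can be modeled with a small Python sketch. It assumes Ultra Recall uses SQLite FTS3's default "simple" tokenizer (the thread doesn't say which tokenizer is configured). Under that tokenizer, ASCII letters/digits and every code point of 128 or above are term characters, so a run of CJK text (including full-width punctuation like 、 and （）) with no ASCII space or punctuation becomes one long term. A word inside that term is reachable only by a trailing-* prefix query, and only if it starts the term. FTS3 itself supports only trailing-* prefix queries; how Ultra Recall handles a leading * isn't described here. The function names are hypothetical.

```python
from __future__ import annotations


def simple_tokenize(text: str) -> list[str]:
    """Approximation of SQLite FTS3's default "simple" tokenizer.

    ASCII letters/digits and every code point >= 128 are term characters;
    any other character (space, ASCII punctuation, newline) ends the term.
    """
    terms: list[str] = []
    current: list[str] = []
    for ch in text:
        if (ch.isascii() and ch.isalnum()) or ord(ch) >= 128:
            # The simple tokenizer folds only ASCII letters to lower case.
            current.append(ch.lower() if ch.isascii() else ch)
        elif current:
            terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


def fts_match(terms: list[str], query: str) -> bool:
    """Whole-term match, or prefix match when the query ends with '*'."""
    if query.endswith("*"):
        return any(term.startswith(query[:-1]) for term in terms)
    return query in terms


# Fragments of the sample text from this thread.
doc = '麥格理 "產業" 分析師\n日本の歴史（にほんのれきし）\n一些與日文漢字不同'
terms = simple_tokenize(doc)
print(terms)
for q in ["產業", "日本", "日本*", "日文", "日文*"]:
    print(q, fts_match(terms, q))
# 產業 True / 日本 False / 日本* True / 日文 False / 日文* False
```

This reproduces cases #1 and #2; case #3 (日文 in the middle of a term) fails under both exact and prefix matching, which is why a match in the middle of a "word" needs something beyond FTS3's prefix queries.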

mlwang

05-02-2011, 07:20 PM

Originally posted by kinook
For #1, the characters on both sides are word separators, for #2, only the character on the left side is a word separator, and for #3, the characters on both sides are not word separators, and * is required to match in the middle of words.
Is this something you may address in a future release? Having to use asterisks all the time is cumbersome, but it's tolerable if there's hope that it's only a temporary workaround.

BTW, the word "告" in the sample text I gave above couldn't be found no matter how I tried (with or without asterisks) if full text search enhancement is disabled (and the "Match whole words" option unchecked), what's with that?

kinook

05-03-2011, 08:29 AM

Originally posted by mlwang
Is this something you may address in a future release? Having to use asterisks all the time is cumbersome, but it's tolerable if there's hope that it's only a temporary workaround.

We will consider it.

BTW, the word "告" in the sample text I gave above couldn't be found no matter how I tried (with or without asterisks) if full text search enhancement is disabled (and the "Match whole words" option unchecked), what's with that?

I'm not sure, but non-FTS support is only there for backward compatibility with older databases.

mlwang

05-04-2011, 04:16 AM

Originally posted by kinook
We will consider it.
For your information (if you didn't know), indexing Chinese/Japanese words shouldn't be hard: just index every character, since each one is a word. While there are hundreds of thousands of characters, most people regularly use only around 3,000 to 5,000 of them (for Chinese, and far fewer for Japanese). I believe a typical UR database already indexes more English words than that.
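The per-character idea can be sketched in Python as unigram indexing. This is an illustration under stated assumptions, not how UR or FTS3 works: the CJK code-point ranges are a rough approximation (Hiragana/Katakana, CJK Extension A, CJK Unified Ideographs), and the function names are hypothetical. Each CJK character becomes its own term, ASCII words stay whole, and a multi-character query matches when its characters appear consecutively.

```python
from __future__ import annotations

# Approximate CJK ranges: Hiragana + Katakana, CJK Extension A, CJK Unified Ideographs.
CJK_RANGES = [(0x3040, 0x30FF), (0x3400, 0x4DBF), (0x4E00, 0x9FFF)]


def is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def unigram_tokenize(text: str) -> list[str]:
    """Each CJK character becomes its own term; other alphanumeric runs stay whole."""
    terms: list[str] = []
    word: list[str] = []

    def flush() -> None:
        # Emit the pending non-CJK word, lower-cased, if there is one.
        if word:
            terms.append("".join(word).lower())
            word.clear()

    for ch in text:
        if is_cjk(ch):
            flush()
            terms.append(ch)
        elif ch.isalnum():
            word.append(ch)
        else:
            flush()
    flush()
    return terms


def phrase_match(terms: list[str], query: str) -> bool:
    """A multi-character query matches when its terms appear consecutively."""
    q = unigram_tokenize(query)
    n = len(q)
    return n > 0 and any(terms[i:i + n] == q for i in range(len(terms) - n + 1))


doc = "一些與日文漢字不同。And then some English text."
terms = unigram_tokenize(doc)
print(phrase_match(terms, "日文"))          # True
print(phrase_match(terms, "文日"))          # False
print(phrase_match(terms, "English text"))  # True
```

The trade-off is that unigram phrase matching also hits characters that merely happen to sit next to each other across word boundaries; many CJK search engines use overlapping two-character (bigram) terms instead for better precision.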

Anyway, thanks for taking it under consideration, and after I've devised a VB script myself to convert xml outputs into html pages with keywords and URLs appended at the end, I think I'm almost ready to take the plunge.

The only hurdle left is the relatively high price tag of the Pro version. May I ask (sheepishly) two more questions:

1. Is there any chance UR will be featured on BitsDuJour (or better, grant DonationCoder members a discount) in the near future? I searched and found it was featured on BitsDuJour a little more than a year ago, and wonder if I'd have the luck to see it there again in a few weeks. More than a hundred people (including me) had signed up asking for the deal when I checked just now.

2. If the above request is out of the question and I have to settle for the Standard version, is there a way to upgrade to the Pro version at a discount in the future? (I couldn't see such an option on the Order page.)

Thanks for your patience with my many questions and requests.

kinook

05-04-2011, 06:35 AM

Originally posted by mlwang
For your information (if you didn't know), indexing Chinese/Japanese words shouldn't be hard: just index every character, since each one is a word. While there are hundreds of thousands of characters, most people regularly use only around 3,000 to 5,000 of them (for Chinese, and far fewer for Japanese). I believe a typical UR database already indexes more English words than that.

We use SQLite FTS3 (http://www.sqlite.org/fts3.html) for this. Either we're doing something wrong, or it doesn't index character by character for Chinese/Japanese text, but I believe that a match whole words option would resolve it.

Anyway, thanks for taking it under consideration, and after I've devised a VB script myself to convert xml outputs into html pages with keywords and URLs appended at the end, I think I'm almost ready to take the plunge.

It would be great if you could post this at http://www.kinook.com/Forum/forumdisplay.php?forumid=28 -- thanks.

The only hurdle left is the relatively high price tag of the Pro version. May I ask (sheepishly) two more questions:

1. Is there any chance UR will be featured on BitsDuJour (or better, grant DonationCoder members a discount) in the near future? I searched and found it was featured on BitsDuJour a little more than a year ago, and wonder if I'd have the luck to see it there again in a few weeks. More than a hundred people (including me) had signed up asking for the deal when I checked just now.

It was last featured there on Jan. 10, 2011, and will probably not be back for several months. We'll consider offering a DC discount. If you are a student or educator or work for a non-profit organization, we can also offer you the educational discount (30%) at http://www.kinook.com/ordere.html.

2. If the above request is out of the question and I have to settle for the Standard version, is there a way to upgrade to the Pro version at a discount in the future? (I couldn't see such an option on the Order page.)

Thanks for your patience with my many questions and requests.

You can upgrade from Std to Pro in the same major version using the Pro Upgrade order option ($50).

mlwang

05-04-2011, 09:04 AM

Originally posted by kinook
We use SQLite FTS3 (http://www.sqlite.org/fts3.html) for this. Either we're doing something wrong, or it doesn't index character by character for Chinese/Japanese text, but I believe that a match whole words option would resolve it.
With "Enhanced full-text Search" enabled, the "match whole words" option is always checked and grayed out (i.e., the setting can't be changed). Or am I doing something wrong?

As to FTS3, I've absolutely no idea how it works. I'm not a programmer; I can do only a little scripting. For now I simply pad all Chinese/Japanese search terms with two asterisks. I would be very grateful if you could turn your attention to this matter someday. Feel free to call on me to help with testing when you do. I'm not a programmer, but I'm pretty good at debugging; I've helped several programmers make their products compatible with East Asian text over the years.

Originally posted by kinook
It would be great if you could post this at http://www.kinook.com/Forum/forumdisplay.php?forumid=28 -- thanks.
I'd love to, except my script relies on a shareware editor (EmEditor) for its objects (mostly the selection object). I guess it can be modified to work with MS Word (using Word's selection object) but I haven't tried it.

In addition, the script is a quick hack put together in half an hour and tested only on a few notes in my test database. I'm not sure it'll work as expected for others. For example, I found out during testing that UR sometimes stores the URL in a format like "![CDATA[http://...]". I had no idea what the "CDATA" notation was for, so I told my script to strip it. That might not be the wisest thing to do, and I have no idea what other special notations might go into that field.

Now, given the caveats above, do you still think it's suitable to be published in the Tips forum?

Originally posted by kinook
It was last featured there on Jan. 10, 2011, and will probably not be back for several months. We'll consider offering a DC discount. If you are a student or educator or work for a non-profit organization, we can also offer you the educational discount (30%) at http://www.kinook.com/ordere.html.
Jan. this year? Too bad I didn't know it (nor did I have the time to try out UR at that time). But the educational discount is good enough for me (I'm a teacher) and I've put in my order and received my reg. info. just now. Thanks a lot for the discount.

kinook

05-04-2011, 09:57 AM

Originally posted by mlwang
With "Enhanced full-text Search" enabled, the "match whole words" option is always checked and grayed out (i.e., the setting can't be changed). Or am I doing anything wrong?

No, that option isn't currently available with FTS enabled, and you must use *'s to achieve it. But we'll probably address this problem by enabling that option and adding *'s behind the scenes.

I'd love to, except my script relies on a shareware editor (EmEditor) for its objects (mostly the selection object). I guess it can be modified to work with MS Word (using Word's selection object) but I haven't tried it.

In addition, the script is a quick hack put together in half an hour and tested only on a few notes in my test database. I'm not sure it'll work as expected for others. For example, I found out during testing that UR sometimes stores the URL in a format like "![CDATA[http://...]". I had no idea what the "CDATA" notation was for, so I told my script to strip it. That might not be the wisest thing to do, and I have no idea what other special notations might go into that field.

Now, given the caveats above, do you still think it's suitable to be published in the Tips forum?

It could be helpful as-is. The CDATA sections are an XML construct (see http://en.wikipedia.org/wiki/CDATA) used for text that contains markup characters. If you use an XML parser to work with the document, they should be handled transparently.
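To illustrate the point about parsers, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `<url>` element name is hypothetical and not necessarily what UR's XML export uses. The full CDATA syntax is `<![CDATA[ ... ]]>`, and a parser returns the same text whether a value was wrapped in a CDATA section or written with escaped entities, so there's no need to strip the marker by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical element name; the point is only how a parser treats CDATA.
with_cdata = "<url><![CDATA[http://example.com/search?q=a&b=<c>]]></url>"
with_entities = "<url>http://example.com/search?q=a&amp;b=&lt;c&gt;</url>"

url1 = ET.fromstring(with_cdata).text
url2 = ET.fromstring(with_entities).text
print(url1)          # http://example.com/search?q=a&b=<c>
print(url1 == url2)  # True
```

When such a value is injected into an HTML page, it should then be re-escaped for HTML (for example with `html.escape`) rather than copied raw.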

mlwang

05-05-2011, 11:54 PM

Originally posted by kinook
But we'll probably address this problem by enabling that option and adding *'s behind the scenes.
Thanks in advance.
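The "adding *'s behind the scenes" idea could look roughly like the following Python sketch. It is a hypothetical helper, not Ultra Recall code. With whole-word matching off, each search term gets a trailing `*`, which FTS3 treats as a prefix query. It adds only trailing asterisks, since that is the only wildcard form FTS3 supports; matching in the middle of a term (the 日文 case) would still depend on however UR handles a leading `*`, which the thread doesn't describe. It also doesn't escape FTS3 query operators such as OR or double quotes.

```python
def build_match_query(user_input: str, match_whole_words: bool) -> str:
    """Turn whitespace-separated search terms into an FTS3 MATCH string.

    With whole-word matching off, each term gets a trailing '*' so FTS3
    treats it as a prefix query; terms already ending in '*' are left alone.
    """
    terms = user_input.split()
    if match_whole_words:
        return " ".join(terms)
    return " ".join(t if t.endswith("*") else t + "*" for t in terms)


print(build_match_query("產業 日本", match_whole_words=True))   # 產業 日本
print(build_match_query("產業 日本", match_whole_words=False))  # 產業* 日本*
```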

Originally posted by kinook
It could be helpful as-is. The CDATA sections are an XML construct (see http://en.wikipedia.org/wiki/CDATA) used for text that contains markup characters. If you use an XML parser to work with the document, they should be handled transparently.
Thanks for the pointer, so I guess it's ok to remove it for my purpose (injecting into .html files).

I just made some modifications and added a few lines of notes and comments to the script, so hopefully it'll be more comprehensible to others. It's been posted at http://www.kinook.com/Forum/showthread.php?s=&threadid=4846