copy web pages


janrif
04-09-2008, 08:26 AM
I don't understand why various programs -- Evernote & others -- can copy all kinds of web pages w/o issue, yet UR still cannot copy https pages. This is getting silly. I still have to resort to Scrapbook for that kind of work and then link those pages to URp, which is a pain.

In the same silly vein, I still wonder why, when the user wants to clip to a particular place via the systray, the entire program comes to the foreground. IMO, this undercuts the reason for having systray commands in the first place.

Oh well.

kinook
04-09-2008, 09:09 AM
Selecting all (Ctrl+A) before importing usually works to capture secure pages, pages requiring login, etc.

The UR window will only be displayed if 'Tools | Options | Import | Select location when importing or pasting from other applications' is checked.

janrif
04-09-2008, 09:45 AM
Originally posted by kinook
Selecting all (Ctrl+A) before importing usually works to capture secure pages, pages requiring login, etc.
As text, and sometimes not formatted correctly. Try Evernote & you'll see the difference.

The UR window will only be displayed if 'Tools | Options | Import | Select location when importing or pasting from other applications' is checked.
Exactly my point. If I wanted to see UR in the foreground, I would switch to it in the task manager & paste in a specific location.

kinook
04-09-2008, 11:19 AM
Originally posted by janrif
As text, sometimes not formatted correctly. Try Evernote & you'll see the difference.
I get text and formatting with UR.
http://www.kinook.com/Forum/showthread.php?threadid=2798

Exactly my point. If I wanted to see UR in the foreground, I would switch to it in the task manager & paste in a specific location.
Then what's the problem? Keep that option unchecked and UR will only display if/when you explicitly activate it.

janrif
04-09-2008, 11:54 AM
Originally posted by kinook
I get text and formatting with UR.
http://www.kinook.com/Forum/showthread.php?threadid=2798
In my original post I cited https sites.

Then what's the problem? Keep that option unchecked and UR will only display if/when you explicitly activate it.
The 'problem' is that UR should paste to the location in the background whether I select an explicit location or just use <WIN>+V.

kinook
04-10-2008, 06:25 AM
Originally posted by janrif
In my original post I cited https sites.
Works there too. I tried with 5 different secured sites, and select all in IE imported into UR with images and formatting.

janrif
04-10-2008, 01:04 PM
Originally posted by kinook
Works there too. I tried with 5 different secured sites and select all in IE imported into UR with images and formatting.
Not here. I sent an AVI illustrating the problem to the support email address. I look forward to your reply. Thank you.

kinook
04-10-2008, 03:38 PM
It looks like a Firefox issue/limitation. Again, see http://www.kinook.com/Forum/showthread.php?threadid=2798. And importing the page from Firefox to Evernote looks the same as in UR.

Navigating to https://www.comcast.com/Corporate/Customers/CustomerCentral.html in IE, select all, and import into UR works well and looks the same as the original page in IE. Conversely, formatting in Evernote imported from IE looks about the same as the Firefox import.

Tested with Win XP SP2, IE7, Firefox 2.0.0.13, UR 3.2.6, and Evernote 2.2.1.386.

janrif
04-10-2008, 03:48 PM
Originally posted by kinook
It looks like a Firefox issue/limitation.
Navigating to https://www.comcast.com/Corporate/Customers/CustomerCentral.html in IE, select all, and import into UR works well and looks the same as the original page in IE.
True enough.
Tested on w2k-sp4, ie6, current beta
Thanks.

quant
06-15-2008, 04:24 AM
Sorry for hijacking this thread.

The copying of web pages to UR still leaves a lot to be desired. Hopefully, Kinook can get it right by improving it step by step. I'm attaching one example.

This is how UR stores the following webpage (using Firefox with the UR buttons):

http://cambridge.org/us/catalogue/catalogue.asp?isbn=9780521539371

quant
06-15-2008, 04:27 AM
And this is how it looks (and how it should look) if I first store the webpage with Scrapbook (Ctrl+Shift+L) and then store it by clicking the UR buttons in Firefox. Why can't UR get it right directly? :(

ashwken
06-16-2008, 08:19 AM
Yes, there are sites that will not copy correctly even when using Ctrl-A.

I ran into this over the weekend at Wikipedia: even though everything in the browser is selected, only the text frame content is copied; the side and top frames don't make it to UR. I had to resort to Save As .mht from within IE.

From Wikipedia:
http://en.wikipedia.org/wiki/The_Princess_Bride_%28film%29

Yet other sites that use frames are copied w/o problem. Granted, there are myriad ways to employ frames and various methods to achieve layout, but as a user it's difficult to know which sites are going to come thru w/o problems.

jjinwi
06-17-2008, 04:57 PM
FYI... I captured both problem webpages perfectly with Web Research 3.0.

I really wish UR used the same approach as WR to capture webpages.

jdk
06-25-2008, 05:39 AM
Originally posted by jjinwi
FYI... I captured both problem webpages perfectly with Web Research 3.0.

I really wish UR used the same approach as WR to capture webpages.
This is the reason I hesitate to use UR as my main "data dump". In all other respects, I think it's the best program out there in its category, but its inability to capture all web pages and snippets with complete accuracy (and quickly) has been a stumbling block. As others have said in other threads, with UR you have to check every capture. With other programs you don't.

I occasionally stop by the forum to see if things have changed, but going by this thread, it seems not.

UR developers have said in this forum that the problem is caused by Firefox doing a poor job of copying the HTML to the clipboard. That may be the case, but other programs capture Firefox pages flawlessly. The Firefox add-on Scrapbook, mentioned above, is one example.

You mention Web Research. Yes, it does a great job. Why? It uses the saving engine in Scrapbook (if installed) to import. In effect I guess they're simply doing what Quant does in his post above.

Maybe UR could borrow this approach? I imagine it could be as simple as a script that says: temporarily copy page to Scrapbook; copy from Scrapbook to UR; delete from Scrapbook.

If it works for Web Research I can't see why it wouldn't work for UR.

ashwken
06-25-2008, 11:59 AM
Originally posted by jdk
Maybe UR could borrow this approach? I imagine it could be as simple as a script that says: temporarily copy page to Scrapbook; copy from Scrapbook to UR; delete from Scrapbook.
This is not a Firefox only problem, see my post above.

It may not be clear in that post, but I could not save the wiki page properly using IE; I had to resort to Save As .mht to get a true representation of the page. Unfortunately this method leaves you with only a local copy of the page, you lose the reference to the original URL, and it involves several more steps than simply sending to UR from the browser.

I don't fully understand why saving to .mht is substantially different from whatever UR is doing to Store a copy.

jdk
06-25-2008, 12:53 PM
Originally posted by ashwken
I don't fully understand why saving to .mht is substantially different from whatever UR is doing to Store a copy.
A good question -- hopefully we'll get a response from Kinook at some stage.

And I imagine Scrapbook is just doing the equivalent of mht in Firefox: File --> Save as --> complete page. Scrapbook handles page snippets perfectly as well.

Given the number of complex features Kinook have managed to add to UR over the years (to their great credit), I can't imagine why this relatively simple yet important issue has not been addressed. At the risk of repeating myself, if a free Firefox add-on can guarantee perfect web page captures, so can UR.

kinook
06-26-2008, 07:38 AM
We'll investigate these pages to see if the fidelity can be improved when importing into UR.

J-Mac
06-27-2008, 04:30 PM
Hello all!

Brand new UR user and this is my first post.

Besides being generally lost regarding UR just yet (I'll get there, though!), I just ran into problems capturing part of a web page, and it sounds like the very issue being discussed here.

I needed to capture an item on Dell's support/accessory pages, so I highlighted the area - including text and a picture - and clicked on the UR Firefox button in Firefox 3. What I got was a fairly nice copy but without the image. Instead there was the familiar red "X" in its place.

So I opened that same page in IE7 and tried the same thing - same area of the same URL, with image and text, using the IE UR button. The capture looked pretty close to the same thing as Firefox 3 captured, red "X" image placeholder included.

Just to check if this was caused by a problem with where Dell was serving up the image from, I made the same capture in both Evernote 2.2 and OneNote 2007. The clip looks perfect in both of those applications.

I do have "Download Images" selected in Options>Import (More).

I decided to look at the forum and post a question about it, but I searched first and found this thread.

So even as a "newbie" I can confirm that UR Pro is not capturing web page selections very well.

Thanks!

Jim

jdk
07-12-2008, 09:00 AM
After reading and participating in this thread, I decided to carry out a little test, comparing Ultra Recall to other programs that capture web pages.

I posted the results here:
http://www.donationcoder.com/Forums/bb/index.php?topic=14027.0

It's only one page, it's not scientific. However, it does underscore what I and others wrote in this thread.

There are programs out there that capture pages with nearly 100 per cent reliability (even difficult pages such as the one I chose for the test).

Therefore there's no reason why programs such as Ultra Recall should not achieve the same standard (and UR is not the only well-known program that has page capture problems as my test shows).

kinook
07-12-2008, 11:45 AM
We just released version 3.5a, which correctly captures formatting for these problem pages (and any others that define styles in a similar fashion):

http://news.bbc.co.uk/2/hi/africa/7501066.stm

http://en.wikipedia.org/wiki/The_Princess_Bride_%28film%29

http://cambridge.org/us/catalogue/catalogue.asp?isbn=9780521539371

quant
07-12-2008, 12:01 PM
Thank you!

jdk
07-12-2008, 12:44 PM
Excellent. That's great to see.

ashwken
07-12-2008, 02:11 PM
Just finished downloading, let's go fire up the browser!

EDIT: Also fixed the Title Expression quirks, thanks.

tfjern
07-12-2008, 08:53 PM
I installed the upgrade (3.5a), then dragged and dropped a Wikipedia webpage to test it out (to store, not to link), but there are still a few items on the original Wikipedia page that are missing when stored in UR (all in Wikipedia's lefthand column or frame). No big deal, but still ... Maybe this is an IE 7 issue, so I'll have to try it out in Firefox 3.

Whereas if I File / Import the Wikipedia webpage, it stores perfectly in UR. Magnifico!

Once again we can see that Kinook is exceptionally responsive regarding its customers' needs and suggestions, so let's return the favor and enter the contest to come up with some user-friendly, step-by-step demos that will help this great company attract new customers.

ashwken
07-12-2008, 10:47 PM
Running under v.3.5a, IE 6

For the Wiki Princess Bride page

From the browser, via the UR Copy Button, regardless of the setting for Options | More (Import) - use IE cache

The page layout is coming thru as the original, except that images in the body text area are not visible. The "place holders" for the images are present (text is flowing around them), and hyperlinks show up only as a cursor change, with the path shown in the UR status bar.

For example, the "frame" that contains the movie poster also contains a list of credits. The space this frame occupies is clearly apparent (white space w/ active URLs), but neither its contents nor the frame border show up in the UR viewer area.

If I use Ctrl-A (to select All in the IE browser) then import via the UR Copy Button, the page comes thru exactly as originally displayed.

tfjern
07-13-2008, 01:48 AM
Thanks, ashwken, that works well, though it would be nice to be able to avoid the Control + A step. OK, I know -- we are being spoiled by Kinook's constant attentiveness to our suggestions and whinings.

By the way, when in IE 7 and I do the following -- Control + A / Copy of a Wikipedia webpage to Ultra Recall (via the UR taskbar button), the webpage is imported into UR with no problem, as I mentioned above.

However, as a test, if you click on Item / Synchronize for the same page in UR, you will find that the stored page (doc) size is slightly reduced, even though the Wikipedia webpage itself was not updated. In other words, some of the original webpage characters are being dropped when synchronization takes place in UR. For example, the (toggle) word "hide" to the right of the word Contents (in box) is present in IE, but lost in UR after synchronization. This is no big deal, but interesting nonetheless.

One more thing -- when using the UR Copy to UR taskbar button in IE 7, the options in the Import to UR popup window are Link, Move (grayed out), and Copy. Strangely, it doesn't matter whether you select Link or Copy; the size of the file imported in UR is the same (since it is being copied, of course). Why, then, is the Link option available if there is no linking going on? I realize the other taskbar button, UR Link to UR, is for linking, but it would be less confusing if the Link/Move options weren't available when copying, and the other way around when linking.

kinook
07-13-2008, 07:12 AM
Originally posted by ashwken
Running under v.3.5a, IE 6

For the Wiki Princess Bride page

...

The page layout is coming thru as the original, except that images in the body text area are not visible
This appears to be an IE6 problem. UR does capture the images, and IE7 (and Firefox) displays them properly.

And with more and more web sites no longer supporting IE6, this pushed me over the edge to finally update to IE7 myself.

kinook
07-13-2008, 08:00 AM
Originally posted by tfjern
I installed the upgrade (3.5a), then dragged and dropped a Wikipedia webpage to test it out (to store and not to link), but there are still a few items on the original Wikipedia page that are missing when stored in UR (all in Wikipedia's lefthand column or frame).
Nothing is missing that I can see, testing with IE7 on Win XP SP3.

kinook
07-13-2008, 08:03 AM
Originally posted by tfjern
Thanks, ashwken, that works well, though it would be nice to be able to avoid the Control + A step. OK, I know -- we are being spoiled by Kinook's constant attentiveness to our suggestions and whinings.
All import methods for that page capture all the images I can see.

By the way, when in IE 7 and I do the following -- Control + A / Copy of a Wikipedia webpage to Ultra Recall (via the UR taskbar button), the webpage is imported into UR with no problem, as I mentioned above.

However, as a test, if you click on Item / Synchronize for the same page in UR, you will find that the stored page (doc) size is slightly reduced, even though the Wikipedia webpage itself was not updated. In other words, some of the original webpage characters are being dropped when synchronization takes place in UR.
The web content (HTML code) of the original web page (and what UR retrieves on sync) and what IE puts on the clipboard for Ctrl+A/copy are not identical.

For example, the (toggle) word "hide" to the right of the word Contents (in box) is present in IE, but lost in UR after synchronization.
In the original page, some JavaScript is used to conditionally display the text [show] or [hide] and to show/hide that section when clicked. When importing the original page, the script code to display and make that operational will be excluded unless 'Tools | Options | Import (More) | Download scripts' is checked (the default is unchecked, mainly for security purposes).

In the HTML code that IE copies to the clipboard when using Ctrl+A/copy (and processed by UR), the JavaScript that displays [show] or [hide] is replaced with an anchor tag. If the script option above is unchecked when imported this way, the text is captured (since it's no longer defined in script code), but it is not operational (since the script it invokes is not imported). If the import script option is checked when importing, it is operational, although the show / hide text is duplicated (because of the way IE tweaks the HTML).
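To picture the script option kinook describes, here is a minimal Python sketch of an import step that drops `<script>` elements unless a `download_scripts` flag is set. The class, function and flag names are hypothetical and this is not Ultra Recall's code; it only shows the general technique of filtering script elements with the standard-library `html.parser`.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML, dropping <script> elements (end-tag names come out lowercased)."""

    def __init__(self) -> None:
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out: list[str] = []
        self.script_depth = 0  # > 0 while inside <script>...</script>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_depth += 1
        elif not self.script_depth:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; a self-closing <script/> is simply skipped.
        if tag != "script" and not self.script_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.script_depth = max(0, self.script_depth - 1)
        elif not self.script_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.script_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.script_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.script_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.script_depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def import_html(html: str, download_scripts: bool = False) -> str:
    """Hypothetical import step: keep scripts only when download_scripts is True."""
    if download_scripts:
        return html
    stripper = ScriptStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.out)
```

Fed `<p>Contents <script>document.write("[hide]")</script></p>`, the sketch keeps only `<p>Contents </p>`, so script-generated text disappears; an anchor such as `<a href="#">[hide]</a>` passes through untouched but no longer does anything without its script, which matches the trade-off described above.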

One more thing -- when using the UR Copy to UR taskbar button in IE 7, the options in the Import to UR popup window are Link, Move (grayed out), and Copy. Strangely, it doesn't matter whether you select Link or Copy; the size of the file imported in UR is the same (since it is being copied, of course). Why, then, is the Link option available if there is no linking going on? I realize the other taskbar button, UR Link to UR, is for linking, but it would be less confusing if the Link/Move options weren't available when copying, and the other way around when linking.
Most of the time, linking is a valid option, but if part/all of the page content was selected before initiating the import, UR will prefer that and copy even if link was chosen on the popup dialog (or the toolbar).

ashwken
07-13-2008, 11:47 AM
Originally posted by kinook
This appears to be an IE6 problem. UR does capture the images, and IE7 (and Firefox) displays them properly.

And with more and more web sites no longer supporting IE6, this pushed me over the edge to finally update to IE7 myself.
I suspected that might be the case, thanks for the confirmation.

I'll take the plunge and upgrade to IE7.

J-Mac
07-13-2008, 11:06 PM
Thanks to all for the interesting thread. I learned a lot reading this.

And thanks to kinook for the pretty amazing responsiveness. Above and beyond!

Jim

J-Mac
07-18-2008, 12:45 PM
Sorry guys, but I am still getting either poor-looking page copies or the wrong copies.

E.g., I like to grab pages when I make a purchase or see my personal settings for a web site.

If I right-click>Copy to UR, without first using Ctrl-A, then UR apparently tries to refresh the page and I am left with a copy of the sign-in page, which is worthless to me.

If I use Ctrl-A to highlight all first, then I am still getting a copy with all the top and side menu items listed down the page first, or last, and the content is also listed tabularly - nothing like what the page looked like at all.

Either something is wrong with my UR installation, or what I am trying to do is not possible to do with UR.

Thanks!

Jim

kinook
07-18-2008, 04:47 PM
It sounds as though not all of the page's styles are being captured. Are you capturing from IE or Firefox, and which version? When you encounter a page with this problem, after select all/copy (Ctrl+A, Ctrl+C), download http://www.kinook.com/Download/HTMLClip.zip, extract it, and run HTMLClip.exe. This should create a file named %TEMP%\clipboard.htm. Literally type %TEMP% into Windows Explorer to navigate to that folder, then ZIP and send clipboard.htm to support@kinook.com for our analysis. Thanks.
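Background on what is being captured here: as discussed earlier in the thread, Ctrl+A/copy in the browser puts an HTML copy of the selection on the Windows clipboard. On Windows that data normally uses the documented CF_HTML ("HTML Format") layout: header lines such as Version, StartHTML, EndHTML, StartFragment, EndFragment and an optional SourceURL, whose numbers are byte offsets into the data, followed by the HTML itself. The Python sketch below only illustrates that format, assuming UTF-8 data; `make_cf_html` and `parse_cf_html` are hypothetical helpers, and nothing here describes what HTMLClip.exe itself does or writes.

```python
HEADER = (
    "Version:0.9\r\n"
    "StartHTML:{0:010d}\r\n"
    "EndHTML:{1:010d}\r\n"
    "StartFragment:{2:010d}\r\n"
    "EndFragment:{3:010d}\r\n"
    "SourceURL:{4}\r\n"
)


def make_cf_html(fragment: str, source_url: str) -> bytes:
    """Wrap an HTML fragment in a CF_HTML-style header (offsets are byte offsets)."""
    prefix = "<html><body><!--StartFragment-->"
    suffix = "<!--EndFragment--></body></html>"
    # Offsets are zero-padded to 10 digits, so the header length does not depend on them.
    header_len = len(HEADER.format(0, 0, 0, 0, source_url).encode("utf-8"))
    start_html = header_len
    start_frag = start_html + len(prefix.encode("utf-8"))
    end_frag = start_frag + len(fragment.encode("utf-8"))
    end_html = end_frag + len(suffix.encode("utf-8"))
    header = HEADER.format(start_html, end_html, start_frag, end_frag, source_url)
    return (header + prefix + fragment + suffix).encode("utf-8")


def parse_cf_html(raw: bytes) -> dict:
    """Read the Key:value header lines, then slice out the selected fragment."""
    header = {}
    for line in raw.splitlines():
        # The header ends where the HTML begins (or at a line that is not Key:value).
        if b":" not in line or line.startswith(b"<"):
            break
        key, _, value = line.partition(b":")
        header[key.decode("ascii")] = value.decode("utf-8")
    start = int(header["StartFragment"])
    end = int(header["EndFragment"])
    return {
        "source_url": header.get("SourceURL"),
        "fragment": raw[start:end].decode("utf-8"),
    }
```

Because the offsets count bytes rather than characters, a fragment with non-ASCII text such as `<b>Ölçü</b>` still round-trips through `make_cf_html` and `parse_cf_html` intact.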

J-Mac
07-18-2008, 11:02 PM
kinook,

I'm using Firefox 3.01. I'll try to do what you ask, but the instructions are not entirely clear. Please tell me where to find the file to zip rather than using the DOS-prompt command.

I do much better using plain old English instead.

Thanks!

Jim

$bill
07-18-2008, 11:40 PM
Originally posted by J-Mac
Please tell me where to find the file to zip rather than using the DOS-prompt command.

I do much better using plain old English instead.

Kinook didn't ask you to use the DOS-prompt command. Go to the address bar of Windows Explorer (or even IE), type %temp% and press Enter. That will take you to the proper folder.
A longer explanation would be that the file is created in the folder your system has set for temporary files. On XP I'd guess something like C:\Documents and Settings\your user name goes here\Local Settings\Temp, or on Vista perhaps C:\Users\your user name\AppData\Local\Temp.

J-Mac
07-19-2008, 12:27 AM
Hey Bill - I don't need longer explanations. I have a C:\Windows\Temp folder, a C:\Temp folder, the Temp folder in my User\LocalSettings folder, one in All Users\Temp; I have 4 internal hard drives and my system folders and files are not necessarily where yours are.

I just like a definitive location, as the %xxxx% locations on my machine are not always found in the same location as someone with a run-of-the-mill, standard Windows installation.

$bill
07-19-2008, 01:13 AM
Originally posted by J-Mac
as the %xxxx% locations on my machine are not always found in the same location as someone with a run-of-the-mill, standard Windows installation.

Of course your paths may not be standard, that is the purpose of the %temp% variable.

More explanation anyway:
Most likely kinook's program saved the file to the location found in your system's %temp% variable. I don't know where that location is, and kinook doesn't either, but your system does.

Did you try my suggestion? What happened?

Start | Run, then typing %temp%, will open Explorer to the folder.

At a DOS prompt, echo %temp% will tell you what the folder's path is.

How about just searching for the file...

###################
Edit>>> When I run HTMLClip.exe (on XP), it opens Windows Explorer to the folder containing the clipboard.htm file all by itself.
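$bill's point is that %TEMP% is just an environment variable, so the real folder differs from machine to machine, and the system always knows it. A minimal Python sketch, assuming only what kinook states (that HTMLClip.exe writes clipboard.htm into the folder %TEMP% points to), resolves that path instead of guessing; the function name `clipboard_dump_path` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def clipboard_dump_path(filename: str = "clipboard.htm") -> Path:
    """Return where a file named `filename` would sit in the %TEMP% folder."""
    # Read the same TEMP variable that %TEMP% expands to in Explorer or Start | Run.
    temp_dir = os.environ.get("TEMP")
    if not temp_dir:
        # Fall back to Python's own temp-folder lookup (it checks TMPDIR, TEMP, TMP).
        temp_dir = tempfile.gettempdir()
    return Path(temp_dir) / filename


if __name__ == "__main__":
    path = clipboard_dump_path()
    print(path)
    print("found" if path.exists() else "not found yet; run HTMLClip.exe after Ctrl+A, Ctrl+C")
```

On a machine with several Temp folders, like the one J-Mac describes, this picks whichever one the TEMP variable names for the current user, which is the same folder typing %temp% into Explorer opens.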