multi-threading web page import

igoldsmid · #1 02-17-2007, 12:21 AM

I have another tool that lets me drag and drop any number of web pages, import them, keyword them, and so on, without having to wait for one to finish and without locking up the browser while the download is happening.

Can Kinook consider providing this capability please?

Anyone else interested?

IJG

moku · #2 02-17-2007, 06:08 AM

This is a good topic and something that does immediately jump out at the new user.

I use Firefox 2.0 and have imported a number of web pages. The lag does get in the way of effective and efficient use of Firefox and UR.

Normally, drag-and-drop is not expected to be a system-locking heavy operation.

In my opinion, drag-and-drop from an outside application into UR should be a lightweight operation -- something that does a very shallow copy of what is dropped and then releases the source application / cursor.

If a user clicks on a just-dropped item, it would appear in the UI as a tab with a particular icon ("under construction"?) and perhaps the word "working..." in parentheses, something that makes it clear to the user what is happening.

Behind the scenes, worker threads in UR should process what was dropped. As they finish, they can (each) post update messages to the GUI which would in turn update the item header so that the tabs would read normally.
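To make the idea concrete, here is a minimal Python sketch of that flow. It is not how UR is built; `slow_import`, `run_demo` and the tab dictionary are hypothetical names used only for illustration. The drop handler records a placeholder and returns at once, worker threads do the slow part, and the GUI thread applies the update messages they post.

```python
import queue
import threading


def slow_import(url):
    # Hypothetical stand-in for the slow download, parse and index work.
    return "Imported: " + url


def worker(jobs, gui_messages):
    # Pull dropped items until a None sentinel arrives.
    while True:
        job = jobs.get()
        if job is None:
            break
        item_id, url = job
        title = slow_import(url)
        # Post a message instead of touching the GUI from this thread.
        gui_messages.put((item_id, title))


def run_demo(urls, n_workers=2):
    jobs = queue.Queue()
    gui_messages = queue.Queue()
    tabs = {}

    # The drop handler only creates a placeholder tab and enqueues the job.
    for item_id, url in enumerate(urls):
        tabs[item_id] = "(working...)"
        jobs.put((item_id, url))

    threads = [
        threading.Thread(target=worker, args=(jobs, gui_messages))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()

    # The GUI thread drains its message queue and fixes up the tab titles.
    while not gui_messages.empty():
        item_id, title = gui_messages.get()
        tabs[item_id] = title
    return tabs
```

In a real GUI the last loop would run from a timer or an event handler, so the interface stays responsive while the workers are still busy.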

If such a system were implemented, it would make UR and the source applications much more enjoyable to use. And it goes along with the whole "UR light and fast" credo.

quant · #3 02-17-2007, 07:13 AM

The lag is there because UR is doing indexing and other things when appending data (you can opt out of indexing imported websites; I suppose importing will then be faster) ...

It all comes from searching and sorting theory:

1. You either store everything just as it is (maybe at the end of the database), so appending is lightweight and very fast but searching is slow, because you need to go through the whole database to find all occurrences of the search string.

2. On the other hand, if you insert into whatever structure the underlying database uses in a way that keeps your data sorted (indexed, ...), then appending is a slower operation, but searching is lightning fast ...
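The two options can be sketched in a few lines of Python. This is only a toy illustration of the trade-off, not UR's actual storage; the class names and the whitespace word-splitting are assumptions made for the example.

```python
from collections import defaultdict


class AppendOnlyStore:
    """Option 1: cheap append, search scans every document."""

    def __init__(self):
        self.docs = []

    def add(self, text):
        self.docs.append(text)  # constant-time append

    def search(self, word):
        # Linear scan over the whole store.
        return [i for i, t in enumerate(self.docs) if word in t.lower().split()]


class IndexedStore:
    """Option 2: append also updates an index, search is a lookup."""

    def __init__(self):
        self.docs = []
        self.index = defaultdict(set)  # word -> ids of documents containing it

    def add(self, text):
        doc_id = len(self.docs)
        self.docs.append(text)
        # Extra work at insert time: one index update per distinct word.
        for word in set(text.lower().split()):
            self.index[word].add(doc_id)

    def search(self, word):
        # No scan needed; .get avoids creating empty index entries.
        return sorted(self.index.get(word, ()))
```

Both stores return the same hits; they differ only in where the cost is paid, at insert time or at search time.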

moku · #4 02-17-2007, 07:31 AM

Quote:

Originally posted by quant
The lag is there because UR is doing indexing and other things when appending data (you can opt out of indexing imported websites; I suppose importing will then be faster) ...

It all comes from searching and sorting theory:

1. You either store everything just as it is (maybe at the end of the database), so appending is lightweight and very fast but searching is slow, because you need to go through the whole database to find all occurrences of the search string.

2. On the other hand, if you insert into whatever structure the underlying database uses in a way that keeps your data sorted (indexed, ...), then appending is a slower operation, but searching is lightning fast ...

The point is not "why there is lag". I have asked about the slowness via email and gotten the technical reasons why a page import takes so long even on a very high performance system.

The point is "there is no reason why UR cannot do this work in the background instead of locking up two applications and making the user wait, sometimes for a long time."

My note was a simple suggestion on a different way to think about the issue. In this day and age of quad-core processors, a single-threaded blocking approach that stops the user's workflow dead cold does not make any sense at all.

At the database level (SQLite), there is nothing that needs to be done in the import manager until the import takes place. All that is needed in the UI is a placeholder item. When the import manager gets to the particular import job associated with that item, then the appropriate database entries can be made, provided the import succeeds. If there is an error, the placeholder item can have its status updated with an error message and then the user can retry the import manually (which would put the import at the start of the queue) using some sort of "Try again" mechanism. A smarter queue mechanism would update the placeholder item and then retry the import after an appropriate delay.
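A rough Python sketch of such a queue follows. The `ImportManager` class, its method names and the status strings are all hypothetical; the point is only to show placeholders, error status, and a manual "Try again" that jumps to the start of the queue.

```python
from collections import deque


class ImportManager:
    def __init__(self, fetch):
        self.fetch = fetch    # callable(url) -> content; may raise on failure
        self.queue = deque()  # ids of placeholder items waiting for import
        self.items = {}       # item_id -> {"url", "status", "detail"}
        self.next_id = 0

    def drop(self, url):
        # Called on drag-and-drop: create the placeholder and return at once.
        item_id = self.next_id
        self.next_id += 1
        self.items[item_id] = {"url": url, "status": "pending", "detail": None}
        self.queue.append(item_id)
        return item_id

    def retry(self, item_id):
        # Manual "Try again": a failed job goes to the start of the queue.
        item = self.items[item_id]
        if item["status"] == "error":
            item["status"] = "pending"
            self.queue.appendleft(item_id)

    def run_next(self):
        # Process one job; a real manager would do this on a background thread.
        if not self.queue:
            return None
        item_id = self.queue.popleft()
        item = self.items[item_id]
        try:
            item["detail"] = self.fetch(item["url"])
            item["status"] = "done"   # only now would database entries be made
        except Exception as exc:
            item["detail"] = str(exc)  # shown on the placeholder item
            item["status"] = "error"
        return item_id
```

The automatic variant mentioned above would re-enqueue a failed job itself after a delay instead of waiting for the user to call `retry`.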

I hope a future version of UR will be more intelligent about how to do long-running operations in the background. It takes a little more development work to set up the infrastructure to handle background jobs, but there is a very large payoff. All sorts of long-running tasks can be pushed into the background, and the user can keep working instead of hitting a complete application lockup. And that is the ultimate goal: the user must be able to maintain an efficient and timely workflow with the application.

quant · #5 02-17-2007, 07:52 AM

ok, I see now what you meant ... I agree

Multi-threading would be nice, but it would probably make it much more difficult to maintain database consistency. I don't know how the database that UR uses works. That's a question to ask them.

(But to be honest, the database I have and use is used much more to recall data than to append.)

moku · #6 02-17-2007, 08:09 AM

Quote:

Originally posted by quant
ok, I see now what you meant ... I agree

Multi-threading would be nice, but it would probably make it much more difficult to maintain database consistency. I don't know how the database that UR uses works. That's a question to ask them.

(But to be honest, the database I have and use is used much more to recall data than to append.)

I am not aware of how UR uses SQLite internally.

What I do know:

-- SQLite, per database, can only handle one writer at a time.

http://www.sqlite.org/lockingv3.html

SQLite v3 also provides some enhancements to help prevent what is called "writer starvation":

5.1 Writer starvation

In SQLite version 2, if many processes are reading from the database, it might be the case that there is never a time when there are no active readers. And if there is always at least one read lock on the database, no process would ever be able to make changes to the database because it would be impossible to acquire a write lock. This situation is called writer starvation.

SQLite version 3 seeks to avoid writer starvation through the use of the PENDING lock. The PENDING lock allows existing readers to continue but prevents new readers from connecting to the database. So when a process wants to write a busy database, it can set a PENDING lock which will prevent new readers from coming in. Assuming existing readers do eventually complete, all SHARED locks will eventually clear and the writer will be given a chance to make its changes.

-- Each UR database corresponds to one SQLite database.

Within the context of a single UR database, an import manager would allow sequential imports, each in turn locking the database to perform a write.
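The one-writer rule is easy to see from Python's built-in `sqlite3` module. This is a small demonstration under assumptions, not UR code: the `items` table and the function name are made up, and the connections use `timeout=0` so that a blocked writer fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
        first = sqlite3.connect(path, timeout=0, isolation_level=None)
        second = sqlite3.connect(path, timeout=0, isolation_level=None)
        first.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")

        # The first connection takes the write lock and holds it.
        first.execute("BEGIN IMMEDIATE")
        first.execute("INSERT INTO items (title) VALUES ('page 1')")

        # A second writer cannot start while the first is still writing.
        try:
            second.execute("BEGIN IMMEDIATE")
            second.execute("ROLLBACK")
            blocked = False
        except sqlite3.OperationalError:  # "database is locked"
            blocked = True

        # Once the first writer commits, the second one can go ahead.
        first.execute("COMMIT")
        second.execute("BEGIN IMMEDIATE")
        second.execute("INSERT INTO items (title) VALUES ('page 2')")
        second.execute("COMMIT")
        count = second.execute("SELECT COUNT(*) FROM items").fetchone()[0]

        first.close()
        second.close()
        return blocked, count
```

With a non-zero `timeout`, the second connection would wait for the lock rather than fail, which is closer to how sequential imports would take turns.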

A background import/job manager would have to intelligently wait for its turn to do its work. This is because a user might be doing something in the UI that updates the database. This is not as big a problem as it sounds, as import updates only update existing UI placeholder items. The import manager needs to do some basic validity checking (i.e. is the placeholder item still present and sane?) before running.
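A sketch of that validity check, again in Python with `sqlite3`. The `items` table, its `status` and `body` columns, and both function names are assumptions for the example; I do not know UR's real schema.

```python
import sqlite3


def open_db(path=":memory:", timeout=5.0):
    # timeout is how long a writer waits for the lock before giving up.
    conn = sqlite3.connect(path, timeout=timeout, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(id INTEGER PRIMARY KEY, status TEXT, body TEXT)"
    )
    return conn


def finish_import(conn, item_id, body):
    """Store an import result only if its placeholder is still pending."""
    # Take the write lock first, so the check and the update are atomic.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT status FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        # The user may have deleted or changed the placeholder meanwhile.
        ok = row is not None and row[0] == "pending"
        if ok:
            conn.execute(
                "UPDATE items SET status = 'done', body = ? WHERE id = ?",
                (body, item_id),
            )
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")
    return ok
```

The download and parsing would happen before `finish_import` is called, so the write lock is held only for the short check and update.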

Of course any expensive operations should be done ahead of time so that SQLite is as efficient as possible.

One day, hopefully soon, SQLite may have row locking. And then the additional complexity of doing many background operations will be minimized. Until that day, it takes more thought due to SQLite's locking architecture.