
Importing is 1000x slower with the beta 16 version...

PostPosted: Mon Oct 07, 2013 11:27 pm
by SnakeByte
Here's a short video (no sound) showing system resources during an import, during a search, and while running a benchmark test on the drive I've been using to store the Newsbin DB, to show just how inefficient the current beta is:

https://www.youtube.com/watch?v=-sQ3ryKCZ1I

Be sure to watch it in HD.

Has anyone else benchmarked the beta yet? I'm trying to determine whether this is unique to my system or a problem with the software itself. The 10,000+ file import queue was the result of running the app for a day, during which it simply downloaded new headers every hour.

Re: Importing is 1000x slower with the beta 16 version...

PostPosted: Tue Oct 08, 2013 11:16 am
by Quade
I posted something, then thought some more. Your fundamental problem is that you're trying to track 2,400 groups on a modestly powered PC. Then, once you get behind, the processing overhead of having 10,000 import files kicks in. I'm going to think about it, but my gut feeling is that you're using Newsbin beyond its design goals.

Re: Importing is 1000x slower with the beta 16 version...

PostPosted: Tue Oct 08, 2013 12:25 pm
by Quade
If you run ProcMon, you'll see it's spending most of its time writing to the DB3 files.
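
To give a sense of why those writes dominate, here's a rough, generic sketch. It assumes the .db3 files are plain SQLite databases and uses a made-up schema; it's an illustration, not Newsbin's actual code. Committing row by row forces a sync to disk for every row, while batching the same rows into one transaction is dramatically faster:

Code:
# Generic sketch: why committing to an SQLite (.db3) file per row is far slower
# than batching the same rows into one transaction. Schema and data are made up.
import os
import sqlite3
import time

rows = [("subject %d" % i, "poster@example.com", 1000 + i) for i in range(5000)]

def load(db_path, batched):
    if os.path.exists(db_path):
        os.remove(db_path)  # start from an empty file each run
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE headers (subject TEXT, poster TEXT, article_id INTEGER)")
    start = time.time()
    if batched:
        with con:  # one transaction (one sync) for the whole batch
            con.executemany("INSERT INTO headers VALUES (?, ?, ?)", rows)
    else:
        for r in rows:  # one transaction (and one sync) per row
            con.execute("INSERT INTO headers VALUES (?, ?, ?)", r)
            con.commit()
    elapsed = time.time() - start
    con.close()
    return elapsed

print("per-row commits:    %.2f s" % load("slow.db3", batched=False))
print("single transaction: %.2f s" % load("fast.db3", batched=True))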

Re: Importing is 1000x slower with the beta 16 version...

PostPosted: Tue Oct 08, 2013 12:56 pm
by SnakeByte
Quade wrote:I posted something, then thought some more. Your fundamental problem is that you're trying to track 2,400 groups on a modestly powered PC. Then, once you get behind, the processing overhead of having 10,000 import files kicks in. I'm going to think about it, but my gut feeling is that you're using Newsbin beyond its design goals.


Quade,

If my computer was underpowered, the metrics shown in the video would clearly show 100% usage SOMEWHERE. You still have not addressed this point. How can a computer be underpowered when not a single piece of its hardware is being fully utilized?

The fact that a search from this very same app uses more system resources than the import proves that my PC has extra resources waiting to be tapped, and the SSD benchmark utility shows there are resources available even beyond what your search uses.

I've written software that crunches huge amounts of data... certainly more than a mere 8 GB of gz files (which is what that backlog adds up to), and I guarantee that, when appropriate, it uses all of my CPUs and all of my hard drive's resources while running. I /guarantee/ it doesn't take HOURS to process a mere 8 GB of compressed files.

I should mention that reverting to 6.42 b2148 brings the system back to "normal" again; it handles the 2,400 groups just fine. (I say "normal" because I know even that version was written inefficiently, as was mentioned in a previous post.) That STRONGLY suggests the beta needs more work.

Note that ProcMon doesn't profile an app; it won't tell you whether it's just spinning its wheels somewhere waiting for a thread lock to release. Have you profiled the app? http://stackoverflow.com/questions/67554/whats-the-best-free-c-profiler-for-windows-if-there-are
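
Here's a toy example of the kind of stall ProcMon can't see (plain Python, nothing to do with Newsbin's internals): a thread blocked on a lock burns no CPU and does no I/O, so every resource graph stays flat even though no work is getting done.

Code:
# Toy example: a worker blocked on a lock uses ~0% CPU and does no disk I/O,
# so resource monitors look idle even though the work queue isn't moving.
import threading
import time

lock = threading.Lock()

def slow_holder():
    with lock:          # grabs the lock and sits on it
        time.sleep(10)

def worker():
    with lock:          # blocks ~10 s, consuming nothing measurable
        pass

threading.Thread(target=slow_holder).start()
time.sleep(0.1)         # make sure the holder gets the lock first
start = time.time()
w = threading.Thread(target=worker)
w.start()
w.join()
print("worker waited %.1f s while every resource graph stayed flat" % (time.time() - start))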

Re: Importing is 1000x slower with the beta 16 version...

PostPosted: Tue Oct 08, 2013 12:58 pm
by dexter
SnakeByte wrote:If my computer was underpowered, the metrics shown in the video would clearly show 100% usage SOMEWHERE.


It won't go to 100% CPU by design; importing is a low-priority task that processes data in the background without bringing your machine to its knees for no apparent reason.
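
In practice, "low priority" just means the usual OS mechanisms, roughly like this (a generic sketch of the common approach, not Newsbin's actual code):

Code:
# Generic sketch of running background work at low priority so it yields to
# foreground applications. This is the common approach, not Newsbin's code.
import os
import psutil  # third-party: pip install psutil

proc = psutil.Process(os.getpid())
if psutil.WINDOWS:
    proc.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)  # or IDLE_PRIORITY_CLASS
else:
    proc.nice(10)  # POSIX niceness: higher number = lower priority

def process_import_backlog(files, import_one):
    # The OS scheduler now gives other processes the CPU first; the backlog
    # still drains, just without pegging the machine.
    for f in files:
        import_one(f)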

Re: Importing is 1000x slower with the beta 16 version...

PostPosted: Tue Oct 08, 2013 1:09 pm
by SnakeByte
dexter wrote:
SnakeByte wrote:If my computer was underpowered, the metrics shown in the video would clearly show 100% usage SOMEWHERE.


It won't go to 100% CPU by design; importing is a low-priority task that processes data in the background without bringing your machine to its knees for no apparent reason.


Ah, so we've gone from my system being underpowered to something that was put in by design. I'm glad I didn't run out and purchase a new PC!

How is this throttling accomplished? As my video shows, it certainly isn't looking at disk queue depth or CPU utilization.

I noticed that under Options there is a "Performance" section. Maybe the solution to this problem is simply to give users the ability to disable this throttling? (A rough sketch of what I mean follows below.)

The only question left to answer is why the throttling is orders of magnitude worse in the beta. Surely new PCs are getting faster, which would warrant adjusting the throttle in the opposite direction?
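
Something along these lines is what I have in mind: a throttle that only kicks in when the machine is actually busy, plus a switch to turn it off entirely. This is purely a hypothetical sketch; the option name and threshold are made up:

Code:
# Hypothetical sketch of a user-controllable import throttle. The option name
# and threshold are made up; this is not how Newsbin actually works.
import time
import psutil  # third-party: pip install psutil

THROTTLE_IMPORT = True     # imagined Options > Performance checkbox
BUSY_CPU_PERCENT = 85.0    # only back off when the rest of the system is busy

def import_files(files, import_one):
    for f in files:
        import_one(f)
        if THROTTLE_IMPORT and psutil.cpu_percent(interval=0.1) > BUSY_CPU_PERCENT:
            time.sleep(0.5)  # yield to whatever else needs the machine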

Re: Importing is 1000x slower with the beta 16 version...

PostPosted: Tue Oct 08, 2013 1:19 pm
by Quade
You're using it well beyond even what I use it for, so basically I'm guessing. Looking at ProcMon, it appears that most of the time my machine spends importing goes to writing the DB3 files.

Re: Importing is 1000x slower with the beta 16 version...

PostPosted: Tue Oct 08, 2013 4:06 pm
by Quade
Going to try some things in B17. See if it improves things for you.

Re: Importing is 1000x slower with the beta 16 version...

PostPosted: Tue Oct 08, 2013 6:48 pm
by SnakeByte
Quade wrote:Going to try some things in B17. See if it improves things for you.


Thanks Quade. If you need to remotely debug, I'm game.

I'm tempted to write a Perl script that mimics what the import function is doing, just to see whether this is a threading issue. I'm still amazed that a modern computer can't import downloaded headers on the fly, since I'd think the worst bottleneck on most people's systems is their internet connection, which should be much slower than the write speed of a hard drive. Think about it: currently this app is downloading, compressing, and saving to disk, then later reading from disk, decompressing, and importing into DB3. That seems like a lot of extra steps. I'd almost prefer to slow (or pause) the header downloads to keep pace with the writes to the DB3 files and ditch the "Import" folder concept completely. Either way the result is the same: we have to wait for the import to complete before we can use the data.
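
This is the kind of mock-up I mean, just to get a baseline for how fast the raw work should be. It's Python rather than Perl, and the import path and the tab-separated header format are guesses, not Newsbin's real spool format:

Code:
# Rough mock of the import step: read gzipped header files from the Import
# folder and bulk-insert them into an SQLite .db3 file, timing the whole run.
# The folder path and the tab-separated line format are guesses.
import glob
import gzip
import sqlite3
import time

def import_spool(import_dir, db_path):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS headers "
                "(article_id INTEGER, subject TEXT, poster TEXT, bytes INTEGER)")
    start, count = time.time(), 0
    for path in glob.glob(import_dir + "/*.gz"):
        rows = []
        with gzip.open(path, "rt", errors="replace") as fh:
            for line in fh:
                parts = line.rstrip("\n").split("\t")
                if len(parts) < 4:
                    continue
                try:
                    rows.append((int(parts[0]), parts[1], parts[2], int(parts[3])))
                except ValueError:
                    continue
        with con:  # one transaction per file keeps the sync count low
            con.executemany("INSERT INTO headers VALUES (?, ?, ?, ?)", rows)
        count += len(rows)
    print("imported %d headers in %.1f seconds" % (count, time.time() - start))
    con.close()

import_spool(r"C:\Newsbin\Import", "benchmark.db3")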

One other thing... I've learned the hard way that SQL S U C K S for efficient text searches (and that's with REAL SQL servers with full-text indexes). I don't know if this fits with what you want to do with Newsbin, but give Sphinx a look if you can:
http://sphinxsearch.com/about/sphinx/

I used it as an alternative to MySQL full-text search on some very large DBs and it BLEW THE DOORS off MySQL. The only potential downside is that it only supports boolean searches, not regex. Then again, I'd think most users only do (or know of) boolean searches anyway, so maybe that's not a drawback? Either way, it's crazy fast and highly scalable (I think Craigslist uses it, among others). It accepts SQL-style queries too, so it may be a simple drop-in replacement for some of your code. Even if you don't use it in the Newsbin executable, it might help with the "internet search" service you're also selling.
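
For what it's worth, querying Sphinx from a script is about as simple as it gets once searchd is running: it speaks the MySQL wire protocol (SphinxQL, typically on port 9306), so a stock MySQL client library can usually talk to it. The index name and search terms below are made up for illustration:

Code:
# Minimal SphinxQL example: searchd exposes the MySQL wire protocol (typically
# port 9306), so a stock MySQL client library can query it. The index name
# "headers_idx" and the search terms are made up for illustration.
import pymysql  # third-party: pip install pymysql

con = pymysql.connect(host="127.0.0.1", port=9306, user="")
try:
    with con.cursor() as cur:
        # Boolean full-text match: implicit AND between terms, "-" excludes
        cur.execute("SELECT * FROM headers_idx WHERE MATCH(%s) LIMIT 50",
                    ("linux iso -beta",))
        for row in cur.fetchall():
            print(row)
finally:
    con.close()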

Re: Importing is 1000x slower with the beta 16 version...

PostPosted: Tue Oct 08, 2013 11:04 pm
by dexter
Yeah, we've been using Sphinx for over four years now. Do a search from the Search tab within Newsbin with it set to "Internet" mode and you'll be using it. Installing it on an end user's PC is non-trivial, though, so we decided it wasn't a viable option for Newsbin itself.