Import folder not importing

Technical support and discussion of Newsbin Version 6 series.

Postby rwayneross » Thu Jul 03, 2014 10:55 am

I seem to be having a problem since last week with the import folder. I now have 438 files waiting to be imported and the number keeps growing! The cache at the bottom of NB shows 200/200 (438). I see a file disappear from the folder once in a while - maybe a couple a day, but I still have files from over a week ago (6/25/14) that have not imported. Also, I leave NB open 24 hours a day so there should have been plenty of time for the imports. v6.51, Build 3162 Registered. Is there a way to speed this up? It used to keep up.
rwayneross
Occasional Contributor
Posts: 14
Joined: Wed Nov 03, 2010 1:24 pm

Registered Newsbin User since: 05/10/02

Re: Import folder not importing

Postby itimpi » Thu Jul 03, 2014 11:04 am

That sounds like strange behaviour!

Have you tried rebooting the system (the classic first recommendation when strange behaviour is encountered :) )?
The Newsbin Online documentation
The Usenettools for tutorials, useful information and links
itimpi
Elite NewsBin User
Posts: 12607
Joined: Sat Mar 16, 2002 7:11 am
Location: UK

Registered Newsbin User since: 03/28/03

Re: Import folder not importing

Postby rwayneross » Thu Jul 03, 2014 11:21 am

Yes, I have rebooted, re-started NB, even re-installed.

Re: Import folder not importing

Postby itimpi » Thu Jul 03, 2014 11:24 am

The next suggestion would be to delete the Downloads.db3 file from the Newsbin DATA folder (with Newsbin closed) as this is what holds the Download list (in case it has got corrupted) and let Newsbin create a new file when it next starts. Note that doing so will lose any existing contents of the Download list and Failed Files list.

Re: Import folder not importing

Postby rwayneross » Thu Jul 03, 2014 11:34 am

No change. The file was re-created, but imports are still not happening.

I read in another thread about the priority that imports get, e.g. that whatever you are currently browsing gets imported first. In one set of groups, which I browse all the time, the files have been imported. In others, they go back over a week without importing.

Re: Import folder not importing

Postby itimpi » Thu Jul 03, 2014 11:37 am

Oops, sorry - I was thinking about importing to the Download list - not importing headers.

The only thing I can think of is to see whether the files are accumulating for a particular group. You could then try resetting the database for that particular group.

Quade might have some suggestions when he spots this thread.

Re: Import folder not importing

Postby rwayneross » Thu Jul 03, 2014 11:42 am

Actually it's dozens of groups, not just a few. It's only a couple that are up-to-date... most are still waiting on the import cache.

This used to work fine. I installed Beta4 around that old date but then went back to 6.51 Released when this started. Maybe that had an effect.

Re: Import folder not importing

Postby Quade » Thu Jul 03, 2014 12:10 pm

It's like import has slowed way down. I wonder if something in the groups, some spam postings say, is harder to process? I can still keep up, but it does feel a little slower now.

What's your storage age set to?
Quade
Eternal n00b
Posts: 44951
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Re: Import folder not importing

Postby rwayneross » Thu Jul 03, 2014 12:13 pm

2000 days

Re: Import folder not importing

Postby Quade » Thu Jul 03, 2014 12:45 pm

You could try moving all the GZ's out of the import folder, then feeding some groups in one at a time and see which ones are slow (sort of a hassle, I know).

What kind of disk is the data folder on?

Re: Import folder not importing

Postby rwayneross » Thu Jul 03, 2014 12:54 pm

It's an SSD Raid 5 - really fast.

I note that I have only seen one file import today.

I may have a bigger problem than I thought. I was browsing a group, expanded a post that was 3.4Gb and it changed to 4Mb with 2 tiny child records. May be database corruption in general.

I'm going to do a few tests - disable all but one group, purge headers and re-download them. Then try an update all.

I'll get back to you.

(Thanks for all your help as usual Quade!)

Re: Import folder not importing

Postby rwayneross » Thu Jul 03, 2014 3:52 pm

I was able to resolve the problem.

I created a work folder and dragged all the cached .gz files from Import to the work folder.
One file was in use, so I quit NB and deleted that file, which was for a.b.warez, a group I never follow any more.
I deleted the rest of the files for that newsgroup and then copied the remaining files back into Import from the work folder.
I restarted NB, deleted Warez so no more headers would download... and all 498 files (more had been added since my earlier messages) imported within 10 minutes!

It must have been a corrupted file or just the crap that is in warez. Either way I found the bottle-necking file, deleted it and it was fixed!
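
For anyone who wants to script the same cleanup, here is a rough Python sketch. It is not part of Newsbin: the function name `quarantine_group` and the folder paths are made up for illustration, it assumes NB is closed while it runs, and it assumes the cached files start with the group name followed by a hyphen (the naming described later in this thread). It also moves the suspect files aside instead of deleting them, so they can be put back if the group turns out not to be the culprit.

```python
import shutil
from pathlib import Path


def quarantine_group(import_dir, work_dir, group):
    """Move every cached .gz file for one newsgroup out of the import folder.

    Returns the names of the files that were moved, in sorted order.
    """
    import_dir, work_dir = Path(import_dir), Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # Assumption: cached header files are named "<group>-<server>-<range>.gz".
    prefix = group + "-"
    for gz in sorted(import_dir.glob("*.gz")):
        if gz.name.startswith(prefix):
            shutil.move(str(gz), str(work_dir / gz.name))
            moved.append(gz.name)
    return moved
```

Calling `quarantine_group(r"C:\path\to\Import", r"C:\path\to\work", "alt.binaries.putz")` (placeholder paths and group) would leave every other group's files in Import for NB to pick up on the next start.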

Re: Import folder not importing

Postby Quade » Thu Jul 03, 2014 4:52 pm

Nice. Now that's how you trouble-shoot!

Re: Import folder not importing

Postby rassafratz » Mon Aug 04, 2014 10:53 am

Okay, this thread is a month old, but hey...I have (had) a similar problem with failing to import, and it now seems to me to be directly tied into the "high cpu usage" problem outlined in this thread:

viewtopic.php?f=44&t=32356

Here's how it all went down...

I was running 6.42 and had a problem with it not finishing file saves. To this Quade suggested I upgrade to 6.53, which I did, and that effectively made my initial problem go away. At this point, I thought all was fine...until later on when I did a "local" search for a set of files that I KNEW DAMNED WELL were there in one of my subscribed groups just a couple of days ago, but my search came up empty. This is my "WTF" moment that has been driving me crazy since.

Now hold on, because my description here will be really tedious.

The first thing I did was to search for the files manually, group by group, and I was shocked to find that almost ALL my groups were EMPTY. When I tried to update the groups, nothing appeared in the download window indicating it was doing anything. In one of the few groups that had a few paltry hundred files instead of the expected thousands, I tried downloading a small file with the same result: The system just sat there and stared at me.

Now 6.53 has a new feature when you view the "servers" tab, in that it shows all the threads for all the servers and whether they are connected or not. Mine were all disconnected.

This is when I brought up my task manager to see if anything else was going on that might be interfering with communications - and this is when I discovered the OVER 50% CPU usage, and paging going on to the tune of 800MB!!!

And yet there was NOTHING visible going on with NB. Clearly there was something seriously wrong here.

So I started sniffing around the forums here, and first I found the above mentioned "high CPU" thread. There were clues, but not enough for me to work with.

I also found a thread about having to manually import files from earlier versions into 6.5, but the described option wasn't in my "post storage" selection. Now it's interesting to note that AT THIS POINT IN TIME, there was no spool_v6 directory, and there WERE group-named subdirectories in spool_v1, each of which had database files in them.

Ya with me so far? Because now is when it gets interesting...and kinda weird...

So after more sniffing around, I finally found this thread, which sorta-kinda fits the problem I'm having. When I saw Wayne's "resolution", I couldn't quite follow what he was talking about, so I went looking for an "import" folder on my system.

And I found it. And it was loaded with the gz files he mentioned.

And oh yeah - the weird part - now a spool_v6 folder was there, and all the directories were in it and the spool_v1 directory was empty. Looking through the spool_v6 directories, I noticed some of the database files were downright dinky and others were relatively large. As it turns out, the groups with "dinky" databases showed no files when I looked at them, while there were files in the groups that had larger databases.

So I took wayne's experiment a step further.

First I shut down NB, and then I renamed my "import" directory to "junk" and created an empty "Import" directory.

Next I renamed "spool_v6" to "xxxspoolv6" hoping that NB wouldn't find it (I first tried zipping it up, but it was 13 gigs, so I tried it this way instead).

Then I loaded NB again and looked at my task manager. NB was being quiet as a mouse.

And then I took ONE of the gz files and moved it into the empty import directory.

Poof! CPU activity went to 60% and my page file went to 650mb. This lasted for about 2 minutes, after which the gz file disappeared, a directory spool_v6 was created, and a group directory was created inside that.
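
The one-file-at-a-time experiment can be timed with a small script instead of a stopwatch. This is only a sketch under assumptions: `feed_and_time` is a made-up helper, the paths are whatever your own folders are, and it relies on the behaviour described above, namely that the gz file disappears from the import folder once NB has finished with it.

```python
import shutil
import time
from pathlib import Path


def feed_and_time(gz_path, import_dir, poll_seconds=5.0, timeout_seconds=3600.0):
    """Move one .gz file into the import folder and wait for it to vanish.

    Returns the elapsed seconds, or None if the file is still there at timeout.
    """
    target = Path(import_dir) / Path(gz_path).name
    shutil.move(str(gz_path), str(target))
    start = time.monotonic()
    while time.monotonic() - start < timeout_seconds:
        if not target.exists():
            # The file disappearing is the only visible sign the import finished.
            return time.monotonic() - start
        time.sleep(poll_seconds)
    return None
```

Running it once per file from the renamed "junk" folder gives a per-file time that can be compared against file size and group.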

This happened much the same way for other gz files, but not predictably.
The gz files are named with the following convention:
groupname-servername-message#range.gz
So I might get multiple gz files for the same group, named something like:
alt.binaries.putz-astraweb-23456-123456.gz
alt.binaries.putz-astraweb-123456-223456.gz
alt.binaries.putz-astraweb-223456-323456.gz
alt.binaries.putz-usenet-news-7654-17654.gz
alt.binaries.putz-usenet-news-17654-27654.gz

Although most weren't quite that clean and had gaps, you get the gist.
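
If you want to sort or group these files by script, the naming convention above can be picked apart roughly like this. It's a sketch, not anything from NB itself: the names `ImportFile` and `parse_import_name` are mine, and it assumes the group name contains no hyphen (the server name may, as in "usenet-news") and that the range is always two numbers.

```python
import re
from typing import NamedTuple, Optional


class ImportFile(NamedTuple):
    group: str
    server: str
    first: int
    last: int


# Assumed layout: <group>-<server>-<first article #>-<last article #>.gz
_NAME = re.compile(
    r"^(?P<group>[^-]+)-(?P<server>.+)-(?P<first>\d+)-(?P<last>\d+)\.gz$"
)


def parse_import_name(name: str) -> Optional[ImportFile]:
    """Split an import file name into its parts, or return None if it doesn't fit."""
    m = _NAME.match(name)
    if m is None:
        return None
    return ImportFile(m["group"], m["server"], int(m["first"]), int(m["last"]))
```

For example, `parse_import_name("alt.binaries.putz-usenet-news-17654-27654.gz")` gives group `alt.binaries.putz`, server `usenet-news`, and the range 17654 to 27654.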

Now these gz files range in size from 2 or 3mb up to 60mb. Frustratingly, NB's processing times for these files seem to have little to do with their size.

While many of the files process in under 10 minutes, there are some that get ridiculous.

I have one group that had 3 gz files, all were 20-30mb in length (these 3 are all from usenet-news. I have 2 more files for the same group from astraweb, and they are 31mb and 2*KB*, but after the last brutal hour, maybe I'll just skip them and have the group updated directly from the server).
The first one took about 10 minutes to process, the second one took about 2 minutes, and the 3rd one is still processing after an HOUR.
And it doesn't appear to be "hung", since the "Storage.db3" and "StorageData.db3" files are still incrementing. They are currently up to 205,620KB and 181,928KB respectively, and gain about a meg every 2-3 minutes.

The point here is that this may well explain both the "high CPU usage" AND the apparent (but not real) failure to import previous folders.

I have over 50 groups subscribed, and probably 80% of them had well in excess of 100K headers under 6.42, and a couple had over a million.

Can you imagine NB actually trying to convert such a massive database? Why, it could sit there and stare at you like an idiot while it burned up massive CPU cycles while maxing out your pagefile. Moreover, it might just sit there and ignore you if you tried to download anything since it has better things to do...for the next week and a half...

You might say "we are NB, and your computer are belong to us!"

But seriously, I'm wondering if the particular newsgroup I'm currently importing wouldn't have been faster if I had simply deleted the group entirely, recreated it and "download(ed) all headers", even from all 4 of my servers?

I just checked the time difference from this import from the previous import: This import has now been going on for 3 hrs 15mins. This is becoming insufferable.

So that's it then. I'm done, and I'm tired. I only hope that this tome may be of some help to someone. :-)
rassafratz
Occasional Contributor
Posts: 31
Joined: Sat Nov 02, 2013 6:52 am

Registered Newsbin User since: 10/29/13

Re: Import folder not importing

Postby Quade » Mon Aug 04, 2014 11:02 am

I summarize this as "importing GZ files takes a very long time on my pc".

What kind of PC do you have? How much RAM and what kind of disks. I sort of smell "AMD" in all of this.

Re: Import folder not importing

Postby rassafratz » Mon Aug 04, 2014 11:24 am

Quade wrote:I summarize this as "importing GZ files takes a very long time on my pc"


Yes, and I think it's not an uncommon problem, and I believe that it is SO excruciatingly slow (especially on large databases such as I described) that at first glance it looks like it's failing to import completely. The problem is that there's no indication of any kind that it's slowly converting the databases except, as Wayne mentioned in his initial post, gz files sometimes disappearing.

I sort of smell "AMD" in all of this.


Nope. Not on MY homebrews! :-)

It's getting long in the tooth, admittedly, but it's still a good machine.

It's an intel e8400 dual core 3ghz mounted on a gigabyte ep45-ud3p mobo, a paltry but fast 2 gigs of ram, and just to irritate people, I'm still running XP pro. :-)

I also have 3 drives: 250gb ide, 300gb sata2, 3tb sata2

Re: Import folder not importing

Postby Quade » Mon Aug 04, 2014 1:42 pm

10 minutes per GZ is damn slow. I can do all 250 of my groups in 10 minutes.

A 60 meg GZ file represents 600 megs of header data that has to be processed and put away.

I have a dual 1 GHz laptop I use for performance testing, and it seems faster than that. It does have 4 gigs of RAM and an SSD, so it's sort of cheating.

I'll run some tests.

Re: Import folder not importing

Postby rassafratz » Mon Aug 04, 2014 2:24 pm

Quade wrote:10 minutes per GZ is damn slow. I can do all 250 of my groups in 10 minutes


Yes, and 6 hours and counting for this last 40mb gz is completely intolerable. The stats I posted 3 hrs and 15 mins ago ("...the "Storage.db3" and "StorageData.db3" files are still incrementing. They are currently up to 205,620KB and 181,928KB respectively...") are now at 274,224KB and 239,364KB.
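
To put a number on that, the growth rate can be worked out from the two sets of file sizes and the 3 hrs 15 mins between them. The helper name `growth_kb_per_minute` is just for illustration; the figures are the ones quoted in this thread.

```python
def growth_kb_per_minute(start_kb, end_kb, minutes):
    """Average growth of a database file in KB per minute."""
    return (end_kb - start_kb) / minutes


elapsed_minutes = 3 * 60 + 15  # 3 hrs 15 mins between the two readings

storage_rate = growth_kb_per_minute(205_620, 274_224, elapsed_minutes)
storage_data_rate = growth_kb_per_minute(181_928, 239_364, elapsed_minutes)

print(f"Storage.db3:     {storage_rate:.0f} KB/min")
print(f"StorageData.db3: {storage_data_rate:.0f} KB/min")
```

That comes to roughly 352 KB/min and 295 KB/min, which lines up with the earlier eyeball estimate of about a meg every 2-3 minutes.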

This is so ridiculous that there's no way that it could be hardware.

I'm going to go to bed now, I'm really beat. I'll probably be back up in about 7.5 hours and I'll look to see what kind of progress was made.
And then I'll also search for a group with a comparable header count, subscribe to it and download all headers for it to see how long that takes. I doubt that even a million headers would take 6 freaging hours from a single server on a single group.

In the meantime, g'nite, Quade! :-)

Re: Import folder not importing

Postby Quade » Mon Aug 04, 2014 4:19 pm

This is so ridiculous that there's no way that it could be hardware.


I don't know how you can make that judgement from your results. I think your results are worse than mine because you're running XP and might not have AHCI mode enabled on the disk so, the disk runs slow. I suspect you're running into paging too because of the relatively low amount of RAM.

What security software do you use? As an experiment, I might remove it, reboot then download more headers and see if you get your performance back.

Re: Import folder not importing

Postby rassafratz » Tue Aug 05, 2014 6:06 am

Quade wrote:
This is so ridiculous that there's no way that it could be hardware.


I think your results are worse than mine because you're running XP and might not have AHCI mode enabled on the disk so, the disk runs slow


I hope you aren't actually trying to suggest that XP and lack of AHCI could be responsible for database conversions hundreds of times slower than they should be...

And BTW, my IDE is my boot drive, and also holds my NB installation. My SATA drives are strictly for storage, so AHCI won't do me much good.

Re: Import folder not importing

Postby Quade » Tue Aug 05, 2014 7:35 am

I hope you aren't actually trying to suggest that XP and lack of AHCI could be responsible for database conversions hundreds of times slower than they should be...


Ok, how do you explain why your machine which is probably twice as powerful as my test machine is significantly slower than my test machine? You have a modestly powered machine and it's slowly chunking through gigabytes of data. My machine, dual 1 Ghz remember, doing the same thing seems to be 5 times faster. During import, one of my cores is running pretty much balls out.

A 60 MB GZ file is 600 megs of data.

Before you go down the hostile route, you understand that I'm just guessing? You have a problem I don't have and I'm guessing at a solution. Your slow PC is part of the problem, but I'm not sure that's the only issue. That's why I dragged out my clunker to test with in parallel.

Re: Import folder not importing

Postby Stefan1971HH » Tue Aug 05, 2014 11:51 pm

I seem to have a related problem. The number in brackets in "Cache" is only slowly decreasing and NB uses 50% CPU (3.2 GHz quad-core). But I can't spot any .gz files. Any suggestions?
Stefan1971HH
Occasional Contributor
Posts: 23
Joined: Mon Feb 25, 2008 2:21 pm

Re: Import folder not importing

Postby rassafratz » Wed Aug 06, 2014 12:11 am

Quade wrote:Before you go down the hostile route...


Heheh...sorry. That was in response to an "edginess" I started sensing from you. You're generally quite helpful and knowledgeable, so I'll chalk this up to crossed wires on my part and apologize.

how do you explain why your machine...is significantly slower than my test machine?


Unfortunately, we may now never know, but it's beginning to look like it may not have been a NB issue after all.

While I was fiddling around with the gz files, my entire system (even outside of NB) started crawling to the point I had to reboot; upon which I received the blood-curdling "ntldr is missing" message.

After unsuccessfully trying to resolve that error AND later discovering that the boot sequence in my BIOS was changed (which may or may not have been the "ntldr" culprit), I'm willing to admit that I MAY have been exposed to a virus which went undetected by AVG, and if so, the evidence is pointing to a rootkit.

As I said earlier, my SATA drives are storage drives, so I lost very little real data. Unfortunately, as I also said, my boot drive also housed my newsbin installation, so that went up in smoke (just the data, my downloads go to another disk), and so the initial problem I recorded in this thread can't even be duplicated.

But the good news is that once I decided that my then-current system wasn't worth recovering, I was back up and running in about 20 more minutes with an XP installation I have ghosted on another drive.

So what's done is done, and I'll get out of your hair now. I'll go install a fresh copy of NB and start over. :-)

Re: Import folder not importing

Postby Quade » Wed Aug 06, 2014 12:19 am

It's frustrating not being able to reproduce issues, but I don't think I take it out on you guys.

I spent some hours dicking with this today, trying to see if I could make this process less stressful to the machine. I don't really notice it on my desktop, but the slow machine got pretty grindy during the processing. I tweaked some of the parameters and believe I've achieved some success. I won't know for sure till things are out in the wild.

But I can't spot any .gz files. Any suggestions?


In the data folder, in the "Import" folder. If you go into the options and select "Open Data Folder". It'll take you right there.

Re: Import folder not importing

Postby syshog » Mon Aug 11, 2014 6:23 pm

I currently have 424 items that have been waiting to be processed for the last 5 hours. I'm using a Xeon E5 1620 with 64GB of RAM, of which 4.8GB is used by NB 6.53rc2, on Windows 7 64-bit. RC3 made no difference for me, but I see there's an RC4; I might try that in an hour. Resource monitor shows it's only been writing to the warez storage.db3. Great, now I'm up to 426 items. Quade, could you make a version of NB that logs the importing process? I have about 40 other groups waiting for files to be written.

edit: hard drives are raid10 4x 4TB seagate Enterprise drives with 128MBcache running off an areca18xx series controller.
system:
xeon system with 128GB+ of ram and a lot of storage. I don't update my sig much.
syshog
Seasoned User
Posts: 117
Joined: Sun Jun 18, 2006 7:26 pm

Registered Newsbin User since: 06/17/06

Re: Import folder not importing

Postby Quade » Mon Aug 11, 2014 8:42 pm

It's likely the problem is that group. How about moving all the "Warez" GZ's out and see what happens?

If you turn on debug logging, it does note the import of GZ files.

Re: Import folder not importing

Postby syshog » Mon Aug 11, 2014 10:27 pm

Quade wrote:It's likely the problem is that group. How about moving all the "Warez" GZ's out and see what happens?

If you turn on debug logging, it does note the import of GZ files.


I've actually been rebuilding the warez group from the ground up. I'm now down to 50 left to finish importing. I'll try downloading headers without the warez group tomorrow. I don't mind nuking one group, but not 200 ;) Sometimes resource monitor shows NB writing to 5 db3s at once; other times an hour goes by and no db3s are being written to. When that happens the backlog goes through the roof. Downloading a file also seems to affect the write rate to the db3s. I've also enabled the debug mode, and I'm looking forward to seeing what that tells me. For the groups it is currently processing, the time it takes to go from "db update started" to "checking autodownload" is 1-2 minutes on average, but then there's 8 minutes for a.b.hdtv.x264? It's nice that I did not have to restart NB to see the extra logging. It's processing a warez gz right now; I'm curious to see how long that takes.

Re: Import folder not importing

Postby Quade » Mon Aug 11, 2014 10:39 pm

It's not really surprising that the feeder takes awhile to feed gigs of data to the DB. Once it's current though, daily updates ought to run pretty quickly.

Re: Import folder not importing

Postby syshog » Mon Aug 11, 2014 10:50 pm

Quade wrote:It's not really surprising that the feeder takes awhile to feed gigs of data to the DB. Once it's current though, daily updates ought to run pretty quickly.



One gz file for the warez group took 20 minutes, but during that time I started downloading an 854MB file. I like being able to see the gz importing. It does them based on date stamp. Would it go quicker if a single group was processed at once?
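
To see which files are likely up next, the import folder can be listed oldest first. This is only a guess at the ordering: it assumes "date stamp" means the file's last-modified time, and `import_order` is a made-up helper, not an NB feature.

```python
from pathlib import Path


def import_order(import_dir):
    """Return the .gz file names in the import folder, oldest modified first."""
    files = list(Path(import_dir).glob("*.gz"))
    # Ties on modification time are broken by name so the order is stable.
    return [p.name for p in sorted(files, key=lambda p: (p.stat().st_mtime, p.name))]
```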

Re: Import folder not importing

Postby Quade » Tue Aug 12, 2014 6:44 am

Not really. It runs as fast as it can. The more you work the disk drive the data folder sits on, the more time it takes. You can look at the GZ sizes and multiply by 10 to get the real size.
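
As a rough illustration of that rule of thumb, a few lines of Python can total up the import folder. The helper name `estimated_header_bytes` is hypothetical, and the factor of 10 is just the approximation mentioned above, not a measured compression ratio.

```python
from pathlib import Path


def estimated_header_bytes(import_dir, ratio=10):
    """Estimate the uncompressed header data waiting in the import folder.

    Sums the sizes of all .gz files and multiplies by the rule-of-thumb ratio.
    """
    gz_bytes = sum(p.stat().st_size for p in Path(import_dir).glob("*.gz"))
    return gz_bytes * ratio
```

By the same rule, a single 60 MB GZ comes out at about 600 MB of header data.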

I'm surprised a machine like yours takes this long though.

Re: Import folder not importing

Postby syshog » Tue Aug 12, 2014 8:57 pm

Quade wrote:Not really. It runs as fast as it can. The more you work the disk drive the data folder sits on, the more time it takes. You can look at the GZ sizes and multiply by 10 to get the real size.

I'm surprised a machine like yours takes this long though.



It's been a problem for a while; I think the issue started around 6.51. In the beginning I could not figure out the cause of the problem, but as I've posted on here I've been learning how NB truly works. I'm still at a loss about the slowdown of the NB imports, and I'm glad at least someone else was having the issue. I upgraded to version 6.53rc4 TODAY. Yesterday the average size was 60MB and I had a little more than 40 groups needing to be updated. Today, though, I was able to update my 200+ groups without issue; the backlog went as high as 5, minus the warez group. It's been staying around 1 to 0 with ath and boneless being downloaded from scratch. I've switched providers and I've been trying to build a new db for the new server. I'm willing to purge warez and start over on that group again; it currently stands at 30GB. The problem is I've purged the group before, but the problem kept creeping back in. Besides watching the db3 writing, what else should I look out for? Will NB tell me if a gz import fails?

Re: Import folder not importing

Postby Quade » Tue Aug 12, 2014 9:41 pm

Are you using watch lists? The more watch lists you have, sucking out downloaded headers and shoving them into the watch groups, the slower the import will be.

Re: Import folder not importing

Postby syshog » Wed Aug 13, 2014 1:01 am

Quade wrote:Are you using watch lists? The more watch lists you have, sucking out downloaded headers and shoving them into the watch groups, the slower the import will be.


No, I am not. I do not automatically download headers either. I download the headers manually and then load up the groups after the header download finishes.

Re: Import folder not importing

Postby syshog » Thu Aug 14, 2014 3:39 pm

Ok, so I deleted the headers for warez (manually from the spool folder, and by selecting purge headers from the post storage menu) and started to "download all headers". The backlog shot up to 35 before I killed the download, and so far, 2 hours later, the backlog has only been reduced to 16. No matter what, NB refuses to process warez gz files in a timely manner, and it's not that the warez gz files are larger than the rest; they are not. I'm at a loss as to what to do.

Re: Import folder not importing

Postby Quade » Thu Aug 14, 2014 5:23 pm

Let it download .5 TB's of Warez header data, then let it import, then never purge it again?

I never purge the larger groups. It's just too damn much data. Warez is bad because it's absolutely chock full of spam.

1.7 Billion headers at 300 bytes each. You've basically set yourself up to download 510 Gigs of headers.
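
Spelled out, the arithmetic looks like this (a quick sketch; the helper name `header_download_gb` is made up, and "gigs" is taken to mean decimal gigabytes):

```python
def header_download_gb(header_count, bytes_per_header):
    """Total header download size in decimal gigabytes (1 GB = 10**9 bytes)."""
    return header_count * bytes_per_header / 10**9


total_gb = header_download_gb(1_700_000_000, 300)
print(f"{total_gb:.0f} GB")  # 1.7 billion headers x 300 bytes each
```

That works out to 510 GB, or about half a terabyte.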

Re: Import folder not importing

Postby syshog » Thu Aug 14, 2014 10:53 pm

Quade wrote:Let it download .5 TB's of Warez header data, then let it import, then never purge it again?

I never purge the larger groups. It's just too damn much data. Warez is bad because it's absolutely chock full of spam.

1.7 Billion headers at 300 bytes each. You've basically set yourself up to download 510 Gigs of headers.




I've been downloading headers from boneless and ATH at the same time and neither is giving me this headache, though both groups are larger than warez. For some reason I don't download headers as fast as articles, and the other problem is that when thunderstorms roll in I would not be able to keep the system on to ensure the data has been flushed from import. NB does not like it when I kill it while it's flushing the backlog; it corrupts the data. When the group is loaded, the headers show up red for the days NB was processing. The problem with the import stalling on warez is that it takes hours to process the import, which will turn into days, because NB is not actively writing or reading the disk (the raid array is not even being accessed). For whatever reason the process just stalls. I guess I have to wait for you to be able to recreate the problem.

Re: Import folder not importing

Postby Quade » Thu Aug 14, 2014 11:58 pm

Warez does seem to be special and I do believe it's because it's become a spam trap.

Re: Import folder not importing

Postby syshog » Sun Aug 24, 2014 6:26 pm

Quade wrote:Warez does seem to be special and I do believe it's because it's become a spam trap.


Every group has its spam. Warez still houses useful files. That still does not explain NB's problem with Warez when both ath and boneless are much larger groups and they are being downloaded without issue.

Re: Import folder not importing

Postby Quade » Sun Aug 24, 2014 6:32 pm

If you look at the 3 groups you can see that the file mix is completely different.

Warez has a vast number of files made up of a single post or just a few posts, while boneless and HDTV have predominantly large files. Because of that it's much harder to pack warez into a database than either of the other two groups.

Once you catch up on Warez, the daily downloads won't be too much trouble. You're flooding the import with entire group downloads. It takes a while to pack away 500 gigs of data.

Re: Import folder not importing

Postby syshog » Sun Aug 24, 2014 6:53 pm

Quade wrote:If you look at the 3 groups you can see that the file mix is completely different.

Warez has a vast number of files made up of a single post or just a few posts, while boneless and HDTV have predominantly large files. Because of that it's much harder to pack warez into a database than either of the other two groups.

Once you catch up on Warez, the daily downloads won't be too much trouble. You're flooding the import with entire group downloads. It takes a while to pack away 500 gigs of data.



OK, so I'll do a little bit each day. I wish there was a way to give NB's importing a higher priority, since it's not using much disk I/O. Is there any setting in NB to speed that up? CPU load is around 27% and disk I/O is about 5%.

Re: Import folder not importing

Postby Quade » Mon Aug 25, 2014 12:51 am

It's using 1 core pretty much balls out. It's designed not to push too hard. It's a background task after all. I could consider letting 2 cores handle it.

Re: Import folder not importing

Postby patrick_visniewski » Sat Sep 20, 2014 1:29 pm

The NB team missed an important use case when they designed this.
That is the case where a user (for any of many commonly occurring reasons) needs to download all the headers of multiple groups at once and then expects to be able to search/browse those headers as quickly as possible.

This single-core, in-the-background scheme only really fits usage where the user can use, or is actively using, NB while also downloading headers.

But for example if someone had a drive die and has to reinstall NB and re-download all the headers, the user is basically SOL :(
Because it could be days before they can finally get back to using it normally.
Similarly for a new user as well.
Or for someone that decides at some point to add a collection of news groups to those they track locally.
In all cases the user needs some way to tell NB to focus solely on loading headers into the header DB(s).
patrick_visniewski
n00b
n00b
 
Posts: 7
Joined: Sun Apr 17, 2011 12:25 am

Re: Import folder not importing

Postby Quade » Sun Sep 21, 2014 8:45 am

patrick_visniewski wrote:The NB team missed an important use case when they designed this.


I don't think I qualify as a "team".

2 threads might help. Or it might just bottleneck in a different place. I'll have to try. 2 threads will probably only be better if the problem is CPU load. If the problem is disk IO then more threads won't help. One thought might be to split the work into a "reading and turning into files" part and a "compacting and writing to DB" part. A new version of SQLite, the database, seems to have improved import speed significantly in 6.6.
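The split described above is a classic two-stage pipeline. A minimal Python sketch of the idea follows; it is not Newsbin's code, and the tab-separated input format, the `headers` table, and all function names are assumptions made up for illustration. One thread parses raw lines, and the calling thread batches the records into SQLite.

```python
import queue
import sqlite3
import threading

SENTINEL = None  # marks the end of the parsed-record stream


def parse_stage(raw_lines, out_q):
    """Stage 1: turn raw 'subject<TAB>size' lines into (subject, size) records."""
    for line in raw_lines:
        subject, size = line.rsplit("\t", 1)
        out_q.put((subject, int(size)))
    out_q.put(SENTINEL)


def write_stage(in_q, db_path, batch_size=1000):
    """Stage 2: write records to SQLite, one transaction per batch."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS headers (subject TEXT, size INTEGER)")
    batch = []
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            with conn:  # commits the batch as a single transaction
                conn.executemany("INSERT INTO headers VALUES (?, ?)", batch)
            batch.clear()
    if batch:
        with conn:
            conn.executemany("INSERT INTO headers VALUES (?, ?)", batch)
    count = conn.execute("SELECT COUNT(*) FROM headers").fetchone()[0]
    conn.close()
    return count


def run_import(raw_lines, db_path):
    """Run the parser in a worker thread and the writer in the calling thread."""
    q = queue.Queue(maxsize=10000)  # bounded so parsing cannot outrun the writer
    reader = threading.Thread(target=parse_stage, args=(raw_lines, q))
    reader.start()
    count = write_stage(q, db_path)
    reader.join()
    return count
```

The bounded queue is the design choice that matters here: if the database writer is the slow stage, the parser simply blocks, which is the "bottleneck in a different place" case where extra threads buy nothing.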

I'm not sure you all have a sense for how much data we're talking about. Pick one group, a.b.hdtv for instance. It's probably got 2 billion posts in it. It's:

2 * 10^9 * 400 bytes = 800 Gigs of data. That's just one group. With compression that's probably only an 80 gig download, but it's still 800 gigs of data that needs to be processed.

Boneless has > 6 billion posts in it.
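For a sense of scale, the arithmetic can be sketched in a few lines of Python. The function name is made up for illustration, and the 10:1 compression ratio is only an assumption inferred from the 800 gig versus 80 gig figures above, not a measured value.

```python
def header_volume_gb(posts, bytes_per_header, compression_ratio=10):
    """Return (raw_gb, compressed_gb) for a group's headers, using 1 GB = 10^9 bytes."""
    raw_gb = posts * bytes_per_header / 1e9
    return raw_gb, raw_gb / compression_ratio


# a.b.hdtv: roughly 2 billion posts at about 400 bytes each
raw_gb, download_gb = header_volume_gb(2_000_000_000, 400)
print(f"raw: {raw_gb:.0f} GB, compressed download: {download_gb:.0f} GB")

# Warez, using the figures quoted earlier in the thread: 1.7 billion at 300 bytes
warez_raw_gb, _ = header_volume_gb(1_700_000_000, 300)
print(f"warez raw: {warez_raw_gb:.0f} GB")
```

The same function applies to boneless, but the per-header size for that group is not given in this thread, so any figure for it would be a guess.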

patrick_visniewski wrote:But for example if someone had a drive die and has to reinstall NB and re-download all the headers, the user is basically SOL :(


It's valuable data that represents 100's of gigs of downloads. You might want to back it up to external storage from time to time.
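One way to script such a backup, assuming the header stores are ordinary SQLite files, is Python's built-in `sqlite3` backup call. This is only a sketch: the function name and any paths passed to it are hypothetical, and it is safest to run it with Newsbin closed.

```python
import sqlite3
from pathlib import Path


def backup_header_db(src_path, dest_dir):
    """Copy one SQLite database file into dest_dir using SQLite's online backup."""
    src_path = Path(src_path)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest_path = dest_dir / src_path.name  # keep the original file name
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        src.backup(dest)  # copies every page of the source database
    finally:
        dest.close()
        src.close()
    return dest_path
```

Going through the backup call rather than a plain file copy means the copy is a consistent snapshot even if something else has the database open.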

Re: Import folder not importing

Postby patrick_visniewski » Wed Sep 24, 2014 11:43 am

Sorry if "team" is the wrong label; it was meant to be more general.
And if you're really the main person, then I really sympathize with you.
Sometimes being the sole developer of a major app isn't fun.
Especially when it comes down to being the only person trying to think of ALL the possible pros and cons of design choices. It's really easy to miss some edge case that turns out to be pretty crucial. And then you get to take all the heat from the unhappy users too.


Honestly, 2 threads is probably not enough for some cases, but who knows, it might be enough. Maybe a configurable number up to the max number of "cores"?
Otherwise there should be an option to switch the import from background priority to full control of the app.
Basically, most of the time it makes perfect sense for it to run exactly how it is currently designed. But there are situations where that design won't work at all. In that case it needs a lot more resources to complete (catch up) in a timely fashion. But once it does catch up, it would be fine to go back to a background process.
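As a sketch of what that option could look like (purely hypothetical, nothing like this exists in NB as far as this thread shows), the worker count could be clamped to the number of cores, with a catch-up switch that takes all of them:

```python
import os


def resolve_import_workers(requested, catch_up=False):
    """Pick how many import workers to run.

    requested: the user's configured worker count (hypothetical setting).
    catch_up:  hypothetical toggle that lets the import use every core.
    """
    cores = os.cpu_count() or 1  # cpu_count() can return None
    if catch_up:
        return cores
    # Never fewer than 1 worker, never more than the machine has cores.
    return max(1, min(requested, cores))
```

With `catch_up=False` and `requested=1` this reproduces the current single-core background behaviour, so the default would not change for anyone.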


I set up a bunch of groups to track and chose to download all the headers.
For a week now, I haven't had less than 1 GB of files in the import directory.
The size of the directory was greater than 5 GB at first, but now it is slowly growing and shrinking between around 1 GB and 3.5 GB.
The import process occasionally gives priority to newer files, so some of the groups are kept current, but other groups...
I don't know if they have a bunch of really old and really new headers while missing those in the middle, or if they just have some of the oldest headers and nothing near recent.

Re: Import folder not importing

Postby Quade » Wed Sep 24, 2014 4:38 pm

What kind of machine are you doing this on? Your results seem to be far worse than mine.

What kind of hard disk is the data folder on?

As a data point, my machine processed 800 megs of headers in about 7 minutes. This is a different version than the one you're using, with, I think, a faster version of the database.

More threads aren't going to help unless the bottleneck is CPU. If the bottleneck is the DB or disk IO in general then more threads might make it worse. If you have a machine with poor single core integer performance like an AMD, it could probably help. Might make sense to change the GZ format into files instead of headers too. In that way the downloader CPUs do more of the work. Have to think about that one.
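To put the data point in perspective, a rough Python estimate of how long a full-group backlog would take at that rate. This assumes the rate stays constant and that the 800 megs and the backlog are measured the same way (raw versus compressed is not stated here); the function name is made up for illustration.

```python
def estimate_import_hours(total_bytes, sample_bytes, sample_seconds):
    """Extrapolate import time from one measured sample, assuming a constant rate."""
    rate = sample_bytes / sample_seconds  # bytes per second
    return total_bytes / rate / 3600


# 800 megs in 7 minutes, applied to the 510 gigs quoted earlier for Warez
hours = estimate_import_hours(510e9, 800e6, 7 * 60)
print(f"about {hours:.0f} hours, or {hours / 24:.1f} days")
```

Even on a machine that matches this rate, a whole-group download is a multi-day import, which fits the "it takes a while" advice earlier in the thread.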

