Recent (but not newest) headers on server but not downloaded

Technical support and discussion of Newsbin Version 6 series.

Postby fliptron » Fri Jan 10, 2014 2:33 am

This problem has been seen multiple times using nb650RC2. I just updated to nb650-full.exe (December 2013) and still see the problem.

Environment: W7 Pro, 64 bit. NB650 update interval set to 60 minutes. News comes from NewsHosting.com

Observations (seen multiple times):
When headers are downloaded, all headers are found for most postings, allowing download. For some postings,
not all headers are available yet (the post is still arriving at the server), so the post is marked incomplete and obviously can't be downloaded.

Later, either automatically or when I click the "Download latest updates" icon, new headers are downloaded
and new postings appear on screen, but sometimes the postings that were missing headers never get them.
Retrying "Download latest updates" does not help.

How do I know they are on the server? I do the following:
    1) Open up group, select all posts and Shift-Delete, close group tab
    2) On Groups List tab, select group, and "PostStorage->DeleteStoredPosts"
    3) Open up group (Show Posts), there are zero posts.
    4) On Groups List tab, select group, and "DownloadSpecial->Download-xxx Older Posts"
    5) Usually the missing headers have now been found, and the post can be downloaded

This does not happen every time. I see it every few days, but then I only care if it affects a post I want, which is probably
less than 2% of the stuff that flies by.

It seems to behave as if it picks an "anything after this time+day, or after some internal post number" cutoff for the new stuff to
be downloaded, and ignores anything older. If it picks this value wrong, then a bunch of headers won't be downloaded.

Note, when this occurs, the "DownloadSpecial->Download-xxx Older Posts" doesn't help if there are still posts in the
group, as it goes back to posts before the oldest in the group (hence steps 1 and 2 above)

Ideally I would like this bug fixed (unless this is not a bug and there is something I am missing), but I would also
like to suggest 3 additions to the DownloadSpecial menu:
A) A way to enter the number rather than select from the 5 fixed options for older posts
B) A way to specify "get all headers after Date+time"
C) New entries: Reload xxxx most recent headers

Thanks for a great program!
fliptron
Occasional Contributor
 
Posts: 11
Joined: Wed Jun 05, 2013 8:13 am

Registered Newsbin User since: 05/20/13

Postby DThor » Fri Jan 10, 2014 10:00 am

When you download older posts, you're obviously going back in time, not forward, so what you're doing is completing the file from the back end. So the question is, how many posts are you storing on disk, and how many of those are you loading into memory? These are set in the options. If for some reason you have local storage set to just a day or so, this will happen. Same thing if you're only loading a day or so into memory. One other variable is allowing the new spools some time to be processed - not knowing what sort of system you're running and the scale of posts we're talking about, that may or may not be relevant. If you're constantly downloading the largest groups, this is a server-level chore, check the cache status at the bottom of the program, those two numbers with a possible number in brackets.

Regarding B), the Usenet protocol doesn't have any knowledge of how old a post is. Of course there is a date stamp, but when talking to the server there's no syntax for 'give me all the headers that are 5 days or younger'. All you can do is ask for a bunch of headers and examine them after the fact.
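
To make the "ask for a bunch of headers" point concrete, here is a minimal sketch of what a range request looks like at the protocol level. It only builds command strings and does no network I/O. The group name and the sample reply line are made up for illustration, and the use of XOVER (rather than the newer OVER command) is an assumption, not a statement about what Newsbin actually sends.

```python
# Sketch only: builds the NNTP command a client could send to fetch the
# newest n headers. The reply line and group name below are invented.

def parse_group_reply(line: str) -> tuple[int, int, int]:
    """Parse a '211 count low high group' reply to the GROUP command."""
    parts = line.split()
    if parts[0] != "211":
        raise ValueError(f"unexpected reply: {line!r}")
    count, low, high = (int(p) for p in parts[1:4])
    return count, low, high


def xover_for_last(n: int, low: int, high: int) -> str:
    """Command asking for overview data of the newest n article numbers."""
    first = max(low, high - n + 1)  # never ask below the server's low mark
    return f"XOVER {first}-{high}"


count, low, high = parse_group_reply(
    "211 500000 1000001 1500000 alt.binaries.example"
)
command = xover_for_last(20000, low, high)
print(command)  # XOVER 1480001-1500000
```

The request is expressed purely in article numbers. The client only learns the dates by reading the headers that come back.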

DT
V6 Troubleshooting FAQ . V6 docs. Usenet info at Usenet Tools. Thanks!
DThor
Elite NewsBin User
 
Posts: 5943
Joined: Mon Jul 01, 2002 9:50 am

Registered Newsbin User since: 04/01/03

Postby fliptron » Sat Jan 11, 2014 7:54 am

Hi DThor, thanks for trying to help me.

DThor wrote:
When you download older posts, you're obviously going back in time, not forward, so what you're doing is completing the file from the back end.

While that is true, I delete all headers first and then use "DownloadSpecial->Download-xxx Older Posts" because I don't want to download multiple days' worth of headers, which is the setting for a new group that has never been downloaded, and is what will happen if I just click "Get New Headers". I only need headers that are at most a few hours old. If I don't delete all headers, there does not seem to be a way to rescan the server for headers it somehow missed in the prior GetNewHeaders. (That's why I asked for some way to do this. Basically, to get NewsBin to look at headers that are older than the most recent header but newer than the oldest header that has been retrieved, I have found I need to delete everything so it starts clean. I hope I am making sense here.)

The headers that were not retrieved but are on the server are typically at most an hour or two old.

DThor wrote:
So the question is, how many posts are you storing on disk, and how many of those are you loading into memory? These are set in the options.


I have the following settings:
Download Age: 3
Display Age: 30
Storage Age: 60
As for how many posts are loaded into memory, the answer is "All of them". Maybe I am doing it wrong?

Here's how I look at a group:
I go to the Groups List tab, select the group name, and right-click and pick "Show Posts"
As I review the content of a group, I will select stuff I am uninterested in, and delete it with Shift-Del (often using a filter to select lots of uninteresting stuff).
So even though Display Age and Storage Age are 30 and 60 days, typically I don't have any posts more than a few days old.
The tab title typically shows 100 to a few thousand posts.

DThor wrote:
If for some reason you have local storage set to just a day or so, this will happen. Same thing if you're only loading a day or so into memory.


So I don't think I am doing this.

DThor wrote:
One other variable is allowing the new spools some time to be processed - not knowing what sort of system you're running and the scale of posts we're talking about, that may or may not be relevant.


When headers have been downloaded for a specific group (between 20000 and 2000000 new headers depending on the group, the high end being when I haven't looked for a few days), the system seems to be busy for at most 3 seconds after the download completes, and then the new headers are displayed and the tab title updates the number of posts.

My system has a 6-core 3.4 GHz processor, 16 GB of main memory, 14 TB of disk, and a few other bits. No need for a room heater.

DThor wrote:
If you're constantly downloading the largest groups, this is a server-level chore, check the cache status at the bottom of the program, those two numbers with a possible number in brackets.


My system is set up to check all the groups I care about for new headers once per hour. Other than the initial large number of posts if I haven't been running NewsBin for a day, these hourly updates typically complete all groups in a few minutes (unless someone is spamming a news group). The hourly updates are usually less than 20000 (I assume this is individual headers) as shown in the size column of the Downloading Files tab. The number of posts is far less, as NB combines these individual headers.

The cache status is: 200/200(0). I tried looking this up in the online docs, but can find no explanation. There are references to Cache chunks, but the docs say that "This functionality was removed after version 5.52."

DThor wrote:
Regarding B), the Usenet protocol doesn't have any knowledge of how old a post is. Of course there is a date stamp, but when talking to the server there's no syntax for 'give me all the headers that are 5 days or younger'. All you can do is ask for a bunch of headers and examine them after the fact.
DT


Hopefully the above additional information will help in figuring out if this is a user error, server weirdness (since the steps I described in my first posting always do get the missing headers, so they are on the server), or maybe a bug in NB.

Thanks.
fliptron

Postby DThor » Sat Jan 11, 2014 8:56 am

Ok, so that removes a bunch of variables, sounds like you're all caught up (the cache numbers indicate available resources for processing spools, which is fine, and the number in brackets indicates how many groups are waiting to be processed, which is none). Keep that number in brackets in mind when you're interacting with newsbin - if it's non-zero it's still working.

If I'm grokking you correctly it sounds like you don't want to maintain spools in volume so you download them regularly (and from what I can tell: re-download them), dig through them, pruning, then realize you've pruned something you didn't mean to, then re-download again. You'll probably disagree with that, but it appears that way. Your storage/load numbers seem reasonable, so the only reason I can think of why you need to go back in time to snag more headers for a currently uploading post is that you've either manually deleted them, or you never downloaded them in the first place and we're talking about a post that's not in your regularly downloaded groups. As mentioned, going back in time is a bit like duck hunting with Dick Cheney: you can only fire off a request for nnn headers and hope you hit the target since you can't ask a server for posts by date.

I might have it wrong since I personally don't like to micromanage things this much, but I do know when I'm tracking a currently uploading post, it accrues normally and I never need to go back and resnag headers, so my guess is you're deleting them, based on your option settings. I would back off the pruning for a bit and see if it stops, it would at least find the culprit. The default behaviour leans towards a hands-off approach just working out of the box, so if you're constantly getting in there, there's likely to be some side effects. Oh, you might want to try bypassing your filters too, by way of a diagnostic.

DT

Postby Quade » Sat Jan 11, 2014 11:50 am

fliptron wrote:
The headers that were not retrieved but are on the server are typically at most an hour or two old.


I think this is the key. It's the newest headers that are missing some chunks, not the headers at the end of your retention (60 days). If you want to make it re-download the last 3 days' worth of headers without deleting what you have, "Post Storage/Use Download Age" will reset the group, and it'll use "Download Age" the next time you update the group.

In the advanced server options, you might try setting the overlap to 1000 and then see if this problem goes away. If it does, you might want to trim the overlap down till the problem comes back, then increase it a bit. The downside of overlap is that deleted posts will come back if they're in the "overlap" range. Newsbin knows per group and per server where it left off downloading headers and asks for the next range on a header download, but if the server uses a header farm and the machines aren't 100% synced, the actual downloaded headers can be off by a couple. Overlap makes Newsbin backtrack N posts and re-download some of the headers to prevent gaps.
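
As a rough illustration of the backtracking described above, here is a small sketch of how the next request range could be computed from the stored position and the overlap. The function name, parameters, and article numbers are hypothetical and this is not Newsbin's actual code, only the arithmetic implied by the description.

```python
# Hypothetical model of "where it left off" plus overlap.
# All names and numbers are illustrative assumptions.

def next_request_range(last_fetched: int, server_high: int, overlap: int = 0) -> range:
    """Article numbers to request on the next header update."""
    first = max(1, last_fetched + 1 - overlap)  # backtrack by `overlap` posts
    return range(first, server_high + 1)


no_overlap = next_request_range(1_000_000, 1_020_000)
with_overlap = next_request_range(1_000_000, 1_020_000, overlap=1000)

print(no_overlap[0], len(no_overlap))      # 1000001 20000
print(with_overlap[0], len(with_overlap))  # 999001 21000
```

With an overlap of 1000, the last 1000 article numbers from the previous update are requested again, which is also why posts deleted from that range reappear.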

fliptron wrote:
A) A way to enter the number rather than select from the 5 fixed options for older posts
B) A way to specify "get all headers after Date+time"
C) New entries: Reload xxxx most recent headers


A - Your issue doesn't seem to be "older posts". You're losing posts on the front end, not the back end of the data range. That's why I think this is a job for overlap.
B - is basically "Use Download Age" and setting "Download Age" to the proper number.
C - Just reload the group. It always reads in what it has.
Quade
Eternal n00b
 
Posts: 44951
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Postby fliptron » Mon Jan 13, 2014 2:57 pm

Thanks DThor and Quade,

First, with ref to DThor's post,

The explanation about Cache item on the status line is quite helpful.

DThor wrote:
You'll probably disagree with that, but it appears that way


Yep, I disagree :) The problem was not me pruning stuff I shouldn't have, it is that the headers arrived at the server after one download of new headers and before the next download of new headers, yet they were not retrieved. Additional downloads of new headers also failed to get these missing headers. I was able to prove they were on the server, by following the 5 steps I listed in my original post. It might not have been the best way to do this, but it worked.

But good news (sorry for the pun), read on...

All good suggestions about micromanaging less. I do it primarily so I can find the needles of desired stuff buried in the haystack of stuff that comes through on the groups I am looking at.

But wait, there's more...

Now with ref to Quade's post,

Your solution C is what I was kind of doing, sort of.

Thanks for the explanation of how "Post Storage/Use Download Age" works. The online docs (that I could find) did not explain this. From this explanation, I think I could set Download Age to 1, and then get new headers, which would retrieve a day's worth of headers, including the ones I missed out on in the last hour (in my original post, I noted "update interval set to 60 minutes"). This would take a while but is less typing than my 5 step process.
So this is your solution B, which is better.

And now we get to the good stuff, the Overlap setting I was unaware of. My setting was 0.

And this is Solution A.
Here's my research (which I present, so it may help other users who might have the problems I have been seeing).

So I left Newsbin running and checked the status of posts after each hourly download of new headers until it showed a post that was incomplete. I waited an hour, and after new headers were downloaded, the post in question was still showing that it was incomplete. I clicked on the plus symbol, and it showed 7 rar files that were idle/new, and one that was incomplete. All were tagged 26 minutes old. I set overlap to 1000, and got new headers, nothing changed.
I tried 5000, still no joy. 10000: nope 20000: nope 50000: nope ..... 100000: all 27 rar pieces for the post were now idle/new, all tagged 35 minutes old.

I assume the value depends on both how far back in time the missing headers arrived at the server, and how much new stuff has arrived since.

Since I run Newsbin mostly in the background, while I do useful work, I really don't care if downloading headers takes extra time, so solutions B and C are both good.

Thanks for the support!
fliptron

Postby Quade » Mon Jan 13, 2014 4:58 pm

fliptron wrote:
Additional downloads of new headers also failed to get these missing headers


They wouldn't. Header retrieval is a sliding window that always moves forward. When you download headers, this sets the high water mark, so the next header download starts from the new high water mark.
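
A toy simulation of that forward-only window may help show why repeating the update never fills the hole. This is a sketch under simplifying assumptions: article numbers are tiny, `update` is a made-up helper, and the "not yet visible" articles stand in for a header farm that is briefly out of sync.

```python
# Toy model: the client only ever asks for article numbers above its mark.

def update(mark: int, visible: set[int]) -> tuple[list[int], int]:
    """Fetch every visible article above the mark; return (fetched, new mark)."""
    fetched = sorted(n for n in visible if n > mark)
    new_mark = max(fetched, default=mark)
    return fetched, new_mark


have: set[int] = set()
mark = 100

# Update 1: articles 101-110 exist, but 104-106 are not yet visible on the
# backend that answers this request.
fetched, mark = update(mark, {101, 102, 103, 107, 108, 109, 110})
have.update(fetched)

# Update 2: 104-106 are visible now, along with new articles 111-115,
# but the mark is already at 110, so only 111-115 are requested.
fetched, mark = update(mark, set(range(101, 116)))
have.update(fetched)

missing = sorted(set(range(101, 116)) - have)
print(missing)  # [104, 105, 106]
```

Every later update starts above 115, so 104-106 stay missing until something moves the starting point backwards.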

fliptron wrote:
I tried 5000, still no joy. 10000: nope 20000: nope 50000: nope ..... 100000: all 27 rar pieces for the post were now idle/new, all tagged 35 minutes old


If you think about the high water mark, with each header download moving it forward, you can see how a really large overlap is like a "Use Download Age". Your large overlap worked because you forced it to cover the high water marks from several previous header downloads. What this really means is that an overlap of 1000 only really applies to the last header download, not to header downloads before the last one.
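
The arithmetic behind this can be sketched as follows. The batch sizes are hypothetical round numbers (20000 headers per update), not measurements from this thread, and `updates_covered` is an illustrative helper rather than anything in Newsbin.

```python
# How many of the most recent updates does a given overlap reach back over?
# batch_sizes lists headers fetched per update, newest first (assumed values).

def updates_covered(overlap: int, batch_sizes: list[int]) -> int:
    """Count the recent updates whose whole range lies inside the overlap."""
    covered = 0
    remaining = overlap
    for size in batch_sizes:
        if remaining < size:
            break  # the overlap runs out partway through this update
        remaining -= size
        covered += 1
    return covered


recent = [20000, 20000, 20000, 20000, 20000]
results = {ov: updates_covered(ov, recent) for ov in (1000, 50000, 100000)}
print(results)  # {1000: 0, 50000: 2, 100000: 5}
```

Under these assumed sizes, an overlap of 1000 only reaches partway into the most recent update, while 100000 reaches back over five of them. Each retry with a small overlap also pulls in new headers and advances the high water mark, so the gap keeps getting further away.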