Header file import order

Posted: Wed Mar 04, 2015 8:44 pm
by ghend0
While downloading all headers for a group, I noticed that the import order for the header files is not oldest first. It appears that the most recently downloaded header file is the one that is imported next. The header files are downloaded far more quickly than they are imported. As a result, the database is loaded starting with the oldest headers, but then as importing continues, the import order skips ahead to the next most recent file, leaving older headers waiting. After downloading is complete, the remaining header files are imported in reverse download order. This would seem to leave the database file with the oldest headers at both ends and the newest somewhere in the middle. I am wondering if this has any impact on the efficiency of the database when taking into account how it is used by Newsbin. Would there be any advantage to importing the headers in chronological order?

Re: Header file import order

Posted: Wed Mar 04, 2015 11:46 pm
by Quade
The benefit of this order is that users see the newest headers first. I'm not aware of any performance issues. Might be interesting to see if there are any.

The order is also changed depending on what group tab list is currently visible. It'll import the newest headers for the currently displayed tab before it imports any others.

I've made a change to how imports are displayed now, though, so it's possible I don't need this reverse order anymore. At one point "Display Age" would prevent you from seeing new imports that were older than the display age. I've removed that filter for newly downloaded headers.

I'll experiment with it.

Re: Header file import order

Posted: Fri Mar 06, 2015 8:43 pm
by ghend0
Thank you for taking a look at it. If there would be some improvement in performance, maybe a switch could be added to toggle between the two methods.

Re: Header file import order

Posted: Fri Mar 06, 2015 9:56 pm
by Quade
I've already switched to oldest to newest. I confirmed that the imported posts show up, then made the change.

Re: Header file import order

Posted: Sat Apr 04, 2015 9:11 pm
by ghend0
I installed the beta release to take advantage of the fix. It is loading headers chronologically until the article number rolls over and adds a digit. At that point, the load order looks to be alphabetical. While better than the previous import order, it is still leaving earlier posts behind for the end. Boneless has about 13 billion posts. Could the article numbers in the .gz filenames be padded to 11 or 12 digits to take care of this?

alt.binaries.boneless-usenetserver-97558837-98558837.gz
alt.binaries.boneless-usenetserver-98558837-99558837.gz
alt.binaries.boneless-usenetserver-99558837-100558837.gz
alt.binaries.boneless-usenetserver-267558837-268558837.gz
alt.binaries.boneless-usenetserver-268558837-269558837.gz
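
The ordering seen with the filenames above can be reproduced with a short Python sketch. This is not Newsbin's code: it assumes the importer picks files by plain string comparison, and `start_article` is a hypothetical helper that parses the first article number out of a name.

```python
import re

names = [
    "alt.binaries.boneless-usenetserver-97558837-98558837.gz",
    "alt.binaries.boneless-usenetserver-98558837-99558837.gz",
    "alt.binaries.boneless-usenetserver-99558837-100558837.gz",
    "alt.binaries.boneless-usenetserver-267558837-268558837.gz",
    "alt.binaries.boneless-usenetserver-268558837-269558837.gz",
]

def start_article(name):
    """Hypothetical helper: return the first article number in a filename."""
    match = re.search(r"-(\d+)-(\d+)\.gz$", name)
    if match is None:
        raise ValueError(f"unexpected filename: {name}")
    return int(match.group(1))

# Plain string sort: "2..." sorts before "9...", so the 9-digit ranges jump ahead.
alphabetical = sorted(names)

# Sorting on the parsed number restores chronological order.
numeric = sorted(names, key=start_article)

for name in alphabetical:
    print(name)
```

With a string sort, the 267558837 and 268558837 files come before the 97558837 file, which matches the "earlier posts left for the end" behavior described. Sorting on the parsed number is one way to avoid it without renaming files; padding the numbers in the filename is another.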

Re: Header file import order

Posted: Sun Apr 05, 2015 8:23 am
by Quade
I'll have to look into it. I'd have to change how I generate the filename, since the code I use doesn't do leading zeros. Not saying it's hard, just a change.
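
For illustration only, here is a minimal Python sketch of generating a filename with leading zeros. The function name, its parameters, and the default width of 12 are assumptions taken from the suggestion in this thread, not Newsbin's actual implementation.

```python
def header_filename(group, server, first, last, width=12):
    """Hypothetical: build group-server-first-last.gz with zero-padded numbers."""
    # The nested {width} sets the minimum field width; 0 pads with leading zeros.
    return f"{group}-{server}-{first:0{width}d}-{last:0{width}d}.gz"

older = header_filename("alt.binaries.boneless", "usenetserver", 99558837, 100558837)
newer = header_filename("alt.binaries.boneless", "usenetserver", 267558837, 268558837)

print(older)
print(newer)

# With fixed-width numbers, string order and numeric order agree.
in_order = sorted([newer, older]) == [older, newer]
```

Twelve digits leaves room for article numbers well past the roughly 13 billion posts mentioned for boneless.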

Re: Header file import order

Posted: Sat Apr 18, 2015 9:50 pm
by ghend0
Looks good now. Thanks for making the changes.