Duplicate Detector Question

Technical support and discussion of Newsbin Version 6 series.

Postby jazzman4x » Tue Jun 18, 2013 2:32 pm

How susceptible is the Duplicate Detector to false positives, say, if I have downloaded millions of files? Does it use a fairly robust hash like SHA or MD5, with minimal collisions, or just CRC32, with a high probability of collisions?
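For a rough sense of scale, the birthday approximation estimates the chance that at least two of n distinct files share a hash value. The sketch below assumes the hash behaves like a uniformly random function of the given bit width, which is an idealization; the function name and the file count of one million are illustrative only.

```python
import math


def collision_probability(n_files: int, hash_bits: int) -> float:
    """Birthday approximation: chance that at least two of n_files distinct
    inputs share a digest, for an idealized uniform hash of hash_bits bits."""
    # Number of file pairs divided by the number of possible digests.
    expected_pairs = n_files * (n_files - 1) / 2 / 2**hash_bits
    # 1 - exp(-x), computed via expm1 so tiny probabilities do not round to 0.
    return -math.expm1(-expected_pairs)


if __name__ == "__main__":
    for name, bits in [("CRC32", 32), ("MD5", 128), ("160-bit hash", 160)]:
        print(f"{name:>12}: {collision_probability(1_000_000, bits):.3g}")
```

Under these assumptions, a 32-bit checksum is almost certain to produce at least one accidental collision across a million files, while a 128-bit or wider digest makes one vanishingly unlikely.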
jazzman4x
Occasional Contributor
Posts: 21
Joined: Sat Dec 15, 2012 12:15 am

Registered Newsbin User since: 01/04/09

Re: Duplicate Detector Question

Postby Quade » Tue Jun 18, 2013 2:54 pm

The hash is RIPEMD.

It doesn't use the whole file, though, just the first 24K, I believe. The hash is at least as good as MD5 (I think better). Any files with identical prefixes would get filtered out as duplicates. In my experience, that just doesn't happen. It's possible, but rare.
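A minimal sketch of the idea described above: hash only the leading bytes of each file and treat a repeated digest as a duplicate. This is not Newsbin's actual code. The 24 KiB prefix comes from the post; the class and function names are hypothetical, and SHA-256 stands in as the default digest because RIPEMD-160 is not available in every Python build (where it is, the algorithm name `"ripemd160"` can be passed instead).

```python
import hashlib

PREFIX_BYTES = 24 * 1024  # "first 24K" from the post; the exact size is an assumption


def prefix_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of the first PREFIX_BYTES bytes of data."""
    return hashlib.new(algorithm, data[:PREFIX_BYTES]).hexdigest()


class PrefixDuplicateDetector:
    """Hypothetical detector: remembers prefix digests of files already seen."""

    def __init__(self, algorithm: str = "sha256") -> None:
        self._algorithm = algorithm
        self._seen = set()

    def is_duplicate(self, data: bytes) -> bool:
        """Return True if a file with the same prefix was seen before;
        otherwise record this one and return False."""
        digest = prefix_digest(data, self._algorithm)
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False
```

With this scheme, two files that share their first 24 KiB but differ later are reported as duplicates, which is the rare false-positive case mentioned in the post.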
Quade
Eternal n00b
Posts: 44951
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Re: Duplicate Detector Question

Postby Ribble » Thu Jun 20, 2013 9:12 pm

Windows 7, AMD quad-core
Newsbin 6.50B11 2854

This has been happening for the last several versions.

Dupe detector doesn't let me "Read Message" twice, even if different messages with same "re: header".

I think that to read another "re:" message with the same header, I have to disable the dupe detector, then set previously read messages to "new".

Otherwise, the dupe detector works very well, very aggressive. Thank you.
Ribble
Active Participant
Posts: 92
Joined: Sun Jul 03, 2011 10:22 pm

Registered Newsbin User since: 05/04/09

Re: Duplicate Detector Question

Postby Quade » Thu Jun 20, 2013 9:58 pm

Ribble wrote:
Dupe detector doesn't let me "Read Message" twice, even if different messages with same "re: header".

It's not used for non-binary posts, so there's probably some other issue. It's possible that all the posts with identical subjects are getting lumped together and only one gets pulled.

Re: Duplicate Detector Question

Postby Ribble » Thu Jun 20, 2013 10:31 pm

The "re" message goes into Failed Files.

Message Status says "Failed Download [PAR Process] HFC: Filename In First Chunk Error: Re: (filename) 0%: Completed"

May have something to do with marking read message as old.

These are replies to my uploads. I have a lot of red subject lines, preceded by a "!" in a red circle.

I will try to narrow in on what is happening.
Last edited by Ribble on Thu Jun 20, 2013 10:45 pm, edited 1 time in total.

Re: Duplicate Detector Question

Postby Quade » Thu Jun 20, 2013 10:42 pm

Are you downloading them or Ctrl-R'ing them? Ctrl-R is how you read a post without actually downloading the file. If you're actually downloading them, then the dupe detector would apply.

Re: Duplicate Detector Question

Postby Ribble » Thu Jun 20, 2013 10:46 pm

Ctrl Right, menu, Read Message.

Many are working now. Must have something to do with the fact that I uploaded files and the "Re" are replies with same header.

There are also a lot of connections. I counted 24 open connections during upload. Server is set to "4".
No, my error. "Connections" says "4". I was counting blocks: there were 24 blocks being posted. Now there are 12, 13, 14 blocks being downloaded, and "Connections" still says "4".

Re: Duplicate Detector Question

Postby Quade » Fri Jun 21, 2013 1:01 am

There's some lookahead. It reduces CPU load. That's what you're seeing: blocks prepped for download that aren't downloading yet.
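One way to picture the lookahead: a fixed number of connections pull from a small buffer of blocks that have already been prepared, so the block list can show more entries than there are connections. The sketch below is a single-threaded toy simulation, not Newsbin's scheduler; the function name, the buffer size of 10, and the one-tick download time are all made up for illustration.

```python
from collections import deque


def simulate(total_blocks: int, connections: int = 4, lookahead: int = 10):
    """Yield (active, prepped) block counts for each tick of a toy download loop."""
    pending = deque(range(total_blocks))  # blocks not yet prepared
    prepped = deque()                     # prepared, waiting for a connection
    active = []                           # currently assigned to a connection
    while pending or prepped or active:
        # Top up the lookahead buffer with newly prepared blocks.
        while pending and len(prepped) < lookahead:
            prepped.append(pending.popleft())
        # Hand prepared blocks to any idle connections.
        while prepped and len(active) < connections:
            active.append(prepped.popleft())
        yield len(active), len(prepped)
        # Pretend every active block finishes within one tick.
        active.clear()


if __name__ == "__main__":
    for active, prepped in simulate(30):
        print(f"downloading: {active}  prepped and waiting: {prepped}")
```

In this toy model the number of blocks actually downloading never exceeds the connection limit, while the number of blocks visible in the list (downloading plus prepped) can be larger.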

Re: Duplicate Detector Question

Postby Ribble » Fri Jun 21, 2013 2:49 pm

Thanks Quade.

I think what is happening is that when "Free" RAM, as shown in the Windows Task Manager Performance tab, drops to or near zero, Read Message fails. Task Manager shows 16382 MB RAM installed, and "Available" is always around 12 GB. I'm not sure what "Free" and "Available" mean, and I watch these often, as I only reset this computer once per week and usually have about 64 processes and one or two dozen applications running. Yes, multitasking is fun.

When I run a program called FreeRam, I get a GB back as "Free" and Read Message works again, though it takes a few minutes, as though the command gets put at the bottom of the queue.

Re: Duplicate Detector Question

Postby Quade » Fri Jun 21, 2013 3:24 pm

If all your RAM is getting used up, running programs get their RAM flushed to disk, and it takes time for that data to get pulled back in. That's probably what the delay is. If I were you, I'd want to know why you're running out of RAM.

Re: Duplicate Detector Question

Postby Ribble » Fri Jun 21, 2013 4:41 pm

Firefox is the big user. Two open Firefox browsers.

Newsbin sits in the background of this (main) computer and just keeps chugging along, quietly doing its thing. Very robust. It works so well that when (rarely) something doesn't work, I pay attention and try to figure out why.

Thanks again Quade.

Re: Duplicate Detector Question

Postby Quade » Fri Jun 21, 2013 7:43 pm

The Android code is pretty much like a service. I've been thinking about making Newsbin a service now in order to really have it running in the background. The GUI would be optional, and you'd be able to shut down the GUI and/or even log out, and it would keep doing its thing. I'm thinking about doing that to make the Android GUI easier to test.

Re: Duplicate Detector Question

Postby Ribble » Fri Jun 21, 2013 8:10 pm

Do not take all the fun out of Newsbin. Some of us like to watch it do its thing.

I spend at least an hour per week just watching the Newsbin GUI uploading or downloading, sometimes comparing it with the Networking I/O display in Windows Task Manager. Newsbin is the default window for my secondary HD LCD and is there when nothing else is.

Don't want to complain, but 6.50B11 2854 has all the Icons ghosted. I know what they are from their location, otherwise I would be lost.

Ah well, since I am complaining.... Is there a way to see cross-posted groups under the Groups tab on main window?

And it took me 2 years to figure out how to cross-post (select two groups in Groups List and Post to Group).

Re: Duplicate Detector Question

Postby Quade » Fri Jun 21, 2013 8:35 pm

Naw, you'd still be able to watch. The engine just wouldn't be in the GUI.

I'd split the actual downloading part into a service and then access it remotely with the GUI, so you can still turn on the GUI and interact with it normally, but when you're done you can exit the GUI and it'll keep working in the background. Some people use a shared computer and might not want others to know what they're downloading, for example.
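As a rough illustration of that split, the sketch below keeps the "engine" in a background worker that owns the job queue, while a front end only submits jobs and asks for status. Everything here is hypothetical: the class and method names are invented, a thread stands in for a real service, and the "download" is just a list append. A real design would put the same small interface behind some form of remote access.

```python
import queue
import threading


class DownloadEngine:
    """Hypothetical background engine: keeps working with no GUI attached."""

    def __init__(self) -> None:
        self._jobs = queue.Queue()
        self._done = []
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel: shut the engine down
                break
            with self._lock:
                self._done.append(job)  # stand-in for the real download
            self._jobs.task_done()

    # The only surface a front end would talk to.
    def submit(self, name: str) -> None:
        self._jobs.put(name)

    def status(self) -> list:
        with self._lock:
            return list(self._done)

    def wait_idle(self) -> None:
        """Block until every submitted job has been processed."""
        self._jobs.join()

    def stop(self) -> None:
        self._jobs.put(None)
        self._worker.join()


if __name__ == "__main__":
    engine = DownloadEngine()
    engine.submit("post-1")
    engine.submit("post-2")
    # A GUI could exit here; the engine keeps draining its queue.
    engine.wait_idle()
    print(engine.status())
    engine.stop()
```

The point of the design is that the front end holds no download state of its own, so closing it does not interrupt the work already queued.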

Re: Duplicate Detector Question

Postby Ribble » Fri Jun 21, 2013 9:30 pm

From my point of view, I like to see everything contained within the program itself. If all I were running were Newsbin, then it would be easy to see that anything running would have to do with Newsbin.

In the real world, when something happens like I pick up a virus, spyware, Adware, etc., it is hard enough to see what should be there and what should not be there. Every now and then I go into Task Manager and "End Process" on things I haven't seen before, just to see what happens. And VLC for example, sometimes I get it so screwed up that it won't run until I "End Process". Re-booting the computer would fix the problem, but it takes about 20 minutes to set up the normally running programs after re-boot.

So, I am not violently against having an engine with external controls, but for me it only adds complexity. If you put hooks in the engine to make it also work with external controls, that is fine.

Re: Duplicate Detector Question

Postby Quade » Fri Jun 21, 2013 11:00 pm

Well, you wouldn't have to use it. Android already runs pretty much exactly the same code you're running on your PC for download and repair. This would just be another flavor of Newsbin. Regular Newsbin would still be around.

Re: Duplicate Detector Question

Postby Ribble » Sat Jun 22, 2013 12:07 am

Quade, I am a poor choice to be in this discussion. No cell phone, no pad, no laptop that works. Simply two heavy-hitter desktops and four HD LCDs. 20 years ago I was an expert in assembly language programming. It has been seven years since my last micro-controller build. Now it is logic gates, LEDs, MOSFETs, crystals, and soldering things under a microscope.

Re: Duplicate Detector Question

Postby Quade » Sat Jun 22, 2013 8:18 am

You'll get there. My prediction is that Android will eventually replace Windows for most people. 99% of the people in the world don't need a PC at home. A powerful tablet, Wi-Fi and the right software will serve most of their purposes.

