Database Duplicates

Technical support and discussion of Newsbin Version 6 series.

Postby BZee » Thu Oct 01, 2015 8:07 am

I downloaded several thousand image posts, both to individual files and to the image DB. When they finished, I used Clonemaster to check the individual files for duplicates (I have duplicate detection turned off, as I don't mind duplicates from the past, but would prefer not to get duplicates in the same session). I had several hundred exact duplicates, some with the same name and some with different names. After exporting the DB pictures to individual files, I checked them for duplicates and the total was zero. Does the DB eliminate duplicates by checking bitwise as well as by name?
BZee
Seasoned User
Posts: 459
Joined: Thu Sep 27, 2001 9:10 pm
Location: California

Registered Newsbin User since: 04/13/03

Re: Database Duplicates

Postby Quade » Thu Oct 01, 2015 9:27 am

Files are checked for duplicates on insert. So no matter the name or the subject field, only one copy of a given file is permitted into the DB. Unlike the main duplicate detector, it uses the MD5 of the whole file. If you add files and then delete them from the DB, new files with the same signature can be added again. It doesn't keep a historical record of files; it just matches against the files that are currently in the DB.
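
As an illustration only (this is not Newsbin's actual code), the behaviour described above can be sketched in a few lines of Python. The `ImageStore` class and its method names are hypothetical; the only things taken from the post are that the key is the MD5 of the whole file, that the name is ignored, and that deleting an entry lets the same content back in.

```python
import hashlib


class ImageStore:
    """Hypothetical stand-in for an image DB that rejects duplicates on insert."""

    def __init__(self):
        # Maps whole-file MD5 hex digest -> (name, data) of the single stored copy.
        self._by_md5 = {}

    def insert(self, name, data):
        """Store data unless identical content is already present.

        Returns True if stored, False if rejected as a duplicate.
        The file name plays no part in the decision.
        """
        digest = hashlib.md5(data).hexdigest()
        if digest in self._by_md5:
            return False
        self._by_md5[digest] = (name, data)
        return True

    def delete(self, data):
        """Remove the stored copy of this content, if any (no history is kept)."""
        self._by_md5.pop(hashlib.md5(data).hexdigest(), None)

    def __len__(self):
        return len(self._by_md5)


store = ImageStore()
first = store.insert("alice_01.jpg", b"same bytes")   # stored
second = store.insert("bob_01.jpg", b"same bytes")    # rejected: same content, different name
store.delete(b"same bytes")
third = store.insert("bob_01.jpg", b"same bytes")     # stored again, since nothing remembers the old copy
```

Because the store only knows about what it currently holds, an export from it can never contain two identical files, which would be consistent with a duplicate check on the export finding zero.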
Quade
Eternal n00b
Posts: 44988
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Re: Database Duplicates

Postby BZee » Fri Oct 02, 2015 6:36 am

One other problem in this area. I compared the files exported from the DB to the individual files. All the files in the DB were duplicates of the individual files, but 144 of the individual files were not in the DB. Repeated with another download session: 7536 individual image files downloaded (no duplicates). Files exported from the DB totaled 7479 so 57 of the downloaded images did not make it into the DB.
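
For anyone wanting to repeat this comparison without a GUI tool, here is a rough Python sketch that lists the downloaded files whose content does not appear anywhere in the export folder. The function names and folder arguments are made up for the example, and it matches by MD5 of the file contents rather than by name, on the assumption that exported names may differ from the originals.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def missing_from_export(download_dir, export_dir):
    """List files under download_dir whose content is absent from export_dir."""
    exported = {md5_of(p) for p in Path(export_dir).rglob("*") if p.is_file()}
    return sorted(
        p for p in Path(download_dir).rglob("*")
        if p.is_file() and md5_of(p) not in exported
    )
```

If the download folder really contains no duplicates, the length of the returned list should equal the difference between the two file counts (57 in the second session above).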

Re: Database Duplicates

Postby Quade » Fri Oct 02, 2015 10:43 am

This seems to go back to your original issue where you were missing thumbnails. You mentioned that all the files ended up on disk and only the thumbnails didn't work. I'm wondering now if that was really the case and whether the files simply didn't land on disk at all. All of your symptoms, including this one and the lack of thumbnails when you don't use the image DB, suggest that you have a number of files that simply don't land on disk properly.

The image DB gets fed after the file is assembled but before the file is closed the first time.

It might be interesting to try this test with a completely empty DB. Here is what I did:

1 - I started with a clean download folder.

2 - I added 8135 Image files to the download list.

3 - When they finished downloading, I checked and there were 8135 files in the folder.

4 - I exported the image DB for this set of files and had 8080 files in the new folder.

So the next step for me is to do a duplicate-detection pass on the 8135 original files and see if duplicates would explain why 55 files are missing from the image DB export. It's 55 files from a specific set of files. It's not a random occasional missing file.

Re: Database Duplicates

Postby Quade » Fri Oct 02, 2015 3:46 pm

Ok, using a dup detector, I found 55 identical files in the downloads. That's why the Image DB had 55 fewer files in it than the download folder.
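
A duplicate pass of this kind can be approximated with a short Python sketch. This is not the dup detector used above; the function names are invented for the example, it hashes each file's full contents with MD5, and it reads each file into memory in one go, which assumes the images are reasonably small.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_by_content(folder):
    """Map each distinct content digest to the names of the files that share it."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            groups[digest].append(path.name)
    return groups


def redundant_copies(groups):
    """Count files beyond the first in every group of identical content."""
    return sum(len(names) - 1 for names in groups.values())
```

The number of distinct contents is `len(groups)`, which is what a store that rejects duplicates on insert would end up holding. With the counts in this thread, 8135 files on disk minus 55 redundant copies gives 8080, matching the export.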

I'm not clear if the dup rejection is what you want or not what you want from your comments. In this case it was a two-person image set, and it had been posted under each person's name.

Re: Database Duplicates

Postby BZee » Sat Oct 03, 2015 5:51 am

BZee wrote: 7536 individual image files downloaded (no duplicates). Files exported from the DB totaled 7479 so 57 of the downloaded images did not make it into the DB.

The 7536 were the files on disk, and none were duplicates (checked with Clonemaster, which does a bit-by-bit comparison of the files).

Quade wrote: I'm not clear if the dup rejection is what you want or not what you want from your comments.

I was expecting results like yours - same number of files by both methods after duplicate detection.

Re: Database Duplicates

Postby BZee » Sat Oct 03, 2015 6:01 am

Quade wrote: This seems to go back to your original issue where you were missing thumbnails. You mentioned that all the files ended up on disk and only the thumbnails didn't work. I'm wondering now if that was really the case and whether the files simply didn't land on disk at all.

The files that didn't display thumbnails did get written to disk. I viewed them in ACDSee to see if I wanted to download the rest of the series (normally I select 6 posts in the series and look at the thumbnails as the files are downloading). While viewing them in ACDSee, I noticed that the ones that did not display a thumbnail also did not have a description, while the other files did.
