Why is there no standard Duplicate file finder in Puppy?

Using applications, configuring, problems
Puppyt
Posts: 907
Joined: Fri 09 May 2008, 23:37
Location: Moorooka, Queensland
Contact:

Why is there no standard Duplicate file finder in Puppy?

#1 Post by Puppyt »

THAT old chestnut. It rears its ugly head periodically in the forums, but things have been quiet lately, and my searches reveal no recent advances on the command-line offerings of fdupes, fslint, rdupes etc. There is a GUI for dupf that I downloaded via the Ubuntu repos with PPM (dupfinder 0.8), but it returns an error with the 32-bit libraries on board:

Code: Select all

dupfgui: error while loading shared libraries: libtiff.so.4: wrong ELF class: ELFCLASS64   
I don't know how to address that problem - I'm chasing real-world deadlines again, and it's just so frustrating that there doesn't appear to be a straightforward solution in Puppy Linux.
-Aha - finally got fdupes working; I had problems with spaces in directory names that were borking my commands. I'll see how that goes. jdupes is a newer fork on github that claims to be 10x faster, but that's a battle I can't pick at this point in time.
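On the spaces problem: the usual culprit is an unquoted path expansion, which splits a directory name like "My Backups" into separate words before the command ever sees it. A minimal sketch of the pitfall (the directory name here is made up for illustration):

```shell
# Hypothetical directory with a space in its name.
dir="My Backups"
mkdir -p "$dir"
echo data > "$dir/file.txt"

# Unquoted, $dir expands to the two words "My" and "Backups", so a
# command such as `fdupes -r $dir` would be handed two bogus paths.
# Quoted, the name travels as a single argument:
ls "$dir"             # prints: file.txt
# fdupes -r "$dir"    # the same quoting fixes the fdupes invocation
```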

I intend to revisit this topic periodically with the results of my spring-cleaning attempts, but if anyone has a satisfactory duplicate file finder solution (preferably GUI), please post your workarounds. My main gripe is that it seems strange not to have a built-in application within Puppy. I wonder how difficult it would be to expand pFind's "advanced" functions into duplicate file finding?
Thanks in advance for your input.
Search engines for Puppy
[url]http://puppylinux.us/psearch.html[/url]; [url=https://cse.google.com/cse?cx=015995643981050743583%3Aabvzbibgzxo&q=#gsc.tab=0]Google Custom Search[/url]; [url]http://wellminded.net63.net/[/url] others TBA...

trapster
Posts: 2117
Joined: Mon 28 Nov 2005, 23:14
Location: Maine, USA
Contact:

#2 Post by trapster »

This will list duplicate files in a console.
Run it in the directory you want to search:

Code: Select all

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
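For clarity, here is the same pipeline with each stage commented, run as a minimal self-contained demo against a scratch directory (the file names are hypothetical, made up for illustration):

```shell
# Hypothetical demo tree: two files with identical content, one unique.
tmp=$(mktemp -d); cd "$tmp"
echo "same content" > a.txt
echo "same content" > b.txt
echo "something else" > c.txt

find -not -empty -type f -printf "%s\n" |          # print the size of every non-empty file
  sort -rn | uniq -d |                             # keep only sizes that occur more than once
  xargs -I{} -n1 find -type f -size {}c -print0 |  # re-run find for each repeated size
  xargs -0 md5sum |                                # hash only those same-size candidates
  sort | uniq -w32 --all-repeated=separate         # group lines whose first 32 chars (the md5) match
```

Only a.txt and b.txt survive to the hashing stage and are printed as one duplicate group; c.txt has a unique size and is filtered out before any hashing happens.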
trapster
Maine, USA

Asus eeepc 1005HA PU1X-BK
Frugal install: Slacko
Currently using full install: DebianDog

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#3 Post by MochiMoppel »

trapster wrote:Run this in the directory you want to search
...and go out for lunch :lol:
At least on my system this is VERY slow, and I wonder why this monster was overwhelmingly voted the best solution on commandlinefu.com.
The claim that it "saves time by comparing size first, then md5sum" sounds convincing but unfortunately isn't true. My result for directory /usr/share:
real 2m56.738s
user 0m33.748s
sys 2m22.264s

The No. 3 on the list is much shorter and faster. Slightly modified for comparability:

Code: Select all

find -not -empty -type f -exec md5sum '{}' ';' | sort | uniq -w32 --all-repeated=separate
real 0m24.203s
user 0m0.767s
sys 0m2.293s

That was much better, but still not good. This would be really fast:

Code: Select all

find  -not -empty -type f -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
real 0m1.235s
user 0m0.627s
sys 0m0.737s

Last edited by MochiMoppel on Wed 17 Jun 2020, 13:03, edited 2 times in total.

Puppyt
Posts: 907
Joined: Fri 09 May 2008, 23:37
Location: Moorooka, Queensland
Contact:

#4 Post by Puppyt »

Thanks trapster :) Still on the hunt for a GUI of some sort for assessing which backups/duplicates to delete - dupfgui might be the best of a (?) Linux bunch.

UPDATE: Thanks for the suggested tweaks to the code, MochiMoppel - we posted simultaneously, it seems...

hamoudoudou

Dupfinder installed tahrpup 5.8.3

#5 Post by hamoudoudou »

Dupfinder installed on tahrpup 5.8.3 (home-made Puplet) to check files stored on USB.
Attachments
quickpet.png
Listed in menu 'utility' once installed
(67.99 KiB) Downloaded 234 times

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#6 Post by MochiMoppel »

I wrote:That was much better, but still not good. This would be really fast:

Code: Select all

find  -not -empty -type f -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
I hate to quote myself, but in this case I have to, because I was wrong. My code is fast when there are many relatively small files to compare, as in /root or /usr/share. However, when scanning directories with relatively large files, like audio or image files, my approach of hashing every file is much too slow. In such cases it makes sense to first search for files of the same size and then - in a second step - check only those same-size files for duplicates.

My problem with the code posted by trapster is its apparent inefficiency: it runs the find command multiple times, first to find the same-size files and then, once for each repeated size, again to find the matching files.

After some tinkering I found a way to use find only once. This makes the code almost 10 times faster than the trapster/commandlinefu.com code:

Code: Select all

find  -not -empty -type f -printf '%12s\t%p\n' | sort -n | uniq -Dw12 | cut -f2- | xargs  -d '\n' md5sum | sort | uniq -w32 --all-repeated=separate
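The same one-liner unpacked stage by stage, run here as a self-contained demo against a scratch directory (file names are hypothetical, made up for illustration). Note that find is invoked exactly once, and that the newline-delimited xargs also copes with spaces in file names:

```shell
# Hypothetical demo tree: two same-size duplicates, one unique file.
tmp=$(mktemp -d); cd "$tmp"
printf 'dup' > 1.bin
printf 'dup' > 2.bin
printf 'unique!' > 3.bin

find -not -empty -type f -printf '%12s\t%p\n' |  # size right-aligned in 12 chars, tab, path
  sort -n |                                      # bring equal sizes next to each other
  uniq -Dw12 |                                   # keep ALL lines whose first 12 chars (the size field) repeat
  cut -f2- |                                     # drop the size column, leaving only the paths
  xargs -d '\n' md5sum |                         # hash only the same-size candidates
  sort | uniq -w32 --all-repeated=separate       # group identical md5 hashes
```

Here 1.bin and 2.bin share a size and a hash, so they come out as one duplicate group; 3.bin has a unique size and is discarded by uniq -Dw12 before hashing.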

Puppyt
Posts: 907
Joined: Fri 09 May 2008, 23:37
Location: Moorooka, Queensland
Contact:

#7 Post by Puppyt »

Thanks MochiMoppel :) Much appreciated!
