[clue-tech] CLUE Talk Mailing list mbox file too big to rsynch

Angelo Bertolli angelo.bertolli at gmail.com
Sun Feb 1 19:45:58 MST 2009


On Sun, Feb 1, 2009 at 6:56 PM, David L. Anselmi <anselmi at anselmi.us> wrote:

> Jed S. Baer wrote:
>
>> And looking around in the clue-talk directory on the server, I see
>> nothing amiss. But in a tree that size, it'd be easy not to notice it
>> using visual inspection.
>>
>
> You could do:
>
> find . -type f | sort | xargs cksum
>
> on both the server dir and your dir and see how they compare.  That would
> tell you whether the sync worked (modulo any more recent changes) and also
> exercise the server file system.
>
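The checksum comparison suggested above can be sketched as a small bash function. This is a sketch under assumptions, not a tested tool. `checksum_listing` is a hypothetical name. It relies on `find -print0`, `sort -z` and `xargs -0 -r` as provided by GNU findutils/coreutils. Those flags matter because the plain `find . -type f | sort | xargs cksum` form splits filenames that contain spaces. The function changes into each directory first, so both listings contain matching relative paths and can be compared with `diff`.

```shell
#!/bin/bash
# checksum_listing DIR (hypothetical helper): print one "CRC SIZE PATH"
# line per regular file under DIR, with paths relative to DIR and sorted
# in the C locale, so listings taken on two hosts compare line by line.
# Assumes GNU find/sort/xargs for -print0, -z, -0 and -r.
checksum_listing() {
    (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r cksum)
}

# Usage sketch (directory paths are placeholders):
#   on the server:  checksum_listing /path/to/clue-talk > server.txt
#   locally:        checksum_listing ~/clue-talk > local.txt
#   then:           diff server.txt local.txt
```

Any line that `diff` reports is a file that is missing, a different size, or has a different CRC on one side.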

I wrote a findd (find duplicate) script for this sort of thing.  It's old
and probably makes excessive use of temporary files, but it works.


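#!/bin/bash
# findd: report files under the current directory whose MD5 checksums
# match, i.e. files that MIGHT be duplicates.

sumexec=/usr/bin/md5sum

# Private temp directory, removed automatically when the script exits
tempdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tempdir"' EXIT
tempfile=$tempdir/findd

echo "tempfiles: $tempfile"

# Generate md5 sums (NUL-delimited names survive spaces and backslashes)
: > "$tempfile.sums"
find . -print0 | while IFS= read -r -d '' file
do
   if [ -f "$file" ]
   then
      echo "checking $file"
      "$sumexec" "$file" >> "$tempfile.sums"
   fi
done

# Sort by checksum so identical sums end up adjacent
sort "$tempfile.sums" > "$tempfile.sorted"

# Keep only the checksum column, then collapse adjacent repeats
awk '{print $1}' "$tempfile.sorted" > "$tempfile.sums"
uniq "$tempfile.sums" > "$tempfile.uniq"

echo "The following files have matching checksums."
echo "This means they MIGHT be duplicates."
echo "--------------------"
# Checksums that uniq dropped (diff "<" lines) are the repeated ones
diff "$tempfile.sums" "$tempfile.uniq" | grep '^< ' | cut -d " " -f2 \
   > "$tempfile.results"
uniq "$tempfile.results" | while read -r sum
do
   # Print every file carrying this checksum
   grep "^$sum " "$tempfile.sorted"
   echo ""
done
echo "--------------------"

exit 0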
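Because findd leans on several temporary files, here is a hedged sketch of the same duplicate check written as a single pipeline. It uses a different technique from the diff-against-uniq trick in the script. It relies on GNU `uniq -w32 --all-repeated=separate`, which compares only the first 32 characters of each sorted `md5sum` line (the hex digest) and prints every member of each repeated group, with a blank line between groups. `findd_pipe` is a hypothetical name, and the sketch assumes GNU coreutils and findutils.

```shell
#!/bin/bash
# findd_pipe [DIR] (hypothetical): list files under DIR (default ".")
# whose MD5 digests match, grouped and separated by blank lines.
# Assumes GNU md5sum, xargs -0 -r, and uniq -w / --all-repeated.
# Caveat: md5sum escapes names containing backslashes or newlines
# (prefixing the line with "\"), which shifts the digest column.
findd_pipe() {
    find "${1:-.}" -type f -print0 \
        | xargs -0 -r md5sum \
        | sort \
        | uniq -w32 --all-repeated=separate
}
```

As with findd, a matching MD5 only means the files MIGHT be duplicates. Run `cmp` on a pair before deleting anything.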