[clue-tech] CLUE Talk Mailing list mbox file too big to rsynch
Angelo Bertolli
angelo.bertolli at gmail.com
Sun Feb 1 19:45:58 MST 2009
On Sun, Feb 1, 2009 at 6:56 PM, David L. Anselmi <anselmi at anselmi.us> wrote:
> Jed S. Baer wrote:
>
>> And looking around in the clue-talk directory on the server, I see
>> nothing amiss. But in a tree that size, it'd be easy not to notice it
>> using visual inspection.
>>
>
> You could do:
>
> find . -type f | sort | xargs cksum
>
> on both the server dir and your dir and see how they compare. That would
> tell you whether the sync worked (modulo any more recent changes) and also
> exercise the server file system.
>
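The comparison suggested above can be sketched roughly as follows. This is only an illustration: the function names `manifest` and `compare_trees` are made up here, and it assumes GNU `sort -z` and `xargs -0 -r` so that file names containing spaces survive the pipeline.

```shell
#!/bin/bash
# manifest DIR: print one cksum line ("checksum size path") for every regular
# file under DIR, sorted by path. Paths are relative to DIR so that two trees
# rooted in different places produce comparable output.
manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r cksum )
}

# compare_trees A B: diff the manifests of two trees.
# Returns 0 if they match, non-zero (and prints the differing lines) otherwise.
compare_trees() {
    ct_left=$(mktemp) || return 2
    ct_right=$(mktemp) || { rm -f "$ct_left"; return 2; }
    manifest "$1" > "$ct_left"
    manifest "$2" > "$ct_right"
    diff "$ct_left" "$ct_right"
    ct_status=$?
    rm -f "$ct_left" "$ct_right"
    return $ct_status
}
```

When the two trees live on different machines, as they do here, one would presumably run `manifest` on each side, save the output to a file, copy one file across, and `diff` the two by hand.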
I wrote a findd (find duplicates) script for this sort of thing. It's old
and probably creates more temporary files than it needs to, but it works.
#!/bin/bash
sumexec=/usr/bin/md5sum
tempfile=/tmp/$(date | $sumexec | cut -d " " -f1)
echo "tempfiles: $tempfile"
# Generate md5 sums
find . | while IFS= read -r file
do
    if [ -f "$file" ]
    then
        echo "checking $file"
        $sumexec "$file" >> $tempfile.sums
    fi
done
# Sort files
sort $tempfile.sums > $tempfile.sorted
# Get unique entries
awk '{print $1}' $tempfile.sorted > $tempfile.sums
uniq $tempfile.sums > $tempfile.uniq
echo "The following files have matching checksums"
echo "This means they MIGHT be duplicates."
echo "--------------------"
diff $tempfile.sums $tempfile.uniq | fgrep "<" | cut -d " " -f2 > $tempfile.results
# For each duplicated checksum, list every file that has it
uniq $tempfile.results | while read -r sum
do
    fgrep "$sum" $tempfile.sorted
    echo ""
done
echo "--------------------"
rm $tempfile.*
exit 0
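
For comparison, here is a shorter sketch of the same idea that avoids temporary files altogether. It is not a drop-in replacement for the script above: the function name `find_dupes` is made up for illustration, and it relies on the GNU `uniq` options `-w` and `--all-repeated`, which other `uniq` implementations may not have. It also assumes plain file names; md5sum prefixes its output line with a backslash when a name contains a backslash or newline, which would shift the 32-character comparison window.

```shell
#!/bin/bash
# find_dupes [DIR]: print groups of files under DIR (default: current
# directory) whose md5 checksums match. Groups are separated by a blank
# line; files whose checksum occurs only once are not printed.
find_dupes() {
    find "${1:-.}" -type f -exec md5sum {} + |
        sort |
        uniq -w32 --all-repeated=separate   # compare only the 32-char hex digest
}
```

As with the original script, a matching checksum only means the files MIGHT be duplicates; `cmp` on a suspect pair would confirm it.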