I don’t know about you, but data management can be a bit of a headache sometimes. At work we’re in a situation where our managed backups failed to back up our MySQL databases: the backup process needs twice the size of the database directory free to run (presumably because it copies the files, compresses the copy, and then ships it to another machine). This is quite annoying really; if your databases take up, say, 10G, you need another 20G free just to back them up!
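Incidentally, one way to dodge that twice-the-size problem is to stream the compression, so an uncompressed intermediate copy never hits disk. A minimal sketch of the idea (the /tmp paths and filenames are made up for demonstration):

```shell
# simulate a "database directory" with a file in it
mkdir -p /tmp/demo-db
echo "some table data" > /tmp/demo-db/table1.ibd

# stream the directory straight through tar+gzip: the compressed archive
# is written in one pass, so no uncompressed intermediate copy is created
tar czf /tmp/demo-db.tar.gz -C /tmp demo-db

ls -lh /tmp/demo-db.tar.gz
```

The same streaming idea works for MySQL itself: piping mysqldump’s output straight through gzip (or even over ssh to another box) means the extra local space is never needed.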

Obviously sites grow, and so do their databases. One of the main things we like to clear out is error_log files. Some sites’ developers seem too lazy to turn error reporting off (or the site was built before we knew what error_reporting was!), and the files can get quite large when a site has been running for years, even if it’s just logging that your favicon is missing, let alone everything else!
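Those runaway error_log files can be hunted down with find as well. A quick sketch using a throwaway directory (the /tmp path and the 1MB threshold are just examples, not anything from our setup):

```shell
# build a sample site tree with an oversized error_log for demonstration
mkdir -p /tmp/demo-site
dd if=/dev/zero of=/tmp/demo-site/error_log bs=1k count=2048 2>/dev/null

# find error_log files over 1MB and empty them -- truncating rather than
# deleting means PHP can keep appending without needing to recreate the file
find /tmp/demo-site -type f -name error_log -size +1M -exec truncate -s 0 {} \;

ls -l /tmp/demo-site/error_log
```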

Anyways, back to the point!

How do you find files over a certain size that you can delete? Well, a little bit of research led me to this:

A BASH Script to Find Large Files on a Linux Server

Which includes a shell script from those server daddies over at Rackspace!

#!/bin/bash
# if nothing is passed to the script, show usage and exit
[[ -n "$1" ]] || { echo "Usage: findlarge [PATHNAME]"; exit 1; }
# a simple find: "$1" is the path passed to the script (quoted in case it contains spaces)
find "$1" -type f -size +100000k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'

Save the file (e.g. findlarge.sh) and then make it executable with the following command:

chmod a+x findlarge.sh

Now run the script and send any output to a new file (largefiles.txt):

./findlarge.sh / > largefiles.txt &

Good stuff! The script can easily be modified to look for a different size. Currently it looks for files over 100000k (roughly 100MB).
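For instance, here’s one tweak of that find line: a smaller threshold, plus du and sort so the biggest offenders come out on top. The demo directory and the 500k cutoff are my own choices for illustration, not part of the original script:

```shell
# build a demo tree with files of different sizes
mkdir -p /tmp/demo-scan
dd if=/dev/zero of=/tmp/demo-scan/big.log bs=1k count=1024 2>/dev/null
dd if=/dev/zero of=/tmp/demo-scan/small.log bs=1k count=10 2>/dev/null

# same idea as the script, but with a 500k threshold and the results
# sorted largest-first (du -h plus sort -rh instead of ls -lh and awk)
find /tmp/demo-scan -type f -size +500k -exec du -h {} + | sort -rh
```

Only big.log clears the 500k threshold here, so it’s the only file listed.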

Image Credit: torkildr