BASH Script to clean duplicate files

I have folders that contain some backup files and configs. Many of these backups are taken daily, and most of them don't change from one day to the next. The folder looks like this:

config_A_2021-01-19.txt
config_A_2021-01-18.txt
config_A_2021-01-17.txt
config_B_2021-01-19.txt
config_B_2021-01-18.txt
config_B_2021-01-17.txt

It is possible that the first 3 files are identical to each other, and the last 3 files as well. I needed a script to clean them up, keeping only the most recent copy of each:

config_A_2021-01-19.txt
config_B_2021-01-19.txt
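
Before deleting anything, it is easy to confirm that the backups really are byte-for-byte identical by comparing their checksums (using the example filenames above):

# identical files print the same hash in the first column
sha256sum config_A_2021-01-*.txt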

So, here we go:

#!/bin/bash

# One entry per unique file content: sha256 hash -> path of the file kept for that hash
declare -A hashes

for file1 in *
do
  sha256_output=$(sha256sum "$file1" | awk '{print $1}')
  if [ -z "${hashes[$sha256_output]}" ]; then
    # First time this content is seen: remember the file
    hashes[$sha256_output]=$file1
  else
    echo "Duplicate with files:"
    file2="${hashes[$sha256_output]}"
    echo "$file1"
    echo "$file2"
    echo '---------------------'
    # Keep whichever of the two is newer, delete the other
    if [ "$file1" -nt "$file2" ]
    then
      printf '%s\n' "$file1 is newer than $file2"
      rm "$file2"
      hashes[$sha256_output]=$file1
    else
      printf '%s\n' "$file2 is newer than $file1"
      rm "$file1"
      hashes[$sha256_output]=$file2
    fi
  fi
done
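
One way to run it (the paths and the file name dedupe.sh below are just placeholders) is to save the script outside the backup folder and execute it from inside that folder, so the script file itself is not hashed:

cd /path/to/backups        # the folder with the dated config files
bash /path/to/dedupe.sh    # the script above, saved under any name you like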

TODO: The only current problem with the script is that if it finds a directory in the folder, it will error. To be fixed later.
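
A likely fix, sketched here as an untested idea rather than part of the script above, is to skip anything that is not a regular file right at the top of the loop:

for file1 in *
do
  # skip directories and anything else that is not a regular file
  [ -f "$file1" ] || continue
  echo "would process: $file1"
done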


About Me

My name is Omar Qunsul. I write these articles mainly as a future reference for me. So I dedicate some time to make them look shiny, and share them with the public.

You can find me on Twitter @OmarQunsul, and on LinkedIn.

