The fun and exciting world of offsite data backups

Scanning all these magazines means I also have to back them up. If the whole point is to preserve them, it’s kinda dumb just to palm the files off to the Internet Archive, delete my copies and hope for the best.

Whilst the Internet Archive is great and all, there’s a non-zero chance that one day their funding will be cut or donations will dry up and they’ll need to rationalise their storage. That could mean only the PDFs stick around, certain types of content disappear, or something else happens that costs us the high-quality, large originals.

What am I gonna do with this data???

I thought I’d take a good look at my offsite backup options based around storing 5 TB for 10 years. I don’t know exactly how much data I’ll be hoarding over that timespan, but 5 TB for a decade feels like a good starting point for comparison purposes.
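To make "5 TB for 10 years" concrete, here's a rough sketch of the sums I have in mind for comparing options. The function name, the option labels and every price below are made-up placeholders for illustration, not quotes from any real provider, so swap in real figures before trusting the output.

```python
def total_storage_cost(tb, years, price_per_tb_month, upfront=0.0):
    """Total cost of keeping `tb` terabytes for `years` years.

    price_per_tb_month is the recurring charge per terabyte per month;
    upfront covers one-off costs such as buying drives.
    """
    months = years * 12
    return upfront + tb * price_per_tb_month * months


# Hypothetical prices purely for illustration (assumptions, not real quotes).
options = {
    "cloud storage at 5.00 per TB per month": total_storage_cost(5, 10, 5.00),
    "cold storage at 1.00 per TB per month": total_storage_cost(5, 10, 1.00),
    "two drives at a mate's place, 150.00 each": total_storage_cost(5, 10, 0.0, upfront=300.0),
}

# Print the options from cheapest to most expensive.
for name, cost in sorted(options.items(), key=lambda item: item[1]):
    print(f"{name}: {cost:,.2f}")
```

A flat per-terabyte monthly rate is a simplification. It ignores retrieval fees, price changes and the fact that the archive will grow over the decade rather than start at 5 TB.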


My scanning & upload workflow

The short version of this post is simply:

  • Cut the spine off the magazine
  • Scan it in at 600 DPI as an uncompressed TIFF
  • Convert those TIFFs into a bunch of JPGs inside a PDF
  • Put all the TIFFs into a ZIP file
  • Upload the PDF and ZIP file to the Internet Archive
  • Back up the original scans
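The "put all the TIFFs into a ZIP file" step is the easiest one in that list to script. Here's a minimal sketch using nothing but Python's standard library. The function name and the folder layout (one folder of TIFFs per magazine) are my assumptions for the example, not a description of any particular tool.

```python
import zipfile
from pathlib import Path


def zip_tiffs(scan_dir, zip_path):
    """Pack every .tif/.tiff file in scan_dir into zip_path.

    Returns the list of file names added, in sorted order.
    """
    scan_dir = Path(scan_dir)
    # Collect the TIFFs first, ignoring anything else in the folder.
    tiffs = sorted(
        p for p in scan_dir.iterdir()
        if p.is_file() and p.suffix.lower() in {".tif", ".tiff"}
    )
    # ZIP_DEFLATED compresses losslessly, so the scans come back out bit-for-bit.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for tiff in tiffs:
            # Store just the file name so the ZIP doesn't embed local folder paths.
            archive.write(tiff, arcname=tiff.name)
    return [tiff.name for tiff in tiffs]
```

Calling something like `zip_tiffs("scans/issue_01", "issue_01_tiffs.zip")` (hypothetical paths) would produce one ZIP per magazine, ready to upload alongside the PDF.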

But that’s no use to you if you’ve never scanned in a magazine or book before and want to get started, is it? Let me break down each step and explain the methods that have worked for me.