Survived today so far
If it survives the nightly jobs at 0700 tomorrow, but falls over later in the day, I think we'll have another piece of the puzzle. On Monday, Wednesday, and Friday, a script deletes all the generated copies of photos (thumbnails, full-sized, AvB-sized, etc., but obviously not the originals) in order to keep disk space usage manageable. However, that involves a lot of disk I/O. It's entirely possible that this is what killed the site on Monday, and it might kill it again tomorrow.
Quite why this is a problem now, I have no idea. It may simply be that there are so many photos that, even with lower traffic than we're used to (not least because the site keeps falling over), so many get viewed, appear in searches, etc., that deleting them takes us over some limit, especially when the backup is running. Paradoxically, the answer to this might be to run the job more often, giving it less to do each time.
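To make the "little and often" idea concrete, here's a rough sketch of what a gentler cleanup could look like. It is not the actual script: the function name, the cache directory, the age cutoff, and the batch and pause settings are all made up for illustration. The idea is just to delete only the generated copies that are older than some cutoff, and to pause between small batches so the disk gets a breather.

```python
import os
import time


def purge_derived_copies(cache_dir, max_age_seconds, batch_size=200,
                         pause_seconds=1.0, now=None):
    """Delete generated photo copies older than max_age_seconds.

    Hypothetical sketch: cache_dir is assumed to hold only regenerable
    copies (thumbnails etc.), never the originals.
    """
    now = time.time() if now is None else now
    deleted = 0
    in_batch = 0
    # os.walk lists each directory before yielding it, so removing
    # files as we go does not disturb the traversal.
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if now - os.path.getmtime(path) < max_age_seconds:
                continue  # recently generated, leave it alone
            os.remove(path)
            deleted += 1
            in_batch += 1
            if in_batch >= batch_size:
                # Pause between batches to spread out the disk I/O.
                time.sleep(pause_seconds)
                in_batch = 0
    return deleted
```

Run from a frequent scheduled job, something like this would have much less to do on each pass than a three-times-a-week purge, and the pauses could be lengthened while the backup is running. Whether that actually keeps us under whatever limit we're hitting is still a guess.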
Kind of feeling my way through this one, as you can tell. But we'll get there.
My friend and I applied for airline jobs in Australia, but they didn't Qantas.