Do any cloud providers create backups on top of replication though?
Backing up databases (terabytes) is feasible; they're not that big.
But an entire cloud storage service for photos and videos for millions of users? We're talking maybe exabytes. The notion of making "separate" backups seems cost-prohibitive.
I am curious, though -- for services like Dropbox or Google Drive, how many replicas are there of your files? I know there must be redundancy in case a disk fails, but do they keep 2 instances of your data or more? And are the instances spread out geographically, some or all?
The short version is that the backups are a mix of short-term local backups + backups across the network on distributed filesystems + offline tape backup.
I can't speak to Drive, but the tape backup is certainly used for GMail. (There's a case study about Gmail having to restore from backup in the above link.)
"Do any cloud providers create backups on top of replication though?"
We do.[1]
For exactly 1.75x our normal pricing, we will replicate your entire account, nightly, to a geo-redundant site which is not open for normal customer use (and, therefore, has a lower risk profile). This GR site is the he.net core datacenter, in Fremont.
It's also worth pointing out that replication to rsync.net buys you malware/ransomware protection since your account is snapshotted, by ZFS, nightly, and those snapshots are immutable (read-only).
I've been wondering about ransomware protection through snapshots. Presumably (and I do know more or less nothing about it) the malware aspect of it is present on the system significantly before the ransomware aspect is triggered - so restoring to yesterday's backup just puts you back in the position to get pwn3d again? How do companies get around this?
I don't see the size as the big problem. If one cloud can handle the size, the backup storage system can too. For me, the real issue is timing. How do you back up a cloud full of constantly changing data? Do you draw a line in the sand, an image of the cloud state at a particular moment? Is that even possible? You have to back up smaller chunks, individual accounts, but eventually that just looks like another software-managed internal array rather than a true duplicate. Your backup system is just as susceptible to deletion errors as the cloud it lives within.
"For me, the real issue is timing. How do you back up a cloud full of constantly changing data? Do you draw a line in the sand, an image of the cloud state at a particular moment?"
I can only describe what we do, and, of course, there is an enormous scale difference here, but ...
Every single rsync.net account is its own ZFS filesystem, which means every single account gets snapshotted[1] nightly on a schedule. This means the enormous operation of "backing up" all of rsync.net happens in small, manageable chunks.
Of course, the ZFS snapshots of a customer account are not "backups" per se, but if a customer chooses "geo-redundant" storage in another facility, it is those very same snapshots that we send over (via zfs send). Those are, indeed, backups.[2]
The most interesting part, in my opinion, is that the daily/weekly/monthly snapshots are immutable. So you can publish your rsync.net credentials or suffer a disgruntled employee or ransomware attack, etc., and those snapshots remain safe - they are read-only.
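For concreteness, the mechanism described above (per-dataset nightly snapshots, immutable once taken, then replicated off-site) looks roughly like this in generic ZFS administration. This is a sketch with made-up pool/dataset names and dates, not rsync.net's actual tooling:

```
# Nightly: take a snapshot of one customer's dataset. Snapshots are
# read-only from the moment they exist.
zfs snapshot tank/customers/alice@2021-02-09

# Ransomware recovery is just reading files back out of a snapshot,
# which ZFS exposes under a hidden .zfs/snapshot directory:
cp -a /tank/customers/alice/.zfs/snapshot/2021-02-08/. /tank/customers/alice/

# Geo-redundancy: send the snapshot (incrementally, relative to the
# previous one) to a receiver at the second site.
zfs send -i @2021-02-08 tank/customers/alice@2021-02-09 \
  | ssh gr-site zfs receive tank/customers/alice
```

Because the snapshot is read-only, an attacker with the account's credentials can trash the live filesystem but cannot rewrite yesterday's snapshot.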
So it does not guarantee that the filesystem is in an application-consistent state before it snapshots. I.e., we could be midway through applying a database transaction, or there could be a reference in one file to another file that doesn't exist (yet).
It could be argued that one needs to do a site wide "write cache flush", stop, and snapshot. Not that I think for one second the service provider should (or could) be in a position to detect when a good time to snapshot might be....
It can't be "just" as susceptible as what happened here.
Even if you do the straightforward thing, no frozen point in time, just a rolling copy of all the files onto durable media, you may end up with logically inconsistent data: files from later in the backup may never have actually existed alongside files from earlier in it. But if you write down timestamps periodically as you go, you end up in a far better end state than "I lost all the files".
At worst, you end up with "I may have lost the last 3 minutes' worth of changes from the end of my last backup".
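The rolling-copy-plus-timestamps idea can be sketched in a few lines. This is an illustrative toy (the function name and manifest format are made up), not a real backup tool: it copies files one at a time with no global freeze, and records when each file was captured so you know which tail of the backup to distrust.

```python
import json
import os
import shutil
import time

def rolling_backup(src_dir, dst_dir):
    """Copy files one at a time (no frozen point in time), recording when
    each file was captured so the inconsistency window can be bounded."""
    manifest = []
    os.makedirs(dst_dir, exist_ok=True)
    for root, _dirs, files in os.walk(src_dir):
        for name in sorted(files):
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_dir)
            dst = os.path.join(dst_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copies content and mtime
            manifest.append({"path": rel, "copied_at": time.time()})
    # The manifest is the "write down the timestamps" part: for each file
    # you know when it was captured, so at worst you distrust the tail.
    with open(os.path.join(dst_dir, "manifest.json"), "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest
```

Files captured late in the walk may be newer than files captured early, so the copy is not a point-in-time image, but it is strictly better than nothing, and the timestamps tell you exactly how fuzzy it is.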
> Do any cloud providers create backups on top of replication though?
Yes - I work at one of the FAANGs on a team that's doing precisely this. We develop an internal disaster recovery tool that creates backups of data files that can't be touched by the creating application, and that can be read back in a disaster event to recover the data.
Just using AWS S3 as an example, you can copy a bucket to another bucket (preferably in a different region) with a policy of retaining all versions, preventing a delete, whether malicious or accidental, from being reflected in the backup copy.
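As a sketch, an S3 replication configuration along those lines (the V2 rule schema used with put-bucket-replication) replicates every new object version to the backup bucket while deliberately not replicating delete markers. The account ID, role, and bucket names here are made up; both buckets need versioning enabled for old versions to survive:

```json
{
  "Role": "arn:aws:iam::111122223333:role/s3-replication-role",
  "Rules": [
    {
      "ID": "replicate-all-keep-versions",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": { "Bucket": "arn:aws:s3:::my-backup-bucket" }
    }
  ]
}
```

With delete-marker replication disabled, a delete in the source bucket never propagates, so the backup bucket only ever accumulates data.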
There is also the option of using Glacier for lower cost long term storage.
If you are asking does regular S3 provide a true backup capability out of the box, it does not, or at least did not last time I actively looked at this about 2 years ago.
Do we really think AWS doesn't do periodic offline backups of stuff stored in S3, precisely so that they don't find themselves in this exact embarrassing scenario? Regardless of whether it's user-facing for AWS customers, I'd hope they do, or certainly that they did before they got a good sense of the long-term reliability of S3 as a big system.
I think they do all sorts of backup stuff - and as a consumer of S3 you need to evaluate how resilient their backing up is. S3 is pretty murky about permanence guarantees so I'd always look at making sure there is a separately maintained replication script to some medium I have control over if that data is irreplaceable and the costs of losing it are significant to the business.
Judging those costs and making that call is a complex matter of course.
What would such a service then do as an end-user beyond what `aws s3 sync s3://my/bucket /mnt/my/backup/drive/` does?
(Honest question, not provocation. To me either you trust AWS or any party to never lose your data [unwise], or you basically ask them to offer a way to rsync, which AWS does)
I think that's essentially what such a service would do. You might throw in some periodic less frequent syncs (maybe you sync down to the main backup every day and sync that backup to a secondary backup weekly or monthly) and maybe some of those successive syncs are done to hosts that are usually disconnected from the network to add in a firebreak.
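A minimal sketch of that tiering as cron entries (the bucket name, paths, and offline host are all hypothetical):

```
# Daily: pull the bucket down to the primary backup disk.
0 3 * * *  aws s3 sync s3://my-bucket /mnt/backup/daily

# Weekly: copy the primary backup to a secondary host that is
# normally disconnected from the network (the "firebreak").
0 4 * * 0  rsync -a /mnt/backup/daily/ offline-host:/backup/weekly/
```

The point of the second tier is that a compromise or bug that corrupts the daily copy has at most a week to propagate before you notice, and the offline host is unreachable the rest of the time.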
Well, versioned bucket replication is essentially a backup when you configure it so that the writer to the first bucket can't perform any operations on the second bucket. And since bucket replication doesn't replicate delete markers, the destination ends up accumulating every version of your data rather than just mirroring the source's writes.
A thing stored in google cloud is generally erasure coded, which provides redundancy against the failure of individual devices or hosts, and another copy is stored separately in a geographically separate place. So you might think of it as there being 1.7 copies of your file in each of at least two places.
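The arithmetic behind a figure like "1.7 copies" per location is just the erasure-coding overhead. The parameters below are hypothetical, chosen only because they happen to produce 1.7x; the actual code rates used internally aren't stated here. The XOR demo at the end shows the recovery idea in its simplest possible form:

```python
def overhead(k, m):
    """Storage cost of k-data + m-parity erasure coding: the object is
    recoverable from any k of the k + m shards, so the overhead is
    (k + m) / k instead of the 2.0x of plain mirroring."""
    return (k + m) / k

# A hypothetical parameter choice that yields 1.7x in one location:
assert overhead(10, 7) == 1.7

# Minimal recovery demo with a single XOR parity shard (k = 2, m = 1):
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1 = b"hello", b"world"
parity = xor_bytes(d0, d1)         # stored alongside the two data shards
recovered = xor_bytes(d1, parity)  # lose d0, rebuild it from the rest
assert recovered == d0
```

Real systems use Reed-Solomon codes rather than a single XOR parity so they can tolerate multiple simultaneous losses, but the storage math is the same.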
> Do any cloud providers create backups on top of replication though?
Any serious database does offer backups in addition to replication, and that kind of validates OP's point. As for object stores there are a variety of techniques [0].
> But an entire cloud storage for photo and video for millions, we're talking maybe exabytes. The notion of making "separate" backups seems cost-prohibitive.
Cold-stores like the one Facebook built would be apt [1].
> I know there must be redundancy in case a disk fails, but do they keep 2 instances of your data or more? And are the instances spread out geographically, some or all?
See [0]. Backblaze share quite a bit about the space, too [2][3]. With MinIO, one could run an S3-like system on-premises [4].
Basecamp used to, I have no idea if they still do. Databases were replicated, and also backed up remotely. User uploads were replicated in the storage system, and also backed up to S3.
Unfortunately Dropbox's revision history isn't reliable enough for a backup. If you notice, their marketing carefully avoids use of the word "backup" despite the features seemingly implying it.
I had a client where a glitch in the Dropbox sync caused 300k+ files to be deleted when we added a new PC. Dropbox support was unable to successfully undo all the file changes, and I had to get a ticket with CS escalated to a special team to get everything restored. Even when it was finished, they could not give a guarantee that every single file deleted was restored.
Maybe things got better since they've launched Dropbox Rewind, but given that revision history has been a feature for years and it still didn't work right, I no longer trust them as a backup.
I'm completely soured on dropbox since they dialed the greed up with the arbitrary device count limit and nagging suggestions to upgrade that can't be turned off if you're near the max. I don't care enough to stop using it but I will never give them money after this.