This sounds to me like the classic “replication is not backups” situation where (at best) all of the user files were stored in a RAID array someplace and that is what the malware ate. If there had been actual, effective backups, it should have been trivial to restore non-corrupted files. It also sounds like someone made the decision not to backup the raw images because they were “big” - that is actually the one thing they should have backed up because all of the smaller files can be regenerated from the raw ones. I would not be surprised if all of this was running under someone’s desk.
> It also sounds like someone made the decision not to backup the raw images because they were “big” - that is actually the one thing they should have backed up because all of the smaller files can be regenerated from the raw ones.
Ironically my experience has been exactly the opposite. It's the demosaic-ed, fully developed copies of my photos that are larger and harder to preserve than the original RAWs. And these files can't be trivially regenerated from the RAWs, either. Let me explain.
The most important issue is that the process of taking a raw image and turning it into an edited RGB-pixel image is not obvious, at all. Tons of steps have to happen in this process, and there's currently no way to describe this process in a way that's compatible with a single open standard or even multiple pieces of software. At all. The steps can be broken down roughly into a series of "instructions", but what those instructions mean (e.g. to Lightroom) is entirely opaque and even secret in the case of closed source programs. Even open source programs, like RawTherapee and Darktable, use entirely different and incompatible approaches, algorithms, and instructions.
What this means on a practical level is that you can preserve your raw files and the instructions for replicating the edits as carefully as you like, but without the exact same piece of software (often even the same version of the program with the same settings and defaults set), your edits are as good as gone forever.
As a result I've had to take fairly drastic steps to make sure my photos are safe. For my older photos, I keep an entire VirtualBox image with Windows and Lightroom backed up along with my raw images, so that I can be sure of restoring exactly the same output files if necessary. (This more-or-less has to be a hacked version of Lightroom because you can't take chances with licensing problems preventing the program from running now that Lightroom is subscription based.) And actually for newer photos, I've moved away from Lightroom to RawTherapee. Even though I feel it's usually inferior, I feel safer since I know the pipeline from raw to burned edit is essentially public. I keep a backed up copy of the RawTherapee source code, but even if that somehow failed someone could make a RawTherapee compatible raw converter from scratch.
So that's why edited images are actually harder to preserve: whereas with raw you can just save them in 2-3 different physical locations and storage devices, to keep the edits you have to take several additional steps and there are more points of failure. Why not just keep the static edited images too? Well, I do that for my most essential photos. But that gets into the other issue, that the edited images are actually larger than the original raws.
If the point is preserving my edits, not just having a copy that's good enough for Facebook, the images have to be lossless. This pretty much means PNG or TIFF, and in fact the latter seems to be necessary since PNG doesn't handle metadata anywhere nearly as well. Unfortunately, while the compression used on raw files tends to be pretty good (which is further aided by the images being mosaiced), the compression algorithms compatible with TIFF are pretty terrible. Add that to the fact that you almost certainly want 16 bit images, in order to preserve as much raw detail as possible (in case you want high quality prints or need to do further editing), and you end up with whopping huge TIFFs. I regularly see my TIFF output files 4-5 times larger than the corresponding raws.
In short, managing backups for a high quality photography workflow is actually a good bit more difficult than it seems at first sight - and how it looks at first is not that easy either!
That might be the case. Admittedly I don't know much about this cloud platform, and the article doesn't fill in a lot of details, but I would naively have assumed the target audience for something like this would be pro and semi-pro users, who would want to store RAW files, since amateurs and those just looking to share photos would get more out of Flickr, or even Google Photos / Facebook / Twitter / etc.
> Even open source programs, like RawTherapee and Darktable, use entirely different and incompatible approaches, algorithms, and instructions.
`git checkout v3.0.2`
`./build.sh`
No idea what you're talking about. As long as you have your RAWs and your sidecar file you can trivially reproduce the picture, at least that's how it works in darktable.
Hell, the old code that is deprecated and "hidden" in the newest versions is actually all still there and if you import your stuff in the newest version it is practically guaranteed to look exactly the same. In fact there is a CI system that checks for delta-Es in each release.
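As a sketch of what that reproduction looks like in practice (assuming the same or a newer darktable is installed; filenames here are made up), darktable ships a batch tool for exactly this:

```shell
# Re-render the edited image from the raw file plus its XMP sidecar.
# darktable-cli is darktable's batch processor; filenames are examples.
darktable-cli IMG_0001.cr2 IMG_0001.cr2.xmp IMG_0001.tif \
    --core --conf plugins/imageio/format/tiff/bpp=16
```

Keep the .xmp next to the raw in your backups and that invocation is all you need to regenerate the output.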
I mean that RawTherapee and Darktable don't use the same code as each other at all. The instructions (usually sidecar files) that tell each program what to do to generate the output are completely incompatible.
> the old code that is deprecated and "hidden" in the newest versions is actually all still there and if you import your stuff in the newest version
I'd expect this to be the case with any good photo developer, but it doesn't pay to take chances. There have been significant bugs in the past with DNG files, for example, and different bugs could be introduced in the future, or bugs you didn't know about could be fixed, and so on. There's a lot of stuff that could go wrong. It's good to be able to just tar the source code.
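For example, here's a minimal sketch of "just tar the source code", with a checksum so the archive itself can be verified years later (the paths are stand-ins for the real source tree):

```shell
# Archive the exact source tree the binary was built from, plus a checksum.
# "$SRC_DIR" is a stand-in; point it at the real unpacked source.
SRC_DIR="rawtherapee-src"
mkdir -p "$SRC_DIR" && echo 'placeholder' > "$SRC_DIR/README"  # stand-in tree

ARCHIVE="$SRC_DIR.tar.gz"
tar -czf "$ARCHIVE" "$SRC_DIR"
sha256sum "$ARCHIVE" > "$ARCHIVE.sha256"

# Years later, verify the archive before trusting a rebuild:
sha256sum -c "$ARCHIVE.sha256"
```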
> I mean that RawTherapee and Darktable don't use the same code as each other at all. The instructions (usually sidecar files) that tell each program what to do to generate the output are completely incompatible.
I'll be honest, this is probably the most ridiculous argument against anything I've ever heard on hacker news. This is like saying C/gcc sucks because Javascript exists. What the fuck?
> I'd expect this to be the case with any good photo developer, but it doesn't pay to take chances. There have been significant bugs in the past with DNG files, for example, and different bugs could be introduced in the future, or bugs you didn't know about could be fixed, and so on. There's a lot of stuff that could go wrong. It's good to be able to just tar the source code.
> I'll be honest, this is probably the most ridiculous argument against anything I've ever heard on hacker news. This is like saying C/gcc sucks because Javascript exists. What the fuck?
Well, sorry you feel that way. /s
The difference is that C and Javascript are widely implemented open standards. Lightroom, Darktable and RawTherapee use three different opaque, undocumented approaches to developing raw images. None of them has been reimplemented in other software, and it would be extraordinarily difficult to actually do that, because of all the specificity and quirks. You basically need the original software, which means making sure you can still compile it, making sure you have a platform it can run on, and so on. This is more complexity than most people ordinarily expect when they talk about "backing up photos", and that's exactly my point.
> So... it's a non issue? I don't even...
I should probably not even bother responding, since you already showed with your original comment that you didn't bother to even read my post, but I really don't get this. I explained a very specific issue: that there's a lot of complexity to developing raw photos, which means you have to take extra steps to make sure your edits are properly backed up. Open source software has a distinct advantage because you can archive a copy of the software yourself, but it doesn't change the fact that you do need the original software. And in fact you might need the same version, that's what the example of DNG conversion is supposed to illustrate.
This is ... the opposite of a non-issue. In fact it's a very specific issue that I took quite a bit of time to explain in detail. Having to back up your software, or possibly even an entire VM along with your photos to make sure your edits are preserved, goes way beyond what your average person, or even many photographers think they have to do to keep proper backups.
To add to this, I have a love/hate relationship with Adobe updating their RAW engine version. Opening up old RAW images (with edits stored in XMP sidecar format) will look completely different because the new engine interprets the edits differently - or, in extreme cases, because Adobe decided to add or remove features - which has happened to me numerous times during the last ~15 years since going from RawShooter directly to Lightroom. I usually delete exports for non-critical projects to save space, but any paid jobs are backed up multiple times, because I've been bitten before by not being able to render the exact same TIFF for print as 5+ years ago.
Actually this highlights a different issue - for photo editing, there isn't one "right" way to do things, unlike, say, text editing in Microsoft Word. darktable has a very different approach compared to Lightroom, for instance.
> None of them has been reimplemented in other software, and it would be extraordinarily difficult to actually do that, because of all the specificity and quirks.
Why is this a problem? That's like saying "I need a good GPU to run this game, this sucks!" Well, no shit I guess?
And it's not even close to that, because you can always just buy or compile the right version. And for darktable you don't even need that: backward compatibility is guaranteed. Also, darktable has a rudimentary Lightroom-to-darktable conversion tool (never used it).
> This is more complexity than most people ordinarily expect when they talk about "backing up photos", and that's exactly my point.
Not sure about the other ones, but for darktable, it's as simple as keeping the XMP sidecar alive.
> I explained a very specific issue: that there's a lot of complexity to developing raw photos, which means you have to take extra steps to make sure your edits are properly backed up. Open source software has a distinct advantage because you can archive a copy of the software yourself, but it doesn't change the fact that you do need the original software. And in fact you might need the same version, that's what the example of DNG conversion is supposed to illustrate.
If you don't like having a non-destructive copy, save a high quality JPEG with your edits. Also, I can't think of any other kind of application (not just photo editing) where you can easily keep a future-proof, non-destructive copy of whatever you need to save.
For example: for audio, you can always just save a WAV or FLAC of your work, but you're still relying on the fact that your DAW workflow is still gonna exist 10 years in the future if you try to save your project as opposed to a mastered copy.
Code is also similar. You can save a binary which will probably work, but if you want to save your repo, unless it has good package managers you might have issues later down the line, like that left-pad incident a while back.
Hell, even for something completely different: language drifts over time, and reading Shakespeare is kind of hard unless you know how to read Early Modern English. And we don't seem to have a problem with forward compatibility there; it's just a bit of a pain in the ass. I read it in high school and I didn't go insane.
This isn't a photography issue, this isn't a software issue, this is a core part of human experience.
> This is ... the opposite of a non-issue. In fact it's a very specific issue that I took quite a bit of time to explain in detail. Having to back up your software, or possibly even an entire VM along with your photos to make sure your edits are preserved, goes way beyond what your average person, or even many photographers think they have to do to keep proper backups.
You don't need a VM, at least for darktable. You're turning a potential issue that hasn't actually appeared into a real issue for yourself. Maybe it will become one someday, but just figuring out how to install the old version is gonna be good enough. You do you, man. I'll stick with running dt natively and trusting the forward compatibility and OpenCL acceleration.
For storing lossless RGB you might consider FLIF (https://flif.info), given you've demonstrated some flexibility in your setup - it supports 16-bit channels.
Thanks for the suggestion! I tested it on one file. Support for the format might turn out to be a problem: to use the flif command line tool, I had to convert my input file from TIFF to PNG first, which means I'd have to be extremely careful not to lose metadata if I started doing this for real. It was also pretty slow.
File sizes (DNG lossless, other formats 16 bits):
* DNG: 18.5 MiB
* TIFF: 84.2 MiB
* FLIF: 55.1 MiB
So while FLIF is unsurprisingly much better than TIFF, it's still ~3 times larger than the DNG, which means that a pretty noticeable amount of additional disk space would have to be sacrificed to store the final edited versions of all my photos.
I'm not sold. In the cases where you lose the raw vs a post processed image, one of those is significantly more lost information. It may be non trivial to reproduce a post process image from a raw image, but the other way around is usually impossible.
In one case you've lost original sensor information, in the other case you've lost some parameters and possibly the algorithm used. The former is infinitely more difficult to synthesize from scratch.
I think that's a really shallow way to look at it. For a professional photographer it's the final version of the file that is actually sold, not the RAW or film negative. These final files are often not some 300x200 web preview. These are full sized images. You can argue the final edit has more value and is the one to preserve. Ansel Adams broke the process down into 3 parts with the initial capture being but the first. So losing the final edits is losing 66% of the total work.
I'm only a hobbyist photog and the bday parties and weddings I've shot result in thousands of images. I can't imagine having to shoot multiple events every week for years. For a pro, the edits represent thousands of hours of work. That the work can be redone theoretically might mean very little to a working photographer.
> For a pro, the edits represent thousands of hours of work. That the work can be redone theoretically might mean very little to a working photographer.
This is exactly the issue I had in mind, well said.
Some images require hours of painstaking editing. There are some photographers who will spend days or more editing an image. The edited image is as much an instantaneous snapshot of the photographer as it is the scene in front of the lens. Looking at my Darktable catalog, it is easy to estimate that I spend at least 250 hr/yr editing images.
Raw files, and sidecars, retain their value because sometimes one wishes to revisit an image/edit. If I had to pick one or the other, I might prefer to lose the edited images. That is because I believe I have more/better days of my editing life ahead of me and because I underestimate how much editing I have done. Moreover, there are plenty of RAW images awaiting the day they come to life.
In commercial work, like weddings, the final product is the edit. If there is one thing that must be retained, that is it.
So far, I have found that Darktable's edits have survived changes in versions. Someday there will probably be breaking changes, but I haven't yet felt it personally.
every single "dead" code is still in marketable and isn't accessible unless you try to import a sidecar from a previous version. LR might be shitty but the fact that the guy says darktable has this problem is just silly.
This, like reproducibility in machine learning workflows, is one of those things that needs to be baked in at the start or will become a complete nightmare.
This was a fascinating, but also disturbing, read; it does sound like your solution (quasi-legal as it might be) is the one that makes the most sense if you do this professionally.
The original story spoke of the rumours of malware involvement, but in the update which forms the first half of the linked article, Canon says there was no malware involved.
(“Replication isn’t backup” still applies, of course.)
Do any cloud providers create backups on top of replication though?
Backing up databases (terabytes) is feasible, they're not that big.
But an entire cloud storage for photo and video for millions, we're talking maybe exabytes. The notion of making "separate" backups seems cost-prohibitive.
I am curious, though -- for services like Dropbox or Google Drive, how many replicas are there of your files? I know there must be redundancy in case a disk fails, but do they keep 2 instances of your data or more? And are the instances spread out geographically, some or all?
The short version is that the backups are a mix of short-term local backups + backups across the network on distributed filesystems + offline tape backup.
I can't speak to Drive, but the tape backup is certainly used for GMail. (There's a case study about Gmail having to restore from backup in the above link.)
"Do any cloud providers create backups on top of replication though?"
We do.[1]
For exactly 1.75x our normal pricing, we will replicate your entire account, nightly, to a geo-redundant site which is not open for normal customer use (and, therefore, has a lower risk profile). This GR site is the he.net core datacenter, in Fremont.
It's also worth pointing out that replication to rsync.net buys you malware/ransomware protection since your account is snapshotted, by ZFS, nightly, and those snapshots are immutable (read-only).
I've been wondering about ransomware protection through snapshots. Presumably (and I do know more or less nothing about it) the malware aspect of it is present on the system significantly before the ransomware aspect is triggered - so restoring to yesterday's backup just puts you back in the position to get pwn3d again? How do companies get around this?
I don't see the size as the big problem. If one cloud can handle the size, the backup storage system can too. For me, the real issue is timing. How do you backup a cloud full of constantly changing data? Do you draw a line in the sand, an image of the cloud state at a particular moment? Is that even possible? You have to do backups of smaller chunks, individual accounts, but eventually that just looks like another software-managed internal array structure rather than a true duplicate. Your backup system is just as susceptible to deletion error as the cloud it lives within.
"For me, the real issue is timing. How do you backup a cloud full of constantly changing data? Do you draw a line in the sand, an image of the cloud state at a particular moment?"
I can only describe what we do, and, of course, there is an enormous scale difference here, but ...
Every single rsync.net account is its own ZFS filesystem, which means every single account gets snapshotted[1] nightly on a schedule. This means that the enormous operation of "backing up" all of rsync.net happens in small, manageable chunks.
Of course, the ZFS snapshots of a customer account are not "backups" per se, but if a customer chooses "geo-redundant" storage in another facility, it is those very same snapshots that we (zfs) send over. Those are, indeed, backups.[2]
The most interesting part, in my opinion, is that the daily/weekly/monthly snapshots are immutable. So you can publish your rsync.net credentials or suffer a disgruntled employee or ransomware attack, etc., and those snapshots remain safe - they are read-only.
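A rough sketch of how that scheme looks at the ZFS level (the pool, dataset, and host names here are invented for illustration):

```shell
# Nightly, read-only snapshot of one customer's filesystem.
zfs snapshot tank/customer123@nightly-2020-08-05

# Snapshots are immutable: ransomware can trash the live files, but the
# dataset can be rolled back to last night's state.
zfs rollback tank/customer123@nightly-2020-08-05

# Geo-redundancy: incrementally send the same snapshot to the second site.
zfs send -i tank/customer123@nightly-2020-08-04 \
         tank/customer123@nightly-2020-08-05 \
    | ssh gr-site zfs recv -F backup/customer123
```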
So it does not assert that the filesystem is in a consistent state before it snapshots. I.e., we could be midway through applying a database transaction, or there could be a reference in one file to another that doesn't exist (yet).
It could be argued that one needs to do a site wide "write cache flush", stop, and snapshot. Not that I think for one second the service provider should (or could) be in a position to detect when a good time to snapshot might be....
It can't be "just" as susceptible as what happened here.
Even if you do the straightforward thing - no frozen point in time, just a rolling copy of all the files onto a durable medium - you may end up with logically inconsistent data, in that files from later in the backup may never have existed alongside files from earlier in it. But if you write down timestamps periodically, you still end up in a way better end state than "I lost all the files".
At worst, you end up with "I may have lost the last few minutes' worth of files from the end of the last backup I did".
> Do any cloud providers create backups on top of replication though?
Yes - I work at one of the FAANGs on a team that's doing precisely this. We develop an internal disaster recovery tool that creates backups of data files that can't be touched by the creating application, and that can be read back in a disaster event to recover the data.
Just using AWS S3 as an example, you can copy a bucket to another bucket (preferably in a different region) with a policy of retaining all versions- preventing the problem of a delete whether malicious or accidental, from being reflected in the backup copy.
There is also the option of using Glacier for lower cost long term storage.
If you are asking does regular S3 provide a true backup capability out of the box, it does not, or at least did not last time I actively looked at this about 2 years ago.
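The versioned-copy setup described above can be sketched with the aws CLI (the subcommands are real; the bucket names and the replication JSON are placeholders you would fill in):

```shell
# Versioning must be enabled on both buckets before replication will work.
aws s3api put-bucket-versioning \
    --bucket my-primary-bucket \
    --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning \
    --bucket my-backup-bucket \
    --versioning-configuration Status=Enabled

# Replication rules (destination bucket, IAM role) live in replication.json.
aws s3api put-bucket-replication \
    --bucket my-primary-bucket \
    --replication-configuration file://replication.json
```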
Do we really think AWS doesn't do periodic offline backups of stuff stored in S3, precisely so that they don't find themselves in this exact embarrassing scenario? Regardless of whether it's user-facing for AWS users, I'd hope they do - or certainly that they did before they got a good sense of the long-term reliability of S3 as a big system.
I think they do all sorts of backup stuff - and as a consumer of S3 you need to evaluate how resilient their backing up is. S3 is pretty murky about permanence guarantees so I'd always look at making sure there is a separately maintained replication script to some medium I have control over if that data is irreplaceable and the costs of losing it are significant to the business.
Judging those costs and making that call is a complex matter of course.
What would such a service then do as an end-user beyond what `aws s3 sync s3://my/bucket /mnt/my/backup/drive/` does?
(Honest question, not provocation. To me either you trust AWS or any party to never lose your data [unwise], or you basically ask them to offer a way to rsync, which AWS does)
I think that's essentially what such a service would do. You might throw in some periodic less frequent syncs (maybe you sync down to the main backup every day and sync that backup to a secondary backup weekly or monthly) and maybe some of those successive syncs are done to hosts that are usually disconnected from the network to add in a firebreak.
Well, versioned bucket replication is essentially backup when you configure it so that the writer to the first bucket can't do any operations on the second bucket. Since bucket replication doesn't replicate deletion markers, you essentially end up with your data duplicated instead of just having writes replicated.
A thing stored in google cloud is generally erasure coded, which provides redundancy against the failure of individual devices or hosts, and another copy is stored separately in a geographically separate place. So you might think of it as there being 1.7 copies of your file in each of at least two places.
> Do any cloud providers create backups on top of replication though?
Any serious database does offer backups in addition to replication, and that kind of validates OP's point. As for object stores there are a variety of techniques [0].
> But an entire cloud storage for photo and video for millions, we're talking maybe exabytes. The notion of making "separate" backups seems cost-prohibitive.
Cold-stores like the one Facebook built would be apt [1].
> I know there must be redundancy in case a disk fails, but do they keep 2 instances of your data or more? And are the instances spread out geographically, some or all?
See [0]. Backblaze share quite a bit about the space, too [2][3]. With minio, one could run their S3-like system on-premise [4].
Basecamp used to, I have no idea if they still do. Databases were replicated, and also backed up remotely. User uploads were replicated in the storage system, and also backed up to S3.
Unfortunately Dropbox's revision history isn't reliable enough for a backup. If you notice, their marketing carefully avoids use of the word "backup" despite the features seemingly implying it.
I had a client where a glitch in the Dropbox sync caused 300k+ files to be deleted when we added a new PC. Dropbox support was unable to successfully undo all the file changes, and I had to get a ticket with CS escalated to a special team to get everything restored. Even when it was finished, they could not give a guarantee that every single file deleted was restored.
Maybe things got better since they've launched Dropbox Rewind, but given that revision history has been a feature for years and it still didn't work right, I no longer trust them as a backup.
I'm completely soured on dropbox since they dialed the greed up with the arbitrary device count limit and nagging suggestions to upgrade that can't be turned off if you're near the max. I don't care enough to stop using it but I will never give them money after this.
Ideally, a "delete" should mark images as unavailable and queue them for deletion at a later date (e.g. in 30 days). This provides protection against accidental deletion by users, accidental account deletions/deactivations, paid accounts terminated due to lack of payment, automated software mistakes (such as this), and so on.
The last company I worked at, 80% of our database was inactive, useless, or redundant data, all kept to protect against all kinds of issues (such as reused usernames, subpoenas, annual reporting, etc). We could always zero out PII from an account, but we never removed a record we didn't have to. It led to a lot more RAID migrations than I would have liked to have done, but it certainly made the application easier to manage. Plus, we never worried about fragmentation or holes in our data files, cascading deletes, etc.
There was an event when a startup I was at asked Basho (the company behind Riak db) about backing up our data. Backup was a little side feature that was possible to rig up, but I recall they looked at this inquiry as if I had two heads - as if to say, it's replicated, why bother? Then there was a bug with one of the Riak releases, and all the data was lost. (When we scaled up with this buggy Riak release, the empty node assumed the master role, and all the child nodes went, ah... the new state has no data, let's all delete records 0..k. Fun times.)
What’s the difference between replication and backups? Is the distinction that backups must be stored on separate infrastructure, whereas replication might still be 1 or 2 points of failure?
Replication is about having data level redundancy to protect from drive failure. Backups are about having point in time snapshots of the system state, and about having them tiered from a location perspective. The 3-2-1[1] principle says to have 3 total copies, 2 of which are local but on different devices, and 1 of which is offsite. This gives you tiers of recoverability.
It’s important from a backup perspective that it’s point in time as well, otherwise as soon as you get ransomware that encrypts your file you now have replicated those changes everywhere.
Where I worked, weekly point in time backups were required. Those backups were put onto tape drives, those tape drives were set on a pallet and then driven by truck to an offline second location. IMO _that’s_ how it’s done properly.
When we used Iron Mountain at my last job, there were two interesting additional concerns:
1) That nobody in our company actually knew where they were physically stored (insider risk?), but
2) That we had assurance that the physical storage was far enough away that a physical disaster in our area wouldn't also touch where the offline storage was located.
There's a concomitant jump in RTO if that's what you do, but hopefully that's well understood among the stakeholders.
Definitely a commitment jump, and not always necessary. We ran our own data centers, off the topic of backups, and it was wild to me that they were so disaster proof. There were different power lines from different power companies coming in in case either of the power companies had an outage, and some giant diesel power generator that could last a long time while fully powering the data center.
That’s a lot harder to pull off. What methods do you use to accomplish this?
Depending on how much data you're backing up, sneakernet works.
When I was still in the office, my company had me rotate a set of backup hard drives between the office lockup and a strongbox in my house. The notion was that it was unlikely that both the office building and my house 12 miles away would both burn down at the same time.
Of course, now I'm working from home, so all of the eggs are in one basket again.
External USB drives for me. One set of drives is stored in a fire safe at home, one set of drives is stored at the office, and one set is stored at a relative's house in another state. The ones at home get refreshed most often, the ones stored at the office get refreshed about once a quarter, and the one at the relative's house get refreshed during holidays and family get-togethers.
The drives are encrypted (Truecrypt) since they will be outside my physical control. The ones at the office I am prepared to abandon should I get fired/laid-off.
AWS S3 (and a few compliant providers) offer immutable options, in both governance and compliance modes.
Allegedly, compliance mode is unalterable by any account, period - I guess the equivalent of the immutable attribute without any overriding account. I'm not immediately familiar with any literature on attacks on this feature, but I've also not searched hard; however, I know from my clients that it's an accepted form of WORM, and that cloud storages like S3 are considered in the same vein as tape when immutability is in play.
I suppose it will be a case in the future that proves the efficacy of AWS/Azure/s3 providers, but for now, a lot of regulatory policies for 3-2-1 allow for such storage to fulfill the "2" part of the 3-2-1 rule.
I’ve always wondered if Amazon backs up S3. I don’t think they explicitly say but I get the impression that it is the user’s responsibility to replicate to a second region to guard against data loss so I am guessing not. Object Lock wouldn’t protect against an S3 failure.
Though, tread carefully there. If the thing that makes it append only is that you're just not using its destructive update features, then the store isn't really append only. Just because everyone agreed to limit the ways they interact with the data store doesn't mean you can trust buggy code, sloppy programming, and attackers to honor that agreement.
I see a bunch of explanations, but I don't think any of them really drove home the reason for why you usually need both.
Imagine if you had a document that stored useful information.
You had this document automatically replicated to another system in a different time zone every time there was a change.
You think you're doing great, if there is an outage in the one system you just get your important document from the other system.
Then one day someone accidentally copies and pastes the wrong data into the file.
Now what do you do? If you goto your replicated copy it also has the bad data.
The answer is you should have had backups too, so you could go back an hour, a day, a week, or even much longer - depending on how much data you have, how often it changes, how important it is, and how quickly someone would notice bad data.
I think the biggest factor is timing (although backups should also be on a second system).
In short, if anything that happens to the files is immediately copied to the "backup" then you don't actually have a chance to recover from any software problems. Whereas, if you make a copy of the data every night and keep the last 30 copies you can find an issue like this and go back in time to retrieve the files from before it started.
Replication is intended to create a 1:1 copy of the data. If data is removed from the primary system, replication will ensure it's removed from the replica.
A backup can be a 1:1 copy but it should be set up such that something going wrong with the primary (e.g. cryptolocker malware), can't affect it (since "something went wrong with the primary" is what the backup is intended to resolve).
To do that you could take the backup offline or use features like filesystem snapshots to ensure that changes can be rolled back.
The problem is that most replication setups will replicate deletes (and other modifications) as well. So if misbehaving software deletes (or corrupts) things, those deletes (or corruptions) will be replicated, and the replication does not give you a way to actually recover from a mistaken delete.
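This is easy to demonstrate with a couple of throwaway directories standing in for real storage:

```shell
# A replica mirrors deletions; a point-in-time backup does not.
mkdir -p primary
echo 'precious' > primary/file.txt

rsync -a --delete primary/ replica/   # replication: keep an exact mirror
cp -a primary backup-2020-08-05       # backup: a frozen point-in-time copy

rm primary/file.txt                   # misbehaving software deletes the file
rsync -a --delete primary/ replica/   # ...and the delete is replicated

# The replica faithfully lost the file too; only the backup still has it.
test ! -e replica/file.txt && test -f backup-2020-08-05/file.txt
```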
Backups protect against a wider class of problems. For example, a software vulnerability/bug or a human error could result in the deletion of an object in all replicas (because deletions are replicated too), but you'd usually need another incident to occur for you to lose the backup as well.
If your drive fails (or a single computer fails) replication saves your data, and keeps things running seamlessly.
If you `sudo rm -rf / --no-preserve-root` your drive, replication deletes everything, while backups let you restore to the last time you took a backup.
Classically, backups are taken then stored offline (e.g. on archive tape, on optical media, or on drives that are disconnected and put in storage between backups). Otherwise, if they are not 'cold' backups, they are still disconnected so if the main storage blows up, it won't impact the data in the backups.
Replication usually involves storage that is powered up and connected somehow to the same systems as the main storage; and any data that's corrupted on the main storage would propagate to the replicated storage.