BetterBackup

Overcoming Backup Challenges
What are the biggest challenges you face with backup, and what are your strategies for solving them?
John Furrier75
Q5: What kind of backup policies exist for recovery? I.e., how long does data stay on your snaps? What kind of SLA exists to restore data from backups?
3 Votes Vote
Chris Dwan I write SLAs according to impact of the outage, both in terms of number of people and effect on the business.
4 Votes Vote
Chris Dwan Once we get above a single lab or department, we're talking about operational availability and failover rather than "backup"
2 Votes Vote
Chris Dwan We've got FDA rules that mandate seven year retention, and some clinical rules that specify "life of the patient." Coupled with exponential data growth, this means that we rarely delete anything. Ever.
3 Votes Vote
Andrew Miller What's sad is that often there aren't SLA's - just "here's how we've always backed it up b/c can't get clarity/agreement/signoff from the business".
6 Votes Vote
jeff dinisco in some cases I see SLA's driven from the wrong place, they're based on what the tech in place is capable of, not what the biz actually needs
5 Votes Vote
Stephen Pao @fdmts - what about the immediate recovery via snapshots or other techniques when data gets accidentally deleted? Do you go back to the 7 year/life of the patient backups or is there something for shorter term?
1 Votes Vote
jeff dinisco @andriven I see that as often as I see clear SLA's
3 Votes Vote
Andrew Miller When you talk policies with the business, it's often hard if your solution has limitations - i.e. only once per day, restores from tape, etc. You may not even want to start the conversation based on your current solution (sad thoughts I know).
4 Votes Vote
Nick Kirsch Most common policy I see is: hourly, daily, weekly, and monthly snapshots - with daily replication of those snapshots offsite - and a monthly tape backup schedule. User-level restores via snaps, anything else IT-ticket driven.
2 Votes Vote
Chris Dwan @andriven I sometimes say that we have "good and bad reasons" for retaining data. We retain some data because we signed a contract promising that we would. We retain the rest because we're not sure whether or not we signed such a contract.
5 Votes Vote
Andrew Miller I do see most customers just doing daily backups b/c it's what they know and what the business is used to. In a perfect world, the focus starts with how often to back up, how long to keep it, when to archive, when to replicate & match tech to that
2 Votes Vote
Andrew Miller Retention is often driven by policy around legal holds (if you don't have the data, you can't be asked to get it back). Otherwise it's Operational Recovery focused (i.e. 30-60 days) or regulation (1 year, 7 years).
2 Votes Vote
I am John White Having a variety of customers we see all kinds of things. Most do daily backups held for 4 weeks, monthly clones held for 12 months, and annual clones held for 3 years. Very few test restores.
4 Votes Vote
Andrew Miller There's also some interesting blurring here between using snapshots (i.e. non-duplication) and incremental forever+dedup in next gen backup solutions.
3 Votes Vote
Chris Dwan @nkirsch Agreed. My default is dailies for a week, weeklies for a month, and monthlies until it deforms the storage budget.
5 Votes Vote
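The "dailies for a week, weeklies for a month, monthlies indefinitely" default described above can be sketched as a small pruning function. This is only an illustration under stated assumptions: the function name `snapshots_to_keep`, the parameters `daily_days` and `weekly_days`, and the choice of "first snapshot in each ISO week or calendar month" as the weekly and monthly representative are all hypothetical, not anything a participant specified.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today, daily_days=7, weekly_days=30):
    """Return the snapshot dates retained by a dailies/weeklies/monthlies policy.

    Assumed rules (hypothetical, for illustration only):
      - keep every snapshot younger than `daily_days`
      - keep the first snapshot of each ISO week if it is younger than `weekly_days`
      - keep the first snapshot of each calendar month indefinitely
    """
    keep = set()
    seen_weeks = set()
    seen_months = set()
    for d in sorted(snapshot_dates):
        age = (today - d).days
        if age < 0:
            continue  # ignore snapshots dated in the future
        if age < daily_days:
            keep.add(d)
        week = tuple(d.isocalendar()[:2])  # (ISO year, ISO week number)
        if week not in seen_weeks and age < weekly_days:
            keep.add(d)
        seen_weeks.add(week)
        month = (d.year, d.month)
        if month not in seen_months:
            keep.add(d)
        seen_months.add(month)
    return sorted(keep)


# Example: one snapshot per day for the 90 days ending 2024-03-31.
today = date(2024, 3, 31)
all_snaps = [today - timedelta(days=i) for i in range(90)]
kept = snapshots_to_keep(all_snaps, today)
```

Everything not returned would be eligible for deletion; the monthlies are the part that grows without bound, which is where the storage budget comes in.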
jameskobielus If you never delete anything ever, I'm surprised you have any IT budget left for anything other than backup.
3 Votes Vote
jameskobielus @andriven In those circumstances, your backup SLAs are effectively whatever your vendor baked into their solution.
2 Votes Vote
Nick Kirsch @jameskobielus Luckily the cost of the bits continues to decline fast enough.
2 Votes Vote
Andrew Miller @jameskobielus True...SLA's often are derived from the solution capabilities or limitations for better or worse.
2 Votes Vote
Chris Dwan @andriven Mutability in the data is an important factor. Most of the bytes in science are immutable ... they're records of what came off the instrument, or derived analysis based on it. In that case, snapshots would be mostly useless.
3 Votes Vote
jameskobielus @andriven That sounds like the backup policies are retained indefinitely through sheer business inertia, not in alignment with business-continuity imperatives, which may call for backup intervals at odds with existing policies.
2 Votes Vote
Andrew Miller @jameskobielus Sometimes I hear "business inertia" and think of stories where people use that as a cop-out, other times it's good IT folks that can't get clear direction and just focus on other projects with business impact. :/
2 Votes Vote
Andrew Miller Chris Dwan Can't believe I didn't talk sooner about mutability/immutability - have been discussing this a lot recently as it relates to ransomware and backups being a line of defense there.
2 Votes Vote
Chris Dwan @jameskobielus @andriven inertia, or perhaps a de facto set of priorities from the business. The risks around loss or disclosure of data are very clear. Unless you can make a similarly concise statement around the -benefit- of deletion, the risk wins.
3 Votes Vote
Andrew Miller People are more motivated by risk than benefit especially with the politics of most organizations.
0 Votes Vote
John Furrier56
Q1: What kind of data do you work with?
2 Votes Vote
Dave Vellante All kindsa data...numbers, text, video, audio...big data, fast data, slow data, fat data, skinny data...
4 Votes Vote
Chris Dwan My clients are mostly in the life sciences. I see genomic data, as well as a wealth of other scientific data types.
2 Votes Vote
Nick Kirsch Primarily unstructured file data, with object growing in usage for new applications. Always a side of databases, machine images (although Docker is eliminating this), and other application-encoded formats.
2 Votes Vote
jeff dinisco work with many customers from many industries, but it's largely unstructured and sometimes unwieldy when it comes to file count
2 Votes Vote
Chris Dwan I'm increasingly being pulled into dealing with electronic health records and clinical trials data, which comes with all sorts of fun requirements.
4 Votes Vote
Nick Kirsch @dinisco The fact that we still mention "file count" as something we have to think about. Argh! ;)
4 Votes Vote
Andrew Miller Across the board but all things datacenter. Personally have dealt with everything from SQL backups to heavy VMware via VADP to even DB2 database dumps.
3 Votes Vote
Stephen Pao We're primarily working with unstructured data - images, videos, instrument data, sensor data - typically stored today in enterprise NAS.
2 Votes Vote
Chris Dwan @steve_pao I suppose I should admit that when I say "genomic and lab data," what I really mean is "massive piles of unstructured files."
3 Votes Vote
Andrew Miller Along with datacenter still seeing heavy focus on Remote Office, Branch Office - ROBO. Classic remote sites where WAN still isn't reliable enough that a few data/apps need to live local but need protection via centralized backup.
2 Votes Vote
Stuart Miniman data in apps, data in streams, multi-media, and do all of the in person interactions count too?
2 Votes Vote
Nick Kirsch @fdmts What kind of scientific data is growing the fastest at the moment? What's next on the list?
2 Votes Vote
Chris Dwan @nkirsch Cryo-electron microscopes scare the heck out of me in terms of raw data volumes. Easily terabytes per instrument run.
2 Votes Vote
Nick Kirsch @stu Tracking in-person interactions through audio, video, and shared presence in VR/AR is going to be pretty awesome... =)
3 Votes Vote
Andrew Miller That always raises the interesting question of data sets that get too large to backup - at what point do you have to drop back to replication vs. backup? Incremental forever helps there but huge datasets are a real challenge from a backup perspective.
3 Votes Vote
Nick Kirsch @fdmts What are some of the new (or most burdensome) requirements you are seeing around EHR and clinical data?
2 Votes Vote
Chris Dwan @nkirsch I was in a session just yesterday where the speaker referred to the "coffee break moment" when you type "ls" and go get a coffee.
2 Votes Vote
jameskobielus I'm just a spreadsheet dude, professionally.
2 Votes Vote
Nick Kirsch @andriven It seems to me that replication + synthetic incremental is the holy grail. Particularly with some "cloud" integration such that bits didn't traverse the same code paths in both places.
3 Votes Vote
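For readers unfamiliar with the "incremental forever" and "synthetic" terms used in this thread, one common reading is that the backup target merges an older full copy with later incrementals to produce a new full, without re-reading the primary data. The toy model below is a sketch under that assumption; the function `synthesize_full` and the dict-based representation (path mapped to a version label, `None` meaning "deleted") are hypothetical and do not describe any particular product.

```python
def synthesize_full(base_full, incrementals):
    """Merge a full backup with later incrementals into a synthetic full.

    base_full:    dict mapping file path -> version label
    incrementals: list of dicts, oldest first; a value of None marks a deletion
    """
    state = dict(base_full)  # copy so the original full is left untouched
    for inc in incrementals:
        for path, version in inc.items():
            if version is None:
                state.pop(path, None)  # file was deleted since the last backup
            else:
                state[path] = version  # file was added or changed
    return state


full = {"a.dat": "v1", "b.dat": "v1"}
incs = [{"a.dat": "v2"}, {"b.dat": None, "c.dat": "v1"}]
synthetic = synthesize_full(full, incs)
```

The point of the technique is that only changed data crosses the wire after the first full, which is why it comes up whenever datasets get too large for periodic full backups.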
Nick Kirsch @fdmts Ctrl-\ and move on! ;)
0 Votes Vote
Andrew Miller @nkirsch Really agree...although it's crazy that FedEx is still sometimes the highest bandwidth option out there (or we can be fancy and call it "seeding" the data).
3 Votes Vote
Christopher Jones @andriven Even Amazon has gotten into the FedEx movement of data.
2 Votes Vote
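The FedEx remark is easy to check with back-of-the-envelope arithmetic. The sketch below uses assumed, illustrative figures (100 TB shipped with a 24-hour transit time, compared against a 1 Gbit/s link); the function names are hypothetical, and the calculation ignores the time to load and unload the media.

```python
def effective_bandwidth_gbps(terabytes, transit_hours):
    """Effective throughput, in Gbit/s, of physically shipping `terabytes` of data."""
    bits = terabytes * 1e12 * 8  # decimal terabytes to bits
    seconds = transit_hours * 3600
    return bits / seconds / 1e9


def network_transfer_hours(terabytes, link_gbps):
    """Hours needed to move `terabytes` over a fully utilized link of `link_gbps`."""
    bits = terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9) / 3600


shipped = effective_bandwidth_gbps(100, 24)   # roughly 9.26 Gbit/s
over_wire = network_transfer_hours(100, 1)    # roughly 222 hours
```

Under these assumptions an overnight box of disks outruns a dedicated 1 Gbit/s link by about a factor of nine, which is why "seeding" the first copy by courier remains common.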
John Furrier53
Q2: What (tools/providers/strategies) are you currently using for backups?
3 Votes Vote
Chris Dwan I try to avoid "backups" if at all possible. While data protection, retention, and access are certainly critical - "backup" as a word isn't very useful in getting through to the details I would need to really help a client.
3 Votes Vote
jameskobielus In terms of my office productivity backup requirements, I've come to rely on cloud storage: Apple iCloud, Google Cloud, etc.
1 Votes Vote
Nick Kirsch I see primarily snapshots, remote replication (for DR), and traditional backup to tape (and then Iron Mountain) as the most common backup flow. That said, this is often highly dependent on and integrated with primary storage vendors.
5 Votes Vote
Andrew Miller I've used a ton over the years - started with Backup Exec out of college (no one else wanted to be the "backup guy"). Went to Tivoli Storage Manager (consultant stood it up) and loved incremental forever. We ran out of consultant $$ so I rebuilt it
3 Votes Vote
jeff dinisco trying to move away from the traditional approach that results in large catalogs, complex architectures, and high license costs
5 Votes Vote
Nick Kirsch @jameskobielus Personally, I leverage as many highly redundant cloud services as possible (iCloud, Evernote, Dropbox, and GitHub, to name a few.)
3 Votes Vote
Chris Dwan Data segmentation and tiering is critical, but getting the conversation to the point where people can make informed decisions between cold archives for regulatory purposes, vs. DR systems takes a -lot- of talking.
1 Votes Vote
Andrew Miller @andriven Then a lot of time with source and target dedup platforms. In recent years, more focused on scale out architectures and true SLA policies so backup admins/engineers aren't just glorified job schedulers.
1 Votes Vote
Stephen Pao @dinisco Seeing the same thing about the whole large catalog problems. Results in segmenting data into backup silos and actually having to worry about backing up your backup catalog. Some circularity there!
2 Votes Vote
Nick Kirsch @fdmts When do we get to the point when these are just policy specifiers around the primary data? These buckets/directories need to be protected in this way, provide compliance in this way, etc.
1 Votes Vote
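The idea of protection as "policy specifiers around the primary data" could look something like the sketch below, where policies are attached to directories and a path inherits the policy of its longest matching prefix. Everything here is hypothetical: the `ProtectionPolicy` fields, the example paths, and the numbers (the seven-year figure simply echoes the FDA retention mentioned earlier in the chat).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionPolicy:
    snapshot_interval_hours: int
    retention_days: int
    replicate_offsite: bool


# Hypothetical policy table keyed by directory prefix.
POLICIES = {
    "/labs": ProtectionPolicy(24, 30, False),
    "/labs/clinical": ProtectionPolicy(1, 7 * 365, True),
}


def policy_for(path, policies):
    """Return the policy of the longest directory prefix containing `path`, or None."""
    best_prefix, best_policy = None, None
    for prefix, policy in policies.items():
        inside = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if inside and (best_prefix is None or len(prefix) > len(best_prefix)):
            best_prefix, best_policy = prefix, policy
    return best_policy


clinical = policy_for("/labs/clinical/trial1/scan.dat", POLICIES)
genomics = policy_for("/labs/genomics/run1", POLICIES)
```

With a table like this, snapshotting, replication, and retention become consequences of where data lives rather than separately scheduled jobs.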
Chris Dwan @steve_pao Absolutely. One of my customers is maintaining at least seven copies of the same information, in at least three formats.
4 Votes Vote
jeff dinisco getting to a better ILM model can eliminate the same backup process backing up the same data over and over, it would be easier to manage than just throwing dedup at the problem
3 Votes Vote
Andrew Miller Given who I work for now, I could just say that...but it's a huge market out there. It does seem like recently innovation in this area has accelerated which is cool to see.
1 Votes Vote
Stephen Pao @andriven Yes, scale out architectures are definitely the trend we're seeing here. The backup infrastructure shouldn't be something you have to spend a lot of time architecting and managing...
3 Votes Vote
John Furrier I hear all the time on @theCUBE that backup is broken, esp as data is stored in a zillion places; forget about the edge, which makes it harder
1 Votes Vote
Andrew Miller If your backup is broken, don't just assume there's no new options. Key characteristics for remote sites IMHO are incremental forever, dedup/compression, scalability at the remote site, potentially cloud archive capabilities.
2 Votes Vote
John Furrier My mind goes crazy when thinking about the data options when you add #IoT to the mix
1 Votes Vote
Andrew Miller Agree - prefer "data protection" as it's really a continuum around backup frequency (RPO), retention, archive policy (RTO - cloud, tape, etc.), replication (DR). Those items should be the focus IMHO.
2 Votes Vote
jameskobielus It's not either-or in their minds, is it? Both storage requirements are often mandates that can't be compromised or traded off for each other (budget constraints notwithstanding)
1 Votes Vote
jameskobielus "Business continuity" and "disaster recovery" have more urgency, in terms of use case justification, than "backup" where storage investments are concerned.
3 Votes Vote
jameskobielus @andriven I wonder if "no one else wanted to be the backup guy" can be generalized to companies in general. How hard is it to recruit IT professionals into this thankless role and retain them?
2 Votes Vote
jameskobielus When you add #IoT to the mix, everything (storage, backup, security, orchestration, analytics, etc.) goes a bit nutsy-cuckoo.
1 Votes Vote
Dave Vellante AWS, Carbonite & icloud
1 Votes Vote
Andrew Miller @steve_pao given data growth, I don't see how you can't do a scale out/web scale type architecture. Unfortunately for some incumbents, that requires some major work around file systems, distributed scheduling, distributed metadata/catalog, bottlenecks, etc.
0 Votes Vote
Stephen Pao @andriven I think the key is how you build things like cataloging and jobs into the infrastructure itself.
0 Votes Vote