Overcoming Backup Challenges
What are the biggest challenges you face with backup, and what are your strategies for solving them?
John Furrier
Q5: What kind of backup policies exist for recovery? I.e., how long does data stay on your snaps? What kind of SLA exists to restore data from backups?
Chris Dwan I write SLAs according to impact of the outage, both in terms of number of people and effect on the business.
Chris Dwan Once we get above a single lab or department, we're talking about operational availability and failover rather than "backup"
Chris Dwan We've got FDA rules that mandate seven year retention, and some clinical rules that specify "life of the patient." Coupled with exponential data growth, this means that we rarely delete anything. Ever.
Andrew Miller What's sad is that often there aren't SLA's - just "here's how we've always backed it up b/c can't get clarity/agreement/signoff from the business".
jeff dinisco in some cases I see SLA's driven from the wrong place, they're based on what the tech in place is capable of, not what the biz actually needs
Stephen Pao @fdmts - what about the immediate recovery via snapshots or other techniques when data gets accidentally deleted? Do you go back to the 7 year/life of the patient backups or is there something for shorter term?
jeff dinisco @andriven I see that as often as I see clear SLA's
Andrew Miller When you talk policies with the business, it's often hard if your solution has limitations - i.e. only once per day, restores from tape, etc. You may not even want to start the conversation based on your current solution (sad thoughts I know).
Nick Kirsch Most common policy I see is: hourly, daily, weekly, and monthly snapshots - with daily replication of those snapshots offsite - and a monthly tape backup schedule. User-level restores via snaps, anything else IT-ticket driven.
Chris Dwan @andriven I sometimes say that we have "good and bad reasons" for retaining data. We retain some data because we signed a contract promising that we would. We retain the rest because we're not sure whether or not we signed such a contract.
Andrew Miller I do see most customers just doing daily backups b/c it's what they know and what the business is used to. In a perfect world, the focus starts with how often to back up, how long to keep it, when to archive, when to replicate & match tech to that
Andrew Miller Retention is often driven by policy around legal holds (if you don't have the data, you can't be asked to get it back). Otherwise it's Operational Recovery focused (i.e. 30-60 days) or regulation (1 year, 7 years).
I am John White Having a variety of customers we see all kinds of things. Most do daily backups held for 4 weeks, monthly clones held for 12 months, and annual clones held for 3 years. Very few test restores.
Andrew Miller There's also some interesting blurring here between using snapshots (i.e. non-duplication) and incremental forever+dedup in next gen backup solutions.
Chris Dwan @nkirsch Agreed. My default is dailies for a week, weeklies for a month, and monthlies until it deforms the storage budget.
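The "dailies for a week, weeklies for a month, monthlies indefinitely" default mentioned above can be sketched as a small pruning rule. This is only an illustration under stated assumptions: the function name `select_retained`, the 7-day and 31-day windows, and the choice of "first snapshot in each ISO week / calendar month" as the weekly and monthly copy are all hypothetical, not something any participant specified.

```python
from datetime import date, timedelta

# Assumed windows: "a week" of dailies, "a month" of weeklies.
DAILY_WINDOW = timedelta(days=7)
WEEKLY_WINDOW = timedelta(days=31)


def select_retained(snapshots, today):
    """Return the snapshot dates to keep, oldest first.

    Rules (assumptions for this sketch):
      - daily:   every snapshot younger than DAILY_WINDOW
      - weekly:  the first snapshot of each ISO week, if younger than WEEKLY_WINDOW
      - monthly: the first snapshot of each calendar month, kept indefinitely
    """
    keep = set()
    seen_weeks = set()
    seen_months = set()

    for snap in sorted(set(snapshots)):  # oldest first
        age = today - snap
        iso = snap.isocalendar()
        week_key = (iso[0], iso[1])          # (ISO year, ISO week number)
        month_key = (snap.year, snap.month)

        if age < DAILY_WINDOW:
            keep.add(snap)

        if week_key not in seen_weeks:
            seen_weeks.add(week_key)
            if age < WEEKLY_WINDOW:
                keep.add(snap)

        if month_key not in seen_months:
            seen_months.add(month_key)
            keep.add(snap)                   # monthlies are never pruned here

    return sorted(keep)


def select_expired(snapshots, today):
    """Return the snapshot dates that the rules above would allow deleting."""
    retained = set(select_retained(snapshots, today))
    return sorted(s for s in set(snapshots) if s not in retained)
```

Because monthlies are never dropped, the retained set grows without bound, which is the "until it deforms the storage budget" part of the rule; a real policy would presumably add a cap or a cold-archive tier at that point.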
jameskobielus If you never delete anything ever, I'm surprised you have any IT budget left for anything other than backup.
jameskobielus @andriven In those circumstances, your backup SLAs are effectively whatever your vendor baked into their solution.
Nick Kirsch @jameskobielus Luckily the cost of the bits continues to decline fast enough.
Andrew Miller @jameskobielus True...SLA's often are derived from the solution capabilities or limitations for better or worse.
Chris Dwan @andriven Mutability in the data is an important factor. Most of the bytes in science are immutable ... they're records of what came off the instrument, or derived analysis based on it. In that case, snapshots would be mostly useless.
jameskobielus @andriven That sounds like the backup policies are retained indefinitely through sheer business inertia, not in alignment with business-continuity imperatives, which may call for backup intervals at odds with existing policies.
Andrew Miller @jameskobielus Sometimes I hear "business inertia" and think of stories where people use that as a cop-out, other times it's good IT folks that can't get clear direction and just focus on other projects with business impact. :/
Andrew Miller Chris Dwan Can't believe I didn't talk sooner about mutability/immutability - have been discussing this a lot recently as it relates to ransomware and backups being a line of defense there.
Chris Dwan @jameskobielus @andriven inertia, or perhaps a de facto set of priorities from the business. The risks around loss or disclosure of data are very clear. Unless you can make a similarly concise statement around the -benefit- of deletion, the risk wins.
Andrew Miller People are more motivated by risk than benefit especially with the politics of most organizations.
John Furrier
Q1: What kind of data do you work with?
Dave Vellante All kindsa data...numbers, text, video, audio...big data, fast data, slow data, fat data, skinny data...
Chris Dwan My clients are mostly in the life sciences. I see genomic data, as well as a wealth of other scientific data types.
Nick Kirsch Primarily unstructured file data, with object growing in usage for new applications. Always a side of databases, machine images (although Docker is eliminating this), and other application-encoded formats.
jeff dinisco work with many customers from many industries, but it's largely unstructured and sometimes unwieldy when it comes to file count
Chris Dwan I'm increasingly being pulled into dealing with electronic health records and clinical trials data, which comes with all sorts of fun requirements.
Nick Kirsch @dinisco The fact that we still mention "file count" as something we have to think about. Argh! ;)
Andrew Miller Across the board but all things datacenter. Personally have dealt with everything from SQL backups to heavy VMware via VADP to even DB2 database dumps.
Stephen Pao We're primarily working with unstructured data - images, videos, instrument data, sensor data - typically stored today in enterprise NAS.
Chris Dwan @steve_pao I suppose I should admit that when I say "genomic and lab data," what I really mean is "massive piles of unstructured files."
Andrew Miller Along with datacenter still seeing heavy focus on Remote Office, Branch Office - ROBO. Classic remote sites where WAN still isn't reliable enough that a few data/apps need to live local but need protection via centralized backup.
Stuart Miniman data in apps, data in streams, multi-media, and do all of the in person interactions count too?
Nick Kirsch @fdmts What kind of scientific data is growing the fastest at the moment? What's next on the list?
Chris Dwan @nkirsch Cryo-electron microscopes scare the heck out of me in terms of raw data volumes. Easily terabytes per instrument run.
Nick Kirsch @stu Tracking in-person interactions through audio, video, and shared presence in VR/AR is going to be pretty awesome... =)
Andrew Miller That always raises the interesting question of data sets that get too large to backup - at what point do you have to drop back to replication vs. backup? Incremental forever helps there but huge datasets are a real challenge from a backup perspective.
Nick Kirsch @fdmts What are some of the new (or most burdensome) requirements you are seeing around EHR and clinical data?
Chris Dwan @nkirsch I was in a session just yesterday where the speaker referred to the "coffee break moment" when you type "ls" and go get a coffee.
jameskobielus I'm just a spreadsheet dude, professionally.
Nick Kirsch @andriven It seems to me that replication + synthetic incremental is the holy grail. Particularly with some "cloud" integration such that bits didn't traverse the same code paths in both places.
Nick Kirsch @fdmts Ctrl-\ and move on! ;)
Andrew Miller @nkirsch Really agree...although it's crazy that FedEx is still sometimes the highest bandwidth option out there (or we can be fancy and call it "seeding" the data).
Christopher Jones @andriven Even Amazon has gotten into the FedEx movement of data.
John Furrier
Q2: What (tools/providers/strategies) are you currently using for backups?
Chris Dwan I try to avoid "backups" if at all possible. While data protection, retention, and access are certainly critical - "backup" as a word isn't very useful in getting through to the details I would need to really help a client.
jameskobielus In terms of my office productivity backup requirements, I've come to rely on cloud storage: Apple iCloud, Google Cloud, etc.
Nick Kirsch I see primarily snapshots, remote replication (for DR), and traditional backup to tape (and then Iron Mountain) as the most common backup flow. That said, this is often highly dependent on and integrated with primary storage vendors.
Andrew Miller I've used a ton over the years - started with Backup Exec out of college (no one else wanted to be the "backup guy"). Went to Tivoli Storage Manager (consultant stood it up) and loved incremental forever. We ran out of consultant $$ so I rebuilt it
jeff dinisco trying to move away from the traditional approach that results in large catalogs, complex architectures, and high license costs
Nick Kirsch @jameskobielus Personally, I leverage as many highly redundant cloud services as possible (iCloud, Evernote, Dropbox, and GitHub, to name a few.)
Chris Dwan Data segmentation and tiering is critical, but getting the conversation to the point where people can make informed decisions between cold archives for regulatory purposes, vs. DR systems takes a -lot- of talking.
Andrew Miller @andriven Then a lot of time with source and target dedup platforms. In recent years, more focused on scale out architectures and true SLA policies so backup admins/engineers aren't just glorified job schedulers.
Stephen Pao @dinisco Seeing the same thing about the whole large catalog problems. Results in segmenting data into backup silos and actually having to worry about backing up your backup catalog. Some circularity there!
Nick Kirsch @fdmts When do we get to the point when these are just policy specifiers around the primary data? These buckets/directories need to be protected in this way, provide compliance in this way, etc.
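One way to picture "policy specifiers around the primary data" is a table that maps directory prefixes to protection requirements, resolved by longest matching prefix. The sketch below is purely hypothetical: the `ProtectionPolicy` fields, the example paths, and the specific RPO and retention values are assumptions chosen for illustration, not a description of any product discussed here.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class ProtectionPolicy:
    name: str
    rpo: timedelta                    # maximum tolerated window of data loss
    retention: Optional[timedelta]    # None means "keep indefinitely"
    replicate_offsite: bool


# Hypothetical policy table keyed by directory prefix.
POLICIES = {
    "/labs": ProtectionPolicy("lab-default", timedelta(days=1), timedelta(days=365), False),
    "/labs/instruments": ProtectionPolicy("raw-instrument", timedelta(days=1), None, True),
    "/clinical": ProtectionPolicy("clinical", timedelta(hours=1), timedelta(days=7 * 365), True),
    "/scratch": ProtectionPolicy("scratch", timedelta(days=7), timedelta(days=30), False),
}


def policy_for(path: str) -> Optional[ProtectionPolicy]:
    """Return the policy with the longest prefix that contains `path`, if any."""
    best = None
    for prefix in POLICIES:
        # Match whole path components only, so "/clinicalx" does not match "/clinical".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return POLICIES[best] if best is not None else None
```

With a table like this, snapshot schedules, replication, and archive jobs could in principle be derived from the policy rather than configured per job; paths with no matching prefix return `None` and would need an explicit default.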
Chris Dwan @steve_pao Absolutely. One of my customers is maintaining at least seven copies of the same information, in at least three formats.
jeff dinisco getting to a better ILM model can eliminate the same backup process backing up the same data over and over, it would be easier to manage than just throwing dedup at the problem
Andrew Miller Given who I work for now, I could just say that...but it's a huge market out there. It does seem like recently innovation in this area has accelerated which is cool to see.
Stephen Pao @andriven Yes, scale out architectures is definitely the trend we're seeing here. The backup infrastructure shouldn't be something you have to spend a lot of time architecting and managing...
John Furrier I hear all the time on @theCUBE that backup is broken esp as data is stored in a zillion places; forget about the edge which makes it harder
Andrew Miller If your backup is broken, don't just assume there's no new options. Key characteristics for remote sites IMHO are incremental forever, dedup/compression, scalability at the remote site, potentially cloud archive capabilities.
John Furrier My mind goes crazy when thinking about the data options when you add #IoT to mix
Andrew Miller Agree - prefer "data protection" as it's really a continuum around backup frequency (RPO), retention, archive policy (RTO - cloud, tape, etc.), replication (DR). Those items should be the focus IMHO.
jameskobielus It's not either-or in their minds, is it? Both storage requirements are often mandates that can't be compromised or traded off for each other (budget constraints notwithstanding)
jameskobielus "Business continuity" and "disaster recovery" have more urgency, in terms of use case justification, than "backup" where storage investments are concerned.
jameskobielus @andriven I wonder if "no one else wanted to be the backup guy" can be generalized to companies in general. How hard is it to recruit IT professionals into this thankless role and retain them?
jameskobielus When you add #IoT to the mix, everything (storage, backup, security, orchestration, analytics, etc.) goes a bit nutsy-cuckoo.
Dave Vellante AWS, Carbonite & icloud
Andrew Miller @steve_pao given data growth, I don't see how you can't do a scale out/web scale type architecture. Unfortunately for some incumbents, that requires some major work around file systems, distributed scheduling, distributed metadata/catalog, bottlenecks, etc.
Stephen Pao @andriven I think the key is how you build things like cataloging and jobs into the infrastructure itself.