[LIVE CHAT] Overcoming Backup Challenges

Q5: What kind of backup policies exist for recovery? I.e., how long does data stay on your snaps? What kind of SLA exists to restore data from backups?

3 Votes

Chris Dwan

I write SLAs according to impact of the outage, both in terms of number of people and effect on the business.

4 Votes

Chris Dwan

Once we get above a single lab or department, we're talking about operational availability and failover rather than "backup"

2 Votes

Chris Dwan

We've got FDA rules that mandate seven year retention, and some clinical rules that specify "life of the patient." Coupled with exponential data growth, this means that we rarely delete anything. Ever.

3 Votes

Andrew Miller

What's sad is that often there aren't SLA's - just "here's how we've always backed it up b/c can't get clarity/agreement/signoff from the business".

6 Votes

jeff dinisco

in some cases I see SLA's driven from the wrong place, they're based on what the tech in place is capable of, not what the biz actually needs

5 Votes

Stephen Pao

@fdmts - what about the immediate recovery via snapshots or other techniques when data gets accidentally deleted? Do you go back to the 7 year/life of the patient backups or is there something for shorter term?

1 Vote

jeff dinisco

@andriven I see that as often as I see clear SLA's

3 Votes

Andrew Miller

When you talk policies with the business, it's often hard if your solution has limitations - i.e. only once per day, restores from tape, etc. You may not even want to start the conversation based on your current solution (sad thoughts I know).

4 Votes

Nick Kirsch

Most common policy I see is: hourly, daily, weekly, and monthly snapshots - with daily replication of those snapshots offsite - and a monthly tape backup schedule. User-level restores via snaps, anything else IT-ticket driven.

2 Votes

Chris Dwan

@andriven I sometimes say that we have "good and bad reasons" for retaining data. We retain some data because we signed a contract promising that we would. We retain the rest because we're not sure whether or not we signed such a contract.

5 Votes

Andrew Miller

I do see most customers just doing daily backups b/c it's what they know and what the business is used to. In a perfect world, the focus starts with how often to back up, how long to keep it, when to archive, when to replicate & match tech to that

2 Votes

Andrew Miller

Retention is often driven by policy around legal holds (if you don't have the data, you can't be asked to get it back). Otherwise it's Operational Recovery focused (i.e. 30-60 days) or regulation (1 year, 7 years).

2 Votes

I am John White

Having a variety of customers we see all kinds of things. Most do daily backups held for 4 weeks, monthly clones held for 12 months, and annual clones held for 3 years. Very few test restores.

4 Votes

Andrew Miller

There's also some interesting blurring here between using snapshots (i.e. non-duplication) and incremental forever+dedup in next gen backup solutions.

3 Votes

Chris Dwan

@nkirsch Agreed. My default is dailies for a week, weeklies for a month, and monthlies until it deforms the storage budget.

5 Votes
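
A minimal sketch of the default described above (dailies for a week, weeklies for a month, monthlies kept indefinitely), assuming one snapshot per day. The function name `snapshots_to_keep`, the use of Sunday as the weekly copy, and the first of the month as the monthly copy are illustrative assumptions, not something stated in the chat.

```python
from datetime import date, timedelta

def snapshots_to_keep(snapshot_dates, today):
    """Return the subset of daily snapshot dates to retain.

    Assumed rules (illustrative only):
      - dailies: anything from the last 7 days
      - weeklies: Sunday snapshots from the last 31 days
      - monthlies: first-of-month snapshots, kept indefinitely
    """
    keep = set()
    for d in snapshot_dates:
        age = (today - d).days
        if age < 0:
            continue  # ignore dates in the future
        if age < 7:
            keep.add(d)
        elif age < 31 and d.weekday() == 6:  # Monday is 0, Sunday is 6
            keep.add(d)
        elif d.day == 1:
            keep.add(d)
    return keep

today = date(2024, 3, 31)
history = [today - timedelta(days=n) for n in range(90)]
kept = snapshots_to_keep(history, today)
print(sorted(kept))
```

With 90 days of daily history ending on Sunday 2024-03-31, this keeps 13 snapshots: 7 dailies, 4 weeklies, and 2 monthlies. The "until it deforms the storage budget" part is deliberately left out, since that cutoff is a business decision rather than a rule.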

jameskobielus

If you never delete anything ever, I'm surprised you have any IT budget left for anything other than backup.

3 Votes

jameskobielus

@andriven In those circumstances, your backup SLAs are effectively whatever your vendor baked into their solution.

2 Votes

Nick Kirsch

@jameskobielus Luckily the cost of the bits continues to decline fast enough.

2 Votes

Andrew Miller

@jameskobielus True...SLA's often are derived from the solution capabilities or limitations for better or worse.

2 Votes

Chris Dwan

@andriven Mutability in the data is an important factor. Most of the bytes in science are immutable ... they're records of what came off the instrument, or derived analysis based on it. In that case, snapshots would be mostly useless.

3 Votes

jameskobielus

@andriven That sounds like the backup policies are retained indefinitely through sheer business inertia, not in alignment with business-continuity imperatives, which may call for backup intervals at odds with existing policies.

2 Votes

Andrew Miller

@jameskobielus Sometimes I hear "business inertia" and think of stories where people use that as a cop-out, other times it's good IT folks that can't get clear direction and just focus on other projects with business impact. :/

2 Votes

Andrew Miller

Chris Dwan Can't believe I didn't talk sooner about mutability/immutability - have been discussing this a lot recently as it relates to ransomware and backups being a line of defense there.

2 Votes

Chris Dwan

@jameskobielus @andriven inertia, or perhaps a de facto set of priorities from the business. The risks around loss or disclosure of data are very clear. Unless you can make a similarly concise statement around the -benefit- of deletion, the risk wins.

3 Votes

Andrew Miller

People are more motivated by risk than benefit especially with the politics of most organizations.

0 Votes

John Furrier

Q1: What kind of data do you work with?

2 Votes

Dave Vellante

All kindsa data...numbers, text, video, audio...big data, fast data, slow data, fat data, skinny data...

4 Votes

Chris Dwan

My clients are mostly in the life sciences. I see genomic data, as well as a wealth of other scientific data types.

2 Votes

Nick Kirsch

Primarily unstructured file data, with object growing in usage for new applications. Always a side of databases, machine images (although Docker is eliminating this), and other application-encoded formats.

2 Votes

jeff dinisco

work with many customers from many industries, but it's largely unstructured and sometimes unwieldy when it comes to file count

2 Votes

Chris Dwan

I'm increasingly being pulled into dealing with electronic health records and clinical trials data, which comes with all sorts of fun requirements.

4 Votes

Nick Kirsch

@dinisco The fact that we still mention "file count" as something we have to think about. Argh! ;)

4 Votes

Andrew Miller

Across the board but all things datacenter. Personally have dealt with everything from SQL backups to heavy VMware via VADP to even DB2 database dumps.

3 Votes

Stephen Pao

We're primarily working with unstructured data - images, videos, instrument data, sensor data - typically stored today in enterprise NAS.

2 Votes

Chris Dwan

@steve_pao I suppose I should admit that when I say "genomic and lab data," what I really mean is "massive piles of unstructured files."

3 Votes

Andrew Miller

Along with datacenter still seeing heavy focus on Remote Office, Branch Office - ROBO. Classic remote sites where WAN still isn't reliable enough that a few data/apps need to live local but need protection via centralized backup.

2 Votes

Stuart Miniman

data in apps, data in streams, multi-media, and do all of the in person interactions count too?

2 Votes

Nick Kirsch

@fdmts What kind of scientific data is growing the fastest at the moment? What's next on the list?

2 Votes

Chris Dwan

@nkirsch Cryo-electron microscopes scare the heck out of me in terms of raw data volumes. Easily terabytes per instrument run.

2 Votes

Nick Kirsch

@stu Tracking in-person interactions through audio, video, and shared presence in VR/AR is going to be pretty awesome... =)

3 Votes

Andrew Miller

That always raises the interesting question of data sets that get too large to back up - at what point do you have to drop back to replication vs. backup? Incremental forever helps there but huge datasets are a real challenge from a backup perspective.

3 Votes

Nick Kirsch

@fdmts What are some of the new (or most burdensome) requirements you are seeing around EHR and clinical data?

2 Votes

Chris Dwan

@nkirsch I was in a session just yesterday where the speaker referred to the "coffee break moment" when you type "ls" and go get a coffee.

2 Votes

jameskobielus

I'm just a spreadsheet dude, professionally.

2 Votes

Nick Kirsch

@andriven It seems to me that replication + synthetic incremental is the holy grail. Particularly with some "cloud" integration such that bits didn't traverse the same code paths in both places.

3 Votes
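
As a rough illustration of the idea behind incremental-forever schemes, here is a sketch of building a "synthetic full" by merging a base backup with later incrementals, so the primary data never has to be read in full again. Note that this shows a synthetic full rather than the "synthetic incremental" named above, and the data model (a dict of path to content id, plus `changed` and `deleted` entries) is a simplifying assumption, not any vendor's format.

```python
def synthesize_full(base, incrementals):
    """Merge a base full backup with ordered incrementals into a new full.

    `base` maps path -> content id. Each incremental is a dict with
    "changed" (path -> content id) and "deleted" (set of paths).
    """
    full = dict(base)  # copy so the original base is left untouched
    for inc in incrementals:
        full.update(inc["changed"])
        for path in inc["deleted"]:
            full.pop(path, None)
    return full

base = {"a.txt": "v1", "b.txt": "v1"}
incs = [
    {"changed": {"a.txt": "v2", "c.txt": "v1"}, "deleted": set()},
    {"changed": {}, "deleted": {"b.txt"}},
]
print(synthesize_full(base, incs))
```

The incrementals must be applied oldest first, since a later change or deletion overrides an earlier one.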

Nick Kirsch

@fdmts Ctrl-\ and move on! ;)

0 Votes

Andrew Miller

@nkirsch Really agree...although it's crazy that FedEx is still sometimes the highest bandwidth option out there (or we can be fancy and call it "seeding" the data).

3 Votes

Christopher Jones

@andriven Even Amazon has gotten into the FedEx movement of data.

2 Votes

Andrew Miller

@chjonesDNA Snowball!

0 Votes

John Furrier

Q2: What (tools/providers/strategies) are you currently using for backups?

3 Votes

Chris Dwan

I try to avoid "backups" if at all possible. While data protection, retention, and access are certainly critical - "backup" as a word isn't very useful in getting through to the details I would need to really help a client.

3 Votes

jameskobielus

In terms of my office productivity backup requirements, I've come to rely on cloud storage: Apple iCloud, Google Cloud, etc.

1 Vote

Nick Kirsch

I see primarily snapshots, remote replication (for DR), and traditional backup to tape (and then Iron Mountain) as the most common backup flow. That said, this is often highly dependent on and integrated with primary storage vendors.

5 Votes

Andrew Miller

I've used a ton over the years - started with Backup Exec out of college (no one else wanted to be the "backup guy"). Went to Tivoli Storage Manager (consultant stood it up) and loved incremental forever. We ran out of consultant $$ so I rebuilt it

3 Votes

jeff dinisco

trying to move away from the traditional approach that results in large catalogs, complex architectures, and high license costs

5 Votes

Nick Kirsch

@jameskobielus Personally, I leverage as many highly redundant cloud services as possible (iCloud, Evernote, Dropbox, and GitHub, to name a few.)

3 Votes

Chris Dwan

Data segmentation and tiering is critical, but getting the conversation to the point where people can make informed decisions between cold archives for regulatory purposes, vs. DR systems takes a -lot- of talking.

1 Vote

Andrew Miller

@andriven Then a lot of time with source and target dedup platforms. In recent years, more focused on scale out architectures and true SLA policies so backup admins/engineers aren't just glorified job schedulers.

1 Vote

Stephen Pao

@dinisco Seeing the same thing about the whole large catalog problems. Results in segmenting data into backup silos and actually having to worry about backing up your backup catalog. Some circularity there!

2 Votes

Nick Kirsch

@fdmts When do we get to the point when these are just policy specifiers around the primary data? These buckets/directories need to be protected in this way, provide compliance in this way, etc.

1 Vote
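
One way to picture "policy specifiers around the primary data" is a table that maps directory prefixes to protection settings, with the most specific prefix winning. This is only a sketch: the `ProtectionPolicy` fields, the prefixes, and the values are all hypothetical, apart from the seven-year figure, which echoes the FDA retention rule mentioned earlier in the chat.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProtectionPolicy:
    snapshot_every_hours: int   # how often to snapshot (drives RPO)
    retain_days: int            # how long copies are kept
    replicate_offsite: bool     # whether copies are replicated for DR

# Hypothetical prefixes and values, for illustration only.
POLICIES = {
    "/labs/instruments": ProtectionPolicy(24, 7 * 365, True),
    "/labs": ProtectionPolicy(24, 30, False),
    "/home": ProtectionPolicy(1, 30, True),
}
DEFAULT = ProtectionPolicy(24, 14, False)

def policy_for(path):
    """Pick the policy whose prefix is the longest match for `path`."""
    best = None
    for prefix in POLICIES:
        # Match the prefix itself or anything beneath it, but not
        # siblings that merely share leading characters.
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return POLICIES[best] if best is not None else DEFAULT

print(policy_for("/labs/instruments/run42/frame001.tif"))
print(policy_for("/labs/notes.txt"))
print(policy_for("/scratch/tmp"))
```

Keeping the policy next to the data path, rather than in a separate job schedule, is the design choice being illustrated: the backup tooling would read this table instead of an admin hand-building jobs.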

Chris Dwan

@steve_pao Absolutely. One of my customers is maintaining at least seven copies of the same information, in at least three formats.

4 Votes

jeff dinisco

getting to a better ILM model can eliminate the same backup process backing up the same data over and over, it would be easier to manage than just throwing dedup at the problem

3 Votes

Andrew Miller

Given who I work for now, I could just say that...but it's a huge market out there. It does seem like recently innovation in this area has accelerated which is cool to see.

1 Vote

Stephen Pao

@andriven Yes, scale out architectures are definitely the trend we're seeing here. The backup infrastructure shouldn't be something you have to spend a lot of time architecting and managing...

3 Votes

John Furrier

I hear all the time on @theCUBE that backup is broken esp as data is stored in a zillion places; forget about the edge which makes it harder

1 Vote

Andrew Miller

If your backup is broken, don't just assume there's no new options. Key characteristics for remote sites IMHO are incremental forever, dedup/compression, scalability at the remote site, potentially cloud archive capabilities.

2 Votes

John Furrier

My mind goes crazy when thinking about the data options when you add #IoT to mix

1 Vote

Andrew Miller

Agree - prefer "data protection" as it's really a continuum around backup frequency (RPO), retention, archive policy (RTO - cloud, tape, etc.), replication (DR). Those items should be the focus IMHO.

2 Votes

jameskobielus

It's not either-or in their minds, is it? Both storage requirements are often mandates that can't be compromised or traded off for each other (budget constraints notwithstanding)

1 Vote

jameskobielus

"Business continuity" and "disaster recovery" have more urgency, in terms of use case justification, than "backup" where storage investments are concerned.

3 Votes

jameskobielus

@andriven I wonder if "no one else wanted to be the backup guy" can be generalized to companies in general. How hard is it to recruit IT professionals and retain them in this thankless role?

2 Votes

jameskobielus

When you add #IoT to the mix, everything (storage, backup, security, orchestration, analytics, etc.) goes a bit nutsy-cuckoo.

1 Vote

Dave Vellante

AWS, Carbonite & iCloud

1 Vote

Andrew Miller

@steve_pao given data growth, I don't see how you can't do a scale out/web scale type architecture. Unfortunately for some incumbents, that requires some major work around file systems, distributed scheduling, distributed metadata/catalog, bottlenecks, etc.

0 Votes

Stephen Pao

@andriven I think the key is how you build things like cataloging and jobs into the infrastructure itself.

0 Votes
