ThirdWave

Next Wave Data Management
Learn how to gain visibility, insight & control over data by automating workflows in a closed loop.
#ThirdWave: Next Wave Data Management. Learn how to gain visibility, insight & control over data on premise as well as in the hybrid cloud.
Storage Alchemist
Q5. What does it mean to ‘tackle’ the ‘copy data’ deluge? How do you manage it? How many people does it take?
Peter Eicher
Start by making better use of primary storage. New arrays are much more efficient. You can actually use snaps now without killing performance.
Dave Vellante
it starts with gaining visibility on the copies at your shop
John Furrier
first step is to review the process and practices in place
Storage Alchemist
@Peter_Eicher would you leverage your copies on the primary storage box? Won't that kill performance?
storageswiss
Not the Jacksonville Jaguars defense. They can't stop anything. Not points, and probably not the data deluge.
Sathya Sankaran
You can't manage what you can't measure. First step is to observe and measure the problem!
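As a hedged illustration of that measure-first step, here is a minimal Python sketch that walks a directory tree, hashes file contents, and counts byte-identical copies. The root path, size cutoff, and output format are assumptions for the example, not anything from the chat.

# Sketch: estimate how many byte-identical file copies live under a path.
# The root path and size cutoff are illustrative assumptions.
import hashlib
import os
from collections import defaultdict

def sha256_of(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def count_copies(root, min_size=1 << 20):
    copies = defaultdict(list)          # digest -> list of paths
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) >= min_size:
                    copies[sha256_of(path)].append(path)
            except OSError:
                continue                # skip unreadable files
    duplicates = {d: p for d, p in copies.items() if len(p) > 1}
    wasted = sum(os.path.getsize(p[0]) * (len(p) - 1) for p in duplicates.values())
    return len(duplicates), wasted

if __name__ == "__main__":
    groups, wasted_bytes = count_copies("/data")   # hypothetical mount point
    print(f"{groups} duplicate groups, ~{wasted_bytes / 1e9:.1f} GB redundant")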
Peter Eicher
Some vendors like @HDS now let you take snaps off clones. The base clone is 100% copy, but the snaps are now on separate spindles so no production hit. Good idea, but not cheap.
Peter Eicher
@storageswiss Lol! Somebody had a bad fantasy football week!
storageswiss
I think step number one is a full copy on a secondary (less expensive) device. Then trigger snapshots and what have you from there.
ttessks
I wonder how many IT centers really know how many copies they even have or the size of their problem. Seems like we would need to start there.
Peter Eicher
@storageswiss Yes, snap-and-replicate is a great solution. Shift the load to an alternate array. NetApp banging that drum for years.
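For a concrete picture of snap-and-replicate, here is a rough sketch that drives ZFS-style commands from Python: snapshot the primary, ship it to a secondary system, and take downstream snapshots there instead of on production. The dataset names, host name, and the choice of ZFS are assumptions for illustration; array-based replication would use the vendor's own tooling.

# Sketch of snap-and-replicate: snapshot primary, ship the snapshot to a
# secondary system, then take further snapshots there instead of on production.
# Dataset names, hosts, and the use of ZFS are illustrative assumptions.
import subprocess

PRIMARY_DS = "tank/prod/db"            # hypothetical production dataset
SECONDARY = "backup-host"              # hypothetical secondary system
SECONDARY_DS = "vault/copies/db"       # hypothetical dataset on the secondary

def run(cmd):
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

def snap_and_replicate(label):
    snap = f"{PRIMARY_DS}@{label}"
    run(f"zfs snapshot {snap}")        # near-instant, low impact on primary
    # Full send the first time; later runs could use incremental 'zfs send -i'.
    run(f"zfs send {snap} | ssh {SECONDARY} zfs receive -F {SECONDARY_DS}")
    # Downstream copies (test/dev, analytics) come off the secondary, not prod.
    run(f"ssh {SECONDARY} zfs snapshot {SECONDARY_DS}@{label}-testdev")

if __name__ == "__main__":
    snap_and_replicate("nightly-copy")   # example label only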
storageswiss
@ttessks In my experience IT guys know it is a problem. You are correct that they may not know the scope of the problem, but they will admit it is a problem, and they are looking for a solution.
Ira Goodman
@storageswiss You are so right. The IT guys are always the ones running around trying to keep this problem in check, and they need help minimizing a problem they live with day by day.
Frank Weitz
@storageswiss From my experience in Switzerland, the IT guys really know... but it's almost the same: they need the approval to buy a solution.
Dave Vellante
Good white paper on the copy data problem (sponsored by @actifio) by @baldydubois http://www.actifio.c...
Storage Alchemist
Q4. If the problem is too much data, don't data reduction technologies like #compression and #dedupe solve that?
Peter Eicher
They help for certain use cases, not others.
Peter Eicher
Running data mining off a deduped backup store with SATA drives? See you next month when the job is finished.
Sathya Sankaran
It is the difference between reducing garbage and compacting garbage!
storageswiss
My opinion is that dedupe is an overrated technology. I like it, but it should not be all things to all people.
storageswiss
Dedupe essentially rewards bad behavior; you wouldn't do that with your kids and you shouldn't do it with your data.
Peter Eicher
@storageswiss Yup. You keep shoveling the same garbage into the box. But the shoveling itself takes a toll. (Too much metaphor)
Storage Alchemist
@storageswiss agree - need to have undeduped copies to run tests and analytics
Storage Alchemist
@Peter_Eicher and you end up breaking the shovel
Dave Vellante
data-deduplication was a 1-time hit - it created a baseline and now it's off to the data races
Jay Livens
You need to understand the data that you are storing and why and use the right technology.
Storage Alchemist
@JLivens Good point, Jay. Question is, how do you manage it all? #copydata #ThirdWave
Sathya Sankaran
Deduplication is not free, and it doesn't solve the underlying problem... it gets you some space and time to get your act together.
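To make the "space but not a solution" point concrete, here is a small sketch that estimates a fixed-block dedupe ratio for a set of files by hashing blocks. The file names and block size are assumptions; the point is that stored blocks shrink while the number of logical copies to manage stays the same.

# Sketch: rough fixed-block dedupe ratio for a set of files.
# It shows what dedupe buys (fewer stored blocks) and what it doesn't
# (the logical copies still exist and still have to be managed).
# File list and block size are illustrative assumptions.
import hashlib

def dedupe_ratio(paths, block_size=4096):
    logical = 0
    unique_blocks = set()
    for path in paths:
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(block_size), b""):
                logical += len(block)
                unique_blocks.add(hashlib.sha1(block).digest())
    physical = len(unique_blocks) * block_size
    return logical / physical if physical else 1.0

if __name__ == "__main__":
    files = ["copy1.vmdk", "copy2.vmdk", "backup.vmdk"]   # hypothetical copies
    print(f"approx dedupe ratio: {dedupe_ratio(files):.1f}:1")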
Jay Livens
In my view the challenge comes back to metadata. You need to understand what you have and why. Without that knowledge, it is difficult to make intelligent choices.
Storage Alchemist
@JLivens You get the prize, man... It is all about the metadata. Question is, when is that data set too big? What is the best way to manage it? Is it a #datacatalog?
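As one hedged reading of what a #datacatalog could mean at its simplest: a table of metadata about every copy, queryable by purpose, age, and size. The schema and values below are an illustrative sketch in SQLite, not any vendor's catalog.

# Sketch: the smallest possible #datacatalog - one row of metadata per copy.
# Schema and fields are illustrative assumptions, not a product design.
import sqlite3

conn = sqlite3.connect("copy_catalog.db")    # hypothetical catalog store
conn.execute("""
    CREATE TABLE IF NOT EXISTS copies (
        source      TEXT,    -- production dataset the copy came from
        location    TEXT,    -- array / host / cloud bucket holding the copy
        purpose     TEXT,    -- backup, dr, test/dev, analytics, ...
        created_at  TEXT,    -- when the copy was taken
        expires_at  TEXT,    -- when it can be reclaimed
        size_gb     REAL
    )""")
conn.execute(
    "INSERT INTO copies VALUES (?, ?, ?, ?, ?, ?)",
    ("prod/db", "secondary-array-1", "test/dev",
     "2014-01-15T02:00", "2014-02-15T02:00", 750.0),
)
conn.commit()

# The payoff: questions like "what is holding my capacity?" become one query.
for row in conn.execute(
        "SELECT purpose, ROUND(SUM(size_gb), 1) FROM copies GROUP BY purpose"):
    print(row)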
Stuart Miniman
from @otherscottlowe - 10 things to know about data dedupe http://wikibon.org/w...
Jay Livens
Doesn't this problem closely mirror the archive problem of moving inactive and even duplicate data to less expensive storage?
Sathya Sankaran
@JLivens I think the problem definition includes avoiding that inactive and duplicate data as well
Jay Livens
Right, and so I wonder if we should differentiate between active and inactive duplicate data, primarily because the SLA is different between the two.
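One hedged way to act on that active/inactive split: tag each copy with a tier based on how recently it was accessed, and hang a different SLA (and storage tier) on each. The 30-day threshold and paths below are arbitrary assumptions.

# Sketch: split copies into "active" and "inactive" tiers by last access,
# so different SLAs (and cheaper storage) can be applied to each.
# The 30-day threshold is an arbitrary assumption.
import os
import time

INACTIVE_AFTER_DAYS = 30

def classify(paths):
    now = time.time()
    tiers = {"active": [], "inactive": []}
    for path in paths:
        idle_days = (now - os.path.getatime(path)) / 86400
        tier = "inactive" if idle_days > INACTIVE_AFTER_DAYS else "active"
        tiers[tier].append(path)
    return tiers

if __name__ == "__main__":
    sample = ["/copies/db_clone_jan", "/copies/db_clone_today"]  # hypothetical
    print(classify(sample))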
Storage Alchemist
What are the other copy data use cases besides #backup, #archive, #dr, #test/dev, #analytics?
Dave Vellante
data mart/data warehouse
ttessks
Remote sites tend to have copies for their own use.
Peter Eicher
@ttessks Yes, geographic distribution can multiply all of the above.
Dave Vellante
#testdev and #devops practitioners are copy crazed maniacs
ttessks
@Peter_Eicher that's what she said
Peter Eicher
@dvellante Also take into account version control. I just tweaked this, "Save." I just changed a line, "save." Let's see how this works, "Save."
Storage Alchemist
@Peter_Eicher good point - do you think it is helpful to control costs?
Sathya Sankaran
#VMSprawl... Imagine that when VMs traverse different hypervisors and hybrid, private, and public clouds!