eweekchat

Data Storage and Protection
JOIN US: Discuss trends in Data Storage, Protection and Privacy.
James Maguire
Q8. What do you expect for the future of data storage, protection and privacy, say 3-5 years out?
Paul Speciale
A8. Software defined (obvious), more automated (autonomous?) management and more power in custom data management policies.
Stephen Manley
A8. Prediction 1: Smarter systems that manage storage, data, and protection for you. With data in so many places, people can't keep managing it directly. We should never have to provision storage, update SW, install security patches, configure backup schedules, or refresh HW.
Paul Speciale
A8. Solutions will be more portable, as we vendors will also deliver solutions that can run where they are needed.
Panasas, Inc.
In spite of us all wanting it, in 3-5 years flash prices will not have come down enough for all storage to be flash-based. Our appetites for data are growing too quickly; even a small delta in price multiplied by PBs will be a big cost.
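To make the "small delta multiplied by PBs" point concrete, here is a minimal arithmetic sketch. The function name and every price in it are hypothetical placeholders, not figures from the chat or from any vendor.

```python
def extra_cost_usd(capacity_pb: float, flash_usd_per_tb: float, hdd_usd_per_tb: float) -> float:
    """Extra spend for holding capacity_pb on flash instead of HDD.

    Uses decimal units (1 PB = 1000 TB). All prices are caller-supplied assumptions.
    """
    capacity_tb = capacity_pb * 1000
    return capacity_tb * (flash_usd_per_tb - hdd_usd_per_tb)


if __name__ == "__main__":
    # Hypothetical numbers: 50 PB, flash at $45/TB, HDD at $15/TB.
    delta = extra_cost_usd(capacity_pb=50, flash_usd_per_tb=45.0, hdd_usd_per_tb=15.0)
    print(f"Extra cost of all-flash: ${delta:,.0f}")
```

With these made-up inputs, a $30/TB gap turns into a seven-figure difference at 50 PB, which is the shape of the argument above.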
Panasas, Inc.
To protect your data, all compute and storage will have to go to “zero trust” architectures. That’s a buzzword that the industry doesn’t quite know how to implement yet, but we’re going to see vendors trying to figure it out over the next 3-5 years.
Stephen Manley
A8. Prediction 2: Built in ransomware protection. Ransomware isn't going away. Therefore, every "DIY" guide, every manual step is going to be replaced with something automatic.
Panasas, Inc.
“Data has mass, it takes energy to move it” is an old saying, but it has value as a simple analogy for what drives storage architectures. As data grows it will tend to clump together, as mass does, because of the lower cost to move it shorter distances.
Panasas, Inc.
Extreme centralization of data brings all sorts of value in making use of disparate data for a common goal, but also brings all sorts of risks as a mother lode for hackers. Balancing those forces (the mass analogy again) will be the job of “storage architecture”.
Steve McDowell
A8. Prediction: Intelligent tiering will be built into storage systems. There are significant cost disparities between 3D XPOINT, TLC, QLC, spinning HDDs + all the cloud options. IT needs some help. Storage vendors will step up.

Panasas, Inc.
@makitadremel What is the core value prop of public clouds? They'll take the load of managing your fleet of hardware and networking. Storage is not terribly different, the fleet of hardware needs managing, and it can be done with lots of automation.
Celebrus
A5. With more focus on owning first-party data, and the compliance elements therein, we are seeing a continued push toward building a “single source of truth” for greater visibility and governance moving forward. #eWEEKChat
Stephen Manley
A8. Prediction 3: People will pay for results. We've already moved away from bespoke data environments. As data infrastructure becomes more standardized, organizations will not stand for many bills for piece parts. They want a single, clear bill for the value they're getting.
Steve McDowell
A8. Storage-as-a-Service becomes a much bigger piece of the market. In a world where data streams endlessly, it's very compelling to have the capacity-on-demand that as-a-service offers. And OpEx dollars are always easier to find than CapEx!
Daniel Graves
A8 We've discussed the trends - more data, more places, more regulations, more complexity. Investments will be made to make data easier despite these headwinds. Programmable data infrastructure will enable automation, and ML will help classify and govern.
Panasas, Inc.
@CelebrusTech A single data storage solution that handles all your data can be both much easier to manage and higher performance than a set of silos.
Panasas, Inc.
@sr_mcdowell I'll debate tiering as a solution. The core question is whether your data goes cold over time and is rarely referenced after that. If that's not true, then tiering is slower and more expensive. In AI/ML it's not true.
Panasas, Inc.
If your working set is larger than your primary tier, then it's a guaranteed net loss from thrashing. Does a user know how big their working set is? Does it change over time?
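The thrashing claim can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any real tiering product: the primary tier is treated as an LRU cache of `capacity` blocks, and the workload is a repeated sequential scan over `working_set` blocks, which is the worst case for LRU. The function name and parameters are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(capacity: int, working_set: int, passes: int = 10) -> float:
    """Fraction of accesses served from an LRU 'primary tier' of `capacity` blocks
    when the workload scans blocks 0..working_set-1 in order, `passes` times."""
    cache: OrderedDict[int, bool] = OrderedDict()
    hits = accesses = 0
    for _ in range(passes):
        for block in range(working_set):
            accesses += 1
            if block in cache:
                hits += 1
                cache.move_to_end(block)       # mark as most recently used
            else:
                cache[block] = True            # promote from the slower tier
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict the least recently used block
    return hits / accesses


if __name__ == "__main__":
    print(lru_hit_rate(capacity=100, working_set=100))  # working set fits
    print(lru_hit_rate(capacity=100, working_set=101))  # one block too many
```

In this scan pattern, a working set that exceeds the tier by a single block drops the hit rate from 0.9 to 0.0, because each block is evicted just before it is needed again. Workloads with skewed access would degrade more gradually.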
Panasas, Inc.
@makitadremel I agree that users want simple usage and simple billing. They want pay-as-you-access and pay-as-you-grow. With CPUs that's easy, see AWS or VMware; with storage and networking it's harder to offer those.
James Maguire
Q1. What is the biggest current challenge in data storage?
Paul Speciale
A1. Data creation and consumption is everywhere now: data centers, cloud and edge.
James Maguire
@pspeciale No doubt, data creation has exploded!

Paul Speciale
this creates problems in visibility (where is the data) and management
Stephen Manley
A1. The biggest challenge is data sprawl - data in so many apps and locations - it's almost impossible to secure and protect it... much less let other people in the organization find what they need.
Steve McDowell
A1. Agree with @makitadremel - it's getting a handle on just what data you have.
Daniel Graves
A1 Building on what Paul said, with creation everywhere and consumption everywhere, storage is also everywhere. So one key challenge is data movement, from the point of creation to all the consumers.
Paul Speciale
@makitadremel no doubt about it, just figuring out what is where and how to access it securely is a huge challenge
Steve McDowell
A1 Understanding what data needs to be kept, and where, is also a huge challenge. Not all data generated at the edge, for example, needs to come home to the data center.
James Maguire
@makitadremel Seems like less and less data is sent back home these days.
Daniel Graves
A1 movement challenges include security, privacy, governance; speed and cost; identification and categorization; and making the data usable for the intended consumption use case
Paul Speciale
@sr_mcdowell this is true, edge has extra considerations for mobility and filtering of what needs to be mobilized
Paul Speciale
A1. Another challenge is long term retention of data, especially for stuff that matters and has value.
Panasas, Inc.
The biggest challenge is, always has been, and always will be price/performance. Customers should always be looking for the “best deal”, but there’s more to the “best deal” than just acquisition price.
Panasas, Inc.
What’s the true value of reliability? How much data loss per week would you be OK with if you could save 40% on the purchase cost?
Panasas, Inc.
What’s the true value of availability? Would you be OK if your storage was down for a week, every quarter, with all that compute and networking sitting idle, if the price was 40% lower than the other products?
Panasas, Inc.
What’s the true value of ease-of-use? Would you be OK if you needed to find and retain 3 experienced PhDs (at mid-6 figure salaries) just to keep your storage running, when everyone else needed them too?
Panasas, Inc.
CPUs are a commodity, completely fungible, chosen solely based upon price/performance. Storage is not a commodity: choose based purely on purchase price and you will end up unhappy with your decision.
Daniel Graves
A1 another challenge is determining what to store. With traditional analytics, companies predetermined which data should funnel into a data warehouse. But with ML adoption, data scientists want access to a lot more, increasing the storage pressure.
Panasas, Inc.
@db_graves Yup, networking is the totally unrecognized part of storage. Everything these days is "network attached storage", but (almost) all storage companies disclaim any knowledge of the customer's network.
James Maguire
@Panasas Interesting. Willful lack of knowledge.
Panasas, Inc.
Yet if the customer's network is misconfigured somehow, the storage is slow and the customer suspects the storage. We need better network management tools in the storage business.
James Maguire
Q5. What trends are you seeing in object storage, is customer usage increasing? Innovations in object storage?
Steve McDowell
A5. Object storage is emerging as the technology of choice for purely unstructured data. And there's a lot of unstructured data.
Paul Speciale
A5. This is one place where AWS is pushing some innovations that storage vendors will leverage. For data protection, recent innovations in data immutability (S3 Object Lock) do provide solutions for compliance and ransomware protection.
James Maguire
@sr_mcdowell Do you agree that an ever larger percentage is unstructured?
Stephen Manley
A5. Customers who were shoehorning applications into NAS (e.g. anything where the pathname was auto-generated and stored in a DB for lookup) are increasingly shifting to object storage. We're also seeing a lot more "high performance" apps on object storage.
Paul Speciale
A5. We also see solutions providers starting to embrace object storage more. For example, Splunk with SmartStore (S3), Vertica with Eon Mode (S3), Veeam with v10 (S3 object locking), and more
Steve McDowell
I think that we're maybe more aware of how much unstructured data we have than we used to be. Some of these technologies force us to look at what we have.
Stephen Manley
@pspeciale To be fair, most NAS systems offer solid immutability solutions (e.g. NetApp SnapLock, Data Domain Retention Lock)
Paul Speciale
@makitadremel - agreed, it's catching up here and is a natural fit for semantics of object storage
Steve McDowell
A5. @pspeciale is dead-on: AWS continues to innovate around object. Beyond data protection, just simple things like intelligent tiering can make a huge difference in IT efficiency.
Panasas, Inc.
Objects are great! Panasas PanFS is an HPC-class filesystem built using our own Object store, our “application” is a filesystem, and we get our linear scale-out in capacity and performance from that Object store.
Panasas, Inc.
Now look at AWS S3, at 300ms access latency that’s an archive tier, it’s not high performance. It’s nearly impossible to build high perf apps on top of a slow Object store, we need low-latency Objects and modifiable Objects.
Panasas, Inc.
One challenge in storage is finding the ‘thing’ you need, an image, a document, whatever. It’s attractive to store each such ‘thing’ in an Object, but users want to categorize and group their content, in filesystems that’s directories and files.
Panasas, Inc.
Object APIs have gradually recreated directories, etc, as a result. POSIX filesystems and Object APIs may merge at some point, but until then both APIs will be required to access the same content.
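One way object APIs recreate directories is by listing a flat key namespace with a prefix and a delimiter, as S3-style listing does. The sketch below emulates that idea in plain Python over an in-memory list of keys. `list_dir` and the sample keys are hypothetical, and this is not the API of any particular object store.

```python
def list_dir(keys, prefix="", delimiter="/"):
    """Emulate a one-level directory listing over flat object keys.

    Returns (files, subdirs): keys directly under `prefix`, and the distinct
    'common prefixes' one level below it, each ending in `delimiter`.
    """
    files, subdirs = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            subdirs.add(prefix + head + delimiter)  # key lives deeper: report only the "folder"
        else:
            files.append(key)                       # key sits directly at this level
    return sorted(files), sorted(subdirs)


if __name__ == "__main__":
    # Hypothetical keys; the store itself has no directories, only names.
    keys = ["scans/2021/a.dcm", "scans/2021/b.dcm", "scans/readme.txt", "renders/f1.exr"]
    print(list_dir(keys))
    print(list_dir(keys, prefix="scans/"))
```

The "folders" here exist only as a naming convention applied at list time, which is why a filesystem-style view and an object-style view can coexist over the same content.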
Paul Speciale
A5. It does feel like we are in another wave of object storage growth, more storage solutions, more apps - and newer apps coming online after the initial archival/backup apps
Steve McDowell
A5. @panasas makes a great point: objects are (at some level) just a building block. We're seeing technologies like PanFS use objects in very interesting & insightful ways.
Panasas, Inc.
@sr_mcdowell IMHO there's a big difference between "storage for software" and "storage for users". Objects are fine for apps, they can do the mapping from 'thing' to Object URL. Users want a folder tree that they can look around in.
Paul Speciale
A5. POSIX presentations will still be needed for many unstructured data use-cases, too many apps require and depend on file systems. It will coexist with object storage.
Panasas, Inc.
@makitadremel I agree but would point out that the constraint there was "filesystem size" not "filesystem semantics". If the NetApps and Isilons could grow larger there might be a different calculus than just moving to Objects.
Panasas, Inc.
Objects are good, but I'm not sure all the different use cases and customer desires are being weighted correctly. They're clearly part of our future (Panasas uses them), but for which use cases?
Stephen Manley
@Panasas I disagree. Size may have been a secondary issue, but what was the point of storing objects in files (e.g. an image for a render farm or a medical image)? There was a DB that stored image -> filename mapping. Why not skip the middle man and go directly to the object?
Panasas, Inc.
The case you quoted was of a NAS system being fragmented, which I took to mean from limited size. If the app was badly built to put render farm images into a database, that's a great fit for object storage. Databases are all about unanticipated queries.