AI is driving enterprise investments in data, including in the storage infrastructure that supports distributed data-science pipelines in the hybrid cloud.
We're nearing the end of this #ActionItem CrowdChat on AI & Hybrid Storage. #Wikibon wants to thank the experts who've participated in the discussion. It's been excellent and stimulating.
It's an "it depends" answer. Obviously, you can't beat the speed of light. So, the centralized model of cloud storage doesn't meet that requirement. As you look at solutions from EMC, NetApp, Faction, etc., you see the potential for distributed access.
- No. Storage is lagging behind AI processing advancements. Just look at the number of services and options in AI and related fields at the public providers. Most of the advances in cloud storage are higher IOPS or a bit of lifecycle management.
The simple answer to this question is "No." To get there we need to improve the software side of storage; the likes of the #Hadoop ecosystem have helped us, but a lot needs to be done. We need better storage algorithms (besides better hardware performance).
If the data originates in the cloud, the performance capabilities of storage are generally very good. However, this does require a significant amount of setup time and integration of data sources & compute/GPU resources.
But you do have compute instances that are AI-friendly. Cloud providers are offering super-fast local storage options and high-bandwidth, low-latency network options to create parallel applications. So, it depends on the use case.
Also, IMO compute and network are getting a bigger share of investment dollars, but storage tech isn't far behind. I see many startups trying to tackle this bottleneck.
cloud storage? I guess so. GCP, in particular, is providing well-regarded services. Are merchant storage solutions keeping up with merchant AI silicon? Not without a fair amount of detailed administrative work, but getting closer.
- We need software to help us push more into data lifecycles and future access patterns. One advancement I'd like to see is around user notification. If AI is applied to storage, I should be notified if something interesting comes to light in the future.
When you can get a bunch of high-memory, FPGA/GPU-backed instances on a 10Gbps network, you can create some AI solutions not easily recreated in the enterprise.
@sarbjeetjohal - I would have said storage techs were getting lots of the investment dollars. It seems like we always hear about storage startups getting funding - though the solutions feel like only incremental advancement from the current options.
- Not sure I have a good answer, but it feels like it will be tied to adoption of on-prem public cloud tech like AzureStack and AWS Outposts. Devs will be able to apply the same models locally that they built in public providers.
There are two ways that data at the edge can be used to improve functionality. The first is local data that can improve knowledge of the environment (e.g., a pot-hole in the road). The second is changes to the inference code. Compliance will make the latter rare!
While distributing #AI workloads, the number-one thing to keep in mind is manageability, which IMHO can turn out to be one of the biggest cost factors for at-scale #AI programs.
@sarbjeetjohal - Agreed. I can see folks training ML workloads in AWS and deploying inference models on Outposts in the interim until the edge is ready. Outposts would be an intermediate hub to collect edge data.
Storage is always an underappreciated field of endeavor... always kind of an afterthought. We hear about the "sexy" stuff associated with AI and ML, assuming storage will somehow keep up.
I'm skeptical of many of the storage solutions that are "optimized for AI". The changing state of applications definitely requires some redesign and consideration of storage.
The #AI industry needs to recalibrate (rather than rethink) storage architectures in the era of the hybrid cloud. That starts with using the right mix of storage (performance) for training the models vs. inference workloads, through to performant software platforms & compression...
- Yes. Storage lifecycle and costing are typically separated based on users accessing the data (or not). I think we'll see a distinction between user access, AI access, and offline/archived. AI models still need to run against data I don't think I need (yet).
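That three-way distinction can be sketched as a toy tiering rule. Everything here is hypothetical (the tier names, the access windows), just to illustrate splitting "user access" from "AI access" from "archive" based on who touched the data last:

```python
from datetime import datetime, timedelta

# Hypothetical lifecycle tiers, per the user/AI/archive split above.
USER_HOT = "user-access"
AI_ONLY = "ai-access"
ARCHIVE = "archived"

def classify(last_user_access: datetime, last_ai_access: datetime,
             now: datetime, user_window_days: int = 30,
             ai_window_days: int = 180) -> str:
    """Pick a storage tier from recent access patterns (illustrative windows)."""
    if now - last_user_access <= timedelta(days=user_window_days):
        return USER_HOT   # users still touch it
    if now - last_ai_access <= timedelta(days=ai_window_days):
        return AI_ONLY    # only models read it now; keep it AI-reachable
    return ARCHIVE        # cold for both users and models

now = datetime(2019, 6, 1)
print(classify(datetime(2019, 5, 20), datetime(2019, 5, 30), now))  # user-access
print(classify(datetime(2018, 1, 1), datetime(2019, 4, 1), now))    # ai-access
print(classify(datetime(2017, 1, 1), datetime(2018, 1, 1), now))    # archived
```

The key point in the message above is the middle tier: data no user has asked for in months can still be "hot" from the model's point of view, so it can't simply follow user-driven lifecycle rules into the archive.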
@joemckendrick I think data lakes are sexy, and they're key components of the AI development and operations pipeline. They're nothing without mass storage.
If the AI industry believes that all data will reside in centralized storage pools under centralized control, yes. Lots of questions regarding how and how much data will get moved. Most likely: Much derivative AI modeling will be distributed, with big implications.
@sarbjeetjohal I agree. It's a matter of fitting the storage tech to the specific workloads in each tier of the AI pipeline deployed in public v. private clouds in hybrid architectures.
I think the cloud providers will keep up with storage requirements. It may be a tall order for enterprises onsite. But the cloud providers will always be in a race at the back-end to shore up speed and performance
I'm no AI expert, but one obvious challenge is I/O. Centralized storage systems introduce latency in getting data close to the inference and modeling engines.
- Traditional storage assumes that we know enough about the data to accurately categorize it. I think that's part of why we see an explosion in unstructured data. Data will be more in the form of media (pictures, sounds, etc.) than traditional blocks.
For inference workloads at the Edge (many different types), the emphasis is on real-time, in-context support of inference code running in a mesh of nodes. DRAM together with flash (NVDIMMs) will be an important technology at the Edge.
@ballen_clt Really? I thought that "traditional" in storage meant data structuring (relational, columnar, file, etc.), but not necessarily any deeper semantic understanding of the data.
Programmability of storage will enable policy-based storage allocation, which can further help #AI workloads. Not all #AI-related workloads are equally demanding on storage.
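A minimal sketch of what such a policy could look like. The workload profiles, storage-class names, and IOPS figures are all made up for illustration, not any vendor's actual tiers; the point is just that training, inference, and batch work can be routed to different storage classes by policy rather than by hand:

```python
# Hypothetical policy table mapping AI workload profiles to storage classes.
POLICY = {
    "training":  {"class": "nvme-flash",  "min_iops": 500_000, "replicas": 1},
    "inference": {"class": "local-ssd",   "min_iops": 100_000, "replicas": 2},
    "batch-etl": {"class": "object-std",  "min_iops": 5_000,   "replicas": 3},
    "archive":   {"class": "object-cold", "min_iops": 100,     "replicas": 3},
}

def allocate(workload: str) -> dict:
    """Return the storage policy for a workload, defaulting to the cheapest tier."""
    return POLICY.get(workload, POLICY["archive"])

print(allocate("training")["class"])   # nvme-flash
print(allocate("unknown")["class"])    # object-cold
```

Training gets the fastest (and least replicated) scratch tier, while inference trades some speed for redundancy, which matches the observation that not all #AI workloads stress storage the same way.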
For development workloads the emphasis is on large amounts of data (much of which will use HDD) and smaller amounts of active data held in flash. The optimum way of holding this data is shared mode with snapshots. Distributed data should have code moving to the data.
@CTOAdvisor AI can be as centralized or distributed as you need it to be. But it's often modeled, trained, and served from highly centralized storage/compute platforms.