GetCrayAI

AI: Make the Right Choices
Find out what you need to know before, during and after deploying your own AI technology solution.
Cray Inc.
Q #1: What are some of the common mistakes IT organizations are making when deciding on infrastructure for an AI PoC/Deployment?
Rangan Sukumar
(1/2) Decisions are made on price and/or hype and short-term budgets. Investments in AI infrastructure often ignore data and model lifecycle management. Infrastructure lock-in based on price or hype locks out future-proofing and user productivity.
Rajesh Anantharaman
Many organizations make the mistake of equating AI only with Deep Learning, and as a result they invest in AI infrastructure that supports only Deep Learning.
Rangan Sukumar
(2/2) Organizations are unable to find facts around AI infrastructure investments (e.g., buying vs. renting, component integration vs. system integration, the value of supported hardware and software vs. doing it yourself).
Rajesh Anantharaman
Although Deep Learning is an important part of AI, it is typically only one part of a broader AI workflow, and only one model choice among a variety of other practical ML models.
Rajesh Anantharaman
Beyond compute, storage is a very important consideration throughout the workflow, as is software that allows you to move through the workflow seamlessly to manage data and build various model types.
Rajesh Anantharaman
The rapidly evolving landscape of hardware and software in AI makes it essential to invest in broad infrastructure that “future-proofs” the hardware, along with a supported and regularly updated software stack.
Rangan Sukumar
@aaronrhoden Mind sharing your experience with "build your own"?
Rajesh Anantharaman
@aaronrhoden That's a great point. This is one of the reasons Cray came up with Accel AI reference configs that have already been architected with best practice AI workflows from our experience.
Aaron A. Rhoden
(1/2) @Rangan_Sukumar Customers have a couple of resources from data warehousing who feel they can pull off AI with some x86 servers with PCI slots for GPUs. It is possible, but the duration from 0 to 100% is so long that the problem, purpose, and value of the solution have changed.
Aaron A. Rhoden
(2/2) Going in without the software stack determined, armed with only some O'Reilly books, is not the way to go. The business will never trust those resources again, and the mere mention of AI, DL, or ML becomes a sore spot.
Rajesh Anantharaman
@aaronrhoden Agreed that for many customers, having an experienced consultant/solution architect can help greatly in their first project win, so they continue to invest in AI/ML/DL.
Rangan Sukumar
@aaronrhoden Excellent point! There are gaps between the value of the business problem and the value of data. Honest AI prototypes without the hype may be the way to avoid another AI winter.
Cray Inc.
Q #4: What are my options if I want to try some of this out?
Aaron A. Rhoden
1. Call Cray. 2. Call Sirius. Shameless plug.
Rajesh Anantharaman
Cray has multiple offerings to help customers get started in AI. We have an Accel AI Lab where you can get access to our hardware and software and try out some of your workloads before you invest in a system.
Rajesh Anantharaman
Cray also has multiple reference configurations for different stages of the AI journey - we call these Accel AI configs. We have a one-node config for you to get started, a prototype config when you have a small team to support for the AI workflow...
Aaron A. Rhoden
...but seriously, I want to get more hands-on with the toolsets like TensorFlow, Caffe, Caffe2, etc., to really know what I am recommending to customers. I want to be able to sit in their seat for a few, so when I sit alongside them I can be of help.
Rajesh Anantharaman
...and a production config when you want to scale out to support a larger team for the AI workflow.
Aaron A. Rhoden
I am also curious about the software contributions unique to Cray in the commercial set.
Rajesh Anantharaman
@aaronrhoden NVIDIA and Cray also offer some DLI workshops where you can get some hands on experience with TF and building some neural network models. You can also take some online courses on coursera or udacity if you really want to get into it :)
Rajesh Anantharaman
@aaronrhoden Cray offers the Urika-CS software suite, a pre-integrated software stack with tools for the AI workflow. We integrated Spark, TensorFlow, and other libraries and made sure they work well together so that customers can save time and get started.
Rajesh Anantharaman
@aaronrhoden We have optimized our software stack to run on heterogeneous compute resources and hybrid storage so you can utilize all the infrastructure you have invested in.
Rajesh Anantharaman
@aaronrhoden We also bring our supercomputing experience into the software stack in order to run distributed training across a large number of heterogeneous nodes with >90% efficiency.
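Conceptually, synchronous data-parallel training of the kind described here splits each batch across workers, computes gradients locally, and averages them before updating the shared model. A pure-Python toy sketch of that idea (not Cray's actual stack; in practice frameworks such as TensorFlow's distributed strategies or Horovod do the all-reduce):

```python
# Toy data-parallel SGD: each "worker" computes the gradient of a
# squared-error loss on its own data shard; gradients are averaged
# and applied to a shared weight, mimicking synchronous training.

def gradient(w, shard):
    # d/dw of mean squared error for the model y_hat = w * x
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def train_step(w, shards, lr=0.001):
    grads = [gradient(w, s) for s in shards]  # "workers" in parallel
    avg_grad = sum(grads) / len(grads)        # all-reduce (average)
    return w - lr * avg_grad                  # synchronized update

# Synthetic data for y = 3x, split across 4 workers
data = [(x, 3 * x) for x in range(1, 17)]
shards = [data[i::4] for i in range(4)]

w = 0.0
for _ in range(200):
    w = train_step(w, shards)
print(round(w, 2))  # converges to 3.0
```

The scaling efficiency figure above is about how close a real system stays to this ideal as node counts grow and communication for the gradient average starts to dominate.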
Aaron A. Rhoden
@RajeshAnanthara Thanks! I lobbed that one over the net for you. :-)
Aaron A. Rhoden
@RajeshAnanthara HPC re-purposed for model training sounds amazing.
Cray Inc.
Q #2: What is the entire workflow?
Ted Slater
Workflows are many and varied, but they have some common stages. Data acquisition and "clean-up" come first, and can take 60-80% of a data scientist's time.
Ted Slater
In the middle you'll see model development. This can be very computationally intense, depending upon what you're doing. Deep learning, in particular, requires lots of data and a significant amount of compute power to accomplish.
Ted Slater
In the end, you're looking for real insight from all of that work. In deep learning, this is the "inference" phase, where new data come into the model and your hard-earned results come out.
Ted Slater
Workflows can be iterative, where results come out and feed back into an earlier stage of your pipeline to improve results.
Rangan Sukumar
There can be more to the “entire workflow” - the ability to integrate new datasets, associating new labels to new and existing data, conducting A/B tests to make sure the model is current, and triggering auto-tuning jobs to retrain model parameters to new data and behaviors.
Rajesh Anantharaman
AI workflows are also highly iterative in nature – you need to constantly iterate between data prep and model development in order to get a well-performing model.
Rajesh Anantharaman
You also need to iterate constantly across the workflow as models go into production and new data comes in and you need to update the models.
Ted Slater
You can see that workflows, start to finish, can be complex, and your infrastructure (compute, storage, etc.) has to be up for the difficult things as well as the easy things.
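The stages described in this thread (data acquisition, clean-up, model development, inference) can be sketched as a toy pipeline. This is purely illustrative; the functions and the tiny least-squares "model" are hypothetical stand-ins for real workflow stages:

```python
# Toy end-to-end workflow: acquire -> clean -> train -> infer.
# A closed-form least-squares fit of y = w * x stands in for the
# computationally intense model-development stage.

def acquire():
    # Data acquisition: raw records, some of them unusable
    return [(1, 2.1), (2, 3.9), (None, 5.0), (3, 6.2), (4, None)]

def clean(raw):
    # Clean-up: drop incomplete records (often 60-80% of the effort!)
    return [(x, y) for x, y in raw if x is not None and y is not None]

def train(data):
    # Model development: closed-form least squares for y = w * x
    return sum(x * y for x, y in data) / sum(x * x for x, y in data)

def infer(w, x_new):
    # Inference: apply the trained model to new data
    return w * x_new

data = clean(acquire())
w = train(data)
print(infer(w, 5))  # prediction for a new input
```

In a real workflow each of these functions is a whole stage with its own tooling, and results feed back into earlier stages iteratively, as noted above.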
Aaron A. Rhoden
@Cray_Inc Though there is no standard iterative model, are there any best practices beyond data set collection, featurisation, and the model training and testing loops to arrive at the best model?
Aaron A. Rhoden
How do we in tech sales/consulting help customers understand the differences between AI/ML/DL and big data analytics?
Rangan Sukumar
@aaronrhoden The best practices are constantly in flux in the fast-paced AI world. That said, there are tools emerging for automating the workflow itself, with considerations for model versioning, provenance, etc.
Ted Slater
@aaronrhoden Hey, Aaron, it's a good question. At a lower level, like the DL level, it's pretty easy: you've got a neural network with a bunch of hidden layers, or you don't. DL usually needs a lot of (often labeled) data, so it's a Big Data thing most of the time.
Ted Slater
@aaronrhoden Once you get up a level or two, the distinctions become a little less important. What's AI and what's not? It really doesn't matter that much -- we're just trying to get some work done. ;-)
Rajesh Anantharaman
@aaronrhoden The difference between AI/ML/DL and Big Data sometimes tends to be based on tools and ecosystem. Big Data tends to be on the Hadoop ecosystem, versus AI on the TF/Caffe2 ecosystem. Spark, however, moonlights between the two.
Ted Slater
@aaronrhoden And Big Data is often just what you make of it. Some data sets can be small, but incredibly "dense" in some ways (like some graphs, for example). These aren't big, but they'll challenge your compute environment so they need to be treated like Big Data.
Aaron A. Rhoden
@Rangan_Sukumar Variation is expected. I am optimistic that Cray can lead on the automation and potential standards you mention.
Aaron A. Rhoden
@tedslater Thanks, Ted. "We choose to go to the moon...because it is hard."
Rangan Sukumar
@aaronrhoden There is no AI without data and no DL without Big Data. If one is allowed to hand-wave, AI/ML/DL is like a toolbox (magic wand!) to discover patterns in Big Data.
Rajesh Anantharaman
@aaronrhoden Also, another difference is that Big Data tends to be about collecting data and finding insights in it, whereas AI goes beyond that to making sense of and finding patterns in the data.
Aaron A. Rhoden
@Rangan_Sukumar So I have been on track in that outputs from Big Data analytics can be inputs to these other forms (algorithms) of computing. I write this because a number of customers have Hadoop (good and bad implementations). I want to inject AI, etc., to add more value.
Aaron A. Rhoden
@RajeshAnanthara Totally with you on the pattern matching concept. Couple that with actionable steps (pre-programmed or alerting) and we have something that can be used for good.
Rangan Sukumar
@aaronrhoden Well said. It's all about value and extracting it from Big Data.
Tami Wessley
Are there any particular phases of the workflow that are likely to be bottlenecks?
Rajesh Anantharaman
Different workflows have different bottlenecks; it is difficult to generalize across different use cases and workflows.
Rajesh Anantharaman
Two stages of the data science workflow where we see bottlenecks are model training, particularly deep learning model training, and data preparation.
Rajesh Anantharaman
Data acquisition and data preparation could be a bottleneck for use cases where data are hard to acquire and require a lot of cycles to label and prepare.
Rajesh Anantharaman
Some use cases need to train as fast as possible to get the best possible model at the highest possible accuracy, and trying lots of different models with different parameters becomes a huge bottleneck.
Rajesh Anantharaman
Some use cases need a lot of iteration between model development and production deployment, as the business and technical constraints in production dictate different tradeoffs for model development (both in terms of the model and the actual software implementation), which can be a bottleneck.
Rajesh Anantharaman
Therefore, based on your use case and workflow, you need to carefully consider your storage and compute infrastructure as well as your software stack.
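The "trying lots of different models with different parameters" bottleneck mentioned above is essentially a search loop: every parameter combination costs a full training run. A minimal grid-search sketch (the toy model and parameter grid are hypothetical; real searches multiply this by data size and training cost):

```python
import itertools

# Toy grid search: every (lr, epochs) combination means a full training
# run, which is why hyperparameter search multiplies training cost.

def train_and_score(lr, epochs):
    # Stand-in for a real training run: fit w for y = 2x by SGD
    # on a tiny dataset, then return the squared error of w vs. 2.
    data = [(x, 2 * x) for x in range(1, 6)]
    w = 0.0
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return (w - 2.0) ** 2

grid = {"lr": [0.001, 0.01, 0.02], "epochs": [5, 20, 50]}
runs = [dict(zip(grid, v)) for v in itertools.product(*grid.values())]

best = min(runs, key=lambda p: train_and_score(**p))
print(best)  # the combination with the lowest error
```

Here the grid has only 9 cells; real searches over many models and parameters can run to thousands of training jobs, which is exactly where compute infrastructure becomes the constraint.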
Rangan Sukumar
(1/3) Depends a lot on the organization, the types of data being used, etc. Seconding @tedslater and @RajeshAnanthara: for most workflows, data cleaning and preparation is the bottleneck.
Rangan Sukumar
(2/3) For others, it is moving data back and forth from a data source to compute cores, or creating new labels to make DL work for them. Some organizations complain they cannot train models fast enough, and others find creating new neural architectures to be the bottleneck.
Rangan Sukumar
(3/3) We also hear the need for expertise to be able to understand and solve the bottlenecks that are unique to the data and the organization.