datasciencedevops

DevOps in Data Science
Discuss how developers are bringing DevOps practices into the data-science pipeline.
jameskobielus
http://www.via-cc.at...

Peter Burris
I know it sounds recursive, but devops is going to need a LOT of data science-like stuff to reach full potential.
John Furrier
#devops enablers are integrated "open toolchains"
John Furrier
#devops enabler #2: use of analytics and cognitive/deep learning
John Furrier
#devops enabler #3: microservices and container adoption
jameskobielus
I'll jump in here: source-control repository, data lake, and integrated collaboration environment that spans the entire pipeline.
John Furrier
Just did a #crowdchatstorm in a thread. Take that, @pmarca
jameskobielus
@furrier The integrated toolchain needs to be embedded within the integrated DevOps collaboration environment that I alluded to.
jameskobielus
Question 6 coming.
Kirk Borne
#Microservices come to mind. Also, APIs... here are just a few: https://twitter.com/...
Kirk Borne
#Containers have really had a huge positive impact on productivity for #MachineLearning #DataScience teams in my organization
Kirk Borne
#DataLakes done right will definitely be a big plus in breaking down data silos, enabling rapid innovation, and zero-day discovery from new data sources: https://twitter.com/...
Peter Burris
@KirkDBorne How? I can see why it might, but can you offer specifics?
jameskobielus
@plburris Right. Automated ML-driven code-gen tools are coming along fast and furious. Microsoft, for example, is making great strides to use ML for rapid app development and iteration.
Kirk Borne
@plburris Schema-on-read is fast. Schema-on-write requires months of data modeling, design, development, testing, and so on. That is DevOps in the database-build phase, yes, but I am not seeing much of it.
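To illustrate the schema-on-read point, here is a minimal sketch, not anything posted in the chat. It assumes raw events land in the lake as JSON lines with no upfront modeling, and that the reader supplies the schema at query time. The sample records, `read_with_schema`, and `spend_schema` are all hypothetical names made up for this example.

```python
import json

# Hypothetical raw events, stored as-is with no upfront data modeling.
RAW_LINES = [
    '{"user": "a", "amount": "12.50", "ts": "2017-10-01"}',
    '{"user": "b", "amount": "3", "extra_field": true}',
    '{"user": "c"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: keep and cast only the requested fields.

    `schema` maps a field name to a (cast, default) pair. Fields that are
    missing from a record fall back to the default; fields not named in the
    schema are ignored.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, (cast, default) in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else default
        rows.append(row)
    return rows


# The analyst picks the schema when asking the question, not at ingest time.
spend_schema = {"user": (str, ""), "amount": (float, 0.0)}
rows = read_with_schema(RAW_LINES, spend_schema)
total = sum(row["amount"] for row in rows)
```

A different question over the same raw lines would only need a different schema dictionary, which is where the day-zero speed comes from. The trade-off is that validation happens on every read instead of once at write time.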
Kirk Borne
See my article "Mining the #BigData Wheel" that mentions fast schema-on-read day-zero analytics here: https://mapr.com/blo... at @MapR #DataScience
jameskobielus
@KirkDBorne You are quite right. Precious little DevOps drives the database modeling process.
Peter Burris
How many DevOps-using companies are applying these tools and techniques to the development pipeline for machine learning and other data science apps?

jameskobielus
http://www.via-cc.at...

jameskobielus
Posting another poll. Look up and/or refresh your browsers.
John Furrier
I think this is developing now. ML gets people excited but the value will be in the automation of tasks
John Furrier
Some of the most successful organizations we see employ end-to-end orchestration solutions to automate the software delivery pipeline, from developer check-in to production release, with feedback loops from production monitoring.
Kirk Borne
Yes, it is developing now, and it is about time. Very excited to see this upcoming conference: https://www.dataopss...
John Furrier
@KirkDBorne Best practices are key, as early adopters and pioneers are learning a lot, fast, here.
Peter Burris
By definition, how can an ML pipeline be run any way other than by a method that looks a lot like devops?
Kirk Borne
I suspect that many orgs who do #DataScience and #MachineLearning pipelines and #DataProduct development are using #DevOps, because it is natural to do so, and there is no need to broadcast something so obvious to the world.
jameskobielus
@furrier Yes. Cross-role orchestration of disparate processes in the ML pipeline--ingest, preparation, modeling, training, deployment, feedback, etc.
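The stage sequence named above (ingest, preparation, training, deployment) can be sketched as a tiny orchestrator. This is an illustrative toy under stated assumptions, not a tool mentioned in the chat: every function name, the shared context dictionary, and the toy "model" are hypothetical, and a real pipeline would hand these stages to different roles and systems.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Stage = Tuple[str, Callable[[Context], Context]]


def ingest(ctx: Context) -> Context:
    # Hypothetical data source: a fixed list stands in for real ingestion.
    return {**ctx, "raw": [1.0, 2.0, 3.0, 4.0]}


def prepare(ctx: Context) -> Context:
    # Center the raw values around their mean.
    raw = ctx["raw"]
    mean = sum(raw) / len(raw)
    return {**ctx, "features": [x - mean for x in raw]}


def train(ctx: Context) -> Context:
    # Toy "model": just records the largest absolute feature value.
    scale = max(abs(f) for f in ctx["features"])
    return {**ctx, "model": {"scale": scale}}


def deploy(ctx: Context) -> Context:
    # Stand-in for a release step; only flips a flag.
    return {**ctx, "deployed": True}


def run_pipeline(stages: List[Stage]) -> Context:
    """Run stages in order, passing one shared context from stage to stage."""
    ctx: Context = {}
    completed: List[str] = []
    for name, stage in stages:
        ctx = stage(ctx)
        completed.append(name)
    ctx["completed"] = completed
    return ctx


result = run_pipeline(
    [("ingest", ingest), ("prepare", prepare), ("train", train), ("deploy", deploy)]
)
```

Because each stage only reads and extends the shared context, a feedback stage (for example, one that compares production metrics against `result["model"]`) could be appended without touching the others.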
jameskobielus
Now for Question 5. Look above.
David Floyer
Until data scientists' outcomes are measured by the automation of business processes, very little adoption of DevOps will happen.
jameskobielus
@plburris You can run an ML pipeline through sneakernet, but that's an extraordinary waste of high-priced data scientists' workdays.
jameskobielus
@KirkDBorne That's right. "Something so obvious." We see the automation of the data-science pipeline in A/B testing that's going on in every org's ML-driven e-commerce, recommendation engine, mobile apps and other app infrastructures.
Kirk Borne
@dfloyer @jameskobielus Automation is so important and often so lost on folks who play in the #DataScience sandbox. https://xkcd.com/974...
Peter Burris
Yes! Waste! Hence, my equating of DevOps --> Lean.
Kirk Borne
Survey results on DevOps adoption in IT projects: http://www.via-cc.at...

John Furrier
Great, the new thread is where links expand: images, slideshares, videos, and blog links. Thanks for sharing.
jameskobielus
http://www.via-cc.at...

John Furrier
This is a loaded question, but I'd say it depends on where the conversation starts: within the organization or at the C-level.
John Furrier
It doesn't matter where the initiative starts, but where it ends. Adoption should yield results.
Kirk Borne
Okay, I am going to cheat. Here are the results of a recent survey on DevOps adoption: https://betanews.com...
John Furrier
I still think the chasm is being crossed as we speak. #devops pioneers still view devops as devops; the mainstream calls it #cloudOps.
jameskobielus
@KirkDBorne Great article. Thanks for sharing, Kirk.
John Furrier
A company putting out a manifesto doesn't make it #devops. Real agility and proof points win the day.
John Furrier
35% on some projects proves my thought on chasm crossing: #cloudops is here, which is #devops made easy. More automation required.
Peter Burris
Like most complex, social changes: It's selective.
jameskobielus
I have not seen any research pointing to DevOps adoption rates in enterprise data science, but this cited research gives numbers on DBA adoption of DevOps, which is interesting.
jameskobielus
Question #4 up above.
John Furrier
#devops challenge is scaling it beyond the small number of teams and projects
Peter Burris
Where Agile is the dev process and the ops process is less driven by hardware (i.e., cloud), you are more likely to find devops.
John Furrier
main comment from #devops pros is: "devops is never finished"
John Furrier
I think incentives across siloed executive leadership are the largest inhibitors to the DevOps transformation; once executive incentives motivate collaboration over siloed transformations, then you win
Kirk Borne
There are various definitions of DataOps, but I prefer this one: #DevOps for #DataScience = #DataOps (IMHO). It's about #DataProduct design, development, deployment lifecycle.
John Furrier
This might also be a good place to talk about value stream automation.
David Floyer
Only when development perceives that Ops is useful in getting code out faster.
jameskobielus
@furrier Another key challenge is scaling to handle the growing range of artifacts--code snippets, statistical models, metadata, schemas, etc.--in a complex app-dev pipeline.
Kirk Borne
The rate of change (actually, rate of acceleration) in digital transformation is really high and is jerking biz around (including tipping points and future shock): https://www.amazon.c... by @csurdak
Peter Burris
@KirkDBorne We equate digibiz = differential use of data. As more firms institutionalize work around data assets, more digibiz -- which amplifies the role of data assets.
Peter Burris
@KirkDBorne Hence, the acceleration.