datafriction

Modern Infrastructure Mgmt
Accelerating Productivity Through Machine Learning
Jason Johnson PMP
http://www.via-cc.at...

tomgolway
doubling annually
George Gilbert
Growth is as fast as enterprises can capture the data. It's not the 40% CAGR that IDC and EMC keep citing.
George Gilbert
40% is just Moore's Law
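A quick worked check of George's point. It assumes Moore's Law is read as a doubling roughly every two years, which is the commonly cited period; the chat itself does not state one. The implied annual growth rate r satisfies:

```latex
(1 + r)^{2} = 2 \quad\Longrightarrow\quad r = \sqrt{2} - 1 \approx 0.414
```

So a 40% CAGR is about what a two-year doubling implies, while tomgolway's "doubling annually" corresponds to r = 1, or 100% growth per year.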
Randy Arseneau
I'm repping a vendor, but from what we hear from clients, it's growing dramatically, especially in the retail, distribution, telco, and managed services sectors.
Matt Cauthorn
Similar to Randy, but for our customers it's massive growth
jameskobielus
@readyforthenet I've heard that as a general rule of thumb for annual data growth across enterprises of various sizes, industries, etc.
tomgolway
IT data volume is growing because of the increased variety of metrics being captured.
Colin Walker
Management data is growing more slowly than data overall, but not by much. It's massive growth. Easily 40%+ a year.
Randy Arseneau
Retail supply chain is driving enormous data sprawl.
jameskobielus
@dorkninja The industries that are doing big data most avidly tend to show greatest growth. Especially those, such as retail and media, that are making huge investments in machine learning and AI and need huge training data sets.
George Gilbert
one way to think about growth: number of entities in the application and infrastructure landscape (ever more fine-grained) emitting ever more telemetry per entity
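George's framework treats total telemetry volume as a product: number of entities times telemetry per entity. A minimal Python sketch of what that implies follows. The entity count, per-entity volume, and growth factors are hypothetical inputs chosen for illustration, not figures from the chat.

```python
def telemetry_volume(entities: int, gb_per_entity_per_day: float) -> float:
    """Total telemetry volume (GB/day) as entities x telemetry emitted per entity."""
    return entities * gb_per_entity_per_day


def annual_growth_factor(entity_growth: float, per_entity_growth: float) -> float:
    """Year-over-year growth of total volume when both factors grow.

    Because volume is a product, the growth factors multiply rather than add.
    """
    return entity_growth * per_entity_growth


# Hypothetical inputs: 10,000 entities each emitting 2 GB/day today; over the
# year the entity count grows 1.5x and per-entity telemetry also grows 1.5x.
today = telemetry_volume(10_000, 2.0)
factor = annual_growth_factor(1.5, 1.5)
next_year = today * factor
print(f"today: {today:,.0f} GB/day, next year: {next_year:,.0f} GB/day ({factor:.2f}x)")
```

The design point is the multiplication: two modest 1.5x increases combine into more than a doubling of total volume. Finer-grained entities and richer per-entity metrics compound each other.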
Matt Cauthorn
IT Ops is becoming a data-driven practice, hence growth. Need drives growth!
jameskobielus
@mcauth Can you give an order of magnitude on that? 10x annual growth? 100x?
Randy Arseneau
@mcauth It's the "DevOps-ification" of IT!
Matt Cauthorn
@jameskobielus More anecdotal, but for many customers it's 2-5x annually
jameskobielus
@mcauth Is the volume of infrastructure data growing in direct proportion to the size and complexity of the IT infrastructure itself? Or is the amount of infrastructure data growing faster (or slower) than the infrastructure?
jameskobielus
@colin_walker I'm curious why management data is growing more slowly (albeit slightly) than application data. Are IT management tools growing more sophisticated in deriving analytic insights for the tasks they perform?
Colin Walker
I just think there's a naturally offset correlation. You deal with more bytes in/out as things like HD vid, audio streams, higher graphic content etc. become the norm. But those things don't automatically drive more mgmt data.
tomgolway
I see scale-out app architectures and associated service policies driving the need for increased measurements
Peter Burris
Big issue. Integrated ITOM data and tooling implies more integrated roles and orgs. Scares a lot of folks.
Peter Burris
One CIO told me this is the biggest challenge to his business adopting new technology.
Colin Walker
@plburris Yep, people are afraid of change, afraid of retooling not only the compute side, but the human side. Understandable but also necessary.
jameskobielus
@ggilbert41 Great framework for sizing IT management data.
jameskobielus
@mcauth IT ops is also becoming an ML/AI driven process. Anomaly detection amid petabytes+ absolutely demands precision "pattern-sniffing" in real-time.
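As a much simpler stand-in for the ML-driven anomaly detection jameskobielus describes, here is a rolling z-score detector over a metric stream. It is a statistical baseline, not a machine-learning model, and it is nowhere near petabyte scale. The window size, threshold, and latency series are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def rolling_zscore_anomalies(
    values: Iterable[float], window: int = 20, threshold: float = 3.0
) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, z) for points far outside the recent window.

    Each new point is scored against the mean and population standard
    deviation of the preceding `window` points, then added to the window.
    """
    history: Deque[float] = deque(maxlen=window)
    for i, x in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0:
                z = (x - mu) / sigma
                if abs(z) > threshold:
                    yield i, x, z
        history.append(x)


# Hypothetical latency series (ms): steady around 100-104 with one spike.
latencies = [100.0 + (i % 5) for i in range(40)]
latencies[30] = 250.0
for idx, value, z in rolling_zscore_anomalies(latencies, window=20, threshold=3.0):
    print(f"anomaly at t={idx}: {value} ms (z={z:.1f})")
```

Scoring each point only against *preceding* data keeps the detector streaming-friendly, which matches the real-time requirement. The fixed threshold is exactly the kind of hand-tuned rule that ML-based approaches aim to replace.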
tomgolway
@jameskobielus IT Ops will begin to look more like IoT as more intelligence and decision ability is put into end systems
Matt Cauthorn
@readyforthenet Or perhaps it's the other way around :) IoT looks more like ops. Either way, fascinating perspective.
Matt Cauthorn
@jameskobielus ...and effective feature extraction.
Chris Selland
The key is not to treat data in a stovepiped way, but to be able to manage and integrate all forms of data the SAME way.
George Gilbert
with that approach you can create a data lake for IT which can be further refined into machine learning models of how specific domains work
Randy Arseneau
Agreed, although challenging from an integration and orchestration perspective sometimes.
Chris Selland
you can create a #datalake for the entire org - not just IT. Which is how it should be
George Gilbert
@dorkninja exactly - that's the trade-off
Colin Walker
Agreed, assuming you can actually make use of that data lake in a meaningful way. So often people get huge cesspools of unrefined data that isn't useful, doesn't have appropriate ML features, and that they can garner no insight from. #SadMLFails
Matt Cauthorn
here too I'd say that different data sources warrant different treatment, though ultimately they can be unified
Jim Shocrylas
I see cesspools being created, with access to data through the same old stovepiped approach.
Chris Selland
@mcauth that's the reality today but also the goal
Colin Walker
+1. Hurts my data-loving soul, but it's a song all too common. Companies know they "need" big data, but have no idea how to implement/refine/use/benefit. So they end up with a non-useful solution, are mired in it for years, and fall behind.
Chris Selland
@Jshoc I've heard about data lakes that became swamps but first time I've heard cesspool
Randy Arseneau
@colin_walker Yup. And that inhibits future innovation and the appetite to experiment.
jameskobielus
Nice to have you on the chat.
Colin Walker
Precisely. Suddenly the message is "We tried that big data thing. It's bogus. Why should we try it again?" when new, innovative, powerful approaches surface. Same issue "cloud" and many new techs have faced, frankly.
George Gilbert
@colin_walker some vendors build in the data/ML smarts for a particular domain so their application collects and organizes only the data they need in the data lake
Chris Selland
@jameskobielus happy to be here - see you in NYC in a few weeks?
jameskobielus
What's the "SAME way" that all data should be managed and integrated? A data lake? Data warehouse? Data refinery?
Colin Walker
Truth. Unfortunately I think the biggest value of ML won't come from silo driven lakes. Correlation is *money* when it comes to trends/patterns.
Jim Shocrylas
Akin to dashboard exhaust.
jameskobielus
@ggilbert41 Specific IT management domains? Such as incident response? etc.
Chris Selland
@jameskobielus data lake with governance - I like Steve Smith's piece https://www.eckerson...
Chris Selland
@colin_walker that's right and you often won't know what's correlated up front
jameskobielus
I'll be at Strata at the end of the month. Let's talk. You guys doing the Cube?
Peter Evans
Data lakes only become cesspools if they do not have a good ILM process in place from the outset
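To make Peter's point about ILM (information lifecycle management) concrete, here is a minimal sketch of an age-based tiering policy. The tier names, age thresholds, and the "delete" end state are hypothetical. A real ILM process would also encode regulatory retention rules, which this sketch does not model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_age: Optional[timedelta]  # None means "no upper bound" (catch-all tier)


# Hypothetical policy: tiers are checked in order and the first match wins.
POLICY: List[Tier] = [
    Tier("hot", timedelta(days=7)),
    Tier("warm", timedelta(days=90)),
    Tier("cold", timedelta(days=365)),
    Tier("delete", None),
]


def assign_tier(created: datetime, now: datetime, policy: List[Tier] = POLICY) -> str:
    """Return the name of the first tier whose max_age covers the record's age."""
    age = now - created
    for tier in policy:
        if tier.max_age is None or age <= tier.max_age:
            return tier.name
    raise ValueError("policy has no catch-all tier")


now = datetime(2024, 1, 1)
for days in (1, 30, 200, 400):
    print(days, assign_tier(now - timedelta(days=days), now))
```

Keeping the policy as data rather than hard-coded branches is one way to get the flexibility the thread asks for: tiers can be added or retuned without changing the assignment logic.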
Colin Walker
Precisely. Process/analyze/correlate the best you can, and store for further analysis? Yes please. #DataNirvana
Matt Cauthorn
@colin_walker It's all about getting to the data with minimal friction, max velocity
Chris Selland
Great let's do that - @2thebeach coordinating our Cube participation
George Gilbert
@jameskobielus I think incident response is more horizontal. I was thinking about intrusion detection or management of an application's full stack, such as SAP, etc.
Neil Raden
@colin_walker We're seeing the emergence of lineage, provenance, governance, and security capabilities for data lakes, but what is still lacking is a clear vision of the desired outcome.
Colin Walker
@NeilRaden Oof, that hits home hard. 100% agree. People don't know what they want to know, yet. They're looking for not only the answers, but the questions they're supposed to be asking. Makes it tough to set up appropriate ILM, practices, etc.
Colin Walker
@mcauth Exactly! How fast can I get to insights? How little time can I waste digging through what I don't need, or care about?
Randy Arseneau
@colin_walker There are some emerging autodiscovery and pattern sniffing techniques that can help here I think. Akin to pharma saving all clinical trial and vector data forever, in case a future superbug appears.
Chris Selland
@EvansBI yes but - you don't always know what you're looking for so it needs to be flexible - historically ILM hasn't been
Matt Cauthorn
@NeilRaden ...and perhaps a commitment from the organization to tap into the potential. It'll provide returns if the commitment is there.
Peter Evans
@NeilRaden The problem, I believe, is confusion among vendors about what outcome is best for the data you have ingested. Specific use cases should be designed by industry so that governance works and complies with regulatory rules such as GDPR.
Colin Walker
@dorkninja Agreed. Lots of options coming that use ML to ask the questions of the data without humans having to know what to ask. Definitely the way forward, IMO. #OneMansOpinion. Part of why we use this technique in our tech. #DontKnowWhatYouDontKnow
Randy Arseneau
@colin_walker Spent a little time with Jeff Jonas while at IBM - he would vehemently agree!
Peter Evans
ILM based on regulatory compliance and industry methodology can be tuned to be both flexible and encompassing. The problem with Big Data technologies is that most did not start out that way, so it is difficult to enact ILM correctly.
Matt Cauthorn
@dorkninja patterns for sure...and relevant features extracted from the operational data. This is key to the ML equation, to put it mildly
jameskobielus
@mcauth What's the role of machine learning in unifying the treatment of different data sources while also enabling differentiated task-specific IT management insights?
Matt Cauthorn
@jameskobielus Increasingly large. Putting data sources aside for a moment, the growth alone warrants machine assist. Human analyst error rates are high vs. machines for a huge swath of ops intel.
Chris Selland
@EvansBI yup that's the challenge but also the opportunity - we're working on it at @unifisoftware
tomgolway
@jameskobielus ML has a substantial role in demand forecasting/management
Colin Walker
@mcauth Exactly. Forget sources or silos. The sheer growth and volume of inbound to be processed means ML is either currently or soon 100% required, full stop. Otherwise what's your option? Call center sized buildings full of data analysts? #Pass
Peter Evans
Funny that so do we at @solixbigdata :-)
jameskobielus
@colin_walker Industrial-grade feature discovery on that infrastructure data can call out the predictors from unstructured data. Perhaps cluster analysis algorithms.
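Here is a minimal sketch of the cluster-analysis idea jameskobielus raises, using plain Lloyd's k-means in Python. Several choices are hypothetical and made only for illustration: the per-host feature vectors (CPU utilization, error rate), k=2, and seeding the centroids from the first k points. Production feature discovery would run on far larger, often unstructured data, typically with a library implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means; centroids are seeded from the first k points (deterministic)."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical per-host features: (CPU utilization, error rate).
hosts = [
    (0.20, 0.010),
    (0.90, 0.300),
    (0.25, 0.020),
    (0.22, 0.015),
    (0.88, 0.280),
    (0.92, 0.330),
]
centroids, labels = kmeans(hosts, k=2)
for host, label in zip(hosts, labels):
    print(host, "-> cluster", label)
```

Hosts in the same cluster behave similarly on these features. A clean split along one feature hints that it may be worth keeping as a predictor, though clustering alone does not establish that.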
Colin Walker
@jameskobielus Precisely. Proper Discovery -> ML -> Analysis -> Reporting chains are the way of the future. #NotKickBoxing
Chris Selland
@EvansBI we should probably chat offline!