BigData

Systems of Intelligence
Conversation with Wikibon #bigdata research analyst on Systems of Intelligence & its impact

#bigdata @theCUBE Talks Big Data: Conversation with George Gilbert, Wikibon analyst, about big data, Hadoop & other cool news & trends
John Furrier
Q5: What about the impact of #Spark on this mega trend of Systems of Intelligence? The pre-chat poll options were: 1) it will turbocharge Hadoop; 2) disrupt Hadoop; 3) neutral; 4) no impact.
George Gilbert
#ApacheSpark makes a lot of analytics easier and faster by running different workloads on the same engine. Still needs more performance improvements. #IBM contributions could really change things
David Floyer
Horses for courses; Hadoop is batch and most efficient/greatest throughput. Spark is microbatch, gets an answer quicker, but less efficient/slower throughput.
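The batch versus micro-batch trade-off described above can be sketched in plain Python. This is only an illustration of the two processing styles, not Hadoop or Spark code; the function names `batch_total` and `micro_batch_totals` and the sample events are hypothetical.

```python
from itertools import islice


def batch_total(events):
    """Batch style: wait for the full data set, then produce one answer."""
    return sum(events)


def micro_batch_totals(events, batch_size):
    """Micro-batch style: process small chunks and emit a running total after each."""
    it = iter(events)
    running = 0
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            break
        running += sum(chunk)
        yield running


events = [3, 1, 4, 1, 5, 9]                    # made-up sample data
final_answer = batch_total(events)             # available only once all events are in
early_answers = list(micro_batch_totals(events, 2))  # one partial answer per chunk
```

The micro-batch version produces its first partial answer after two events, while the batch version produces nothing until the whole set has been read. The sketch does not model the per-chunk overhead that costs micro-batching some throughput.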
Dave Vellante
@dfloyer Is Hadoop the new tape :-)
David Wild
Means you can develop for scale but still work on tab files
George Gilbert
@dfloyer Project Tungsten for #ApacheSpark should get them to pure streaming but that will never completely replace need for batch and #Hadoop
Rodrigo Gazzaneo
@dvellante HDFS is the new long-term storage medium that MapReduce and Spark can read from
Kirk Borne
Spark (fast, in-memory) vs Hadoop (batch) = the TWIN TOWERS of the Lord of the Things (#IoT): my new @MapR blog will discuss this
Jen(Cohen)Cheplick
@ggilbert41 Completely agree. There is a role for both
David Floyer
Infostreams is real-time, can be supported in development by Hadoop and Spark
Rodrigo Gazzaneo
@KirkDBorne loved the Twin Towers analogy! #LOTR
George Gilbert
@KirkDBorne the debate about running all analytics on fast, in-memory (i.e. Spark) is likely misleading
George Gilbert
@KirkDBorne the analysis around throughput / volume is likely to have different logic than analysis around per event updates
Rodrigo Gazzaneo
@ggilbert41 Memory x Flash x Disk is a matter of cost per capacity and potential revenue from insight
Kevin Petrie
@ggilbert41 Great SoI deck. Can you elaborate on why Spark is at "slow" end of innovation axis on Slide 24? Would think it is high innovation level
Jeff Frick
@dvellante > Hadoop = Tape - Love the analogy. What is "spinning rust?" in this model?
George Gilbert
@KevinPetrieTech good question: it was a tough call - but having the wild west ecosystem of databases or even the Hadoop ecosystem means each component can evolve independently at its own pace. Spark libraries must evolve to integrate with each other
Kevin Petrie
@ggilbert41 Got it. Spark has arisen quickly, but nature of Spark libraries throttles future innovation vs. other platforms
Sugandh Mehta
somewhere heard the analogy of DW (bottled water) versus DL (natural stream)
Kirk Borne
See my article on the data lake, sea, ocean, flood, tsunami, stream,... https://www.mapr.com...
John Furrier
Q4: Besides big and semi-structured data, just how different are Data Lakes and Data Warehouses?
George Gilbert
#datalake is repository for unrefined, uncurated data that data scientists and biz analysts can explore. repeatable analytics can go to DW or a production #Hadoop cluster
Rodrigo Gazzaneo
it's about the schema. Data Lakes are flexible, Data Warehouses are rigid.
Jen(Cohen)Cheplick
Data lakes should be easier to adapt to changing business & infrastructure needs vs. EDW
George Gilbert
@vGazza also correct. the flexible schema is part of making the #datalake a self-service environment - you add the schema as you explore
Kirk Borne
Diverse multi-source heterogeneous data sets are the norm in Data Lakes, but not in Data Warehousing.
Rodrigo Gazzaneo
Data Lakes support schemas on demand, so you can improve the models continuously
Crowd Captain
Data warehouses and data lakes are slow and sound so old... what's new & different with Systems of Intelligence models?
Kirk Borne
Data Warehouse = schema on write.. Data Lake = schema on read.
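The schema-on-write versus schema-on-read distinction above can be illustrated with a minimal Python sketch. The `Warehouse` and `Lake` classes, the `SCHEMA` mapping, and the sample records are all hypothetical and stand in for no particular product.

```python
import json

# Hypothetical schema for illustration: field name -> required Python type.
SCHEMA = {"user_id": int, "amount": float}


def conforms(record, schema):
    """Return True if record has exactly the schema's fields with matching types."""
    return (set(record) == set(schema)
            and all(isinstance(record[k], t) for k, t in schema.items()))


class Warehouse:
    """Schema on write: validate at load time and reject anything that does not fit."""

    def __init__(self, schema):
        self.schema = schema
        self.rows = []

    def write(self, record):
        if not conforms(record, self.schema):
            raise ValueError(f"record does not match schema: {record}")
        self.rows.append(record)


class Lake:
    """Schema on read: store raw text as-is and apply a schema only when querying."""

    def __init__(self):
        self.raw = []

    def write(self, raw_line):
        self.raw.append(raw_line)  # no validation at ingestion time

    def read(self, schema):
        out = []
        for line in self.raw:
            record = json.loads(line)
            try:
                out.append({k: t(record[k]) for k, t in schema.items()})
            except (KeyError, TypeError, ValueError):
                continue  # skip records this schema cannot interpret
        return out
```

In this sketch the lake accepts any line of JSON, including records with extra or missing fields, and a different schema can be applied to the same raw data later. The warehouse refuses a non-conforming record up front, which is the rigidity (and the predictability) discussed in this thread.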
Rodrigo Gazzaneo
@CrowdCaptain Data Lakes support flexible ingestion and insight layers, so they can be fast also
Kirk Borne
Data Lake allows easy updates (data "columns"). DW requires new schema and index builds when adding new
Jen(Cohen)Cheplick
Data lakes retain all (more) data vs. DW
John Furrier
ok it's all about the #dataocean bc oceans have currents and are always highly dynamic so the real time intelligence algos and tech are in the #dataoceans
Jasdeep Singh
DW are archived hierarchically... Data Lakes are mostly object-based
George Gilbert
@vGazza flexible ingestion enables self-service, but repeatability invites structure and therefore performance
Jen(Cohen)Cheplick
We are talking a lot about technical differences, but what is the difference for business value?
Kirk Borne
@CrowdCaptain #SystemsOfIntelligence should derive their value from their apps (smart Machine Learning) not their data model (DW or Data Lake)
David Floyer
data lakes are just a cheaper and bigger version of the failed data warehouses model
George Gilbert
@KirkDBorne it's all about flexibility vs. performance trade-off
Rodrigo Gazzaneo
a Data Lake can be a source of data for a Data Warehouse once you know what to ask
George Gilbert
@KirkDBorne Kirk, i couldn't have said it better. Data Lakes are training wheels. #MachineLearning is what drives Systems of Intelligence
Rodrigo Gazzaneo
@ggilbert41 systematic queries can be optimised and run on Data Warehouses for performance
David Floyer
data streams, data rivers power things along. Data lakes fester.
George Gilbert
@vGazza that's why DataLakes coexist with DataWarehouses - exploration vs. production performance
David Wild
Completely new methods needed for Data Lakes. #MachineLearning and stats miss many of the possibilities
Jeff Frick
@CrowdCaptain > Getting out front in the decision process.
Sugandh Mehta
Systems of Intelligence/Insights have to be context-based to be effective
John Furrier
context is the data - great point. where is it stored, what is the metadata, etc.
George Gilbert
another great comment: context is ambient intelligence - the app can never get enough. developers/data scientists always adding more to their model
Kirk Borne
If content is king, then Context is Super-King! :) Context matters immensely!
John Furrier
Q3: What can Systems of Intelligence apps do at their best or most desired outcomes?
George Gilbert
at their most sophisticated they can act automatically, without recommending user action. ex: systems management, smart grid, ad exchange...
Rodrigo Gazzaneo
#SystemsOfIntel add predictive capabilities to the business when few #SoR could
David Floyer
The best ROI comes from deployment in automating business processes; system to system, not system to people
John Furrier
running datacenters, apps, #iot lots of things are connected and taking action on data is the big thing
Jen(Cohen)Cheplick
Make it easy for business users - not just IT professionals - to make better decisions with more complete and timely data - as well as recommendations about what those decisions should be
Jen(Cohen)Cheplick
@dfloyer Great point -- automating between systems means the data will actually be used in a timely fashion - and not left up to humans to incorporate
Dave Vellante
@jscheplick this is like the holy grail of data...will the "citizen data scientist" become a reality?
David Floyer
It should be a continuous process of improvement, selecting the best signals from multiple streams, and making them real or near real-time inputs to automation
Kirk Borne
A3: fast real-time autonomous decisions come from #SystemsOfIntelligence if you push #MachineLearning out to the sensor (data collector)
Jen(Cohen)Cheplick
I think it will in some organizations - again, the ones that have the most to gain (and lose) if they don't use real-time data to their advantage
George Gilbert
@jscheplick yes - it's not just consumers - that was an easy example. Workday makes it possible for HR professionals to anticipate which high-performance employees might leave and how to intervene
Jen(Cohen)Cheplick
However, that won't be the case in most organizations - at least not in the near future. Again -- there is a culture shift -- not just technology -- at play here
Kirk Borne
@dvellante There already exist "Citizen Data Scientists" -- just check out Zooniverse.org, OpenDataThons, and hackathons
David Floyer
Pick the biggest problem that can be solved with automation - e.g. Fraud detection if you are a health insurance provider, customer churn for mobile telecommunication companies
Kirk Borne
A3: I also think mobile devices will become the default ubiquitous input source and output response for #SystemsOfIntelligence
David Floyer
Limit the number of data scientists deployed or drown. Use domain experts; use system of intelligence to extract signal and streams, and continuously monitor and improve.
Kevin Petrie
@dfloyer Great SoI use cases in #Healthcare - IBM Watson can diagnose and treat conditions better than doctors. Healthcare professional roles will be more consultative, relationship based in the future as #AI plays traditional doctor role
Jeff Frick
@ggilbert41 > How much can be automated? How much should be automated?