#BigData - CrowdChat

Systems of Intelligence

Conversation with Wikibon #bigdata research analyst on Systems of Intelligence & its impact

#bigdata @furrier @ggilbert41 @wikibon @thecube

Wikibon Systems of Intelligence

Which statement best reflects the future of big data?

#bigdata @theCUBE Talks Big Data: Conversation with George Gilbert, Wikibon analyst, about big data, Hadoop & other cool news & trends

Q5: What about the impact of #Spark on this mega trend of Systems of Intelligence? The pre-chat poll options were: 1) it will turbocharge Hadoop; 2) it will disrupt Hadoop; 3) neutral; 4) no impact.

#ApacheSpark makes a lot of analytics easier and faster by running different workloads on the same engine. Still needs more performance improvements. #IBM contributions could really change things

Horses for courses; Hadoop is batch and most efficient/greatest throughput. Spark is micro-batch, gets an answer quicker, but less efficient/slower throughput.
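As a rough illustration of the batch versus micro-batch trade-off raised above, here is a toy Python sketch. It does not use Hadoop or Spark APIs; the function names `run_batch` and `run_micro_batches` and the sample events are made up for the example.

```python
def run_batch(events):
    """Batch: wait for every event, then compute a single answer."""
    return [sum(events)]


def run_micro_batches(events, batch_size):
    """Micro-batch: emit a running total after each small slice of events."""
    totals, running = [], 0
    for start in range(0, len(events), batch_size):
        running += sum(events[start:start + batch_size])
        totals.append(running)
    return totals


# Hypothetical event values, for illustration only.
events = [3, 1, 4, 1, 5, 9, 2, 6]

batch_result = run_batch(events)                         # one pass, one answer
micro_results = run_micro_batches(events, batch_size=2)  # four passes, four answers

print(batch_result)
print(micro_results)
```

Both approaches reach the same final total. The micro-batch version surfaces intermediate answers earlier, but it pays for that with more processing rounds, which is the latency versus throughput point made in the chat.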

@dfloyer Is Hadoop the new tape :-)

Means you can develop for scale but still work on tab files

@dfloyer Project Tungsten for #ApacheSpark should get them to pure streaming but that will never completely replace need for batch and #Hadoop

Rodrigo Gazzaneo

@dvellante HDFS is the new long-term storage medium that MapReduce and Spark can read from

Spark (fast, in-memory) vs Hadoop (batch) = the TWIN TOWERS of the Lord of the Things (#IoT): my new @MapR blog will discuss this

Jen(Cohen)Cheplick

@ggilbert41 Completely agree. There is a role for both

Infostreams is real-time, can be supported in development by Hadoop and Spark

Rodrigo Gazzaneo

@KirkDBorne loved the Twin Towers analogy! #LOTR

@KirkDBorne the debate about running all analytics on fast, in-memory (i.e. Spark) is likely misleading

@KirkDBorne the analysis around throughput / volume is likely to have different logic than analysis around per event updates

Rodrigo Gazzaneo

@ggilbert41 Memory x Flash x Disk is a matter of cost per capacity and potential revenue from insight

@ggilbert41 Great SoI deck. Can you elaborate on why Spark is at "slow" end of innovation axis on Slide 24? Would think it is high innovation level

@dvellante > Hadoop = Tape - Love the analogy. What is "spinning rust?" in this model?

@KevinPetrieTech good question: it was a tough call - but having the wild west ecosystem of databases or even the Hadoop ecosystem means each component can evolve independently at own pace. Spark libraries must evolve to integrate with each other

@ggilbert41 Got it. Spark has arisen quickly, but nature of Spark libraries throttles future innovation vs. other platforms

Somewhere I heard the analogy of DW (bottled water) versus DL (natural stream)

what about the #dataocean ?

See my article on the data lake, sea, ocean, flood, tsunami, stream,... https://www.mapr.com...

Q4: Besides big and semi-structured data, just how different are Data Lakes and Data Warehouses?

#datalake is repository for unrefined, uncurated data that data scientists and biz analysts can explore. repeatable analytics can go to DW or a production #Hadoop cluster

Rodrigo Gazzaneo

it's about the schema. Data Lakes are flexible, Data Warehouses are rigid.

Jen(Cohen)Cheplick

Data lakes should be easier to adapt to changing business & infrastructure needs vs. EDW

@vGazza also correct. the flexible schema is part of making the #datalake a self-service environment - you add the schema as you explore

Diverse multi-source heterogeneous data sets are the norm in Data Lakes, but not in Data Warehousing.

Rodrigo Gazzaneo

Data Lakes support schemas on demand, so you can improve the models continuously

Data warehouses and data lakes are slow and sound so old..what's new & different with Systems of Intelligence models ?

Data Warehouse = schema on write.. Data Lake = schema on read.
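The schema-on-write versus schema-on-read distinction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular warehouse or lake product works: `WAREHOUSE_SCHEMA`, the helper functions, and the sample records are all hypothetical.

```python
import json

# Hypothetical warehouse schema: column name -> required Python type.
WAREHOUSE_SCHEMA = {"user_id": int, "amount": float}


def write_to_warehouse(table, record):
    """Schema on write: validate against the fixed schema before storing."""
    if set(record) != set(WAREHOUSE_SCHEMA):
        raise ValueError(f"columns {sorted(record)} do not match the schema")
    for column, expected in WAREHOUSE_SCHEMA.items():
        if not isinstance(record[column], expected):
            raise ValueError(f"{column} must be {expected.__name__}")
    table.append(record)


def write_to_lake(lake, raw_line):
    """Schema on read: store the raw text untouched, with no validation."""
    lake.append(raw_line)


def read_from_lake(lake, fields):
    """Apply a schema only at query time; missing fields become None."""
    for raw_line in lake:
        record = json.loads(raw_line)
        yield {name: record.get(name) for name in fields}


warehouse, lake = [], []
write_to_warehouse(warehouse, {"user_id": 1, "amount": 9.5})
write_to_lake(lake, '{"user_id": 1, "amount": 9.5}')
write_to_lake(lake, '{"user_id": 2, "device": "phone"}')  # new field, no migration

# The reader decides which fields matter when the question is asked.
rows = list(read_from_lake(lake, ["user_id", "device"]))
print(rows)
```

The warehouse path rejects a record with an unexpected column until the schema is changed, while the lake path accepts it and defers interpretation to the reader. That is the flexibility versus structure trade-off discussed in this thread.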

Rodrigo Gazzaneo

@CrowdCaptain Data Lakes support flexible ingestion and insight layers, so they can be fast also

Data Lake allows easy updates (data "columns"). DW requires new schema and index builds when adding new columns

Jen(Cohen)Cheplick

Data lakes retain all (more) data vs. DW

ok it's all about the #dataocean bc oceans have currents and are always highly dynamic so the real time intelligence algos and tech are in the #dataoceans

DW are archived hierarchically... Data Lakes are mostly object-based

@vGazza flexible ingestion enables self-service, but repeatability invites structure and therefore performance

@bigdataryan care to chime in?

Jen(Cohen)Cheplick

We are talking a lot about technical differences, but what is the difference for business value?

@CrowdCaptain #SystemsOfIntelligence should derive their value from their apps (smart Machine Learning), not their data model (DW or Data Lake)

data lakes are just a cheaper and bigger version of the failed data warehouses model

@KirkDBorne it's all about flexibility vs. performance trade-off

Rodrigo Gazzaneo

a Data Lake can be a source of data for a Data Warehouse once you know what to ask

@KirkDBorne Kirk, i couldn't have said it better. Data Lakes are training wheels. #MachineLearning is what drives Systems of Intelligence

Rodrigo Gazzaneo

@ggilbert41 systematic queries can be optimised and run on Data Warehouses for performance

data streams, data rivers power things along. Data lakes fester.

@vGazza that's why DataLakes coexist with DataWarehouses - exploration vs. production performance

Completely new methods needed for Data Lakes. #MachineLearning and stats miss many of the possibilities

@CrowdCaptain > Getting out front in the decision process.

Sugandh Mehta13

Systems of Intelligence/Insights have to be context-based to be effective

context is the data - great point. where is it stored what is the metadata..etc etc..

another great comment: context is ambient intelligence - the app can never get enough. developers/data scientists always adding more to their model

If content is king, then Context is Super-King! :) Context matters immensely!

Q3: What can Systems of Intelligence apps do at their best or most desired outcomes?

at their most sophisticated they can act automatically, without recommending user action. ex: systems management, smart grid, ad exchange...

Rodrigo Gazzaneo

#SystemsOfIntel add predictive capabilities to the business when few #SoR could

The best ROI comes from deployment in automating business processes: system to system, not system to people

running datacenters, apps, #iot lots of things are connected and taking action on data is the big thing

Jen(Cohen)Cheplick

Make it easy for business users - not just IT professionals - to make better decisions with more complete and timely data - as well as recommendations about what those decisions should be

Jen(Cohen)Cheplick

@dfloyer Great point -- automating between systems means the data will actually be used in a timely fashion - and not left up to humans to incorporate

@jscheplick this is like the holy grail of data...will the "citizen data scientist" become a reality?

It should be a continuous process of improvement, selecting the best signals from multiple streams, and making them real or near real-time inputs to automation

A3: fast real-time autonomous decisions come from #SystemsOfIntelligence if you push #MachineLearning out to the sensor (data collector)

Jen(Cohen)Cheplick

I think it will in some organizations - again, the ones that have the most to gain (and lose) if they don't use real-time data to their advantage

@jscheplick yes - it's not just consumers - that was easy example. Workday makes it possible for HR professionals to anticipate which high performance emps might leave and how to intervene

Jen(Cohen)Cheplick

However, that won't be the case in most organizations - at least not in the near future. Again -- there is a culture shift -- not just technology -- at play here

@dvellante There already exist "Citizen Data Scientists" -- just check out Zooniverse.org, OpenDataThons, and hackathons

Pick the biggest problem that can be solved with automation - e.g. Fraud detection if you are a health insurance provider, customer churn for mobile telecommunication companies

A3: I also think mobile devices will become the default ubiquitous input source and output response for #SystemsOfIntelligence

Limit the number of data scientists deployed or drown. Use domain experts; use system of intelligence to extract signal and streams, and continuously monitor and improve.

@dfloyer Great SoI use cases in #Healthcare - IBM Watson can diagnose and treat conditions better than doctors. Healthcare professional roles will be more consultative, relationship based in the future as #AI plays traditional doctor role

@ggilbert41 > How much can be automated? How much should be automated?