
IBM Analytics

Q #1 : What is Spark?

Ira Michael Blonder
an Apache project focusing on server cluster architecture

Andrew C. Oliver
Spark is an in-memory distributed computing platform which essentially executes your functional Scala or Python code across the cluster as opposed to one machine. It also allows for micro-batching which tastes like streaming, but less filling.
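
To make the micro-batching idea above concrete, here is a minimal plain-Python sketch. It is not Spark's API: `micro_batches`, `process_stream`, and `batch_size` are hypothetical names, and batching by item count (rather than by a time interval) is a simplifying assumption. It only shows the shape of the technique: a stream is cut into small batches, and an ordinary batch function runs on each one.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Cut a (possibly unbounded) stream into small lists of at most batch_size items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # stream exhausted
        yield batch


def process_stream(
    stream: Iterable[T], batch_size: int, batch_fn: Callable[[List[T]], R]
) -> List[R]:
    """Apply a plain batch function to every micro-batch, in arrival order."""
    return [batch_fn(batch) for batch in micro_batches(stream, batch_size)]


# Hypothetical event values; each batch of up to three events is summed.
events = [3, 1, 4, 1, 5, 9, 2]
sums = process_stream(events, 3, sum)
print(sums)
```

The batch function never needs to know it is part of a stream, which is why micro-batching "tastes like" streaming while reusing batch code.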

Ira Michael Blonder
the architecture is claimed to lend itself to machine learning applications and "big data"

Ali Khanafer
An API that abstracts MapReduce and makes distributed computation easy. More like transitioning from C/C++ -> Java

John Furrier
. @IBMbigdata Spark is a very big part of extending the value of #hadoop. We had a big discussion on this during @theCUBE preproduction meeting with @wikibon research team

Ira Michael Blonder
I put big data in quotes to establish the original association of this term with map reduce, etc.

jameskobielus
An advanced analytics tool for in-memory distributed computation of machine learning, streaming, and graph analytics.

John Furrier
we posted a summary of the #hadoopsummit love fest and what it means to the market http://siliconangle....

Joel Horwitz
@mikethebbop I'd say it's a lot more than that

jameskobielus
A hot new market in big data analytics that focuses on empowering data scientists with tools for a wide range of low-latency statistical analysis challenges.

John Furrier
. @IBMbigdata #hadoopsummit "Spark is not yet ready for prime time," said Wikibon’s Gilbert. Rather say: "Spark is still going through the process of being hardened that any large scale engine requires before mainstream adoption."

John Furrier
. @IBMbigdata #hadoopsummit The Hadoop ecosystem needs to deliver real-time agile applications to support more interactivity and engagement data

jameskobielus
An Apache project that leverages and builds on much of the core of Hadoop, especially HDFS, while adding new high-performance runtime engines that are optimized for real-time interactive statistical analysis and distributed computation.

Ashok Nellikar
Apache Spark is a powerful open source processing engine built around sophisticated analytics, speed, & ease of use

Joel Horwitz
@furrier that, to me, undersells its full potential.

Andrew C. Oliver
@furrier ..but Spark isn't and can't be the only tool in that toolbox.

Ira Michael Blonder
@JSHorwitz Thanks

jameskobielus
A community of startups, ISVs, and established solution providers (e.g., IBM) doing exciting new things to empower a new generation of data scientists.

Rahul Kumar
Spark is an in-memory distributed data processing engine; it is an alternative to the Map-Reduce framework. It has very rich data ingestion connectors and higher-order functions for solving big data problems.

jameskobielus
Great to see that Joel Horwitz is on the chat...had me worried there for a sec, Joel....chat away!

John Furrier
. @IBMbigdata #hadoopsummit Spark is amazing for in-memory and more importantly iterative computing. The key benefit it offers is caching intermediate data in-memory for better access times
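
The caching benefit described above can be sketched in plain Python. This is not Spark code: `CachedDataset`, `expensive_load`, and the toy gradient-descent loop are hypothetical stand-ins, chosen only to show why an iterative job gains from keeping intermediate data in memory instead of recomputing or re-reading it on every pass.

```python
from typing import Callable, List, Optional


class CachedDataset:
    """Hypothetical dataset wrapper that keeps its computed contents in memory."""

    def __init__(self, compute: Callable[[], List[float]]) -> None:
        self._compute = compute
        self._cached: Optional[List[float]] = None
        self.compute_calls = 0  # how many times the expensive step actually ran

    def collect(self) -> List[float]:
        # Run the expensive computation only on first access, then reuse it.
        if self._cached is None:
            self.compute_calls += 1
            self._cached = self._compute()
        return self._cached


def expensive_load() -> List[float]:
    """Stand-in for reading and parsing a large input from disk."""
    return [float(x) for x in range(1, 6)]


data = CachedDataset(expensive_load)

# Iterative computation: gradient descent toward the mean of the data.
# Every iteration touches the same dataset, so the cache is hit 49 times.
estimate = 0.0
for _ in range(50):
    points = data.collect()
    gradient = sum(estimate - x for x in points) / len(points)
    estimate -= 0.5 * gradient

print(round(estimate, 6), data.compute_calls)
```

Without the cache, each of the 50 iterations would pay the full load cost again, which is the pattern that makes disk-bound MapReduce slow for iterative algorithms.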

Andrew C. Oliver
@furrier The snark in me wants to say: This is the tech industry, nothing is production ready until the month it goes obsolete, then after that it is too poorly maintained for production.

Ira Michael Blonder
@furrier The "hardening" pt is very important. I like the Wikibon opinion

IBM Analytics
Let us move on to Q #2 on top of your screen please.

jameskobielus
Another summit, happening next week in San Francisco, at which I expect to see many of the Hadoop developers and users we're seeing this week at Hadoop Summit in San Jose. Actually (no surprise), there's plenty of Spark content in sessions here

John Furrier
#hadoopsummit Some use cases where Shark outperforms Hadoop: 1) Real-time querying of data: in secs rather than minutes with Shark; 2) Stream processing: fraud detection and log processing in live streams (alerts, aggregates, analysis); 3) Sensor data processing