
IBM Analytics
Q6: How does Spark complement Hadoop in the open data platform?

IBM Analytics
Please post your replies here

jameskobielus
In the open data platform, Hadoop is the data-engineering "lake" (data acquisition, preparation, storage, integration, and governance) behind the front-end modeling and visualization tools and languages, such as Spark and R, used by data scientists.

mark simmonds
Ease of use, ad hoc analysis, speed to market, and it abstracts away complexity. It's the killer app.

jameskobielus
Spark is the development tool of choice for data scientists building machine learning models that use streaming analytics and graph analysis with in-memory execution.

Andrew Popp
Spark brings new processing power to Hadoop: batch, and now streaming and interactive.

mark simmonds
HDFS is one file system. Spark can use many different types of file systems and data platforms.

jameskobielus
Hadoop, especially HDFS, is the core distributed data storage, refinement, preparation, and governance layer behind Spark.

Mark van Rijmenam
Because Spark can run on top of HDFS, it works very well with Hadoop: it can be deployed on existing Hadoop clusters or run side by side with them. I wrote a white paper about this: http://floq.to/ZiBFq

mark simmonds
But not dependent on Hadoop.

Ira Michael Blonder
Spark includes a DAG execution engine that supports in-memory processing. The toolset also includes Spark SQL, which may be more familiar to some data scientists and staff.
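
To make the DAG and in-memory idea a bit more concrete, here is a toy sketch in plain Python. It is not Spark's API: the class LazyDataset and its methods are hypothetical names chosen only to mimic the general style, assuming that transformations are recorded lazily and that a cached intermediate result is reused from memory.

```python
# Toy illustration only: this is NOT Spark's API. The class and method names
# (LazyDataset, map, filter, cache, collect) are hypothetical and merely
# mimic the style of lazy, DAG-based processing.

class LazyDataset:
    """A node in a small DAG of deferred transformations."""

    def __init__(self, source=None, parent=None, op=None):
        self._source = source        # raw data for a root node
        self._parent = parent        # upstream node in the DAG
        self._op = op                # function applied to the parent's output
        self._cache_enabled = False
        self._cached = None          # in-memory result, filled on first use
        self.computations = 0        # how many times this node was evaluated

    def map(self, fn):
        # Record the transformation; nothing is computed yet.
        return LazyDataset(parent=self, op=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        # Also lazy: only a new DAG node is created.
        return LazyDataset(parent=self, op=lambda rows: [r for r in rows if pred(r)])

    def cache(self):
        # Ask for the result to be kept in memory after the first evaluation.
        self._cache_enabled = True
        return self

    def collect(self):
        # An "action": walk the DAG upstream and materialize the result.
        if self._cached is not None:
            return self._cached
        if self._parent is None:
            result = list(self._source)
        else:
            result = self._op(self._parent.collect())
        self.computations += 1
        if self._cache_enabled:
            self._cached = result
        return result


numbers = LazyDataset(source=range(10))
squares = numbers.map(lambda x: x * x).cache()   # shared intermediate step
even_squares = squares.filter(lambda x: x % 2 == 0)
big_squares = squares.filter(lambda x: x > 20)

print(even_squares.collect())   # → [0, 4, 16, 36, 64]
print(big_squares.collect())    # → [25, 36, 49, 64, 81]
print(squares.computations)     # → 1
```

Both downstream results depend on the same squares step, yet that step is evaluated only once because its output stays in memory after the first action.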

Arnab Ganguly
Spark complements Hadoop and overcomes the primary problem it has always suffered from: slow batch processing, which hampered real-time processing and streaming. Spark supports these use cases very well.

IBM Analytics
Last 4 minutes left, keep your replies coming

IBM Analytics
Last 2 minutes left, keep your replies coming

mark simmonds
Spark - in-memory, z13: a perfect storm?

Aniruddha Joshi
Data on Hadoop can be processed either with its MapReduce component or in memory with Spark.

Anil Saldanha
Spark provides a replacement for MapReduce, with faster processing. It will still rely on HDFS.

IBM Analytics
Last minute, help us wrap up!

mark simmonds
Spark and ODP - Go to Strata conf and listen to the announcements

Craig Brown, Ph.D.
Spark is an excellent contributor when MapR can't do the job or it gets too complicated. Spark is more user-friendly and seems to have a little more flexibility compared to MapR.

mark simmonds
ODP and Spark - a new beginning