Dave Vellante
Q6. George, what does it mean for Spark and Hadoop to co-exist? Sounds nice, but what's the customer imperative there?
Rodrigo Gazzaneo
The Hadoop adoption curve has not peaked yet; there is still room for growth. The use case will define the best tool.
Kirk Borne
Spark is about fast processing. Hadoop is about distributed data (files) access for processing. They co-exist.
George Gilbert
There is a school of thought that if you go all in on #ApacheSpark, you don't even need #Hadoop core: #HDFS and #YARN. But the storage layer is key for hand-offs between jobs when Spark isn't the end-to-end processing engine.
Jen(Cohen)Cheplick
You want to use the right "tool" for the job, depending on data type, batch vs. real-time, etc.
Jen(Cohen)Cheplick
This was an interesting article I read a few weeks back. It's high level, but it makes the point about co-existence. http://www.forbes.co...
George Gilbert
@vGazza Hadoop is becoming a key part of enterprise infrastructure. Some parts of it, at least #HDFS and #YARN, should be a common foundation for a while. #ApacheSpark will have its own take on storage integration at some point.
George Gilbert
@ggilbert41 To some, "all in on Spark" means Databricks, which has its own stack all the way down to the metal.
Kirk Borne
Spark can be MUCH faster than Hadoop processing, so Spark is red hot right now. That's fine, but Hadoop is not going away anytime soon.
Jen(Cohen)Cheplick
@ggilbert41 George & all, where are you seeing Spark take off? What use cases and/or industries?
Kirk Borne
@ggilbert41 Great comment about the Databricks implementation, which you get on @MapR. I would buy that for just that reason.
Dave Vellante
@jscheplick in the press!
George Gilbert
@jscheplick There is a growing number of data-tools ISVs using it as their data management back-end; #Zoomdata is a great example. Also, Internet services vendors who have the skills are working with it for ML and streaming.
David Floyer
Start with a truck (Hadoop), move to a pick-up (Spark), and end with a max plaid Tesla (streams).
David Wild
Spark's in-memory efficiency is great for complex network analysis across heterogeneous data sources.