sparkinsight


Just like Hadoop, a platform for distributed computing.

Spark is a distributed in-memory analytics tool.

You can post your comments here - starting with the first question.

Essentially, Spark is a next-generation cluster-computing solution, runtime processing environment, and development framework for in-memory advanced analytics.

What are the advantages of Spark?

Apache Spark's core design feature is the ability to support iterative, distributed, parallelized algorithmic program execution entirely in memory, without the need to write out result sets after each pass through the data.
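To make the "no write-out after each pass" point concrete, here is a minimal plain-Python sketch, not Spark code. It assumes a toy dataset and a hypothetical `read_from_storage` function standing in for disk or HDFS: the data is read once, kept in memory, and every iteration of a simple gradient-descent loop reuses it. In PySpark the analogous step would be calling `cache()` on an RDD or DataFrame before iterating.

```python
from typing import Callable, List

# Hypothetical counter used only to show how often "storage" is read.
storage_reads = 0


def read_from_storage() -> List[float]:
    """Stand-in for reading a dataset from disk or HDFS (assumed, not a Spark API)."""
    global storage_reads
    storage_reads += 1
    return [1.0, 2.0, 3.0, 4.0]


def fit_mean(load: Callable[[], List[float]], passes: int = 100, lr: float = 0.1) -> float:
    """Estimate the mean by gradient descent on squared error.

    The data is loaded once and kept in memory; every pass reuses the cached
    list instead of writing intermediate results out and reading them back.
    """
    points = load()  # single read, analogous to caching a dataset
    estimate = 0.0
    for _ in range(passes):
        # Gradient of 0.5 * mean((estimate - p)^2) with respect to estimate.
        grad = sum(estimate - p for p in points) / len(points)
        estimate -= lr * grad
    return estimate


result = fit_mean(read_from_storage)
print(f"estimate={result:.4f}, storage reads={storage_reads}")
```

A disk-oriented MapReduce chain would, roughly speaking, pay the read (and a write) on every one of the 100 passes; here storage is touched once.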

Spark to me is a development framework for creating Intelligent Applications.

This capability makes Apache Spark well-suited for the growing range of real-time applications—such as Internet of Things applications—where much or most of the data analysis will be performed on cached, live data, rather than stored, historica


At its most primitive/basic you can think of it as a way to distribute #python or #scala functions across a cluster.
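As a rough local illustration of "distributing a Python function", the sketch below splits a list into partitions and applies a function to each partition on its own worker. Threads stand in for cluster executors, and `partition` and `distributed_map` are hypothetical helpers, not Spark APIs. In PySpark the same idea is usually written as `sc.parallelize(data, numSlices).map(fn).collect()`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(data: List[T], num_partitions: int) -> List[List[T]]:
    """Split data into contiguous slices of near-equal size (like partitions)."""
    size, remainder = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < remainder else 0)
        parts.append(data[start:end])
        start = end
    return parts


def distributed_map(fn: Callable[[T], R], data: List[T], num_partitions: int = 4) -> List[R]:
    """Apply fn to every element, one worker per partition, preserving order.

    Threads stand in for cluster executors in this toy model.
    """
    parts = partition(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # pool.map yields results in the same order as the input partitions.
        per_partition = pool.map(lambda part: [fn(x) for x in part], parts)
        return [y for part in per_partition for y in part]


squares = distributed_map(lambda x: x * x, list(range(10)), num_partitions=3)
print(squares)
```

The key design point carried over from Spark is that the function is shipped to where each partition lives, and results are reassembled afterwards.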

@andbflo_denny Spark is lightweight and fast as heck.

Himanshu Mehra

Apache Spark is an in-memory distributed computing engine specifically designed to perform machine learning.

@andbflo_denny Advantages of Spark are speed, simplicity, versatility, the ability to work with your HDFS data, etc.

@JSHorwitz love fast!

Spark's performance advantages come from parallelizing models across distributed in-memory clusters.

An engine built for simplicity and speed, with connections to any data source and in-memory processing. It enables collaboration and the ability to work with all data while abstracting away technical challenges.

@jameskobielus Is Spark an extension of MapReduce2?

@TValleru Not exactly. @apachespark is in-memory, while @apachehadoop is on disk.

@avi_patwardhan It's similar, but not the same. Here's a good overview: http://www.quora.com...

@JSHorwitz Even in the HDFS architecture, as I heard it, results are computed in memory and then stored on disk.

@avi_patwardhan Spark is not an extension to MapReduce; instead, it complements MapReduce through a separate SQL and runtime engine geared for distributed in-memory parallelized real-time computations across clusters.

Any use-case examples of who is using #spark today?

A light-weight compute engine for data science offering end to end support for data scientists, developers and data engineers.

Elesin Olalekan

Why #Spark over #Hadoop?

Spark is among the fastest-growing open source projects; it's exciting to see community innovation really taking off.

@jameskobielus It complements MapReduce as used in Hadoop, but the difference is in-memory computing.

@TValleru Yes, the key here is the processing engine; Spark is so much more, though. It's an integrated platform with #ML #Graph #Streaming #R #Scala and so much more.

@andbflo_denny Spark is used across all industries to move from dashboards and alerts to meaningful and timely action. Use cases include machine learning, iterative analytics and Internet of Things applications.

@elesinOlalekan Not a matter of #Spark over #Hadoop. It's more a matter of Spark leveraging and extending Hadoop to address a broader range of use cases: in-memory, streaming, graph analytics, etc.

@madiakc I think the best option for the enterprise is to run both Hadoop and Spark.

Monica Fox

@elesinOlalekan It's not exactly #Spark over #Hadoop; it's #Spark with #Hadoop.

@jameskobielus Spark models a job as a DAG, and you can consider MapReduce to be a special case / subset of a DAG. The problem with MapReduce is that each step is linear, while Spark can run non-dependent operations in parallel.
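A small way to see the DAG-versus-chain point: group the nodes of a dependency graph into "waves", where every node in a wave has all its inputs finished and so could run alongside the others. This is a toy model using Python's standard `graphlib` (Python 3.9+), not Spark's actual scheduler, which groups work into stages separated by shuffles. The job and node names below are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def schedule_waves(dag: Dict[str, Set[str]]) -> List[List[str]]:
    """Group DAG nodes into waves; every node in a wave has all its inputs done,
    so nodes within the same wave are independent and could run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical job: each key lists the operations it depends on.
job = {
    "filter_a": {"read_a"},
    "filter_b": {"read_b"},
    "join": {"filter_a", "filter_b"},
    "aggregate": {"join"},
}

# A strictly linear, MapReduce-style chain for comparison.
chain = {"step2": {"step1"}, "step3": {"step2"}, "step4": {"step3"}}

for name, graph in (("dag", job), ("chain", chain)):
    print(name, schedule_waves(graph))
```

In the DAG, the two reads and the two filters each share a wave, whereas the linear chain forces one operation per wave.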

Forward-looking organizations see Spark as a platform to complement their investments in advanced analytics, machine learning platforms, and big-data platforms such as Hadoop.

Currently in version 1.3.1, Spark is a layered distributed-computing framework that can leverage much of the Hadoop storage environment, including HDFS.

How Hadoop + Spark drastically improve Analytics

@elesinOlalekan It isn't either/or. You will in all probability use Spark with HDFS and a lot of Hadoop-related tooling, just instead of MapReduce. So it replaces pieces of Hadoop but not the whole thing.

Spark complements data management and data discovery solutions with agile data science and application development. It is a key enabler of collaboration and innovation.

Time to move to Question #2 - look out!

@madiakc Super! Can you give an example? If an asset triggers a meter-reading alert, how would I be notified of this?

Himanshu Mehra

How does IBM see Spark's future?

We would recommend looking at the latest post from @ibmbigdata

@andbflo_denny Yes! CenterPoint Energy is doing this today: https://ibm.biz/BdXq... They are able to resolve issues electronically, with no need to deploy a truck and crew.

@madiakc EXCELLENT! Thank U!

@jameskobielus What are the limitations of Spark? @madiakc @JSHorwitz

@andbflo_denny Excellent question, lots of possibilities... Car makers talk about automatically playing music to suit your driving: if it's fast, with stops and starts, the driver needs classical music to calm down.

Ideas for Spark apps in automotive: more profitable aftermarket products based on driving preferences, and more interactive and safer driving experiences that respond to approaching dangers.

@madiakc LOVE IT! Living here in Boston - we could use the classical music!

Monica Fox

@andbflo_denny ^amen - from a fellow bostonian :)

@andbflo_denny :) I don't have the nerve to drive in Boston! I read a few other cool ideas from the Consumer Electronics Show, like automatically changing your alarm clock depending on traffic and adjusting airbag deployment based on the driver's weight.

IBM Analytics

Q3: What are the primary use cases for #Spark, both in cross-industry and vertical-industry applications?


@netflix said yesterday that they are using Spark to do real-time recommendations for their millions of clients.

Please post your responses here!

Speed is amazing.

Use cases would be numerous, from healthcare to real-time transaction analytics to web-based social networking analytics.

Spark includes runtime engines that are optimized for in-memory processing, streaming analytics, graph analysis, and machine learning.

@goldmansachs Intuitive language bindings to Scala, Java, Python, R. Combines relational, functional, and iterative APIs into lazy-evaluation data pipelines. Storage agnostic. Lambda closures. Similar abstraction to GS internal platfo
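"Lazy-evaluation data pipelines" means transformations are only recorded until an action asks for a result. In PySpark, `map` and `filter` are transformations and `collect()` is an action that triggers execution. The sketch below is a hypothetical `LazyPipeline` class in plain Python that mimics that behavior; it is an illustration of the idea, not Spark's implementation.

```python
from typing import Any, Callable, Iterable, List, Tuple


class LazyPipeline:
    """Toy lazy pipeline: transformations are recorded, and nothing runs
    until an action such as collect() is called (hypothetical class)."""

    def __init__(self, source: Iterable[Any], ops: Tuple[Tuple[str, Callable], ...] = ()):
        self._source = source
        self._ops = ops

    def map(self, fn: Callable[[Any], Any]) -> "LazyPipeline":
        # Record the transformation; return a new pipeline, leaving this one unchanged.
        return LazyPipeline(self._source, self._ops + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyPipeline":
        return LazyPipeline(self._source, self._ops + (("filter", pred),))

    def collect(self) -> List[Any]:
        # Action: chain the recorded steps as iterators and materialize the result.
        items = iter(self._source)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []


def square(x: int) -> int:
    calls.append(x)  # record each evaluation so laziness is observable
    return x * x


pipeline = LazyPipeline(range(6)).map(square).filter(lambda x: x % 2 == 0)
print("evaluated before collect:", calls)
print("result:", pipeline.collect())
```

Because the steps are chained as iterators, each element flows through the whole pipeline in one pass, which is the property that lets an engine fuse steps instead of materializing every intermediate result.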

https://spark-summit...

Spark is well-suited for exploratory analytics by teams of data scientists using Hadoop and other big-data clusters as “data lakes” or “data reservoirs” for statistical modeling.

Finance - Faster trades. Continuously monitor risk in real time.

faster ETL than any other tool

Data scientists would use #Spark to rapidly model and simulate alternate scenarios, engage in free-form what-if analysis, and forecast alternative future states.

Everyone is building Data Lakes. Universal data acquisition makes all big data analytics and reporting easier. Hadoop provides a scalable storage with HDFS. How will we scale consumption and curation of all this data?

the answer is Spark

@avi_patwardhan actually, Hadoop works just fine for that.

pagal guy

Product-based companies can use it to do trend analytics on Twitter, Facebook, or other social media channels.

Healthcare applications can be improved with more intelligent Spark apps: identifying life-threatening conditions faster to dynamically adjust care and personalize treatment, and automated or clinician-driven knowledge discovery.

Data scientists would use #Spark for “schema on read” development, freeing them from needing to define data models up front prior to statistical modeling and exploration.

If you're a data scientist building models in Spark, you may build a starter in-memory analytics platform with 10-25TB of priority core data, but want the ability to scale it out over time as your investigations call for exploration of more

We're seeing it in finance in particular, but in other industries as well. Anywhere that a single computer would take too long to execute a piece of functional code is a good candidate.

pagal guy

Hadoop doesn't provide direct ways to do iterative analytics, but Spark does. #awesome_spark

You may want to include Spark’s streaming analytics capabilities if you’re doing low-latency, event-processing, mobility-enabling, Internet of Things, and other applications that operate on live, in-motion data.

Spark’s graph analytics are well-suited for anti-fraud, influence analysis, sentiment monitoring, market segmentation, engagement optimization, and other applications where complex patterns must be rapidly identified.
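For influence analysis specifically, a classic graph algorithm is PageRank, and Spark's GraphX library ships a PageRank implementation. The sketch below is a plain-Python power iteration, not GraphX code, run on a hypothetical "who follows whom" graph; the damping factor of 0.85 is the conventional choice, assumed here.

```python
from typing import Dict, List, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank over a directed edge list.

    A node with no outgoing edges spreads its rank evenly over all nodes,
    so the ranks always sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v] or nodes  # dangling node: share with everyone
            share = damping * rank[v] / len(targets)
            for w in targets:
                new_rank[w] += share
        rank = new_rank
    return rank


# Hypothetical "who follows whom" edges in a small social graph.
follows = [("alice", "carol"), ("bob", "carol"), ("carol", "alice"), ("dave", "carol")]
scores = pagerank(follows)
print(max(scores, key=scores.get))
```

Each iteration is another full pass over the same graph, which is exactly the kind of iterative workload the in-memory design discussed earlier is meant to speed up.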

Spark’s machine-learning tools are fundamental for boosting data-scientist productivity by helping them uncover hidden patterns they may otherwise have overlooked.

In addition to the Monte Carlo demo (https://github.com/m...) we'll be doing a healthcare analytics one next.

Spark-suited scenarios include many projects in social analytics, mobile analytics, Internet of Things (IoT) analytics, and other new leading-edge frontiers fueled by big data.

Ali Khanafer

Spark SQL is great for reading data from legacy databases and doing fast computation

@jameskobielus While Spark does have streaming capabilities, it is actually still batching. Storm is better for single event analytics. Flink also looks promising http://www.infoworld...
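The "still batching" point refers to micro-batching: Spark Streaming in the 1.x line discretizes a live stream into small batches at a configured batch interval and processes each batch as a unit. The toy model below, in plain Python with hypothetical helper names, groups timestamped events into fixed windows and shows that, in this model, an event's result waits until its window closes.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, payload)


def micro_batches(events: List[Event], interval: float) -> List[List[str]]:
    """Group events into consecutive fixed-length time windows (non-empty ones only)."""
    batches: Dict[int, List[str]] = {}
    for ts, payload in events:
        batch_id = int(ts // interval)
        batches.setdefault(batch_id, []).append(payload)
    return [batches[k] for k in sorted(batches)]


def result_delay(ts: float, interval: float) -> float:
    """In this toy model, time an event waits until its batch window closes."""
    return (int(ts // interval) + 1) * interval - ts


stream = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (2.9, "d")]
print(micro_batches(stream, interval=1.0))
print(round(result_delay(0.1, interval=1.0), 3))
```

The added wait is bounded by the batch interval, which is why per-event engines such as Storm were often preferred for single-event, lowest-latency analytics, while micro-batching trades some latency for throughput.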

Great conversation! Now let's go to question #4... coming up... now!

hard to leave this question!

Jason Schroedl

We've seen Spark adoption in financial services, healthcare, retail, energy, and other industries. We did a webinar yesterday w/ some examples and use cases like machine learning: https://www.brightta...