sparkinsight

What is 'Spark'?
We are organizing a crowd chat to understand more about 'Spark' and how it can grow your business.
   9 years ago
#SparkInsight
IBM Analytics
Q1: What is #Spark?
Thulasiram Valleru
Just like Hadoop, a platform for distributed computing.
jameskobielus
Spark is a distributed in-memory analytics tool.
IBM Analytics
You can post your comments here - starting with the first question.
jameskobielus
Essentially, Spark is a next generation cluster-computing solution, runtime processing environment, and development framework for in-memory advanced analytics.
Pam Denny
What are advantages of spark?
jameskobielus
Apache Spark's core design feature is the ability to support iterative, distributed, parallelized algorithmic program execution entirely in memory, without need to write out result sets after each pass through the data.
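As a rough illustration of the iterative, in-memory idea described above, here is a minimal single-machine sketch in plain Python. It is not Spark code: `load_points` and `fit_mean` are hypothetical names, and the "expensive read" is only simulated with a counter. The point is that the data is read once, kept in memory, and reused on every pass.

```python
def load_points(stats):
    """Hypothetical stand-in for an expensive read; counts how often it runs."""
    stats["loads"] += 1
    return [1.0, 2.0, 3.0, 4.0]


def fit_mean(points, passes=200, learning_rate=0.1):
    """Iteratively move an estimate toward the mean of the in-memory points."""
    estimate = 0.0
    for _ in range(passes):
        # Each pass reuses the same in-memory list; nothing is written out.
        gradient = sum(estimate - p for p in points) / len(points)
        estimate -= learning_rate * gradient
    return estimate


stats = {"loads": 0}
cached_points = load_points(stats)  # read once, keep in memory
estimate = fit_mean(cached_points)  # 200 passes over the cached data
```

After the run, `stats["loads"]` is still 1 even though the loop made 200 passes over the data.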
Joel Horwitz
Spark to me is a development framework for creating Intelligent Applications.
jameskobielus
This capability makes Apache Spark well-suited for the growing range of real-time applications, such as Internet of Things applications, where much or most of the data analysis will be performed on cached, live data, rather than stored, historical data.
Andrew C. Oliver
At its most primitive/basic you can think of it as a way to distribute #python or #scala functions across a cluster.
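To make the "distribute a function" picture concrete, here is a minimal local sketch in plain Python. It is an assumption-laden stand-in, not Spark's API: a thread pool plays the role of the cluster, and `split_into_partitions` and `distributed_map` are hypothetical helper names.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into contiguous chunks of nearly equal size."""
    size, remainder = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < remainder else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def distributed_map(func, data, num_partitions=4):
    """Apply func to every element, one worker per partition."""
    partitions = split_into_partitions(data, num_partitions)

    def run_partition(partition):
        return [func(x) for x in partition]

    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # pool.map returns results in partition order.
        results = pool.map(run_partition, partitions)
        return [y for part in results for y in part]


squares = distributed_map(lambda x: x * x, list(range(10)))
```

On a real cluster the partitions would live on different machines; here they are just slices of one list.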
Joel Horwitz
@andbflo_denny Spark is light weight and fast as heck.
Himanshu Mehra
Apache Spark is an in-memory distributed computing engine specifically designed to perform machine learning.
jameskobielus
@andbflo_denny Advantages of Spark are speed, simplicity, versatility, ability to work with your HDFS data, etc.
jameskobielus
Spark's performance advantages come from parallelizing models across distributed in-memory clusters.
Kimberly Madia
An engine built for simplicity and speed, with connections to any data source and in-memory processing. Enables collaboration and the ability to work with all data, abstracting away technical challenges.
Avadhoot
@jameskobielus is Spark an extension of MapReduce 2?
Joel Horwitz
@TValleru Not exactly like @apachehadoop @apachespark is in-memory. @apachehadoop is on disk.
Joel Horwitz
@avi_patwardhan it's similar, but not the same. Here's a good overview http://www.quora.com...
Thulasiram Valleru
@JSHorwitz Even in the HDFS architecture, results are computed in memory and then stored on disk, from what I've heard.
jameskobielus
@avi_patwardhan Spark is not an extension to MapReduce; instead, it complements MapReduce through a separate SQL and runtime engine geared for distributed in-memory parallelized real-time computations across clusters.
Pam Denny
any use case examples of who is using #spark today?
IBM Analytics
A light-weight compute engine for data science offering end to end support for data scientists, developers and data engineers.
Kimberly Madia
Spark is among the fastest-growing open source projects; it's exciting to see community innovation really taking off.
Thulasiram Valleru
@jameskobielus It complements MapReduce as in Hadoop, but the difference is in-memory computing.
Joel Horwitz
@TValleru yes, the key here is the processing engine; Spark is so much more, though. It's an integrated platform with #ML #Graph #Streaming #R #Scala and so much more.
Kimberly Madia
@andbflo_denny Spark is used across all industries to move from dashboards and alerts to meaningful and timely action. Use cases include machine learning, iterative analytics and Internet of Things applications.
jameskobielus
@elesinOlalekan Not a matter of #Spark over #Hadoop. It's more a matter of Spark leveraging and extending Hadoop to address a broader range of use cases: in-memory, streaming, graph analytics, etc.
Avadhoot
@madiakc I think the best option for the Enterprise is to run both Hadoop and Spark..
Andrew C. Oliver
@jameskobielus Spark executes jobs as a DAG, and you can consider MapReduce to be a special case / subset of a DAG. The problem with MapReduce is that each step is linear; Spark runs non-dependent operations in parallel.
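The DAG point above can be sketched with Python's standard `graphlib` module (Python 3.9+). This is only an illustration of running independent steps in parallel, not Spark's scheduler; `run_dag` and the task names `load_a`, `load_b`, `join`, and `total` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_dag(tasks, dependencies):
    """Run tasks wave by wave; tasks within one wave have no pending dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    results, waves = {}, []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            waves.append(ready)
            snapshot = dict(results)  # results of all earlier waves
            futures = {name: pool.submit(tasks[name], snapshot) for name in ready}
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results, waves


tasks = {
    "load_a": lambda r: [1, 2, 3],
    "load_b": lambda r: [10, 20, 30],
    "join": lambda r: list(zip(r["load_a"], r["load_b"])),
    "total": lambda r: sum(a + b for a, b in r["join"]),
}
dependencies = {
    "load_a": set(),
    "load_b": set(),
    "join": {"load_a", "load_b"},
    "total": {"join"},
}
results, waves = run_dag(tasks, dependencies)
```

Here `load_a` and `load_b` do not depend on each other, so they land in the same wave and are submitted together; `join` and `total` each wait for their inputs.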
jameskobielus
Forward-looking organizations see Spark as a platform to complement their investments in advanced analytics, machine learning platforms, and big-data platforms such as Hadoop.
jameskobielus
Currently in version 1.3.1, Spark is a layered distributed-computing framework that can leverage much of the Hadoop storage environment, including HDFS.
Thulasiram Valleru
How Hadoop + Spark drastically improve Analytics
Andrew C. Oliver
@elesinOlalekan It isn't either/or. You will in all probability use Spark with HDFS and a lot of Hadoop-related tooling, just instead of Map-Reduce. So it replaces pieces of Hadoop but not the whole thing.
Kimberly Madia
Spark complements data management and data discovery solutions with agile data science and application development. A key enabler of collaboration and innovation.
IBM Analytics
Time to move to Question #2 - look out!
Pam Denny
@madiakc Super! Can you give an example if an asset reaches a meter reading alert? How would I be notified of this?
Himanshu Mehra
How does IBM see Spark's future?
IBM Analytics
We would recommend looking at the latest post from @ibmbigdata
Kimberly Madia
@andbflo_denny yes! CenterPoint Energy is doing this today https://ibm.biz/BdXq... they are able to resolve issues electronically, with no need to deploy a truck and crew
Pam Denny
@madiakc EXCELLENT! Thank U!
Avadhoot
@jameskobielus what are the limitations of spark @madiakc @JSHorwitz
Kimberly Madia
@andbflo_denny excellent question, lots of possibilities .. car makers talk about auto playing music to suit your driving - fast, stops and starts, driver needs classical music to calm down
Kimberly Madia
Ideas for Spark apps in automotive: more profitable aftermarket products based on driving preferences; more interactive and safer driving experiences that respond to approaching dangers.
Pam Denny
@madiakc LOVE IT! Living here in Boston - we could use the classical music!
Monica Fox
@andbflo_denny ^amen - from a fellow bostonian :)
Kimberly Madia
@andbflo_denny :) i don't have the nerve to drive in Boston! I read a few other cool ideas from the consumer electronics show like automatically changing your alarm clock depending on traffic and adjusting airbag deployment based on weight of driver
IBM Analytics
Q3: What are the primary use cases for #Spark, both in cross-industry and vertical-industry applications?
Joel Horwitz
@netflix said yesterday that they are using Spark to do real-time recommendations for their millions of clients.
IBM Analytics
Please post your responses here!
John Furrier
speed is amazing
pagal guy
use cases would be numerous from healthcare to real time transaction analytics.... web based social networking analytics
jameskobielus
Spark includes runtime engines that are optimized for in-memory processing, streaming analytics, graph analysis, and machine learning.
Joel Horwitz
@goldmansachs Intuitive language bindings to Scala, Java, Python, R. Combining relational, functional, iterative APIs all into lazy-evaluation data pipelines. Storage agnostic. Lambda closures. Similar abstraction to GS internal platfo
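A lazy-evaluation data pipeline, as mentioned above, can be sketched in a few lines of plain Python. This is an illustrative toy, not Spark's API: `LazyPipeline` and `traced_double` are hypothetical names. Transformations are only recorded; nothing runs until `collect()` is called.

```python
class LazyPipeline:
    """Records map/filter steps and runs them only when collect() is called."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, func):
        return LazyPipeline(self._source, self._steps + (("map", func),))

    def filter(self, predicate):
        return LazyPipeline(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        items = iter(self._source)
        for kind, func in self._steps:
            items = map(func, items) if kind == "map" else filter(func, items)
        return list(items)


calls = []


def traced_double(x):
    calls.append(x)  # record that the function actually ran
    return x * 2


pipeline = LazyPipeline(range(5)).map(traced_double).filter(lambda x: x > 2)
calls_before_collect = list(calls)  # still empty: nothing has executed yet
result = pipeline.collect()
```

Building the pipeline leaves `calls` empty; the work happens only inside `collect()`.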
jameskobielus
Spark is well-suited for exploratory analytics by teams of data scientists using Hadoop and other big-data clusters as “data lakes” or “data reservoirs” for statistical modeling.
IBM Analytics
Finance - Faster trades. Continuously monitor risk in real time.
Avadhoot
faster ETL than any other tool
jameskobielus
Data scientists would use #Spark to rapidly model and simulate alternate scenarios, engage in free-form what-if analysis, and forecast alternative future states.
Joel Horwitz
Everyone is building Data Lakes. Universal data acquisition makes all big data analytics and reporting easier. Hadoop provides a scalable storage with HDFS. How will we scale consumption and curation of all this data?
Joel Horwitz
the answer is Spark
Joel Horwitz
@avi_patwardhan actually, Hadoop works just fine for that.
pagal guy
Product-based companies can use it for trend analytics on Twitter, FB, or other social media channels.
Kimberly Madia
Healthcare applications can be improved with more intelligent Spark apps: faster identification of life-threatening conditions to dynamically adjust care and personalize treatment; automated or clinician-driven knowledge discovery.
jameskobielus
Data scientists would use #Spark for “schema on read” development, freeing them from needing to define data models up front prior to statistical modeling and exploration.
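A small sketch of the "schema on read" idea, using only Python's standard library: raw JSON lines are stored as-is, and field names and types are discovered when the data is read. `infer_schema` and the sample records are hypothetical and are not part of any Spark API.

```python
import json


def infer_schema(records):
    """Map each field name to the set of Python type names seen for it."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Raw data stored without any up-front data model (made-up sample).
raw_lines = [
    '{"user": "ana", "clicks": 3}',
    '{"user": "bo", "clicks": 5, "country": "US"}',
]

records = [json.loads(line) for line in raw_lines]
schema = infer_schema(records)
```

The second record introduces a `country` field that no model declared in advance; it simply shows up in the inferred schema.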
jameskobielus
If you're a data scientist building models in Spark, you may build a starter in-memory analytics platform with 10-25TB of priority core data, but want the ability to scale it out over time as your investigations call for exploration of more
Andrew C. Oliver
We're seeing it in finance in particular, but in other industries as well. Anywhere that a single computer would take too long to execute a piece of functional code is a good candidate.
pagal guy
Hadoop doesn't give direct ways to do iterative analytics, but Spark does #awesome_spark
jameskobielus
You may want to include Spark’s streaming analytics capabilities if you’re doing low-latency, event-processing, mobility-enabling, Internet of Things, and other applications that operate on live, in-motion data.
jameskobielus
Spark’s graph analytics are well-suited for anti-fraud, influence analysis, sentiment monitoring, market segmentation, engagement optimization, and other applications where complex patterns must be rapidly identified.
jameskobielus
Spark’s machine-learning tools are fundamental for boosting data-scientist productivity by helping them uncover hidden patterns they may otherwise have overlooked.
Andrew C. Oliver
In addition to the Monte Carlo demo (https://github.com/m...) we'll be doing a healthcare analytics one next.
jameskobielus
Spark-suited scenarios describe many projects in social analytics, mobile analytics, Internet of Things (IoT) analytics, and other new leading-edge frontiers fueled by big data.
Ali Khanafer
Spark SQL is great for reading data from legacy databases and doing fast computation
Andrew C. Oliver
@jameskobielus While Spark does have streaming capabilities, it is actually still batching. Storm is better for single event analytics. Flink also looks promising http://www.infoworld...
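The batching point above can be pictured with a minimal plain-Python sketch. It assumes events arrive sorted by timestamp (in milliseconds) and groups them into fixed time windows, so each event is handled as part of a batch rather than on its own. `micro_batches` is a hypothetical helper, not Spark Streaming code.

```python
def micro_batches(events, batch_interval_ms):
    """Group (timestamp_ms, value) events into consecutive fixed-size time windows."""
    batch, window_end = [], None
    for timestamp, value in events:
        if window_end is None:
            window_end = timestamp + batch_interval_ms
        # Close every window that ended before this event arrived.
        while timestamp >= window_end:
            yield batch
            batch = []
            window_end += batch_interval_ms
        batch.append(value)
    if batch:
        yield batch


events = [(0, "a"), (400, "b"), (1100, "c"), (1900, "d"), (3200, "e")]
batches = list(micro_batches(events, batch_interval_ms=1000))
```

With a 1000 ms interval, the window from 2000 to 3000 ms has no events and yields an empty batch, and event "a" is not processed until its whole window closes.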
IBM Analytics
Great conversation, now let us go to question #4.. coming up...now!
Pam Denny
hard to leave this question!
Jason Schroedl
We've seen Spark adoption in financial services, healthcare, retail, energy, and other industries. We did a webinar yesterday w/ some examples and use cases like machine learning: https://www.brightta...