SparkInsight

Spark : Building Smarter Apps
Join us to discuss building smarter apps fueled by "Spark". Share your views and use cases.
   9 years ago
#SparkInsight
More on 'Spark'
We are organizing a crowd chat to understand more about 'Spark' and how it can grow your business.
IBM Analytics
Q3 : What real-world challenges are driving development of smarter apps?
IBM Analytics
Please post your replies here
George Gilbert
for sure the need to build ever more sophisticated and lower latency analytic pipelines that inform operational apps at the transaction level
Kelly Capo
good old fashioned COMPETITION
George Gilbert
systems of record could only deliver historical performance reporting - like steering a ship by looking backwards at its wake
Andrew C. Oliver
There are a number of trends: a move to real-time analytics is starting, there are new devices generating new streams of voluminous data, there are computing devices in the hands of even relatively impoverished people, raw network speeds.
David Talby
Real-world challenges requiring smarter apps are simply the volume and velocity of data one should take into account, which is beyond what humans can correctly process
David Talby
Also, for many real-world problems, algorithms have improved to outperform human experts
Nanfang Hu
smarter apps should come with an easy-to-use UI and the flexibility to exercise different combinations of analysis
Kimberly Madia
Complexity, we are moving to a new level of maturity with analytics and the problems are tough
Yannik Zuehlke
there are challenges and room for improvement across all industries: healthcare, biotechnology... now we have the potential and capability to use huge amounts of data to make our apps smarter.
George Gilbert
there was always the need to augment end-user decision making. now there is the ability to do it in near real-time and in many cases without a human in the loop - ad-tech, etc
Andrew C. Oliver
Another thing that is big is the ACA. Now medical records are electronic and human error can be caught, treatments can be judged statistically.
jameskobielus
The core real-world challenges are always speed, hence the need for real-time in-memory and stream computing; agility, hence the need for interactive statistical modeling; scalability, hence the need for MPP data analytic platforms.
David Talby
Healthcare, pharma and cyber are verticals that are now being transformed by big data and data science. Just a few years behind advertising & finance.
Andrew C. Oliver
We did a demo based on a real-world scenario using Spark and NLP see here http://sparknlp.mamm... source here soon https://github.com/m...
Mahanth CH
Startups to giant corporations need data, design and speed to improve their business processes through smarter apps. Probably design and speed are the main challenges.
Kimberly Madia
Heard a cool story from a data scientist at UC Berkeley: eliminating blood tests with wearables, but culture shock in the medical community resulted
David Talby
@acoliver, thanks for the link! Where did you get the anonymized EMR records for this demo?
jameskobielus
The convergence of mobile, social, IoT, streaming, and cloud analytics has driven the dev and adoption of Spark. The ubiquity of these converged apps in e-commerce and marketing has spurred the need for smarter, game-changing Spark apps.
Andrew C. Oliver
@davidtalby The negex page has a zip you can download.
Ian Pointer
@davidtalby Anonymized EMR records came from the Chapman Negex corpus.
David Talby
@madiakc Culture shock is something we live every day... change is hard and this tech affects many people. Any tips on how to deal with it?
IBM Analytics
Time for Question #4, please look at the top of your screen for question #4
jameskobielus
The chief challenges in the Insight Economy are changing shapes of engagement, influence, marketing, transactions, and collaboration. The chief enabler is contextualized analytics drawing on machine learning and predictive modeling.
Andrew C. Oliver
@davidtalby Also I should note that Ian Pointer @carsondial did all the hard work. I just stood around saying "nope not there yet" :-)
Mahanth CH
We have a webcast that throws light on how to develop smarter apps fueled by data. This might be a good place to start. http://ibm.co/1eTiuN...
Kelly Capo
How do I retweet a LinkedIn post??
IBM Analytics
Q4 : How can simple APIs improve Spark developer productivity?
IBM Analytics
Please post your replies to question #4 here
Kimberly Madia
Helps explore previously untapped data like the streaming API
Andrew C. Oliver
Oh good gosh can NVidia, AMD and Intel get together and commoditize APIs for dealing with GPUs and that all get tied up into Spark?
Ian Pointer
Lowering the barrier of entry and also helping to improve iteration velocity in Big Data development
jameskobielus
Standard APIs are the key to simplicity on the development side. With standards, Spark apps only need to be written to those APIs in order to execute on any platform that runs Spark engines and can, via Spark SQL, tap into HDFS data.
Andrew C. Oliver
I'd like to see a big focus on making ETL as transparent as possible. Legacy shouldn't be an impairment to adoption.
Kimberly Madia
Helps foster community. #IBMStreams integrates with Spark MLlib Toolkit
David Talby
Having simple, easy to learn & use APIs is critical to developer productivity, and also to adoption because that's the first impression for developers trying out a platform.
David Talby
Spark APIs started very simple, and now new layers of abstraction like DataFrame and KeystoneML are being added as the core gains complexity over time
jameskobielus
Developers are always more productive when they can rely on a single IDE, single set of APIs, single storage/execution platform, and single pool of data and library of algorithms. They can then focus on statistical/predictive explorations.
Kimberly Madia
Interesting article in HBR https://hbr.org/2015... APIs making business news
Andrew C. Oliver
@jameskobielus but it is important that this IDE not be slow, not be tied to a single vendor's overall platform play and evolve with the times (i.e. git vs other tools...cough..clearcase...cough)
jameskobielus
Simplicity must, of necessity, give way to greater complexity as the Spark open-source code evolves, a la "Tungsten", and adds APIs. When will Spark APIs become too complex to fathom? Hard to say at this juncture in the tech's evolution.
IBM Analytics
Thank you all! Take a look at question #5 on the evolution of Spark
IBM Analytics
Look at the top of your screen for question #5
David Talby
Spark APIs are already becoming complex; this is why @Atigeo built and open sourced xFrame: https://github.com/A...
IBM Analytics
Q1 : What mix of skills and knowledge enables data scientists to build smarter Spark apps?
IBM Analytics
Please post your response here
IBM Analytics
This is the 1st question for our chat today and we request you to post your replies under each question.
Andrew C. Oliver
I don't fundamentally believe in data scientists as people imagine them. There are good mathematicians who write bad Python. There are bad mathematicians with a distributed computing background and there are business SMEs.
jameskobielus
To build smarter Spark apps, data scientists need to have domain knowledge of what real-world scenario they're modeling: marketing, customer service, finance, HR, or whatever.
George Gilbert
is the emergence of notebooks democratizing access to working with spark?
Andrew C. Oliver
fundamentally I think these are teams of people and not this mythical mad data scientist who knows business, your business, math, AI, distributed computing, codes and walks on water..
jameskobielus
To build smarter Spark apps, data scientists need to do their work in collaborative, interactive environments that involve various specialists--domain specialists, statistical analysts, visualization experts, etc--pooling knowledge.
jameskobielus
To build smarter Spark apps, data scientists need to build their models from data in "lakes" that include all relevant sources, use appropriate algorithms from common libraries, and establish tight governance over data & models.
George Gilbert
in addition to the domain knowledge, spark developers or their colleagues maintaining the operational apps have to build the pipeline that informs the transaction in near real-time
Kimberly Madia
In addition to hard skills like math and statistics, softer skills like communication help
Andrew C. Oliver
@jameskobielus To be honest, I actually think the #datalake is a stopgap. In the future we'll have better integration tools that distribute data automatically.
jameskobielus
To build smarter Spark apps, data scientists should engage with their peers throughout the open-source community to learn from best practices, identify best tools, and bring the best people to their projects.
Kelly Capo
@madiakc good observation Kimberly and I agree. Many projects fail due to miscommunication, not lack of expertise.
IBM Analytics
Thank you all for your replies to question #1, now let us go to question #2 at the top of your screen.
Kimberly Madia
Data science consists of a few steps: understand the business goal (that is always true, but it is often overlooked), profile, explore and prepare data, consult with experts, build/deploy applications, validate & refresh often
Yannik Zuehlke
1. ask the right question 2. understand the data 3. understand the difference between big data and small data 4. evaluate if Spark is the right option for this specific use case 5. utilise parallelism and in-memory power of Spark 6. code
jameskobielus
The smartest Spark app of all is one that addresses a key business problem. Engage with business stakeholders in identifying what sort of analytic apps you wish to build out with Spark tools. What scratches the most urgent business itch?