SparkDeepLearning

Deep Learning & Apache Spark™
Join our crowd chat about deep learning and Apache Spark™ — what’s new and what’s to come.
IBM Analytics
Q3 - What role do Apache Spark and the open-source community play in deep learning?
Alexander Lang
Deep learning doesn't come for free: there are a ton of hyperparameters to tune (number of layers, number of neurons, optimization strategy, and so on). Using Spark, you can train several of these model variants in parallel and pick the best one.
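A minimal sketch of the parallel hyperparameter search described above. It assumes a hypothetical toy linear model trained by plain gradient descent in place of a real deep network. A thread pool stands in for Spark's executors. With PySpark, the same `train_and_score` function could be passed to `sc.parallelize(grid).map(train_and_score).collect()`. The data, grid values and function names are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical toy data: y = 3x + 1, with a shifted validation set.
TRAIN = [(k / 10, 3 * (k / 10) + 1) for k in range(20)]
VALID = [(k / 10 + 0.05, 3 * (k / 10 + 0.05) + 1) for k in range(20)]


def train_and_score(params):
    """Train one model variant with the given hyperparameters; return its validation MSE."""
    lr, epochs = params
    w, b = 0.0, 0.0
    n = len(TRAIN)
    for _ in range(epochs):
        # Full-batch gradient of the mean squared error.
        gw = sum(2 * (w * x + b - y) * x for x, y in TRAIN) / n
        gb = sum(2 * (w * x + b - y) for x, y in TRAIN) / n
        w -= lr * gw
        b -= lr * gb
    mse = sum((w * x + b - y) ** 2 for x, y in VALID) / len(VALID)
    return params, mse


# Every (learning rate, epochs) combination is an independent training job.
grid = list(product([0.001, 0.01, 0.1], [10, 100, 1000]))
with ThreadPoolExecutor() as pool:
    results = list(pool.map(train_and_score, grid))

best_params, best_mse = min(results, key=lambda r: r[1])
print(best_params, best_mse)
```

Because each configuration trains independently, this pattern scales out trivially. CPython threads share one interpreter lock, though, so for real parallelism you would use processes on one machine, or Spark across machines.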
jameskobielus
Deep learning libraries are being open-sourced. Check out this recent article: http://www.technewsw...
Rania Khalaf
It is great that so much of the innovation in deep learning is happening in the open: several key frameworks, like Torch, Caffe and TensorFlow, are open source. That builds community and makes the field accessible.
jameskobielus
Deep learning has already been open-sourced in the Spark and Java communities. DeepLearning4J: http://www.spark.tc/...
Alexander Lang
There's still work to do to train a single DL model in parallel... deeplearning4j does that to some extent by splitting up the data.
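A minimal sketch of data-parallel training by parameter averaging, one common way to "split up the data". Each worker runs a few local gradient steps on its own shard, and then the driver averages the resulting weights. This illustration uses a hypothetical toy linear model and is not deeplearning4j's implementation. In Spark, each shard would be an RDD partition and the local steps would run inside `mapPartitions`.

```python
# Hypothetical toy data: y = 3x + 1.
DATA = [(k / 20, 3 * (k / 20) + 1) for k in range(40)]


def local_sgd(params, shard, lr, steps):
    """Run a few full-batch gradient steps on one worker's shard, starting from the shared weights."""
    w, b = params
    n = len(shard)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
        gb = sum(2 * (w * x + b - y) for x, y in shard) / n
        w -= lr * gw
        b -= lr * gb
    return w, b


def parameter_averaging(data, n_workers=4, rounds=200, lr=0.1, local_steps=5):
    # Split the data round-robin into one shard per worker.
    shards = [data[i::n_workers] for i in range(n_workers)]
    params = (0.0, 0.0)
    for _ in range(rounds):
        # Each worker trains independently (in Spark: one task per partition).
        local = [local_sgd(params, shard, lr, local_steps) for shard in shards]
        # The driver averages the workers' weights and sends them out for the next round.
        params = (
            sum(w for w, _ in local) / n_workers,
            sum(b for _, b in local) / n_workers,
        )
    return params


w, b = parameter_averaging(DATA)
print(f"w={w:.4f}, b={b:.4f}")
```

Averaging only after several local steps cuts communication, at the cost of workers drifting apart between rounds. Here, every shard shares the same optimum, so the averaged weights converge to w = 3, b = 1.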
Nick Pentreath
This is arguably an area where Apache Spark is relatively weak. However, it still provides a great base for large-scale training, and other open-source libraries (such as deeplearning4j) build on Spark to scale out learning.
Mike Tamir, PhD
#Spark excels at distributed computing. The DL training process itself is not naturally distributed, but designing ways for #DeepLearning to train and compute in parallel is an active R&D area as DL permeates industry applications.
Petr Bělohlávek
It is extremely hard to scale out the training of DL models; however, it is common to use distributed computation for evaluating trained models and for real-world production use.
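Scoring with an already-trained model, by contrast, is embarrassingly parallel, because each partition of the input only needs a read-only copy of the weights. This minimal sketch assumes a hypothetical tiny two-layer network with fixed weights, and a thread pool stands in for a cluster. In PySpark, the weights would typically be shipped with `sc.broadcast(...)`, and `score_partition` would be applied via `rdd.mapPartitions(...)`.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical weights of an already-trained tiny network (2 inputs -> 2 ReLU units -> 1 output).
W1 = [[1.0, -1.0], [0.5, 2.0]]
B1 = [0.0, -1.0]
W2 = [1.0, -0.5]
B2 = 0.2


def predict(x):
    """Forward pass of the fixed network for one input vector."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def score_partition(rows):
    # Scoring a partition needs only the read-only weights, so partitions are independent.
    return [predict(x) for x in rows]


def distributed_score(inputs, n_partitions=4):
    # Contiguous chunks keep the output order equal to the input order.
    size = max(1, -(-len(inputs) // n_partitions))  # ceiling division
    partitions = [inputs[i:i + size] for i in range(0, len(inputs), size)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        scored = pool.map(score_partition, partitions)
    return [y for part in scored for y in part]


inputs = [(i / 10, (i % 7) / 5) for i in range(50)]
predictions = distributed_score(inputs)
print(predictions[:5])
```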
mark simmonds
Apache Spark gives people and organizations a fast start in embracing deep learning and machine learning. #SparkDeepLearning
Francois Garillot
The open-source community: plenty. All the major frameworks (Keras, TensorFlow, Torch, DeepLearning4J, mxnet) are open source today. Spark has a lot of potential for helping to build deep learning models, but there are still many things to improve.
Rania Khalaf
However, the field is moving super fast, so there is no governance model yet.
jameskobielus
Check out Francois Garillot's talk on DeepLearning4J and Apache Spark: https://www.youtube....
Alexander Lang
Even though DL does a lot of feature creation and selection "under the covers", Spark still has a place in preprocessing the data.
mark simmonds
Open source harnesses the collective intelligence of thought leaders and experts in deep learning, to the benefit of the data science community.
jameskobielus
Check out IBM's distributions of leading Deep Learning software frameworks: https://download.bou...
jameskobielus
Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments on distributed GPUs and CPUs.
Francois Garillot
There's a lot of existing work on parallelizing deep learning training, gathered under the moniker of 'parameter server'. Some specific work remains to get Spark cooperating with this model.
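A minimal, in-process sketch of the parameter-server pattern, assuming a hypothetical toy linear model. A server object owns the weights. Each worker pulls a snapshot, computes a gradient on its own data shard, and pushes that gradient back for the server to apply. The `ParameterServer` class and its `pull`/`push` methods are illustrative names, not any library's API. Real systems run servers and workers as separate processes and usually let workers push asynchronously. This sketch only approximates that by having all workers compute from the same, soon-stale snapshot.

```python
# Hypothetical toy data: y = 3x + 1.
DATA = [(k / 20, 3 * (k / 20) + 1) for k in range(40)]


class ParameterServer:
    """Toy in-process parameter server holding the shared weights (w, b)."""

    def __init__(self, lr):
        self.params = [0.0, 0.0]
        self.lr = lr

    def pull(self):
        # Workers receive a copy, so later pushes do not change their snapshot.
        return tuple(self.params)

    def push(self, grad):
        # Apply each worker's gradient as soon as it arrives.
        for i, g in enumerate(grad):
            self.params[i] -= self.lr * g


def shard_gradient(params, shard):
    """Gradient of the mean squared error on one worker's shard."""
    w, b = params
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def train(data, n_workers=4, rounds=400, lr=0.05):
    server = ParameterServer(lr)
    shards = [data[i::n_workers] for i in range(n_workers)]
    for _ in range(rounds):
        snapshot = server.pull()
        # Every worker computes from the same snapshot, so all but the first push are stale.
        grads = [shard_gradient(snapshot, shard) for shard in shards]
        for grad in grads:
            server.push(grad)
    return server.pull()


w, b = train(DATA)
print(f"w={w:.4f}, b={b:.4f}")
```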
Dez Blanchfield
- one of the common challenges is software development frameworks. Spark provides one we've fallen in love with, because we grew to loathe MapReduce ;-)
Alexander Lang
@jameskobielus It's also used in "Deep Learning" by Josh Patterson and Adam Gibson, which I find to be a very good introduction from real practitioners.
Dez Blanchfield
- open source... well, there's a long-running, religious debate on open vs. closed source software, but the single greatest value proposition of open source is "you can get the code and modify it yourself for free"
Dez Blanchfield
- IBM has done an amazing job of reinventing itself, going from its days as a hardware company to being not just a software company but an open-source software company, backing the right sets of open source, e.g. Apache Spark.
Benjamin Herta
Databricks recently added support for integrating TensorFlow via TensorFrames. That wouldn't have happened without both being open source.
Dez Blanchfield
- take the recently announced and now generally available (for less than lunch money) Data Science Experience #DSX: it's open source like Jupyter Notebooks + Spark "done right" and made easy for everyone to access.
Larry Lugo
But Google and Facebook too. Examples: TensorFlow and Natural Language Analysis API
Dez Blanchfield
- I also love the idea that IBM's taken the right blend of open source and built "solutions" around capabilities that were beyond the reach of most #CitizenDataScientists, letting folk do great things without having to "compile the platform" ;-)
IBM Analytics
Q7 - What recent trend in Deep Learning do you consider most noteworthy?
mark simmonds
We're seeing a lot of great adoption, which implies acceptance.
Mike Tamir, PhD
#DeepLearning applications in natural language understanding, as well as Q-learning in robotics, are changing the game. Coupled with advances in image recognition with ConvNets, it is no wonder there is so much interest in AI now.
mark simmonds
It's being embedded in so many applications. Consumers are not even aware they are using it in their every day interactions #SparkDeepLearning
jameskobielus
The rapid improvement and inexorable commoditization of low-cost hardware platforms, especially GPUs and neurosynaptic chips, that execute compressed deep learning artificial neural nets inside everything.
mark simmonds
Customer sentiment, NL Processing, Visual Recognition
Nick Pentreath
It's tough to pick one, as the pace of advancement is so fast. But for me, the success of DL in reinforcement learning (e.g. AlphaGo, game-playing bots, etc.) is one of the most promising developments for moving closer to the "AI" that everyone talks about.
Francois Garillot
Transfer learning: the ability to re-tool a deep neural network trained on one task as part of a network working on another problem. http://www.theregist...
mark simmonds
@jameskobielus Embedded on chips - yes.
jameskobielus
Deep learning's embedding into the fabric of every human artifact will proceed apace as new frontiers in nanotechnology, perhaps quantum computing, come to market.
mark simmonds
Are we moving toward an IoT of collective consciousness? #SparkDeepLearning
Francois Garillot
Reinforcement learning, made fast for deep learning through, e.g., actor-critic methods, lets you teach a model how to act in a "game" of your invention just by telling it the score of its plays.
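To make the idea concrete, here is a minimal tabular one-step actor-critic. It assumes a hypothetical five-cell corridor "game" whose only feedback is a score of 1 for reaching the last cell. The critic learns how good each state is, and the actor shifts its softmax action preferences by the critic's TD error. This is a small-scale illustration of the actor-critic idea, not a deep or asynchronous (A3C) implementation. All names and constants are hypothetical.

```python
import math
import random

N_STATES = 5          # cells 0..4; cell 4 is the goal (terminal)
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right


def step(state, action):
    """Hypothetical toy game: the only feedback is a score of 1 for reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic(episodes=2000, alpha=0.1, beta=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    values = [0.0] * N_STATES                      # critic: state-value estimates
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: softmax action preferences
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            probs = softmax(prefs[state])
            a = 0 if rng.random() < probs[0] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward if done else reward + gamma * values[nxt]
            delta = target - values[state]         # TD error: better or worse than expected?
            values[state] += alpha * delta         # critic update
            for i in range(2):                     # actor update: gradient of log pi(a | state)
                prefs[state][i] += beta * delta * ((1.0 if i == a else 0.0) - probs[i])
            state, steps = nxt, steps + 1
    return values, prefs


values, prefs = actor_critic()
greedy = ["right" if p[1] > p[0] else "left" for p in prefs[:-1]]
print(greedy)
```

The model is never told which move is correct. It only sees the score, and the TD error turns that score into a per-action signal.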
mark simmonds
We're also seeing it in massive online gaming.
jameskobielus
GPU acceleration for Apache Spark is essential for deep learning's broad adoption. See here: http://www.spark.tc/...
Larry Lugo
OpenSource platforms (software and hardware) and APIs for AI development
jameskobielus
IBM's deep learning GPU partnership with NVIDIA for Power Systems: https://www-03.ibm.c...
Francois Garillot
@huitseeker In the DeepLearning4J world, that's called RL4J https://github.com/d... (using A3C)
jameskobielus
IBM's work with Lawrence Livermore Labs on the TrueNorth synaptic chip tech: http://www-03.ibm.co...
Alexander Lang
The fact that there's so much interest in using it for language understanding, moving beyond the "classic" image/video analysis tasks.
Rania Khalaf
A more subtle transformational point is the potential for fast, *open* experimentation and collective learning: the ability to repeat and build on each other's experiments and results. DL+Cloud+Open
Alexander Lang
That reminds me of Airbnb's recent move to put more of their data, as well as analysis notebooks, on github. Not DL, but certainly open beyond "open data"
Dez Blanchfield
- for me, the greatest "trend" has been toward human-focused value-adds, ways to make life better for humans: not just making more "things" but solving real-world "problems" in health, life, and the big challenges facing humanity as a whole
IBM Analytics
Q4 - Besides Apache Spark, what tools, platforms, and libraries are best suited for developing deep learning applications?
Mike Tamir, PhD
#DeepLearning4J (which is getting some nice plugs) is a great option especially for Java developers. #TensorFlow has of course become very popular, #Keras is nice for data scientists getting started.
mark simmonds
I'm biased - but Watson of course https://www.ibm.com/...
jameskobielus
Not to put too fine a point on it, but I recommend PowerAI for OpenPower: http://www-03.ibm.co...
Francois Garillot
Deeplearning4J is on the JVM, and it's young but it already has a small open-source community around it and an upcoming book to help onboard new users http://shop.oreilly....
Nick Pentreath
Currently there are many competing open-source frameworks: TensorFlow, Keras, mxnet, Caffe, Torch, DL4J and so on. Most people use one or two of these for their work. GPUs are more important for scaling than clusters (up to a certain problem size).
Alexander Lang
@huitseeker I like that book a lot, currently reading it on Safari...
Rania Khalaf
There are lots, as @MLnick notes. Different ones are currently better at different domains, or appeal to different programming-model preferences.
Francois Garillot
Another cool lib is Keras, a simple front-end to other frameworks. You can reimplement a fresh paper in a few dozen lines https://github.com/u... Did you know that in the DL4J ecosystem, the Keras-style API is known as ScalNet https://github.com/d... ?
Alexander Lang
Agree with @NxtGenAnalytics - if you want to analyze images, you can use the Watson API right away
Dez Blanchfield
@NxtGenAnalytics - ha.. yea ok, we'll let you have that one ;-)
Rania Khalaf
Deep learning is super heavy in terms of data volume and compute power, so you need accelerators. The cloud also brings exciting scale-out options, ease of use and easy experimentation.
Nick Pentreath
Those with truly large problems tend to have their own in-house frameworks custom built. For the rest, TF, mxnet & DL4J are ones that offer parallel training using clusters of CPU/GPU
Dez Blanchfield
@MLnick - that's the magic of open source really, isn't it? Lots of us get ideas and test those ideas in code, and whether or not it works for us, we can #share that code and make it #opensource to let others build on it and move faster ;-)
Francois Garillot
Also, if you plan to train on a lot of data, I'd pay special attention to libraries that have a built-in solution for distributed training. So far, I'm aware of CNTK, mxnet, DeepLearning4J, and TensorFlow.
mark simmonds
Watson Personality Analyzer is interesting - just don't show your spouse the analysis!
jameskobielus
For good measure, though it's not a library/tool for development, here's OpenAI: https://openai.com/b...
Dez Blanchfield
- a great thing about the open source work is that so many folk come from different coding backgrounds and build in their favourite or native language, e.g. Python, Java, C, C++/C#, Scala, and you end up with lots of tools & lots of languages
Dez Blanchfield
@NxtGenAnalytics - ha.. laugh out loud, that's funny..
Dez Blanchfield
- e.g. Caffe's a deep learning library with Python & MATLAB bindings, and it's quickly becoming a popular "go to" for a rapidly growing community.
Benjamin Herta
I don't think that most people would consider Spark for deep learning. Machine learning, yes, but deep learning support is barely there.
mark simmonds
I worry that unless it is consumable, deep learning will remain out of reach for the majority. We need to bring it to the masses with consumability, convergence and collaboration in mind.
Rania Khalaf
@huitseeker It is possible to distribute others with a parameter-server-based approach. That's what mxnet uses internally (CMU's ps-lite). For Caffe, it has been done by Petuum.
Dez Blanchfield
- lots of people new to the game forget that Wolfram Mathematica has been in this space for a lifetime ;-)
Dez Blanchfield
- is anyone here using Torch (written in Lua)?
Dez Blanchfield
- Python nuts like me love Theano as well ;-)
Dez Blanchfield
- I'm also a fan of Keras (yea yea, Python nut again). Keras can run on either TensorFlow or Theano as its backend
Dez Blanchfield
- one I've started playing with recently is a JavaScript implementation of neural networks called ConvNetJS
Dez Blanchfield
- TensorFlow is another one certain Python nuts are crazy for, because of its BIG brand origins and proven successes thus far.
Larry Lugo
And what about FBLearner Flow (Facebook AI Backbone)?
Francois Garillot
@huitseeker I also have to mention that there are some completely different choices depending on the situation. Keras is awesome for experimenting on a rainy weekend. DeepLearning4J aims at production models, and working well with your Hadoop / Spark.