IBM Analytics
Q3 - What role does Apache Spark and the open source community play in Deep Learning?
Alexander Lang
Deep learning isn't free: there are a ton of parameters that have to be tuned: number of layers, number of neurons, optimization strategy, and so on. Using Spark, you can run several of these models in parallel and pick the best one.
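To make that pattern concrete, here is a minimal plain-Python sketch of parallel hyperparameter search. The grid, the scoring function, and its values are all hypothetical stand-ins; on Spark you would typically distribute the grid across executors instead of local threads.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def train_and_eval(params):
    """Hypothetical stand-in for training one model and returning its validation score."""
    layers, neurons = params
    # Toy score that peaks at 3 layers of 64 neurons.
    score = -abs(layers - 3) - abs(neurons - 64) / 64
    return params, score

# Candidate configurations: every combination of layer count and layer width.
grid = list(product([1, 2, 3, 4], [32, 64, 128]))

# Evaluate all configurations in parallel and keep the best one.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(train_and_eval, grid))

best_params, best_score = max(results, key=lambda r: r[1])
```

The key point is that each configuration trains independently, so the search is embarrassingly parallel regardless of whether the workers are threads or Spark executors.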
jameskobielus
Deep learning libraries are being open-sourced. Check out this recent article: http://www.technewsw...
Rania Khalaf
It is great that a lot of innovation in deep learning is happening in the open. Several key frameworks, like Torch, Caffe, and TensorFlow, are open source. That builds community and makes the field accessible.
jameskobielus
Deep learning has already been open sourced in the Spark and Java communities. DeepLearning4J: http://www.spark.tc/...
Alexander Lang
There's still work to do to run a single DL model in parallel...deeplearning4j does that somewhat by splitting up the data
Nick Pentreath
this is arguably an area where Apache Spark is relatively weak. However, it still provides a great base for large-scale training, and other open-source libraries (such as deeplearning4j) build on Spark to scale out learning.
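A rough sketch of the data-splitting approach (synchronous data parallelism): each worker computes a gradient on its own shard, and the driver averages the shard gradients before updating the shared weights. Everything here is a toy stand-in in plain Python rather than Spark, fitting a one-parameter linear model to data generated by y = 2x.

```python
# Toy data: y = 2*x with noise-free labels; the model is y_hat = w*x.
data = [(x, 2.0 * x) for x in range(1, 101)]
shards = [data[i::4] for i in range(4)]  # 4 "workers", like partitions in Spark

def shard_gradient(w, shard):
    # Gradient of mean squared error 0.5*(w*x - y)^2 w.r.t. w, averaged over the shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)

w = 0.0
lr = 0.0001
for step in range(200):
    grads = [shard_gradient(w, s) for s in shards]  # computed in parallel on a real cluster
    w -= lr * sum(grads) / len(grads)               # driver averages and applies the update
```

After a couple hundred synchronized rounds, w converges to the true slope of 2.0; the expensive per-shard gradient work is what gets distributed.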
Mike Tamir, PhD
#Spark excels at distributed computing. The DL training process itself is not naturally distributed, but designing ways for #DeepLearning to train and compute in parallel is an active R&D area as DL permeates industry applications
Petr Bělohlávek
It is extremely hard to scale out the training of DL models; however, it is common to use distributed computation for evaluating trained models and for real-world production usage.
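Serving a trained model really is embarrassingly parallel: the frozen weights get shipped to every worker, and a scoring function is mapped over data partitions. A plain-Python sketch with made-up weights and data:

```python
# A "trained model" reduced to its essentials: frozen weights and a scoring
# function (both hypothetical).
weights = [0.5, -1.2, 0.3]

def predict(features):
    return sum(w * f for w, f in zip(weights, features))

# In Spark, `weights` would be broadcast to every executor and `predict`
# mapped over the partitions of an RDD or DataFrame; here, plain nested
# lists stand in for the partitions.
partitions = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[2.0, 2.0, 2.0]],
]
predictions = [[predict(row) for row in part] for part in partitions]
```

Because inference needs no coordination between rows, it scales out cleanly even when the training that produced the weights did not.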
mark simmonds
Apache Spark represents a fast start for people and organizations to embrace deep learning and machine learning. #SparkDeepLearning
Francois Garillot
The open-source community: plenty. All the major frameworks (Keras, TensorFlow, Torch, DeepLearning4J, MXNet) are open source today. Spark has a lot of potential for helping with building deep learning models, but there are still many things to improve.
Rania Khalaf
however, the field is moving super fast, so there is no governance model yet
jameskobielus
Check out Francois Garillot's talk on DeepLearning4J and Apache Spark: https://www.youtube....
Alexander Lang
Even though DL does a lot of feature creation and selection "under the covers", Spark still has a place for preprocessing the data
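A tiny sketch of the kind of preprocessing Spark is good at before data ever reaches a DL framework: imputing missing values and scaling features. The records are hypothetical; in Spark these steps would be aggregates and map transformations over an RDD or DataFrame.

```python
# Hypothetical raw records: (age, income) pairs, with missing incomes as None.
records = [(25, 40000.0), (31, None), (47, 88000.0), (52, 60000.0)]

# Step 1: impute missing values with the column mean
# (a typical Spark aggregate followed by a map).
incomes = [inc for _, inc in records if inc is not None]
mean_income = sum(incomes) / len(incomes)
filled = [(age, inc if inc is not None else mean_income) for age, inc in records]

# Step 2: min-max scale each column to [0, 1] before feeding a DL model.
def scale(col):
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

ages = scale([a for a, _ in filled])
incs = scale([i for _, i in filled])
features = list(zip(ages, incs))
```

The network never sees the raw records; Spark's role is to turn messy, large-scale input into clean, normalized tensor-ready rows.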
mark simmonds
Open Source brings the collective intelligence of thought leaders and experts on Deep Learning to the benefit of the data science community
jameskobielus
Check out IBM's distributions of leading Deep Learning software frameworks: https://download.bou...
Francois Garillot
@jameskobielus Thanks for the plug !
jameskobielus
Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments on distributed GPUs and CPUs.
Francois Garillot
There's a lot of existing work for parallelizing deep learning training, gathered under the moniker of 'parameter server'. There's some specific work to do to bring Spark towards cooperating with this model.
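The parameter-server pattern, roughly: a central store of weights that workers pull from and push gradient updates to, possibly asynchronously. A minimal single-process sketch (class names, data, and the learning rate are all hypothetical):

```python
class ParameterServer:
    """Central store of model weights; workers pull weights and push gradients."""

    def __init__(self, dim, lr=0.1):
        self.weights = [0.0] * dim
        self.lr = lr

    def pull(self):
        return list(self.weights)

    def push(self, grad):
        # Apply a (possibly stale) gradient from any worker.
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grad)]

def worker_step(server, shard):
    w = server.pull()
    # Mean gradient of 0.5*(w.x - y)^2 over the shard, per coordinate.
    grad = [0.0] * len(w)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(shard)
    server.push(grad)

# Two workers with different shards of data generated by w* = [1.0, -1.0].
shard_a = [([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
shard_b = [([0.0, 1.0], -1.0), ([2.0, 1.0], 1.0)]
ps = ParameterServer(dim=2)
for _ in range(200):
    worker_step(ps, shard_a)  # in a real cluster these run concurrently
    worker_step(ps, shard_b)
```

The integration work the comment alludes to is exactly the gap between this loop and Spark's execution model: Spark's stages are synchronous and driver-coordinated, while parameter servers want long-lived, asynchronously communicating workers.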
jameskobielus
Here's the link to DeepLearning4J.org: https://deeplearning...
Dez Blanchfield
- one of the common challenges is software development frameworks. Spark provides one we've fallen in love with, because we grew to loathe MapReduce ;-)
Alexander Lang
@jameskobielus It's also used in "Deep Learning" by Josh Patterson and Adam Gibson, which I find a very good introduction from real practitioners
Dez Blanchfield
- OpenSource.. well, there's a long-running religious debate on Open vs Closed source software, but the single greatest value proposition with OpenSource is "you can get the code and modify it yourself for free"
Dez Blanchfield
- IBM has done an amazing job of re-inventing itself since the days of being a Hardware company, to being not just a Software company but an OpenSource software company, with the right sets of OpenSource, e.g. Apache Spark.
Benjamin Herta
Databricks recently added support for integrating TensorFlow via TensorFrames. That wouldn't have happened without both being open source
Dez Blanchfield
- if you take the recently announced and now generally available (for less than lunch money) Data Science Experience #DSX, it's OpenSource like Jupyter Notebooks + Spark "done right" and made easy for everyone to gain access to..
Larry Lugo
but Google and Facebook too. Example: TensorFlow and Natural Language Analysis API
Dez Blanchfield
- I also love the idea that IBM's taken the right blend of OpenSource and built "solutions" which were beyond the reach of most #CitizenDataScientists, letting folks do great things without having to "compile the platform" ;-)