BigDataWeek

#BigDataWeek @ #BigDataSV
Discussing all the happenings at #BigDataWeek including #BigDataSV. Join the conversation!
vaughn stewart
Looking ahead 3 years, I think large-scale Hadoop data repositories will be better on shared storage. The model allows more applications to access the data than multi-replica DAS nodes do. Thoughts?
Peter Herdman-Grant
you don't need to look ahead 3 years for that; EMC's Isilon is doing it today!
John Furrier
I agree storage should be "smart" and take data from the apps and be more "horizontally" integrated
Rodrigo Gazzaneo
@DBAStorage maybe send a calendar invitation for 2/19 announcement?
Zeydy Ortiz, PhD
.@vStewed Shared storage takes away adv of Hadoop. But a hybrid model may be needed.
Rodrigo Gazzaneo
@DrZeydy true scale-out storage platforms can minimize the architecture impact
vaughn stewart
Dynamically deploying nodes to jobs is easy - managing (storing, granting access, protecting, etc) TBs or PBs of raw data is hard (aka #DataGravity) thus my perspective
Rodrigo Gazzaneo
compute is the easy part. Persistence still remains challenging in the Data Center.
Dave Vellante
I think it depends on how real ServerSAN is - which I realize you don't subscribe to
Peter Herdman-Grant
EMC will make a public announcement on February 19th, 2015 highlighting our ongoing commitment to keep Big Data secure and accessible for archiving and analytic needs. https://community.em...
vaughn stewart
I agree value in ServerSAN but disagree with inclusion of hyper-converged in ServerSAN category
Peter Herdman-Grant
certainly agree that securing the hadoop ecosystem is a key requirement for its enterprise success
Rodrigo Gazzaneo
you lose flexibility with HCIA as it is today. Maybe a new approach will be needed.
Rodrigo Gazzaneo
as a new platform emerges, a more flexible approach to HCI will be required for #BigData
Rishi Yadav
I don't get it. The biggest disruptive factor in the emergence of Hadoop is the cost of storage. How can you get a cost advantage with shared storage?
Rodrigo Gazzaneo
with the proper storage architecture. Not your Enterprise High Platform.
Rodrigo Gazzaneo
HDFS on DAS was a great way to break barriers and start Hadoop pilots, though
Dave Vellante
Much of the ROI on Hadoop has been "Reduction on Investment" - quote by my good friend @ab_hi_
vaughn stewart
data-reducing shared storage can require a fraction of the raw capacity of multi-replica DAS while providing native access to other platforms
Rodrigo Gazzaneo
also on protection efficiency. 3 copies are expensive at scale.
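[Editor's note] The capacity argument in the two posts above can be put in rough numbers. This is a minimal sketch, not a sizing tool: the 3-way replication matches the HDFS default replication factor, but the 2:1 data reduction ratio and the 1.25x protection overhead for shared storage are hypothetical values chosen only for illustration, and the function names are made up for this example.

```python
def raw_capacity_replicated(logical_tb: float, replicas: int = 3) -> float:
    """Raw capacity needed when every block is stored `replicas` times (DAS model)."""
    return logical_tb * replicas


def raw_capacity_shared(
    logical_tb: float,
    reduction_ratio: float = 2.0,      # assumed dedupe/compression ratio (hypothetical)
    protection_overhead: float = 1.25,  # assumed parity/erasure-coding overhead (hypothetical)
) -> float:
    """Raw capacity needed on shared storage with data reduction and parity protection."""
    return logical_tb / reduction_ratio * protection_overhead


logical_tb = 1000.0  # 1 PB of logical data
das_tb = raw_capacity_replicated(logical_tb)
shared_tb = raw_capacity_shared(logical_tb)

print(f"3-replica DAS:   {das_tb:.0f} TB raw")
print(f"Shared storage:  {shared_tb:.0f} TB raw")
print(f"Shared / DAS:    {shared_tb / das_tb:.0%}")
```

Under these assumptions, 1 PB of logical data needs 3000 TB raw on 3-replica DAS versus 625 TB on shared storage. The result is only as good as the assumed ratios, and it says nothing about cost per raw TB, which is the point Rishi Yadav raises.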
Nick Howell
How does #AppDev have to change in order to truly take advantage of these modern data warehouses?
Jerry Overton
Apps have to be born adaptive -- adjust functionality to a given data context.
vaughn stewart
seems like apps will either be developed for Hadoop or, alternatively, access Hadoop data from a shared storage backend. The latter seems to have less lock-in.
Dave Vellante
Interesting stat - the 2 most critical tools in #BigData projects: 1/ the existing EDW and 2/ Data integration tools
Nick Howell
Collecting a bunch of unstructured data in one place is borderline useless unless you have applications that can take advantage of the information that data warehouse inherently provides.
vaughn stewart
never confuse data with information
Nick Howell
Exactly what I was calling attention to. :)
Rodrigo Gazzaneo
The #DataLake works for those who can leverage it.
vaughn stewart
@vGazza no it doesn't #DataLake is an EMC marketing term
Rodrigo Gazzaneo
@vStewed it is not. I just read PricewaterhouseCoopers leveraging the term.
vaughn stewart
@vGazza Thanks, I understand how marketing works
Nancy Hensley
appdev shouldn't care; the ultimate goal is to keep the infrastructure out of the way of progress.
John Furrier
Congrats to Cloudera for two things: 1) sharing numbers (they don't have to do that) and 2) business performance. Can't wait to ask Ping Li and Frank Artale about this on my panel Wed night
Rishi Yadav
I look at #cloudera as what #netscape was to the dot-com boom
John Furrier
what do you mean - do you see them crashing or being the enabler to a bubble or boom?
Rodrigo Gazzaneo
interesting point of view and great question!
Peter Herdman-Grant
if Oracle and Cloudera don't join the Open Data Platform, they will become one! :)
Rishi Yadav
Enabler. Netscape crashed due to other factors, but the change they brought to the marketplace was permanent. In the same way, the big data movement which Cloudera started is going to be permanent. I hope they reap the most rewards from it.
vaughn stewart
Kudos to @Cloudera for disclosing - @Veeam did the same a few weeks back. A new trend?
John Furrier
controlling the narrative and a counterstrike to the Pivotal/Hortonworks news - smart strategy
scott herson
thoughts on Cloudera Revenue announcement - what % might be Software vs Services/Support?
Dave Vellante
Cloudera blew thru $100M in revenue - not a huge surprise, but not much more in that announcement. Like Dell, Cloudera can cherry-pick financial data (for now)
Dave Vellante
@vStewed u guys next? What data did they share though really - very common for pre-ipo no?
Dave Vellante
Netscape (Jim Clark) totally underestimated the impact of MSFT bundling the browser into its OS
Mark 'Rizzn' Hopkins
@dvellante Controlling the narrative is important - but what's the counter-narrative being pre-empted here?
Rishi Yadav
Any take on #apachespark vs other compute technologies in big data? Isn't the debate already over?
Rishi Yadav
to be honest the only other compute technology we see being adopted by clients is #clouderaimpala
John Furrier
Spark is very relevant and great for real-time analytics; the innovation beyond in-memory is in-processor - that's the real deal #silicon
Rodrigo Gazzaneo
when memory becomes non-volatile at scale, I/O will no longer be the limiting factor
vaughn stewart
There's so much runway here it's silly. Let's get the world from disk to flash before we talk "in-processor"
Dave Vellante
Ever since the dawn of memory there's been talk of "in-memory" - it's more real now than ever because of costs, etc. - still have to protect the data
Rodrigo Gazzaneo
@dvellante the greatest potential for innovation is in the frontier of memory and storage.
Rishi Yadav
compute will keep getting closer to the chip, but at present, when using memory is most practical, #apachespark has taken persistent storage to memory, which itself has brought tremendous information gains
John Furrier
@vStewed the joke the other day was spinning disk will die b4 tape
Rishi Yadav
I agree. I think spinning disk will be dead for enterprise storage. It will stay alive for consumer desktops, though
vaughn stewart
@furrier I agree, the "disk replaces tape" chatter was illogical due to power requirements... tape turns off!
Rodrigo Gazzaneo
not even desktop storage ... flash wins
Rodrigo Gazzaneo
@vStewed tapes travel faster than disk too. On cargo.
Tyler Britten
Not anytime soon. $/GB + online access will keep spinning disk around for a while
Rodrigo Gazzaneo
@vmtyler for a while. But growing flash density will have a big influence here.
Tyler Britten
and spinning disk density is growing too
Rodrigo Gazzaneo
@vmtyler agree. Flash density is behind but growing faster. It will be fun to watch. #MediaWars