EMCDataLake

AMA: Data Lake Breakthrough
Spanning the Edge, Core and Cloud—EMC Data Lake experts take your questions!
   5 years ago
#EMCDataLake
Exploring Data Frontiers
Discuss Data Lake & Big Data solutions to support digital transformation with our panel of experts.
John Furrier
Q2: Is unstructured data really that much of a problem for the enterprise?
Jeff Frick
> Problem or Opportunity or Both?
Ashish Palekar
the numbers are staggering - 80% of data is unstructured and doubling every two years
Ashish Palekar
Data from IDC - the question for enterprises becomes how do you manage all this growth
George Gilbert
unstructured data volume #bigdata and increasingly speed / #fastdata requires new tools to process
Brian Gracely
@logicalblock can you share some customer examples of companies getting new value or insight out of that data? is it typically machine-generated or human-generated?
Jeff Frick
@logicalblock > What public unstructured data set are enterprises integrating the most?
Jeff Frick
@logicalblock > From exhaust that pollutes to exhaust with potential insight.
George Gilbert
batch processing log data is what first drove #bigdata and #datalake requirements - this was refined via ELT process for #datawarehouse usecases
Ashish Palekar
@bgracely Mix of both - machine generated data is what is causing the volume. Examples abound - analytics to predict how to stock warehouses, genome sequencing, etc
Dave Vellante
@logicalblock our data suggests that unstructured is even higher - over 90% of the data created is unstructured and growing much faster than structured
Ashish Palekar
@dvellante this is an emerging field with data trends changing real time #bigdata
Ashish Palekar
@dvellante would love to see the data - if anything it bolsters the DataLake 2.0 story
Sam Grocott
i am hoping for 100% someday! :)
John Furrier
Q1: We’ve heard EMC talk about the Data Lake a lot. Is this just a term dreamed up by EMC marketing or is there real strategic value to it?
George Gilbert
#datalake has been a widely adopted term in #hadoop to refer to uncurated data that needs refinement, enrichment
Sam Grocott
very real and very strategic! customers demand mass consolidation of unstructured data AND the ability to drive insight over this information...we have delivered and expanded this strategy with the data lake 2.0 announcements today!
Brian Gracely
this is where John makes his case for "Data Ocean"
Suhela Dighe
Especially exciting is the broad spectrum from cloud to edge devices.
John Furrier
To me the Data Lake is about data warehouse market
Sam Grocott
au contraire, mon frère: data warehouse is a great 1999 conversation, but the technology has moved quickly and is being dominated by unstructured data...audio, video, images, files, objects, social, etc... data lakes are required now to store this
John Furrier
@sgrocott so is data lake more reliable than a data ocean :-)
George Gilbert
with all the new machine readable data emerging, the #datalake is going beyond #datawarehouse use cases
Sam Grocott
depends, is it running on EMC Isilon?
George Gilbert
@ggilbert41 new use cases are lower latency capturing and processing data nearer origination
Dave Vellante
Wikibon practitioners tell us that the traditional EDW is still a key part of their #bigdata strategy...does data "ocean" imply a superset of the data lake that includes EDW or is EDW part of the lake?
Phil Bullinger
@dvellante EDW is a use case of a data lake, but it doesn't entirely define the space.
Phil Bullinger
A Data Lake is part efficiency, part value. The efficiency of coalescing unstructured data, and the value gained by reasoning over it through multi-protocol access.
Dave Vellante
Hey Phil! ok...so it's in the scope of DL
Phil Bullinger
@dvellante Right. We have a lot of customers with EDW-focused Data Lake workloads, more focused on historical data. Our tiering capabilities (perf, capacity, archive and cloud tiers) enable multiple use cases in a single Data Lake.
John Furrier
Q3: Why are customers asking for more out of their Data Lake strategies?
John Furrier
I see your edge, core and cloud message. Why is this something customers need today?
George Gilbert
#datalake started as adjunct to #datawarehouse. now it's becoming key part of near real-time #fastdata analytic #datapipeline
Ashish Palekar
starts with getting value from data and then keeping data management costs down #SimpleIsSmart
Ashish Palekar
a petabyte #Isilon cluster was a rarity, soon this will become the average
Jeff Frick
@logicalblock > Demands for storage will never decrease
Jeff Frick
@logicalblock > Sounds easier in theory than practice, especially if you're just "capturing" and not really sure what the value is.
Jeff Frick
@logicalblock > Begs Question, what are some qualifying questions to ask before turning your data exhaust from simply flowing onto the floor, and into the data lake?
Ashish Palekar
@JeffFrick completely fair - the first step starts by having a set of use-cases that you know you want to look at and then expanding from there
Dave Vellante
imo customers are spending 75-80% of their time messing w/ data quality issues and the like
Dave Vellante
Hadoop complexity has hurt ROI so they're demanding more from suppliers of data lakes #ROI
John Furrier
Please welcome the hosts for today’s CrowdChat: Phil Bullinger (logged in as LinkedIn, but is one of our hosts), Sam Grocott, and Ashish Palekar, plus me, your moderator, John Furrier
Sam Grocott
so glad to be here! Exciting day today!
John Furrier
Welcome to the EMC AMA on the big news today about Data Lake 2.0 - I'm tweeting from Palo Alto CA. Where are you posting from?
Jeff Frick
> Hello Phil, Sam and Ashish!
Ashish Palekar
thanks john! looking forward to it
John Furrier
I will start with a series of questions, but this is for anyone to come in and ask questions, hence AMA
Suhela Dighe
Hi all, this is my 10th Crowdchat this year.. For once not as a host. Looking forward to the chat.
Crowd Captain
I'm sailing the Data Ocean
Crowd Captain
@suzyspaatz I've been admiring your chats lately; tell us what we can do better
Sam Grocott
welcome crowd captain!
Jeff Frick
@sgrocott > Who doesn't love Captn Stubing.
Dave Vellante
thanks for hosting John...
Jeff Frick
@Schmarzo > Where is the Dean?
John Furrier
btw: I like the hashtag selection here great job whoever thought of it - it will have legs