From the “Open Letter to the Storage Industry”: “Data lake is dying. It was built on the obsolete premise that all unstructured data is meant to be stored.” How are you evolving your management of analytics data? https://www.crowdcha...
Traditional data lakes and data warehouses have had poor returns on investment - they only make people smarter. AI is driving up the value, as inference engines can make processes smarter. Data Hubs are a step on the way!
I have been on record since that term came out (took tons of heat for it) that data lakes aren't the silver bullet; in fact, I dislike the term. Data is fluid and the ocean metaphor is better bc
@dfloyer I agree with David Floyer on this. I would add further that imho data needs many hubs and data buses
current data centers have too many different architectures for storing data. the public cloud simplified it by narrowing the choices down dramatically. the same has to happen for on-prem, which is why new architectures like Data Hub have to come to fruition
@purerkim They are, and they mostly turn into data swamps bc they have to optimize on keeping the data moving, when the better architecture is to let the apps drive the data movement. The result is that extra time is wasted managing data quality
Data lakes ultimately are proprietary with 'openness' enabled via ETL and/or gateways. They don't work with the speed at which data is being created and analyzed across a multitude of platforms
AI and Cloud change the return on Data - by integrating data hubs and moving code to the data, much richer real-time analysis can occur. This enables smarter application decisions in real-time.
it also strikes me that most on-prem apps today are built to run on files, but cloud apps are built to run on objects -- that has to be unified at some point
@purerkim The simplest measurement of ROI is to ask CIOs what the comparative return on data lakes and data warehouses is. They will usually roll their eyes!
@vStewed the past data lakes are also too static, and you can't easily inject new analytics apps into your pipeline, one more reason to separate compute (into containers or VMs) and enable a fast, scalable storage tier -- that is a data hub
@ARmiBanaria Flash plays a critical role in increasing the amount of data processed. Well-designed flash systems have much lower latency and much higher aggregate throughput. This leads to higher-value real-time applications.
@TheSchwarzBwthU spot on - one's AI & Analytics architectures should be built on the same principles whether on-prem, hybrid, or in the cloud.
@dfloyer Two fundamental problems with classic analytics and AI data storage strategies are 1) they were designed on slow disk and networking technologies and 2) they are centered on a single use case, expandable via gateways or ETL.
@dfloyer Data Lakes are filling the role of the landing zone for further processing of data. The data then moves on to the DW for that SVOT need, and to support ML. Unfortunate name, Data Lake.
@NeilRaden, agreed. "Lake" is a poor choice of name.
@NeilRaden, it's too close to being a "swamp"!
the whole concept of data lake is flawed imo...