RealDataStories

Infrastructure at Scale
Join to discuss lessons learned while running infrastructures at scale. Share tips & strategies.
#realdatastories We'll be talking about Infrastructure at Scale and the recent HP Apollo announcement.
Leo Leung
Let's tee it up with our first question to the crowd http://www.via-cc.at... - are you ready for "always on"

Leo Leung
We're joined by @gregorygbishop , who ran the huge Time Warner infrastructure
Dave Vellante
the issue of course is always recovery - this is what makes "enterprise ready" very difficult - all the processes and procedures that are in place create terribly cemented infrastructure - but it typically works
John Furrier
. @lleung here at #bigdataSV #strataconf questions about scaling hardware keep coming up, especially how to scale #bigdata
Greg Bishop
At TWC, I supported a mail system with 20M mailboxes, and over 10B objects in storage
Leo Leung
@dvellante yes and no - recovery is relevant, but so is running continuously even with failures
Stuart Miniman
the old way was to harden every piece of infra, the new way is distributed systems. Big hurdle around applications making this change, infrastructure is getting there faster.
Joseph B George (JBG)
Definitely a common theme that I hear from customers these days
John Furrier
hadoop is still useful for petabyte processing. and yes many companies do have that problem. spark seems cool only if you can figure out how to keep it running at scale - what should folks do for this
Greg Bishop
Recovery isn't the issue - the issue is figuring out how not to need recovery
Joseph B George (JBG)
and as tech evolves (a la Hadoop 2.0 and features like erasure coding), scaling becomes ever more interesting
Greg Bishop
When making a system "always on" and "at scale," one must assume that the system always has something in a failed state.
Leo Leung
@dvellante - the old way of downtime or system slowdown while you recover is no longer valid
Andrew Reichman
with massive data sets it's just not viable to think that you can have primary running with copies somewhere else that you would recover to when things break - it just takes too long to move the data and build out a new environment - you need HA
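A rough back-of-the-envelope sketch of the "takes too long to move the data" point. The 1 PB data set size, the link speeds, and the `efficiency` parameter are illustrative assumptions, not figures from the chat, and the function only models raw wire time.

```python
def transfer_days(data_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to push data_bytes over a link of link_gbps gigabits/s.

    efficiency is the (assumed) fraction of line rate actually achieved.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = data_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400  # seconds per day


PETABYTE = 10**15  # decimal petabyte, assumed data set size

for gbps in (1, 10, 40):
    print(f"{gbps:>3} Gb/s: {transfer_days(PETABYTE, gbps):.1f} days per PB")
```

Even at full 10 Gb/s line rate, a single petabyte takes roughly nine days to copy, before any time spent rebuilding the environment around it.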
Greg Bishop
So in the old sense, the system is always 'in recovery'
Leo Leung
@gregorygbishop - agree. there are always disks down and nodes down... cc @dvellante @stu
Andrew Reichman
But building HA requires deep integration with the apps that use the data, technology to keep multiple sites synchronized, and double the huge infr
Joseph B George (JBG)
I'm seeing more people put more thought into things like fault domains - embracing that downtime will happen and planning with it in mind
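One way to picture planning around fault domains is replica placement that never puts two copies in the same domain. This is a minimal sketch, assuming a hypothetical node-to-rack map and a made-up `place_replicas` helper; it is not any particular product's placement policy.

```python
from collections import defaultdict


def place_replicas(nodes: dict, replicas: int) -> list:
    """Pick one node per fault domain until `replicas` copies are placed.

    nodes maps a node name to its fault domain (for example a rack).
    """
    by_domain = defaultdict(list)
    for node, domain in sorted(nodes.items()):
        by_domain[domain].append(node)
    if replicas > len(by_domain):
        raise ValueError("not enough distinct fault domains for the replica count")
    # Deterministic choice: first node of each of the first `replicas` domains.
    return [by_domain[domain][0] for domain in sorted(by_domain)[:replicas]]


# Hypothetical cluster layout
cluster = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackC"}
print(place_replicas(cluster, 3))
```

With this layout, losing any single rack takes out at most one copy of the data.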
Andrew Reichman
@gregorygbishop exactly- instead of recovery being a declared event when things hit the fan, it's more of a constant scenario that you're mitigating in smaller, non-disruptive ways
Leo Leung
@reichmanIT - @gregorygbishop - do you agree with the notion of deep integration, or is the infrastructure smarter?
Lacee
@jbgeorge fault domain seems to be a common issue I hear from customers #realdatastories
Leo Leung
There's definitely a law of large numbers effect - 1,000's of disks, 1,000's of nodes, things will fail
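The "things will fail" effect can be put into numbers with a small sketch. It assumes independent disk failures at a constant annual failure rate; the 2% rate and the fleet sizes below are illustrative assumptions, not figures from the chat.

```python
def expected_failures_per_day(num_disks: int, annual_failure_rate: float) -> float:
    """Average number of disk failures per day across the fleet."""
    return num_disks * annual_failure_rate / 365.0


def prob_any_failure(num_disks: int, annual_failure_rate: float, days: float = 1.0) -> float:
    """Probability that at least one disk fails within `days`.

    Assumes independent disks, each surviving a full year with
    probability (1 - annual_failure_rate).
    """
    all_survive = (1.0 - annual_failure_rate) ** (num_disks * days / 365.0)
    return 1.0 - all_survive


ASSUMED_AFR = 0.02  # hypothetical 2% annual failure rate

for disks in (100, 1_000, 10_000):
    per_day = expected_failures_per_day(disks, ASSUMED_AFR)
    p_month = prob_any_failure(disks, ASSUMED_AFR, days=30)
    print(f"{disks:>6} disks: {per_day:.2f} failures/day, "
          f"P(at least one failure in 30 days) = {p_month:.3f}")
```

Under these assumptions, at the scale of thousands of disks a failure-free month becomes very unlikely, which is why the system has to be designed to run with something always broken.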
Joseph B George (JBG)
I will also say that as the infrastructure is evolving - esp as we are looking at networking beyond 10GbE - the infrastructure design gets more interesting
Greg Bishop
I'm not sure that deep integration with the infrastructure is required to support the resiliency required for 'always on'
Andrew Reichman
@gregorygbishop depends on who's talking- if it's infr team they will say deep integration. if it's app team, they will say that they can control dumb infr with their smart software
Joseph B George (JBG)
back in 100Mb times, it was a source that had to be "designed around" - that is changing
Leo Leung
@gregorygbishop - certainly, our prescription is a different kind of infrastructure - "distributed" is one piece @stu
John Furrier
polarization with apps at scale (bus applications) and infra at scale (infra software) - lots of innovation at the infra
Leo Leung
@gregorygbishop - given "continuous recovery" what do you do differently from before?
John Furrier
. @lleung this brings up the notion of hw as a service - consumption has to be easy to stand up and provision for app scale world - I'm interested in what solutions are out there
Ariana Gradow
The value and the ability to build something that can scale is now a necessity
Leo Leung
@furrier Definitely - not so much "as a service", but service oriented yes. Have to work with old and new apps.
Joseph B George (JBG)
I know @zehicle has been talking about this for many years
Greg Bishop
At TWC, the mail application interfaces with the storage infrastructure using a standard web interface, but has no concept of how the infrastructure keeps data available
John Furrier
. @lleung many think that containers are a big part of the transition from old apps to new and powering #devops
Joseph B George (JBG)
I actually WOULD say it is HWaaS - the tools behind can give it that level of delivery
Greg Bishop
So, the goal was to make the infrastructure smart, not HA in the traditional sense, as the application sees no 'failover'
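A toy illustration of the idea that the application sees no failover: the caller only uses `put` and `get`, while the store keeps copies on several nodes and reads from whichever one is alive. `ToyObjectStore` and everything in it is a hypothetical sketch, not the TWC design or any real storage API.

```python
class ToyObjectStore:
    """In-memory stand-in for a replicated object store (illustrative only)."""

    def __init__(self, num_nodes: int = 3):
        self.nodes = [dict() for _ in range(num_nodes)]
        self.down = set()  # indexes of nodes currently failed

    def put(self, key, value):
        """Write to every live node; fail only if no node is available."""
        written = 0
        for index, node in enumerate(self.nodes):
            if index not in self.down:
                node[key] = value
                written += 1
        if written == 0:
            raise IOError("no live nodes")

    def get(self, key):
        """Read from the first live node that holds the key."""
        for index, node in enumerate(self.nodes):
            if index not in self.down and key in node:
                return node[key]
        raise KeyError(key)


store = ToyObjectStore()
store.put("mailbox/42/msg/1", "hello")
store.down.add(0)                     # a node fails
print(store.get("mailbox/42/msg/1"))  # the caller's code does not change
```

The sketch skips re-replication and consistency entirely; it only shows that the failure handling lives behind the interface rather than in the application.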
Joseph B George (JBG)
totally agree on containers @furrier
Leo Leung
@gregorygbishop - cool - my point these days is traditional notions of failover, recovery, availability... need an update
Joseph B George (JBG)
In that vein, we are seeing more and more HP customers start looking at infra closer - purpose built vs general purpose - getting great results
Andrew Reichman
@gregorygbishop decoupled architecture allows each piece to scale indefinitely and not break the others so long as everybody is reliable and speaking a language the others understand
Joseph B George (JBG)
The recently announced HP Big Data Ref Arch is a good example
Leo Leung
@reichmanIT - that's what i mean by service oriented vs. "as a service" - probably need a longer piece on that
Leo Leung
OK - about to switch to next topic
Dave Vellante
@gregorygbishop how utopian - that would be a computer industry first!
Andrew Reichman
agree - as a service just means that someone else is doing it. service oriented means that separate domains have rules of engagement wherever they might live and whoever might have built them
Dave Vellante
you have to think about "disposable infrastructure" but imo if you ignore recovery you are a foolish practitioner - remember - even google has to recover from tape at times
Ariana Gradow
.@lleung Cloud scalability and performance should be at the heart of every successful internet venture.