haflash

High Availability Flash
Join Fusion.io for CrowdChat thought leader conversation on high availability flash storage
Soham Chakraborty
and another one. What in-house/default tool do you suggest for benchmarking IOPS? blktrace by Jens Axboe is what I have used extensively in disk-based environments. Is it the same for flash, or something else?
Dave Vellante
I'm no expert but I know a lot of practitioners that use iometer cause it's simple
garyorenstein
in general we recommended benchmarking at the application transaction level. sometimes people talk about using iometer and I joke "do you run your business on iometer?" :)
Soham Chakraborty Okay, now the tough part is that whenever users face a bottleneck they think it is either storage or the OS. Coming from a Linux background, it is quite hard to get them to explain things at the app level. Then the question is, which metric at the app level? :)
garyorenstein agree with challenges Soham. We see it as a transformation where huge benefits come from rethinking prior assumptions
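[Editor's sketch] One possible answer to the "what metric on app level" question is transactions per second plus latency percentiles, measured around the application's own transaction. The Python below is a minimal sketch under that assumption, not a tool anyone in the chat named; `run_transaction`, `measure_transactions`, and `summarize` are hypothetical names.

```python
import statistics
import time


def summarize(latencies, elapsed_seconds):
    """Reduce per-transaction latencies (seconds) to app-level metrics."""
    ordered = sorted(latencies)
    # Nearest-rank 99th percentile, computed with integer arithmetic:
    # rank = ceil(0.99 * n), at least 1.
    rank = max(1, -(-99 * len(ordered) // 100))
    return {
        "transactions_per_second": len(ordered) / elapsed_seconds,
        "median_latency_s": statistics.median(ordered),
        "p99_latency_s": ordered[rank - 1],
    }


def measure_transactions(run_transaction, count):
    """Time `count` calls of a caller-supplied (hypothetical) transaction."""
    latencies = []
    start = time.perf_counter()
    for _ in range(count):
        t0 = time.perf_counter()
        run_transaction()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return summarize(latencies, elapsed)
```

Running the same measurement before and after a storage change keeps the comparison in terms the business sees, rather than raw device IOPS.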
John Furrier
Soham great questions - i voted for this
Soham Chakraborty Thanks John. I am always curious about performance engineering ;)
garyorenstein
one of the creative stats I saw last week from eBay was URLs per Kilowatt. they really narrowed down what mattered up top (pages served) with costs on the bottom (electricity)
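[Editor's sketch] The chat does not say how eBay computed "URLs per Kilowatt". Assuming it is simply URLs served divided by average power draw in kilowatts over the same period, the ratio could be computed as below; the function name and both parameters are hypothetical.

```python
def urls_per_kilowatt(urls_served, average_power_watts):
    """URLs served divided by average power draw in kilowatts.

    Assumed definition: both values cover the same measurement period.
    """
    if average_power_watts <= 0:
        raise ValueError("average_power_watts must be positive")
    return urls_served / (average_power_watts / 1000.0)
```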
Dave Vellante
So here's a question - what's changed in the past decade wrt high availability and how has it impacted the application layer?
garyorenstein
HA is always a requirement, now often built into the application layer: SQL mirroring, Oracle Data Guard
Dave Vellante so ur saying historically apps weren't responsible for recovery and today they increasingly are?
Stuart Miniman
general trend has gone from buying more hardware to deliver HA to building it into the sw layer
garyorenstein exactly. often built in right to the application/database. does not mean the only option though
Dave Vellante so what connects the active active controllers? Infiniband?
Stuart Miniman for high-speed, low latency Infiniband is a very popular option today. Ethernet is catching up - seen renewed interest in RoCE (Dell announcement last week).
garyorenstein
historically folks relied on HA with separate failover at storage layer. that is still around but we didn't have MSFT Always On back then.
David Floyer
The majority of modern cloud applications built on commodity hardware are responsible for high availability - it is built into the application, assuming that any server storage network or flash storage unit can and will fail.
Dave Vellante can you give some examples?
garyorenstein Database Availability Groups are a perfect technical example of HA at the server level
David Floyer
Oracle applications and applications such as Microsoft Exchange (with DAGs) are also responsible for high availability
David Floyer
Examples of modern high availability cloud applications include Facebook, Apple iCloud, Apple application management, and most Google services
David Floyer
The greatest challenge in traditional disk-based recovery systems is the time to load the checkpoint data, run through the log files and restart - Disk IO is always a bottleneck.
garyorenstein my favorite quote "the best I/O is the I/O you do not have to do!" :)
garyorenstein checkpointing is sure to be a next frontier in flash use, logging too. breaking one bottleneck one day at a time.
David Floyer
One of the biggest benefits of high performance flash is the potential to reduce recovery time from hours to minutes
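[Editor's sketch] The recovery steps described above (load the checkpoint, run through the logs, restart) suggest a rough model in which recovery time is the sum of three terms, two of them bound by storage throughput. The Python below is only that back-of-envelope model; the function and parameter names are hypothetical, and real recovery also depends on CPU and on how the database replays its log.

```python
def estimate_recovery_seconds(checkpoint_bytes, log_bytes,
                              read_bytes_per_s, replay_bytes_per_s,
                              restart_seconds=0.0):
    """Rough recovery time: checkpoint load + log replay + restart."""
    if read_bytes_per_s <= 0 or replay_bytes_per_s <= 0:
        raise ValueError("throughput values must be positive")
    load_seconds = checkpoint_bytes / read_bytes_per_s
    replay_seconds = log_bytes / replay_bytes_per_s
    return load_seconds + replay_seconds + restart_seconds
```

In this model, raising the two throughput values (as faster storage would) shrinks the first two terms in proportion, while the restart term is unchanged.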
Soham Chakraborty
Considering a medium sized organization, how would you suggest going for a flash based HA architecture? HA being included in app and not being app are two conditions. #flash #haflash
garyorenstein
in the app, you could take advantage of SQL Server Always On / Mirroring / Replication or Oracle ASM / Data Guard...
garyorenstein
and with HA in the app you can use in-server flash
garyorenstein
without HA at the application layer you could use a hybrid storage appliance which has active-active controllers, like our ioControl product
garyorenstein
or another option for peak performance without HA at the application layer is an all flash appliance replicating to a second all flash appliance
Dave Vellante
Here's a good resource fm oracle on H/A architectures http://docs.oracle.com/cd/B28359_01/server.111/b28281/architectures.htm#g1012375
John Furrier what is your angle on applications
John Furrier
what apps are you thinking about?
Soham Chakraborty mostly oracle and derivatives. However, I have little choice over apps which customers use.
David Floyer
Putting HA in the application layer gives the greatest flexibility and the lowest infrastructure cost. HA has to include data availability, integrity & recovery. Using flash as part of the storage can significantly help recovery times
garyorenstein agree. HA at app and in server flash are max performance min footprint. but storage failover can be FAST
David Floyer
If HA is not included in the app, then using flash well in the storage layer can provide storage redundancy (replication) and fast recovery. This has to be done for legacy applications with no HA awareness.
Fusion-io
Here is a great quote from David Yu, Global Service Team, Quanta Computer "We reduced failover times on our SAP database from 30 minutes (or 50 minutes to recovery un-commit) to just five seconds." - http://www.fusionio.com/case-studies/quanta/
garyorenstein
another of my favorites Cloudmark supercharging MySQL replications http://www.fusionio.com/casestudies/cloudmark
John Furrier
Is this use case representative of the broader enterprise market or an outlier?
garyorenstein great things happen when you go from disk-to-disk replication to flash-to-flash replication. applies across applications and market segments.
Dave Vellante
What has to be done to eliminate legacy IO protocols and what will be the impact on HA?
garyorenstein
anytime you can remove bottlenecks you can speed up the system. faster transactions and faster logging with flash, coupled with flash-to-flash replication for HA brings performance up and costs down.
David Floyer
The legacy IO protocols were designed for magnetic spinning disk, and designed to ensure integrity of disk writes. Within the protocol are n-phase commits, which require multiple exchanges. Moving to flash as an extension of DRAM would simplify...
David Floyer
... atomic writes are part of the T10 submissions and can achieve this. In HA, the data could be written to flash directly on one system in parallel with data being sent to another server and being written to flash on the second system
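[Editor's sketch] The parallel write described above, where data goes to local flash while also being sent to a second server, can be illustrated in Python. This is a minimal sketch, not the T10 atomic-write mechanism and not any vendor's implementation; `mirrored_write`, `write_local`, and `send_to_peer` are hypothetical names for caller-supplied functions.

```python
from concurrent.futures import ThreadPoolExecutor


def mirrored_write(data, write_local, send_to_peer):
    """Issue the local and peer writes concurrently; succeed only if both do."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        local = pool.submit(write_local, data)
        remote = pool.submit(send_to_peer, data)
        # result() re-raises any exception from the worker, so a failure
        # on either side reaches the caller instead of being acknowledged.
        local.result()
        remote.result()
    return True
```

Because both writes are in flight at once, the caller waits roughly for the slower of the two rather than for their sum.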
Dave Vellante
is there an analog to write integrity with flash "garbage collection" david?
Aaron Nuechterlein
What are Fusion-io’s thoughts on SATA Express? How will your company embrace this technology to promote widespread adoption of PCI Express SSDs? #Flash #Storage
garyorenstein
SATA is great for disk drives and we use it extensively in our hybrid storage appliance. but for letting flash do what it does best, a PCIe based approach is best
Dave Vellante see "the best io is no io"
Aaron Nuechterlein Seeing as SATA Express utilizes a PCIe-based interface, do you have plans to implement it? And interesting find on the MacBook...
garyorenstein
interesting to note that Apple promotes "PCIe-based flash" when you go to buy a MacBook. They are a good indicator of where the industry is heading http://store.apple.com/us/buy-mac/macbook-pro
John Furrier
Does having active-passive or active-active controller based architectures translate into consistently high, predictable performance..discuss
garyorenstein
having active active is generally better. more performance and load balancing capability
John Furrier what about across mixed workloads with headroom for IOPS and bandwidth for demanding applications?
garyorenstein yes. active/active provides highest reliability and performance overall.
David Floyer
mmm - I would put it another way - the application has to be able to know if the data is available, and if available know if it was written. Storage systems need to have capabilities to self-check data integrity and either correct or flag the app.
Fusion-io
Another good case study is our customer Polaris, who took query times from 3 minutes (with frequent time outs) to an average of 2-3 seconds. This included full HA with SIOS DataKeeper. http://www.fusionio.com/blog/polaris-sios/
John Furrier
how old is that case study?
Fusion-io Just over a year. We still do a lot of work with @SIOSTech.
John Furrier
How do you best architect flash where HA is not built in to the application #flash #storage #cloud
garyorenstein
a few options to do at the storage layer. 1) replicate between flash appliance nodes synchronously, a benefit of fast flash-flash replication ...
garyorenstein
2) another option is active-active controller architectures in the storage appliance itself...similar to our ioControl Hybrid Storage product http://www.fusionio.com/products/iocontrol
Dave Vellante
What does it mean that pcie doesn't have dual porting? Does that mean generally apps will handle h/a? or is there a HW option too?
garyorenstein
two separate things. You can have dual porting PCIe. Servers over time will be architected as such.
Dave Vellante is this a standards issue - i.e. is that in the stds committees?
garyorenstein
In the meantime you can get HA in a single box with dual controllers, or go to a server-server HA setup
garyorenstein
more of an issue that server architectures do not change overnight. interfaces/specifications moving along well.
David Floyer
The normal way of handling it is to use RAID 1 or mirror the data on a different server - obviously this can impact RT