Chaos Engineering
This chat explores the future outlook for Chaos Engineering.
Suresh GP, MBRM
There are also a lot of misconceptions around Chaos Engineering that we need to address. 1. It is not about breaking things in production. 2. It is not about injecting random chaos experiments into the system and seeing what happens.
Garima Bajpai
Could you point our audience to some platforms for reference?
Suresh GP, MBRM
Gremlin is a good platform for experimenting in a controlled environment where you minimize the blast radius.
Mark Peters
Verica also offers some good options for full-spectrum chaos testing.
Mark Peters
It is about breaking things at all levels, just not in your customer-facing application. We create environments that allow testing, and they run the same application.
Suresh GP, MBRM
Are organizations even ready, in terms of risk appetite and maturity, to do Chaos Engineering, considering that the Third Way of DevOps is followed by very few organizations across the globe?
Mark Peters
I always say, if you are still getting gains from the basics, you don't need to go advanced. Chaos is best in complex distributed systems where basic testing can't find the pain points.
Garima Bajpai
@TinyCyber Could you help us understand which features stand out?
Suresh GP, MBRM
@TinyCyber Spot on, Mark. Let's not boil the ocean and do Chaos Engineering just for the sake of doing it.
Mark Peters
Latency, app calls, system load, infrastructure inefficiency, recovery practices, load balancing.
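As a rough illustration of the first item in that list, latency, the sketch below wraps a service call so that a configurable fraction of invocations (the blast radius mentioned earlier in the chat) is delayed, then checks a simple steady-state hypothesis. Every name here (`fake_service`, `with_injected_latency`, `steady_state_holds`) and every threshold is hypothetical; none of it comes from Gremlin or any other tool named in this chat.

```python
import random
import time
from typing import Callable


def fake_service() -> str:
    """Hypothetical stand-in for a real downstream call."""
    return "ok"


def with_injected_latency(
    call: Callable[[], str],
    delay_s: float,
    blast_radius: float,
    rng: random.Random,
) -> Callable[[], str]:
    """Wrap `call` so that roughly `blast_radius` (0.0 to 1.0) of the
    invocations are delayed by `delay_s` seconds before running."""

    def wrapped() -> str:
        if rng.random() < blast_radius:
            time.sleep(delay_s)  # the injected fault
        return call()

    return wrapped


def steady_state_holds(
    call: Callable[[], str], max_latency_s: float, samples: int
) -> bool:
    """Hypothesis: every sampled call completes within `max_latency_s`."""
    for _ in range(samples):
        start = time.perf_counter()
        call()
        if time.perf_counter() - start > max_latency_s:
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a run can be repeated
    for radius in (0.0, 1.0):
        faulty = with_injected_latency(fake_service, 0.05, radius, rng)
        ok = steady_state_holds(faulty, max_latency_s=0.025, samples=5)
        print(f"blast radius {radius}: steady state holds = {ok}")
```

Starting with a blast radius close to zero and widening it only while the hypothesis keeps holding is one way to keep such an experiment controlled rather than random.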
Garima Bajpai
What do you think are the top challenges and opportunities in Chaos Engineering?
Mark Peters
The top challenge is translating results into new features that solve the problem. Complex issues have multiple strings.
Garima Bajpai
Systematic thinking around Chaos Engineering: more about building a practice, less about the tool.
Suresh GP, MBRM
@bajpaigarima Challenges: 1. Creating social awareness is key; running Game Days will help. 2. Chaos Engineering is a journey, not a destination. 3. Start small and see progress instead of taking a big-bang approach. 4. Get your entire value chain orchestrated well before you experiment.
Suresh GP, MBRM
Opportunities: 1. The ability to move to predictive management. 2. Improving your long game. 3. Inculcating an experimental and growth mindset.
Garima Bajpai
Creating an agnostic view of the Chaos practice.
William Szepesi
Creating visibility can be a challenge: first, how individual failures or performance issues are made visible, and second, how these issues have a knock-on effect on the distributed system.
Mark Peters
You can't practice without the right tools.
Garima Bajpai
@TinyCyber Can you talk about the "right tools"?
Mark Peters
Whatever gets you where you need to go. Smaller testing and localized chaos can use the hammer and nails; the bigger the network, the more you want automated solutions and integrated results.
Mark Peters
If your chaos tool outputs logs and traces, there is still a lot of work in converting them into a usable framework. DAST tools can be low-level chaos.
Mark Peters
If breaks are only visible with the chaos tool and you can't duplicate them locally, they can be difficult to fix.
Garima Bajpai
What is the key business value of Chaos Engineering, and how do organizations benefit?
Garima Bajpai
Focus on the Third Way of DevOps: enable experiential learning.
Suresh GP, MBRM
It prevents long outages from happening, which invariably reduces business impact.
Garima Bajpai
User experience at scale...
Suresh GP, MBRM
Customers gain confidence that the team is fully equipped for any disaster and can ensure business continuity.
Garima Bajpai
Could it help in feature engineering?
Suresh GP, MBRM
I look at feature engineering as addressing aspects of usability, convenience and reliability. So from that perspective, it helps.
Mark Peters
The value is finding challenges and errors before the customer does.
William Szepesi
Identifying failures before they become outages, as Gremlin's whitepaper is titled, is a key business value: it protects clients from service failures and maintains client trust in your service reliability.
Mark Peters
Chaos can also verify that the right flow and feedback exist in your distributed practices.
Suresh GP, MBRM
Can you share the whitepaper link @WSzepesi for our viewers?
Suresh GP, MBRM
The purpose of Chaos Engineering is NOT to “Break Things on Purpose”.
If anything we are trying to “Fix them on Purpose”!
Garima Bajpai
Well said. How do you think we move from the purpose of breaking things to creating resiliency?
Suresh GP, MBRM
@bajpaigarima1 Understanding the goal is important. What are we committing to deliver for a seamless customer experience? How is our ecosystem tailored to meet the fit-for-purpose and fit-for-use objectives?
Mark Peters
The purpose is to break things, but in a non-permanent way that does not lose value.
Mark Peters
In materials engineering we do nondestructive testing. Chaos offers us that option for code.
Garima Bajpai
@TinyCyber Does it help prioritize technical debt?
Mark Peters
It could help identify it, but you should really find tech debt during dev and planning. Chaos tests for unexpected gaps.
Mark Peters
@bajpaigarima1 It can help prioritize features. Early chaos engineering showed where things broke. Modern testing practices with chaos are more about when things will break.
Suresh GP, MBRM
I agree with Mark. It is not a substitute for prioritizing technical debt. That has to be done during dev and planning.
Mark Peters
Sometimes new features might fix chaos issues without being tech debt. If a feature is planned to decrease latency, both aspects might get to the same point.
William Szepesi
Can security be controlled when Chaos Engineering is being practiced?
Suresh GP, MBRM
Tricky question. With DevSecOps integrated as part of the pipeline, you can make it work. However, it needs some planning, coordination and VSM.
Garima Bajpai
Maybe it is part of the chaos experiments: how secure are your chaos experiments?
William Szepesi
Yes, and you want to automate the testing of it in a continuous deployment pipeline so you maintain that competence.
Suresh GP, MBRM
https://www.crowdchat.net/s/867zg - Strongly recommend this book from Aaron Rinehart!
Mark Peters
I see chaos as a different aspect of security: not system security so much as resilience and recovery. You don't need it for vulnerabilities and network monitoring.
Suresh GP, MBRM
In my opinion, Chaos Engineering can be summarized as follows:
● It's about systems safety
● Building safety margins into systems
● Replacing blame culture with learning culture
● Telemetry, experimentation, and instrumentation
Mark Peters
None of those are limited to chaos. Those are good central practices that chaos can help with.
Garima Bajpai
Bingo! Convergence of SRE and Chaos outcomes.
Mark Peters
@bajpaigarima1 Always looking to expand that toolbox.
Suresh GP, MBRM
I always think that #SRE and #Chaos go very well together in achieving reliability and best user experience
Garima Bajpai
Let's talk about tools. Which tools are you looking at in the Chaos space?
Suresh GP, MBRM
In addition to Gremlin and Chaos Monkey (Netflix) I would recommend the following. Chaos Toolkit: This open-source initiative makes tests easier with an open API and a standard JSON format.
Pumba: Pumba is a chaos testing and network emulation tool for Docker.
Suresh GP, MBRM
Litmus: A chaos engineering tool for stateful workloads on Kubernetes.
William Szepesi
There is a limited set of tools available via open source, and you may end up re-inventing the wheel. Also, their security vulnerabilities may not be well understood.
Suresh GP, MBRM
I would recommend Chaos Toolkit: This open-source initiative makes tests easier with an open API and a standard JSON format
Mark Peters
@sureshgp I haven't used the Chaos Toolkit. I have to check that and Litmus out. I like Robot Framework for basic testing.
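To make the "standard JSON format" of Chaos Toolkit a little more concrete, here is a minimal sketch that assembles an experiment document in Python and prints it as JSON. The keys used (`steady-state-hypothesis`, `method`, `rollbacks`, and the `http` and `process` provider fields) follow the Chaos Toolkit experiment format as best understood here and should be checked against the official documentation before use. The health URL and the fault script path are hypothetical placeholders, and `build_experiment` is a helper invented for this example.

```python
import json
from typing import Any, Dict


def build_experiment(health_url: str, fault_script: str) -> Dict[str, Any]:
    """Return a dict shaped like a Chaos Toolkit experiment (assumed layout)."""
    return {
        "title": "Service stays healthy while a fault is injected",
        "description": "Sketch only; the URL and script are placeholders.",
        # What "normal" looks like, checked before and after the method runs.
        "steady-state-hypothesis": {
            "title": "Health endpoint answers with HTTP 200",
            "probes": [
                {
                    "type": "probe",
                    "name": "health-endpoint-responds",
                    "tolerance": 200,  # expected HTTP status code
                    "provider": {
                        "type": "http",
                        "url": health_url,
                        "timeout": 3,
                    },
                }
            ],
        },
        # The fault itself: here, running a (hypothetical) local script.
        "method": [
            {
                "type": "action",
                "name": "inject-fault",
                "provider": {"type": "process", "path": fault_script},
            }
        ],
        # Steps to undo the fault; left empty in this sketch.
        "rollbacks": [],
    }


if __name__ == "__main__":
    experiment = build_experiment(
        "http://localhost:8080/health",  # hypothetical endpoint
        "./inject_fault.sh",             # hypothetical script
    )
    print(json.dumps(experiment, indent=2))
```

Saved to a file such as `experiment.json`, a document of this shape is what the `chaos run` command is meant to take as input, assuming the Chaos Toolkit CLI is installed.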