SRE

Reliability trends for DevOps
How DevOps teams get proactive and prioritize work by adopting SRE practices
Garima Bajpai
What are the long-term benefits of moving from DevOps to SRE practices?
Sriram RJ
99.999% of environment availability :)
Kurt Andersen
SRE brings a perspective of the value being delivered to the customer more so than devops which usually stops at delivery into production.
Garima Bajpai
turning around the business model: from reactive to proactive
Nicolas Philip
SRE practices are complementary to DevOps. They focus on customer satisfaction.
Garima Bajpai
@drkurta Excellent - so Ops into DevOps :-)
Kurt Andersen
@RjSriram15 But why? Can you substantiate the cost/value trade-off? Especially if the underlying network infra is less reliable?
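To put the five-nines figure and the question about less reliable infrastructure into numbers, here is a minimal Python sketch. It assumes a 365-day year and that the service depends serially on its network, so availabilities multiply. The function names and the 99.9% network figure are illustrative assumptions, not values from the discussion.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime permitted in the period by an availability target (0..1)."""
    return (1.0 - availability) * period_minutes


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up."""
    combined = 1.0
    for a in components:
        combined *= a
    return combined


if __name__ == "__main__":
    five_nines = 0.99999
    network = 0.999  # hypothetical availability of the underlying network
    print(f"{allowed_downtime_minutes(five_nines):.2f} min/year at 99.999%")
    end_to_end = serial_availability(five_nines, network)
    print(f"{end_to_end:.5%} end to end, "
          f"{allowed_downtime_minutes(end_to_end):.0f} min/year")
```

Under these assumptions, five nines allows roughly 5.3 minutes of downtime per year, and the end-to-end figure can never be better than the weakest dependency in the chain.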
Deirdre Mahon
Understanding customer journey is really hard for eng teams #sre
Sriram RJ
treat issues with the same respect as new features / enhancements
Emily Arnott
Establishing a "universal language" across the whole org of reliability based on user happiness
Garima Bajpai
Top-down approach: renegotiating contracts and moving away from an SLA-driven approach
Kurt Andersen
SRE can also help with scaling individual devops team practices across the organization - working as a platform team (a la _Team Topologies_)
Garima Bajpai
preventing churn: a continuous relationship rather than a one-time contract
Sriram RJ
SLA --> SLOs & SLIs --> OKRs
Paul Chu
focusing on how the end user/customer is impacted across all aspects of development and operations
Deirdre Mahon
Writing your customer journey + success is hard. If the business can't do it - engineers don't have a chance to succeed. #sre #devops
Kurt Andersen
@RjSriram15 At a certain point of baseline reliability, you need to focus on containment and quick remediation of incidents to achieve more overall reliability.
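The point about quick remediation can be illustrated with the standard steady-state availability formula, MTBF / (MTBF + MTTR). This is a minimal sketch assuming that simple model; the 720-hour and 1-hour figures are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: time up / (time up + time to repair)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be > 0 and mttr_hours >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical baseline: one failure every 30 days, one hour to recover.
    baseline = availability(mtbf_hours=720, mttr_hours=1.0)
    faster_recovery = availability(mtbf_hours=720, mttr_hours=0.5)
    fewer_failures = availability(mtbf_hours=1440, mttr_hours=1.0)
    print(f"baseline:        {baseline:.5%}")
    print(f"halved MTTR:     {faster_recovery:.5%}")
    print(f"doubled MTBF:    {fewer_failures:.5%}")
```

In this model, halving recovery time yields the same availability as doubling the time between failures, which is why containment and remediation become the cheaper lever once failures are already rare.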
Oleg Nenashev

> SLIs --> OKRs

This part is a subject for debate. SLIs should indeed be metrics for measuring Key Results in the OKR framework, but at the same time not all KRs should be measurable as SLIs.
Deirdre Mahon
@oleg_nenashev yes indeed, words matter too. sticking with the agreed objective and allowing time to learn is key. orgs tend to have little patience

Deirdre Mahon
@RjSriram15 or the other direction?
Oleg Nenashev
@dbmahon indeed many orgs have little patience and even less long-term strategy. Some of them confuse lack of patience and failing fast unfortunately. So they always fail...
Sriram RJ
@DBMahon most organizations today are reactive and are still at SLAs. There should be a roadmap to move in both directions
Garima Bajpai
What are the practical steps to implement SRE practices in orgs already invested in DevOps?
Sriram RJ
Do not hire SRE engineers from the market. In-house talent with environment knowledge is the org's asset
Deirdre Mahon
Hopefully it starts with the right mindset and desire to become more proactive.
Sriram RJ
set realistic goals
Deirdre Mahon
@RjSriram15 +1 yes but don't leave them to their own devices. give them resources, training and tools
Kurt Andersen
The mindset is really critical - along with executive support.
Garima Bajpai
Focus on the Ops aspects of DevOps: toil reduction, error budgets, measurement (SLI, SLO ..), observability
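As a concrete illustration of the measurement and error-budget items, here is a minimal Python sketch of a request-based error budget. The `ErrorBudget` type, the function name, and the sample numbers are hypothetical; real SLIs may be defined over latency, freshness, or other signals instead of simple success counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    sli: float                # observed fraction of good events
    allowed_failures: float   # failures the SLO tolerates in this window
    consumed_fraction: float  # 1.0 means the budget is fully spent


def error_budget(slo_target: float, total_events: int,
                 failed_events: int) -> ErrorBudget:
    """Compare observed failures against what the SLO target permits."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0 or not 0 <= failed_events <= total_events:
        raise ValueError("event counts are inconsistent")
    sli = (total_events - failed_events) / total_events
    allowed = (1.0 - slo_target) * total_events
    return ErrorBudget(sli, allowed, failed_events / allowed)


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 400 failures, 99.9% SLO.
    budget = error_budget(0.999, 1_000_000, 400)
    print(f"SLI {budget.sli:.4%}, "
          f"budget consumed {budget.consumed_fraction:.0%}")
```

With these sample numbers the SLO tolerates about 1,000 failed requests, so 400 failures consume roughly 40% of the budget and leave room for further releases in the window.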
MRAHMAN
differentiate practices for DevOps and SRE. People have many misconceptions
Deirdre Mahon
@drkurta yes it costs money and time to improve reliability and it's a journey. never one and done.
Deirdre Mahon
@_YEG yes get clear definitions and roles and don't overload the teams with unrealistic expectations.
Garima Bajpai
get management buy-in? do you have a business case here?
Emily Arnott
We actually wrote an ebook on this very subject! Check it out: https://info.blameless.com/bridging-the-gap-devops...

If you're invested in DevOps, you have a lot of resources to leverage for your SRE solution. It's all about orienting to new practices with the tools you have :)
https://info.blameless.com/bridging-the-gap-devops-to-sre
Bridging the Gap: DevOps to SRE Ebook
Bridging the Gap: DevOps to SRE is a practical guide to implementing SRE practices in an organization that’s already invested in DevOps. It makes clear the benefits of SRE and lays out how they’re achieved. While acknowledging potential challenges, i...
Garima Bajpai
not one size fits all
Deirdre Mahon
perhaps drowning in sev0 or sev1 incidents will get attention. Not ideal however.
William Szepesi
Mindset regarding culture change needs to show that you are not a prisoner of your past, you are a pioneer of your future.
Garima Bajpai
SRE brings the human into the loop #SRE
Kurt Andersen
Small improvements, implemented on an ongoing basis, compound into significant changes. See @bvssh (and the book)!
Sriram RJ
have a team of experts with varied skills and solve problems collectively
Garima Bajpai
What are the key components of a Reliability Engineering platform?
Sriram RJ
AI-based monitoring
MRAHMAN
Buy-in mindset, and not carrying the same baggage from the past
Garima Bajpai
Reliability Engineering platform - is for everyone #lowcode
Garima Bajpai
Data science integration into SLI/SLO modelling, error budgets & policies
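One simple form of error-budget policy is a burn-rate check: how fast the budget is being spent relative to the pace the SLO allows. The sketch below is a hedged Python illustration; the threshold values and the action names are assumptions chosen for the example, not something the participants specified.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO permits.

    A value of 1.0 spends the budget exactly over the SLO window;
    higher values exhaust it proportionally sooner.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return error_rate / (1.0 - slo_target)


# Illustrative thresholds (assumed for this example, tune per service).
PAGE_THRESHOLD = 14.4
TICKET_THRESHOLD = 6.0


def policy_action(rate: float) -> str:
    """Map a burn rate to a hypothetical policy action."""
    if rate >= PAGE_THRESHOLD:
        return "page"
    if rate >= TICKET_THRESHOLD:
        return "ticket"
    return "ok"


if __name__ == "__main__":
    for observed in (0.0005, 0.007, 0.02):
        rate = burn_rate(observed, slo_target=0.999)
        print(f"error rate {observed:.2%}: burn rate {rate:.1f}, "
              f"action {policy_action(rate)}")
```

Keeping the policy as a small pure function makes it easy to back-test against historical SLI data before anyone is paged by it.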
Sriram RJ
auto remediation engine??
Deirdre Mahon
agreeing on SLOs is really hard for disparate, large teams
Kurt Andersen
Integrating with the tools used across the pertinent teams - reduces the onboarding effort
Blameless
SLOs and SLIs to contextualize based on user impact
Garima Bajpai
integrating SRE tools and orchestrating them in a secure & reliable way
Paul Chu
Reliability doesn't only mean fixing the issues, but also the communication to customers as these issues are being fixed. Communication is something that a lot of service providers struggle with, which can impact how users view how "reliable" a service is.
Kurt Andersen
Effectively collecting info across all systems (both social and technical) which contribute to the systemic reliability is key
Garima Bajpai
@drkurta how do we add features of social engineering to an #SRE platform?
Nicolas Philip
@bajpaigarima1 A Reliability Engineering platform is about people, processes, tools and automation, and how data and communication flow.
Sriram RJ
@drkurta something like gamification
Kurt Andersen
So far, facilitating reflection through retrospective processes along with the good management practices that support psychological safety is the best that we have
Kurt Andersen
The field of cognitive systems engineering has a lot of insight on how to reduce cognitive load and make the tools into good co-workers
