Site Reliability Engineering
The SRE team at Litmus is a bit of a hybrid between Site Reliability Engineering and Platform Engineering. The SRE team is focused on designing and managing our AWS cloud platform's core infrastructure, as well as building abstractions and automation to enable our engineering teams to deploy and operate their software and services in production safely and reliably. Within SRE we aim to design our infrastructure and implementations for the safety, contentment, knowledge and freedom of both our peers and our customers.
The SRE team is involved in nearly every aspect of our systems, but we take primary responsibility for the low-level complexity of our AWS environments, our build and continuous integration and deployment services, VPNs, Terraform automation, and a long list of other services that enable our high-velocity engineering organization. We work closely with engineering to continuously improve these abstractions and services.
The SRE team has a set of shared values that we have adopted over time as the team has formed, and that we practice and hold each other accountable for.
Empathy is our superpower
The focus of our team is enablement, and one of the most critical tools we have for identifying where to focus our efforts is empathy.
Engineer for Safety and Freedom
Our goal is to provide as much freedom for our engineering peers as we can, within a safe environment where mistakes will not be tragic, so they may act without fear of unintended consequences.
Pull over push
We use Kanban within our team because we believe in the value of pull over push. We prefer to deliver based on demand rather than follow a plan built on predictions, which risks producing work that never gets used.
Wherever possible our goal is to eliminate wasteful work that does not add value, either through automation or through simply not doing it.
Diversity is an asset
We value diversity in backgrounds, thoughts, experiences, preferences and opinions. Much like in any good adventuring party, it pays to have a diverse group. Our goal is harmony, not homogeneity.
Only Actionable Alerts
We have all lived through on-call nightmares and are determined not to have them at Litmus. We are passionate about limiting on-call alerts to items that are truly actionable and about pursuing resilient services that rarely, if ever, generate them. This is a part of empathy, as well as engineering for the well-being of ourselves and our peers.
You're only as good as the company you keep
Our SRE team is a motley crew of talented engineers with a diverse set of skills and backgrounds. Interested in working with us? We're hiring!
Adam Greenlee, Site Reliability Engineer Tech Lead
Ben Sykes, Principal Site Reliability Engineer
Chad May, Senior Site Reliability Engineer
Davina Fournier, Principal Site Reliability Engineer
Mark Bainter, Sr Director of Site Reliability Engineering
Shawn Brown, Database Reliability Engineer Tech Lead