  1. Engineers who are on call should still be allocated to product sprints. However, they are not expected to spend more than 50% of their time on sprint tasks, and this percentage can go lower during a particularly busy on-call week. The on-call engineer protects the rest of the team from distractions and provides service to the rest of the company. They can't do this well while juggling feature development work, so it is okay if they are unable to finish their sprint work during their on-call week.
  2. On-call engineers are expected to be available to respond to and triage technical issues across the company. Please use our Incident Severity Levels docs to understand the expected response times and when to escalate an outage to other people.
    1. The on-call engineer runs the incident, investigates what's happening, determines the severity level, and is empowered to mobilize other parts of the team at their discretion. It's totally valid to loop @Adam Haney into these decisions if you're unsure.
    2. If the incident is P3 or more severe, the on-call engineer communicates the status of the incident to the rest of the company (or delegates this communication). This includes sending out a company-wide (partners) email and tagging the relevant stakeholders (leads, operators, PMs) on Slack.
    3. After the incident, the on-call engineer digs in to research what went wrong and the steps that led us there, and investigates the systems involved. They then write a Post Mortem doc and add it to the database in Notion. For P0 incidents, they should also email this document to the rest of the company.
  3. <Alerting System??>
  4. When the on-call engineer is not responding to incidents, they should use their time to improve the automation code. This includes:
    1. Writing documentation
    2. Refactoring code to introduce better coding practices.
  5. At the start of your shift, make sure to:
    1. Take over the @automations-oncall group in Slack. Follow this guide to set yourself as @automations-oncall: How To: Set Yourself as automations-oncall in Slack
  6. At the end of your on-call shift, write up what happened while you were on call, the status of any ongoing incidents, and a description of the better-engineering tasks you were able to complete. You can use the On Call Report Template to write the on-call report. Please both send the report as an email and add it to the database. To make the report easier to write, we suggest creating it at the beginning of your shift and using it to take notes, then cleaning up and editing it before the hand-off meeting.

How To: Set Yourself as automations-oncall in Slack