- How does someone trigger an incident?
- Currently, alerting the Slack egt-oncall user and opening a thread.
- We should have a mechanism that uses OpsGenie or DataDog to start this workflow.
- What happens when we do? What are the roles?
- RP vs. On-Call?
- On-Call for triage vs. resolution?
- DORA Metrics
- Deployment frequency
- Time from commit to deploy?
- Incidents in DataDog / Sentry / etc.
- Questions for roles:
- Who is triaging?
- Who is communicating? Frequency?
- Fix-forward vs. roll back?
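To make the DORA items above concrete, here is a minimal sketch of how deploy frequency and time from commit to deploy could be computed once we have deploy records. Everything in it is an assumption for illustration: the `deploys` list, its sample timestamps, and the function names `deployment_frequency` and `median_lead_time` are hypothetical, and nothing here reads from DataDog, Sentry, or OpsGenie.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical sample records: (commit time, deploy time) per deploy.
# These values are made up for illustration, not real data.
deploys = [
    (datetime(2022, 9, 5, 10, 0), datetime(2022, 9, 5, 14, 0)),
    (datetime(2022, 9, 6, 9, 0), datetime(2022, 9, 7, 9, 0)),
    (datetime(2022, 9, 8, 16, 0), datetime(2022, 9, 8, 18, 0)),
]


def deployment_frequency(records, window_days):
    """Average number of deploys per day over a window of window_days."""
    return len(records) / window_days


def median_lead_time(records):
    """Median time from commit to deploy, as a timedelta."""
    return median(deployed - committed for committed, deployed in records)


if __name__ == "__main__":
    print("Deploys per day:", deployment_frequency(deploys, window_days=7))
    print("Median commit-to-deploy:", median_lead_time(deploys))
```

The median is used instead of the mean so that a single slow deploy does not dominate the lead-time number; that is a design choice for this sketch, not something decided in the meeting.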
Incident Response Meetings
Incident Response Meeting 9/12/22