Managed incidents across multiple teams as part of the Incident Manager on-call rotation, making prompt decisions and guiding incidents toward resolution to reduce time to recovery
Conducted simulations of complex services to better understand Shopify's systems and their limitations, then identified and implemented improvement action plans
Migrated assets from Datadog to a Prometheus-based system, streamlining Shopify's monitoring processes by managing alerts, SLOs, and dashboards
Posted Aug 8, 2023
Worked as part of the Resiliency team to maximize uptime by reducing incident time to recovery.