• The many ways to obtain credentials in AWS: Lots of different ways to get a credential depending on execution context. Interesting and utterly impossible to remember! :)
  • Break Stuff on Purpose: Slack lost an elasticsearch cluster and found they didn’t have a recent backup and their runbooks were our of date. This sounds familiar. I find at work we write runbooks when we first launch a new service and then all that documentation languishes. Also not having an alert when a backup failed for 2 years … I feel for them!
  • GitHub Availability Report: December 2024: Here’s another failure report this time from github. Planned maintenance increased load on a system in a way that prevented monitoring from seeing what was happening. (For our monitoring systems to work they need enough reserved capacity in a system to be able to collect telemetry.)
  • How to end-to-end test microservices across bounded contexts?: Focus on the module under test and mock away everything else.