• Monitorama BAL 2019 - Coalescing systems data without losing fidelity: How do you collect just enough data from a system that’s emitting lots of it? Adaptive sampling, avoid aggregation because you lose information, figure out how much you need to collect to be representative of broader data set. Collecting and storing everything can be prohibitively expensive.
  • Software Managers’ Guide to Operational Excellence: Things to consider as regular operations tasks for a team that owns security, availability, and performance of software that has to keep running. eg:
    • monitoring
    • incident response
    • disaster recovery, backups
    • oncall
    • platform tools (ci, deployer, config management, etc)
    • pdf
  • An overview of Cloudflare’s logging pipeline: Cloudflare’s logging pipeline including at the source: journald, syslog-ng, kafka, elk and clickhouse. This seems simple and scalable which makes me happy. I didn’t know the collector end of the log pipeline could rely on systemd.
  • Let’s Consign CAP to the Cabinet of Curiosities: CAP an idea from distributed systems theory isn’t always applications. App developers are usually able to find ways to get consistency and availability together in the presence of network partitions. (Would the venerable Load Balancer please step forward + other system facilities we have to route clients away from failed components.) Couple of interesting papers in here I should probably read as well. :)