- Migrating the Jira Database Platform to AWS Aurora: Every Jira instance gets its own db. There are millions of them packed onto several thousand virtual machines. They move hundreds of them a day to balance queries / workloads across their db instances. They’re using AWS Aurora.
- Three Mighty Alerts Supporting Hugging Face’s Production Infrastructure: Interesting alerts, these (not the usual ones) …
  - High NAT gateway throughput: They set a static threshold so that when they cross it they know the system has grown substantially, either organically or because of a bug they should look into.
  - System logs that make it to long term archival storage: They know how many log events they see at their ALB (Amazon Application Load Balancer), and they check that what they see in long term storage for a given period lines up.
  - k8s API errors / request limiting: Feedback from their platform is monitored closely.
- Saving Millions of Dollars by Bin-Packing ClickHouse Pods in AWS EKS: They switched the k8s pod scheduler policy from LeastAllocated (cpu, ram) to MostAllocated. This increased utilization in the busiest cluster nodes by 20-30%, and they were able to turn off some nodes for cost savings. Bin packing isn’t really something we can consider in Fargate, but maybe in the future we decide we’re ok running ECS in EC2 mode for more control / cost savings.
- The Blue Tape List: This is helpful when I’m processing change around me. Don’t jump to conclusions or try to fix stuff you’re seeing for the first time. Make note of it - the blue tape list - and come back to it after a while when you have more context.
- Seventh-generation server hardware at Dropbox: our most efficient and capable architecture yet: Dropbox talks about their current gen server hardware and how they think about h/w + s/w co-design.
- Building ClickHouse Cloud From Scratch in a Year: How ClickHouse the business arm (vs the open source db) went about making a managed hosting option for ClickHouse. Speaks to their architecture decisions across all areas: tech stack, billing, monitoring, security, etc. Very nice writeup!
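
To make the LeastAllocated vs MostAllocated switch from the bin-packing item a bit more concrete, here is a toy sketch of how the two scoring policies spread or pack pods. It is not the real kube-scheduler: the `Node` class, the equal weighting of cpu and ram, and the node / pod sizes are all assumptions made up for illustration.

```python
from dataclasses import dataclass

MAX_SCORE = 100  # assumed score ceiling for this toy model


@dataclass
class Node:
    """Hypothetical node with cpu (cores) and ram (GiB) capacity."""
    name: str
    cpu_capacity: float
    mem_capacity: float
    cpu_requested: float = 0.0
    mem_requested: float = 0.0


def fractions_after(node, cpu, mem):
    """Fraction of cpu and ram requested if the pod were placed on node."""
    return ((node.cpu_requested + cpu) / node.cpu_capacity,
            (node.mem_requested + mem) / node.mem_capacity)


def fits(node, cpu, mem):
    cpu_f, mem_f = fractions_after(node, cpu, mem)
    return cpu_f <= 1.0 and mem_f <= 1.0


def least_allocated(node, cpu, mem):
    """Prefer the emptiest node: spreads pods out."""
    cpu_f, mem_f = fractions_after(node, cpu, mem)
    return MAX_SCORE * ((1 - cpu_f) + (1 - mem_f)) / 2


def most_allocated(node, cpu, mem):
    """Prefer the fullest node that still fits: packs pods together."""
    cpu_f, mem_f = fractions_after(node, cpu, mem)
    return MAX_SCORE * (cpu_f + mem_f) / 2


def schedule(nodes, pods, score):
    """Place each (cpu, mem) pod on the highest-scoring node that fits."""
    for cpu, mem in pods:
        candidates = [n for n in nodes if fits(n, cpu, mem)]
        if not candidates:
            raise RuntimeError("no node has room for this pod")
        best = max(candidates, key=lambda n: score(n, cpu, mem))
        best.cpu_requested += cpu
        best.mem_requested += mem
    return nodes


def nodes_in_use(nodes):
    return sum(1 for n in nodes if n.cpu_requested > 0 or n.mem_requested > 0)


def make_nodes():
    # Four identical, made-up nodes: 8 cores and 32 GiB each.
    return [Node(f"node-{i}", 8, 32) for i in range(4)]


pods = [(2, 8)] * 4  # four identical, made-up pods
spread = nodes_in_use(schedule(make_nodes(), pods, least_allocated))
packed = nodes_in_use(schedule(make_nodes(), pods, most_allocated))
print(f"LeastAllocated uses {spread} nodes, MostAllocated uses {packed}")
```

In this toy setup the spreading policy touches every node while the packing policy fills one node first, which is the effect that lets idle nodes be turned off.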