Performance Analysis and Troubleshooting Methodologies for Databases

  • Short talk that reminded me how to be methodical when investigating performance issues
    • USE: Resource centric, Brendan Gregg, utilization + saturation + errors of underlying servers
    • RED: App centric, Tom Wilkie, rate + errors + duration
    • Golden Signals: App centric as well (Overlap with RED), from the SRE book, traffic volume + errors + latency + saturation
  • Random googling is not always your friend :)
  • Percona has a tool that automates USE metric collection
  • source
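
A minimal sketch of the RED idea from the notes above (rate + errors + duration), computed over one observation window. The `Request` record shape, the `red_metrics` name, and the choice of nearest-rank p50/p99 for "duration" are assumptions made for illustration, not something the talk prescribed.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record shape for this sketch)."""
    duration_ms: float
    ok: bool


def red_metrics(requests, window_seconds):
    """Summarize one observation window as Rate, Errors, Duration."""
    if not requests:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p50_ms": None, "p99_ms": None}

    durations = sorted(r.duration_ms for r in requests)

    def percentile(p):
        # Nearest-rank percentile: the smallest sample such that at least
        # p percent of all samples are at or below it.
        rank = max(1, math.ceil(p / 100 * len(durations)))
        return durations[rank - 1]

    errors = sum(1 for r in requests if not r.ok)
    return {
        "rate_per_s": len(requests) / window_seconds,  # Rate
        "error_ratio": errors / len(requests),         # Errors
        "p50_ms": percentile(50),                      # Duration (median)
        "p99_ms": percentile(99),                      # Duration (tail)
    }
```

Reporting a tail percentile next to the median is a deliberate choice here: an average alone tends to hide the slow requests that users actually notice.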

G1: To Infinity and Beyond

  • Speaker: Stefan Johansson
  • source

  • Collectors can optimize for one or a few of:
    • Throughput
    • Latency
    • Memory footprint
  • G1 tries to achieve a balance between them (ZGC goes for low latency … sub-1 ms pause times achieved!)
  • G1 is the default collector in JDK 9+
  • Goal: avoid full collections
    • Try to do bulk of work outside full gc span during concurrent steps
  • G1 has improved a lot since JDK 8 (1000+ patches to this collector)
    • Big improvements in JDK 11 + JDK 17

How to Measure Latency

  • Difference between response time and service time
  • Measuring only on the server is insufficient (service time)
    • But this is where we often start because it’s easy
    • Only measuring here does not correlate with user experience
    • It hides queuing and other issues in components between the client and backend
  • Measuring at the client gives us more information but isn’t the full story either
    • You have to do both to be able to reason about some of the things happening in between
    • Queuing! Queues are everywhere and represent things / people waiting for access to a resource
  • Hidden queues
  • How to measure latency
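
A small sketch of why service time measured on the server can differ from the response time seen by the client, as the notes above describe. It assumes a single server that handles requests in FIFO order. The `simulate_fifo` name and the input shape are hypothetical, chosen only for this illustration.

```python
def simulate_fifo(arrivals, service_times):
    """Return the client-observed response time of each request.

    arrivals: arrival time of each request, in ascending order
    service_times: time the server spends working on each request
    Assumes one server processing requests strictly in arrival order.
    """
    server_free_at = 0
    response_times = []
    for arrival, service in zip(arrivals, service_times):
        # A request waits in the queue until the server finishes earlier work.
        start = max(arrival, server_free_at)
        finish = start + service
        server_free_at = finish
        # Response time = queue wait + service time.
        response_times.append(finish - arrival)
    return response_times
```

With four requests arriving 1 ms apart (`[0, 1, 2, 3]`) and 10 ms of service each, the server reports a steady 10 ms service time for every request, while the response times come out as 10, 19, 28, and 37 ms. The growing gap is time spent in the queue, which server-side measurement alone does not show.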