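Site Reliability Engineering: How Google Runs Production Systems (Chinese translation)
Work in progress; updates are ongoing.
This translation is for personal study. If you spot a translation error or a terminology problem, please raise it in an issue or a PR.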
- Foreword(前言)
- Preface(序言)
- Part I. Introduction(介绍)
- Part II. Principles(原则)
  - Chapter 3. Embracing Risk(拥抱风险)
  - Chapter 4. Service Level Objectives(服务水平目标)
  - Chapter 5. Eliminating Toil(消除劳作)
  - Chapter 6. Monitoring Distributed Systems(监控分布式系统)
    - Definitions(定义)
    - Why Monitor?(为什么要监控)
    - Setting Reasonable Expectations for Monitoring(设定合理的监控期望值)
    - Symptoms Versus Causes(症状与原因)
    - Black-Box Versus White-Box(黑盒与白盒)
    - The Four Golden Signals(四种黄金信号)
    - Worrying About Your Tail (or, Instrumentation and Performance)(小心拖尾数据(或者,仪表和性能))
    - Choosing an Appropriate Resolution for Measurements(为测量选择一个合适的分辨率)
    - As Simple as Possible, No Simpler(尽可能地简单)
    - Tying These Principles Together(将这些原则联系起来)
    - Monitoring for the Long Term(长期监控)
    - Conclusion(总结)
  - Chapter 7. The Evolution of Automation at Google(Google的自动化进程)
    - The Value of Automation(自动化的价值)
    - The Value for Google SRE(Google SRE的价值)
    - The Use Cases for Automation(使用自动化的例子)
    - Automate Yourself Out of a Job: Automate ALL the Things!(让自己的工作自动化:把所有的事情都自动化!)
    - Soothing the Pain: Applying Automation to Cluster Turnups(舒缓痛苦:将自动化应用于集群启动)
    - Borg: Birth of the Warehouse-Scale Computer(Borg:仓库级计算机的诞生)
    - Reliability Is the Fundamental Feature(可靠性是最基本的特征)
    - Recommendations(建议)
  - Chapter 8. Release Engineering(发布工程)
  - Chapter 9. Simplicity(简单性)
- Part III. Practices(实践)