Site reliability engineering essentials.
- Format:
- Video
- Language:
- English
- Subjects (All):
- Information technology--Management.
- Information technology.
- Reliability (Engineering).
- Computer engineering.
- Physical Description:
- 1 online resource (1 video file (4 hr., 10 min.)) : sound, color.
- Edition:
- [First edition].
- Place of Publication:
- [Place of publication not identified] : Pearson, [2025]
- Summary:
- Master the essentials of Site Reliability Engineering to effectively manage production systems with real-world insights and techniques. Unlock the power of Site Reliability Engineering (SRE) with this comprehensive video course. SRE is a critical discipline that combines software engineering with IT operations to ensure high system reliability, scalability, and performance. This course provides a deep dive into the core principles and practices of SRE, equipping you with the tools to build reliable systems and improve operational efficiency. The course covers key SRE concepts, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets, with practical examples that help you apply these principles to your own organization. You will learn how to build and optimize a robust monitoring and observability system using essential telemetry data, such as logs, metrics, and traces. Through an in-depth exploration of observability platforms, you will learn how to effectively monitor and maintain system health. The course also addresses crucial aspects of incident management, such as managing on-call duties, running war rooms for critical incidents, and conducting blameless postmortems to learn from failures. Gain insights into reliable system architecture patterns, such as load balancing, auto-scaling, and the CAP theorem, to ensure your infrastructure remains resilient under high traffic. Additionally, you will discover release management strategies that minimize user impact during deployments, monitor your CI/CD pipeline, and ensure progressive rollouts. The course also guides you through implementing SRE practices within your organization, including setting up a central SRE team and conducting production readiness reviews to ensure your systems are always production-ready.
By the end of this course, you will have a solid understanding of SRE best practices and the knowledge to enhance the reliability and scalability of your systems while reducing downtime and improving overall operational efficiency.
- Notes:
- OCLC-licensed vendor bibliographic record.
- ISBN:
- 9780135415016
- 0135415012
- OCLC:
- 1485474523