About The Team
Join Uber Site Reliability Engineering and help us redefine what it means to be an SRE in 2018! As an Uber SRE, you will join a team of reliability engineers who partner with development teams throughout the organization, with the ultimate goal of improving the reliability of Uber products, features, and flows.
An Uber SRE spends just as much of their time working on systems as they do writing code. You’ll be tasked with all manner of work: building operational tooling, automating operational workflows, performing architecture and design reviews, investigating system failures and complex outages, improving our monitoring infrastructure, defining service level objectives and agreements for Uber products and flows, and much more.
We hire SREs at all levels.
- Work with development partners to shape the architecture, design, and implementations of new and existing systems to enhance their reliability, performance, efficiency, and scalability
- Ensure all key services are measured and monitored, and raise alerts when needed
- Automate deployment and configuration processes
- Develop reliability tools and frameworks for use by all engineers
- Share on-call for Uber’s most critical systems and lead incident response and no-blame postmortem analysis and review
- Drive efficiencies in systems and processes: capacity planning, configuration management, performance tuning, monitoring, and root cause analysis
- Serve as experts in Uber infrastructure and best practices, and help development teams use infrastructure more effectively
- Take point on capacity planning and help teams anticipate and prepare for growth
What you'll need
- Grit, drive and a deep feeling of ownership.
- BS or MS in Computer Science or a related technical discipline. Equivalent practical experience is a reasonable substitute.
- Experience in the Linux environment and a good understanding of its fundamentals and internals: filesystems and modern memory management, threads and processes, the user/kernel-space divide, etc.
- A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring and storage systems.
- Working knowledge of the TCP/IP stack, internet routing and load balancing.
About Uber SRE
SRE is an organization of engineers who work with our production services throughout their entire life cycle, from design and architecture through implementation, deployment, and sustained operation.
SRE delivers important system properties (reliability, performance, efficiency, and scalability) for the products and platforms that our customers use every day.
SREs work in high-performance squads with expertise in large-scale system reliability and an in-depth understanding of the architecture of critical business components, as well as in dedicated engineering teams that build comprehensive tools, platforms, and infrastructure.
We're bringing Uber to every major city in the world. We need your skills and passion to help make it happen!
Be sure to check out the Uber Engineering Blog and @UberEng to learn more about the team.