About the Role
As an SRE Engineering Manager for Uber ATG, you will lead the launch of a broad SRE platform supporting self-driving vehicle services. You will develop plans and set objectives with the team, negotiate engagement details with your development partners and business owners, and organize the work of your SRE squad. You will be responsible for measuring and reporting on the reliability, capacity, performance, and overall maintainability of systems. As this is a ground-floor opportunity, you will help recruit and grow the team to foster success in reliability engineering for Uber ATG.
What You'll Do
- Build, grow, and lead the vision for the SRE Platform at Uber ATG
- Lead incident response and postmortem analysis and review, and coordinate on-call for your share of ATG's most critical systems
- Develop reliability tools and frameworks for use by all engineers
- Build automated, reliable deployment and rollback tooling for services
- Work with development partners to shape the architecture, design, and implementation of new and existing systems to enhance their reliability, performance, efficiency, and scalability
What You'll Need
- A minimum of 5 years in a site reliability engineering lead role
- Prior experience programming in C/C++, Ruby, Python, Java, or Go, and an ability to pick up new languages
- Experience deploying infrastructure in public cloud environments (AWS, Google Cloud, etc.)
- Demonstrated expertise in measurement, monitoring, and alerting
- Experience in the Linux environment and a good understanding of its fundamentals and internals: filesystems and modern memory management, threads and processes, the user/kernel-space divide, etc.
- Working knowledge of the TCP/IP stack, internet routing and load balancing
- A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring and storage systems
- Passion, drive and a deep feeling of ownership
Bonus Points (optional)
- A history of capacity planning and helping teams anticipate and prepare for growth
- Experience with traditional infrastructure configuration management and provisioning technologies such as Puppet, Chef, Ansible, SaltStack, Vagrant, Packer, Terraform, CloudFormation, etc.
- Experience architecting, building, and supporting distributed systems for batch processing and cluster computation on Mesos or Kubernetes
- Curiosity about or experience with TensorFlow, PyTorch, Caffe, and other ML libraries
About the Team
The Infrastructure Platform consists of the software components, services, and tooling required to enable offboard software systems to access cluster-oriented compute, storage, and networking resources reliably and at scale. These components are critical to the development of self-driving technologies, and contributions in this area directly affect the pace at which self-driving becomes a reality. The team works across various groups, including autonomy and research teams, to enable development velocity in areas such as metrics computation, deep learning, and simulation. The SRE Platform will work across all of these domains in conjunction with Infrastructure Platform leads.
At Uber, we believe technology has the power to make transportation more efficient, accessible, and safer than ever before. Self-driving technology has the potential to make these benefits an everyday reality for our customers, but it's not going to happen overnight. Building best-in-class self-driving technology will take time, and safety is our priority every step of the way. Operating inclusively and transparently while displaying responsible behavior in a structured development process is critical to safety. We at ATG seek candidates who will model these values.
At Uber, we ignite opportunity by setting the world in motion. We take on big problems to help drivers, riders, delivery partners, and eaters get moving in more than 600 cities around the world.
We welcome people from all backgrounds who seek the opportunity to help build a future where everyone and everything can move independently. If you have the curiosity, passion, and collaborative spirit, work with us, and let's move the world forward, together.