Reliability Platform is a mission within Uber Infrastructure charged with measuring and monitoring every mission-critical service at Uber, providing tools for quick or even automated remediation of outages, and ensuring that our critical services can scale with the rapid growth of our business.
With teams in San Francisco, New York City, and Europe, the Reliability Platform team develops and provides a portfolio of measurement, monitoring, tracing, logging, dependency comprehension, stress test, and on-call experience platforms to keep Uber's products highly reliable and scalable, and to provide extreme leverage to software developers across Uber.
The Blackbox / Hailstorm team seeks an experienced distributed systems engineering manager and leader capable of working effectively with a seasoned engineering team. The role requires solid experience building large-scale distributed systems and the ability to work and negotiate effectively with senior stakeholders.
About Blackbox and Hailstorm
There are two major products that this team owns and develops.
Blackbox monitoring. This is a platform for running test cases which simulate Uber's core business flows, hitting the same external endpoints that our client-facing applications use. We build and operate this world-class external monitoring system deployed independently of Uber's infrastructure on multiple cloud providers (AWS, GCP). Blackbox is often the first system to detect major outages, and is invaluable to the company as it moves fast in bringing new features to market.
Now, we're expanding our scope to offer active probing as a platform within Uber's production zones. This will unlock a wave of new capabilities for engineers at Uber to monitor their systems for correct behavior, and add to the team's positive impact on the business.
Hailstorm. This is a platform that helps ensure Uber can continue to scale at a fantastic pace. It runs thousands of integration and load tests against individual services at Uber. It also generates hundreds of thousands of simulated core business flows (rides, EATS orders, etc.) in Uber's production environment to stress the platform and measure the impact of that stress on our systems.
The Hailstorm team also plans to build a platform that fully automates load testing for both core trip flows and individual services by monitoring live production traffic, using forecast throughput as the target, and throttling tests to limit their impact on our production environment.
Our software engineering efforts are just getting started, so come help create the platforms that will help all engineers at Uber and tackle these challenges with a tight-knit team of experienced engineers.
Required Skills and Expertise
- Highly effective communication skills - able to drive discussions and roadmaps with other engineering leaders within Uber and negotiate conflicting priorities.
- Collaborative leadership - able to work closely with tech leads on the team, complementing their vision of architecture and design with a focus on organizational relationships, staff development, and overall team health.
- Planning and organizational skills - able to lead a highly structured planning process (Uber uses the OKR, or Objectives and Key Results, framework) that produces clear technical and product direction for Blackbox and Hailstorm in collaboration with numerous engineering stakeholders.
- Software engineering experience - architecture and design of software, development processes and tools, testing, software forensics, change management processes and tools, documentation.
- Systems engineering experience - distributed systems architecture, SOA and microservices, system management, and system configuration.
- Quantitative and analytical skills - measurement and analysis of systems, scalability analysis.
- Advanced managerial acumen - we are looking for an experienced manager who is more excited about the scale of their impact than the size of their empire.
At Uber, we ignite opportunity by setting the world in motion. We take on big problems to help drivers, riders, delivery partners, and eaters get moving in more than 600 cities around the world.
We welcome people from all backgrounds who seek the opportunity to help build a future where everyone and everything can move independently. If you have the curiosity, passion, and collaborative spirit, work with us, and let's move the world forward, together.