Engineering Manager - Distributed Tracing at Uber
New York City, NY, US
At Uber, we ignite opportunity by setting the world in motion. We take on big problems to help drivers, riders, delivery partners, and eaters get moving in more than 600 cities around the world.
We welcome people from all backgrounds who seek the opportunity to help build a future where everyone and everything can move independently. If you have the curiosity, passion, and collaborative spirit, work with us, and let’s move the world forward, together.
The Observability team builds the tools and systems that every engineering team at Uber uses to develop, scale, understand, and monitor their systems. These tools are absolutely critical to Uber – without them it would be impossible to understand and debug problems in an environment with over three thousand microservices, hundreds of thousands of CPU cores across multiple data centers and the cloud, and hundreds of thousands of concurrent trips around the world.
The Observability suite includes:
- Jaeger, our open source enterprise Golang tracing system. It provides actionable insight into individual flows through our microservice architecture and comprehension of the entirety of Uber’s software ecosystem.
- M3, our open source enterprise metrics stack. It handles hundreds of millions of emitted metrics per second, which are used to monitor and alert on every product and microservice at Uber.
- Synoptic, our Uber-aware dashboarding system which displays context-sensitive information from across the Uber ecosystem, enabling quick detection and mitigation of issues.
- Our deeply integrated On-Call Experience suite of tools, which gives on-call engineers everything they need to raise, track, and close outages and incidents, to track the signal-to-noise ratio (SNR) of alerts, and to drive improvements in their team’s health by reducing alert load.
- Blackbox, our system for externally monitoring our critical business endpoints, via emulated workflows.
- A new system under development to provide enterprise logging, with deep integration into our Observability stack, including alerting, linkage to traces, etc.
Jaeger is Uber’s open-source Distributed Tracing system, designed to provide real-time performance monitoring and profiling for distributed architectures. Inspired by Google’s Dapper and OpenZipkin, Jaeger is a complete redesign based on the new OpenTracing standard. Since its first deployment in production about a year ago, over 600 microservices have been integrated with Jaeger, with many hundreds more to come.
The project has recently been open sourced, and the team is working with other major tech companies to make this the leading tracing project for large scale distributed tracing systems worldwide. Check out our Distributed Tracing blog post: https://eng.uber.com/distributed-tracing/
What You’ll Do:
Current major priorities that a new engineering manager can help with:
- Building innovative new features that will surface tracing data in new ways across the company.
- Scaling our platform to handle larger amounts of data than before and to support the new features.
- Building out the team, a high priority for Uber, to support our growing infrastructure.
What You’ll Need:
- Highly effective communication skills - able to drive discussions and roadmaps with other engineering leaders within Uber and negotiate conflicting priorities.
- Ability to work closely with tech leads on the team, complementing their vision of architecture and design with a focus on organizational relationships, staff development, and overall team health.
- Planning and organizational skills - able to lead a highly structured planning process (Uber uses the Objectives and Key Results, or OKR, framework) that produces clear technical and product direction for Blackbox in collaboration with numerous engineering stakeholders.
- Software engineering experience - architecture and design of software, development processes and tools, testing, software forensics, change management processes and tools, documentation.
- Systems engineering experience - distributed systems architecture, SOA and microservices, system management, and system configuration.
- Quantitative and analytical skills - measurement and analysis of systems, scalability analysis.
- Advanced managerial acumen. We are looking for an experienced manager who is more excited about the scale of their impact than the size of their empire.
- BS/MS/PhD in Computer Science or a related field.
This role requires some travel to San Francisco (Uber HQ) to participate in discussions with key stakeholders about requirements and needs.