TITLE

Site Reliability Engineer

COMPANY

Mist

LOCATION

Cupertino, CA, US

Description

The SRE role at Mist is responsible for architecting and operating our infrastructure reliably and cost-effectively. We operate over 50 microservices and 100 Apache Storm topologies, and we use a number of open source distributed systems, including Kafka, Cassandra, ZooKeeper, Apache Storm, and Mesos. You can find more details about our infrastructure in this blog. SREs work in both the dev and systems worlds, instrumenting key parts of the core architecture while supporting developers as they do the same. Expect to work with a variety of programming languages and to dive into all sorts of distributed systems. Beyond the open source distributed systems, get ready for some highly scalable in-house ones, e.g. live-aggregators. Being an SRE at Mist involves dealing with some ‘real’ chaos, as we run primarily on AWS Spot Instances. If there’s a problem that you find yourself fixing manually again and again, we rely on you as the SRE to automate it.

What exactly will you do:

  • Work closely with developers in supporting new features and services
  • Monitor site stability and performance
  • Troubleshoot site issues and take measures to prevent them from happening again
  • Develop tools to better monitor Mist Cloud
  • Participate in on-call rotation

What are we looking for:

  • Knowledge of distributed systems like Kafka, Cassandra, and ZooKeeper
  • Strong programming skills
  • Good understanding of Linux
  • Understanding of fundamental networking technologies
  • Knowledge of configuration management tools like Puppet, Chef, or Ansible

Apply for the job
