Qubole, the leading cloud-agnostic, big data-as-a-service provider, is passionate about making data-driven insights easily accessible to anyone. Qubole delivers the industry’s first autonomous data platform. The cloud-based data platform, Qubole Data Service (QDS), removes the burden of maintaining the infrastructure for multiple big data processing engines and enables customers to focus on their data. Qubole customers process nearly an exabyte of data every month. Qubole investors include Charles River, Institutional Venture Partners, Lightspeed, Norwest, Harmony and Singtel Innov8.
We have a rapidly growing footprint on AWS, a fast-growing customer base, and up-and-coming services on GCE and Azure as well. We strongly believe in automating and codifying as many of our operational procedures as possible. Running this service securely, reliably and within budget is a hard problem. As one of our dedicated Production Engineers, you would be responsible for managing our cloud infrastructure end to end. That infrastructure is powered by Kubernetes, ELK, Ruby on Rails, Jenkins, StatsD, Chef, MySQL and Redis across cloud environments such as AWS, Azure and GCP.
If you understand the challenges of managing dozens of production environments across regions on various public clouds as a SaaS platform, we would definitely love to talk to you.
What you will be doing
· Streamline developer/DevOps flows to maximize productivity and reduce the time and manual intervention between code check-in and production deployment.
· Design, document and roll out distributed monitoring and self-healing techniques for both applications and infrastructure at scale.
· Design and implement cloud containerization frameworks for applications and services, along with service life cycle management.
· Audit costs and advise on optimization techniques, including any recommended automation or tooling, to reduce the organization's operational costs.
· Participate in technical reviews and be an advocate for maintainable infrastructure. Also influence design decisions that may impact site reliability or scalability.
· Identify areas to improve service resiliency through techniques such as chaos engineering, performance/load testing and game days.
· Work with our engineering team to plan, deploy, and maintain critical business applications.
· Identify and help migrate our legacy systems out of Chef and into public cloud environments using containers.
· Introduce new technologies and related tools to grow our agile development environment.
· Assist with capacity planning and scalability to ensure systems are optimized for continuous growth.
· Work with security architects to ensure systems are secure and compliant with industry best practices.
· Be involved in buy-vs-build decisions as they apply to deployment costs, strategy and topology support.
· Mentor junior engineers and help grow them into excellent engineers.
· Lead the transition from Chef-based infrastructure on AWS, Azure and GCP to Kubernetes (K8s)-based infrastructure designed to run on any cloud.
· Design and implement Infrastructure as Code using Terraform across all public clouds to manage the life cycle of systems.
· Work with our development teams on architecture or re-architecture of existing systems for improved scale and reliability
· Design and implement tools/bots so that operations can scale from a single environment to tens of production environments across regions in public clouds.
· Design our automation tools so that they can become part of the product we ship to customers for their own operations.
Required Experience & Skills
· Engineering degree in Computer Science and at least 10 years of experience in a similar job profile
· Expert automation skills with Python, Ruby or Go
· Strong system administration background for Linux based systems
· Large-scale production experience with Kubernetes, EKS, AKS or GKE
· Operational expertise in deploying and managing components such as MySQL, Nginx, Elasticsearch, Java applications, Ruby on Rails and load balancers
· Comfortable with networking fundamentals such as firewalls, subnetting and routing
· Experience working with configuration and deployment management tools such as Chef, Puppet, Ansible or Salt
· Experience with monitoring and logging using ELK, Datadog, SignalFx, Graphite, StatsD or Prometheus
· Expertise in cloud orchestration tools such as Terraform
· Strong understanding of various design and architecture patterns
· Experience optimizing and troubleshooting issues that span public clouds, systems, networks and code
· Good RESTful API and systems design sensibilities
Qubole is hitting that growth inflection point where we need talented people to help us scale up. Our company culture is special, and we are looking for people to join us who want to continue building a great company while going after the big data activation market.
Culture at Qubole
Trust and Autonomy: We pride ourselves on the lack of bureaucracy at work and believe in aggressively delegating power and responsibility to our employees.
Transparency and Teamwork: Complete transparency in all our thoughts and actions is integral to our genetic character, and it helps us to stick together and function effectively as a team.
Who Thrives: If you are a self-starter who thrives on complexity and independence, and you truly understand and live the tenets of humility, hunger and honesty, you will love Qubole.
Qubole is an Equal Employment Opportunity employer that proudly pursues and hires a diverse workforce. Qubole does not make hiring or employment decisions on the basis of race, color, religion or religious belief, ethnic or national origin, nationality, sex, gender, gender-identity, sexual orientation, disability, age, military or veteran status, or any other basis protected by applicable local, state, or federal laws or prohibited by Company policy. Qubole also strives for a healthy and safe workplace and strictly prohibits harassment of any kind.