Staff Data Engineer

Bangalore, IN
Qubole, the leading cloud-agnostic, big data-as-a-service provider, is passionate about making data-driven insights easily accessible to anyone. Qubole delivers the industry’s first autonomous data platform. The cloud-based data platform, Qubole Data Service (QDS), removes the burden of maintaining infrastructure for multiple big data processing engines and enables customers to focus on their data. Qubole customers process nearly an exabyte of data every month. Qubole investors include Charles River, Institutional Venture Partners, Lightspeed, Norwest, Harmony, and Singtel Innov8.

The Data Apps team collects all data generated internally at Qubole, publishes it for internal consumption, and builds data-driven products on top of it. You will be one of the first engineers on this team, working closely with Product and Engineering teams and customers to drive and implement the first-ever data-driven products in the emerging Big-Data-as-a-Service market segment.

Your Part

        Design a data warehouse on cloud storage for data analysis using Hive, Spark, etc.
        Write ETL pipelines in Hive, Spark, and Presto for loading data into a cloud-based data warehouse.
        Write agents in Bash/Python/Java to publish data from different distributed data sources.
        Design and manage real-time data ingestion using Apache Kafka and a RoR-based web service.
        Work with product and engineering teams on different data-driven products and help drive/implement the data flows for them.
        Perform ad-hoc data analysis to test various hypotheses using SQL/Scala/Python.
        Mentor a team of engineers while maintaining a hands-on role.

Must Have

        Hands-on experience with big data technologies such as Kafka, Spark, and the Hadoop stack (MapReduce, Hive, Sqoop, etc.), as well as NoSQL stores such as HBase.
        Experience designing, integrating, and documenting technical components for seamless data extraction and analysis on a big data platform.
        Hands-on experience with ETL scheduling frameworks such as Airflow.
        Passion for continuous learning, and for experimenting with, applying, and contributing to cutting-edge open source technologies and software paradigms.
        Ability to establish best practices for the big data stack and share them across teams and business units.
        Delivers operational excellence through root-cause analysis and continuous improvement of technologies and processes.
        Strong programming experience with Java, Python, Scala, and SQL.
        Experience managing software projects.
        Bachelor's degree in Computer Science or a related discipline, with a minimum of 7 years of experience.

Nice to Have

        Prior experience actively contributing to open source projects such as Hadoop, Spark, Hive, or HBase.
        Experience in applied machine learning.
        Academic research experience in distributed systems and query processing.
        Advanced degree (MS or PhD) in Computer Science.

Apply for the job
