Qubole, the leading cloud-agnostic, big-data-as-a-service provider, is passionate about making data-driven insights easily accessible to anyone. Qubole delivers the industry’s first autonomous data platform. The cloud-based data platform, Qubole Data Service (QDS), removes the burden of maintaining infrastructure for multiple big data processing engines and enables customers to focus on their data. Qubole customers process nearly an exabyte of data every month. Qubole investors include Charles River, Institutional Venture Partners, Lightspeed, Norwest, Harmony and Singtel Innov8.
The Data Apps team collects all data generated internally at Qubole, publishes it for internal consumption, and builds data-driven products on top of it. You will be one of the first engineers on this team and will work closely with Product, Engineering, and customers to drive and implement the first data-driven products in the emerging Big-Data-as-a-Service market segment.
Responsibilities
Design a data warehouse on cloud storage for data analysis with Hive, Spark, and similar engines.
Write ETL pipelines in Hive, Spark, and Presto to load data into a cloud-based data warehouse.
Write agents in Bash, Python, or Java to publish data from distributed data sources.
Design and manage real-time data ingestion using Apache Kafka and a Ruby on Rails (RoR) based web service.
Work with product and engineering teams on data-driven products and help drive and implement the corresponding data flows.
Perform ad-hoc data analysis in SQL, Scala, or Python to test hypotheses.
Mentor a team of engineers while maintaining a hands-on role.
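To give a flavor of the day-to-day ETL work described above, here is a minimal extract-transform-load sketch in Python. It uses an in-memory SQLite table as a stand-in for a cloud-based warehouse; the table name, column names, and sample data are illustrative assumptions, not Qubole's actual schema.

```python
import csv
import io
import sqlite3

# Illustrative raw source data, as it might arrive from an upstream agent.
RAW_CSV = """event,bytes_processed
query,1024
query,2048
load,512
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize event names and cast numeric fields."""
    return [
        (r["event"].strip().lower(), int(r["bytes_processed"]))
        for r in rows
    ]

def load(conn, records):
    """Load: idempotently (re)create and populate the target table."""
    conn.execute("DROP TABLE IF EXISTS usage_events")
    conn.execute(
        "CREATE TABLE usage_events (event TEXT, bytes_processed INTEGER)"
    )
    conn.executemany("INSERT INTO usage_events VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# Ad-hoc SQL analysis over the loaded data, as described above.
total = conn.execute(
    "SELECT SUM(bytes_processed) FROM usage_events WHERE event = 'query'"
).fetchone()[0]
print(total)  # total bytes processed by 'query' events
```

In production this shape of pipeline would typically run in Hive, Spark, or Presto against cloud storage, with a scheduler such as Airflow orchestrating the steps, but the extract/transform/load separation stays the same.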
Requirements
Hands-on experience with big data technologies such as Kafka, Spark, and the Hadoop stack (MapReduce, Hive, Sqoop), as well as NoSQL stores such as HBase.
Experience designing, integrating, and documenting technical components for seamless data extraction and analysis on a big data platform.
Hands-on experience with ETL scheduling frameworks such as Apache Airflow.
Passion for continuous learning and for experimenting with, applying, and contributing to cutting-edge open source technologies and software paradigms.
Ability to establish best practices for the big data stack and share them across teams and business units.
Operational excellence through root-cause analysis and continuous improvement of technologies and processes.
Strong programming experience with Java, Python, Scala, and SQL.
Experience managing software projects.
Bachelor's degree in Computer Science or a related discipline, with a minimum of 7 years of experience.
Nice to Have
Prior experience actively contributing to open source projects such as Hadoop, Spark, Hive, or HBase.
Experience in applied machine learning.
Academic research experience in the area of distributed systems and query processing.
Advanced Degree (MS or PhD in Computer Science).