


Sr. Data Scientist - Life Sciences



Health Catalyst



Boston, MA, US / Berlin, DE


About Health Catalyst

Health Catalyst was named one of the 30 Best Workplaces in Technology by Fortune Magazine and the 11th best place to work by Glassdoor. Health Catalyst earned the highest overall score in Healthcare BI from KLAS and was named to the World’s Best 100 cloud companies by Forbes. Health Catalyst analyzes healthcare records of almost a third of the US population (65 million patients) and recently released the first open source software for healthcare machine learning.

Health Catalyst’s platform and applications are used at leading health systems including John Muir Health, UPMC, MultiCare Health System, Partners HealthCare, Banner Health, Stanford Hospital & Clinics, Texas Children’s Hospital, and over 40 others. Health Catalyst products and services are used in over 400 hospitals and 4,000 clinics, supporting over 90 million patients.

In line with its mission to achieve massive, sustained healthcare outcome improvements, Health Catalyst recently launched a new Life Science business unit based in Cambridge, MA, that will develop "patient-first" solutions to support drug development. The new business unit will help the life science industry and regulators improve the probability of R&D pipeline success, optimize clinical trial design and enrollment, and perform post-launch retrospective studies on real-world data (RWD). It will ground decisions in real-time, ethical access to massive, high-quality real-world data across all therapeutic indications, in order to tackle unmet medical needs and further improve outcomes for patients.

Our team lives the cultural attributes of Smart, Hardworking, and Humble.


Job Summary

The Senior Data Scientist (SDS) will be responsible for designing, developing, and coding methods, processes, and systems to consolidate and analyze diverse structured and unstructured big data sources from the Health Catalyst ecosystem serving 65 million patients. The goal is to generate actionable insights and solutions for services to the life science industry and regulators, and for new product design and development. Specifically, the SDS will develop reproducible methods (using Jupyter Notebook/Google Datalab technology) to robustly identify crucial data elements from existing data sources (e.g. clinical data from EHR, biomarkers from pathology/genomics sources, outcomes from integrated data sources). The SDS must have a broad understanding of different data science approaches (e.g. natural language processing, neural networks, hypothesis testing, classification, regression, clustering, feature allocation, deep learning, time-series analysis, network modeling, feature selection/engineering) and take a very “applied science” approach (find the best hammer for the specific nail, rather than the most sophisticated hammer for all problems). The SDS will interact with product and service teams, and with clients as required, to identify questions and issues for data analysis and experiments. The SDS will also interpret and communicate insights and findings from analyses and experiments to product, service, and business managers, and to clients as required.


We seek an SDS who has the ability to drive projects and who could eventually lead a team. The SDS will report to the Director of Life Science Analytics, and will support the Director and the SVP/head of the business unit in engaging with both internal management and client management to successfully grow the business unit and demonstrate impact on drug development R&D and, ultimately, patient outcomes. The role has great potential for a successful candidate, who will form the initial seed of a new business unit with significant patient outcome and business potential. This role will ideally be based in Cambridge, MA, but can also be in Salt Lake City, UT, or remote as long as the candidate is willing to travel regularly.


Duties & Responsibilities

  • Establish Life Science Analytics prototype platform – Guide the effort to produce a first prototype of the Life Science Analytics platform in collaboration with other members of the team (biostatisticians and clinical data analysts). Develop the first data science packages leveraging Health Catalyst data for Life Science Analytics and embed them into a more generalizable framework, built on the Google Datalab platform/Jupyter Notebooks, that can be reused and redeployed to solve other similar problems.
  • Develop processes to ingest selected Health Catalyst data sources to the platform – Build robust methods to ingest key data sources (e.g. treatment, biomarker, and survival data) at large scale and close to real time, in the order gradually prioritized by the Clinical Data Analysts on the team, with key QC processes embedded in the ingestion mechanism and basic dashboards to monitor and present data ingestion summary statistics.
  • Design, evaluate and develop analytics solutions – Work with other members of the team (clinical data analysts, data engineers, biostatisticians) to rapidly prototype solutions that transform Health Catalyst real-world clinical and biomarker data into actionable insights, using robust data science and statistical methods and insightful data visualizations that are easily approachable by a variety of users. Guide specific projects with stronger data science components (e.g. projects where neural networks are used to extract structured data from unstructured data, or predictive analytics solutions using machine learning).
  • Execute rapid prototyping projects for internal R&D, publications and clients – Identify, with the Director and the head of the unit, key initial projects to be executed for internal R&D, external publications, and/or clients. Depending on the project, the SDS will either contribute as part of a larger team or take a project from start to end, from initial requirements and feasibility assessment to execution and delivery/communication of results.
  • Project status communication – Clearly articulate the status of projects to clients and internal management, and provide insight into any additional support needed for successful implementation. Participate in relevant internal meetings, providing feedback on processes and strategic issues. Engage with the broader HC community and with clients to rapidly shape the business unit and make it successful.
  • Provide guidance, training and knowledge transfer to other HC teams as needed – As the business unit starts developing solutions, working with clients, and interacting with the broader RWD life sciences community, all team members will also be expected to help existing HC teams across the organization receive guidance and training in areas pertaining to drug development, clinical trials, and translational research, in order to support and enhance existing products, services, and business relationships.
  • Managing collaborations/internships as needed – While the role is initially that of an individual contributor, the business unit will be engaging in collaborations with Boston-based colleges and institutes, and the SDS is expected to take on students/interns and/or collaborations with academic labs to further the impact of the team. It is also foreseen that the SDS can grow into a team lead as the business unit develops.


Required Skills

  • Machine learning experience required
  • Natural language processing experience preferred
  • Shiny, Spyre, Flask, WebDev and prototyping experience preferred
  • JupyterHub, Sun Grid Engine, Google Cloud Platform, AWS experience preferred
  • Experience in data science methodologies and techniques, e.g. hypothesis testing, classification, regression, clustering, feature allocation, deep learning, time-series analysis, network modeling, feature selection/engineering
  • Significant scientific background with prior experience in clinical research
  • PhD in relevant domain highly preferred (Data Science, Machine Learning ideally applied to health-related insight generation)
  • Ability to derive robust insights from complex RWD datasets
  • Must have experience performing data analysis using statistical frameworks (R, SciPy), ideally with prior experience in environments such as Jupyter Notebooks/Google Datalab
  • Some data engineering experience (ETL, SQL) desirable
  • Must be comfortable working in a rapid prototyping environment, using agile approaches to guiding teams and projects
  • Some understanding of the drug development process and the use of biomedical data for drug development and clinical trial design preferred
  • Strong communication skills; ability to guide small projects, interacting with clients and internal management.
  • Ability to mentor/train/educate others, including interns/students at collaborating institutions.
  • Good interpersonal skills with the ability to effectively guide and facilitate meetings.
  • Must possess advanced presentation skills and be comfortable presenting to internal management, to clients, and at conferences. Ability to present key concepts effectively to small and large audiences. Ability to communicate concisely in a group setting, going into detail when needed.
  • Ability to adapt quickly and change direction as needed to address the evolving needs of the business unit, for both R&D and clients.
  • Ability to learn new skills and software tools quickly and ask for help or training when necessary.


Work Experience

  • Required: 8+ years of relevant experience in data science/machine learning, of which at least 3 years applied to complex health-related questions and insights
  • Preferred: PhD in relevant discipline (e.g. data science, machine learning ideally applied to biomedical domain)
  • Experience working in the life sciences industry or with/for the life science industry strongly preferred
  • Experience working with Real World Data strongly preferred.
  • 1+ year experience effectively communicating to clients/conferences/management required
  • Experience working with or as part of clinical/drug development teams preferred
  • Experience managing small teams/students preferred
  • 1+ year project management experience preferred.


Education & Relevant Experience

  • Required – Bachelor’s degree in a scientific/quantitative subject
  • Preferred – PhD, with a strong component of data science and ideally applied to the life sciences
  • Required – Expert in data science and machine learning, with a minimum of 3 years' experience applying these disciplines to gain relevant insights in the fields of health, drug discovery, or translational research
  • Secondary skills/attributes (highly desired):
    • Subject matter expertise in some clinical/biological domains
    • Exposure to formal statistics, especially in the context of clinical trial design
    • Project management 


The above statements describe the general nature and level of work being performed in this job function.  They are not intended to be an exhaustive list of all duties, and indeed additional responsibilities may be assigned by Health Catalyst.
