Bioinformatics Data Engineer, Oncology



Health Catalyst



Boston, MA, US


About Health Catalyst

Health Catalyst provides data and analytics software and services that help providers and risk-bearing entities unleash their data to operate in a data-informed manner, driving improvements in their clinical and financial operations. Health Catalyst was named one of the 30 Best Workplaces in Technology by Fortune Magazine and the 11th best place to work by Glassdoor. Health Catalyst’s platform and applications are used at leading health systems including John Muir Health, UPMC, MultiCare Health System, Partners HealthCare, Banner Health, Stanford Hospital & Clinics, Texas Children’s Hospital, and over 40 others, enabling the Company to analyze healthcare records of over 100 million patients. Our team lives the cultural attributes of Smart, Hardworking and Humble. Learn more about working at Health Catalyst here:


Job Summary

The Oncology Bioinformatics Data Engineer will be responsible for analyzing, integrating, QC’ing, harmonizing, and documenting molecular and clinical data across leading healthcare organizations throughout the U.S. The incumbent will work with genetics and genomics data from next-generation sequencing, as well as transcriptomic, proteomic, pathology, and other biomarker data from a diverse set of clients and disease spaces. Supporting the heavy emphasis on cancer precision medicine in clinical practice, the Oncology Bioinformatics Data Engineer will focus largely on oncology-specific clinical and molecular data. The Oncology Bioinformatics Data Engineer will support the Senior Director for Precision Medicine in drafting, prototyping, and implementing a harmonized data schema that rationalizes molecular and clinical data from diverse sources.

The ideal candidate will have a passion for ‘omics data and for interrogating complex client data to identify nuances that need to be accounted for; a deep understanding of cancer diagnosis, treatment, and other clinical data; and strong data analysis skills to validate and enforce logical consistency within and across columns and tables of data.


Duties & Responsibilities

  1. Implement ETL for raw molecular data across Health Catalyst clients, and ensure proper documentation for all data.
  2. Perform manual and automated QC across data sources, identifying and managing issues as they arise with transparently implemented measures.
  3. Collaborate with data engineers across Health Catalyst to follow best practices in ETL, and with colleagues in the new Life Sciences Business to leverage expertise in molecular data.
  4. Maintain data dictionaries, ERDs, and other metadata as required.
  5. Propose and implement a harmonized data model that handles data such as NGS DNA Seq data from germline testing, somatic DNA testing, WGS, targeted panels, transcriptomics, pathology-based individual gene/protein from IHC/PCR, pharmacogenomics, and more.
  6. Investigate and develop a strategy for providing standardized annotations across diverse client data (e.g. gene names).
  7. Serve as subject matter expert for colleagues across Health Catalyst and clients who have questions about data provenance, and when appropriate, as a point of contact for feasibility of ingesting new data sources.


Recommended Skills

  • Deep understanding of at least one type of molecular profiling data
  • Experience with cancer-specific clinical data sources and ontologies (diagnoses, treatments, outcomes) such as ICD-O/OncoTree
  • Exposure to private/local data sets and/or public data sets and resources (e.g. dbSNP, COSMIC, gnomAD, GDC, TCGA, 1000 Genomes)
  • Understanding of the nuances across different versions of molecular data (e.g. genome builds, transcript sources)
  • Strong data analysis skills using SQL
  • Ability to manipulate and build data pipelines using Python
  • Strong written and oral communication skills


Desired Skills

  • Organized with strong documentation skills
  • Experience working with a variety of data formats (e.g. BAM, FASTQ, VCF, MAF)


Education & Relevant Experience

  • BS/BA in Bioinformatics or a related field; PhD or MS preferred
  • 5+ years working with molecular data (genetics/genomics, transcriptomics, etc.)
  • 5+ years in a cancer-specific research or clinical setting
  • 3+ years working with SQL and Python



The above statements describe the general nature and level of work being performed in this job function.  They are not intended to be an exhaustive list of all duties, and indeed additional responsibilities may be assigned by Health Catalyst.
