Job Details
Skill Requirements:
- 8+ years of overall experience, including at least 4 years specifically in data engineer or ETL developer roles
- 5+ years in SQL: programming complex queries, dynamic SQL, stored procedures, user-defined functions, performance tuning
- 3+ years in ETL: data pipelines, ETL concepts and frameworks, data-oriented cloud architecture, data warehousing, scalable technologies (such as column-store databases and Spark)
- 3+ years in data-oriented cloud services: preferably AWS (S3, RDS, Redshift, Athena, Glue) or Databricks; related Azure or GCP experience is also acceptable
- 1+ years with Python, Pandas, and PySpark (SAS and R beneficial)
- 1+ years with the Linux command line, shell scripting, and Git / GitHub
Responsibilities:
- Program in SQL, Python, Linux bash, and cloud services APIs to automate the processing of patient-level healthcare data and aggregated public health data
- Develop and implement interactive analytic dashboards and data visualizations
- Build algorithms for fuzzy matching, de-duplication, and rule-based de-identification
- Leverage a range of cloud services, ETL frameworks, and libraries, such as AWS (EC2, RDS, S3, Lambda, Redshift, Athena, Glue), Databricks, Postgres, Spark SQL, Python, Pandas, PySpark, and Apache Airflow
- Work with stakeholders to develop key metrics across our pipelines
- Validate and track metrics down to the data source and through our pipelines
- Extract valuable insights and provide product, operations, and engineering suggestions to improve our pipelines and product

Educational Qualification: Computer Science degree from a competitive university program (alternatively, 8+ years of relevant experience with a history of skills progression and demonstrated accomplishments)
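As a rough illustration of the fuzzy-matching and de-duplication work mentioned in the responsibilities, here is a minimal sketch using Python's standard-library difflib; the record shape, field name, and 0.85 threshold are illustrative assumptions, not part of the role description:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Similarity ratio in [0, 1] between two normalized name strings.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def dedupe(records: list[dict], threshold: float = 0.85) -> list[dict]:
    # Keep the first record of any group whose names fuzzy-match above threshold.
    kept: list[dict] = []
    for rec in records:
        if not any(similarity(rec["name"], k["name"]) >= threshold for k in kept):
            kept.append(rec)
    return kept

patients = [
    {"name": "Jon Smith"},
    {"name": "John Smith"},  # near-duplicate of the first record
    {"name": "Jane Doe"},
]
deduped = dedupe(patients)  # keeps "Jon Smith" and "Jane Doe"
```

In practice this pairwise O(n²) scan would be replaced by blocking or indexing strategies at pipeline scale, but the matching idea is the same.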