About the Company

We are looking for a strong Data Engineer with hands-on experience in PySpark, Databricks, and Azure data platforms to design, build, and support end-to-end data pipelines. The ideal candidate will develop and optimize data transformations, build production-grade Python components, and maintain cloud-native Azure environments while collaborating with application teams to ensure high-quality, reliable data delivery. This role offers the opportunity to work with large-scale datasets, implement ETL/ELT best practices, optimize Databricks clusters, and leverage modern cloud technologies to support AI/ML initiatives.

About the Role

Location: Hybrid (3 days per week) in Chicago, IL

Duration: 6+ Month Contract

Interview: Two video interviews and a final onsite interview

Responsibilities

  • Design, build, and support end-to-end data pipelines, including ingestion, transformation, validation, and publishing.
  • Develop and optimize SQL and PySpark/Databricks transformations for large datasets (see the sketch after this list).
  • Build production-grade Python modules with logging, error handling, testing, and integration with APIs/files.
  • Create, maintain, and operate Azure Data Factory (ADF) pipelines, including triggers, parameterization, monitoring, and failure handling.
  • Work within Azure environments: ADLS Gen2 (Blob Storage), Azure SQL, Azure App Service, and resource groups.
  • Provision and maintain Azure components using Pulumi (Infrastructure as Code).
  • Optimize Databricks clusters, workflows, and jobs for performance and reliability.
  • Participate in code reviews, documentation, and operational support, including triage and root cause analysis.
  • Collaborate with application teams for integration, troubleshooting, and operational coordination.
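
To give candidates a concrete feel for this work, below is a minimal, hypothetical sketch of the kind of incremental PySpark/Delta Lake load described above. Every table name, path, and column (orders, updated_at, order_id, the ADLS Gen2 URIs) is illustrative, not from an actual pipeline.

    # Hypothetical sketch only: names, paths, and columns are illustrative.
    import logging

    from pyspark.sql import SparkSession, functions as F

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("orders_pipeline")

    spark = SparkSession.builder.appName("orders-incremental-load").getOrCreate()

    # Ingest only rows newer than the last processed watermark.
    last_watermark = "2024-01-01T00:00:00"  # in practice, read from a control table
    raw = (
        spark.read.format("delta")
        .load("abfss://raw@examplelake.dfs.core.windows.net/orders")
        .filter(F.col("updated_at") > F.lit(last_watermark))
    )

    # Validate: drop rows missing the primary key and log how many were rejected.
    valid = raw.filter(F.col("order_id").isNotNull())
    rejected = raw.count() - valid.count()
    if rejected:
        log.warning("Rejected %d rows with null order_id", rejected)

    # Publish a deduplicated, timestamped slice to the curated zone.
    (
        valid.dropDuplicates(["order_id"])
        .withColumn("ingested_at", F.current_timestamp())
        .write.format("delta")
        .mode("append")
        .save("abfss://curated@examplelake.dfs.core.windows.net/orders")
    )

In a real pipeline, the watermark would be persisted back to a control table and the job scheduled as a Databricks workflow or an ADF activity.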

Qualifications

  • Education: Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent experience).

Required Skills

  • Experience: 5+ years as a Data Engineer, including 3+ years of hands-on ETL/ELT, PySpark, and SQL work.
  • SQL: Advanced querying, CTEs, views, joins, complex transformations, and performance tuning.
  • Python: 2+ years building production-quality modules, unit/integration testing, logging, and CI/CD integration.
  • Databricks: 1+ year working with notebooks, jobs, workflows, external/managed tables, Delta Lake, and basic cluster configuration.
  • Azure Data Factory (ADF): 1+ year creating and maintaining pipelines, including triggers, parameterization, monitoring, and error handling.
  • Azure Cloud: Hands-on with ADLS Gen2, Azure SQL, Azure App Service, and general Azure portal/resource group operations.
  • Infrastructure as Code: Experience provisioning Azure resources with Pulumi (see the sketch after this list).
  • ETL/ELT Concepts: Strong understanding of pipeline patterns, incremental loads, data validation, and troubleshooting.
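
As a rough illustration of the Pulumi expectation above, this sketch provisions a resource group and an ADLS Gen2-capable storage account with the pulumi-azure-native Python SDK. The resource names are hypothetical, and exact argument names can vary across SDK versions.

    # Hypothetical sketch using the pulumi-azure-native Python SDK; resource
    # names are illustrative and arguments may differ by SDK version.
    import pulumi
    from pulumi_azure_native import resources, storage

    # Resource group to hold the data-platform components.
    rg = resources.ResourceGroup("data-platform-rg")

    # Storage account with the hierarchical namespace enabled (ADLS Gen2).
    account = storage.StorageAccount(
        "datalake",
        resource_group_name=rg.name,
        sku=storage.SkuArgs(name=storage.SkuName.STANDARD_LRS),
        kind=storage.Kind.STORAGE_V2,
        is_hns_enabled=True,  # this flag is what makes the account ADLS Gen2
    )

    # Export the account name so other stacks or pipelines can reference it.
    pulumi.export("storage_account_name", account.name)

Enabling the hierarchical namespace (is_hns_enabled) is what distinguishes an ADLS Gen2 data lake from a plain blob storage account.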

Preferred Skills

  • R for data validation, TypeScript for Pulumi pipelines, Java/.NET for integration, and Angular/Spring Boot for minor troubleshooting.