Data Engineer
About the Company
We are looking for a strong Data Engineer with hands-on experience in PySpark, Databricks, and Azure data platforms to design, build, and support end-to-end data pipelines. The ideal candidate will develop and optimize data transformations, build production-grade Python components, and maintain cloud-native Azure environments while collaborating with application teams to ensure high-quality, reliable data delivery. This role offers the opportunity to work with large-scale datasets, implement ETL/ELT best practices, optimize Databricks clusters, and use modern cloud technologies to support AI/ML initiatives.
About the Role
Location: Hybrid (3 days per week) in Chicago, IL
Duration: 6+ Month Contract
Interview: 2 video interviews and a final onsite interview
Responsibilities
- Design, build, and support end-to-end data pipelines, including ingestion, transformation, validation, and publishing.
- Develop and optimize SQL and PySpark/Databricks transformations for large datasets.
- Build production-grade Python modules with logging, error handling, testing, and integration with APIs/files.
- Create, maintain, and operate Azure Data Factory (ADF) pipelines, including triggers, parameterization, monitoring, and failure handling.
- Work within Azure environments: ADLS Gen2 (Blob Storage), Azure SQL, Azure App Service, and resource groups.
- Provision and maintain Azure components using Pulumi (Infrastructure as Code).
- Optimize Databricks clusters, workflows, and jobs for performance and reliability.
- Participate in code reviews, documentation, and operational support, including triage and root cause analysis.
- Collaborate with application teams for integration, troubleshooting, and operational coordination.
Qualifications
- Education: Bachelor's degree in Computer Science, Engineering, or a related technical field (or equivalent experience).
Required Skills
- Experience: 5+ years as a Data Engineer, including 3+ years working with ETL/ELT, PySpark, and SQL.
- SQL: Advanced querying, CTEs, views, joins, complex transformations, and performance tuning.
- Python: 2+ years building production-quality modules, unit/integration testing, logging, and CI/CD integration.
- Databricks: 1+ year working with notebooks, jobs, workflows, external/managed tables, Delta Lake, and basic cluster configuration.
- Azure Data Factory (ADF): 1+ year creating and maintaining pipelines, including triggers, parameterization, monitoring, and error handling.
- Azure Cloud: Hands-on with ADLS Gen2, Azure SQL, Azure App Service, and general Azure portal/resource group operations.
- Infrastructure as Code: Experience provisioning Azure resources with Pulumi.
- ETL/ELT Concepts: Strong understanding of pipeline patterns, incremental loads, data validation, and troubleshooting.
Preferred Skills
- Additional Skills (nice-to-have): R for data validation, TypeScript for Pulumi programs, Java/.NET for integration, Angular/Spring Boot for minor troubleshooting.