Job Summary
The Data Engineer is responsible for designing, developing, and managing data pipelines and architectures. This role ensures seamless data flow from multiple sources into databases, data lakes, or warehouses, supporting business analysis and decision-making. The ideal candidate will have strong software engineering skills, experience with data processing frameworks, and expertise in optimizing and scaling large data systems.
Key Responsibilities
Design, build, and maintain scalable data pipelines to support data collection, transformation, and integration from various sources.
Maintain data systems such as databases, warehouses, and lakes, ensuring integrity, scalability, and reliability.
Implement efficient ETL/ELT processes to automate data collection, cleaning, and transformation.
Integrate data from internal and external sources, including APIs and third-party systems.
Optimize data architectures for performance and cost efficiency.
Collaborate with the data team to deliver reliable, clean, and accessible data solutions.
Implement validation and monitoring processes to ensure data accuracy and integrity.
Manage and optimize cloud-based infrastructure for data storage and processing.
Automate repetitive tasks using scripting languages like Python.
Document data architectures and workflows for technical clarity and troubleshooting.
Collaborate with business and technical stakeholders to define requirements and translate them into actionable deliverables.
Perform user testing and live verification to ensure error-free deployments.
Provide support for data extraction and reporting as needed.
Qualifications
Bachelor's degree in Computer Engineering, Data Science, Statistics, or a related IT field.
Knowledge of Agile methodologies (e.g., Scrum, Kanban).
Experience with cloud computing platforms (e.g., Azure, AWS).
Proficiency in programming and query languages such as SQL, Python, and JavaScript, and in tools such as Git and the Linux command line.
Familiarity with software and platforms like Power BI, Microsoft SQL Server, Databricks, and Azure.
Expertise in ETL/ELT processes, API integration, data modeling, and data quality governance.