Job Objective:
The Data Engineer will be responsible for developing real-time and batch data integration pipelines, ensuring that they adhere to defined standards and fit within the established data integration framework. The Data Engineer will work closely with data scientists, analysts, and other stakeholders to ensure data is available for analysis and decision-making, making a significant impact on our data-driven initiatives.
Key Role Responsibilities:
- Data Pipeline Development: Create and maintain scalable and efficient data pipelines for data extraction, transformation, and loading (ETL) processes, ensuring data accuracy and consistency.
- Data Warehousing: Collaborate with Senior Data Engineers and Data Architects to design and manage data warehouses and data lakes, storing and organizing large volumes of structured and unstructured data.
- Data Modeling: Develop data models and schema designs that optimize data storage and retrieval, supporting various analytical and reporting needs.
- Data Integration: Integrate data from multiple sources, including APIs and external databases, for real-time and batch data pipelines.
- Data Quality Assurance: Implement data quality checks, validation, and cleansing processes to maintain data integrity.
- Documentation: Create and maintain documentation for data engineering processes, data schemas, and ETL workflows.
- Performance Optimization: Monitor and fine-tune data pipelines to ensure optimal performance and efficiency.
- Collaboration: Work with cross-functional teams, including data scientists, analysts, and software engineers, to understand data requirements and deliver solutions.
Qualifications, Experience, Skills & Competencies:
Minimum Qualifications:
- Graduate or post-graduate degree in Computer Science or a related field is preferred.
Minimum Experience:
- Minimum of 3 years of IT experience with at least 2 years in a similar role.
Job-Specific Skills:
- Knowledge of big data technologies such as Hadoop, Spark, Kafka, and related ecosystems.
- Proficiency in SQL and experience with data warehousing platforms such as Snowflake, Redshift, or BigQuery.
- Strong experience with ETL/ELT tools such as Alteryx, Talend, or Informatica.
- Proficiency in programming languages, particularly Python, Java, or Scala.
- Experience with database technologies, including SQL and NoSQL databases.
- Familiarity with cloud platforms and services such as AWS, Azure, or Google Cloud.
- Experience with data orchestration tools like Apache Airflow or Informatica.
Behavioral Competencies:
- Practical understanding of agile methodologies.
- Ability to work in small teams structured around delivering data products.
- Proven problem-solving skills for unstructured, uncertain problems.
- Data-oriented, analytical, performance-driven, and assertive.
Technical Competencies:
- ETL Tools (Informatica, Google Dataflow, Data Factory): Expert
- Real-Time Streaming (Apache Kafka, Google Pub/Sub): Expert
- Data Modeling / Data Integration: Advanced
- Google BigQuery (GBQ), Synapse, or Oracle Autonomous DWH: Advanced
- Power BI / Alteryx: Basic
- Python / Jupyter Notebooks: Advanced