What We Need
We are seeking an experienced Senior Data Engineer / Data Expert with a strong background in data streaming, Apache Kafka, and Airflow to join our data engineering team. The role involves managing and enhancing a large-scale infrastructure for extensive social media data scraping, integrating that data into a data lakehouse, and coordinating across multiple data-centric teams. It requires expertise in handling complex data pipelines, identifying and resolving security vulnerabilities, and ensuring efficient storage and retrieval of data within our systems.
What You'll Do
- Design, develop, and maintain scalable data pipelines to support AI model development.
- Design, build, and maintain efficient, scalable, and reliable data pipelines using Apache Kafka, streaming services, and Airflow.
- Coordinate with the data acquisition team to ensure seamless data flow while addressing and resolving any security vulnerabilities identified in the Airflow and Kubernetes (K8s) setup.
- Implement data solutions to handle large volumes of structured and unstructured data, including videos, audio, images, and text.
- Collaborate with AI researchers, machine learning engineers, and software engineers to ensure data is available and ready for model training.
- Ensure data quality, integrity, and security throughout the data lifecycle.
- Optimize data processing workflows for performance and scalability.
- Lead initiatives to validate and manage multilingual data, specifically for Arabic and English datasets, with a focus on YouTube data.
- Coordinate model training and data validation efforts to enhance multilingual processing capabilities.
- Oversee the completion of the Phase 01 data scraping project, which includes managing data collection from social media platforms under API constraints.
- Coordinate with the team to finalize infrastructure setup for scraping 2 million hours of video from YouTube and 10 million photos from other sources.
What You Have
- Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
- 8+ years of experience in data engineering with a focus on building data pipelines.
- Proficiency in data processing technologies such as Apache Spark, Airflow, Kafka, and Hadoop, and in cloud platforms (AWS, GCP, Azure).
- Strong programming skills in Python, Java, Go, or Scala.
- Experience with SQL and NoSQL databases.
- Demonstrated ability to work on large-scale data projects and deliver results.
- Excellent problem-solving skills and attention to detail.
- Strong communication and teamwork skills.
What We Offer
- An international workforce to learn from and grow with: we have a diverse, multicultural workforce with Nordic values.
- A fair compensation package and generous annual leave of 25 days. We support our staff in being with family at the most important times. Partners with a newborn baby can take additional leave.
- Medical insurance, depending on agreement.
- Opportunities for growth and enrichment through Grow with Oivan, our internal learning and development department.
- Line devices (Mac or PC) within a fixed company budget.
- Team building activities, movie nights, and events.
Who We Are
PDPL Statement
By submitting your application and CV, you give us consent to handle and store your personal information in our information systems according to the Saudi Arabian Personal Data Protection Law. This information will be processed in line with the legal requirements and in accordance with the principles of data privacy and protection.