Job Description
Innovation is at the heart of what we do. At Agolo, natural language processing, machine learning, and data are at the core of our work.
We are looking for a Senior DevOps / MLOps Engineer to join our team in Cairo. As an engineer at Agolo, you will work closely with our data science and product teams to build the next-generation AI summarization platforms.
As a Senior DevOps / MLOps Engineer, you will:
Design, implement, and manage our various cloud environments, including client environments.
Develop, maintain and optimize infrastructure and deployment processes for AI-based applications in the cloud.
Advise on the selection and implementation of adequate MLOps toolchains and technologies in the cloud.
Meet with our clients to understand deployment requirements and limitations.
Work with our product and sales teams to design and implement SLIs, SLAs, and SLOs.
Identify projects that result in substantial cost savings or revenue.
Implement Infrastructure as Code on Kubernetes using Helm and Terraform.
Continuously enhance our monitoring and alerting to prevent incidents.
Practice sustainable incident response and blameless postmortems.
Achieve continuous integration and deployment for all of our services.
Automate ML training and testing processes.
Work closely with data scientists to understand their requirements and ensure a seamless workflow for model development and deployment.
Qualifications:
4+ years of industry experience.
Experience with at least one of the cloud providers: AWS, Azure, or GCP.
Experience in developing and operating software and automation systems with AI components.
Solid experience with container runtimes and orchestrators: Docker and Kubernetes.
Know your way around Linux and the Unix Shell.
Strong knowledge of debugging networking issues within Kubernetes.
Solid experience with continuous integration systems using Git and CircleCI.
Strong knowledge of ArgoCD.
Confident in debugging and writing scripts in Python and Bash.
Very strong verbal and written communication skills in English are a must.
A strong sense of ownership and drive.
Preferred qualifications (experience in):
Large-scale distributed systems: Kafka and Elasticsearch
Operating microservice architectures
Security
Helm
Monitoring and dashboard tools: Prometheus and Grafana
Designing tests and quality assurance
Infrastructure as code tools like Terraform
What we offer:
Competitive compensation packages
Highest-tier social and health insurance
Flexible and open vacation policy
Flexible working hours
Certification and learning budget
Our interview process is as follows:
A screening call (30 mins).
1-2 technical interviews with a member of the Reliability Engineering team (60 mins).
Culture interview (60 mins).
3 reference calls.
Final meeting with our VP of Engineering or CTO (60 mins).
Desired Candidate Profile
Education:
Bachelor of Technology/Engineering
Gender:
Not mentioned
Nationality:
Any Nationality