Company: AI71
Role: LLM Engineer
Location: Abu Dhabi, UAE
About Us:
AI71 is an applied research team dedicated to creating helpful and responsible AI agents for knowledge workers.
Working closely with our industry partners, our cross-functional teams of AI experts build products grounded in the cutting-edge research of our colleagues from the Technology Innovation Institute (TII).
Job Description:
As a Senior LLM Engineer, you will be responsible for the end-to-end development, optimization, and deployment of large language models. You'll work on challenging problems at the intersection of deep learning, natural language processing, and distributed computing.
What You'll Do:
- Analyze large and complex datasets to extract meaningful insights and inform data-driven decision-making
- Develop, train, and deploy predictive models to enhance the capabilities of our AI solutions
- Collaborate with cross-functional teams to understand business objectives and translate them into actionable data science tasks
- Design and implement advanced LLM architectures, including transformer-based models and their variants
- Develop novel attention mechanisms and positional encoding schemes
- Experiment with model scaling techniques and efficient architectures (e.g., MoE, sparse transformers)
- Continuously evaluate and improve existing models based on real-world performance and evolving business needs
- Implement and optimize distributed training pipelines for large-scale models
- Develop strategies for efficient fine-tuning, including parameter-efficient techniques (e.g., LoRA, prefix tuning)
- Apply advanced optimization techniques such as mixed-precision training and gradient accumulation
- Optimize models for inference, including quantization and pruning techniques
- Implement efficient serving solutions for real-time inference
- Develop strategies for model compression and knowledge distillation
- Develop task-specific algorithms for applications such as text classification, named entity recognition, and question-answering
- Work with MLOps teams to design and maintain training and serving infrastructure
What You'll Bring:
- 5+ years of experience in deep learning and NLP, with a focus on large language models
- Master's or Ph.D. in Data Science, Statistics, Computer Science, or a related field
- Expert-level proficiency in Python and at least one deep learning framework (PyTorch, TensorFlow, or JAX)
- Strong understanding of transformer architectures, attention mechanisms, and recent advancements in LLMs
- Experience with distributed training frameworks (e.g., DeepSpeed, Megatron-LM)
- Proficiency in optimizing model performance using techniques like mixed-precision training, gradient checkpointing, and model parallelism
- Understanding of NLP algorithms such as tokenization, parsing, and semantic analysis
- Experience with sequence-to-sequence models and self-supervised learning techniques
- Experience with both SQL and NoSQL databases for managing training data and model artifacts
Why AI71:
- Proven performance of our large language models
- Strong traction and adoption from the open-source community
- Secured proprietary data to build specialized, distinctive models
- Secured large-scale compute capacity to support our roadmap
- Signed anchor clients to develop POCs and demonstrate our solutions