Job Overview
We are seeking a versatile Data Scientist to join our team. The ideal candidate has extensive hands-on experience with Azure ML Studio, Databricks, and Cosmos DB, and with techniques including NLP, generative AI (GenAI), and OCR. The role involves leading complex data projects, building and deploying machine learning models, and deriving actionable insights from large and diverse datasets.
Responsibilities
- Azure ML Studio & Databricks: Build, deploy, and monitor machine learning models using Azure ML Studio and Databricks, ensuring robust pipelines for data ingestion and model retraining.
- Cosmos DB: Use Cosmos DB for scalable, efficient storage, processing, and querying of large datasets.
- Pipeline Development: Design, build, and optimize end-to-end machine learning pipelines, ensuring efficient data flow and integration with various systems.
- Social Media API Integration: Integrate with APIs from platforms such as Telegram, Twitter, and Meta to collect, preprocess, and analyze social media data.
- Web Crawling & Scraping: Perform web crawling and scraping to gather structured and unstructured data from multiple sources for analysis.
- Geolocation Data Analysis: Analyze geolocation data to extract meaningful insights and patterns.
- Data Preprocessing: Handle large, complex datasets with advanced preprocessing techniques, ensuring data quality, consistency, and readiness for modeling.
- Generative AI (GenAI) Models: Implement and fine-tune GenAI models for tasks such as text generation, summarization, and conversational AI.
- Optical Character Recognition (OCR): Develop and deploy OCR solutions that accurately recognize handwritten Arabic text in images.
- Multiprocessing & Multithreading: Write optimized, parallelized Python code for compute-intensive workloads using multiprocessing and multithreading.
- Natural Language Processing (NLP): Design and implement advanced NLP models to analyze, interpret, and extract information from textual data.
- Data Visualization: Use data visualization tools such as Power BI to build dashboards and reports that present key insights to stakeholders.
- Cross-functional Collaboration: Work closely with engineering, product, and business teams to ensure smooth integration and deployment of solutions in real-world environments.
Qualifications
- Experience: 6+ years in data science and machine learning, including deploying data-driven solutions.
- Python: Proficient in Python, with hands-on experience using Anaconda and Jupyter Notebooks.
- Cloud Experience: Extensive experience working with Azure ML Studio, Databricks, and Cosmos DB.
- NLP & GenAI: Strong background in Natural Language Processing (NLP) and Generative AI models.
- OCR & Web Crawling: Proven experience in OCR, particularly for handwritten Arabic text, and in web crawling techniques.
- Data Pipelines: Proficiency in building and maintaining scalable data pipelines.
- Social Media APIs: In-depth experience integrating with social media APIs (Telegram, Twitter, Meta).
- Data Visualization: Familiarity with data visualization tools such as Power BI.
- Parallel Processing: Experience with multiprocessing and multithreading to optimize computational tasks.
Preferred Skills
- Strong analytical skills with attention to detail.
- Excellent communication and presentation skills.
- Ability to work in a fast-paced, dynamic environment.