Job Purpose:
Lead the development of new and transformed cloud services to meet product requirements
Lead daily operations, including environment automation, system monitoring, and logging, applying the principles of DevOps, Agile, and software engineering practices
Manage and report on critical metrics regarding the status and SLA compliance of cloud services
Provide technical support and direction regarding new or existing technologies, operational standards, and staff development
Use established change management processes and disaster recovery plans so that operational procedures are performed with minimal customer impact.
Drive continuous improvement in systems operations through tool building and automation
Ensure efficient and scalable artifact deployments to production servers using automation scripts and other deployment tools
Monitor the SaaS environment and work with QA, developers, and Ops to identify and solve problems
Respond to and resolve technical emergencies, with availability for on-call after-hours support when required.
Design, manage, and implement application deployments that adhere to CI/CD methodology.
Architect and develop automation solutions across Amazon AWS, Microsoft Azure, and Google Cloud.
Implement monitoring tools (New Relic, AppSignal), notification setups, webhooks, etc.
Conduct audits and drills on the environments to verify system reliability and notification delivery.
Onboard and manage security solutions (WAF, SSL pinning, Argo Tunnel) and CDNs (Cloudflare/Akamai).
Work closely with the development and QA groups to support, improve, and integrate automated solutions into automation frameworks.
Automate configuration of various big data distributions such as Cloudera, HDP, or EMR and their various components such as Hadoop, Spark, etc.
Monitor costs, identify areas where they can be reduced, and recommend them to the reporting manager.
Other automation and analysis duties as assigned by the line manager.
Key Responsibilities: (What you will be responsible for)
Lead end-to-end DevOps activities.
Develop and support full-stack platform infrastructure initiatives in a complex environment.
Implement, monitor, and support all deployment activities, including Docker containers and monitoring tools (AppSignal, New Relic, etc.).
Generate scripts to improve the efficiency of deployment & monitoring activities as and when required.
Ensure efficient and scalable artifact deployments to production servers using automation scripts and other deployment tools
Provide technical support for all applications as required (after-hours support is expected).
Recommend enhancements that will produce greater stability, scalability, and throughput for the platform.
Provide input to Development and Operations team members when new architectures, designs, and/or operational models are being formulated.
Provide technical expertise to other teams as you build specific expertise in deployment and operational best practices for the platform.
Architect scalable technical solutions, lend technical guidance on DevOps best practices, and support activities that span the stages of the DevOps pipeline (planning, dev/test, release, and monitoring).
Manage, optimize, monitor, and report on the infrastructure environment, and research the latest development trends and technologies.
Generate weekly, monthly, and quarterly reports (finance and usage stats).
Skills Required (What you will need)
5+ years of Site Reliability, DevOps, and/or Software Development experience, including experience in a growth-stage environment.
2+ years experience in architecting solutions within Amazon Web Services (AWS) and other
2+ years of experience in a senior SRE/DevOps role with cloud providers.
5+ years of experience with deployment orchestration, automation, and security configuration management (Jenkins, Puppet, Ansible, Chef, Terraform, etc.) preferred.
5+ years of experience supporting the following datastores/databases: MS SQL Server, Cosmos DB, MySQL, MongoDB, PostgreSQL, Cassandra
4+ years of experience with CI/CD (CircleCI) and container orchestration (Docker, Kubernetes).
4+ years of experience working across EC2, S3, Route 53, ELB, CloudFront, and CloudFormation
3+ years of experience with containers and serverless architectures
2+ years of experience with scripting languages such as Python and Ruby
3+ years of experience with distributed source control such as Git, including branching, merging, and release management
3+ years of experience managing web security solutions (WAF) and CDNs.
Working knowledge of related technologies including encryption, IPsec, VLANs, VPNs, routing, firewalls, proxy services, LAN/WAN connectivity
Expertise in Big Data platform engineering; experience with MySQL and HBase strongly preferred
Strong written and verbal communication and presentation skills.
Experience in automating the configuration of various big data distributions such as Cloudera, HDP, or EMR and their various components such as Hadoop, Spark, etc.
Experience working with the Scrum framework in an Agile environment
Bachelor's degree in computer science engineering or equivalent.
Cloud architecture-related certifications (AWS Certified Solutions Architect - Associate/Professional, AWS Certified SysOps/DevOps Engineer - Professional)
ITIL v3 / Agile DevOps certification is highly desired.