Overview
The Lead Systems Engineer - Storage Technology is responsible for designing, leading the implementation of, and providing Level 3 (L3) expert support for extra-large-scale storage infrastructure, ensuring the highest levels of performance, scalability, and reliability. This role serves as a key subject matter expert and leads a team of engineers responsible for block, object, and file storage solutions, as well as backup services.
Responsibilities
- Co-design, lead the implementation of, and manage petabyte-scale block, object, and file storage solutions as integral components of cloud and HPC environments, ensuring stability, performance, and compliance with industry standards and best practices.
- Define the roadmap for all storage solutions and services across the company and oversee its implementation.
- Collaborate with architecture and other engineering teams on the evaluation and selection of storage and backup technology components, ensuring that solutions follow best practices and are optimized from both functional and non-functional perspectives.
- Lead regular capacity planning exercises to anticipate and accommodate the growing demands on the storage infrastructure, ensuring it meets current and future requirements.
- Develop plans to enhance the reliability of the storage infrastructure and oversee their implementation, addressing potential points of failure and ensuring high availability of storage services.
- Explore, analyze, and implement performance optimization strategies for storage solutions, ensuring optimal resource utilization.
- Lead the evaluation and integration of advanced storage technologies and methodologies, such as software-defined storage (SDS), to enhance features, performance, and efficiency.
- Define disaster recovery strategies and oversee their execution, ensuring data integrity, availability, and protection across all platforms and environments.
- Design and enhance the observability stack in collaboration with the IaaS operations team, ensuring monitoring coverage and accuracy.
- Provide L3 expert support, including on-call shifts, serving as the final tier of resolution for L2 support teams through problem analysis and communication with vendors' technical support.
- Lead and mentor a team of storage engineers and collaborate with other platform engineering teams on solution design and delivery.
- Collaborate with security management teams to ensure that systems are protected against cybersecurity threats.
- Write and maintain relevant documentation, ensuring its completeness and quality.
- Work closely with process management and operations teams, contributing to process development that standardizes the collaboration framework and improves collaboration efficiency.
Qualifications
- Bachelor's or master's degree in computer science, engineering, software engineering, or a related technology field.
- 2+ years of experience leading a team of 3+ engineers, with accountability for the quality and timely delivery of infrastructure projects.
- 7+ years of experience, with deep expertise in designing, implementing, and managing large-scale software-defined storage (SDS) solutions that provide block, object, or file storage services and backup capabilities.
- In-depth, hands-on experience implementing, managing, and optimizing storage systems from leading vendors, including but not limited to HPE, Dell, NetApp, Hitachi, IBM, Pure Storage, and VAST Data.
- Deep knowledge of storage protocols that provide block, object, and file storage interfaces, such as iSCSI, FC/FCoE, NVMe over TCP, NFS, and S3.
- Proficiency with Linux, including the kernel and storage stack, and the ability to debug related issues.
- Advanced experience managing object storage solutions based on SeaweedFS, MinIO, Cloudian HyperStore, Qumulo S3, Scality RING, or Dell ECS.
- Experience with cloud-native backup solutions for OpenStack (e.g., Freezer, Karbor, Trilio, Hystax, Raksha) is highly desirable.
- Experience designing and managing clustered/parallel file systems, such as Lustre and GPFS, is highly desirable.
- Familiarity with containerization technologies (OpenShift, Docker, Kubernetes, etc.) and container storage technologies (Rook, CSI, PVCs, etc.).
- Experience integrating identity management, access management, and authorization solutions (PKI, LDAP, OAuth, OpenID).
- In-depth knowledge of backup systems, disaster recovery principles, and data protection strategies.
- Familiarity with load balancer technologies for object storage solutions.
- Deep knowledge of data encryption, security practices, and hardening for storage and backup systems.
- Solid knowledge of data center network design and related technologies (OSI model, TCP/IP stack, firewalling, routing, VLAN/VXLAN, etc.).
- Hands-on experience with monitoring and observability tools like Zabbix, Nagios, Prometheus, Grafana, ELK Stack (Elasticsearch, Logstash, Kibana).
- Understanding of CI/CD principles, the Infrastructure as Code (IaC) approach, and software-defined infrastructure solutions.
- Advanced programming and scripting skills in Python and/or Golang, as well as Bash.