Design, deploy, and manage our Kubernetes platform to support scalable and reliable application deployments. Monitor and maintain the platform's health, performance, and security.
Oversee the deployment of our Software-as-a-Service (SaaS) applications on the Kubernetes platform. Implement best practices for application scalability, high availability, and disaster recovery.
Implement robust monitoring, alerting, and logging systems to proactively identify and resolve potential issues. Ensure high system availability and quick incident response times.
Continuously optimize the Kubernetes infrastructure and SaaS applications to achieve maximum performance and efficiency. Conduct performance testing and tuning to meet or exceed service level objectives (SLOs).
Participate in an on-call rotation to respond to incidents promptly and effectively.
Conduct thorough post-incident reviews to identify root causes and implement preventive measures.
Develop and maintain automation tools and scripts to streamline processes and improve the efficiency of operational tasks.
Implement security best practices for Kubernetes and SaaS applications.
Collaborate with the security team to ensure compliance with industry standards and regulations.
Work closely with cross-functional teams, including development, infrastructure, and product management, to provide expertise and support throughout the software development lifecycle.
Identify areas for improvement in the infrastructure, processes, and deployment methodologies. Propose and implement enhancements to increase system reliability and performance.