Site Operations Engineer
Belgrade, Serbia
Sysdig is the secure DevOps company, and we’re at the forefront of the container and Kubernetes revolution. We are passionate, technical problem-solvers, continually innovating and delivering powerful solutions to secure and operate cloud-native applications in production. Our consistent contributions to open source software projects reflect our commitment to the open cloud movement.
We value diversity and open dialog to spur ideas, working closely together to achieve goals. And we're a great place to work too — we were awarded the 2019 Bay Area Best Places to Work Award from San Francisco Business Times and the Silicon Valley Business Journal. We are looking for team members who share our commitment to customers and are willing to dig deeper, understand problems and deliver innovative solutions. Does this sound like the right place for you?
YOUR OPPORTUNITY
As a Site Operations Engineer, you’ll be responsible for the availability, performance, and resilience of the Sysdig platform in our largest on-premises customer environments. You will collaborate with high-performing infrastructure and engineering teams, both within Sysdig and at customer organizations, to help drive the scalability and stability of our platform.
YOUR RESPONSIBILITIES
- Participate in a globally distributed team of Site Operations Engineers, supporting multiple Sysdig application stacks across our most critical on-premises customers
- Manage the services that make up the Sysdig platform (Kubernetes, Cassandra, Elasticsearch, Redis, etc.)
- Implement disaster recovery and reliability improvement initiatives, including performance tuning and infrastructure optimization
- Maintain and support the production environments and communicate directly with customer stakeholders
- Participate in an on-call rotation with other Site Operations Engineers
YOUR BACKGROUND
- Experience managing Kubernetes clusters in a production environment
- Experience with container runtimes such as Docker, rkt (Rocket), or containerd
- Aptitude for troubleshooting complex problems in high-throughput web applications and network services
- Solid understanding of Linux systems and networking
- Experience in diagnosing and troubleshooting customer-facing production service outages
- Command of a scripting language such as Python or Bash
- Strong sense of ownership and a focus on customer delight
- Experience managing any of these clustered systems: Cassandra, Elasticsearch, Kafka, Redis, HBase
- Proficiency with infrastructure-as-code and configuration management tools. We love Terraform, but you may have experience with Puppet, Chef, or SaltStack
- Experience creating and tuning Kafka, Cassandra, or Redis clusters
- Experience with log aggregation services such as Elasticsearch or Splunk
- Experience supporting a customer-facing product hosted in a public or private cloud ecosystem
KEY TECHNOLOGIES
Kubernetes, Docker, Python, Cassandra, Kafka, Terraform, public/private cloud ecosystems
Deadline for applications: 30.05.2020.