This listing was taken from the employer's website; HelloWorld does not guarantee that it is up to date.

Job listing has been deactivated.

Cloud Operations Lead

Chainstack

Work from home

16.05.2023.

Linux, AWS, Azure, Cloud, Kubernetes, intermediate

Chainstack is the leading suite of services connecting developers with Web3 infrastructure, powering applications in DeFi, NFT, gaming, analytics, and everything in between.

From startups to large enterprises, Chainstack enables thousands of companies to cut down the time to market, costs, and risks associated with creating and scaling decentralized applications. By offering fast, reliable, and easy-to-use infrastructure solutions distributed globally, we make sure innovators can focus on what’s important.

We are looking for an enthusiastic Cloud Operations Lead with a passion for reliability to lead the Cloud Operations team, which is responsible for keeping all user-facing services and other Chainstack production systems running smoothly.

Responsibilities:

Providing leadership and technical guidance to the Cloud Operations team of 6-10 people in multiple time zones across APAC, EU, and LATAM regions.
Owning the reliability of Chainstack production services within the scope of the Cloud Operations team
Managing day-to-day operational tasks, such as maintenance, troubleshooting, automation, and improvement projects
Driving reliability initiatives around Chainstack production and representing these activities outside the Cloud Operations team
Collaborating effectively and cross-functionally to drive the resolution of production issues at all levels
Identifying automation points and driving efficiency improvement
Identifying changes from a reliability perspective with a data-driven approach
Identifying parts of the system that do not scale, providing immediate workaround measures, and driving long-term resolution
Generating and implementing process improvements within the Cloud Operations team
Contributing to the hiring process by conducting technical interviews
Improving documentation all around, explaining the why, not stopping with the what

Requirements:

3 or more years of experience in an SRE, Cloud Operations, or Infrastructure Engineering function supporting large-scale services
Experience in operating mission-critical services, which includes being responsible for reliability (SLA/SLO) and managing incidents (monitoring, troubleshooting, escalation)
Strong production experience with Kubernetes, Helm, Terraform, monitoring solutions (Grafana, Prometheus, InfluxDB, etc.), and public cloud providers (AWS, GCP, Azure, etc.)
Proficient with Linux and the shell
Able to collaborate effectively across the organization
Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it
Have the urge to document all the things, so you don't need to learn the same thing twice
Enthusiasm for providing feedback, teaching others, and learning new techniques
Professional or personal exposure to Web3 technologies