Senior Site Reliability Engineer

MentorMate

Work from home

14.12.2023.

.NET C# Java Python AWS Ansible DevOps NodeJS Golang Cloud senior

MentorMate creates durable technical solutions that deliver digital transformation at scale by blending strategic insights and thoughtful design with brilliant engineering. The company has completed over 1,500 projects and has global technological hubs in Europe and North and South America. With mature and established practices in enterprise web and mobile development, quality engineering, technical architecture, human-centered design, cloud, DevOps, data, and analytics, we provide contract-based career opportunities, competitive pay, and flexibility.

About the role

We are looking to hire a Senior Site Reliability Engineer to join an innovative project in the healthcare domain for one of the largest pharmaceutical companies in the world. The focus of the role is to enhance the reliability and performance of applications with automation. We regularly refine our methods of work, tools, and technologies. We value independence, natural curiosity, ownership, and the desire for constant improvement.

About the team

The seasoned experts in our cloud & DevOps team provide secure, flexible, and time-efficient solutions to our clients, whether shortening the development runway, modernizing legacy systems, or taking a more cost-effective approach to technology with the cloud. Our cloud & DevOps team's projects are HIPAA, PCI-DSS, or SOC2 compliant and are based on a well-architected framework focusing on security, sustainability, and scalability.

Responsibilities

You will co-own critical production service designs to ensure high reliability is achievable and measurable
You’ll drive reliability and observability improvements in the services within the engineering verticals
Using monitoring and telemetry data, you’ll help teams make informed decisions on where reliability challenges may exist and help design and build solutions to improve them
You’ll build and improve internal tools and automation software to make maintaining production services easier and safer
You’ll champion and lead reliability-focused practices such as Failure Analysis, Load and Capacity Planning, Service Reviews, Architecture Designs, Incident Postmortems, and others
You will build SRE dashboards from SLIs to measure SLO adherence
You'll define (from design to implementation details) the necessary auto-healing and fault-tolerant systems
You'll act as the point of contact for production application issues, working closely with engineering leadership

Requirements

4+ years of experience in an Infrastructure, SRE, DevOps, or CloudOps role
OR 4+ years of experience in System Administration
Experience programming in one or more of the following: C#, Java, Python, .NET, NodeJS, Go, or a similar programming language, or with automation tools such as Terraform or Ansible
Experience with at least one cloud platform (e.g., AWS) and its core services: compute/containers, storage, etc.
Experience with cloud-performant microservices and event-driven architectures
Understanding of information security concepts and terminology
Distributed monitoring experience: logging, metrics, tracing, etc.
Strong knowledge of software development methodologies and passion for creating high-standard tool sets for infrastructure-as-code
Ability to analyze problems quickly and find suitable solutions based on available resources
A proactive and open-minded individual with a clear-cut client focus and structured approach
Very good level of English

Why take this opportunity

Remote Work Model: Freedom to work remotely with a globally-minded team
Global Tech Community: Work in an experienced team using the latest technologies
Exciting Career Prospects: Enterprise projects that set standards and save lives
Competitive Pay: Feel satisfied with the negotiated terms
No Intermediaries: Direct communication with our teams