About the role
We are looking for an SRE / DevOps Engineer – Observability & Cloud to help design, build, and operate scalable observability and monitoring platforms across infrastructure, network services, and applications.
You will work closely with DevOps, platform, and engineering teams to improve system reliability, performance, and operational visibility through modern monitoring, logging, and telemetry solutions.
Key Responsibilities:
- Design and maintain observability platforms for metrics, logs, and distributed tracing
- Build telemetry pipelines to collect data from infrastructure, network devices, and applications
- Develop dashboards and alerts for infrastructure and service reliability
- Implement centralized logging solutions and log ingestion pipelines
- Support application performance monitoring (APM) and distributed tracing
- Implement synthetic monitoring to proactively detect service degradation
- Collaborate with engineering teams to implement instrumentation and monitoring best practices
- Automate infrastructure and monitoring configurations using infrastructure-as-code
- Support incident investigation and root cause analysis
Requirements:
- 3+ years of experience in SRE, DevOps, Platform Engineering, or Infrastructure Engineering
- Experience with observability tools such as Datadog, Splunk, Prometheus, or Grafana
- Experience with cloud platforms (AWS, Azure, or GCP)
- Experience with containerized environments such as Kubernetes
- Experience with automation tools such as Terraform or Ansible
- Scripting experience (Python, Bash, or similar)
Nice to have:
- Experience with OpenTelemetry
- Experience with distributed tracing
- Familiarity with reliability concepts such as SLIs and SLOs
Thank you for your interest in this position. Please note that only candidates whose qualifications closely match our requirements will be contacted.