Senior Site Reliability Engineer
Jobgether · Ireland
Job description
About the role
We are seeking a Senior Site Reliability Engineer to join a globally distributed team that powers one of the world’s most widely used knowledge platforms. This fully remote position offers the chance to shape large‑scale infrastructure, improve reliability, and work closely with product and engineering teams across multiple time zones.
Key responsibilities
- Perform day‑to‑day operations and DevOps tasks on large public‑facing infrastructure, including deployment, configuration, maintenance, and troubleshooting.
- Manage and optimise configuration and deployment systems using tools such as Puppet and Kubernetes.
- Automate infrastructure provisioning, service deployment, and operational workflows to boost reliability and efficiency.
- Collaborate with product and engineering teams to design scalable architectures that handle global traffic loads.
- Participate in a 24/7 on‑call rotation, handling incident response, alerts, troubleshooting, and post‑incident reviews.
- Conduct root‑cause analysis of production incidents and implement preventive measures.
- Contribute to monitoring, observability, and performance‑optimisation initiatives.
- Mentor engineers and share operational expertise within a distributed, cross‑functional team.
- Work asynchronously with global teams while ensuring clear technical communication.
Required profile
- 6+ years of experience in Site Reliability Engineering, DevOps, or infrastructure operations within complex distributed systems.
- Strong proficiency in Linux systems administration, troubleshooting, and performance tuning.
- Experience automating with scripting or programming languages such as Python, Bash, Go, or Ruby.
- Hands‑on experience with configuration management tools like Puppet or Ansible.
- Solid understanding of distributed systems, caching technologies, and system optimisation techniques.
- Experience with Linux package management on Debian‑based systems.
- Proven track record of automating operational processes and driving system improvements.
- Experience participating in incident response, post‑mortems, and reliability initiatives.
Required skills
- Linux
- Puppet
- Kubernetes
- Python
- Bash
- Go
- Ruby
- Ansible
- Debian‑based package management
- Distributed systems
- Caching technologies
Published 4 days ago
Expires 1 month from now