Senior Site Reliability Engineer
1 day ago
Your Role
Are you passionate about observability and resiliency? Is making sure we know about issues before our customers do second nature to you? Does being at the front and orchestrating processes sound fun to you? emnify is seeking a talented Reliability Engineer & Incident Management Operator to drive the company's incident management routines, be the authority on all things observability and resiliency, and guide internal stakeholders with best practices.
As part of the larger Engineering department, our Platform team plays a crucial role in enhancing our competitive edge by improving developer experience, which increases development efficiency and scales productivity. You will join a team of 3 engineers, fostering empathy and a collaborative mindset to continuously improve the developer experience at emnify. The ideal candidate has extensive experience with AWS cloud infrastructure, microservices, and modern observability practices, as well as strong communication and organizational skills.
The position is split into 35% incident management operations, 35% observability and monitoring, and 30% platform engineering and developer support.
emnify technology radar
The position is based in emnify's office in Berlin.
Your Impact:
- Incident management operations:
Lead and optimize the incident management process end-to-end: ensure timely detection, resolution, and documentation of incidents; coordinate cross-functional teams; conduct post-mortems and root cause analyses; and drive continuous improvement of workflows.
- Observability and monitoring:
Design, implement, and continuously improve observability frameworks by developing dashboards, alerts, metrics, and logging strategies to monitor service health, detect anomalies proactively, support issue resolution, and ensure cost-optimized performance across the platform.
- Collaboration and support:
Partner with cross-functional teams to implement observability best practices, providing training and guidance on tools while leveraging metrics data to drive engineering priorities.
- Platform engineering:
Use AWS to design, build, and maintain resilient cloud infrastructure, applying best practices for security, scalability, and cost optimization while ensuring high availability, disaster recovery, and robust platform components such as pipelines, shared infrastructure, and application services.
Your Skills:
Proven experience as a (Site) Reliability Engineer or similar role in a SaaS and/or telecom company.
Hands-on experience with observability tools (e.g., Prometheus, Mimir, Grafana, Loki, CloudWatch, Grafana IRM, Rootly), including setup and optimization of metrics and alerts.
Experience in establishing and managing incident management processes.
Understanding of incident management frameworks and best practices.
Extensive experience with AWS cloud services (e.g., EC2, S3, RDS, Lambda, CloudWatch).
Expert skills with modern infrastructure tooling and principles (e.g., Kubernetes, IaC with Terraform, CI/CD with GitHub Actions or Jenkins).
Good understanding of modern development tooling and principles (e.g., microservices architecture, 12-factor applications, Docker).
Advanced documentation skills for effective knowledge sharing and collaboration.
Exceptional problem-solving and critical-thinking skills, with a passion for enhancing developer experience in fast-paced tech environments.
Ability to work independently and as part of a team.
Nice to have:
Knowledge of networking protocols and telecom systems
Knowledge of secure software development
Familiarity with programming languages such as Python, Go, or Java.
Certification in AWS (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect)
-
Senior Site Reliability Engineer
1 week ago
Berlin, Berlin, Germany · KOMBO · Full-time · €100,000 - €150,000 per year
Senior Site Reliability Engineer (Database) @ Kombo, Berlin (On-site) · Full-time. TL;DR: Join Kombo as one of our first Database Reliability Engineers. You'll take ownership of our Postgres infrastructure, ensuring performance, scalability, and reliability as we grow. High impact, high autonomy, and the chance to shape Kombo's database reliability practices from...
-
Senior Site Reliability Engineer
2 weeks ago
Berlin, Berlin, Germany · Kombo · Full-time · €80,000 - €120,000 per year
Senior Site Reliability Engineer (Database) @ Kombo, Berlin (On-site) · Full-time. TL;DR: Join Kombo as one of our first Database Reliability Engineers. You'll take ownership of our Postgres infrastructure, ensuring performance, scalability, and reliability as we grow. High impact, high autonomy, and the chance to shape Kombo's database reliability practices from...
-
Senior Site Reliability Engineer
1 week ago
Berlin, Berlin, Germany · Kombo · Full-time · €80,000 - €120,000 per year
Senior Site Reliability Engineer (SRE) @ Kombo, Berlin (On-site) · Full-time. TL;DR: Join Kombo as one of our first Senior SREs. You'll work on reliability, scale our infrastructure, and help define how SRE is done at Kombo, while staying hands-on. High impact, high autonomy, and the chance to shape (and later lead) our growing platform/SRE function. Why You...
-
Site Reliability Engineer
24 hours ago
Berlin, Berlin, Germany · Hirefive · Full-time · €60,000 - €120,000 per year
Site Reliability Engineer. Our growing user base demands cheap, fast, and highly available web hosting, and we need you to make it possible. Join us as a full-time Site Reliability Engineer. This position will offer you personal and professional development, startup insights, and the opportunity to be part of one of the most inspiring deep-tech startups. You...
-
Senior Site Reliability Engineer
2 weeks ago
Berlin, Berlin, Germany · Ageras · Full-time · €60,000 - €120,000 per year
About the Role: We're looking for a Senior Site Reliability Engineer (SRE) to join our Infrastructure team. This is a long-term position to replace a recent departure and strengthen our capacity as we scale. As part of the team, you'll play a crucial role in maintaining and improving the reliability, security, and scalability of our cloud infrastructure. You'll...
-
Senior Site Reliability Engineer
22 hours ago
Berlin, Berlin, Germany · Scout24 SE · Full-time · €60,000 - €120,000 per year
Why Scout24? Scout24 is home to ImmoScout24, Germany's #1 for real estate. With ImmoScout24 we have been revolutionizing the real estate market in Germany and Austria for more than 25 years. Our goal is to build a digital ecosystem that brings homeowners, seekers, and agents together. Finding the right home and property is one of the most important decisions...
-
Site Reliability Engineer
7 days ago
Berlin, Berlin, Germany · Wire · Full-time · €70,000 - €95,000 per year
WHO WE ARE: We are looking for a Site Reliability Engineer / Systems Engineer to complement our Deployment Operations Team. In this role, you will build, improve, and manage our automations and deployment infrastructure to ensure the reliability, resilience, availability, and observability of our product. Join us at Wire, the leading end-to-end encrypted...
-
Site Reliability Engineer
5 days ago
Berlin, Berlin, Germany · Blackfluo · Full-time · €84,000 - €85,000 per year
Job description. Location: Full remote, EU timezone (CET +/- 2 hours). Start date: As soon as possible. Languages: English required. We are looking for a skilled Site Reliability Engineer (SRE) with deep expertise in AWS to help us scale and secure our infrastructure. As an SRE, you will be instrumental in ensuring the reliability, performance, and scalability of...
-
Site Reliability Engineer
1 day ago
Berlin, Berlin, Germany · 1KOMMA5° · Full-time · €60,000 - €120,000 per year
1KOMMA5°: We are looking for you as an addition to our tech team in Berlin, Munich, or Hamburg. 1KOMMA5° is building Germany's largest one-stop shop for sales, installation, and services related to solar, heat pumps, electricity, and charging infrastructure. And they are all connected. Be a part of our mission. Become a part of our mission and learn about our...
-
Site Reliability Engineer
1 day ago
Berlin, Berlin, Germany · Zattoo · Full-time · €80,000 - €120,000 per year
THE ROLE & THE SRE TEAM: At Zattoo, we're building the TV platform of the future. With our ever-growing demand for unicast TV delivery, we're scaling out our custom-built infrastructure to deliver live and on-demand video at multi-Tbps scale. Because we own the full chain (from ingest, encoding/transcoding, and packaging to delivery), our engineers have the...