Senior Site Reliability Engineer

1 day ago


Berlin, Berlin, Germany · emnify · Full-time · €80,000 - €120,000 per year

Your Role

Are you passionate about observability and resiliency? Is ensuring we know about issues before our customers do second nature to you? Does being at the front and orchestrating processes sound fun to you? emnify is seeking a talented Reliability Engineer & Incident Management Operator to drive the company's incident management routines, be the authority on everything observability and resiliency, and guide internal stakeholders with best practices.

As part of the larger Engineering department, our Platform team plays a crucial role in enhancing our competitive edge by improving the developer experience to increase development efficiency and scale productivity. You will join a team of 3 engineers, fostering empathy and a collaborative mindset to ensure continuous improvement of the developer experience at emnify. The ideal candidate will have extensive experience with AWS cloud infrastructure, microservices, and modern observability practices, as well as strong communication and organizational skills.

The position is 35% incident management operations, 35% observability and monitoring work, and 30% platform engineering and developer support.

Emnify technology radar

The position is based in emnify's office in Berlin.

Your Impact:

  • Incident management operations:

Lead and optimize the incident management process end-to-end: ensure timely detection, resolution, and documentation of incidents, coordinate cross-functional teams, conduct post-mortems and root cause analyses, and drive continuous improvements to workflows.

  • Observability and monitoring:

Design, implement, and continuously improve observability frameworks by developing dashboards, alerts, metrics, and logging strategies to monitor service health, detect anomalies proactively, support issue resolution, and ensure cost-optimized performance across the platform.

  • Collaboration and support:

Partner with cross-functional teams to implement observability best practices, providing training and guidance on tools while leveraging metrics data to drive engineering priorities.

  • Platform engineering:

Leverage AWS to design, build, and maintain a resilient cloud infrastructure, implementing best practices for security, scalability, and cost optimization while ensuring high availability, disaster recovery, and robust platform components such as pipelines, shared infrastructure, and application services.

Your Skills:

  • Proven experience as a (Site) Reliability Engineer or in a similar role at a SaaS and/or telecom company.

  • Hands-on experience with observability tools (e.g., Prometheus, Mimir, Grafana, Loki, CloudWatch, Grafana IRM, Rootly), including setup and optimization of metrics and alerts.

  • Experience in establishing and managing incident management processes.

  • Understanding of incident management frameworks and best practices.

  • Extensive experience with AWS cloud services (e.g., EC2, S3, RDS, Lambda, CloudWatch).

  • Expert skills in modern infrastructure tooling and principles (Kubernetes; IaC with Terraform; CI/CD with GitHub Actions and Jenkins).

  • Good understanding of modern development tooling and principles (e.g., microservices architecture, 12-factor applications, Docker).

  • Advanced documentation skills for effective knowledge sharing and collaboration.

  • Exceptional problem-solving and critical thinking skills, with a passion for enhancing development experiences in fast-paced tech environments.

  • Ability to work independently and as part of a team.

Nice to have:

  • Knowledge of networking protocols and telecom systems.

  • Knowledge of secure software development.

  • Familiarity with programming languages such as Python, Go, or Java.

  • Certification in AWS (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect).



  • Berlin, Berlin, Germany · KOMBO · Full-time · €100,000 - €150,000 per year

    Senior Site Reliability Engineer (Database) @Kombo Berlin (On-site) · Full-time. TL;DR: Join Kombo as one of our first Database Reliability Engineers. You'll take ownership of our Postgres infrastructure, ensuring performance, scalability, and reliability as we grow. High impact, high autonomy, and the chance to shape Kombo's database reliability practices from...


  • Berlin, Berlin, Germany · Kombo · Full-time · €80,000 - €120,000 per year

    Senior Site Reliability Engineer (Database) @Kombo Berlin (On-site) · Full-time. TL;DR: Join Kombo as one of our first Database Reliability Engineers. You'll take ownership of our Postgres infrastructure, ensuring performance, scalability, and reliability as we grow. High impact, high autonomy, and the chance to shape Kombo's database reliability practices from...


  • Berlin, Berlin, Germany · Kombo · Full-time · €80,000 - €120,000 per year

    Senior Site Reliability Engineer (SRE) @Kombo Berlin (On-site) · Full-time. TL;DR: Join Kombo as one of our first Senior SREs. You'll work on reliability, scale our infrastructure, and help define how SRE is done at Kombo, all while staying hands-on. High impact, high autonomy, and the chance to shape (and later lead) our growing platform/SRE function. Why You...

  • Site Reliability Engineer

    24 hours ago


    Berlin, Berlin, Germany · Hirefive · Full-time · €60,000 - €120,000 per year

    Site Reliability Engineer. Our growing user base demands cheap, fast, and highly available web hosting, and we need you to make it possible. Join us as a full-time Site Reliability Engineer. This position will offer you personal and professional development, startup insights, and the opportunity to be part of one of the most inspiring deep-tech startups. You...


  • Berlin, Berlin, Germany · Ageras · Full-time · €60,000 - €120,000 per year

    About the Role: We're looking for a Senior Site Reliability Engineer (SRE) to join our Infrastructure team. This is a long-term position to replace a recent departure and strengthen our capacity as we scale. As part of the team, you'll play a crucial role in maintaining and improving the reliability, security, and scalability of our cloud infrastructure. You'll...


  • Berlin, Berlin, Germany · Scout24 SE · Full-time · €60,000 - €120,000 per year

    Why Scout24? Scout24 is home of ImmoScout24, Germany's #1 for real estate. With ImmoScout24 we have been revolutionizing the real estate market in Germany and Austria for more than 25 years. Our goal is to build a digital ecosystem that brings homeowners, seekers, and agents together. Finding the right home and property is one of the most important decisions...


  • Berlin, Berlin, Germany · Wire · Full-time · €70,000 - €95,000 per year

    WHO WE ARE: We are looking for a Site Reliability Engineer / Systems Engineer to complement our Deployment Operations Team. In this role, you will build, improve, and manage our automations and deployment infrastructure to ensure the reliability, resilience, availability, and observability of our product. Join us at Wire, the leading end-to-end encrypted...


  • Berlin, Berlin, Germany · Blackfluo · Full-time · €84,000 - €85,000 per year

    Job Description. Location: Full remote, EU timezone (CET +/- 2 hours). Start Date: As soon as possible. Languages: English required. We are looking for a skilled Site Reliability Engineer (SRE) with deep expertise in AWS to help us scale and secure our infrastructure. As an SRE, you will be instrumental in ensuring the reliability, performance, and scalability of...


  • Berlin, Berlin, Germany · 1KOMMA5° · Full-time · €60,000 - €120,000 per year

    1KOMMA5°: We are looking for you as an addition to our tech team in Berlin, Munich, or Hamburg. 1KOMMA5° is building Germany's largest one-stop shop for the sale, installation, and servicing of solar, heat pumps, electricity, and charging infrastructure. And they are all connected. Be a part of our mission: Become a part of our mission and learn about our...


  • Berlin, Berlin, Germany · Zattoo · Full-time · €80,000 - €120,000 per year

    THE ROLE & THE SRE TEAM: At Zattoo, we're building the TV platform of the future. With our ever-growing demand for unicast TV delivery, we're scaling out our custom-built infrastructure to deliver live and on-demand video at multi-Tbps scale. Because we own the full chain - from ingest, encoding/transcoding, packaging, to delivery - our engineers have the...