Site Reliability Engineer

2 days ago

Berlin, Berlin, Germany | Ionos En | Full-time

At IONOS, the leading European provider of cloud infrastructure, cloud services and hosting services, you will work together with a wide range of teams. We are characterized by open structures, a friendly working culture and flat hierarchies with a strong team spirit. We firmly believe that work and fun are compatible, and offer you the right environment for this. Our constant growth means that we are always looking for new colleagues. Become part of IONOS and grow with us.

We are seeking a highly skilled and experienced Site Reliability Engineer to join our team working on a 24/7 shift basis. The Site Reliability Engineering L2 department operates all IONOS Cloud IaaS and PaaS services. As a Site Reliability Engineer, you will be responsible for ensuring the stability, security, and performance of our complex and distributed systems. You will work closely with our development teams to design, implement, and maintain scalable and reliable infrastructure, and to automate and optimize our systems and processes.

Tasks

Provide technical Level 2 support with direct customer contact.
Maintain monitoring, logging, and alerting solutions using tools such as Prometheus, Grafana, and Loki to proactively detect blockers during shift rotation, and help resolve complex issues in distributed systems.
Troubleshoot networks (LAN/WAN/VPN, DNS, DHCP) and storage systems (file/object/block), including provisioning and operating highly available services on Linux and on Kubernetes with Helm charts.
Maintain Infrastructure as Code, automation, and playbooks using tools such as Ansible, Terraform, GitLab CI/CD, and ArgoCD, and scripting languages such as Bash, Python, and Go.
Collaborate with development teams to improve processes and deployments, and to ensure smooth integration of new services and applications into our cloud and Kubernetes environment.
Ensure the stable and secure operation of our platforms, including end-to-end incident management, from initial analysis through resolution to follow-up via Problem Management.

Qualifications

Willingness to work in a 24x7 shift model that includes nights, weekends, and holidays, combined with a strong problem-solving and troubleshooting approach to complex technical problems.
You have several years of experience as a Site Reliability Engineer or in a related role (Linux System Administrator, Platform Engineer, DevOps/Infrastructure Engineer, Full Stack Developer).
Strong experience with automation tools (e.g., Ansible, SaltStack), monitoring and observability tools (e.g., Prometheus, Grafana, Loki), and logging and alerting solutions (e.g., ELK Stack).
Strong experience with virtualized environments, including QEMU/KVM, OpenStack, and Proxmox, with cloud storage technologies (file, object, block), and proficient knowledge of Docker and Kubernetes (K8s).
Proficiency in at least one programming or scripting language (e.g., Go, Python, Bash) for automation and monitoring tasks.
Experience with code management is required; knowledge of merge conflicts, feature branches, merge requests, and continuous integration (CI/CD) is a plus.

Nice to have:

Experience with RDMA, InfiniBand, and RoCE protocols.
Strong experience with Linux MD RAID (mdadm, sedadm) and LVM.
Proficiency in Linux performance tuning and network stack debugging (e.g., ethtool, perf, tcpdump, ibstat, ibtop).
Experience with S3, Ceph and software-defined networks.
Experience with established software development practices, including code reviews, build processes, packaging, and testing.

Language skills: Must be fluent in German and English (at least CEFR level B2).

Location: Berlin
Note: At the end of the application process, candidates must undergo a security check. Your consent will be requested in good time during the process.

Benefits

Hybrid working model.
Shift working hours.
At some locations a subsidized canteen and various free drinks.
Modern office space with very good transport connections.
Various employee discounts for activities and products.
Employee events such as summer and winter parties, as well as workshops.
Numerous training and development opportunities.
Various health offers, such as sports and health courses.

About IONOS

IONOS is the leading European digitalization partner for small and medium-sized businesses (SMB). The company serves around six million customers and operates across 18 markets in Europe and North America, with its services being accessible worldwide. With its Web Presence & Productivity portfolio, IONOS acts as a 'one-stop shop' for all digitalization needs: from domains and web hosting to classic website builders and do-it-yourself solutions, from e-commerce to online marketing tools. In addition, the company offers Cloud Solutions to enterprises who are looking to move to the cloud as their businesses evolve.

We value diversity and welcome all applications - regardless of, for example, gender, nationality, ethnic or social origin, religion, disability, age as well as sexual orientation and identity, physical characteristics, marital status or any other irrelevant factor subject to applicable law.

Site Reliability Engineer

4 days ago

Berlin, Berlin, Germany | Glow Beauty On Demand | Full-time

About the opportunity We are seeking a Site Reliability Engineer to join the Observability group inside our Platform Engineering domain. Platform Engineering's goal is to provide easy to use, self-service platforms to enable other segments to easily build, deploy and monitor their business applications. And Observability's role in that part of the company...
Site Reliability Engineer

1 week ago

Berlin, Berlin, Germany | Hirefive | Full-time | €60,000 - €120,000 per year

Site Reliability Engineer: Our growing user base demands cheap, fast and highly available web hosting, and we need you to make it possible. Join us as a full-time Site Reliability Engineer. This position will offer you personal and professional development, startup insights, and the opportunity to be part of one of the most inspiring deep-tech startups. You...
Site Reliability Engineer

4 days ago

Berlin, Berlin, Germany | IONOS SE | Full-time

At IONOS, the leading European provider of cloud infrastructure, cloud services and hosting services, you will work in partnership with a wide range of teams. We offer you prospects in one of the most future-proof industries. We are characterized by open working structures, a first-name culture and flat hierarchies with unparalleled...
Senior Site Reliability Engineer

4 days ago

Berlin, Berlin, Germany | Glow Beauty On Demand | Full-time

About the opportunity We are seeking a Senior Site Reliability Engineer to join the Platform Engineering Domain in the AI Platform Team. The mission of Platform Engineering is to provide trusted, performant, self-service platforms that empower product teams to build 'the bank the world loves to use.' The AI Platform team contributes to this mission by...
Site Reliability Engineer

4 days ago

Berlin, Berlin, Germany | IONOS | Full-time

At IONOS, the leading European provider of cloud infrastructure, cloud services and hosting services, you will work in partnership with a wide range of teams. We offer you prospects in one of the most future-proof industries. We are characterized by open working structures, a first-name culture and flat hierarchies with unparalleled...
Site Reliability Engineer

2 weeks ago

Berlin, Berlin, Germany | Blackfluo | Full-time | €84,000 - €85,000 per year

Job description. Location: Full remote, EU timezone (CET +/- 2 hours). Start date: As soon as possible. Languages: English required. We are looking for a skilled Site Reliability Engineer (SRE) with deep expertise in AWS to help us scale and secure our infrastructure. As an SRE, you will be instrumental in ensuring the reliability, performance, and scalability of...
Site Reliability Engineer

1 week ago

Berlin, Berlin, Germany | Zattoo | Full-time | €80,000 - €120,000 per year

THE ROLE & THE SRE TEAM: At Zattoo, we're building the TV platform of the future. With our ever-growing demand for unicast TV delivery, we're scaling out our custom-built infrastructure to deliver live and on-demand video at multi-Tbps scale. Because we own the full chain (ingest, encoding/transcoding, packaging, and delivery), our engineers have the...
Site Reliability Engineer

2 days ago

Berlin, Berlin, Germany | Assecor GmbH | Full-time

Your role with us: Are you passionate about technology, do you think systematically and act with foresight? Do you love learning new things, tackle challenges proactively, and keep a cool head even in critical situations? Then you are exactly right for us as a Site Reliability Engineer (m/f/d). We are looking for a personality with a thirst for knowledge and...
Site Reliability Engineer

2 days ago

Berlin, Berlin, Germany | Zattoo | Full-time

YOUR FUTURE, ON DEMAND: The ideal blend of stability and flexibility. A genuinely human employer that cares for people and the planet. True autonomy to shape what comes next, for us and you. This is the perfect platform to take your career where you want. Back in 2005, we pioneered Europe's first TV streaming service. Today, we're the world's first certified...
Site Reliability Engineer

1 week ago

Berlin, Berlin, Germany | 1KOMMA5˚ | Full-time | €60,000 - €120,000 per year

1KOMMA5°: We are looking for you as an addition to our tech team in Berlin, Munich or Hamburg. 1KOMMA5° is building Germany's largest one-stop shop for the sale, installation and servicing of solar, heat pumps, electricity and charging infrastructure. And they are all connected. Be a part of our mission: Become a part of our mission and learn about our...