Site Reliability Engineer
3 weeks ago
Job Overview:
As a Mid-Level Site Reliability Engineer, you will play a crucial role in ensuring the availability, performance, and reliability of our systems hosted on Google Cloud Platform (GCP). You will collaborate with engineering teams to design, deploy, and maintain scalable infrastructure, automate workflows, and troubleshoot production issues to maintain optimal system performance. The ideal candidate has a strong background in cloud infrastructure, automation, and incident management.
Key Responsibilities:
- Maintain and monitor systems: Ensure high availability and reliability of services and infrastructure deployed on Google Cloud Platform (GCP).
- Incident response: Act as an escalation point for production incidents, troubleshoot and resolve issues quickly, and perform root cause analysis to prevent future occurrences.
- Automation and tooling: Build and maintain automation tools to streamline operational workflows, deployment processes, and monitoring.
- Performance optimization: Work to enhance the scalability, reliability, and performance of the infrastructure and services by identifying bottlenecks, performing load testing, and optimizing cloud resources.
- Collaboration with development teams: Collaborate closely with software engineers to design resilient systems, develop deployment pipelines, and improve system observability.
- Monitoring and alerting: Set up and maintain monitoring and alerting systems using GCP-native tools (e.g., Cloud Monitoring, Cloud Logging), as well as third-party tools, to proactively identify and address issues.
- Security and compliance: Implement security best practices for infrastructure, ensure data privacy and integrity, and assist with maintaining compliance requirements.
- Documentation: Maintain clear documentation for processes, runbooks, and infrastructure configurations.
Qualifications:
- Experience:
  - 3+ years of experience in Site Reliability Engineering, DevOps, or related roles.
  - Hands-on experience working with Google Cloud Platform (GCP), including services like Compute Engine, Kubernetes Engine, Cloud Storage, Cloud Functions, BigQuery, and Cloud Pub/Sub.
  - Solid understanding of infrastructure-as-code (IaC) principles and experience with tools like Terraform, Google Cloud Deployment Manager, or CloudFormation.
  - Experience with containerization and orchestration technologies, particularly Docker and Kubernetes.
  - Familiarity with CI/CD pipelines and automation tools (e.g., Jenkins, GitLab CI, GitHub Actions).
  - Strong experience with monitoring, logging, and alerting tools, including Google Cloud Monitoring, Prometheus, Grafana, or similar.
- Technical Skills:
  - Proficient in scripting languages such as Python, Bash, or Go.
  - Strong knowledge of networking, load balancing, DNS, and firewall management in cloud environments.
  - Familiarity with version control systems like Git.
  - Solid understanding of containerization, microservices architecture, and distributed systems.
- Soft Skills:
  - Strong problem-solving and analytical skills.
  - Excellent communication skills, with the ability to collaborate with cross-functional teams.
  - Ability to handle high-pressure situations and provide timely resolutions to incidents.
  - A proactive self-starter with a passion for continuous learning and improvement.
-
Site Reliability Engineer
1 month ago
Berlin, Berlin, Germany | Schwarz Dienstleistungen | Full-time
Schwarz Dienstleistungen is looking for a Site Reliability Engineer with the following qualifications:
- Good knowledge of one of the following programming languages: C, C++, Go (Golang), Rust, Java
- Basic understanding of the principles of Site Reliability Engineering (SRE), such as monitoring, alerting, error budgets, failure analysis, or other...
-
Site Reliability Engineer
3 weeks ago
Berlin, Berlin, Germany | Paymenttools | Full-time
Imagine being an important part of the team that makes Paymenttools' payment systems secure and reliable. We are looking for an experienced Site Reliability Engineer who optimizes our systems, increases reliability, and ensures that our payment systems are always available. You will work with the product teams to...
-
Site Reliability Expert
2 weeks ago
Berlin, Berlin, Germany | EGYM GmbH | Full-time
About the Role: We're seeking a highly skilled Site Reliability Engineer to join our team at EGYM GmbH. As a Site Reliability Engineer, you'll play a key role in ensuring the reliability and scalability of our cloud services. You'll be responsible for monitoring the availability and latency of our services, troubleshooting incidents, and collaborating with...
-
Berlin, Berlin, Germany | EGYM GmbH | Full-time
About the Role: We are seeking a skilled Site Reliability Engineer to join our international team in Munich or Berlin. The ideal candidate will be experienced in working with Cloud Providers (GCP, AWS), Microservices and Container Orchestration. Proficient coding skills in one or more programming languages (preferably Go or Java) and a solid...
-
Site Reliability Engineer
3 weeks ago
Berlin, Germany | Nooxit | Full-time
Full-time (40 h), as soon as possible, permanent and based in Berlin or remotely in home office. We’re seeking an experienced Site Reliability Engineer (SRE) with a solid foundation in Python, a passion for performance optimization, and a proactive approach to infrastructure management. In this role, you’ll work closely with development and operations...
-
(Senior) Site Reliability Engineer
6 months ago
Berlin, Germany | Paymenttools | Full-time
Is Reliability your middle name? We at Paymenttools, a subsidiary of the REWE Group, are revolutionizing the payment landscape. From Apple Pay to PayPal, we have made it our mission to simplify and secure digital transactions across Europe and beyond. Our mantra: #wesolvepayn. We combine cutting-edge technology...
-
Chief Site Reliability Officer
3 weeks ago
Berlin, Berlin, Germany | Delivery Hero | Full-time
Overview: Delivery Hero is a leading global food delivery marketplace, and we are seeking an experienced Chief Site Reliability Officer to lead our Site Reliability Engineering (SRE) department. This role will be based in Berlin, Germany, and will report to the leader of Developer Platform.
-
Site Reliability Engineer
1 month ago
Berlin, Germany | Schwarz IT | Full-time
Site Reliability Engineer - Platform Engineering - STACKIT. Location: Berlin. Department: IT - Cloud Services. Level: Experienced professionals. Reference number: 42252-de_DE. Do you want to join us STACKITEERs in taking the cloud world by storm and shaping the future of Europe? Also, onboarding new cloud users and support with...
-
Site Reliability Engineer
3 weeks ago
Berlin, Germany | Schwarz IT | Full-time
Site Reliability Engineer - Platform Engineering - STACKIT. Location: Berlin. Department: IT - Cloud Services. Level: Experienced professionals. Reference number: 42252-de_DE. Do you want to join us STACKITEERs in taking the cloud world by storm and shaping the future of Europe? Great! Then STACKIT is the right place for you. Our vision is ambitious: an independent...
-
Site Reliability Engineer
2 weeks ago
Berlin, Berlin, Germany | Nooxit | Full-time
System Reliability Expert Sought for Cutting-Edge Startup. Nooxit is looking for an experienced Site Reliability Engineer to join our team in Berlin or remotely. The ideal candidate will have a solid foundation in Python, a passion for performance optimization, and a proactive approach to infrastructure management. The successful applicant will work closely...
-
Site Reliability Engineer
3 weeks ago
Berlin, Germany | All the Top Bananas | Full-time
Site Reliability Engineer - Platform Engineering - STACKIT. Location: Berlin. Department: IT - Cloud Services. Level: Experienced professionals. Reference number: 42252-de_DE. Do you want to join us STACKITEERs in taking the cloud world by storm and shaping the future of Europe? Great! Then STACKIT is the right place for you. Our vision is ambitious: an independent...
-
Site Reliability Engineer
1 month ago
Berlin, Germany | Schwarz IT | Full-time
Site Reliability Engineer - Platform Engineering - STACKIT. Location: Berlin. Department: IT - Cloud Services. Level: Experienced professionals. Reference number: 42252-de_DE. Do you want to join us STACKITEERs in taking the cloud world by storm and shaping the future of Europe? Great! Then STACKIT is the right place for you. Our vision is...
-
Site Reliability Engineer
23 hours ago
Berlin, Germany | Solactive AG | Full-time
Company Description: Since its creation in 2007 in the financial city of Frankfurt am Main, Solactive AG has grown into one of the key players in the indexing space. The German multi-asset index provider focusses on tailor-made indices, offering its clients a faster service, with greater flexibility and at a reasonable cost. Solactive AG and its subsidiaries...
-
Site Reliability Engineer
21 hours ago
Berlin, Germany | Solactive AG | Full-time
Company Description: Since its creation in 2007 in the financial city of Frankfurt am Main, Solactive AG has grown into one of the key players in the indexing space. The German multi-asset index provider focusses on tailor-made indices, offering its clients a faster service, with greater flexibility and at a reasonable cost. Solactive AG and its subsidiaries...
-
Site Reliability Engineer
6 months ago
Berlin, Germany | BestSecret | Full-time
About BestSecret Group: We are a leading European members-only online destination for premium and luxury off-price fashion. Partnering with over 3,000 international brands, our tech-focused mindset and strong commitment to sustainability drive a truly unique experience for our members. With almost 100 years of experience behind us, and a major tech...
-
Site Security Engineer for Payment Systems
5 days ago
Berlin, Berlin, Germany | Paymenttools | Full-time
Position description: We are looking for an experienced Site Reliability Engineer who makes our payment systems reliable, scalable, observable, and secure. As an SRE, you will work with product teams to develop, implement, and maintain the infrastructure, tools, and processes that our business-critical payment applications and...