Job posting: Site Reliability Engineer in Unterföhring, Bavaria

Site Reliability Engineer

Posted 2 months ago

Unterföhring, Bavaria, Germany | Virtual Minds GmbH | Full-time

Job Description

Virtual Minds GmbH is a leading provider of premium Adtech solutions in Europe, with over 20 years of experience in the digital advertising market. We are seeking a highly skilled Site Reliability Engineer to join our team and contribute to the growth of our dynamic and innovative organization.

Key Responsibilities

Design, deploy, and manage our Kubernetes platform to support scalable and reliable application deployments.
Oversee the deployment of our Software-as-a-Service applications on the Kubernetes platform, implementing best practices for application scalability, high availability, and disaster recovery.
Implement robust monitoring, alerting, and logging systems to proactively identify and resolve potential issues, ensuring high system availability and quick incident response times.
Continuously optimize the Kubernetes infrastructure and SaaS applications to achieve maximum performance and efficiency, conducting performance testing and tuning to meet or exceed service level objectives.
Participate in an on-call rotation to respond to incidents promptly and effectively, conducting thorough post-incident reviews to identify root causes and implement preventive measures.
Develop and maintain automation tools and scripts to streamline processes and improve the efficiency of operational tasks.
Implement security best practices for Kubernetes and SaaS applications, collaborating with the security team to ensure compliance with industry standards and regulations.
Work closely with cross-functional teams, including development, infrastructure, and product management, to provide expertise and support throughout the software development lifecycle.
Identify areas for improvement in the infrastructure, processes, and deployment methodologies, proposing and implementing enhancements to increase system reliability and performance.

Requirements

Significant relevant experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role, with a strong focus on Kubernetes platform management and SaaS deployment.
Proficiency in managing Kubernetes clusters and related tooling, including Helm, kubectl, and operators.
Experience with container orchestration, service mesh, and Kubernetes networking.
Significant experience with AWS, especially services such as EKS, MSK, RDS, S3, CloudTrail, and CloudWatch, as well as with deploying and managing AWS infrastructure as code using Terraform and ArgoCD.
Solid programming skills in languages such as Python or Go, with proficiency in scripting to automate tasks and develop tooling.
Experience with monitoring solutions, such as Prometheus and Grafana, and centralized logging platforms, such as the ELK stack.
Knowledge of continuous integration and continuous deployment pipelines, preferably with tools like Jenkins, GitLab CI/CD, or Tekton.
Understanding of networking concepts and security best practices in the context of Kubernetes and SaaS deployments.
Strong analytical and problem-solving abilities to diagnose and resolve complex technical issues.
Excellent teamwork and communication skills to collaborate effectively with various teams and stakeholders.
A passion for staying up to date with the latest technologies, industry trends, and best practices in SRE and Kubernetes.

What We Offer

A dynamic and innovative work environment with a team of top experts.
A start-up atmosphere with the benefits of a large corporation.
Flexible working hours and 30 days of vacation.
A wide range of internal and external training opportunities for personal and professional development.
Additional benefits, such as employee discounts, bicycle leasing, and subsidised company pension schemes.
