Site Reliability Engineer
2 months ago
Unterföhring, Bavaria, Germany
Virtual Minds GmbH
Full-time
Job Description
Virtual Minds GmbH is a leading provider of premium Adtech solutions in Europe, with over 20 years of experience in the digital advertising market. We are seeking a highly skilled Site Reliability Engineer to join our team and contribute to the growth of our dynamic and innovative organization.
Key Responsibilities
- Design, deploy, and manage our Kubernetes platform to support scalable and reliable application deployments.
- Oversee the deployment of our Software-as-a-Service applications on the Kubernetes platform, implementing best practices for application scalability, high availability, and disaster recovery.
- Implement robust monitoring, alerting, and logging systems to proactively identify and resolve potential issues, ensuring high system availability and quick incident response times.
- Continuously optimize the Kubernetes infrastructure and SaaS applications to achieve maximum performance and efficiency, conducting performance testing and tuning to meet or exceed service level objectives.
- Participate in an on-call rotation to respond to incidents promptly and effectively, conducting thorough post-incident reviews to identify root causes and implement preventive measures.
- Develop and maintain automation tools and scripts to streamline processes and improve the efficiency of operational tasks.
- Implement security best practices for Kubernetes and SaaS applications, collaborating with the security team to ensure compliance with industry standards and regulations.
- Work closely with cross-functional teams, including development, infrastructure, and product management, to provide expertise and support throughout the software development lifecycle.
- Identify areas for improvement in the infrastructure, processes, and deployment methodologies, proposing and implementing enhancements to increase system reliability and performance.
Qualifications
- Significant relevant experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role, with a strong focus on Kubernetes platform management and SaaS deployment.
- Proficiency in managing Kubernetes clusters and related tooling, including Helm, kubectl, and operators.
- Experience with container orchestration, service mesh, and Kubernetes networking.
- Significant experience with AWS, especially services such as EKS, MSK, RDS, S3, CloudTrail, and CloudWatch, and with deploying and managing AWS infrastructure as code using Terraform and ArgoCD.
- Solid programming skills in languages such as Python or Go, with proficiency in scripting to automate tasks and develop tooling.
- Experience with monitoring solutions, such as Prometheus and Grafana, and centralized logging platforms, such as the ELK stack.
- Knowledge of continuous integration and continuous deployment pipelines, preferably with tools like Jenkins, GitLab CI/CD, or Tekton.
- Understanding of networking concepts and security best practices in the context of Kubernetes and SaaS deployments.
- Strong analytical and problem-solving abilities to diagnose and resolve complex technical issues.
- Excellent teamwork and communication skills to collaborate effectively with various teams and stakeholders.
- A passion for staying up-to-date with the latest technologies, industry trends, and best practices in SRE and Kubernetes.
What We Offer
- A dynamic and innovative work environment with a team of top experts.
- A start-up atmosphere with the benefits of a large corporation.
- Flexible working hours and 30 days of vacation.
- A wide range of internal and external training opportunities for personal and professional development.
- Additional benefits, such as employee discounts, bicycle leasing, and subsidised company pension schemes.