100% remote - AI Infrastructure & Inference Engineer (m/f/d), focus on GPU & LLM

1 week ago


Remote, Germany | Nemensis AG | Full-time | 80,000 € - 120,000 € per year

Job description
On behalf of our client, we are looking for an AI Infrastructure & Inference Engineer (m/f/d) with a focus on GPU & LLM.

Term: 5 January 2026
Workload: Full-time
Location: Remote

• Design, implement, and optimize LLM and multimodal inference pipelines across multi-GPU, multi-node, and distributed environments.

• Build request routing and load balancing systems to ensure ultra-low latency, high-throughput services.

• Develop auto-scaling and intelligent resource allocation to meet strict SLAs across multiple data centers.

• Balance architectural trade-offs between latency, throughput, and cost efficiency for diverse workloads.

• Implement traffic shaping and multi-tenant orchestration for fair and reliable compute allocation.

• Collaborate with AI researchers, platform engineers, and ML practitioners to bring new model architectures to production.

• Automate system provisioning, deployment pipelines, and operational tasks using modern DevOps and MLOps practices.

• Monitor, profile, and benchmark system-level performance for maximum GPU utilization and uptime.

• Apply best practices in system security, observability (logging/metrics/tracing), and disaster recovery.

• Contribute to open-source ecosystems and internal tooling to push the boundaries of inference performance.

• Maintain comprehensive technical documentation and participate in continuous process improvements.
 
Required Skills

• Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

• 5+ years of experience in high-performance computing, GPU infrastructure, or distributed systems.

• Deep understanding of multi-GPU orchestration, workload scheduling, and distributed architectures.

• Proficiency in programming (Python or a similar language) and in scripting for systems automation.

• Strong background in containerization (Docker), orchestration frameworks (Kubernetes), and CI/CD pipelines.

• Familiarity with observability tools such as Prometheus, Grafana, and OpenTelemetry.

• Strong understanding of OS-level performance (multi-threading, networking, memory management).

• Clear communication skills and the ability to work collaboratively across technical teams.
 
Preferred Skills

• Experience with NVIDIA DGX systems, NIM, TensorRT-LLM, or high-performance inference frameworks.

• Hands-on knowledge of CUDA, NCCL, Triton, MPI, NVLink, or InfiniBand networking.

• Experience deploying GPU clusters in both cloud and bare-metal environments.

• Familiarity with open-source inference ecosystems like SGLang, vLLM, or NVIDIA Dynamo.

• Knowledge of LLM optimization techniques for inference and fine-tuning acceleration.

• Understanding of enterprise security frameworks, compliance standards, and GDPR requirements.