AI Research Engineer

6 days ago

Bremen, Bremen, Germany | ellamind | Full-time | €110,000 per year

At ellamind, we build evaluation-first AI infrastructure. Our platform, elluminate, turns AI evaluation from ad-hoc "vibe checks" into rigorous, repeatable engineering, enabling teams to test, measure, and improve LLM applications with confidence.

What you'll do

Advance LLM evaluation research: Design, implement, and validate new benchmarks, metrics, and workflows that measure correctness, robustness, safety, and reliability across languages and modalities.
Build LLM-as-a-judge setups and reward models: Develop rubric-based graders, preference data pipelines, and reward models, and run DPO/RLHF/RLAIF/RLVF training.
Generate and curate synthetic data: Create high-quality synthetic datasets for pre-training, post-training, and evaluation of LLMs, using filtering, deduplication, and decontamination to reliably improve model capabilities.
Train and adapt open models: Pre-train and fine-tune open-source LLMs. Use LLM training frameworks to run rigorous ablations.
Scale experiments on GPU clusters: Orchestrate large-scale training, inference, and evaluation jobs. Optimize efficiency and ensure reproducibility end-to-end. We work with thousands of GPUs.
Multilingual data and evaluation: Extend training datasets and eval pipelines to European languages.
Open science & collaboration: Release datasets and tools, publish technical reports, blog posts, and papers, and collaborate with partners (e.g., OpenEuroLLM) to push evaluation standards forward.
Productize research: Turn prototypes into elluminate features—automated eval suites, graders, and data pipelines. Work with platform engineers and product to ship reliable workflows.

You'll mostly work with a Python-based LLM research stack (Hugging Face ecosystem, PyTorch, Megatron-LM/torchtitan, vLLM/SGLang, lm-eval-harness/LightEval, dataframe libraries, Slurm, Ray).

What we're looking for

Must-haves

Strong Python engineering skills: Experience building LLM-centric systems with clean, maintainable code, comprehensive testing, and performance optimization at scale.
LLM operations expertise: You're comfortable with tokenizers/vocabs, data specs (e.g., Parquet), sampling/decoding configs, and evaluation.
Distributed training & inference literacy: Solid grasp of multi-GPU/multi-node fundamentals (e.g., FSDP/DeepSpeed), scheduling, and monitoring—plus practical debugging of throughput/memory issues.
Experiment design & statistics: You plan ablations, track experiments, and use sound statistical methods (significance testing, uncertainty estimates) to draw reliable conclusions.
Data hygiene mindset: You care about dataset quality—deduplication, contamination checks, multilingual coverage, and traceable versioning.
Linux comfort: You're productive on Linux servers—shell workflows, virtual environments, containers, GPU tooling, logs/metrics, and remote development/debugging.
On-site collaboration: 3 days/week in Berlin or Bremen. Travel to our Bremen HQ during onboarding.
Fluency in English: At least B2 level for team collaboration and technical discussions.
Valid EU work authorization.

Nice-to-haves

Experience with LLM evaluation frameworks (lm-eval-harness, LightEval) or a track record of rigorous custom benchmarks and metrics.
Background in preference learning and reward modeling (DPO/RLHF/RLAIF), including rubric design and high-quality preference data pipelines.
Multilingual expertise: building or evaluating models across European languages; data collection, alignment, and cross-lingual transfer.
Comfort with high-throughput inference systems (vLLM, SGLang), latency/memory optimization, and model quantization.
Experience with systems and orchestration (Slurm/Ray/Kubernetes) and containers (Docker/Apptainer) – including GPU observability, scheduling, and performance tuning.
Familiarity with MLOps and reproducibility: experiment tracking (e.g., W&B), dataset/model/prompt versioning, CI for research workflows, and dependable artifact management.
Experience building open-source tools or publishing research artifacts (datasets, models, papers), or strong technical writing skills.
Experience working directly with partners or customers to validate results and translate research into product impact.
Advanced degree in Computer Science, Machine Learning, Data Science, or a related field (PhD preferred, or equivalent achievements).

What matters most

We prioritize demonstrated excellence in your projects and career. If you're motivated to build and optimize AI solutions, we want to hear from you—even if you don't meet every single criterion.

Diversity & inclusion

Different perspectives make us stronger. We welcome applicants from all backgrounds and encourage you to apply.

Why us?

Shape the future of AI research: Influence our research agenda and Europe's LLM ecosystem—help set evaluation standards and training practices that serious AI teams and institutions rely on.
Technical excellence meets cutting-edge research: Push the frontier of LLM training and evaluation—design multilingual benchmarks, build LLM-as-a-judge and reward models, generate high-quality synthetic data, and run rigorous ablations at scale on large GPU clusters.
Career-defining opportunity: Systematic evaluation is becoming as fundamental to AI as version control is to software. Work at the center of this shift and contribute methods, datasets, and tools that others adopt and build upon.
Ownership and impact: Lead research end-to-end—formulate hypotheses, build datasets and benchmarks, run large-scale experiments, and publish results (papers, technical reports, OSS). Collaborate with top-tier partner labs and see your work shape model behavior and evaluation practices across the industry.
Compute that matches your ambition: Access serious GPU resources.
Open science by default: Freedom to release datasets, models, and tools; backing for conference submissions and travel.
Competitive package with upside: In addition to a competitive salary, we offer a VSOP (Virtual Stock Option Program) to give you a real stake in the company's success as we grow.
Best-in-class development experience: Fast and streamlined access to all AI technologies that make your life (and development work) easier, plus the latest tools and platforms to maximize your productivity.
Work environment: Our Bremen office features stunning waterfront views, complimentary beverages, smoothies, and a boat. We're opening our Berlin office at the end of 2025, giving you flexibility as we expand.
Grow with transformative technology: Build deep expertise in LLM evaluation and infrastructure, contribute to open standards, and advance the state of the art alongside a team that values rigor and impact.

About us

We are a cash-flow-positive, Germany-based AI startup building elluminate, the enterprise platform that turns AI evaluation from ad-hoc experiments into rigorous, repeatable workflows. Teams use elluminate to design test suites, benchmark models, track regressions, and ship reliable AI with clear, measurable quality gates. We pair elluminate with custom large-language-model solutions and full on-prem deployment options. Our products have already earned the trust of renowned clients such as Deutsche Telekom, the German Federal Government, and leading health insurers like hkk.

Rooted in Bremen and collaborating with leading organizations, our team has a track record in advanced model and dataset development. We like owning problems end-to-end and shipping pragmatically. We contribute to the open-source community through initiatives like OpenEuroLLM and regularly publish models and tools to accelerate the broader ecosystem.

Compensation Range: €110,000.00

AI-intern

5 days ago

Bremen, Bremen, Germany | Constructor TECH | Full-time | €13,000 - €20,000 per year

Our mission: Constructor's mission is to enable all educational organisations to provide high-quality digital education to 10x people with 10x efficiency. With strong expertise in machine intelligence and data science, Constructor's all-in-one platform for education and research addresses today's pressing educational challenges: access inequality, tech...

AI Project Intern

1 week ago

Bremen, Bremen, Germany | BEGO Group | Full-time | €30,000 - €45,000 per year

Company Description: BEGO is a Germany-based leader in dental technology, specializing in dental prosthetics and implantology. The company provides dental lab technicians and dentists with innovative materials, equipment, and processes for creating high-quality dental prosthetics, including alloys, ceramics, and CAD/CAM solutions. BEGO also offers advanced...

Lead AI/ML Engineer

5 days ago

Bremen, Bremen, Germany | Rheinmetall | Full-time | €80,000 - €120,000 per year

Ref. no.: DE16271. Employment type: Full-time. Contract type: Permanent contract. What we need you for: Lead the team of data scientists and ML/AI engineers. Overall technical responsibility for the development, integration, and continuous improvement of AI/ML components within the data processing chain. Definition of the machine learning architecture...

Field Engineer

5 days ago

Bremen, Bremen, Germany | Avomind | Full-time | €40,000 - €80,000 per year

Company Overview: Headquartered in Suzhou, China, our client provides a complete set of technologies and services for smart logistics. Their customized solutions are equipped with world-class AI algorithms. They provide highly usable and reliable robots and systems for warehousing and intralogistics. Their products and system platforms deliver value for...

Fullstack Engineer

6 days ago

Bremen, Bremen, Germany | ellamind | Full-time | €60,000 - €100,000 per year

At ellamind, we build evaluation-first AI infrastructure. Our platform elluminate turns AI evaluation from ad-hoc "vibe checks" into rigorous, repeatable engineering to enable teams to test, measure, and improve LLM applications with confidence. What you'll do: Build evaluation infrastructure: Develop scalable Django Ninja APIs and interfaces that power...

Back-End Engineer

4 days ago

Bremen, Bremen, Germany | PlanBlue | Full-time | €65,000 - €80,000 per year

Company Description: PlanBlue is on a mission to harness the critical role of the seafloor in addressing global challenges such as climate change, biodiversity loss, and food insecurity. Its cutting-edge technology integrates advanced imaging with AI-driven data processing, delivering automated seafloor intelligence that uncovers the seafloor's true ecological...

Propulsion Subsystem Engineer

1 week ago

Bremen, Bremen, Germany | OHB | Full-time | €60,000 - €100,000 per year

OHB System AG – We.Create.Space. OHB System AG is one of the leading space companies in Europe. The company is part of the listed high-tech group OHB SE, which has more than 3,200 employees working on pivotal European space programs. With two strong locations in Bremen and Oberpfaffenhofen near Munich and more than 40 years of experience, OHB System AG...

Design Engineer

6 days ago

Bremen, Bremen, Germany | NAXCON GROUP | Full-time | €60,000 - €80,000 per year

Job Openings: Design Engineer (m/f/d). About The Job: Design Engineer (m/f/d). NAXCON GmbH, located in the heart of Freiburg, is at the forefront of the German IT and engineering industry. Our experts have extensive knowledge in software and hardware development, state-of-the-art electronics, and future-oriented technologies such as artificial intelligence and...

Design Engineer

1 week ago

Bremen, Bremen, Germany | SOGECLAIR | Full-time | €60,000 - €100,000 per year

Founded in 1962, SOGECLAIR is a group of engineering companies specializing in high technologies. With an international presence, SOGECLAIR has extended its influence beyond national borders, reinforcing its ability to innovate and collaborate with partners worldwide. SOGECLAIR DIGITAL ENGINEERING consolidates and develops engineering projects within the...

Design Engineer

1 week ago

Bremen, Bremen, Germany | SOGECLAIR | Full-time | €60,000 - €120,000 per year

Founded in 1962, SOGECLAIR is a group of engineering companies specializing in high technologies. With an international presence, SOGECLAIR has extended its influence beyond national borders, reinforcing its ability to innovate and collaborate with partners worldwide. SOGECLAIR DIGITAL ENGINEERING consolidates and develops engineering projects within the...
