AI Research Engineer

3 days ago

Berlin, Berlin, Germany | ellamind | Full-time | €80,000 - €120,000 per year

At ellamind, we build evaluation-first AI infrastructure. Our platform, elluminate, turns AI evaluation from ad-hoc "vibe checks" into rigorous, repeatable engineering so that teams can test, measure, and improve LLM applications with confidence.

What you'll do

Advance LLM evaluation research: Design, implement, and validate new benchmarks, metrics, and workflows that measure correctness, robustness, safety, and reliability across languages and modalities.
Build LLM-as-a-judge setups and reward models: Develop rubric-based graders, preference data pipelines, and reward models, and run DPO/RLHF/RLAIF/RLVF training.
Generate and curate synthetic data: Create high-quality synthetic datasets for pre-training, post-training and evaluation of LLMs with filtering, deduplication and decontamination to reliably improve model capabilities.
Train and adapt open models: Pre-train and fine-tune open-source LLMs. Use LLM training frameworks to run rigorous ablations.
Scale experiments on GPU clusters: Orchestrate large-scale training, inference, and evaluation jobs. Optimize efficiency and ensure reproducibility end-to-end. We work with thousands of GPUs.
Multilingual data and evaluation: Extend training datasets and eval pipelines to European languages.
Open science & collaboration: Release datasets and tools, publish technical reports, blog posts, and papers, and collaborate with partners (e.g., OpenEuroLLM) to push evaluation standards forward.
Productize research: Turn prototypes into elluminate features—automated eval suites, graders, and data pipelines. Work with platform engineers and product to ship reliable workflows.

You'll mostly work with a Python-based LLM research stack (Hugging Face ecosystem, PyTorch, Megatron-LM/torchtitan, vLLM/SGLang, lm-eval-harness/LightEval, dataframe libraries, Slurm, Ray).

What we're looking for

Must-haves

Strong Python engineering skills: Experience building LLM-centric systems with clean, maintainable code, comprehensive testing, and performance optimization at scale.
LLM operations expertise: You're comfortable with tokenizers/vocabs, data specs (e.g., Parquet), sampling/decoding configs, and evaluation.
Distributed training & inference literacy: Solid grasp of multi-GPU/multi-node fundamentals (e.g., FSDP/DeepSpeed), scheduling, and monitoring—plus practical debugging of throughput/memory issues.
Experiment design & statistics: You plan ablations, track experiments, and use sound statistical methods (significance testing, uncertainty estimates) to draw reliable conclusions.
Data hygiene mindset: You care about dataset quality—deduplication, contamination checks, multilingual coverage, and traceable versioning.
Linux comfort: You're productive on Linux servers—shell workflows, virtual environments, containers, GPU tooling, logs/metrics, and remote development/debugging.
On-site collaboration: ≥3 days/week in Berlin or Bremen. Travel to our Bremen HQ during onboarding.
Fluency in English: At least B2 level for team collaboration and technical discussions.
Valid EU work authorization.

Nice-to-haves

Experience with LLM evaluation frameworks (lm-eval-harness, LightEval) or a track record of rigorous custom benchmarks and metrics.
Background in preference learning and reward modeling (DPO/RLHF/RLAIF), including rubric design and high-quality preference data pipelines.
Multilingual expertise: building or evaluating models across European languages; data collection, alignment, and cross-lingual transfer.
Comfort with high-throughput inference systems (vLLM, SGLang), latency/memory optimization, and model quantization.
Experience with systems and orchestration (Slurm/Ray/Kubernetes) and containers (Docker/Apptainer) – including GPU observability, scheduling, and performance tuning.
Familiarity with MLOps and reproducibility: experiment tracking (e.g., W&B), dataset/model/prompt versioning, CI for research workflows, and dependable artifact management.
Experience building open-source tools or publishing research artifacts (datasets, models, papers) or strong technical writing.
Experience working directly with partners or customers to validate results and translate research into product impact.
Advanced degree in Computer Science, Machine Learning, Data Science, or a related field (PhD preferred, or equivalent achievements).

What matters most

We prioritize demonstrated excellence in your projects and career. If you're motivated to build and optimize AI solutions, we want to hear from you—even if you don't meet every single criterion.

Diversity & inclusion

Different perspectives make us stronger. We welcome applicants from all backgrounds and encourage you to apply.

Why us?

Shape the future of AI research: Influence our research agenda and Europe's LLM ecosystem—help set evaluation standards and training practices that serious AI teams and institutions rely on.
Technical excellence meets cutting-edge research: Push the frontier of LLM training and evaluation—design multilingual benchmarks, build LLM-as-a-judge and reward models, generate high-quality synthetic data, and run rigorous ablations at scale on large GPU clusters.
Career-defining opportunity: Systematic evaluation is becoming as fundamental to AI as version control is to software. Work at the center of this shift and contribute methods, datasets, and tools that others adopt and build upon.
Ownership and impact: Lead research end-to-end—formulate hypotheses, build datasets and benchmarks, run large-scale experiments, and publish results (papers, technical reports, OSS). Collaborate with top-tier partner labs and see your work shape model behavior and evaluation practices across the industry.
Compute that matches your ambition: Access serious GPU resources.
Open science by default: Freedom to release datasets, models, and tools; backing for conference submissions and travel.
Competitive package with upside: In addition to a competitive salary, we offer a VSOP (Virtual Stock Option Program) to give you a real stake in the company's success as we grow.
Best-in-class development experience: Fast and streamlined access to all AI technologies that make your life (and development work) easier, plus the latest tools and platforms to maximize your productivity.
Work environment: Our Bremen office features stunning waterfront views, complimentary beverages, smoothies, and a boat. We're opening our Berlin office at the end of 2025, giving you flexibility as we expand.
Grow with transformative technology: Build deep expertise in LLM evaluation and infrastructure, contribute to open standards, and advance the state of the art alongside a team that values rigor and impact.

About us

We are a cash-flow-positive, Germany-based AI startup building elluminate, the enterprise platform that turns AI evaluation from ad-hoc experiments into rigorous, repeatable workflows. Teams use elluminate to design test suites, benchmark models, track regressions, and ship reliable AI with clear, measurable quality gates. We pair elluminate with custom large-language-model solutions and full on-prem deployment options. Our products have already earned the trust of renowned clients such as Deutsche Telekom, the German Federal Government, and leading health insurers like hkk.

Rooted in Bremen and collaborating with leading organizations, our team has a track record in advanced model and dataset development. We like owning problems end-to-end and shipping pragmatically. We contribute to the open-source community through initiatives like OpenEuroLLM and regularly publish models and tools to accelerate the broader ecosystem.

