Founding ML Engineer in the Flower Frontier Model Team
4 days ago
Do you want to push the boundaries of what frontier AI models can be? Join as one of the founding members of the Flower Frontier Model Team, a new group at Flower Labs charged with building category-defining models that combine the best of existing practice with Flower's pioneering decentralized learning methods. This is a fundamentally different direction from the one vanilla frontier labs are taking: it not only eases the path to GPU scaling but also unlocks data silos that currently cannot be used for frontier model training.
We will ship models with superhuman capabilities in domains spanning science, health, finance, drug discovery, and more. This is an opportunity to help invent and build the training paradigms that will define the next decade of AI, and to work on technologies that others will study, emulate, and build upon.
About the Role
(Preference is given to candidates with post-training expertise, but any talented individual with a track record of exceptional drive and determination is encouraged to apply regardless of prior experience.)
As a founding ML Engineer in this new team, you will play a critical role in building SOTA LLMs and foundation models within a small, high-impact team of contributors with a mix of research and engineering backgrounds. This role combines fast-paced development with disciplined software engineering: you will help build a reliable, maintainable and scalable software stack and use it to produce world-leading models that are open-sourced and integrated into new Flower Labs products.
You will design, implement and optimize core components across every stage of frontier model building: data curation, evals, pre-training and post-training. Everything is in scope as the team seeks to release its first series of models. Experience in these areas is welcome, but success in the role explicitly requires problem solving, learning on the job and working collaboratively so that the team's talents combine efficiently. Familiarity with distributed training and scaling strategies will be essential, as will experience working with GPU clusters (or similar) for multi-node training. You will diagnose and resolve GPU/kernel issues, memory/storage bottlenecks, and multi-node failures at scale, and collaborate on debugging training instabilities and related issues. The ability to adapt to different HPC configurations and GPU architectures (e.g., AMD/NVIDIA) will be a big plus. You will also build the surrounding infrastructure, tooling, monitoring, and observability, all essential for large-scale LLM development.
This is a foundational role for an ambitious technical effort. We are looking for a special talent that brings strong engineering discipline to the team, and has the ability to assume technical leadership as the training system scales in complexity and capability. More broadly, you can expect a collaborative, fast-paced and demanding start-up environment containing a team of experts in their respective fields, in which everyone still learns something new every day. You will have the opportunity to contribute ideas, be heard and influence the direction of the company across the board.
About the Company
Flower Labs is a world-class AI startup best known as the company behind the most popular open-source framework in the world for training AI on distributed data and compute resources using decentralized and federated methods. Industry leaders such as Mozilla, JP Morgan, Owkin, Banking Circle and Temenos use Flower to easily improve their AI models on sensitive data that is distributed across organizational silos or user devices. In a world where most AI relies on centralized public datasets, which are just a fraction of the data available, we believe unlocking access to orders of magnitude more sensitive data will drive the next breakthroughs in artificial intelligence.
Flower Labs is a Y Combinator (YCW23) graduate and backed by top-tier investors and renowned angels, including Felicis, First Spark Ventures, Mozilla Ventures, Hugging Face CEO Clem Delangue, GitHub Co-Founder Scott Chacon, Factorial Capital, Betaworks, and Pioneer Fund. Together, we are redefining how AI is built, deployed, and scaled.
Must Have Skills
- Exceptional software engineering skills (Python, deep learning frameworks, testing, profiling, refactoring, reproducibility)
- Expertise with modern ML training stacks: PyTorch, JAX or equivalent; experience implementing model architectures from scratch and working within libraries like DeepSpeed, Megatron or equivalent
- Ability to tune, debug, and profile large-scale training runs
- Hands-on experience working with large GPU clusters, including job orchestration, scheduling, multi-node runs, NCCL/RDMA issues, and GPU performance optimization
- Ability to collaborate effectively with both research-oriented and engineering-oriented colleagues; comfortable turning research ideas into robust, maintainable implementations
- Good engineering hygiene: modular design, code reviews, documentation, reproducibility, versioning of data/models/configurations
- Familiarity with common tools (Linux command line, git, Docker, …)
- Openness to adopting new tooling
- Solid understanding of distributed systems and networking
- Strong written English
- Open, honest and transparent communication skills
- PhD or Master's degree in a relevant discipline
- Familiarity with various components and stages relevant to building LLMs and foundation models, such as architectures, pre-training, data curation, post-training, and evaluation.
- Experience with post-training methods (SFT, RLHF, DPO, reward modeling, or equivalent); preference will be given to individuals with post-training experience
- Ability to read, implement, and extend cutting-edge research papers quickly
- Prior track record with advanced distributed training frameworks and concepts
- Strong grasp of optimization and training techniques: mixed precision, curriculum/data strategies, LR schedules, checkpointing or equivalent
- Background writing high-performance kernels (CUDA, Triton)
- Experience in developing components within systems used by thousands of users
- Track record of working in open-source projects
-
Founding Engineer
6 days ago
Berlin, Berlin, Germany Few&Far Full-time
Be part of a Founding team redefining how businesses run their finances. TypeScript, React, Python, AI/LLMs. Onsite: 5 days per week, Berlin. Base Salary: €90k – €130k + Equity well above market. This VC-backed startup has raised close to $5M from 2 top-tier investors to tackle one of the biggest untapped opportunities in tech: fully autonomous...
-
Founding Machine Learning Engineer
7 days ago
Berlin, Berlin, Germany Few&Far Full-time
Founding Machine Learning Engineer (NLP). Onsite: 5 days per week, Berlin. Base Salary: €100k – €150k + Equity well above market. Be part of a world-class Founding team redefining how businesses run their finances. This VC-backed startup has raised a seed round of close to $5M from 2 top-tier investors to tackle one of the biggest untapped opportunities in tech: fully...
-
Founding Machine Learning Engineer
2 weeks ago
Berlin, Berlin, Germany Bjak Full-time
Transform language models into real-world, high-impact product experiences. A1 is a self-funded AI group, operating in full stealth. We're building a new global consumer AI application focused on an important but underexplored use case. You will shape the core technical direction of A1 - model selection, training strategy, infrastructure, and long-term...
-
ML Engineer, Foundation Model
17 hours ago
Berlin, Berlin, Germany Prior Labs Full-time
Join Prior Labs. Who We Are: Prior Labs is building breakthrough foundation models that understand spreadsheets and databases - the backbone of science and business. Foundation models have transformed text and images, but structured data has remained largely untouched. We're tackling this $100B+ opportunity to revolutionize how we approach scientific...
-
AI/ML Engineer
2 days ago
Berlin, Berlin, Germany Melotech Full-time
Who we are: Melotech is revolutionizing media and entertainment. We create art through technology for humans to enjoy. In just 18 months, our work has been heard, watched and loved for over 2 billion minutes worldwide. Founded by entrepreneur and investor Soheil Mirpour, we are backed by top VCs Cherry Ventures, Speedinvest and GFC, alongside world-class angels...
-
Founding Engineer
4 days ago
Berlin, Berlin, Germany motus med Full-time
Berlin, onsite 4-5 days/week. 5+ years experience building and running production systems end-to-end. About motus med: motus med is a digital health startup centered around an AI-powered medical video platform. Our first product is an effective tool for diagnosis and monitoring in epilepsy. Our work builds on many years of research and clinical experience within...
-
ML Engineer
2 weeks ago
Berlin, Berlin, Germany Triskel Consulting Full-time
Our client is a digital services provider operating within the iGaming field. As part of their growth and expansion they are now seeking to recruit an ML Engineer (LLM / Google Cloud) who will be responsible for training and fine-tuning text models (LLMs), deploying them on Google Cloud, and building automation around these models. The core mission: take...
-
AI/ML Engineer Intern
2 days ago
Berlin, Berlin, Germany Melotech Full-time
Who we are: Melotech is revolutionizing media and entertainment. We create art through technology for humans to enjoy. In just 18 months, our work has been heard, watched and loved for over 2 billion minutes worldwide. Founded by entrepreneur and investor Soheil Mirpour, we are backed by top VCs Cherry Ventures, Speedinvest and GFC, alongside world-class angels...
-
Founding Engineer
4 days ago
Berlin, Berlin, Germany motus med Full-time
Senior Full-Stack Engineer (React, React Native, ideally Kotlin). Berlin, onsite 5 days/week. 5+ years of experience shipping end-to-end products. High ownership, fast shipping, AI-first workflow. Founder-track role with substantial equity. About motus med: motus med is a digital health startup centered around an AI-powered medical video platform. Our first product is...
-
ML Engineer
2 weeks ago
Berlin, Berlin, Germany voize Full-time
Why voize? Because we're more than just a job. At voize, we believe the greatest gift to frontline workers is time - time to care, connect, and be present. Today, that time is lost to busywork and complex systems that pull them away from what matters most: people. Our vision is to change that by building AI companions that seamlessly take over digital...