Master Thesis: Efficient Data Preparation for ML
1 week ago
Location: Darmstadt (Germany)
Please note that this offer is an unpaid master thesis
In the research project KompAKI, we seek to bring the power of machine learning (ML) algorithms to individuals, e.g., domain experts. To this end, we develop end-to-end automated and interactive machine learning pipelines. Such pipelines typically comprise various components, including data categorization, cleaning, wrangling, feature engineering, model training, and postprocessing. Bringing automation and interactivity to all these components enables novice users to build reliable and complex ML pipelines, even without a deep technical background in this domain. Moreover, users receive detailed explanations of the generated models, along with several ways to guide the generation process if necessary. As a result, building ML pipelines in Software AG's products, e.g., Zementis and TrendMiner, will become much simpler and require much less time.
In general, artificial intelligence benefits from a wide variety of reliable data, mostly originating from multiple sources. The quality of the data, i.e., the degree to which the data adheres to desirable quality and integrity constraints, can have a significant impact on businesses and even on human lives. Dirty data not only leads to erroneous decisions and unreliable analyses but can also cause economic damage to companies. For instance, a recent study by Gartner showed that organizations believe poor data quality to be responsible for an average of $15 million per year in losses. As a consequence, there has been a surge of interest from both industry and academia in developing efficient and effective data cleaning methods. In this context, two main tasks have been broadly investigated, namely (1) error detection, where data inconsistencies such as duplicate data, integrity constraint violations, and incorrect or missing data values are identified, and (2) data repairing, which involves updating the available data to remove any detected errors.
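To make the two tasks concrete, the following is a minimal sketch in plain Python, not the method developed in the project. The toy records, the column names, the age range constraint, and the median-based repair are all illustrative assumptions.

```python
from statistics import median

# Hypothetical toy records; columns and the age constraint are assumptions.
records = [
    {"id": 1, "name": "Ana", "age": 34},
    {"id": 2, "name": "Ben", "age": None},  # missing value
    {"id": 3, "name": "Cem", "age": -5},    # violates 0 <= age <= 120
    {"id": 1, "name": "Ana", "age": 34},    # exact duplicate of the first row
    {"id": 4, "name": "Dia", "age": 52},
]


def detect_errors(rows):
    """Error detection: map row index -> list of error labels."""
    errors = {}
    seen = set()
    for i, row in enumerate(rows):
        labels = []
        key = tuple(sorted(row.items()))
        if key in seen:
            labels.append("duplicate")
        seen.add(key)
        if row["age"] is None:
            labels.append("missing")
        elif not 0 <= row["age"] <= 120:
            labels.append("constraint_violation")
        if labels:
            errors[i] = labels
    return errors


def repair(rows, errors):
    """Data repairing: drop duplicates, replace bad ages with the median of clean rows."""
    fill = median(r["age"] for i, r in enumerate(rows) if i not in errors)
    repaired = []
    for i, row in enumerate(rows):
        labels = errors.get(i, [])
        if "duplicate" in labels:
            continue
        fixed = dict(row)
        if "missing" in labels or "constraint_violation" in labels:
            fixed["age"] = fill
        repaired.append(fixed)
    return repaired


errors = detect_errors(records)
cleaned = repair(records, errors)
```

Here detection only flags rows, while repairing decides how to change them. Keeping the two steps separate allows a different repair strategy to be swapped in without touching the detection logic.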
In ML pipelines, data cleaning is a crucial component since it prevents the propagation of data errors to the data analysis step. As a result, data scientists typically spend the majority of their time on cleaning and organizing data. This stems from the need to select the right data cleaning tools and to configure them optimally. To relieve the burden of detecting and repairing heterogeneous error types, several efforts have been made to develop automated data cleaning methods. However, current automated methods still suffer from accuracy and scalability problems. Moreover, they hardly consider the requirements of common ML models, such as data relevancy and model fairness with respect to data bias. In this MSc topic, we aim to design and implement an intelligent data cleaning method that exploits the context information and metadata of the dirty data to optimize detection accuracy and runtime while repairing large datasets.
YOUR TASKS
In particular, this thesis project has the following goals:
- Study related work in the fields of automated machine learning systems and data cleaning methods
- Design and implement a novel error detection and recognition method which maximizes the performance of machine learning models
- Evaluate the performance of the proposed method in terms of detection accuracy and runtime
- Document the results in a written report
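As a rough illustration of the evaluation goal, the sketch below scores a detector against known error positions and measures its runtime. The choice of precision, recall, and F1 as the accuracy measures, the toy detector, and the sample cells are assumptions made for illustration, not requirements of the thesis.

```python
import time


def detection_scores(detected, actual):
    """Precision, recall, and F1 of detected error cells against ground truth."""
    true_positives = len(detected & actual)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def timed(fn, *args):
    """Run fn(*args) and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def toy_detector(cells):
    """Hypothetical detector: flags missing or negative values only."""
    return {pos for pos, value in cells.items() if value is None or value < 0}


# Hypothetical cells keyed by (row, column); 130 is an error the toy detector misses.
cells = {(0, "age"): 34, (1, "age"): None, (2, "age"): -5, (3, "age"): 130}
actual_errors = {(1, "age"), (2, "age"), (3, "age")}

detected, seconds = timed(toy_detector, cells)
precision, recall, f1 = detection_scores(detected, actual_errors)
```

Scoring at the level of individual cells rather than whole rows makes it visible when a detector finds one error in a row but misses another.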
YOUR PROFILE
- You are pursuing an MSc in Computer Science, Mathematics, or a comparable field
- Good conceptual knowledge of machine learning models
- Good programming skills in Python and its ML-related libraries, e.g., Scikit-learn, TensorFlow, and Keras, are required; other programming languages such as Java are a plus
- Strong drive to learn new technologies and to deliver code of the highest quality
- You have a high degree of creativity, resilience, reliability, and team spirit
- Fluent spoken and written English
WHAT YOU CAN EXPECT
- Targeted initial training
- Flat hierarchies
- Modern working environment
- Free drinks
- Open and constructive discussion culture
- Good internal entry and development opportunities after graduation
- The position is not remunerated; however, you will receive the necessary hardware, such as a laptop and a monitor, as well as access to our computing resources and internal learning platforms
INTERESTED?
Your contact:
Tanja Topal, Manager HR Recruiting & Onboarding DACH, Phone +49 (0) 681 210 3105
- f/m/d - diversity matters