How do you make 30 years of data accessible?

In conversation with Jonas Münch, who developed a RAG (Retrieval-Augmented Generation) chatbot at Bayer that combines 30 years of internal data and clinical results in one place.

Bertram Weiss

At our latest event, we welcomed Jonas Münch, who, along with his team, developed PRINCE (Preclinical Information Center). PRINCE is a data platform featuring an AI-driven chatbot custom-built for Bayer. It functions as a researcher, writer, and reviewer agent with access to roughly three decades of internal data. Jonas talked about how Bayer uses a range of digital assistants, from machine learning tools to generative AI solutions. He also shared insights on the development of the chatbot, the challenges he and his team encountered, and the key lessons learned. For those who missed it, here’s a brief interview capturing the highlights of his talk.

Can you briefly introduce yourself?
As the Head of Pharmacology & Safety IT at Bayer, I oversee the digitalization of the preclinical area within our Pharma division. I lead a team of internal and external software engineers and product managers, and together we develop highly specialized software products that enhance our scientists' daily work.

What typical challenges do you face in your work?
A common challenge we encounter is access to, and availability of, our in-house data. In Pharma, we invest billions of euros to generate data through research; however, we still need to get better at reusing this valuable information to derive insights beyond the initially intended use case.

How did you approach this problem with AI compared to other approaches?
In our current initiative to enhance data findability and accessibility by breaking down data silos and creating a new data platform, we soon identified a challenge in extracting insights from older, unstructured data. The primary issue is that many past study reports lack essential annotations and are often only available as scanned PDF documents. By employing automated content extraction and leveraging large language models, we are now positioned to make all our data accessible to anyone in the company—something that was previously difficult to achieve with traditional search algorithms.

What does the solution look like and what impact does it have?
With our solution, we now provide our scientists with an intuitive interface for querying in-house data from the past 30 years. As with a chatbot, users can not only locate existing information but also have our knowledge engine interpret results and carry out complex comparisons and integrations, all within the same interface.

What did you learn during the development of this AI-driven solution?
We discovered that building a working RAG-based prototype is relatively straightforward and can be completed quickly. However, achieving a state where it consistently delivers accurate and reliable results is a challenging task that requires extensive customization, with end-users continuously involved in the process. Consistency and ongoing improvement are crucial in building trust with our user base, which is a prerequisite for the success of our preclinical knowledge engine.
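To illustrate why a first RAG prototype comes together quickly, here is a minimal sketch of the retrieval step in Python. It is a generic illustration, not PRINCE's implementation: the report snippets, the function names, and the bag-of-words similarity are all assumptions made for this example. A production system would more likely use learned embeddings and a vector store, and would send the assembled prompt to a language model.

```python
import math
import re
from collections import Counter

# Hypothetical report snippets for illustration only; not real study data.
REPORTS = {
    "report-001": "Compound A showed no liver toxicity in the 28-day rat study.",
    "report-002": "Compound B caused elevated liver enzymes in dogs at high dose.",
    "report-003": "Stability testing of tablet formulation at 40 degrees.",
}


def vectorize(text):
    """Turn text into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, reports, k=2):
    """Return the k (doc_id, text) pairs most similar to the question."""
    q = vectorize(question)
    ranked = sorted(
        reports.items(),
        key=lambda item: cosine(q, vectorize(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Assemble the retrieved passages and the question into one prompt."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return (
        "Answer using only the sources below and cite their ids.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


question = "Which compound raised liver enzymes in dogs?"
top = retrieve(question, REPORTS)
prompt = build_prompt(question, top)  # this string would be sent to an LLM
```

The sketch stops before the model call on purpose: retrieval and prompt assembly are the easy part. Getting consistently accurate answers from scanned, unannotated reports is where the customization described above comes in.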

What is the next challenge you want to tackle with AI?
Our next objective is to evolve from merely querying our data to generating something new. In our vision for the next phase of our system, we aim to develop a writing assistant for standardized documents. This would save our users significant time and ultimately improve quality. We already have a working prototype for one compliance-relevant document and are currently collaborating with our users to make it practical for daily use.

The Bottom Line

This real-world example of AI in Action demonstrates the immense value of leveraging existing data to unlock new potential. By effectively integrating AI solutions into daily workflows, organizations can significantly boost productivity and efficiency. Our key takeaways:

Data Accessibility is Critical: One of Bayer's biggest challenges was making 30 years of unstructured, often inaccessible data usable. Through automated content extraction and the use of large language models, Bayer turned this data into a valuable resource for its scientists.
AI Enhances Productivity: By developing PRINCE, an AI-powered chatbot, Bayer has given its scientists the ability to query internal data easily, interpret results, and even conduct complex analyses within a user-friendly interface, streamlining their research processes.
Customization and User Involvement are Essential: Building a functioning RAG chatbot was just the start. Continuous customization and involving end-users in the process have been crucial in ensuring the tool delivers accurate, reliable results and builds trust with the user base.
