AI-Driven Compound Optimization in the Large Molecule Space

An expert interview on AI-driven compound optimization in the large molecule space.
By Vendela Jagdt

Dr. Gillian Hertlein is a strategic project manager at Merantix Momentum, specializing in AI strategy, particularly for life science and pharma projects. A molecular biologist by training with experience in both academic and industry R&D, she has been bridging the gap between AI and the life sciences for over four years.

Dr. Jelena Ivanovska holds a Ph.D. in Biochemistry and Molecular Biology from FAU Erlangen-Nürnberg and the Max-Delbrück Institute and specializes in oncology, working on proteins and signaling cascades involved in cancer. After several positions in academia and industry, she co-founded Exazyme, a company at the forefront of AI-powered protein design, where she serves as CSO.

Dr. Harry Sevi earned a Ph.D. in Machine Learning and Applied Mathematics from École normale supérieure de Lyon. As Head of Engineering at Exazyme, he bridges the gap between machine learning and protein engineering.

Vendela: Let's get started. Thank you all for making time for this interview. We're excited to have you share your insights and experiences. To provide some context, this interview is part of our series on practical applications of machine learning in the pharmaceutical industry. Today, we aim to delve into the complexities of large molecules and explore how AI is revolutionizing this field.

To start, for those who may not be familiar, could one of you provide a brief explanation of what large molecules are, particularly in the context of drug development, and their uses?

Jelena: Large molecules, also known as macromolecules, play a significant role in biopharmaceuticals. Because of their complexity, they are often therapeutic targets. At Exazyme, we focus on proteins, which are particularly demanding macromolecules. Proteins are chains built from 20 different amino acids, and their sequences define both their structure and function. Proteins serve in many roles, including as enzymes, antibodies, cytokines, and signaling molecules, and they are widely used in therapies for conditions like cancer and autoimmune diseases.
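For a sense of scale: with 20 amino acids available at every position, a protein of length L has 20^L possible sequences. The short Python sketch below computes this for a hypothetical 100-residue protein; the function name is illustrative only, and the count ignores any biological constraints on which sequences can actually fold.

```python
import math

def sequence_space_log10(length: int, alphabet_size: int = 20) -> float:
    """Return log10 of the number of possible sequences, i.e. log10(alphabet_size ** length)."""
    return length * math.log10(alphabet_size)

# Hypothetical 100-residue protein built from the 20 standard amino acids.
exponent = sequence_space_log10(100)
print(f"A 100-residue protein has roughly 10^{exponent:.1f} possible sequences.")

# Exact integer check for a short peptide: 20 ** 3 = 8000 possible tripeptides.
print(20 ** 3)
```

This prints roughly 10^130.1 for the 100-residue case, far beyond anything an experiment could screen exhaustively, which is why efficient exploration of sequence space matters.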

Vendela: Thank you for that explanation. It's clear that large molecules are diverse and play significant roles in drug development: they can act as targets but also as drugs themselves. Speaking of drug optimization, what are the three main goals when optimizing large molecules, particularly enzymes?

Jelena: The primary goals in optimizing large molecules, specifically enzymes, include improving their efficacy, ensuring their safety, and enhancing their stability. These are essential factors to consider when developing therapeutic treatments involving large molecules.

Vendela: One major leap forward in optimizing large molecules without conducting experiments was AlphaFold. Could one of you explain what AlphaFold is and what makes it unique in the field of protein structure prediction?

Gillian: AlphaFold is a groundbreaking model developed by Google DeepMind for predicting the three-dimensional structure of proteins. Proteins are highly complex molecules, and understanding their 3D structure is crucial for many applications, including drug development. Traditionally, determining a protein's structure meant crystallizing the protein and analyzing it by X-ray crystallography, a process that can be difficult and time-consuming. AlphaFold changed this by training a deep neural network on the available protein structure data, allowing it to predict 3D structures from sequence with remarkable accuracy. That precision makes it a valuable tool for researchers across the field. Previously, such 3D models were handcrafted by leading experts; now, a much wider audience can generate them.

Vendela: Harry, could you share your perspective on the challenges and potential AI-based modeling solutions following the introduction of AlphaFold?

Harry: Absolutely. From a machine learning perspective, AlphaFold has been a game-changer for predicting protein structures. However, the model was trained on known protein structures, which can limit its accuracy for proteins that are truly novel. The open challenges lie in improving how protein sequences are represented and in combining AlphaFold's structural predictions with machine learning models that work on sequences. Models like ESM (Evolutionary Scale Modeling) have gained attention for representing proteins from sequence alone and predicting various properties from those representations, but they do not explicitly use structural information. The future likely lies in combining the strengths of structure-based and sequence-based models to improve predictions of functional properties such as thermostability, solubility, and enzymatic activity.

Vendela: Thank you, Harry. Now let's turn to large molecule optimization itself. Jelena, could you tell us more about the challenges and strategies involved in optimizing large molecules, particularly enzymes?

Jelena: Optimizing large molecules, especially enzymes, is a multifaceted process. One of the key challenges is understanding the complex interplay between different domains within a protein and how changes in the sequence affect its function. Predicting how mutations or other modifications will affect properties like catalytic activity, solubility, stability, and safety is also difficult. To address these challenges, we often rely on high-throughput data, such as deep mutational scanning, which measures how many different sequence variants affect protein function. As Harry mentioned earlier, machine learning models can then analyze this data and guide optimization. The ultimate goal is to fine-tune large molecules so that they achieve the desired properties for specific applications.
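Deep mutational scanning libraries often center on single amino-acid substitutions. As a rough illustration only, the Python sketch below enumerates every single-site substitution of a short, hypothetical sequence (19 alternatives per position), labeled in the common wild-type/position/mutant notation. The function name and sequence are made up for this example; real DMS libraries are designed and built at the DNA level, and this is not a description of Exazyme's pipeline.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (one-letter codes)

def single_mutants(sequence: str) -> list[tuple[str, str]]:
    """Return (label, mutant_sequence) for every single-site substitution.

    Labels follow the common <wild type><1-based position><mutant> notation, e.g. 'M1A'.
    """
    mutants = []
    for i, wt in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa == wt:
                continue  # skip "mutating" a position back to its wild-type residue
            mutant = sequence[:i] + aa + sequence[i + 1:]
            mutants.append((f"{wt}{i + 1}{aa}", mutant))
    return mutants

# Hypothetical 10-residue peptide: 10 positions x 19 substitutions = 190 variants.
library = single_mutants("MKTAYIAKQR")
print(len(library))  # 190
print(library[0])    # ('M1A', 'AKTAYIAKQR')
```

Even this exhaustive single-mutant library grows only linearly with length (19 × L variants), which is why DMS is tractable, whereas combinations of mutations quickly become impossible to enumerate.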

Vendela: Thank you, Jelena. Can you give us a brief overview of the key challenges with traditional methods for optimizing large molecules?

Jelena: Traditional methods for optimizing large molecules, particularly enzymes, fall into two main approaches. The first is directed evolution, which was recognized with the 2018 Nobel Prize in Chemistry. In this approach, random mutations are introduced into the enzyme's sequence, creating a large library of variants. However, this method is both expensive and time-consuming because a vast number of candidates must be tested. The second approach is rational design, where specific mutations are introduced based on the protein's sequence and, if known, its structure. This method requires domain knowledge and usually involves experimentation as well. In addition, computational tools such as in silico modeling can be used to predict the impact of specific mutations. Both approaches typically involve screening large numbers of variants to identify those with improved properties.
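The directed-evolution cycle described above (random mutagenesis, then screening, then repeating with the best variant) can be sketched as a toy in silico loop. Everything here is an assumption for illustration: toy_fitness is a hypothetical stand-in that simply counts matches to an arbitrary target sequence, whereas in a real campaign every evaluation is a wet-lab assay, which is exactly what makes screening large libraries expensive.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(sequence, rate, rng):
    """Redraw each residue at random with probability `rate` (may redraw the same residue)."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
                   for aa in sequence)

def toy_fitness(sequence, target):
    """Hypothetical stand-in for an assay: number of positions matching an arbitrary target."""
    return sum(a == b for a, b in zip(sequence, target))

def directed_evolution(parent, target, rounds=20, library_size=200, rate=0.1, seed=0):
    rng = random.Random(seed)  # seeded for reproducibility
    best, best_score = parent, toy_fitness(parent, target)
    for _ in range(rounds):
        # 1) Random mutagenesis: build a library of variants of the current best sequence.
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        # 2) Screening: find the top-scoring variant in the library.
        candidate = max(library, key=lambda s: toy_fitness(s, target))
        score = toy_fitness(candidate, target)
        # 3) Selection: carry it forward only if it beats the current best.
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

start, goal = "MKTAYIAKQR", "MKVLYWAEQR"
evolved, score = directed_evolution(start, goal)
print(evolved, score, "of", len(goal), "positions match")
```

Because the loop only accepts a variant when it beats the current best, the score never decreases between rounds. Real campaigns often also recombine or keep several top variants, which this sketch omits.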

Vendela: Could you explain how Exazyme's approach differs from these traditional methods and what advantages it offers?

Jelena: Exazyme's approach offers several advantages over traditional methods. It is less expensive and less time-consuming because we leverage existing sequences and measurements. We focus on exploring the sequence space more efficiently, so we don't need thousands of sequences to work with; in fact, we aim to work with as few sequences as possible, which keeps costs down. What's remarkable is that we can train our models on both positive and negative data, including non-functional sequences. Traditional methods often discard negative results, but for machine learning, every data point counts. This approach also allows us to optimize multiple properties simultaneously, making it more versatile and efficient.
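To illustrate why non-functional variants are useful training data, here is a generic sketch (not Exazyme's method): a logistic regression trained by plain gradient descent on one-hot-encoded sequences labeled functional (1) or non-functional (0). The five-residue variants and their labels are hypothetical. The negative examples are what let the model learn that A or G at position 3 is associated with loss of function, while W is associated with function.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(sequence):
    """Flattened one-hot encoding: one feature per (position, amino acid) pair."""
    features = [0.0] * (len(sequence) * len(AMINO_ACIDS))
    for pos, aa in enumerate(sequence):
        features[pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)] = 1.0
    return features

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Full-batch gradient-descent logistic regression without regularization."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        # Prediction error (p - y) for every training example.
        errors = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - yi
                  for x, yi in zip(X, y)]
        # Gradient step on each weight and on the bias.
        w = [wj - lr * sum(e * x[j] for e, x in zip(errors, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errors) / n
    return w, b

# Hypothetical assay results: 1 = functional variant, 0 = non-functional variant.
variants = ["MKWAY", "MRWAY", "MKWGY", "MKAAY", "MRGAY", "MKAGY"]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic([one_hot(s) for s in variants], labels)

def predict(sequence):
    """Predicted probability that a sequence is functional."""
    x = one_hot(sequence)
    return sigmoid(sum(wi * xi for wi, xi in zip(weights, x)) + bias)

# Score two unseen variants.
print(f"MRWGY: {predict('MRWGY'):.2f}  MRAGY: {predict('MRAGY'):.2f}")
```

With only positive examples, a classifier like this would have nothing to contrast against; the negatives supply that contrast. A practical model would use far more data, richer sequence representations, and regularization.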

Harry: Additionally, our approach reduces the need for specialized equipment, making it accessible to a broader range of researchers. We also aim to better incorporate protein dynamics into our models, as dynamics are crucial for understanding how large molecules behave.

Vendela: That's fascinating, and it's great to see how Exazyme is addressing the challenges faced by traditional methods with a more efficient and data-driven approach. Now, shifting gears a bit, let's talk about protein and antibody design, areas that have seen substantial investments, including in generative AI applications. Gillian, what aspects of protein and antibody design make them particularly interesting for generative AI, and why are these areas attracting significant investments?

Gillian: Protein and antibody design are highly attractive for generative AI for several reasons. First, these molecules play a critical role in developing therapies, especially for diseases like cancer and autoimmune disorders. Second, antibodies are often not designed but generated as an immune response to the respective target, which limits the possible outcomes. Third, their complex structures are difficult to capture with traditional rule-based algorithms. And last but not least, huge antibody libraries are available to train foundation models.

Vendela: Harry, would you like to add any insights from a machine learning perspective on the challenges and future directions in generative AI for protein and antibody design?

Harry: From a machine learning standpoint, generative AI holds tremendous promise for designing proteins and antibodies. However, we should remain cautious. The complexity of these molecules, especially proteins, requires a deep understanding of the fitness landscape. We need benchmark data to comprehensively understand epistasis, that is, how the effect of one mutation depends on the presence of others, and to navigate the sequence space efficiently. While generative AI can produce intricate protein structures, achieving function remains a challenge. It's an active area of research, but we're not yet at the point of consistently creating functional proteins.
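Epistasis, in its simplest pairwise form, is often quantified by comparing a double mutant with its two single mutants. A common textbook definition, assuming fitness F is measured on an additive scale (for example, log-transformed activity) relative to the wild type (WT), is:

```latex
\varepsilon_{AB} = (F_{AB} - F_{\mathrm{WT}}) - \left[ (F_{A} - F_{\mathrm{WT}}) + (F_{B} - F_{\mathrm{WT}}) \right]
                 = F_{AB} - F_{A} - F_{B} + F_{\mathrm{WT}}
```

If ε_AB = 0, the two mutations act independently and their effects simply add up; any other value means the double mutant cannot be predicted from the single mutants alone. With hypothetical values F_WT = 0, F_A = 0.5, F_B = 0.3 and F_AB = 1.2, for example, ε_AB = 1.2 − 0.5 − 0.3 + 0 = 0.4, a case of positive epistasis. Such interactions are what make the fitness landscape rugged and hard to navigate.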

Vendela: Let's wrap up the interview by looking ahead. If we imagine ourselves one year ahead, in January 2025, what are your hopes or expectations for advancements in this field?

Jelena: From my perspective, I hope to see more data integration, especially multi-omics data, to improve our understanding of protein dynamics and structures. Incorporating diverse data sources can enhance the accuracy of our predictions and help us explore the sequence space more effectively.

Harry: Personally, I'd like to see advancements in understanding epistasis and the fitness landscape of proteins. Having benchmark data to navigate this complex landscape efficiently would be a significant step forward.

Gillian: I share the hope for more data. In particular, I am looking forward to advancements in cryo-EM, which could provide more data on protein structures and especially their dynamics. This would greatly benefit AlphaFold and similar tools.

Vendela: Thank you all for sharing your thoughts on the future of this exciting field!
