Precious R&D data wants to become valuable
The BioData and BioTechX community in Basel, Switzerland, is a hotspot for on-site biotechnology innovation among many big pharma companies. Time to review the 2023 edition and share our personal experiences, with a focus on data and its value.
Setting the scene: The data flood 🌊
Data in scientific research is incredibly complex. It ranges from omics data sets – including genomics and epigenomics – to proteomics and single-cell/nuclei analyses, collectively known as multiomics. Naturally, these data are available across species. Text-based literature and various imaging modalities add further complexity.
All this data holds an often untapped value that is waiting to be realized.
Data generation at pace ⚙️
In recent years, significant efforts have been made in automation to ensure continuous data generation and higher sample throughput at the molecular level. With it came substantial improvements in quality control, protocols and sample preparation. While some major players in the field have effectively addressed these challenges, others are only now recognizing the need for immediate action.
To generate data at high speed, a coordinated approach is needed, from laboratory hardware and software to data and workflow management.
From data to insights: Connecting the dots 🕸
Various tools, many of which are presented at Biotech conferences, aim to combine new datasets with existing knowledge to provide a strong foundation for scientific data interpretation.
Worth mentioning this year are knowledge graphs. They are increasingly used to help data experts model and store complex (meta)data and patient information, aiding the consolidation of insights. These graphs use different underlying data models to connect isolated, “siloed” data sets and information. Having multiple graphs within a company is becoming standard, and data transfer between graphs is often necessary.
While knowledge graphs are useful in various sectors, in Biotech R&D they are often applied at the use case level.
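To make the idea concrete, here is a minimal sketch of a knowledge graph as a store of (subject, predicate, object) triples, with a naive merge to illustrate connecting siloed data sets. All entity and predicate names (GeneX, TissueA, etc.) are purely illustrative, not drawn from any real ontology or vendor API; production systems would typically use RDF triple stores or property-graph databases instead.

```python
# Minimal knowledge-graph sketch: facts stored as (subject, predicate, object) triples.
# Entity and predicate names are hypothetical examples for illustration only.

class KnowledgeGraph:
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the given pattern (None = wildcard)."""
        return [
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        ]

    def merge(self, other):
        """Transfer facts from another graph into this one (naive union)."""
        self.triples |= other.triples

# One graph per "silo": e.g. omics results and clinical metadata.
omics = KnowledgeGraph()
omics.add("GeneX", "upregulated_in", "TissueA")
omics.add("GeneX", "encodes", "ProteinX")

clinical = KnowledgeGraph()
clinical.add("PatientCohort1", "biopsied_from", "TissueA")

# Connect the silos by merging, then traverse across both data sets:
# everything linked to TissueA now surfaces in a single query.
omics.merge(clinical)
related_to_tissue_a = omics.query(obj="TissueA")
```

A real-world merge is rarely this simple: the two silos usually follow different data models, so entities must be mapped to shared identifiers before their facts can be combined – which is exactly why transferring data between multiple in-house graphs remains a recurring effort.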
The 2023 AI = Language Model hype 🚀
The rise of AI for text and image analysis, like LLMs and CNNs, has fueled hopes for improving early R&D outcomes too. However, directly applying AI to biological data has not met the high expectations – yet.
Deep learning models like LLMs and CNNs excel at text and image modalities, but struggle with highly complex, often unstructured molecular data (with the exception of genomics, structural biology and chemoinformatics data).
The complexity of biological data poses a fundamental challenge, as benchmark datasets are often unavailable. Just imagine that proteins – the potential drug targets – may act pro-inflammatory at times and anti-inflammatory at others. Our cells know exactly how to react to external stimuli and environmental changes, but it’s hard for us to recapitulate cellular decision making in a generic or even generative manner.
Can we win the race? 🏆
The key to success with any given technology will be to deliver business value.
Nothing more and nothing less.
We can ask ourselves: How does my R&D solution create real business value? Genuine success will stem from effective treatments. Keep this criterion in mind when determining your goals and/or developing your algorithms.
It’s no longer enough to rely on a small data set to make an (early R&D) decision. We now have ample data to evaluate which decision is more likely to succeed. Therefore, use your data wisely in the context of public data and be prepared to fail early if things aren’t working.