Generating Bespoke Datasets for Foundation Models in AI Drug Discovery

The application of AI in drug discovery relies on high-quality, reproducible training datasets. Traditional screening campaigns focus on identifying potent hits, but ML-driven drug discovery requires comprehensive potency evaluation across entire compound libraries. Here, we introduce a partial concentration-response curve (pCRC) approach that estimates potency using just two data points per compound. We onboarded a panel of 65 diverse kinases and screened 7,000 compounds against the panel at ATP concentrations near Kₘ to minimize modality bias, achieving a mean robust Z’ of 0.74 across all targets. A direct comparison of 100 fragments tested in both 2-point pCRC and conventional 11-point CRC formats demonstrated excellent correlation, confirming that our pCRC methodology produces high-quality data suitable for ML model training. Integrating our automation platform, including SPT Labtech’s dragonfly® discovery, with automated data pipelines enabled the generation of 221,000 high-quality, ML-ready data points per day, accelerating the development of foundation models for drug discovery.
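
To make the two-point idea concrete, here is a minimal sketch of how potency might be estimated from two measurements. The abstract does not describe the actual pCRC fitting method, so everything here is an assumption: the function name `estimate_pic50_two_point` is hypothetical, responses are taken as percent inhibition, and the curve is assumed to follow a Hill model with a 0% bottom, a 100% top, and a fixed slope of 1.

```python
import math


def estimate_pic50_two_point(conc_molar, inhibition_pct, hill_slope=1.0):
    """Estimate pIC50 from a few (concentration, % inhibition) pairs.

    Assumption (not from the source): a Hill curve with bottom 0%,
    top 100%, and a fixed slope, so each point can be inverted directly.
    Returns None when no point falls in the informative range.
    """
    estimates = []
    for conc, inh in zip(conc_molar, inhibition_pct):
        # Points near 0% or 100% inhibition say little about the IC50.
        if not 5.0 <= inh <= 95.0:
            continue
        # Invert inh = 100 / (1 + (IC50 / conc) ** hill_slope) for IC50.
        ic50 = conc * ((100.0 - inh) / inh) ** (1.0 / hill_slope)
        estimates.append(-math.log10(ic50))
    if not estimates:
        return None
    # Average the per-point estimates on the log scale.
    return sum(estimates) / len(estimates)


# Example: two test concentrations, 1 µM and 10 µM (hypothetical values).
pic50 = estimate_pic50_two_point([1e-6, 1e-5], [50.0, 100.0 / 1.1])
```

Averaging on the log scale is one simple design choice; a compound that is inactive or fully inhibited at both concentrations yields `None` here, which in practice would be recorded as a censored potency rather than a point estimate.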

Key learning objectives:

Recognize the importance of robust data for AI model training in drug discovery.
Understand why automation is critical for the data quality and throughput that AI/ML model training in drug discovery requires.
Learn about the partial concentration-response curve (pCRC) approach and its benefits.
Explore how our automation platform speeds up the development of foundation models in drug discovery.
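
The robust Z’ quoted in the abstract is a plate-quality statistic, so it ties directly to the data-quality objectives above. The sketch below uses one common formulation, which replaces means and standard deviations in the classic Z’ with medians and scaled median absolute deviations (MAD). The source does not state which formulation was used, so the function names and the 1.4826 scaling factor are assumptions.

```python
import statistics

# Scales a MAD to a standard-deviation estimate for normally distributed data.
MAD_TO_SD = 1.4826


def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def robust_z_prime(pos_controls, neg_controls):
    """Robust Z' from positive and negative control wells of one plate.

    Assumed formulation: 1 - 3 * (sMAD_pos + sMAD_neg) / |med_pos - med_neg|.
    """
    window = abs(statistics.median(pos_controls) - statistics.median(neg_controls))
    if window == 0:
        raise ValueError("control medians are identical; Z' is undefined")
    spread = MAD_TO_SD * (mad(pos_controls) + mad(neg_controls))
    return 1.0 - 3.0 * spread / window


# Example with made-up control signals (percent inhibition).
z = robust_z_prime([98.0, 100.0, 102.0], [-2.0, 0.0, 2.0])
```

Values closer to 1 indicate a wider separation between controls relative to their spread; using medians and MADs keeps a few outlier wells from dominating the statistic.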

Speakers:

Jeeven Singh