Drug discovery using advanced computational tools such as machine learning has reduced the time and cost required by conventional drug discovery pipelines by about 40% and 60%, respectively. In this study, we aimed to build a combinatorial library of anthraquinone and chalcone derivatives, to develop a workflow of different screening and scoring methodologies for finding hits against cancer-related proteins, and to examine those hits using molecular dynamics and molecular mechanics simulations. A combinatorial library of virtual compounds was generated from 20 anthraquinone and 24 chalcone core structures via R-group enumeration. The resulting compounds were optimized toward near drug-like properties, and physicochemical descriptors were calculated for all datasets and compared with commercially available databases such as the FDA, non-FDA, and natural products (NPs) datasets from ZINC15. A workflow of novel virtual screening and scoring methods was optimized according to the nature of the protein target. The optimized enumeration yielded 1,610,268 compounds whose mean NP-likeness and synthetic feasibility scores were close to those of the FDA, non-FDA, and NPs datasets. Cheminformatic analysis showed that the chemical space of the generated library overlapped most prominently with that of NPs, with the lowest molecular diversity compared with the other natural and synthetic drug databases. The consensus scoring methodology that we developed was based on quantitative structure-activity relationship (QSAR), pharmacophore fitness, shape similarity, and docking scores. The optimized virtual screening for the protein targets proved beneficial in retrospective enrichment studies, as it prioritized a high percentage of true positives (area under the ROC curve > 0.9). Consensus scoring outperformed each of the conventional screening methods used individually. This multistage virtual screening method was also found to overcome challenges in the training set, such as a limited number of data points and limited diversity of activity. In the molecular mechanics simulations, the activity range of the experimental datasets played a crucial role in the nature of the correlation between experimental activity values and the binding free energies obtained from MM/GBSA calculations. In conclusion, consensus scoring using the z-score fusion method is a beneficial approach to virtual screening, especially when the training dataset is imbalanced.
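The idea of z-score fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes an unweighted average of per-method z-scores, which may differ from the weighting actually used in the study, and it evaluates the ranking with a simple pairwise ROC AUC. All compound scores, labels, and column names below are hypothetical toy values, not data from the paper.

```python
from statistics import mean, pstdev

def z_scores(values, higher_is_better=True):
    """Standardize one method's scores; flip the sign so larger always means better."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - mu) / sigma for v in values]

def consensus(score_table, higher_is_better):
    """Unweighted z-score fusion: average the per-method z-scores of each compound."""
    columns = [z_scores(score_table[m], higher_is_better[m]) for m in score_table]
    n_compounds = len(columns[0])
    return [mean(col[i] for col in columns) for i in range(n_compounds)]

def roc_auc(scores, labels):
    """Fraction of (active, inactive) pairs ranked correctly; ties count as 0.5."""
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((a > d) + 0.5 * (a == d) for a in actives for d in inactives)
    return wins / (len(actives) * len(inactives))

# Hypothetical toy data: six compounds, the first two labelled as known actives.
scores = {
    "qsar_pred_pIC50":   [7.9, 6.5, 6.8, 5.2, 5.0, 4.8],
    "pharmacophore_fit": [0.55, 0.90, 0.60, 0.50, 0.45, 0.30],
    "shape_similarity":  [0.80, 0.55, 0.50, 0.62, 0.40, 0.35],
    "docking_score":     [-9.5, -9.0, -8.0, -9.2, -6.0, -5.5],
}
# Docking scores are assumed to be energies, so more negative is better.
higher_is_better = {
    "qsar_pred_pIC50": True,
    "pharmacophore_fit": True,
    "shape_similarity": True,
    "docking_score": False,
}
labels = [1, 1, 0, 0, 0, 0]

fused = consensus(scores, higher_is_better)
single_aucs = {
    m: roc_auc(z_scores(scores[m], higher_is_better[m]), labels) for m in scores
}
fused_auc = roc_auc(fused, labels)
```

On this toy data each single method misranks one active/inactive pair (AUC 0.875), while the fused score separates actives from inactives completely (AUC 1.0). This only illustrates why fusing complementary scores can help; it says nothing about the enrichment actually obtained in the study.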
Cheminformatics, virtual screening, machine learning, drug design, consensus scoring
Primary Language | English |
---|---|
Subjects | Pharmacy and Pharmaceutical Sciences (Other) |
Section | Reviews |
Authors | |
Publication Date | June 28, 2025 |
Published in Issue | Year 2023 Volume: 27 Issue: Current Research Topics In Pharmacy: Pharmacology Debates |