-
化合物的属性预测在药物研发、材料设计、毒理学研究等领域发挥了重要的作用,与人类生活息息相关[1-2]. 化合物属性预测的相关研究可追溯到药物合成的早期研究,当时主要是化学家通过重复实验,进行测试和验证并获取各类化学信息,合成目标分子[3]. 由于重复实验耗时长、成本高,科学家基于构效关系(SAR)发展出了定量-构效关系,为化合物结构与其性质之间建立了数学关系框架. 1962年,Hansch等首次实践了定量-构效关系(QSAR),成为该领域具有里程碑意义的事件,也是化合物属性预测研究迈入新阶段的标志[4]. 随后,Hansch在1964年提出了Hansch方程,这个发现为QSAR模型运行提供了一种新方法. 但传统QSAR模型一般使用一些常见的分子描述符来预测化合物属性,然而化合物结构多样,少量的分子描述符很难全面地描述化合物的结构信息,这使得模型很难精准预测化合物性质. 同时,随着研究数据集增大、描述符增多,传统的方法难以拟合化学结构与性质之间的复杂关系. 因此,需要比传统统计工具更先进、更强大的计算和数据分析方法.
机器学习(特别是深度学习),由于其强大的计算和数据分析能力,已被用于解决以上QSAR研究中的问题. 例如,研究人员通过机器学习或深度学习方法将三维甚至更高维分子结构与其属性联系起来,弥补了传统的化合物属性预测方法的不足之处,大力推动了化合物属性研究的发展[5-6].
近年来,机器学习在化合物属性的预测研究上表现出不俗的潜力,因此这方面的研究也逐年增多. 比如在理化性质方面,在机器学习的帮助下,预测分子的原子化能、振动频率、溶剂化自由能、计算键离能等,成本更低,结果准确可靠,计算速度更快[7-11];在生物活性方面,建模方面逐步引入了神经网络算法、分子图等,所构建模型性能更优异,结果可靠[12-14];在毒性方面,根据机器学习建立的模型可以非常有效地识别有毒分子和预测特定毒性,可筛选确认之前未曾识别出的危险化学品[15-17]. 本文主要介绍机器学习在化合物属性预测方面的应用过程及相应的模块内容,并结合应用实例总结和展望机器学习在该应用方面现存的问题和机遇.
机器学习在化合物属性预测中的应用
Applications of machine learning on compound property prediction
-
摘要: 化合物的属性预测是药物研发、毒理学研究、环境行为预测等工作的核心任务. 目前,人工合成的化学物质层出不穷,相关的实验研究数据在持续扩充,但实验研究数据远无法赶超新型化学物质的研发速度. 近年来,机器学习算法及模型在化合物属性预测方面展现了独特的优势和巨大的潜力,尤其在实验数据匮乏的情况下,提供了可靠的模型预测数据. 本文介绍了机器学习应用于化合物属性预测的主要流程步骤和相应的模块的内容,涵盖数据集、分子描述方法、模型性能评估指标和评估方法等. 同时,本文系统总结了机器学习方法在化合物物理化学性质预测、生物活性预测和毒性预测方面的应用实例,并从数据集、分子特征化、模型解释等方面分析并讨论了相关研究工作现存问题与未来挑战.Abstract: Compounds property prediction is an essential task in drug development, toxicology, and environmental behavior prediction. Along with an increasing number of synthetic chemicals, the corresponding experimental research data are expanding. However, the experimental data are still far away from rapid invention of novel chemicals. In recent years, machine learning algorithms and models have shown advantages and great potential in compound property prediction, especially in case of lacking experimental data, providing reliable model-predicted data. Our study outlines the main procedures and corresponding modules related to applications of machine learning tools for compound property prediction, specifically including datasets, molecular description methods, model performance evaluation metrics, and methods. Furthermore, this work systematically summarizes progress and advances in compound property prediction based on machine learning approaches, and also introduces specific examples on compounds predictions of physical and chemical properties, bioactivity, and toxicity. To end, the existing problems and challenges are discussed based on data sets, molecular characterization, and model outcome interpretation.
-
Key words:
- machine learning /
- compound property /
- molecular structure /
- model prediction.
-
表 1 常见的公开数据库
Table 1. Common public databases
数据库
Database简要介绍
Brief introduction参考文献
ReferencesPubChem 有机小分子生物活性数据库,包含3个子数据库PubChem BioAssay ,PubChem Compound,PubChem Substance. [18] Chemspider 小分子信息整合数据库,包含了分子简介、实验测定和实时估算的理化性质、毒性、Smile字符串等信息. [19] GDB-13 有机小分子数据库,含有多达13个C、N、O、S和Cl原子的小有机分子. [20] GDB-17 有机小分子数据库,含有1664亿个分子,组成中多达17个C、N、O、S和卤素等原子. [21] FreeSolv 提供实验和计算的水合自由能的数据库. [22] ZINC 可以进行虚拟筛选的数据库,汇总了化合物的购买信息和来自其他数据库的注释化合物. [23-25] ChEMBL 一个大型的生物活性数据库,数据主要来源于文献提取和PCBA数据库. [26] DrugBank 综合性药物数据库,包含有关药物的机制、相互作用和靶标的全面分子信息. [27] ToxCast 来自EPA开展的一个利用高通量筛选方法和计算毒理学方法预测化合物毒性并进行优先级排序的研究项目,拥有约1800种化学品的数据. [28] PDBbind 提供生物分子复合物的结合亲和力数据. [29] BindingDB 收集了生物活性数据注释的小分子. [30] -
[1] MARTIN Y C, KOFRON J L, TRAPHAGEN L M. Do structurally similar molecules have similar biological activity? [J]. Journal of Medicinal Chemistry, 2002, 45(19): 4350-4358. doi: 10.1021/jm020155c [2] PANDEY S, QU J X, STEVANOVIĆ V, et al. Predicting energy and stability of known and hypothetical crystals using graph neural network [J]. Patterns, 2021, 2(11): 100361. doi: 10.1016/j.patter.2021.100361 [3] WALTERS W P, BARZILAY R. Applications of deep learning in molecule generation and molecular property prediction [J]. Accounts of Chemical Research, 2021, 54(2): 263-270. doi: 10.1021/acs.accounts.0c00699 [4] HANSCH C, MALONEY P P, FUJITA T, et al. Correlation of biological activity of phenoxyacetic acids with Hammett substituent constants and partition coefficients [J]. Nature, 1962, 194(4824): 178-180. [5] CHERKASOV A, MURATOV E N, FOURCHES D, et al. QSAR modeling: Where have you been? Where are you going to? [J]. Journal of Medicinal Chemistry, 2014, 57(12): 4977-5010. doi: 10.1021/jm4004285 [6] ZHONG S F, HU J J, YU X, et al. Molecular image-convolutional neural network (CNN) assisted QSAR models for predicting contaminant reactivity toward OH radicals: Transfer learning, data augmentation and model interpretation [J]. Chemical Engineering Journal, 2021, 408: 127998. doi: 10.1016/j.cej.2020.127998 [7] GILMER J, SCHOENHOLZ S S, RILEY P F, et al. Neural message passing for Quantum chemistry[C]//Proceedings of the 34th International Conference on Machine Learning - Volume 70. August 6-11, 2017, Sydney, NSW, Australia. New York: ACM, 2017: 1263–1272. [8] YANG K, SWANSON K, JIN W G, et al. Analyzing learned molecular representations for property prediction [J]. Journal of Chemical Information and Modeling, 2019, 59(8): 3370-3388. doi: 10.1021/acs.jcim.9b00237 [9] WEINREICH J, BROWNING N J, von LILIENFELD O A. Machine learning of free energies in chemical compound space using ensemble representations: Reaching experimental uncertainty for solvation [J]. The Journal of Chemical Physics, 2021, 154(13): 134113. doi: 10.1063/5.0041548 [10] ZHANG D D, XIA S, ZHANG Y K. Accurate prediction of aqueous free solvation energies using 3D atomic feature-based graph neural network with transfer learning [J]. Journal of Chemical Information and Modeling, 2022, 62(8): 1840-1848. doi: 10.1021/acs.jcim.2c00260 [11] RAZA A, BARDHAN S, XU L H, et al. A machine learning approach for predicting defluorination of per- and polyfluoroalkyl substances (PFAS) for their efficient treatment and removal [J]. Environmental Science & Technology Letters, 2019, 6(10): 624-629. [12] WALLACH I, DZAMBA M, HEIFETS A. AtomNet: A deep convolutional neural network for bioactivity prediction in structure-based drug discovery [J]. Mathematische Zeitschrift, 2015, 47(1): 34-46. [13] CHENG W X, NG C A. Using machine learning to classify bioactivity for 3486 per- and polyfluoroalkyl substances (PFASs) from the OECD list [J]. Environmental Science & Technology, 2019, 53(23): 13970-13980. [14] BERTONI M, DURAN-FRIGOLA M, BADIA-I-MOMPEL P, et al. Bioactivity descriptors for uncharacterized chemical compounds [J]. Nature Communications, 2021, 12(1): 3932. doi: 10.1038/s41467-021-24150-4 [15] MUKHERJEE A, SU A, RAJAN K. Deep learning model for identifying critical structural motifs in potential endocrine disruptors [J]. Journal of Chemical Information and Modeling, 2021, 61(5): 2187-2197. doi: 10.1021/acs.jcim.0c01409 [16] SUN X F, ZHANG X M, MUIR D C G, et al. Identification of potential PBT/POP-like chemicals by a deep learning approach based on 2D structural features [J]. Environmental Science & Technology, 2020, 54(13): 8221-8231. [17] WANG H B, WANG Z Y, CHEN J W, et al. Graph attention network model with defined applicability domains for screening PBT chemicals [J]. Environmental Science & Technology, 2022, 56(10): 6774-6785. [18] KIM S, CHEN J, CHENG T J, et al. PubChem in 2021: New data content and improved web interfaces [J]. Nucleic Acids Research, 2021, 49(D1): D1388-D1395. doi: 10.1093/nar/gkaa971 [19] PENCE H E, WILLIAMS A. ChemSpider: An online chemical information resource [J]. Journal of Chemical Education, 2010, 87(11): 1123-1124. doi: 10.1021/ed100697w [20] BLUM L C, REYMOND J L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13 [J]. Journal of the American Chemical Society, 2009, 131(25): 8732-8733. doi: 10.1021/ja902302h [21] RUDDIGKEIT L, van DEURSEN R, BLUM L C, et al. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17 [J]. Journal of Chemical Information and Modeling, 2012, 52(11): 2864-2875. doi: 10.1021/ci300415d [22] MOBLEY D L, GUTHRIE J P. FreeSolv: A database of experimental and calculated hydration free energies, with input files [J]. Journal of Computer-Aided Molecular Design, 2014, 28(7): 711-720. doi: 10.1007/s10822-014-9747-x [23] IRWIN J J, STERLING T, MYSINGER M M, et al. ZINC: A free tool to discover chemistry for biology [J]. Journal of Chemical Information and Modeling, 2012, 52(7): 1757-1768. doi: 10.1021/ci3001277 [24] IRWIN J J, TANG K G, YOUNG J, et al. ZINC20-a free ultralarge-scale chemical database for ligand discovery [J]. Journal of Chemical Information and Modeling, 2020, 60(12): 6065-6073. doi: 10.1021/acs.jcim.0c00675 [25] IRWIN J J, SHOICHET B K. ZINC: A free database of commercially available compounds for virtual screening [J]. Journal of Chemical Information and Modeling, 2005, 45(1): 177-182. doi: 10.1021/ci049714+ [26] BENTO A P, GAULTON A, HERSEY A, et al. The ChEMBL bioactivity database: An update [J]. Nucleic Acids Research, 2014, 42(D1): D1083-D1090. doi: 10.1093/nar/gkt1031 [27] WISHART D S, FEUNANG Y D, GUO A C, et al. DrugBank 5.0: A major update to the DrugBank database for 2018 [J]. Nucleic Acids Research, 2018, 46(D1): D1074-D1082. doi: 10.1093/nar/gkx1037 [28] DIX D J, HOUCK K A, MARTIN M T, et al. The ToxCast program for prioritizing toxicity testing of environmental chemicals [J]. Toxicological Sciences, 2007, 95(1): 5-12. doi: 10.1093/toxsci/kfl103 [29] LIU Z H, LI Y, HAN L, et al. PDB-wide collection of binding data: Current status of the PDBbind database [J]. Bioinformatics, 2015, 31(3): 405-412. doi: 10.1093/bioinformatics/btu626 [30] GILSON M K, LIU T Q, BAITALUK M, et al. BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology [J]. Nucleic Acids Research, 2016, 44(D1): D1045-D1053. doi: 10.1093/nar/gkv1072 [31] SHEN W X, ZENG X, ZHU F, et al. Out-of-the-box deep learning prediction of pharmaceutical properties by broadly learned knowledge-based molecular representations [J]. Nature Machine Intelligence, 2021, 3(4): 334-343. doi: 10.1038/s42256-021-00301-6 [32] WANG L G, ZHAO L, LIU X, et al. SepPCNET: Deeping learning on a 3D surface electrostatic potential point cloud for enhanced toxicity classification and its application to suspected environmental estrogens [J]. Environmental Science & Technology, 2021, 55(14): 9958-9967. [33] TODESCHINI R, CONSONNI V. Molecular Descriptors for Chemoinformatics[M]. Wiley-VCH, 2009. [34] GRISONI F, CONSONNI V, TODESCHINI R. Impact of molecular descriptors on computational models [J]. Methods in Molecular Biology (Clifton, N. J. ), 2018, 1825: 171-209. [35] 吴萍, 孔德信. 分子相似性与MOLPRINT 2D的本地化 [J]. 计算机与应用化学, 2008, 25(4): 505-508. doi: 10.3969/j.issn.1001-4160.2008.04.027 WU P, KONG D X. Molecular similarity and localization of MOLPRINT 2D [J]. Computers and Applied Chemistry, 2008, 25(4): 505-508(in Chinese). doi: 10.3969/j.issn.1001-4160.2008.04.027
[36] KHAN A U. Descriptors and their selection methods in QSAR analysis: Paradigm for drug design [J]. Drug Discovery Today, 2016, 21(8): 1291-1302. doi: 10.1016/j.drudis.2016.06.013 [37] CERETO-MASSAGUÉ A, OJEDA M J, VALLS C, et al. Molecular fingerprint similarity search in virtual screening [J]. Methods, 2015, 71: 58-63. doi: 10.1016/j.ymeth.2014.08.005 [38] DURANT J L, LELAND B A, HENRY D R, et al. Reoptimization of MDL keys for use in drug discovery [J]. Journal of Chemical Information and Computer Sciences, 2002, 42(6): 1273-1280. doi: 10.1021/ci010132r [39] ROGERS D, HAHN M. Extended-connectivity fingerprints [J]. Journal of Chemical Information and Modeling, 2010, 50(5): 742-754. doi: 10.1021/ci100050t [40] BENDER A, MUSSA H Y, GLEN R C, et al. Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): Evaluation of performance [J]. Journal of Chemical Information and Computer Sciences, 2004, 44(5): 1708-1718. doi: 10.1021/ci0498719 [41] MAURI A. alvaDesc: A Tool to Calculate and Analyze Molecular Descriptors and Fingerprints[M]// Ecotoxicological QSARs. New York: Humana, 2020: 801-820. [42] O'BOYLE N M, BANCK M, JAMES C A, et al. Open Babel: An open chemical toolbox [J]. Journal of Cheminformatics, 2011, 3: 33. doi: 10.1186/1758-2946-3-33 [43] STEINBECK C, HAN Y Q, KUHN S, et al. The Chemistry Development Kit (CDK): An open-source Java library for Chemo- and Bioinformatics [J]. Journal of Chemical Information and Computer Sciences, 2003, 43(2): 493-500. doi: 10.1021/ci025584y [44] XIONG Z P, WANG D Y, LIU X H, et al. Pushing the boundaries of molecular representation for drug discovery with the graph attention mechanism [J]. Journal of Medicinal Chemistry, 2020, 63(16): 8749-8760. doi: 10.1021/acs.jmedchem.9b00959 [45] WEININGER D. SMILES, a chemical language and information system. 1. introduction to methodology and encoding rules [J]. Journal of Chemical Information & Computer Sciences, 1988, 28(1): 31-36. [46] SANCHES-NETO F O, DIAS-SILVA J R, KENG QUEIROZ L H Jr, et al. “pySiRC”: Machine learning combined with molecular fingerprints to predict the reaction rate constant of the radical-based oxidation processes of aqueous organic contaminants [J]. Environmental Science & Technology, 2021, 55(18): 12437-12448. [47] ZHONG S F, ZHANG K, WANG D, et al. Shedding light on “Black Box” machine learning models for predicting the reactivity of HO radicals toward organic compounds [J]. Chemical Engineering Journal, 2021, 405: 126627. doi: 10.1016/j.cej.2020.126627 [48] ZHONG S F, HU J J, FAN X D, et al. A deep neural network combined with molecular fingerprints (DNN-MF) to develop predictive models for hydroxyl radical rate constants of water contaminants [J]. Journal of Hazardous Materials, 2020, 383: 121141. doi: 10.1016/j.jhazmat.2019.121141 [49] HELLER S R, McNAUGHT A, PLETNEV I, et al. InChI, the IUPAC international chemical identifier [J]. Journal of Cheminformatics, 2015, 7: 23. doi: 10.1186/s13321-015-0068-4 [50] GOH G B, SIEGEL C, VISHNU A, et al. Chemception: A deep neural network with minimal chemistry knowledge matches the performance of expert-developed QSAR/QSPR models[J]. ArXiv, 2017, abs/1706.06689 [51] GOH G B, SIEGEL C, VISHNU A, et al. ChemNet: A transferable and generalizable deep neural network for small-molecule property prediction[J]. ArXiv, 2017, abs/1712.02734 [52] KORKMAZ S. Deep learning-based imbalanced data classification for drug discovery [J]. Journal of Chemical Information and Modeling, 2020, 60(9): 4180-4190. doi: 10.1021/acs.jcim.9b01162 [53] WU Z Q, RAMSUNDAR B, FEINBERG E N, et al. MoleculeNet: A benchmark for molecular machine learning [J]. Chemical Science, 2017, 9(2): 513-530. [54] 徐玲玲, 迟冬祥. 面向不平衡数据集的机器学习分类策略 [J]. 计算机工程与应用, 2020, 56(24): 12-27. doi: 10.3778/j.issn.1002-8331.2007-0120 XU L L, CHI D X. Machine learning classification strategy for imbalanced data sets [J]. Computer Engineering and Applications, 2020, 56(24): 12-27(in Chinese). doi: 10.3778/j.issn.1002-8331.2007-0120
[55] NOBLE W S. What is a support vector machine? [J]. Nature Biotechnology, 2006, 24(12): 1565-1567. doi: 10.1038/nbt1206-1565 [56] QUINLAN J R. Induction of decision trees [J]. Machine Learning, 1986, 1(1): 81-106. [57] BREIMAN L. Random forests [J]. Machine Learning, 2001, 45(1): 5-32. doi: 10.1023/A:1010933404324 [58] SUTTON R S, BARTO A G. Reinforcement learning: An introduction [J]. IEEE Transactions on Neural Networks, 1998, 9(5): 1054. [59] MITCHELL J B O. Machine learning methods in chemoinformatics [J]. Wiley Interdisciplinary Reviews. Computational Molecular Science, 2014, 4(5): 468-481. doi: 10.1002/wcms.1183 [60] GRAMATICA P. Principles of QSAR models validation: Internal and external [J]. QSAR & Combinatorial Science, 2007, 26(5): 694-701. [61] KAR S, ROY K, LESZCZYNSKI J. Applicability Domain: A Step Toward Confident Predictions and Decidability for QSAR Modeling[M]// Computational Toxicology, Part of the Methods in Molecular Biology book series. New York: Humana Press, 2018: 141-169. [62] 王中钰, 陈景文, 傅志强, 等. QSAR模型应用域的表征方法 [J]. 科学通报, 2022, 67(3): 255-266. doi: 10.1360/TB-2021-0406 WANG Z Y, CHEN J W, FU Z Q, et al. Characterization of applicability domains for QSAR models [J]. Chinese Science Bulletin, 2022, 67(3): 255-266(in Chinese). doi: 10.1360/TB-2021-0406
[63] WANG Z Y, CHEN J W, HONG H X. Developing QSAR models with defined applicability domains on PPARγ binding affinity using large data sets and machine learning algorithms [J]. Environmental Science & Technology, 2021, 55(10): 6857-6866. [64] BERENGER F, YAMANISHI Y. A distance-based Boolean applicability domain for classification of high throughput screening data [J]. Journal of Chemical Information and Modeling, 2019, 59(1): 463-476. doi: 10.1021/acs.jcim.8b00499 [65] 郑玉婷, 乔显亮, 于洋, 等. 有机化学品生物富集因子定量结构-活性关系模型 [J]. 生态毒理学报, 2019, 14(2): 214-221. doi: 10.7524/AJE.1673-5897.20180718002 ZHENG Y T, QIAO X L, YU Y, et al. Quantitative structure-activity relationship model for bioconcentration factors of organic chemicals [J]. Asian Journal of Ecotoxicology, 2019, 14(2): 214-221(in Chinese). doi: 10.7524/AJE.1673-5897.20180718002
[66] 杨真真, 匡楠, 范露, 等. 基于卷积神经网络的图像分类算法综述 [J]. 信号处理, 2018, 34(12): 1474-1489. doi: 10.16798/j.issn.1003-0530.2018.12.009 YANG Z Z, KUANG N, FAN L, et al. Review of image classification algorithms based on convolutional neural networks [J]. Journal of Signal Processing, 2018, 34(12): 1474-1489(in Chinese). doi: 10.16798/j.issn.1003-0530.2018.12.009
[67] RIBEIRO M T, SINGH S, GUESTRIN C. “why should I trust You?”: Explaining the predictions of any classifier[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. August 13-17, 2016, San Francisco, California, USA. New York: ACM, 2016: 1135-1144. [68] BARREDO ARRIETA A, DÍAZ-RODRÍGUEZ N, del SER J, et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI [J]. Information Fusion, 2020, 58: 82-115. doi: 10.1016/j.inffus.2019.12.012 [69] PETCH J, DI S, NELSON W. Opening the black box: The promise and limitations of explainable machine learning in cardiology [J]. Canadian Journal of Cardiology, 2022, 38(2): 204-213. doi: 10.1016/j.cjca.2021.09.004 [70] ANDREWS R, DIEDERICH J, TICKLE A B. Survey and critique of techniques for extracting rules from trained artificial neural networks [J]. Knowledge-Based Systems, 1995, 8(6): 373-389. doi: 10.1016/0950-7051(96)81920-4 [71] LIU X, WANG X G, MATWIN S. Improving the interpretability of deep neural networks with knowledge distillation[C]//2018 IEEE International Conference on Data Mining Workshops (ICDMW). November 17-20, 2018, Singapore. IEEE, 2019: 905-912. [72] MASHAYEKHI M, GRAS R. Rule extraction from decision trees ensembles: New algorithms based on heuristic search and sparse group lasso methods [J]. International Journal of Information Technology & Decision Making, 2017, 16(6): 1707-1727. [73] GOLDSTEIN A, KAPELNER A, BLEICH J, et al. Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation [J]. Journal of Computational and Graphical Statistics, 2015, 24(1): 44-65. doi: 10.1080/10618600.2014.907095 [74] RIBEIRO M T, SINGH S, GUESTRIN C. Anchors: High-precision model-agnostic explanations[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2018: 1527-1535. [75] BACH S, BINDER A, MONTAVON G, et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation [J]. PLoS One, 2015, 10(7): e0130140. doi: 10.1371/journal.pone.0130140 [76] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization[C]// 2017 IEEE International Conference On Computer Vision (ICCV). 2017: 618-626. [77] LUNDBERG S M, LEE S I. A unified approach to interpreting model predictions[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. December 4-9, 2017, Long Beach, California, USA. New York: ACM, 2017: 4768-4777. [78] HASEBE T. Knowledge-embedded message-passing neural networks: Improving molecular property prediction with human knowledge [J]. ACS Omega, 2021, 6(42): 27955-27967. doi: 10.1021/acsomega.1c03839 [79] MATOS G D R, KYU D Y, LOEFFLER H H, et al. Approaches for calculating solvation free energies and enthalpies demonstrated with an update of the FreeSolv database [J]. Journal of Chemical and Engineering Data, 2017, 62(5): 1559-1569. doi: 10.1021/acs.jced.7b00104 [80] VERMEIRE F H, GREEN W H. Transfer learning for solvation free energies: From quantum chemistry to experiments [J]. Chemical Engineering Journal, 2021, 418: 129307. doi: 10.1016/j.cej.2021.129307 [81] SU A, RAJAN K. A database framework for rapid screening of structure-function relationships in PFAS chemistry [J]. Scientific Data, 2021, 8(1): 1-10. doi: 10.1038/s41597-020-00786-7 [82] MA J S, SHERIDAN R P, LIAW A, et al. Deep neural nets as a method for quantitative structure-activity relationships [J]. Journal of Chemical Information and Modeling, 2015, 55(2): 263-274. doi: 10.1021/ci500747n [83] MAYR A, KLAMBAUER G, UNTERTHINER T, et al. DeepTox: Toxicity prediction using deep learning [J]. Frontiers in Environmental Science, 2016, 3: 80. [84] PU L M, NADERI M, LIU T R, et al. eToxPred: A machine learning-based approach to estimate the toxicity of drug candidates [J]. BMC Pharmacology and Toxicology, 2019, 20(1): 2. doi: 10.1186/s40360-018-0282-6 [85] ALVES V, MURATOV E, CAPUZZI S, et al. Alarms about structural alerts [J]. Green Chemistry, 2016, 18(16): 4348-4360. doi: 10.1039/C6GC01492E [86] WU X D, KUMAR V, QUINLAN J R, et al. Top 10 algorithms in data mining [J]. Knowledge and Information Systems, 2008, 14(1): 1-37. doi: 10.1007/s10115-007-0114-2 [87] JIMÉNEZ-LUNA J, GRISONI F, SCHNEIDER G. Drug discovery with explainable artificial intelligence [J]. Nature Machine Intelligence, 2020, 2(10): 573-584. doi: 10.1038/s42256-020-00236-4 [88] CHEN D, GAO K F, NGUYEN D D, et al. Algebraic graph-assisted bidirectional transformers for molecular property prediction [J]. Nature Communications, 2021, 12(1): 1-9. doi: 10.1038/s41467-020-20314-w [89] RODRÍGUEZ-PÉREZ R, BAJORATH J. Explainable machine learning for property predictions in compound optimization [J]. Journal of Medicinal Chemistry, 2021, 64(24): 17744-17752. doi: 10.1021/acs.jmedchem.1c01789