Abstract
Sigma profiles are quantum-chemistry-derived molecular descriptors that encode the polarity of molecules. They have performed well as features in machine learning applications. To accelerate the development of these models and the construction of large sigma profile databases, this work proposes a graph convolutional network (GCN) architecture that predicts sigma profiles from molecular structures. To this end, the use of molecular mechanics (force field atom types) is explored as a computationally inexpensive node-level featurization technique that encodes the local and global chemical environments of atoms in molecules. The GCN models developed in this work accurately predict the sigma profiles of assorted organic and inorganic compounds. The best GCN model reported here, obtained using Merck molecular force field (MMFF) atom types, displayed training and testing set coefficients of determination of 0.98 and 0.96, respectively, which are superior to those of previous methodologies reported in the literature. This performance boost is shown to result from both the convolutional architecture and the node-level features based on force field atom types. Finally, to demonstrate their practical applicability, we used GCN-predicted sigma profiles as inputs to machine learning models previously developed in the literature that predict boiling temperatures and aqueous solubilities. With the predicted sigma profiles as input, these models computed both physicochemical properties using significantly fewer computational resources and displayed only a slight decrease in performance compared with sigma profiles obtained from quantum chemistry methods.
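The abstract does not detail the network, so the following pure-Python sketch only illustrates the general idea: one-hot force field atom types as node features, graph convolutions that mix each atom's features with those of its bonded neighbors, and a pooling step that maps the molecule to a fixed-length sigma-profile vector. Everything specific here is an assumption rather than the authors' model: the atom-type labels (MMFF94-style symbols chosen for illustration), the 51-bin profile length, the layer sizes, the symmetric-normalized propagation rule of Kipf and Welling, and the sum-pooling readout. The weights are random and untrained.

```python
import math
import random

# Illustrative MMFF94-style atom-type labels (assumed vocabulary, not the full set).
ATOM_TYPES = ["CR", "OR", "HC", "HOR"]
N_BINS = 51  # assumed number of bins in the predicted sigma profile


def one_hot(atom_type):
    """Node feature: one-hot encoding of the atom's force field type."""
    vec = [0.0] * len(ATOM_TYPES)
    vec[ATOM_TYPES.index(atom_type)] = 1.0
    return vec


def normalized_adjacency(n_atoms, bonds):
    """Return D^-1/2 (A + I) D^-1/2, i.e. bonds plus self-loops, degree-normalized."""
    a = [[1.0 if i == j else 0.0 for j in range(n_atoms)] for i in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # always >= 1 because of the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n_atoms)]
            for i in range(n_atoms)]


def matmul(x, y):
    """Plain list-of-lists matrix product (x is n-by-k, y is k-by-m)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def gcn_sigma_profile(atom_types, bonds, hidden=16, n_layers=2, seed=0):
    """Untrained GCN forward pass: atom types and bonds -> N_BINS values."""
    rng = random.Random(seed)
    h = [one_hot(t) for t in atom_types]
    a_hat = normalized_adjacency(len(atom_types), bonds)
    in_dim = len(ATOM_TYPES)
    for _ in range(n_layers):
        w = random_matrix(in_dim, hidden, rng)
        h = relu(matmul(matmul(a_hat, h), w))  # H' = ReLU(A_hat H W)
        in_dim = hidden
    # Sum pooling over atoms (assumed readout), giving a 1-by-hidden matrix.
    pooled = [[sum(row[j] for row in h) for j in range(hidden)]]
    w_out = random_matrix(hidden, N_BINS, rng)
    return relu(matmul(pooled, w_out))[0]  # non-negative bin values


# Usage: methanol, with atom 0 = C, 1 = O, 2-4 = H on C, 5 = H on O.
methanol_types = ["CR", "OR", "HC", "HC", "HC", "HOR"]
methanol_bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]
profile = gcn_sigma_profile(methanol_types, methanol_bonds)
print(len(profile))
```

Sum pooling is used in this sketch because a sigma profile is a distribution of molecular surface area over screening charge density, so its total grows with molecule size; a mean readout would discard that information. A working model would learn the weights by regressing against sigma profiles computed with quantum chemistry.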
Authors
Abranches, DO; Maginn, EJ; Colón, YJ
Our authors
Acknowledgments
This work was supported by the U.S. Department of Energy via subcontract 630340 from Los Alamos National Laboratory, Materials and Chemical Sciences Division, and Breakthrough Electrolytes for Energy Storage Systems (BEES2), an Energy Frontier Research Center funded by the U.S. Department of Energy, Office of Science, Basic Energy Sciences (BES), under award DE-SC0019409. The authors acknowledge the Center for Research Computing (CRC) at the University of Notre Dame for providing computational resources. D.O.A. also acknowledges the support of the Patrick and Jana Eilers Graduate Student Fellowship for Energy Related Research.