Image credit: Danilo Alvesd on Unsplash
Discovering a new drug is estimated to take at least ten years and cost more than US$2 billion. However, the rapidly developing COVID-19 pandemic has reminded us that reliable solutions are sometimes needed as fast as possible, and preferably in a cost-efficient manner.
Yet the drug discovery process requires searching for a drug candidate among thousands to millions of molecules. On top of this exhaustive search, candidates selected by chemical intuition may not deliver the expected results. From a practical point of view, using computing power to evaluate and eliminate drug candidates before any experimental procedure would be more time- and cost-efficient.
The use of computers to discover new drugs has decades of history, but it first came under the spotlight with the Fortune magazine cover of October 5, 1981. Even before this exposure, computers were already being used to perform virtual mini-experiments, work that was eventually recognized with the Nobel Prize in Chemistry in 2013. As computing power increased, scientists searched for faster and more reliable computational tools for drug discovery. This search eventually led researchers to artificial intelligence and machine learning methods in chemistry.
Machine learning can be described simply as training machines to predict target properties that are closely related to certain patterns. A very simple example would be training a model to predict whether a given image shows a cat or a bird by recognizing physical patterns such as a tail or wings. In chemistry, these patterns are the atoms and their three-dimensional arrangement in a molecule, while the predictions are chemical properties such as the energy.
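To make the idea of a chemical "pattern" more concrete, here is a minimal Python sketch of how a three-dimensional arrangement of atoms could be turned into numbers a model can learn from. It is an illustration only, not the descriptor used by any particular published model: the function name `distance_descriptor` is hypothetical, and the water coordinates are rough, illustrative values. The sketch lists every pair of atoms with the distance between them, so the result does not change when the molecule is moved in space or its atoms are listed in a different order.

```python
import math
from itertools import combinations


def distance_descriptor(atoms):
    """Describe a molecule by its sorted element-pair distances.

    `atoms` is a list of (element, (x, y, z)) tuples. The result is
    unchanged when the molecule is translated or its atoms are reordered.
    """
    features = []
    for (el_a, pos_a), (el_b, pos_b) in combinations(atoms, 2):
        pair = "-".join(sorted((el_a, el_b)))
        # Round to absorb floating-point noise from shifting coordinates.
        features.append((pair, round(math.dist(pos_a, pos_b), 6)))
    return sorted(features)


# Rough, illustrative geometry for a water molecule (angstroms).
water = [
    ("O", (0.00, 0.00, 0.00)),
    ("H", (0.96, 0.00, 0.00)),
    ("H", (-0.24, 0.93, 0.00)),
]

# The same molecule, shifted in space and with its atoms listed in another order.
shift = (1.5, -2.0, 0.5)
moved_water = [
    (element, tuple(c + s for c, s in zip(position, shift)))
    for element, position in reversed(water)
]

print(distance_descriptor(water))
print(distance_descriptor(water) == distance_descriptor(moved_water))
```

A property such as the energy does not depend on where the molecule sits in space, so a description with the same invariance is a natural input for a model that predicts it.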
There are different classes of machine learning algorithms, but neural networks have been shown to be the most suitable class for chemical research. These algorithms are inspired by the neurons in the brain, and their use for predicting chemical properties can be traced back to the early 1990s. These early examples were not applicable to a broad range of molecules, even when those molecules consisted of atom types similar to those used for training. However, this restriction has been tackled over the last decade, and neural networks have been successfully shown to predict chemical properties of drug-like molecules.
One fact about computer-aided chemical research prevents scientists from attaining high accuracy in their virtual experiments: the larger the molecules of interest, the longer the computation time. Hence, the tools currently used in computational drug design are restrictive in terms of either system size or accuracy. The size-accuracy dilemma shows itself yet again when drug candidates must be selected from millions of molecules.
This predicament adds another level of complexity to achieving an effective drug discovery process and forces computational chemists to settle for less reliable results when the virtual experiment involves very large systems.
However, neural networks have the potential to overcome the size-accuracy dilemma. Their main advantage lies in their pattern recognition capabilities: they can be used to predict properties of large systems even if they are trained on smaller molecules. Thus, they can overcome the setbacks that current computational drug discovery tools suffer from and offer a light at the end of the tunnel on the way to more reliable results.
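One way to picture how a model trained on small molecules can still say something about a larger one is to assume that the predicted property is a sum of per-atom contributions. The Python sketch below rests entirely on that assumption and uses a deliberately simple stand-in for a neural network: one fixed contribution per element, fitted exactly to three small molecules. The energies are made-up numbers in arbitrary units, not reference data, and the helper names (`solve_linear_system`, `predict_energy`) are hypothetical.

```python
ELEMENTS = ("C", "H", "O")

# Made-up training energies (arbitrary units) for three small molecules.
training_set = [
    ({"C": 1, "H": 4}, -3.0),  # methane-like composition
    ({"H": 2, "O": 1}, -3.0),  # water-like composition
    ({"C": 1, "O": 2}, -5.0),  # carbon-dioxide-like composition
]


def count_vector(composition):
    """Number of atoms of each element, in the order given by ELEMENTS."""
    return [composition.get(element, 0) for element in ELEMENTS]


def solve_linear_system(matrix, targets):
    """Solve matrix @ x = targets by Gauss-Jordan elimination with pivoting."""
    n = len(targets)
    aug = [[float(v) for v in row] + [float(t)] for row, t in zip(matrix, targets)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# "Training": find the per-element contributions that reproduce the small molecules.
contributions = solve_linear_system(
    [count_vector(composition) for composition, _ in training_set],
    [energy for _, energy in training_set],
)


def predict_energy(composition):
    """Predict a molecule's energy as the sum of its per-atom contributions."""
    return sum(c * n for c, n in zip(contributions, count_vector(composition)))


# A larger molecule that was never part of the training set (glucose-like, C6H12O6).
larger_molecule = {"C": 6, "H": 12, "O": 6}
print(dict(zip(ELEMENTS, contributions)))
print(predict_energy(larger_molecule))  # -24.0 in these made-up units
```

Because the prediction is assembled atom by atom, the cost of evaluating it grows only with the number of atoms, and nothing in the model depends on the size of the molecules it was fitted to. In a real neural network model, each atom's contribution would also depend on its surroundings rather than on the element alone.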
Still, as Marie Curie once said, “One never notices what has been done; one can only see what remains to be done.” The scientific community is compelled to improve the effectiveness of neural networks and to provide general-purpose models that can predict chemical properties.
Practical applications dictate that desirable neural networks should tackle large systems, provide high accuracy, and be transferable and extensible, meaning that the network should not simply memorize its training data. Recent neural network models such as ANI-2x and AIMNet approach such capabilities. However, the quantity and quality of the training data also affect prediction accuracy. Even though the amount of high-quality reference data for training is limited, recent strategies to tackle transferability and extensibility provide encouraging accuracy.
Written by: Hatice Gokcan and Olexander Isayev
Reference: Hatice Gokcan and Olexander Isayev, Learning molecular potentials with neural networks, WIREs Computational Molecular Science (2021). DOI: 10.1002/wcms.1564