Machine learning for encoding optical spectra

Architecture of the neural network used by Rodriguez & Kramer to encode the 2d-echo spectra of the photosynthetic FMO complex. The network is trained on GPUs using Mathematica, which in turn relies on the Apache MXNet framework.

Machine learning techniques (“neural networks”) are presently being explored in a wide range of applications, with the standard showcase being image recognition. In most scenarios the input data is “user generated” (for example, handwritten digits) or comes from automatic sensors. The neural network is trained with (key, value) pairs, which are often tagged before being used for “supervised learning”.

But what if we want to apply machine learning to scientific data sets generated by demanding simulations, for instance on supercomputers? In that case the input data does not come “for free”, but is the outcome of state-of-the-art simulations, for instance of the optical properties of photosynthetic complexes discussed before.

The advantage of the machine learning technique in this case is that the input parameters are known and the training works with very reliable information. This allows one to find neural-network representations that are very small in terms of storage, yet encode huge data sets (several gigabytes). We (Rodriguez & Kramer 2019, arxiv version) have explored this method for encoding the information of “two-dimensional optical spectra” and for relating the spectra to the molecular structure, such as the dipole orientations and the fluctuating energy states.
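To get a feeling for why such a network representation can be so compact, one can simply count parameters. The sketch below uses made-up sizes (the grid resolution, number of delay times, and layer widths are illustrative assumptions, not values from the paper): a small fully connected network mapping a frequency/delay coordinate to a spectral amplitude is compared against storing the full gridded spectra.

```python
# Hypothetical sizes, for illustration only: a 2d-echo spectrum sampled on a
# 128 x 128 grid of (excitation, detection) frequencies, for 100 delay times.
n_grid = 128 * 128
n_delays = 100
dataset_bytes = n_grid * n_delays * 8  # stored as float64 values

# A small fully connected network mapping (omega_1, omega_2, delay) -> amplitude.
# Layer widths are assumptions for this sketch, not taken from the paper.
layers = [3, 64, 64, 64, 1]
n_params = sum(w_in * w_out + w_out  # weight matrix plus bias vector per layer
               for w_in, w_out in zip(layers, layers[1:]))
network_bytes = n_params * 8

print(f"data set:     {dataset_bytes / 1e6:.1f} MB")
print(f"network file: {network_bytes / 1e3:.1f} kB")
print(f"compression:  {dataset_bytes / network_bytes:.0f}x")
```

Even this toy layout, with under ten thousand parameters, stands in for megabytes of gridded data; the actual compression achieved depends, of course, on how accurately the trained network reproduces the spectra.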

From a “physics perspective”, machine learning provides a way of automatic parameter fitting and can be seen as a minimization over a variational parameter space. The problem: a variational principle always happily returns an answer, even if that answer is wrong. While we cannot solve this problem, we have studied how well different network layouts perform under the constraint of a fixed number of fitted parameters. This constraint determines the size of the resulting parameter file of the network, which turns out to be surprisingly small. You can explore it by downloading the ancillary data we deposited on the arxiv.
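Comparing layouts at a fixed parameter budget can be sketched as follows; the two example layouts and their widths are hypothetical, chosen only so that a shallow-wide and a deep-narrow network end up with a comparable number of fitted parameters.

```python
def param_count(widths):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(a * b + b for a, b in zip(widths, widths[1:]))

# Two illustrative layouts with a comparable parameter budget
# (numbers are assumptions, not taken from the paper):
shallow = [3, 380, 1]             # one wide hidden layer
deep = [3, 24, 24, 24, 24, 1]     # four narrow hidden layers

for name, widths in [("shallow", shallow), ("deep", deep)]:
    print(f"{name}: {widths} -> {param_count(widths)} parameters")
```

With the budget matched like this, any difference in reconstruction quality between the layouts can be attributed to the architecture itself rather than to one network simply having more free parameters.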