Trespassing allowed: from the classical to the quantum world (and back again)

Blog of a research group in theoretical physics.

Archive for the ‘GPU’ Category

Computational physics & GPU programming: Solving the time-dependent Schrödinger equation

with one comment

I start my series on the physics of GPU programming by a relatively simple example, which makes use of a mix of library calls and well-documented GPU kernels. The run-time of the split-step algorithm described here is about 280 seconds for the CPU version (Intel(R) Xeon(R) CPU E5420 @ 2.50GHz), vs. 10 seconds for the GPU version (NVIDIA(R) Tesla C1060 GPU), resulting in 28 fold speed-up! On a C2070 the run time is less than 5 seconds, yielding an 80 fold speedup.

autocorrelation function in a uniform force field

Autocorrelation function C(t) of a Gaussian wavepacket in a uniform force field. I compare the GPU and CPU results using the wavepacket code.

The description of coherent electron transport in quasi two-dimensional electron gases requires to solve the Schrödinger equation in the presence of a potential landscape. As discussed in my post Time to find eigenvalues without diagonalization, our approach using wavepackets allows one to obtain the scattering matrix over a wide range of energies from a single wavepacket run without the need to diagonalize a matrix. In the following I discuss the basic example of propagating a wavepacket and obtaining the autocorrelation function, which in turn determines the spectrum. I programmed the GPU code in 2008 as a first test to evaluate the potential of GPGPU programming for my research. At that time double-precision floating support was lacking and the fast Fourier transform (FFT) implementations were little developed. Starting with CUDA 3.0, the program runs fine in double precision and my group used the algorithm for calculating electron flow through nanodevices. The CPU version was used for our articles in Physica Scripta Wave packet approach to transport in mesoscopic systems and the Physical Review B Phase shifts and phase π-jumps in four-terminal waveguide Aharonov-Bohm interferometers among others.
Here, I consider a very simple example, the propagation of a Gaussian wavepacket in a uniform potential V(x,y)=-F x, for which the autocorrelation function of the initial state \langle x,y | \psi(t=0) \rangle=\frac{1}{a\sqrt{\pi}}\exp\big(-(x^2+y^2)/(2 a^2)\big) is known in analytic form: \langle \psi(t=0) | \psi(t) \rangle=\frac{2 a^2 m}{2 a^2 m+i\hbar t}\exp\left({-\frac{a^2F^2t^2}{4\hbar^2}-i\frac{F^2t^3}{24\hbar m}}\right).
Read the rest of this entry »

Written by tobiaskramer

March 28, 2012 at 14:50

The physics of GPU programming

leave a comment »

GPU cluster

Me pointing at the GPU Resonance cluster at SEAS Harvard with 32x448=14336 processing cores. Just imagine how tightly integrated this setup is compared to 3584 quad-core computers. Picture courtesy of Academic Computing, SEAS Harvard.

From discussions I learn that while many physicists have heard of Graphics Processing Units as fast computers, resistance to use them is widespread. One of the reasons is that physics has been relying on computers for a long time and tons of old, well trusted codes are lying around which are not easily ported to the GPU. Interestingly, the adoption of GPUs happens much faster in biology, medical imaging, and engineering.
I view GPU computing as a great opportunity to investigate new physics and my feeling is that todays methods optimized for serial processors may need to be replaced by a different set of standard methods which scale better with massively parallel processors. In 2008 I dived into GPU programming for a couple of reasons:

  1. As a “model-builder” the GPU allows me to reconsider previous limitations and simplifications of models and use the GPU power to solve the extended models.
  2. The turn-around time is incredibly fast. Compared to queues in conventional clusters where I wait for days or weeks, I get back results with 10000 CPU hours compute time the very same day. This in turn further facilitates the model-building process.
  3. Some people complain about the strict synchronization requirements when running GPU codes. In my view this is an advantage, since essentially no messaging overhead exists.
  4. If you want to develop high-performance algorithm, it is not good enough to convert library calls to GPU library calls. You might get speed-ups of about 2-4. However, if you invest the time and develop your own know-how you can expect much higher speed-ups of around 100 times or more, as seen in the applications I discussed in this blog before.

This summer I will lecture about GPU programming at several places and thus I plan to write a series of GPU related posts. I do have a complementary background in mathematical physics and special functions, which I find very useful in relation with GPU programming since new physical models require a stringent mathematical foundation and numerical studies.

Written by tobiaskramer

March 11, 2012 at 16:57

Catching and tracking light: following the excitations in the Fenna-Matthews-Olson complex

leave a comment »

Peak oscillations in the FMO complex calculated using GPU-HEOM

The animation shows how peaks in the 2d echo-spectra are oscillation and changing for various delay times. For a full explanation, see Modelling of Oscillations in Two-Dimensional Echo-Spectra of the Fenna-Matthews-Olson Complex by B.Hein, C. Kreisbeck, T. Kramer, M. Rodríguez, New J. of Phys., 14, 023018 (2012), open access.

Efficient and fast transport of electric current is a basic requirement for the functioning of nanodevices and biological systems. A neat example is the energy-transport of a light-induced excitation in the Fenna-Matthews-Olson complex of green sulfur bacteria. This process has been elucidated by pump-probe spectroscopy. The resulting spectra contain an enormous amount of information about the couplings of the different pigments and the pathways taken by the excitation. The basic guide to a 2d echo-spectrum is as follows:
You can find peaks of high intensity along the diagonal line which are roughly representing a more common absorption spectrum.  If you delay the pump and probe pulses by several picoseconds, you will find a new set of peaks at a horizontal axis which indicates that energy of the excitation gets redistributed and the system relaxes and transfers part of the energy to vibrational motion. This process is nicely visible in the spectra recorded by Brixner et al.
A lot of excitement and activity on photosynthetic complexes was triggered by experiments of Engel et al showing that besides the relaxation process also periodic oscillations are visible in the oscillations for more than a picosecond.

What is causing the oscillations in the peak amplitudes of 2d echo-spectra in the Fenna-Matthews Olson complex?

A purely classical transport picture should not show such oscillations and the excitation instead hops around the complex without interference. Could the observed oscillations point to a different transport mechanism, possibly related to the quantum-mechanical simultaneous superposition of several transport paths?

The initial answer from the theoretical side was no, since within simplified models the thermalization occurs fast and without oscillations. It turned out that the simple calculations are a bit too simplistic to describe the system accurately and exact solutions are required. But exact solutions (even for simple models) are difficult to obtain. Known exact methods such as DMRG work only reliable at very low temperatures (-273 C), which are not directly applicable to biological systems. Other schemes use the famous path integrals but are too slow to calculate the pump-probe signals.

Our contribution to the field is to provide an exact computation of the 2d echo-spectra at the relevant temperatures and to see the difference to the simpler models in order to quantify how much coherence is preserved. From the method-development the computational challenge is to speed-up the calculations several hundred times in order to get results within days of computational run-time. We did achieve this by developing a method which we call GPU-hierarchical equations of motion (GPU-HEOM). The hierarchical equations of motions are a nice scheme to propagate a density matrix under consideration of non-Markovian effects and strong couplings to the environment. The HEOM scheme was developed by Kubo, Tanimura, and Ishizaki (Prof. Tanimura has posted some material on HEOM here).

However, the original computational method suffers from the same problems as path-integral calculations and is rather slow (though the HEOM method can be made faster and applied to electronic systems by using smart filtering as done by Prof. YiJing Yan). The GPU part in GPU-HEOM stands for Graphics Processing Units. Using our GPU adoption of the hierarchical equations (see details in Kreisbeck et al[JCTC, 7, 2166 (2011)] ) allowed us to cut down computational times dramatically and made it possible to perform a systematic study of the oscillations and the influence of temperature and disorder in our recent article Hein et al [New J. of Phys., 14, 023018 (2012), open access] .

Written by tobiaskramer

February 8, 2012 at 17:37

Time to find eigenvalues without diagonalization

leave a comment »

Solving the stationary Schrödinger (H-E)Ψ=0 equation can in principle be reduced to solving a matrix equation. This eigenvalue problem requires to calculate matrix elements of the Hamiltonian with respect to a set of basis functions and to diagonalize the resulting matrix. In practice this time consuming diagonalization step is replaced by a recursive method, which yields the eigenfunctions for a specific eigenvalue.

A very different approach is followed by wavepacket methods. It is possible to propagate a wavepacket without determining the eigenfunctions beforehand. For a given Hamiltonian, we solve the time-dependent Schrödinger equation (i ∂t-H) Ψ=0 for an almost arbitrary initial state Ψ(t=0)  (initial value problem).

The reformulation of the determination of eigenstates as an initial value problem has a couple of computational advantages:

  • results can be obtained for the whole range of energies represented by the wavepacket, whereas a recursive scheme yields only one eigenenergy
  • the wavepacket motion yields direct insight into the pathways and allows us to develop an intuitive understanding of the transport choreography of a quantum system
  • solving the time-dependent Schrödinger equation can be efficiently implemented using Graphics Processing Units (GPU), resulting in a large (> 20 fold) speedup compared to  CPU code
Aharnov-Bohm Ring conductance oscillations

The Zebra stripe pattern along the horizontal axis shows Aharonov-Bohm oscillations in the conductance of a half-circular nanodevice due to the changing magnetic flux. The vertical axis denotes the Fermi energy, which can be tuned experimentally. For details see our paper in Physical Review B.

The determination of transmissions requires now to calculate the Fourier transform of correlation functions <Ψ(t=0)|Ψ(t)>. This method has been pioneered by Prof. Eric J. Heller, Harvard University, and I have written an introductory article for the Latin American School of Physics 2010 (arxiv version).

Recently, Christoph Kreisbeck  has done a detailed calculations on the gate-voltage dependency of the conductance in Aharonov-Bohm nanodevices, taking full adventage of the simultaneous probing of a range of Fermi energies with one single wavepacket. A very clean experimental realization of the device was achieved by Sven Buchholz, Prof. Saskia Fischer, and Prof. Ulrich Kunze (RU Bochum), based on a semiconductor material grown by Dr. Dirk Reuter and Prof. Anreas Wieck (RU Bochum). The details, including a comparison of experimental and theoretical results shown in the left figure, are published in Physical Review B (arxiv version).

Written by tobiaskramer

May 29, 2011 at 19:20

Interactions: from galaxies to the nanoscale

leave a comment »

Microscopic model of a Hall bar

(a) Device model
(b) phenomenological potential
(c) GPU result

For a while we have explored the usage of General Purpose Graphics Processing Units (GPGPU) for electronic transport calculations in nanodevices, where we want to include all electron-electron and electron-donor interactions. The GPU allows us to drastically (250 fold !!!) boost the performance of N-body codes and we manage to propagate 10,000 particles over several million time-steps within days. While GPU methods are now rather popular within the astrophysics crowd, we haven’t seen many GPU applications for electronic transport in a nanodevice. Besides the change from astronomical units to atomic ones, gravitational forces are always attractive, whereas electrons are affected by electron-donor charges (attractive) and electron-electron repulsion. Furthermore we have a magnetic field present, leading to deflections. Last, the space where electrons can spread out is limited by the device borders. In total the force on the kth electron is given by \vec{F}_{k}=-\frac{e^2}{4\pi\epsilon_0 \epsilon}\sum_{\substack{l=1}}^{N_{\rm donor}}\frac{\vec{r}_l-\vec{r}_k}{|\vec{r}_l-\vec{r}_k|^3}+\frac{e^2}{4\pi\epsilon_0 \epsilon}\sum_{\substack{l=1\\l\ne k}}^{N_{\rm elec}}\frac{\vec{r}_l-\vec{r}_k}{|\vec{r}_l-\vec{r}_k|^3}+e \dot{\vec{r}}_k\times\vec{B}

Our recent paper in Physical Review B (also freely available on the arxiv) gives the first microscopic description of the classical Hall effect, where interactions are everything: without interactions no Hall field and no drift transport. The role and importance of the interactions is surprisingly sparsely mentioned in the literature, probably due to a lack of computational means to move beyond phenomenological models. A notable exception is the very first paper on the Hall effect by Edwin Hall, where he writes “the phenomena observed indicate that two currents, parallel and in the same direction, tend to repel each other”. Note that this repulsion works throughout the device and therefore electrons do not pile up at the upper edge, but rather a complete redistribution of the electronic density takes place, yielding the potential shown in the figure.

Another important part of our simulation of the classical Hall effect are the electron sources and sinks, the contacts at the left and right ends of the device. We have developed a feed-in and removal model of the contacts, which keeps the contact on the same (externally enforced) potential during the course of the simulation.

Mind-boggling is the fact that the very same “classical Hall potential” has also been observed in conjunction with a plateau of the integer quantum Hall effect (IQHE) [Knott et al 1995 Semicond. Sci. Technol. 10 117 (1995)]. Despite these observations, many theoretical models of the integer quantum Hall effect do not consider the interactions between the electrons. In our classical model, the Hall potential for non-interacting electrons differs dramatically from the solution shown above and transport proceeds then (and only then) along the lower and upper edges. However the edge current solution is not compatible with the contact potential model described above where an external reservoir enforces equipotentials within each contact.

Written by tobiaskramer

May 8, 2010 at 13:33

Follow

Get every new post delivered to your Inbox.