Photosynthesis: from the antenna to the reaction center

From the antenna to the reaction center: downhill energy transfer in the photosynthetic apparatus of Chlorobium tepidum.

The photosynthetic apparatus of the green sulfur bacteria (Chlorobium tepidum) spurred a long-lasting discussion about quantum coherence in biology, mostly focused on its subunit, the Fenna-Matthews-Olson (FMO) complex. An account of the discovery of the FMO protein is given by Olson in Photosynth. Res. 80, 2004.

Recent experiments by Dostál, Pšenčík, and Zigmantas (Nat. Chem. 8, 2016) provide measured time- and frequency-resolved 2D spectra of the whole photosynthetic apparatus. These results allow one to trace the energy flow from the antenna down to the reaction center and relate it to theoretical models.

Computed two-dimensional spectrum of the antenna and FMO complex of C. tepidum. “A” denotes the location of the antenna peak, 1-7 the FMO complex states. Within tens of picoseconds, the energy is shuffled from the antenna towards the FMO complex.

In our article [Kramer & Rodriguez, Sci. Reports 7, 2017, open access] we provide a model of the experimental results using the open quantum system dynamics code described previously. In addition we show how the different pathways in 2D spectroscopy (ground state bleaching, stimulated emission, and excited state absorption) affect the spectra and lead to shifts of “blobs” down from the diagonal. This allows us to infer the effective coupling of the antenna part to the FMO complex and to assess the relative orientations of the different units. The comparison of theory and experimental results is a good test of our current understanding of the physical processes at work.

How a wave packet travels through a quantum electronic interferometer

Together with Christoph Kreisbeck and Rafael A. Molina I have contributed a blog entry to the News and Views section of the Journal of Physics describing our most recent work on an Aharonov-Bohm interferometer with an embedded quantum dot (article, arxiv). Can you spot Schrödinger’s cat in the result?

Transition between the resistivity of the nanoring with and without embedded quantum dot. The vertical axis denotes the Fermi energy (controlled by a gate), while the horizontal axis scans through the magnetic field to induce phase differences between the pathways.

Splitting the heat: the quantum limits of thermal energy flow

Device geometry. a) Scanning electron micrograph of the sample. The 1D waveguides with a lithographic width of 170 nm form a half-ring connected to reservoirs A-F. A global top-gate is present. Heating of reservoirs A, B is generated by applying a current Ih, thermal noise measurements are performed at contacts E, F. The reservoirs C and D are left floating. b) Device potential for the ballistic transport model with labels A∗ and E∗ denoting the joined reservoirs A+B and E+F. Harmonic waveguide network with Gaussian scatterer (indicated by arrow). Mode spacing is ħω = 5 meV. © 2016 Kramer et al. Citation: AIP Advances 6, 065306 (2016).

With ever shrinking sizes of electronic transistors, the quantum mechanical nature of electrons becomes more visible. For instance, two electrons with the same spin orientation and velocities cannot be at the same location (Pauli blocking). At low temperatures, electronic waves travel many micrometers completely coherently, only reflected by the geometry of the confinement. A tight confinement leads to a larger separation of quantized energy levels and restricts the lateral spread of the electrons to specific eigenmodes of a nanowire.

The distribution of the electronic current into the various terminals is then given by the geometrical scattering properties of the device interior, which are conveniently computed using wave packets. The ballistic electrons entering a nanodevice carry along charge and thermal energy. The maximum amount of thermal energy Q per time which can be transported through a single channel between two reservoirs of different temperatures is limited to $Q \le \pi^2 k_B^2 (T_2^2 - T_1^2)/(3h)$ [h denotes Planck’s and $k_B$ Boltzmann’s constant]. This has implications for computing devices, since it restricts the cooling rate (Pendry 1982).
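To get a feeling for the size of this bound, the following minimal sketch evaluates the formula quoted above for a pair of reservoir temperatures. The function name and the example temperatures are illustrative assumptions and are not taken from the article; the constants are the exact SI values.

```python
import math

H_PLANCK = 6.62607015e-34    # Planck constant in J s (exact SI value)
K_BOLTZMANN = 1.380649e-23   # Boltzmann constant in J/K (exact SI value)


def max_heat_flow(t_cold, t_hot):
    """Upper bound (in W) on the heat flow through a single channel.

    Uses the expression quoted in the text:
    pi^2 k_B^2 (T_hot^2 - T_cold^2) / (3 h).
    """
    return math.pi ** 2 * K_BOLTZMANN ** 2 * (t_hot ** 2 - t_cold ** 2) / (3.0 * H_PLANCK)


if __name__ == "__main__":
    # Hypothetical example: reservoirs at 4.2 K and 5.2 K.
    print(f"{max_heat_flow(4.2, 5.2):.3e} W")
```

For a cold reservoir at 0 K and a hot one at 1 K the bound is slightly below one picowatt, which illustrates how little heat a single channel can carry away at low temperatures.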

In a collaboration with the novel materials group at Humboldt University (Prof. S.F. Fischer, Dr. C. Riha, Dr. O. Chiatti, S. Buchholz) and using wafers produced in the lab of A. Wieck and D. Reuter (Bochum, Paderborn), C. Kreisbeck and I have compared theoretical expectations with experimental data for the thermal energy and charge currents in multi-terminal nanorings (AIP Advances 2016, open access). Our findings highlight the influence of the device geometry on both charge and thermal energy transfer and demonstrate the usefulness of the time-dependent wave-packet algorithm to find eigenstates over a whole range of temperatures.

Metadata analysis of 80,000 arxiv:physics/astro-ph articles reveals biased moderation

Have you ever thought of arXiv moderation in astro-ph being a problem? Did you experience a >5 months delay from submission of your pre-print to the arXiv  to being publicly visible? Did this happen without any explanation or reaction from the arXiv moderators despite the same article being published after peer review in the Astrophysical Journal Letters?

Chances are high that your answer is no. To be precise, the odds are 81404/81440=99.9558 percent that this did not happen to you. Lucky you! Now let me tell you about the other 36/81440=0.0442043 percent. My computer-based analysis of the last 80,000 deposited arxiv:astro-ph articles shows interesting results about the moderation patterns in astrophysics. To repeat the analysis:

  • get the arXiv metadata, which is available (good!) from the arxiv itself. I used the excellent metha tools from Martin Czygan to download all metadata from the astro-ph and quant-ph sections since 5/2014.
  • parse the resulting 200 MB XML file, for instance with Mathematica. To get the delay from submission to arXiv publication, I took the time difference between the submission date stamp (oldest XMLElement[{, date}) and the arXiv identifier, which encodes the year and month of public visibility.
  • Example: the article arxiv:1604.00876 went public in April 2016, 5 months after submission to the arXiv (November 5, 2015) and after publication in the Astrophysical Journal Letters (there, the total processing time from submission to online publication, including peer review, was 1.5 months).
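The delay computation in the steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the Mathematica notebook used for the analysis: the function names are made up here, the XML parsing is left out, and the code assumes new-style identifiers of the form YYMM.NNNNN from the years 2000 to 2099.

```python
from datetime import date


def visibility_month(arxiv_id):
    """Return (year, month) encoded in a new-style identifier such as '1604.00876'."""
    yymm = arxiv_id.split(":")[-1].split(".")[0]
    if len(yymm) != 4 or not yymm.isdigit():
        raise ValueError(f"not a new-style arXiv identifier: {arxiv_id!r}")
    return 2000 + int(yymm[:2]), int(yymm[2:])


def delay_in_months(arxiv_id, submitted):
    """Whole calendar months between the submission date and public visibility."""
    year, month = visibility_month(arxiv_id)
    return (year - submitted.year) * 12 + (month - submitted.month)


def delayed_fraction_percent(delayed, total):
    """Share of delayed articles in percent."""
    return 100.0 * delayed / total


if __name__ == "__main__":
    # Example from the text: submitted November 5, 2015, visible in April 2016.
    print(delay_in_months("arxiv:1604.00876", date(2015, 11, 5)))
    print(delayed_fraction_percent(36, 81440))
```

Because the identifier only carries year and month, the delay is resolved to whole months, which is sufficient for the 3 month threshold used in this post.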

The analysis shows different patterns of moderation for the two sections I considered, quant-ph and astro-ph. It reveals problematic moderation effects in the arXiv astro-ph section:

  1. Completely suitable articles are blocked, mostly peer reviewed and published for instance in the Astrophysical Journal, Astrophysical Journal Letters, Monthly Notices of the Royal Astronomical Society.
  2. This might indicate a biased moderation toward specific persons and subjects. In contrast to scientific journals with their named editors, the arXiv moderation is opaque and anonymous. The metadata analysis shows that the moderators of the physics:astro-ph and physics:quant-ph sections use very different guidelines, with astro-ph having a strong bias to block valid contributions.
  3. It makes the astro-ph arXiv less usable as a medium for rapid dissemination of cutting edge research via preprints.
  4. This hurts careers and citation histories, and encourages plagiarism. New scientific findings are more easily plagiarized by other groups, since no arXiv time-stamped preprint establishes the precedence.
  5. If we, the scientists, want a publicly funded arXiv we must ensure that it is operated according to scientific standards which serve the public. This excludes biased blocking of valid and publicly funded research.
  6. Finally, the arXiv was not put in place to be a backup server for all journals, but rather to provide a space to share upcoming scientific publications without months of delay.

I will be happy to share comments I receive about similar cases. I am not talking about dubious articles or non-scientific theories, but about standard peer-reviewed contributions published in established physics journals, which should be on the astrophysical preprint arXiv.

Here follows the list of all articles which were delayed by more than 3 months from arxiv:physics/astro-ph (out of a total of 81,440 deposited articles) and, if known, where the peer reviewed article got published. I cannot exclude other factors besides moderation for the delay, but can definitely confirm incorrect moderation being the cause for the 2 cases I have experienced. Interestingly the same analysis on arxiv:physics/quant-ph did not reveal such a moderation bias of peer reviewed articles. This gives hope that the astrophysical section could recover and return to 100 percent flawless operation. Then the arXiv fulfils its own pledge on accountability and on good scientific practices (principles of the arXiv’s operation).

Journals of publication, where known:

  • The Astrophysical Journal
  • Publications of Astronomical Society of Japan
  • Journal of Astrophysics and Astronomy
  • EPJ Web of Conferences
  • Astrophysics and Space Sciences
  • The Astrophysical Journal
  • Physical Review C
  • Monthly Notices of the Royal Astronomical Society
  • The Astrophysical Journal Letters
  • Journal of Statistical Mechanics: Theory and Experiment
  • The Astrophysical Journal Letters

Predicting comets: a matter of perspective

Contrast stretched NAVCAM image of the nucleus of comet 67P/Churyumov-Gerasimenko to highlight the “jets” of dust emitted from all over the surface. CC BY-SA IGO 3.0

In general, any cometary activity is difficult to predict and many comets are known for sudden changes in brightness, break-ups and simple disappearances. Fortunately, the Rosetta target comet 67P/Churyumov-Gerasimenko (67P/C-G) is much more amenable to theoretical predictions. The OSIRIS and NAVCAM images show light reflected from a highly structured dust coma within the space probe orbit (ca. 20-150 km).

Is it possible to predict the dust coma and tail of comets?

Starting in 2014 we have been working on a dust forecast for 67P/C-G, see the previous blog entries. We now had the chance to check how well our predictions hold by comparing the model outcome to an image sequence from the OSIRIS camera during one rotation period of 67P/C-G on April 12, 2015, published by Vincent et al. in A&A 587, A14 (2016) (arxiv version, there Fig. 13).

Comparison of Rosetta observations by Vincent et al A&A 2016 (left panels) with the homogeneous model (right panels). Taken from Kramer&Noack (ApJL 2016) Credit for (a, c): ESA/Rosetta/MPS for OSIRIS Team MPS/UPD/LAM/IAA/SSO/INTA/UPM/DASP/IDA

Our results appeared in Kramer & Noack, Astrophysical Journal Letters, 823, L11 (preprint, images). We obtain a surprisingly high correlation coefficient (average 80%, max 90%) between theory and observation, if we stick to the following minimal assumption model:

  1. dust is emitted from the entire sunlit nucleus, not only from localized active areas. We refer to this as the “homogeneous activity model”
  2. dust is entering space with a finite velocity (on average) along the surface normal. This implies that close to the surface a rapid acceleration takes place.
  3. photographed “jets” depend strongly on the observing geometry:
    if multiple concave areas align along the line of sight, a high imaged intensity results, but it is not necessarily the result of a single main emission source. As an exemplary case, we analysed the brightest points in the Rosetta image taken on April 12, 2015, 12:12 and looked at all contributing factors along the line of sight (yellow line) from the camera to the comet. The observed jet actually results from multiple sources and, in addition, from contributions from all sunlit surface areas.
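A correlation coefficient between a model image and an observed image, as quoted above, can be computed along the following lines. The sketch assumes a plain Pearson correlation over pixel intensities of two equally sized images given as nested lists; the exact preprocessing used for the published numbers is described in the article and is not reproduced here, and the function name is illustrative.

```python
import math


def pearson_correlation(image_a, image_b):
    """Pearson correlation of the pixel intensities of two equally sized images."""
    a = [value for row in image_a for value in row]
    b = [value for row in image_b for value in row]
    if not a or len(a) != len(b):
        raise ValueError("images must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant image")
    return covariance / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    # Tiny made-up 2x2 "images" for illustration only.
    model = [[0.0, 1.0], [2.0, 3.0]]
    observed = [[0.1, 0.9], [2.2, 2.8]]
    print(pearson_correlation(model, observed))
```

The coefficient is insensitive to an overall brightness offset or scale factor, which is convenient when the model intensity is only known up to a constant.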

What are the implications of the theoretical model?

If dust is emitted from all sunlit areas of 67P/C-G, this implies a more homogeneous surface erosion of the illuminated nucleus and leaves less room for compositional heterogeneities. And finally: it makes the dust coma much more predictable, but still allows for additional (but unpredictable) spontaneous, 20-40 min outbreak events. Interestingly, a re-analysis of the comet Halley flyby by Crifo et al (Earth, Moon, and Planets 90 227-238 (2002)) also points to a more homogeneous emission pattern as compared to localized sources.

Apply now for the 621. WE-Heraeus-Seminar: From Photosynthesis to Photovoltaics: Theoretical Approaches for Modelling Supramolecular Complexes and Molecular Crystals

The conference focuses on new theoretical and experimental developments in the field of photoactive molecules. If you are working in this field, we encourage you to attend the meeting. Find out more and who has already agreed to participate at

Please submit your abstract within the next two weeks by filling out the linked template from the conference site.

We (the scientific organizers, Tobias Kramer and Jörg Megow) are grateful for the full support by the Wilhelm and Else Heraeus Foundation, which covers the accommodation for all successful applicants.

Weathering the dust around comet 67P/Churyumov–Gerasimenko

Bradford robotic telescope image of comet 67P/Churyumov–Gerasimenko (180s exposure time, 5:43 UTC, 30-10-2015). © 2015 University of Bradford

Comet 67P/Churyumov–Gerasimenko is past its perihelion and is currently visible in telescopes in the morning hours. The picture was taken from Tenerife by the Bradford robotic telescope, where I submitted the request. The tail extends hundreds of thousands of kilometers into space and consists of dust particles emitted from the cometary nucleus, which measures just a few kilometers. In a recent work just published in the Astrophysical Journal Letters (arxiv version), we have explored how dust that does not make it into space whirls around the cometary nucleus. The model assumes that dust particles are emitted from the porous mantle, hover over the cometary surface for some time (<6h) and then fall back on the surface, delayed by the drag of gas molecules moving away from the nucleus. As in the predictions for the cometary coma discussed previously, we stick to a minimal-assumption model with a homogeneous surface activity of gas and dust emission.

Dust trajectories reaching the Philae descent area computed from a homogeneous dust emission model. From Kramer/Noack “Prevailing dust-transport directions on comet 67P/Churyumov-Gerasimenko”, Astrophysical Journal Letters, 813, L33 (2015).

The movements of 40,000 dust particles are tracked and the average dust transport within a volumetric grid with 300 m sized boxes is computed. Besides the gas-dust interaction, we also incorporate the rotation of the comet, which leads to a directional transport.
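The binning step can be illustrated with a short sketch. It assumes that “average dust transport” means the mean particle velocity per grid box, which is one plausible reading and not necessarily the definition used in the article; the function names are illustrative.

```python
import math
from collections import defaultdict

BOX_SIZE = 300.0  # edge length of a grid box in m, as stated in the text


def box_index(position, box_size=BOX_SIZE):
    """Integer index (i, j, k) of the grid box containing a position in m."""
    return tuple(int(math.floor(coordinate / box_size)) for coordinate in position)


def average_transport(positions, velocities, box_size=BOX_SIZE):
    """Mean velocity vector of all particles found in each occupied box."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for position, velocity in zip(positions, velocities):
        cell = sums[box_index(position, box_size)]
        for axis in range(3):
            cell[axis] += velocity[axis]
        cell[3] += 1  # number of particles in this box
    return {index: tuple(cell[axis] / cell[3] for axis in range(3))
            for index, cell in sums.items()}


if __name__ == "__main__":
    # Three made-up particles: two share a box, one sits in the neighbouring box.
    positions = [(10.0, 10.0, 10.0), (290.0, 20.0, 5.0), (310.0, 0.0, 0.0)]
    velocities = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    print(average_transport(positions, velocities))
```

Using a dictionary keyed by box index keeps memory proportional to the number of occupied boxes, which is small compared to the full volume around the nucleus.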
The Rosetta mission dropped Philae over the small lobe of 67P/C-G and Philae took a sequence of approach images which reveal structures resembling wind-tails behind boulders on the comet. This allowed Mottola et al. (Science 349.6247 (2015): aab0232) to derive information about the direction of impinging particles which hit the surface unless sheltered by the boulder. Our model predicts a dust transport in line with the observed directions in the descent region. It will be interesting to see how wind-tails at other locations match the prediction. We put an interactive 3D dust-stream model online to visualize the dust flux predicted from the homogeneous surface model.

Day and night at comet 67P/Churyumov–Gerasimenko

Comet 67P/Churyumov–Gerasimenko has passed its nearest distance to the sun and its tail has been observed from Earth. The comet emits dust and displays spectacular but short-lived outbreaks of localized jet activity. Very detailed OSIRIS pictures of the near-surface dust emission ready for stereo viewing have been posted by Brian May. The pictures also allow one to have a look at the prediction from the homogeneous dust emission model discussed previously. When you direct your attention in Brian May’s pictures to the background activity, you find patterns very similar to those expected from the homogeneous emission model. This activity is dimmer but steadily blows dust off the nucleus. Matthias Noack and I have generated and uploaded a visualization of the dust data obtained from the homogeneous activity model. In contrast to localized activity models, collimated jets arise from a bundle of co-propagating dust trajectories emanating from concave surface areas. The underlying topographical shape model is a uniform triangle remesh of Mattias Malmer’s excellent work based on the release of Rosetta’s NAVCAM images via the Rosetta blog. The following video takes you on a flight around 67P/C-G, with 16 hours condensed into 90 sec.

The video is a side-by-side stereoscopic 3D rendering of 67P/Churyumov–Gerasimenko and the dust cloud, which can be viewed in 3D with a simple cardboard viewer. While the observer is encircling the nucleus, day and night pass and different parts of the comet are illuminated.

Gas flow around 67P/C-G computed from a homogeneous activity model. arxiv:1505.08041

In the homogeneous activity model each sunlit triangle emits dust with an initial velocity component along the surface normal. The dust is then additionally dragged along within the outward streaming gas, which is also incorporated in the model. In contrast to compact dust particles, the gas molecules also diffuse in lateral directions, and thus the gas does not help to collimate jets by itself. The Rosetta mission with its long-term observation program offers fascinating ways to perform a reality check on various models of cometary activity, which differ considerably in the underlying physics and assumptions about the original distribution and lift-off conditions of the dust eventually forming the beautiful tails of comets.
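The launch condition for a single surface triangle can be sketched as follows. This is a simplified illustration and not the production code: it assumes that the triangle vertices are ordered counter-clockwise when seen from outside (so the cross product points outward), it decides “sunlit” only from the angle between the normal and the direction to the sun without checking shadows cast by other parts of the nucleus, and the launch speed of 2 m/s is an illustrative default.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def triangle_launch(v0, v1, v2, sun_direction, speed=2.0):
    """Return (start position, start velocity) for a sunlit triangle, else None.

    The particle starts at the triangle centroid and moves along the outward
    unit normal with the given speed (m/s).
    """
    normal = _cross(_sub(v1, v0), _sub(v2, v0))
    length = math.sqrt(_dot(normal, normal))
    if length == 0.0:
        return None  # degenerate triangle without a well defined normal
    unit_normal = tuple(component / length for component in normal)
    if _dot(unit_normal, sun_direction) <= 0.0:
        return None  # triangle faces away from the sun
    centroid = tuple((a + b + c) / 3.0 for a, b, c in zip(v0, v1, v2))
    velocity = tuple(speed * component for component in unit_normal)
    return centroid, velocity


if __name__ == "__main__":
    # Made-up triangle in the xy-plane, sun along +z.
    print(triangle_launch((0, 0, 0), (1, 0, 0), (0, 1, 0), (0.0, 0.0, 1.0)))
```

Applying this to every triangle of the mesh yields the initial conditions of the homogeneous model; any collimation then has to come from the surface shape, not from a selection of active areas.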

Homogeneous dust emission and jet structure near active cometary nuclei: the case of 67P/Churyumov-Gerasimenko by Tobias Kramer, Matthias Noack, Daniel Baum, Hans-Christian Hege, Eric J. Heller.

For red-cyan glasses try our 3d video on youtube (flash player required, watch out for the settings and 3d options, 1080p HD recommended).

Dusting off cometary surfaces: collimated jets despite a homogeneous emission pattern.

Effective Gravitational potential of the comet (including the centrifugal contribution), the maximal value of the potential (red) is about 0.46 N/m, the minimal value (blue) 0.31 N/m computed with the methods described in this post. The rotation period is taken to be 12.4043 h. Image computed with the OpenCL cosim code. Image (C) Tobias Kramer (CC-BY SA 3.0 IGO).

Knowledge of GPGPU techniques is helpful for rapid model building and testing of scientific ideas. For example, the beautiful pictures taken by the ESA/Rosetta spacecraft of comet 67P/Churyumov–Gerasimenko reveal jets of dust particles emitted from the comet. Wouldn’t it be nice to have a fast method to simulate thousands of dust particles around the comet and to find out whether the peculiar shape of this space-potato alone influences the dust trajectories through its gravitational potential? At the Zuse-Institut in Berlin we joined forces between the distributed algorithm and visual data analysis groups to test this idea. But first an accurate shape model of the comet 67P/C-G is required. As published in his blog, Mattias Malmer has done amazing work to extract a shape model from the published navigation camera images.

  1. Starting from the shape model by Mattias Malmer, we obtain a re-meshed model with fewer triangles on the surface (we use about 20,000 triangles). The key property of the new mesh is a homogeneous coverage of the cometary surface with almost equally sized triangles. We don’t want better resolution and adaptive mesh sizes at areas with more complex features. Rather we are considering a homogeneous emission pattern without isolated activity regions. This is best modeled by mesh cells of equal area. Will this prescription nevertheless yield collimated dust jets? We’ll see…
  2. To compute the gravitational potential of such a surface we follow this nice article by JT Conway. The calculation later on stays in the rotating frame anchored to the comet, thus in addition the centrifugal and Coriolis forces need to be included.
  3. To accelerate the method, OpenCL comes to the rescue and lets one compute many trajectories in parallel. What is required are physical conditions for the starting positions of the dust as it flies off the surface. We put one dust-particle on the center of each triangle on the surface and set the initial velocity along the normal direction to typically 2 or 4 m/s. This ensures that most particles are able to escape and not fall back on the comet.
  4. To visualize the resulting point clouds of dust particles we have programmed an OpenGL visualization tool. We compute the rotation and sunlight direction on the comet to cast shadows and add activity profiles to the comet surface to mask out dust originating from the dark side of the comet.
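The equation of motion in the rotating frame used in the steps above can be sketched in plain Python. This is a serial illustration rather than the OpenCL code: the gravitational acceleration of the polyhedral shape model is passed in as a user-supplied function (hypothetical here), the spin axis is assumed to point along z, and a simple semi-implicit Euler step stands in for whatever integrator the production code uses.

```python
import math

ROTATION_PERIOD_S = 12.4043 * 3600.0  # rotation period quoted in the figure caption
OMEGA = (0.0, 0.0, 2.0 * math.pi / ROTATION_PERIOD_S)  # assumption: spin axis along z


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotating_frame_acceleration(position, velocity, gravity):
    """Acceleration in the comet-fixed frame: gravity + Coriolis + centrifugal.

    gravity is a callable returning the gravitational acceleration (m/s^2)
    at a position; it has to be supplied by the shape model.
    """
    g = gravity(position)
    omega_cross_v = cross(OMEGA, velocity)
    omega_cross_omega_cross_r = cross(OMEGA, cross(OMEGA, position))
    return tuple(g[i] - 2.0 * omega_cross_v[i] - omega_cross_omega_cross_r[i]
                 for i in range(3))


def euler_step(position, velocity, gravity, dt):
    """One semi-implicit Euler step: update the velocity first, then the position."""
    acceleration = rotating_frame_acceleration(position, velocity, gravity)
    new_velocity = tuple(v + dt * a for v, a in zip(velocity, acceleration))
    new_position = tuple(p + dt * v for p, v in zip(position, new_velocity))
    return new_position, new_velocity


if __name__ == "__main__":
    def no_gravity(_position):
        return (0.0, 0.0, 0.0)  # placeholder instead of the polyhedron gravity

    print(rotating_frame_acceleration((1000.0, 0.0, 0.0), (0.0, 0.0, 0.0), no_gravity))
```

Since every dust particle is advanced independently of the others, the loop over particles is trivially parallel, which is what makes the problem a good fit for OpenCL.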

This is what we get for May 3, 2015. The ESA/NAVCAM image is taken verbatim from the Rosetta/blog.

Comparison of homogeneous dust model (left panel) with ESA/NAVCAM Rosetta images. (C) Left panel: Tobias Kramer and Matthias Noack 2015. Right panel: (C) ESA/NAVCAM team CC BY-SA 3.0 IGO, link see text.

Read more about the physics and results in our arxiv article T. Kramer et al.: Homogeneous Dust Emission and Jet Structure near Active Cometary Nuclei: The Case of 67P/Churyumov-Gerasimenko (submitted for publication) and grab the code to compute your own dust trajectories with OpenCL at

The shape of the universe

The following post is contributed by Peter Kramer.

Shown are two faces of a hyperbolic dodecahedron.
The red line from the family of shortest lines (geodesics) connects both faces. Adapted from CRM Proceedings and Lecture Notes (2004), vol 34, p. 113, by Peter Kramer.

The new Planck data on the cosmic microwave background (CMB) has come in. For cosmic topology, the data sets contain interesting information related to the size and shape of the universe. The curvature of the three-dimensional space leads to a classification into hyperbolic, flat, or spherical cases. Sometimes in popular literature, the three cases are said to imply an infinite (hyperbolic, flat) or finite (spherical) size of the universe. This statement is not correct. Topology supports a much wider zoo of possible universes. For instance, there are finite hyperbolic spaces, as depicted in the figure (taken from Group actions on compact hyperbolic manifolds and closed geodesics, arxiv version). The figure also shows the resulting geodesics, which are the paths of light through such a hyperbolic finite sized universe. The start and end points must be identified and lead to a smooth connection.

Recent observational data seem to suggest a spherical space. Still, it does not resolve the issue of the size of the universe.
Instead of a fully filled three-sphere, already smaller parts of the sphere can be closed topologically and thus lead to a smaller sized universe. A systematic exploration of such smaller but still spherical universes is given in my recent article
Topology of Platonic Spherical Manifolds: From Homotopy to Harmonic Analysis.
In physics, it is important to give specific predictions for observations of the topology, for instance by predicting the ratio of the different angular modes of the cosmic microwave background. The article shows that this is indeed possible: for instance, in a cubic (still spherical!) universe, the squared 4th and 6th multipole orders are tied together in the proportion 7 : 4, see Table 5. On p. 35 of the Planck collaboration article, the authors call for models yielding such predictions as possible explanations for the observed anisotropy and the ratio of high and low multipole moments.

When two electrons collide. Visualizing the Pauli blockade.

The upper panel shows two (non-interacting) electrons approaching with small relative momenta, the lower panel with larger relative momenta.

From time to time I get asked about the implications of the Pauli exclusion principle for quantum mechanical wave-packet simulations.
I start with the simplest antisymmetric case: a two particle state given by the Slater determinant of two Gaussian wave packets with perpendicular directions of the momentum:
$\phi_a(x,y)=e^{-[(x-o)^2+(y-o)^2]/(2a^2)-ikx+iky}$ and $\phi_b(x,y)=e^{-[(x+o)^2+(y-o)^2]/(2a^2)+ikx+iky}$
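This yields (up to normalization) the two-electron wave function $\psi(x_1,y_1,x_2,y_2) \propto \phi_a(x_1,y_1)\,\phi_b(x_2,y_2) - \phi_b(x_1,y_1)\,\phi_a(x_2,y_2)$.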
The probability to find one of the two electrons at a specific point in space is given by integrating the absolute value squared of the wave function over the coordinates of the other electron.
The resulting single particle density (snapshots at specific values of the displacement o) is shown in the animation for two different values of the momentum k (we assume that both electrons are in the same spin state).
For small values of k the two electrons get close in phase space (that is in momentum and position). The animation shows how the density deviates from a simple addition of the probabilities of two independent electrons.
If the two electrons differ already by a large relative momentum, the distance in phase space is large even if they get close in position space. Then, the resulting single particle density looks similar to the sum of two independent probabilities.
The probability to find the two electrons simultaneously at the same place is zero in both cases, but this is not directly visible by looking at the single particle density (which reflects the probability to find any of the electrons at a specific position).
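The construction can be checked numerically with a small sketch. It builds the unnormalized antisymmetric combination of the two Gaussian packets defined above and evaluates the single particle density; integrating the squared wave function over one electron gives $|\phi_a|^2\langle\phi_b|\phi_b\rangle + |\phi_b|^2\langle\phi_a|\phi_a\rangle - 2\,\mathrm{Re}[\phi_a^*\phi_b\langle\phi_b|\phi_a\rangle]$. The function names, the grid size and the parameter values are illustrative assumptions; the integration window assumes a width a of order one and a displacement o that is small compared to the window.

```python
import cmath


def phi_a(x, y, o, a, k):
    """Gaussian packet centred at (o, o) with plane-wave factor exp(-ikx + iky)."""
    return cmath.exp(-((x - o) ** 2 + (y - o) ** 2) / (2 * a * a) + 1j * (-k * x + k * y))


def phi_b(x, y, o, a, k):
    """Gaussian packet centred at (-o, o) with plane-wave factor exp(+ikx + iky)."""
    return cmath.exp(-((x + o) ** 2 + (y - o) ** 2) / (2 * a * a) + 1j * (k * x + k * y))


def psi(r1, r2, o, a, k):
    """Unnormalized antisymmetric two-electron wave function."""
    return (phi_a(r1[0], r1[1], o, a, k) * phi_b(r2[0], r2[1], o, a, k)
            - phi_b(r1[0], r1[1], o, a, k) * phi_a(r2[0], r2[1], o, a, k))


def norms_and_overlap(o, a, k, half_width=8.0, n=81):
    """Grid estimates of <a|a>, <b|b> and <b|a> on a square centred at (0, o)."""
    step = 2.0 * half_width / (n - 1)
    norm_a = norm_b = 0.0
    overlap = 0j
    for i in range(n):
        x = -half_width + i * step
        for j in range(n):
            y = o - half_width + j * step
            fa = phi_a(x, y, o, a, k)
            fb = phi_b(x, y, o, a, k)
            norm_a += abs(fa) ** 2
            norm_b += abs(fb) ** 2
            overlap += fb.conjugate() * fa
    area = step * step
    return norm_a * area, norm_b * area, overlap * area


def single_particle_density(x, y, o, a, k, integrals):
    """|psi|^2 integrated over the second electron (unnormalized)."""
    norm_a, norm_b, overlap = integrals
    fa = phi_a(x, y, o, a, k)
    fb = phi_b(x, y, o, a, k)
    exchange = 2.0 * (fa.conjugate() * fb * overlap).real
    return abs(fa) ** 2 * norm_b + abs(fb) ** 2 * norm_a - exchange


if __name__ == "__main__":
    o, a, k = 1.0, 1.0, 0.5  # made-up values for illustration
    integrals = norms_and_overlap(o, a, k)
    print(abs(psi((0.3, -0.2), (0.3, -0.2), o, a, k)))  # both electrons at one point
    print(single_particle_density(0.0, o, o, a, k, integrals))
```

For these Gaussians the overlap evaluates to $\pi a^2 e^{-o^2/a^2 - k^2 a^2}$, so the exchange term is suppressed for large k or large o, and the density approaches the sum of two independent probabilities.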
For further reading, see this article [arxiv version].

The impact of scientific publications – some personal observations

I will resume posting about algorithm development for computational physics. To put these efforts in a more general context, I start with some observations about the current publication ranking model and explore alternatives and supplements in the next posts.

Solvay congress 1970, many well-known nuclear physicists are present, including Werner Heisenberg.

Working in academic institutions involves being part of hiring committees as well as being assessed by colleagues to measure the impact of my own and others’ scientific contributions.
In the internet age it has become common practice to look at various performance indices, such as the h-index, number of “first author” and “senior author” articles. Often it is the responsibility of the applicant to submit this data in electronic spreadsheet format suitable for an easy ranking of all candidates. The indices are only one consideration for the final decision, albeit in my experience an important one due to their perceived unbiased and statistical nature. Funding of whole university departments and the careers of young scientists are tied to the performance indices.

I reflected on the usefulness of impact factors while I collected them for various reports. Here are some personal observations:

  1. Looking at the (very likely rather incomplete) citation count of my father I find it interesting that for instance a 49 year old contribution by P Kramer/M Moshinsky on group-theoretical methods for few-body systems gains most citations per year after almost 5 decades. This time-scale is well beyond any short-term hiring or funding decisions based on performance indices. From colleagues I hear about similar cases.
  2. A high h-index can be a sign of a narrow research field, since the h-index is best built up by sticking to the same specialized topic for a long time and this encourages serialised publications. I find it interesting that on the other hand important contributions have been made by people working outside the field to which they contributed. The discovery of three-dimensional quasicrystals discussed here provides a good example. The canonical condensed matter theory did not envision this paradigmatic change, rather the study of group theoretical methods in nuclear physics provided the seeds.
  3. The full-text search provided by the search engines offers fascinating options to scan through previously forgotten chapters and books, but it also bypasses the systematic classification schemes previously developed and curated by colleagues in mathematics and theoretical physics. It is interesting to note that for instance the AMS short reviews are not done anonymously and most often are of excellent quality. The non-curated search on the other hand leads to a down-ranking of books and review articles, which contain a broader and deeper exposition of a scientific topic. Libraries with real books grouped by topics are deserted these days, and online services and expert reviews did in general not gain a larger audience or expert community to write reports. One exception might be the public discussion of possible scientific misconduct and retracted publications.
  4. Another side effect: searching the internet for specific topics diminishes the opportunity to accidentally stumble upon an interesting article lacking these keywords, for instance by scanning through a paper volume of a journal while searching for a specific article. I recall that many faculty members went every Monday to the library and looked at all the incoming journals to stay up-to-date about the general developments in physics and chemistry. Today we get email alerts about citation counts or specific subfields, but no alert contains a suggestion what other article might pique our intellectual curiosity – and looking at the rather stupid shopping recommendations generated by online-warehouses I don’t expect this to happen anytime soon.
  5. On a positive note: since all text sources are treated equally, no “high-impact journals” are preferred. In my experience as a referee for journals of all sorts of impact numbers, the interesting contributions are not necessarily published or submitted to highly ranked journals.

To sum up, the assessment of manuscripts, contributions of colleagues, and of my own articles requires humans to read them and to process them carefully – all of this takes a lot of time and consideration. It can take decades before publications become alive and well cited. Citation counts of the last 10 years can be poor indicators for the long-term importance of a contribution. Counting statistics provide some gratification by showing immediate interest and are the (less personal) substitute for the old-fashioned postcards requesting reprints. People working in theoretical physics are often closely related by collaboration distance, which provides yet another (much more fun!) factor. You can check your Erdős number (mine is 4) or Einstein number (3, thanks to working with Marcos Moshinsky) at the AMS website.

How to improve the current situation and maintain a well curated and relevant library of scientific contributions – in particular involving numerical results and methods? One possibility is to make a larger portion of the materials surrounding a publication available. In computational physics it is of interest to test and recalculate published results shown in journals. The platform is in my view a best practice case for providing supplemental information on demand and to ensure a long-term availability and usefulness of scientific results by keeping the computational tools running and updated. It is for me a pleasure and excellent experience to work with the team around nanohub to maintain our open quantum dynamics tool. Another way is to provide and test background materials in research blogs. I will try out different approaches with the next posts.

Better than Slater-determinants: center-of-mass free basis sets for few-electron quantum dots

Error analysis of eigenenergies of the standard configuration interaction (CI) method (right black lines). The left colored lines are obtained by explicitly handling all spurious states. The arrows point out the increasing error of the CI approach with increasing center-of-mass admixing.

Solving the interacting many-body Schrödinger equation is a hard problem. Even restricting the spatial domain to a two-dimensional plane does not lead to analytic solutions; the troublemakers are the mutual particle-particle interactions. In the following we consider electrons in a quasi two-dimensional electron gas (2DEG), which are further confined either by a magnetic field or by an external harmonic oscillator confinement potential. For two electrons, this problem is solvable for specific values of the Coulomb interaction due to a hidden symmetry in the Hamiltonian, see the review by A. Turbiner and our application to the two interacting electrons in a magnetic field.

For three and more electrons, no analytical solutions are known (to my knowledge). One standard computational approach is the configuration interaction (CI) method, which diagonalizes the Hamiltonian in a variational trial space of Slater-determinantal states. Each Slater determinant consists of antisymmetrized products of single-particle orbitals. Due to computer resource constraints, only a certain number of Slater determinants can be included in the basis set. One possibility is to include only trial states up to a certain excitation level of the non-interacting problem.

The usage of Slater determinants as CI basis set introduces severe distortions in the eigenenergy spectrum due to the intrusion of spurious states, as we will discuss next. Spurious states have been extensively analyzed in the few-body problems arising in nuclear physics but have rarely been mentioned in solid-state physics, where they do arise in quantum-dot systems. The basic defect of the Slater-determinantal CI method is that it brings along center-of-mass excitations. During the diagonalization, the center-of-mass excitations occur along with the Coulomb interaction and lead to an inflated basis size and also to a loss of precision for the eigenenergies of the excited states. Increasing the basis set does not uniformly reduce the error across the spectrum, since the enlarged CI basis set brings along states of high center-of-mass excitations. The cut-off energy then restricts the remaining basis size for the relative part.

The cleaner and leaner way is to separate the center-of-mass excitations from the relative-coordinate excitations, since the Coulomb interaction only acts along the relative coordinates. In fact, the center-of-mass part can be split off and solved analytically in many cases. The construction of the relative-coordinate basis states requires group-theoretical methods and is carried out for four electrons in our article Interacting electrons in a magnetic field in a center-of-mass free basis (arxiv:1410.4768). For three electrons, the importance of a spurious-state-free basis set was emphasized by R. Laughlin and is one of the design principles behind the Laughlin wave function.
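For a harmonic confinement the separation rests on a simple identity: for N particles with center of mass R = (1/N) Σ r_i, one has Σ |r_i|² = N |R|² + (1/N) Σ_{i<j} |r_i − r_j|². The same identity holds for the momenta, while the Coulomb term depends only on the differences r_i − r_j. The following is a minimal numerical check of the identity with random 2D coordinates; the function names are invented for this illustration and are not taken from the article.

```python
import random

def split_confinement(positions):
    """Return (total, center_of_mass_part, relative_part) of sum_i |r_i|^2."""
    n = len(positions)
    total = sum(x * x + y * y for x, y in positions)
    # center of mass R = (1/N) sum_i r_i
    rx = sum(x for x, _ in positions) / n
    ry = sum(y for _, y in positions) / n
    com_part = n * (rx * rx + ry * ry)
    # relative part: (1/N) sum_{i<j} |r_i - r_j|^2
    rel_part = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            rel_part += dx * dx + dy * dy
    rel_part /= n
    return total, com_part, rel_part

def shift_all(positions, shift):
    """Rigidly translate all particles by the same vector."""
    return [(x + shift[0], y + shift[1]) for x, y in positions]

random.seed(1)
electrons = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]
total, com_part, rel_part = split_confinement(electrons)
print("identity holds:", abs(total - (com_part + rel_part)) < 1e-12)

# A rigid shift of all electrons changes only the center-of-mass part
_, com_shifted, rel_shifted = split_confinement(shift_all(electrons, (0.5, -0.3)))
print("relative part unchanged:", abs(rel_shifted - rel_part) < 1e-12)
```

The second check illustrates why a basis built from relative coordinates is blind to center-of-mass motion: moving all electrons together leaves the relative part untouched.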

Slow or fast transfer: bottleneck states in light-harvesting complexes

Light-harvesting complex II, crystal structure 1RWT from Liu et al (Nature 2004, vol. 428, p. 287), rendered with VMD. The labels denote the designation of the chlorophyll sites (601-614). Chlorophylls 601,605-609 are of chlorophyll b type, the others of type a.

In the previous post I described some of the computational challenges for modeling energy transfer in the light harvesting complex II (LHCII) found in spinach. Here, I discuss the results we have obtained for the dynamics and choreography of excitonic energy transfer through the chlorophyll network. Compared to the Fenna-Matthews-Olson complex, LHCII has twice as many chlorophylls per monomeric unit (labeled 601-614 with chlorophyll a and b types).
Previous studies of exciton dynamics had to stick to simple exponential decay models based on either Redfield or Förster theory for describing the transfer from the Chl b to the Chl a sites. The results are neither satisfying nor conclusive, since depending on the method chosen the transfer time differs widely (tens of picoseconds vs picoseconds!).

Exciton dynamics in LHCII computed with various methods. HEOM denotes the most accurate method, while Redfield and Förster approximations fail.

Resolving the discrepancies between the various approximate methods requires a more accurate approach. With the accelerated HEOM at hand, we revisited the problem and calculated the transfer rates. We find slower rates than given by the Redfield expressions. A combined Förster-Redfield description is possible in hindsight by using HEOM to identify a suitable cut-off parameter (Mcr=30/cm in this specific case).
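In a combined description of this kind, the cut-off is commonly used to sort the inter-pigment couplings into two classes: couplings larger than Mcr stay in the exciton Hamiltonian treated by Redfield theory, while the weaker ones are handled as Förster-type hopping between the resulting clusters. The sketch below shows only this bookkeeping step, under that assumption. The four-site coupling matrix is made up for illustration and is not taken from any LHCII model, and the function names are hypothetical.

```python
def split_by_cutoff(couplings, cutoff):
    """Split a symmetric coupling matrix (in 1/cm) at |V| = cutoff.

    Returns (strong, weak): 'strong' keeps couplings with |V| > cutoff
    (Redfield part), 'weak' keeps the remaining ones (Foerster part).
    """
    n = len(couplings)
    strong = [[0.0] * n for _ in range(n)]
    weak = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            target = strong if abs(couplings[i][j]) > cutoff else weak
            target[i][j] = couplings[i][j]
    return strong, weak

def clusters(strong):
    """Group sites connected by strong couplings (connected components)."""
    n = len(strong)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            site = stack.pop()
            group.append(site)
            for other in range(n):
                if strong[site][other] != 0.0 and other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups

# Hypothetical couplings in 1/cm, NOT taken from any LHCII model
V = [[0.0, 80.0, 5.0, -3.0],
     [80.0, 0.0, 12.0, 4.0],
     [5.0, 12.0, 0.0, -45.0],
     [-3.0, 4.0, -45.0, 0.0]]
strong, weak = split_by_cutoff(V, cutoff=30.0)
print(clusters(strong))   # -> [[0, 1], [2, 3]]
```

With a cut-off of 30/cm the toy system falls into two strongly coupled pairs, and the four remaining weak couplings connect the pairs.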

Since the energy transfer is driven by the coupling of electronic degrees of freedom to vibrational ones, it is of importance to assess how the vibrational mode distribution affects the transfer. In particular it has been proposed that specifically tuned vibrational modes might promote a fast relaxation. We find no strong impact of such modes on the transfer. Rather, we see (independent of the detailed vibrational structure) several bottleneck states, which act as a transient reservoir for the exciton flux. The details and distribution of the bottleneck states strongly depend on the parameters of the electronic couplings and differ for the two most commonly discussed LHCII models proposed by Novoderezhkin/Marin/van Grondelle and Müh/Madjet/Renger – both are considered in the article Scalable high-performance algorithm for the simulation of exciton-dynamics. Application to the light harvesting complex II in the presence of resonant vibrational modes (collaboration of Christoph Kreisbeck, Tobias Kramer, Alan Aspuru-Guzik).
Again, the correct assignment of the bottleneck states requires using HEOM and looking beyond the approximate rate equations.

High-performance OpenCL code for modeling energy transfer in spinach

With increasing computational power of massively-parallel computers, a more accurate modeling of the energy-transfer dynamics in larger and more complex photosynthetic systems (=light-harvesting complexes) becomes feasible – provided we choose the right algorithms and tools.

OpenCL cross-platform performance for tracking energy transfer in the light-harvesting complex II found in spinach, see Fig. 1 in the article. Lower values indicate higher performance. The program code was originally written for massively-parallel GPUs, but also performs well on the AMD Opteron setup. The Intel MIC OpenCL variant does not reach the peak performance (a different data layout seems to be required to benefit from autovectorization).

The diverse character of hardware found in high-performance computers (hpc) seemingly requires rewriting program code from scratch, depending on whether we are targeting multi-core CPU systems, integrated many-core platforms (Xeon Phi/MIC), or graphics processing units (GPUs).

To avoid the fragmentation of our open quantum-system dynamics workhorse (see the previous GPU-HEOM posts) across the various hpc-platforms, we have transferred the GPU-HEOM CUDA code to the Open Compute Language (OpenCL). The resulting QMaster tool is described in our just published article Scalable high-performance algorithm for the simulation of exciton-dynamics. Application to the light harvesting complex II in the presence of resonant vibrational modes (collaboration of Christoph Kreisbeck, Tobias Kramer, Alan Aspuru-Guzik). This post details the computational challenges and lessons learnt; the application to the light-harvesting complex II found in spinach will be the topic of the next post.

In my experience, it is not uncommon to develop a nice GPU application, for instance with CUDA, which is later scaled up to handle bigger problem sizes. With increasing problem size the memory demands also increase, and even the 12 GB provided by the Kepler K40 are finally exhausted. Upon reaching this point, two options are possible: (a) to distribute the memory across different GPU devices or (b) to switch to architectures which provide more device memory. Option (a) requires substantial changes to existing program code to manage the distributed memory access, while option (b) in combination with OpenCL requires (in the best case) only adapting the kernel-launch configuration to the different platforms.
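To get a feeling for how quickly device memory fills up, here is a rough back-of-the-envelope estimate for a hierarchy-type method. It assumes the textbook counting for a hierarchy truncated at a total depth D with K bath indices, which gives binomial(K + D, D) auxiliary matrices, and it counts only a single copy of the hierarchy in double-precision complex numbers. The actual truncation and storage scheme of a given code may differ, and the parameter sets are hypothetical.

```python
from math import comb

def heom_memory_bytes(sites, terms_per_site, depth, bytes_per_complex=16):
    """Rough size of ONE copy of the hierarchy of auxiliary matrices.

    Assumes one independent bath per site, 'terms_per_site' exponential
    terms per bath, and a hierarchy truncated at total excitation 'depth'.
    """
    modes = sites * terms_per_site
    n_matrices = comb(modes + depth, depth)   # number of auxiliary matrices
    return n_matrices * sites * sites * bytes_per_complex

# Hypothetical parameter sets for a 14-site system, for illustration only
for terms, depth in [(1, 4), (3, 4), (3, 6)]:
    size_gb = heom_memory_bytes(14, terms, depth) / 1e9
    print(f"terms/site={terms} depth={depth}: {size_gb:.2f} GB")
```

A time integrator typically needs several such copies, so the real footprint is a multiple of these numbers.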

The OpenCL device fission extension allows us to investigate the scaling of the QMaster code with the number of CPU cores. We observe a linear scaling up to 48 cores.

QMaster implements an extension of the hierarchical equations of motion (HEOM) method originally proposed by Tanimura and Kubo, which involves many (small) matrix-matrix multiplications. For GPU applications, the usage of local memory and the optimal thread-grids for fast matrix-matrix multiplications have been described before and are used in QMaster (and in the publicly available GPU-HEOM tool on nanoHub). While for GPUs the best performance is achieved by using shared/local memory and assigning one thread to each matrix element, the multi-core CPU OpenCL variant performs better with fewer threads that each do more work. Therefore we use for the CPU machines a thread-grid which computes one complete matrix product per thread (this is somewhat similar to following the “naive” approach given in NVIDIA’s OpenCL programming guide, chapter 2.5). This strategy did not work very well for the Xeon Phi/MIC OpenCL case, which requires additional data structure changes, as we learnt from discussions with the distributed algorithms and hpc experts in the group of Prof. Reinefeld at the Zuse-Institute in Berlin.
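The two thread-grid layouts can be mimicked in plain Python, with loop iterations standing in for OpenCL work-items. This is only a model of the index mapping and says nothing about performance; the function names are invented for this illustration and do not appear in QMaster.

```python
def matmul_one_item_per_element(a_batch, b_batch):
    """GPU-style grid: work-item (m, i, j) computes one element of product m."""
    n = len(a_batch[0])
    out = [[[0.0] * n for _ in range(n)] for _ in a_batch]
    work_items = [(m, i, j) for m in range(len(a_batch))
                  for i in range(n) for j in range(n)]
    for m, i, j in work_items:          # each iteration = one work-item
        out[m][i][j] = sum(a_batch[m][i][k] * b_batch[m][k][j]
                           for k in range(n))
    return out, len(work_items)

def matmul_one_item_per_matrix(a_batch, b_batch):
    """CPU-style grid: work-item m computes the complete product m."""
    n = len(a_batch[0])
    out = []
    for m in range(len(a_batch)):       # each iteration = one work-item
        a, b = a_batch[m], b_batch[m]
        out.append([[sum(a[i][k] * b[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)])
    return out, len(a_batch)

# Three small 2x2 products as a stand-in for the many small matrices
A = [[[1.0, 2.0], [3.0, 4.0]]] * 3
B = [[[0.0, 1.0], [1.0, 0.0]]] * 3
fine, n_fine = matmul_one_item_per_element(A, B)
coarse, n_coarse = matmul_one_item_per_matrix(A, B)
print(fine == coarse, n_fine, n_coarse)   # -> True 12 3
```

Both layouts produce the same products; they differ only in how many work-items are launched and how much work each one does.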
The good performance and scaling across the 64 CPU AMD Opteron workstation positively surprised us and lays the groundwork for investigating the validity of approximations to the energy-transfer equations in the spinach light-harvesting system, the topic of the next post.

Flashback to the 80ies: filling space with the first quasicrystals

This post provides a historical and conceptual perspective on the theoretical discovery of non-periodic 3d space-fillings by Peter Kramer, later experimentally found and now called quasicrystals. See also these previous blog entries for more quasicrystal references and more background material here.
The following post is written by Peter Kramer.

Star extension of the pentagon. Fig. 1 from Non-periodic central space filling with icosahedral symmetry using copies of seven elementary cells by Peter Kramer, Acta Cryst. (1982). A38, 257-264

When sorting out my old texts and figures from 1981, published in Non-periodic central space filling with icosahedral symmetry using copies of seven elementary cells (Acta Cryst. (1982). A38, 257-264), I came across the figure of a regular pentagon of edge length L, which I denoted as p(L). In the left figure its red-colored edges are star-extended up to their intersections. Connecting these intersection points by straight lines creates a larger blue pentagon. Its edges are scaled up by τ², with τ the golden section number, so we call the larger pentagon p(τ² L). This blue pentagon is composed of the old red one plus ten isosceles triangles with golden proportions of their edge lengths. Five of them have edges t1(L): (L, τ L, τ L), five have edges t2(L): (τ L, τ L, τ² L). We find from Fig. 1 that these golden triangles may be composed face-to-face into their τ-extended copies as t1(τ L) = t1(L) + t2(L) and t2(τ L) = t1(L) + 2 t2(L).

Moreover, we realize from the figure that the pentagon p(τ² L) can also be composed from golden triangles as p(τ² L) = t1(τ L) + 3 t2(τ L) = 4 t1(L) + 7 t2(L).
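The composition rules can be cross-checked by comparing areas. The sketch below is an added illustration, not part of the 1982 paper: it uses the apex angles of the two golden triangles (36 degrees for t1, 108 degrees for t2) and the standard area formula of the regular pentagon. Equal areas are of course only a necessary condition for a valid tiling, not a proof.

```python
from math import sin, tan, radians, sqrt, isclose

TAU = (1 + sqrt(5)) / 2          # golden section number

def area_t1(L):
    """Triangle with edges (L, tau L, tau L): apex angle 36 degrees."""
    return 0.5 * (TAU * L) ** 2 * sin(radians(36))

def area_t2(L):
    """Triangle with edges (tau L, tau L, tau^2 L): apex angle 108 degrees."""
    return 0.5 * (TAU * L) ** 2 * sin(radians(108))

def area_pentagon(L):
    """Regular pentagon of edge length L."""
    return 5 * L ** 2 / (4 * tan(radians(36)))

L = 1.0
checks = {
    "t1(tau L) = t1(L) + t2(L)":
        (area_t1(TAU * L), area_t1(L) + area_t2(L)),
    "t2(tau L) = t1(L) + 2 t2(L)":
        (area_t2(TAU * L), area_t1(L) + 2 * area_t2(L)),
    "p(tau^2 L) = p(L) + 5 t1(L) + 5 t2(L)":
        (area_pentagon(TAU ** 2 * L),
         area_pentagon(L) + 5 * area_t1(L) + 5 * area_t2(L)),
    "p(tau^2 L) = 4 t1(L) + 7 t2(L)":
        (area_pentagon(TAU ** 2 * L), 4 * area_t1(L) + 7 * area_t2(L)),
}
for label, (lhs, rhs) in checks.items():
    print(label, isclose(lhs, rhs))
```

The counts 4 and 7 follow directly from inserting the two inflation rules into t1(τ L) + 3 t2(τ L) = (1 + 3) t1(L) + (1 + 6) t2(L).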

This suggests that the golden triangles t1,t2 can serve as elementary cells of a triangle tiling to cover any range of the plane and provide the building blocks of a quasicrystal. Indeed we did prove this long range property of the triangle tiling (see Planar patterns with fivefold symmetry as sections of periodic structures in 4-space).

An icosahedral tiling from star extension of the dodecahedron.
Star extension of the dodecahedron d(L) to the icosahedron i(τ²L) and further to d(τ³L) and i(τ⁵L), shown in Fig. 3 of the 1982 paper. The vertices of these polyhedra are marked by filled circles; extensions of edges are shown except for d(L).

In the same paper, I generalized the star extension from the 2D pentagon to the 3D dodecahedron d(L) of edge length L (see next figure) by the following prescription:

  • star extend the edges of this dodecahedron to their intersections
  • connect these intersections to form an icosahedron

The next star extension produces a larger dodecahedron d(τ³L), with edges scaled by τ³. In the composition of the larger dodecahedron I found four elementary polyhedral shapes shown below. Even more amusing, I also resurrected the paper models I constructed in 1981 to actually demonstrate the complete space filling!
These four polyhedra compose their copies by scaling with τ³. As in the 2D case, arbitrary regions of 3D can be covered by the four tiles.

The four elementary cells shown in the 1982 paper, Fig. 4. The four shapes are named dodecahedron (d), skene (s), aetos (a) and tristomos (t). The paper models from 1981 are still around in 2014 and complete enough to fill the 3D space without gaps. You can spot all shapes (d,s,a,t) in various scalings, and they all systematically fill the large dodecahedron shell on the back of the table without gaps.

The only feature missing for quasicrystals is aperiodic long-range order, which eventually leads to sharp diffraction patterns with 5- or 10-fold point symmetries forbidden for the old-style crystals. In my construction shown here I strictly preserved central icosahedral symmetry. Non-periodicity then followed because full icosahedral symmetry and periodicity in 3D are incompatible.

In 1983 we found a powerful alternative construction of icosahedral tilings, independent of the assumption of central symmetry: the projection method from 6D hyperspace (On periodic and non-periodic space fillings of E^m obtained by projection). This projection establishes the quasiperiodicity of the tilings, analyzed in line with the work Zur Theorie der fast periodischen Funktionen (i-iii) of Harald Bohr from 1925, as a variant of aperiodicity (more background material here).

Tutorial #1: simulate 2d spectra of light-harvesting complexes with GPU-HEOM @ nanoHub

The computation and prediction of two-dimensional (2d) echo spectra of photosynthetic complexes is a daunting task and requires enormous computational resources – if done without drastic simplifications. However, such computations are absolutely required to test and validate our understanding of energy transfer in photosynthesis. You can find some background material in the recently published lecture notes on Modelling excitonic-energy transfer in light-harvesting complexes (arxiv version) of the Latin American School of Physics Marcos Moshinsky.
The ability to compute 2d spectra of photosynthetic complexes without resorting to strong approximations is to my knowledge an exclusive privilege of the Hierarchical Equations of Motion (HEOM) method due to its superior performance on massively-parallel graphics processing units (GPUs). You can find some background material on the GPU performance in the two conference talks Christoph Kreisbeck and I presented at the GTC 2014 conference (recorded talk, slides) and the first nanoHub users meeting.

GPU-HEOM 2d spectra computed at nanohub: computed 2d spectra for the FMO complex for 0 picosecond delay time (upper panel) and 1 ps (lower panel). The GPU-HEOM computation takes about 40 min on the platform and includes all six Liouville pathways and averages over 4 spatial orientations.

  1. login on (it’s free!)
  2. switch to the gpuheompop tool
  3. click the Launch Tool button (java required)
  4. for this tutorial we use the example input for “FMO coherence, 1 peak spectral density“.
    You can select this preset from the Example selector.
  5. we stick with the provided Exciton System parameters and only change the temperature to 77 K to compare the results with our published data.
  6. in the Spectral Density tab, leave all parameters at the suggested values
  7. to compute 2d spectra, switch to the Calculation mode tab
  8. for compute: choose “two-dimensional spectra“. This brings up input masks for setting the directions of all dipole vectors; we stick with the provided values. However, we select Rotational averaging: “four shot rotational average” and activate all six Liouville pathways by setting ground st[ate] bleach reph[asing], stim[ulated] emission reph[asing], and excited st[ate] abs[orption] to yes, as well as their non-rephasing counterparts (attention! you might need to resize the input mask by pulling at the lower right corner)
  9. That’s all! Hit the Simulate button and your job will be executed on the Carter GPU cluster at Purdue University. The simulation takes about 40 minutes of GPU time, which is orders of magnitude faster than any other published method with the same accuracy. You can close and reopen your session in between.
  10. Voilà: your first FMO spectra appear.
  11. Now it’s time to change parameters. What happens at higher temperature?
  12. If you like the results or use them in your work for comparison, we (and the folks at nanoHub who generously develop and provide the nanoHub platform and GPU computation time) appreciate a citation. To make this step easy, a DOI number and reference information is listed at the bottom of the About tab of the tool-page.

With GPU-HEOM we, and now you (!), can not only calculate the 2d echo spectra of the Fenna-Matthews-Olson (FMO) complex, but also reveal the strong link between the continuum part of the vibrational spectral density and the prevalence of long-lasting electronic coherences, as described in my previous posts.

GPU and cloud computing conferences in 2014

Two conferences related to GPU and cloud computing are currently open for registration. I will be attending and presenting at both; please email me if you want to get in touch at the meetings.

Oscillations in two-dimensional spectroscopy

Transition from electronic coherence to a vibrational mode made visible by Short Time Fourier Transform (see text).

Over the last years, a debate has been going on whether the long-lasting oscillatory signals observed in two-dimensional spectra reflect vibrational or electronic coherences, and how the functioning of the molecule is affected. Christoph Kreisbeck and I have performed a detailed theoretical analysis of oscillations in the Fenna-Matthews-Olson (FMO) complex and in a model three-site system. As explained in a previous post, the prerequisites for long-lasting electronic coherences are two features of the continuous part of the vibronic mode density: (i) a small slope towards zero frequency, and (ii) a coupling to the excitonic eigenenergy (ΔE) differences for relaxation. Both requirements are met by the mode density of the FMO complex, and the computationally demanding calculation of two-dimensional spectra of the FMO complex indeed predicts long-lasting cross-peak oscillations with a period matching h/ΔE at room temperature (see our article Long-Lived Electronic Coherence in Dissipative Exciton-Dynamics of Light-Harvesting Complexes or arXiv version). The persistence of oscillations stems from a robust mechanism and does not require adding any additional vibrational modes at energies ΔE (the general background mode density is enough to support the relaxation toward a thermal state). But what happens if, in addition to the background vibronic mode density, additional vibronic modes are placed in the vicinity of the frequencies related to electronic coherences? This fine-tuning model is sometimes discussed in the literature as an alternative mechanism for long-lasting oscillations of vibronic nature. Again, the answer requires actually computing two-dimensional spectra and carefully analyzing the possible chain of laser-molecule interactions.
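As a worked number for the relation between an energy gap and the oscillation period: if ΔE is given as a wavenumber in 1/cm, the period h/ΔE equals 1/(c · wavenumber). The value of 100/cm used below is an arbitrary example and not a specific FMO energy gap.

```python
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s (exact by definition)

def period_fs(delta_e_wavenumber):
    """Oscillation period h/dE in femtoseconds for dE given in 1/cm."""
    return 1e15 / (C_CM_PER_S * delta_e_wavenumber)

print(round(period_fs(100.0), 1))   # -> 333.6
```

Energy gaps of the order of 100/cm thus correspond to oscillation periods of a few hundred femtoseconds.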
Due to the special way two-dimensional spectra are measured, the observed signal is a superposition of at least three pathways, which have different sensitivity for distinguishing electronic and vibronic coherences. Being theoretical physicists now pays off, since we have calculated and analyzed the three pathways separately (see our recent publication Disentangling Electronic and Vibronic Coherences in Two-Dimensional Echo Spectra or arXiv version). One of the pathways leads to an enhancement of vibronic signals, while the combination of the remaining two diminishes electronic coherences otherwise clearly visible within each of them. Our conclusion is that estimates of decoherence times from two-dimensional spectroscopy might actually underestimate the persistence of electronic coherences, which are helping the transport through the FMO network. The fine tuning and addition of specific vibrational modes leaves its marks at certain spots of the two-dimensional spectra, but does not destroy the electronic coherence, which is still there, as a Short Time Fourier Transform of the signal reveals.
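The idea behind a Short Time Fourier Transform can be illustrated with a toy signal: the time trace is cut into windows, each window is Fourier transformed, and the dominant frequency is followed from window to window. The sketch below uses only the Python standard library and a made-up signal (a decaying oscillation plus a weaker undamped one, in arbitrary time units). It is not the analysis code used for the published spectra.

```python
import cmath
import math

def stft_peak_bins(signal, window_len, hop):
    """Return the dominant frequency bin for each Hann-windowed segment."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / window_len)
            for n in range(window_len)]
    peaks = []
    for start in range(0, len(signal) - window_len + 1, hop):
        segment = [signal[start + n] * hann[n] for n in range(window_len)]
        # magnitude of the discrete Fourier transform, positive bins only
        spectrum = []
        for k in range(1, window_len // 2):
            amp = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window_len)
                      for n in range(window_len))
            spectrum.append(abs(amp))
        peaks.append(1 + spectrum.index(max(spectrum)))
    return peaks

# Toy signal: a decaying oscillation (frequency bin 8) on top of a weaker,
# undamped one (frequency bin 12). All numbers are made up for illustration.
N = 64
toy = [math.exp(-t / 60.0) * math.cos(2 * math.pi * 8 * t / N)
       + 0.3 * math.cos(2 * math.pi * 12 * t / N) for t in range(6 * N)]
peaks = stft_peak_bins(toy, window_len=N, hop=N)
print(peaks)
```

In the first window the decaying component dominates, while in the late windows only the undamped one survives. A plain Fourier transform of the whole trace would mix both contributions and hide this handover in time.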

Computational physics on GPUs: writing portable code

Runtime in seconds for our GPU-HEOM code on various hardware and software platforms.

I am preparing my presentation for the simGPU meeting next week in Freudenstadt, Germany, and performed some benchmarks.
In the previous post I described how to get an OpenCL program running on a smartphone with GPU. By now Christoph Kreisbeck and I are getting ready to release our first smartphone GPU app for exciton dynamics in photosynthetic complexes, more about that in a future entry.
Getting the same OpenCL kernel running on laptop GPUs, workstation GPUs and CPUs, and smartphones/tablets is a bit tricky, due to different initialisation procedures and the differences in the optimal block sizes for the thread grid. In addition on a smartphone the local memory is even smaller than on a desktop GPU and double-precision floating point support is missing. The situation reminds me a bit of the “earlier days” of GPU programming in 2008.
Besides being a proof of concept, I see writing portable code as a sort of insurance with respect to further changes of hardware (however always with the goal to stick with the massively parallel programming paradigm). I am also amazed how fast smartphones are gaining computational power through GPUs!
Same comparison for smaller memory consumption. Note the drop in OpenCL performance for the NVIDIA K20c GPU.

Here are some considerations and observations:

  1. Standard CUDA code can be ported to OpenCL within a reasonable time-frame. I found the following resources helpful:
    AMD’s porting remarks
    Matt Scarpino’s OpenCL blog
  2. The comparison of OpenCL vs CUDA performance for the same algorithm can reveal some surprises on NVIDIA GPUs. While on our C2050 GPU OpenCL works a bit faster for the same problem compared to the CUDA version, on a K20c system for certain problem sizes the OpenCL program can take several times longer than the CUDA code (no changes in the basic algorithm or workgroup sizes).
  3. The comparison with a CPU version running on 8 cores of the Intel Xeon machine is possible and shows clearly that the GPU code is always faster, but requires a certain minimal system size to show its full performance.
  4. I am looking forward to running the same code on the Intel Xeon Phi systems now available with OpenCL drivers, see also this blog.

[Update June 22, 2013: I updated the graphs to show the 8-core results using Intel’s latest OpenCL SDK. This brings the CPU runtimes down by a factor of 2! Meanwhile I am eagerly awaiting the possibility to run the same code on the Xeon Phis…]

Computational physics on the smartphone GPU

Screenshot of the interacting many-body simulation on the Nexus 4 GPU.

[Update August 2013: Google has removed the OpenCL library with Android 4.3. You can find an interesting discussion here. Google seems to push for its own renderscript protocol. I will not work with renderscript since my priorities are platform independence and sticking with widely adopted standards to avoid fragmentation of my code base.]
I recently got hold of a Nexus 4 smartphone, which features a GPU (Qualcomm Adreno 320) and conveniently ships with an OpenCL library already installed. With minimal changes I got the previously discussed many-body program code related to the fractional quantum Hall effect up and running. No rooting of the phone is required to run the code example. Please use the following recipe at your own risk; I don’t accept any liabilities. Here is what I did:

  1. Download and unpack the Android SDK from Google for cross-compilation (my host computer runs Mac OS X).
  2. Download and unpack the Android NDK from Google to build minimal C/C++ programs without Java (no real app).
  3. Install the standalone toolchain from the Android NDK. I used the following command for my installation:

    /home/tkramer/android-ndk-r8d/build/tools/ \
  4. Put the OpenCL programs and source code in an extra directory, as described in my previous post
  5. Change one line in the cl.hpp header: instead of including <GL/gl.h> change to <GLES/gl.h>. Note: I am using the “old” cl.hpp bindings 1.1, further changes might be required for the newer bindings, see for instance this helpful blog
  6. Transfer the OpenCL library from the phone to a subdirectory lib/ inside your source code. To do so append the path to your SDK tools and use the adb command:

    export PATH=/home/tkramer/adt-bundle-mac-x86_64-20130219/sdk/platform-tools:$PATH
    adb pull /system/lib/
  7. Cross compile your program. I used the following script, please feel free to provide shorter versions. Adjust the include directories and library directories for your installation.

    rm plasma_disk_gpu
    /home/tkramer/android-ndk-standalone/bin/arm-linux-androideabi-g++ -v -g \
    -I. \
    -I/home/tkramer/android-ndk-standalone/include/c++/4.6 \
    -I/home/tkramer/android-ndk-r8d/platforms/android-5/arch-arm/usr/include \
    -Llib \
    -march=armv7-a -mfloat-abi=softfp -mfpu=neon \
    -fpic -fsigned-char -fdata-sections -funwind-tables -fstack-protector \
    -ffunction-sections -fdiagnostics-show-option -fPIC \
    -fno-strict-aliasing -fno-omit-frame-pointer -fno-rtti \
    -lOpenCL \
    -o plasma_disk_gpu plasma_disk.cpp
  8. Copy the executable to the data dir of your phone to be able to run it. This can be done without rooting the phone with the nice SSHDroid App, which by default transfers to /data . Don’t forget to copy the kernel .cl files:

    scp -P 2222 root@192.168.0.NNN:
    scp -P 2222 plasma_disk_gpu root@192.168.0.NNN:
  9. ssh into your phone and run the GPU program:
    ssh -p 2222 root@192.168.0.NNN
    ./plasma_disk_gpu 64 16
  10. Check the resulting data files. You can copy them for example to the Download path of the storage and use the gnuplot (droidplot App) to plot them.

A short note about runtimes. On the Nexus 4 device the program runs for about 12 seconds, on a MacBook Pro with NVIDIA GT650M it completes in 2 seconds (in the example above the equations of motion for 16*64=1024 interacting particles are integrated). For larger particle numbers the phone often locks up.

An alternative way to transfer files to the device is to connect via USB cable and to install the Android Terminal Emulator app. Next, run

cd /data/data/jackpal.androidterm
mkdir gpu
chmod 777 gpu

On the host computer, use adb to transfer the compiled program and the .cl kernel:

adb push /data/data/jackpal.androidterm/gpu/
adb push plasma_disk_gpu /data/data/jackpal.androidterm/gpu/

You can either run the program within the terminal emulator or use the adb shell

adb shell
cd /data/data/jackpal.androidterm/gpu/
./plasma_disk_gpu 64 16

Let’s see in how many years today’s desktop GPUs can be found in smartphones and which computational physics codes can be run!

Computational physics & GPU programming: exciton lab for light-harvesting complexes (GPU-HEOM) goes live on

User interface of the GPU-HEOM tool for light-harvesting complexes at

Christoph Kreisbeck and I are happy to announce the public availability of the Exciton Dynamics Lab for Light-Harvesting Complexes (GPU-HEOM) hosted on nanoHub. You need to register a user account (it’s free), and then you are ready to use GPU-HEOM for the Frenkel exciton model of light harvesting complexes. In release 1.0 we support

  • calculating population dynamics 
  • tracking coherences between two eigenstates
  • obtaining absorption spectra
  • two-dimensional echo spectra (including excited state absorption)
  • … and all this for general vibronic spectral densities parametrized by shifted Lorentzians.
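To make the last item more concrete: one common way to write a spectral density built from shifted Lorentzian (Drude-Lorentz type) peaks is J(ω) = Σ_k [ ν_k λ_k ω / (ν_k² + (ω + Ω_k)²) + ν_k λ_k ω / (ν_k² + (ω − Ω_k)²) ]. This form is an assumption made for illustration; the exact convention, prefactors and units used by the tool are given in its supporting documentation. The parameter values below are hypothetical.

```python
def shifted_lorentzian_density(omega, peaks):
    """Spectral density J(omega) as a sum of shifted Drude-Lorentz peaks.

    peaks: list of (lam, nu, shift) = (coupling strength, width, shift),
    all in the same units as omega. Assumed functional form, see text.
    """
    total = 0.0
    for lam, nu, shift in peaks:
        total += nu * lam * omega / (nu ** 2 + (omega + shift) ** 2)
        total += nu * lam * omega / (nu ** 2 + (omega - shift) ** 2)
    return total

# Hypothetical parameters in 1/cm: a broad background plus one narrow peak
example = [(35.0, 100.0, 0.0), (5.0, 10.0, 180.0)]
for w in (50.0, 180.0, 300.0):
    print(w, round(shifted_lorentzian_density(w, example), 3))
```

For a vanishing shift the two terms coincide and the expression reduces to the familiar unshifted Drude-Lorentz shape 2 λ ν ω / (ν² + ω²).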

I will post some more entries here describing how to use the tool for understanding how the spectral density affects the lifetime of electronic coherences (see also this blog entry).
In the supporting document section you find details of the implemented method and the assumptions underlying the tool. We appreciate your feedback for further improving the tool.
We are grateful for the support of Prof. Gerhard Klimeck, Purdue University, director of the Network for Computational Nanotechnology, in bringing GPU computing to nanohub (I believe our tool is the first GPU-enabled one at nanohub).

If you want to refer to the tool you can cite it as:

Christoph Kreisbeck; Tobias Kramer (2013), “Exciton Dynamics Lab for Light-Harvesting Complexes (GPU-HEOM),” (DOI:10.4231/D3RB6W248).

and you find further references in the supporting documentation.

I very much encourage my colleagues developing computer programs for theoretical physics and chemistry to make them available on platforms such as nanoHub. In my view, it greatly facilitates the comparison of different approaches and is in the spirit of advancing science by sharing knowledge and providing reproducible data sets.

Good or bad vibrations for the Fenna-Matthews-Olson complex?

Electronic and vibronic coherences in the FMO complex using GPU HEOM by Kreisbeck and Kramer
Time-evolution of the coherence for the FMO complex (eigenstates 1 and 5) calculated with GPU-HEOM by Kreisbeck and Kramer, J. Phys. Chem. Lett. 3, 2828 (2012).

Due to its known structure and relative simplicity, the Fenna-Matthews-Olson complex of green sulfur bacteria provides an interesting test-case for our understanding of excitonic energy transfer in a light-harvesting complex.

The experimental pump-probe spectra (discussed in my previous post catching and tracking light: following the excitations in the Fenna-Matthews-Olson complex) show long-lasting oscillatory components. This finding has been a puzzle for theoreticians and has led to a refinement of the well-established models. These models show a reasonable agreement with the data, and the rate equations explain the relaxation and transfer of excitonic energy to the reaction center.

However, the rate equations are based on estimates for the relaxation and dephasing rates. As Christoph Kreisbeck and I discuss in our article Long-Lived Electronic Coherence in Dissipative Exciton-Dynamics of Light-Harvesting Complexes (arxiv version), an exact calculation with GPU-HEOM following the best data for the Hamiltonian allows one to determine where the simple approach is insufficient and to identify a key factor supporting electronic coherence:

Wendling spectral density for FMO complex
Important features in the spectral density of the FMO complex related to the persistence of cross-peak oscillations in 2d spectra.

It’s the vibronic spectral density, redrawn (in a different unit convention, multiplied by ω²) from the article by M. Wendling from the group of Prof. Rienk van Grondelle. We made a major effort to keep our calculations as close to the measured shape of the spectral density as the GPU-HEOM method allows. By comparing results for different forms of the spectral density, we identify how the different parts of the spectral density lead to distinct signatures in the oscillatory coherences. This is illustrated in the figure on the right-hand side. To get long-lasting oscillations and finally relaxation, three ingredients are important:

  1. a small slope towards zero frequency, which suppresses the pure dephasing.
  2. a high plateau in the region where the exciton energy differences are well coupled. This leads to relaxation.
  3. the peaked structures induce a “very-long-lasting” oscillatory component, which is shown in the first figure. In our analysis we find that this is a persistent, but rather small (<0.01) modulation.

2d spectra are smart objects

FMO spectrum calculated with GPU-HEOM
FMO spectrum calculated with GPU-HEOM for a 3 peak approximation of the measured spectral density, including disorder averaging but no excited state absorption.

The calculation of 2d echo spectra requires considerable computational resources. Since theoretically calculated 2d spectra are needed to check how well theory and experiment coincide, I conclude by showing a typical spectrum we obtain (including static disorder, but no excited state absorption for this example). One interesting finding is that 2d spectra are able to differentiate between the different spectral densities. For example, for a single-peak Drude-Lorentz spectral density (sometimes chosen for computational convenience), the wrong peaks oscillate and the lifetime of cross-peak oscillations is short (and becomes even shorter with longer vibronic memory). But this is for the experts only, see the supporting information of our article.

Are vibrations good or bad? Probably both… The pragmatic answer is that the FMO complex lives in an interesting parameter regime. The exact calculations within the Frenkel exciton model do confirm  the well-known dissipative energy transfer picture. But on the other hand the specific spectral density of the FMO complex supports long-lived coherences (at least if the light source is a laser beam), which require considerable theoretical and experimental efforts to be described and measured. Whether the seen coherence has any biological relevance is an entirely different topic… maybe the green-sulfur bacteria are just enjoying a glimpse into Schrödinger’s world of probabilistic uncertainty.

Computational physics & GPU programming: interacting many-body simulation with OpenCL

Trajectories in a two-dimensional interacting plasma simulation, reproducing the density and pair-distribution function of a Laughlin state relevant for the quantum Hall effect. Figure taken from Interacting electrons in a magnetic field: mapping quantum mechanics to a classical ersatz-system.

In the second example of my series on GPU programming for scientists, I discuss a short OpenCL program, which you can compile and run on the CPU and the GPUs of various vendors. This gives me the opportunity to perform some cross-platform benchmarks for a classical plasma simulation. You can expect dramatic (several-hundred-fold) speed-ups on GPUs for this type of system. This is one of the reasons why molecular dynamics codes can gain quite a lot by incorporating the massively parallel programming paradigm in their algorithmic foundations.

The Open Computing Language (OpenCL) is relatively similar to its CUDA counterpart. In practice, the setup of an OpenCL kernel requires some housekeeping work, which might make the code look a bit more involved. I have based my interacting electrons calculation of transport in the Hall effect on an OpenCL code. Another example is An OpenCL implementation for the solution of the time-dependent Schrödinger equation on GPUs and CPUs (arxiv version) by C. Ó Broin and L.A.A. Nikolopoulos.

Now to the coding of a two-dimensional plasma simulation, which is inspired by Laughlin’s mapping of a many-body wave function to an interacting classical ersatz dynamics (for some context see my short review Interacting electrons in a magnetic field: mapping quantum mechanics to a classical ersatz-system on the arxiv).
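Before turning to OpenCL, it may help to see the structure of such a simulation in a few lines of NumPy. The sketch below is not the OpenCL program of this post: it assumes a plain repulsive Coulomb-type pair force, no magnetic field, and a velocity-Verlet integrator, and all function names and parameters are hypothetical. It only illustrates why the all-pairs force evaluation, whose cost grows as N², is a natural target for massively parallel hardware.

```python
import numpy as np

def pair_forces(pos, strength=1.0, softening=1e-12):
    """All-pairs repulsive forces for N point charges in the plane.

    pos has shape (N, 2); the result has the same shape. Each row of the
    result depends only on the positions, so on a GPU one work-item per
    particle could evaluate its own row independently.
    """
    diff = pos[:, None, :] - pos[None, :, :]        # r_i - r_j, shape (N, N, 2)
    dist2 = np.sum(diff**2, axis=-1) + softening    # squared distances
    np.fill_diagonal(dist2, np.inf)                 # exclude self-interaction
    return strength * np.sum(diff / dist2[..., None]**1.5, axis=1)

def velocity_verlet(pos, vel, dt, steps, mass=1.0):
    """Advance positions and velocities by `steps` velocity-Verlet steps."""
    force = pair_forces(pos)
    for _ in range(steps):
        vel = vel + 0.5 * dt * force / mass
        pos = pos + dt * vel
        force = pair_forces(pos)
        vel = vel + 0.5 * dt * force / mass
    return pos, vel

# Hypothetical test setup: 64 particles at random positions, initially at rest.
rng = np.random.default_rng(0)
pos0 = rng.uniform(-1.0, 1.0, size=(64, 2))
vel0 = np.zeros_like(pos0)
pos1, vel1 = velocity_verlet(pos0, vel0, dt=1e-4, steps=10)
```

Since the pair forces are antisymmetric, the total force vanishes up to rounding errors, which is a useful sanity check when porting such a kernel to a new platform.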

Continue reading “Computational physics & GPU programming: interacting many-body simulation with OpenCL”

Computational physics & GPU programming: Solving the time-dependent Schrödinger equation

I start my series on the physics of GPU programming with a relatively simple example, which makes use of a mix of library calls and well-documented GPU kernels. The run-time of the split-step algorithm described here is about 280 seconds for the CPU version (Intel(R) Xeon(R) CPU E5420 @ 2.50GHz), vs. 10 seconds for the GPU version (NVIDIA(R) Tesla C1060 GPU), resulting in a 28-fold speed-up! On a C2070 the run time is less than 5 seconds, yielding an 80-fold speed-up.

autocorrelation function in a uniform force field
Autocorrelation function C(t) of a Gaussian wavepacket in a uniform force field. I compare the GPU and CPU results using the wavepacket code.

The description of coherent electron transport in quasi two-dimensional electron gases requires solving the Schrödinger equation in the presence of a potential landscape. As discussed in my post Time to find eigenvalues without diagonalization, our approach using wavepackets allows one to obtain the scattering matrix over a wide range of energies from a single wavepacket run without the need to diagonalize a matrix. In the following I discuss the basic example of propagating a wavepacket and obtaining the autocorrelation function, which in turn determines the spectrum. I programmed the GPU code in 2008 as a first test to evaluate the potential of GPGPU programming for my research. At that time double-precision floating-point support was lacking and the fast Fourier transform (FFT) implementations were little developed. Starting with CUDA 3.0, the program runs fine in double precision and my group used the algorithm for calculating electron flow through nanodevices. The CPU version was used for our articles in Physica Scripta, Wave packet approach to transport in mesoscopic systems, and in Physical Review B, Phase shifts and phase π-jumps in four-terminal waveguide Aharonov-Bohm interferometers, among others.
Here, I consider a very simple example, the propagation of a Gaussian wavepacket in a uniform potential V(x,y)=-Fx, for which the autocorrelation function of the initial state
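$$\langle x,y|\psi(t=0)\rangle = \frac{1}{a\sqrt{\pi}} \exp\left(-\frac{x^2+y^2}{2a^2}\right)$$
is known in analytic form:
$$\langle\psi(t=0)|\psi(t)\rangle = \frac{2a^2 m}{2a^2 m + i\hbar t} \exp\left(-\frac{a^2 F^2 t^2}{4\hbar^2} - \frac{i F^2 t^3}{24\hbar m}\right).$$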
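As a CPU-only illustration of this comparison, the following NumPy sketch propagates the Gaussian wavepacket with a Strang-split (split-step Fourier) scheme and compares the numerical autocorrelation function with the analytic expression. It is not the CUDA code discussed in this post. The dimensionless units (ħ = m = a = F = 1), the grid size, and the time step are assumptions chosen so that the packet stays well inside the periodic box.

```python
import numpy as np

# Dimensionless units (assumption): hbar = m = a = F = 1.
hbar, m, a, F = 1.0, 1.0, 1.0, 1.0

# Periodic grid; the packet must stay far from the edges, where -F*x jumps.
n, box = 128, 32.0
dx = box / n
x = (np.arange(n) - n // 2) * dx
X, Y = np.meshgrid(x, x, indexing="ij")
k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
KX, KY = np.meshgrid(k, k, indexing="ij")

# Initial Gaussian wavepacket (real-valued and normalized).
psi0 = np.exp(-(X**2 + Y**2) / (2.0 * a**2)) / (a * np.sqrt(np.pi))

# Strang splitting: half step in V, full step in T (via FFT), half step in V.
dt, steps = 0.01, 150
half_potential = np.exp(-1j * (-F * X) * dt / (2.0 * hbar))
kinetic = np.exp(-1j * hbar * (KX**2 + KY**2) * dt / (2.0 * m))

def analytic_autocorrelation(t):
    """Closed-form C(t) for the Gaussian packet in the potential V = -F x."""
    prefactor = 2.0 * a**2 * m / (2.0 * a**2 * m + 1j * hbar * t)
    exponent = (-(a * F * t)**2 / (4.0 * hbar**2)
                - 1j * F**2 * t**3 / (24.0 * hbar * m))
    return prefactor * np.exp(exponent)

times = dt * np.arange(steps + 1)
c_numeric = np.empty(steps + 1, dtype=complex)
psi = psi0.astype(complex)
c_numeric[0] = np.sum(psi0 * psi) * dx * dx   # psi0 is real, no conjugate needed
for step in range(1, steps + 1):
    psi = half_potential * np.fft.ifft2(kinetic * np.fft.fft2(half_potential * psi))
    c_numeric[step] = np.sum(psi0 * psi) * dx * dx

max_deviation = np.max(np.abs(c_numeric - analytic_autocorrelation(times)))
```

The two FFTs per time step dominate the run time, which is why a fast FFT library on the GPU matters so much for this algorithm.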
Continue reading “Computational physics & GPU programming: Solving the time-dependent Schrödinger equation”

The physics of GPU programming

GPU cluster
Me pointing at the GPU Resonance cluster at SEAS Harvard with 32x448=14336 processing cores. Just imagine how tightly integrated this setup is compared to 3584 quad-core computers. Picture courtesy of Academic Computing, SEAS Harvard.

From discussions I learn that while many physicists have heard of Graphics Processing Units as fast computers, resistance to using them is widespread. One of the reasons is that physics has been relying on computers for a long time and tons of old, well-trusted codes are lying around which are not easily ported to the GPU. Interestingly, the adoption of GPUs happens much faster in biology, medical imaging, and engineering.
I view GPU computing as a great opportunity to investigate new physics, and my feeling is that today’s methods optimized for serial processors may need to be replaced by a different set of standard methods which scale better with massively parallel processors. In 2008 I dived into GPU programming for a couple of reasons:

  1. As a “model-builder” the GPU allows me to reconsider previous limitations and simplifications of models and use the GPU power to solve the extended models.
  2. The turn-around time is incredibly fast. Compared to queues in conventional clusters where I wait for days or weeks, I get back results with 10000 CPU hours compute time the very same day. This in turn further facilitates the model-building process.
  3. Some people complain about the strict synchronization requirements when running GPU codes. In my view this is an advantage, since essentially no messaging overhead exists.
  4. If you want to develop high-performance algorithms, it is not good enough to convert library calls to GPU library calls. You might get speed-ups of about 2-4. However, if you invest the time and develop your own know-how you can expect much higher speed-ups of around 100 times or more, as seen in the applications I discussed in this blog before.

This summer I will lecture about GPU programming at several places and thus I plan to write a series of GPU related posts. I do have a complementary background in mathematical physics and special functions, which I find very useful in relation with GPU programming since new physical models require a stringent mathematical foundation and numerical studies.

Catching and tracking light: following the excitations in the Fenna-Matthews-Olson complex

Peak oscillations in the FMO complex calculated using GPU-HEOM
The animation shows how peaks in the 2d echo-spectra oscillate and change for various delay times. For a full explanation, see Modelling of Oscillations in Two-Dimensional Echo-Spectra of the Fenna-Matthews-Olson Complex by B. Hein, C. Kreisbeck, T. Kramer, M. Rodríguez, New J. of Phys., 14, 023018 (2012), open access.

Efficient and fast transport of electric current is a basic requirement for the functioning of nanodevices and biological systems. A neat example is the energy-transport of a light-induced excitation in the Fenna-Matthews-Olson complex of green sulfur bacteria. This process has been elucidated by pump-probe spectroscopy. The resulting spectra contain an enormous amount of information about the couplings of the different pigments and the pathways taken by the excitation. The basic guide to a 2d echo-spectrum is as follows:
You can find peaks of high intensity along the diagonal line which are roughly representing a more common absorption spectrum.  If you delay the pump and probe pulses by several picoseconds, you will find a new set of peaks at a horizontal axis which indicates that energy of the excitation gets redistributed and the system relaxes and transfers part of the energy to vibrational motion. This process is nicely visible in the spectra recorded by Brixner et al.
A lot of excitement and activity on photosynthetic complexes was triggered by experiments of Engel et al. showing that besides the relaxation process also periodic oscillations are visible in the peak amplitudes for more than a picosecond.

What is causing the oscillations in the peak amplitudes of 2d echo-spectra in the Fenna-Matthews Olson complex?

A purely classical transport picture should not show such oscillations and the excitation instead hops around the complex without interference. Could the observed oscillations point to a different transport mechanism, possibly related to the quantum-mechanical simultaneous superposition of several transport paths?

The initial answer from the theoretical side was no, since within simplified models the thermalization occurs fast and without oscillations. It turned out that the simple calculations are a bit too simplistic to describe the system accurately and exact solutions are required. But exact solutions (even for simple models) are difficult to obtain. Known exact methods such as DMRG work reliably only at very low temperatures (close to -273 °C) and are thus not directly applicable to biological systems. Other schemes use the famous path integrals but are too slow to calculate the pump-probe signals.

Our contribution to the field is to provide an exact computation of the 2d echo-spectra at the relevant temperatures and to see the difference to the simpler models in order to quantify how much coherence is preserved. On the method-development side, the computational challenge is to speed up the calculations several hundred times in order to get results within days of computational run-time. We did achieve this by developing a method which we call GPU-hierarchical equations of motion (GPU-HEOM). The hierarchical equations of motion are a nice scheme to propagate a density matrix while taking into account non-Markovian effects and strong couplings to the environment. The HEOM scheme was developed by Kubo, Tanimura, and Ishizaki (Prof. Tanimura has posted some material on HEOM here).

However, the original computational method suffers from the same problems as path-integral calculations and is rather slow (though the HEOM method can be made faster and applied to electronic systems by using smart filtering as done by Prof. YiJing Yan). The GPU part in GPU-HEOM stands for Graphics Processing Units. Using our GPU adaptation of the hierarchical equations (see details in Kreisbeck et al. [JCTC, 7, 2166 (2011)]) allowed us to cut down computational times dramatically and made it possible to perform a systematic study of the oscillations and the influence of temperature and disorder in our recent article Hein et al. [New J. of Phys., 14, 023018 (2012), open access].

The Nobel Prize 2011 in Chemistry: press releases, false balance, and lack of research in scientific writing

To get this clear from the beginning: with this posting I am not questioning the great achievement of Prof. Dan Shechtman, who discovered what are now known as quasicrystals in the lab. Shechtman clearly deserves the prize for such an important experiment demonstrating that five-fold symmetry exists in real materials.

My concern is the poor quality of research and reporting on the subject of quasicrystals, starting with the press release of the Swedish Academy of Science, and the lessons to be learned about trusting these press releases and the reporting in scientific magazines. To provide some background: with the announcement of the Nobel prize a press release is put online by the Swedish academy which not only announces the prize winner, but also contains two PDFs with background information: one for the “popular press” and another one for people with a more “scientific background”. Even more dangerously, the Swedish Academy has started a multimedia endeavor of pushing its views around the world in youtube channels and numerous multimedia interviews with its own members (what about asking an external expert for an interview?).

Before the internet age journalists got the names of the prize winners, but did not have immediate access to a “ready to print” explanation of the subject at hand. I remember that local journalists would call the universities and ask a professor who is familiar with the topic for advice or get at least the phone number of somebody familiar with it. Not any more. This year showed that the background information prepared in advance by the committee is taken over by the media outlets basically unchanged. So far it looks like business as usual. But what if the story as told by the press release is not correct? Does anybody still have time and resources for some basic fact checking, for example by calling people familiar with the topic, or by consulting the archives of their newspaper/magazine to dig out what was written when the discovery was made many years ago? Should we rely on the professor who writes the press releases and trust that this person adheres to scientific and ethical standards of writing?

For me, the unfiltered and unchecked usage of press releases by the media and even by scientific magazines shows a decay in the quality of scientific reporting. It also generates a uniformity and self-referencing universe, which enters as “sources” in online encyclopedias and in the end becomes a “self-generated” truth. However it is not that difficult to break this circle, for example by

  1. digging out review articles on the topic and looking up encyclopedias for the topic of quasicrystals, see for example: Pentagonal and Icosahedral Order in Rapidly Cooled Metals by David R. Nelson and Bertrand I. Halperin, Science 19 July 1985:233-238, where the authors write: “Independent of these experimental developments, mathematicians and some physicists had been exploring the consequences of the discovery by Penrose in 1974 of some remarkable, aperiodic, two-dimensional tilings with fivefold symmetry (7). Several authors suggested that these unusual tesselations of space might have some relevance to real materials (8, 9). MacKay (8) optically Fourier-transformed a two-dimensional Penrose pattern and found a tenfold symmetric diffraction pattern not unlike that shown for Al-Mn in Fig. 2. Three-dimensional generalizations of the Penrose patterns, based on the icosahedron, have been proposed (8-10). The generalization that appears to be most closely related to the experiments on Al-Mn was discovered by Kramer and Neri (11) and, independently, by Levine and Steinhardt (12).”
  2. identifying from step 1 experts and asking for their opinion
  3. checking the newspaper and magazine archives. Maybe there exists already a well researched article?
  4. correcting mistakes. After all mistakes do happen. Also in “press releases” by the Nobel committee, but there is always the option to send out a correction or to amend the published materials. See for example the letter in Science by David R. Nelson
    Icosahedral Crystals in Perspective, Science 13 July 1990:111 again on the history of quasicrystals:
    “[…] The three-dimensional generalization of the Penrose tiling most closely related to the experiments was discovered by Peter Kramer and R. Neri (3) independently of Steinhardt and Levine (4). The paper by Kramer and Neri was submitted for publication almost a year before the paper of Shechtman et al. These are not obscure references: […]”

Since I am working in theoretical physics, I find it important to point out that, in contrast to the story invented by the Nobel committee, the theoretical structure of quasicrystals was actually published and available in the relevant journal of crystallography at the time the experimental paper got published. This sequence of events is well documented as shown above and in other review articles and books.
I am just amazed how the press release of the Nobel committee creates an alternate universe with a false history of theoretical and experimental publication records. It gives false credit for the first theoretical work on three-dimensional quasicrystals and, at least in my view, does not adhere to scientific and ethical standards of scientific writing.

Prof. Sven Lidin, who is the author of the two press releases of the Swedish Academy, has been contacted as early as October 7 about his inaccurate and unbalanced account of the history of quasicrystals. In my view, a huge responsibility rests on the originator of the “story” which was released into the wild by Prof. Lidin, and I believe he and the committee members are aware of their power since they actively use all available electronic media channels to push their complete “press package” out. Until today no corrections or updates have been distributed. Rather you can watch on youtube the (false) story getting repeated over and over again. In my view this example shows science reporting in its worst incarnation and undermines the credibility and integrity of science.

Quasicrystals: anticipating the unexpected

The following guest entry is contributed by Peter Kramer

Dan Shechtman received the Nobel prize in Chemistry 2011 for the experimental discovery of quasicrystals. Congratulations! The press release stresses the unexpected nature of the discovery and the struggles of Dan Shechtman to convince the fellow experimentalists. To this end I want to contribute a personal perspective:

From the viewpoint of theoretical physics the existence of icosahedral quasicrystals as later discovered by Shechtman was not quite so unexpected. Beginning in 1981 with Acta Cryst A 38 (1982), pp. 257-264 and continued with Roberto Neri in Acta Cryst A 40 (1984), pp. 580-587 we worked out and published the building plan for icosahedral quasicrystals. Looking back, it is a strange and lucky coincidence that, unknown to me, during the same time Dan Shechtman and coworkers discovered icosahedral quasicrystals in their seminal experiments and brought the theoretical concept of three-dimensional non-periodic space-fillings to life.

More about the fascinating history of quasicrystals can be found in a short review: gateways towards quasicrystals and on my homepage.

Time to find eigenvalues without diagonalization

Solving the stationary Schrödinger equation (H-E)Ψ=0 can in principle be reduced to solving a matrix equation. This eigenvalue problem requires calculating matrix elements of the Hamiltonian with respect to a set of basis functions and diagonalizing the resulting matrix. In practice this time-consuming diagonalization step is replaced by a recursive method, which yields the eigenfunctions for a specific eigenvalue.

A very different approach is followed by wavepacket methods. It is possible to propagate a wavepacket without determining the eigenfunctions beforehand. For a given Hamiltonian, we solve the time-dependent Schrödinger equation (i ∂t-H) Ψ=0 for an almost arbitrary initial state Ψ(t=0)  (initial value problem).

The reformulation of the determination of eigenstates as an initial value problem has a couple of computational advantages:

  • results can be obtained for the whole range of energies represented by the wavepacket, whereas a recursive scheme yields only one eigenenergy
  • the wavepacket motion yields direct insight into the pathways and allows us to develop an intuitive understanding of the transport choreography of a quantum system
  • solving the time-dependent Schrödinger equation can be efficiently implemented using Graphics Processing Units (GPU), resulting in a large (> 20 fold) speedup compared to  CPU code
Aharonov-Bohm ring conductance oscillations
The Zebra stripe pattern along the horizontal axis shows Aharonov-Bohm oscillations in the conductance of a half-circular nanodevice due to the changing magnetic flux. The vertical axis denotes the Fermi energy, which can be tuned experimentally. For details see our paper in Physical Review B.

The determination of transmissions now requires calculating the Fourier transform of correlation functions ⟨Ψ(t=0)|Ψ(t)⟩. This method has been pioneered by Prof. Eric J. Heller, Harvard University, and I have written an introductory article for the Latin American School of Physics 2010 (arxiv version).
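The idea of reading off a spectrum from a correlation function can be sketched with a textbook case. The example below is an assumption made for illustration and is not one of the devices discussed here: for a coherent state |α⟩ in a harmonic oscillator the autocorrelation function is known in closed form, and its Fourier transform shows peaks at the eigenenergies ħω(n+1/2) with Poisson weights, without any matrix being diagonalized. The sampling window covers an integer number of oscillator periods, so no window function is needed; for a general signal one would typically apply one.

```python
import numpy as np

# Assumed model: harmonic oscillator with hbar = omega = 1 and a coherent
# state |alpha> as initial wavepacket (hypothetical parameters).
hbar, omega, alpha = 1.0, 1.0, 1.5

periods, n = 32, 4096
t_total = periods * 2.0 * np.pi / omega
dt = t_total / n
t = dt * np.arange(n)

# Closed-form autocorrelation function C(t) = <alpha| exp(-i H t / hbar) |alpha>.
c = np.exp(-0.5j * omega * t) * np.exp(-alpha**2 * (1.0 - np.exp(-1j * omega * t)))

# C(t) is a sum of terms exp(-i E_n t / hbar); the inverse DFT carries the
# opposite sign in its exponent, so each term shows up as a peak at E = E_n.
weights = np.abs(np.fft.ifft(c))
energies = hbar * 2.0 * np.pi * np.fft.fftfreq(n, d=dt)

# Energies of the three strongest peaks, sorted in ascending order.
strongest = np.sort(energies[np.argsort(weights)[-3:]])
```

A single time series thus yields all eigenenergies contained in the wavepacket at once, with an energy resolution of 2πħ divided by the total propagation time.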

Recently, Christoph Kreisbeck has carried out detailed calculations on the gate-voltage dependency of the conductance in Aharonov-Bohm nanodevices, taking full advantage of the simultaneous probing of a range of Fermi energies with one single wavepacket. A very clean experimental realization of the device was achieved by Sven Buchholz, Prof. Saskia Fischer, and Prof. Ulrich Kunze (RU Bochum), based on a semiconductor material grown by Dr. Dirk Reuter and Prof. Andreas Wieck (RU Bochum). The details, including a comparison of experimental and theoretical results shown in the left figure, are published in Physical Review B (arxiv version).