The physics of GPU programming

GPU cluster
Me pointing at the GPU Resonance cluster at SEAS Harvard with 32x448=14336 processing cores. Just imagine how tightly integrated this setup is compared to 3584 quad-core computers. Picture courtesy of Academic Computing, SEAS Harvard.

From discussions I learn that while many physicists have heard of Graphics Processing Units as fast computers, resistance to use them is widespread. One of the reasons is that physics has been relying on computers for a long time and tons of old, well trusted codes are lying around which are not easily ported to the GPU. Interestingly, the adoption of GPUs happens much faster in biology, medical imaging, and engineering.
I view GPU computing as a great opportunity to investigate new physics and my feeling is that todays methods optimized for serial processors may need to be replaced by a different set of standard methods which scale better with massively parallel processors. In 2008 I dived into GPU programming for a couple of reasons:

  1. As a “model-builder” the GPU allows me to reconsider previous limitations and simplifications of models and use the GPU power to solve the extended models.
  2. The turn-around time is incredibly fast. Compared to queues in conventional clusters where I wait for days or weeks, I get back results with 10000 CPU hours compute time the very same day. This in turn further facilitates the model-building process.
  3. Some people complain about the strict synchronization requirements when running GPU codes. In my view this is an advantage, since essentially no messaging overhead exists.
  4. If you want to develop high-performance algorithm, it is not good enough to convert library calls to GPU library calls. You might get speed-ups of about 2-4. However, if you invest the time and develop your own know-how you can expect much higher speed-ups of around 100 times or more, as seen in the applications I discussed in this blog before.

This summer I will lecture about GPU programming at several places and thus I plan to write a series of GPU related posts. I do have a complementary background in mathematical physics and special functions, which I find very useful in relation with GPU programming since new physical models require a stringent mathematical foundation and numerical studies.

2 thoughts on “The physics of GPU programming

  1. Nice cluster! What is the general configuration of the GPU Resonance cluster? i.e. are you using Tesla cards and if so which model? What CPU are you using? What is the interconnect network – QDR Infiniband or something else?

    1. Hi,
      the system I am pointing to is administered by the Harvard SEAS Academic Computing division. Some info provided by SEAS about the setup consisting of 16 nodes with 192 CPU cores and 32 GPUs:

      Each node contains:
      2xC2070 nVidia GPU card (448 cores/GPU)
      Intel Xeon X5650 2.66 GHz 2×6 core
      48 GB RAM (4 GB/core average)

      The connections:
      QDR (40 Gb/s) cluster interconnect
      1GigE (1 Gb/s) network interconnect

      Cheers, Tobias

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.