In case you have never heard of it, GPGPU computing is the use of graphics processing units (typically gaming graphics cards) as accelerators for general-purpose computations, for example in the simulation of physical systems. Well, I’m deep in the middle of that mess.
I’m a co-founding member of the GPGPU@CAB Computing Group and a former member of GPGPU@FaMAF. I’ve participated as a speaker and organizer in the First Argentinean School on GPGPU Computing for Scientific Applications 2011 (FaMAF-UNC) and as part of the teaching team of the course Introduction to Numerical Calculus on Graphics Processing Units 2012 (IB-UNCUYO, Bariloche).
A short personal view on GPGPU, with stolen phrases
In recent years we have seen heterogeneous architectures appear to mitigate the technical barriers that emerged in the development of faster processors. The gain in GFlops in modern computers comes from their ability to process applications in parallel, not from increases in processor clock frequency. In other words, computers do not get faster anymore, they get wider. As a result, programs must be coded to run in parallel; otherwise, they won’t take advantage of the available hardware. In some cases improvements in compilers help us get that parallelization almost for free, but in general applications have to be re-thought to fit the new architectures. Scientific-computing software is no exception to that reality. Different parallelization frameworks can and should be used (MPI, OpenMP, TBB, CUDA, OpenCL, etc.). In this sense, the appearance of CUDA and OpenCL for developing programs that run on a heterogeneous CPU+GPU platform, with a parallelization approach based on data parallelism rather than on the number of available cores, has been a great novelty for modelling physics, since many of our problems fit well into such an approach. Nowadays, with a reasonable programming learning curve and an appropriate GPU attached to our desktop machine, we can speed up our codes considerably. This, taken to small clusters of CPU+GPU machines, opens the possibility of attacking new problems and obtaining more and better-quality data in a shorter period of time. That’s the challenge.
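To make the data-parallelism point concrete, here is a minimal CUDA sketch (a hypothetical example, not taken from any of my codes): one thread is launched per data element, so the amount of exposed parallelism grows with the size of the problem rather than with the number of physical cores.

#include <cstdio>
#include <cuda_runtime.h>

/* One thread per element: the parallelism follows the data layout,
   not the number of physical cores on the device. */
__global__ void scale_add(float *y, const float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    /* launch enough blocks to cover all n elements */
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_add<<<blocks, threads>>>(y, x, 3.0f, n);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);   /* expect 5.0 */
    cudaFree(x);
    cudaFree(y);
    return 0;
}

Compiled with nvcc and run on any CUDA-capable card this should print y[0] = 5.0; the same pattern, one thread per lattice site or per degree of freedom, is what the projects listed further down exploit.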
How does a physicist end up dedicating time to learning this kind of thing?
It’s all about necessity. A computational physicist like me dedicates a large portion of his time to programming and dealing with computational issues rather than to “pure” physics. And when I say “large” I mean at least 50% of my working time. Having a performant and flexible approach to parallel implementations is a key point. The drawback is that one needs to learn to program at least decently, and that’s not necessarily a virtue of a physicist. Basically, our academic programs are still not giving us the computational tools we may need later on. On the path to the world of numerical simulations one needs to learn many things ad hoc. For example, the first time I “learned” to write a program in Fortran 90 was for my graduation thesis; without programming a Fortran code that ran properly and gave the physically expected results I would never have obtained my MSc degree. I used Fortran for most of my PhD work in single-core applications, but at some point that was not enough to face some computationally demanding open problems. So, together with my advisor and with the help of CS colleagues, we decided to jump to C to program in CUDA and learn about GPGPU computing. Now I can say that I use C, C++ and CUDA fairly comfortably in my applications, but the whole way down has been pure trial and error, and I know I am still missing some basic concepts. A general background in programming should be included in physics education programs; it is not only computational physicists who need it.
What can we do to facilitate the access of physicists (and scientists in general) to GPGPU and/or HPC?
Easy: collaborate! And, if appropriate, teach special courses. We are not professional programmers, but a handful of good practices and some basic programming tools are essential to avoid systematic mistakes and wasted time. When dealing with GPGPU, learning in a group is much easier than learning alone. Interdisciplinary meeting groups, mailing lists and forums have all the answers we need. Another useful practice is to make our codes open and publicly available. The common habit among computational physicists of keeping codes under lock and key is nonsense. But that’s a separate discussion…
My GPGPU Activity
Here is a list of the GPGPU programming projects I’ve been involved in so far. Some of them are publicly available on my Bitbucket page. Forks and pull requests are welcome!
0) Ferromagnetic 2D q-state Potts model
Parallelization strategy: chequerboard approach (see the sketch below this entry)
Language: CUDA C
Libraries & Tools: MWC RNG
Published?: yes, https://bitbucket.org/ezeferrero/potts
Collaborators: J.P. De Francesco, N. Wolovick, S. Cannas
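For readers unfamiliar with the chequerboard decomposition: with nearest-neighbour interactions, sites of the same colour never interact with each other, so an entire sublattice can be updated simultaneously, one thread per site. The following is a minimal, hypothetical CUDA sketch of such a Metropolis sweep for the ferromagnetic q-state Potts model; it is not the code in the repository (which uses an MWC generator and further optimizations), and it uses cuRAND’s Philox generator just to keep the example self-contained.

#include <cuda_runtime.h>
#include <curand_kernel.h>

#define LSIZE 256   /* linear lattice size */
#define Q 8         /* number of Potts states */

/* Metropolis update of every site of one chequerboard colour (0 or 1).
   Same-colour sites are never nearest neighbours, so all of them can be
   updated in parallel without race conditions. */
__global__ void update_colour(unsigned char *spin,
                              curandStatePhilox4_32_10_t *rng,
                              float beta, int colour)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;   /* one thread per site of this colour */
    if (id >= LSIZE * LSIZE / 2) return;

    int row = id / (LSIZE / 2);
    int col = 2 * (id % (LSIZE / 2)) + ((row + colour) & 1);

    int up    = spin[((row + LSIZE - 1) % LSIZE) * LSIZE + col];
    int down  = spin[((row + 1) % LSIZE) * LSIZE + col];
    int left  = spin[row * LSIZE + (col + LSIZE - 1) % LSIZE];
    int right = spin[row * LSIZE + (col + 1) % LSIZE];

    int s_old = spin[row * LSIZE + col];
    int s_new = curand(&rng[id]) % Q;                 /* proposed new state */

    /* ferromagnetic Potts energy: -1 per aligned nearest neighbour */
    int e_old = -((up == s_old) + (down == s_old) + (left == s_old) + (right == s_old));
    int e_new = -((up == s_new) + (down == s_new) + (left == s_new) + (right == s_new));

    float dE = (float)(e_new - e_old);
    if (dE <= 0.0f || curand_uniform(&rng[id]) < expf(-beta * dE))
        spin[row * LSIZE + col] = (unsigned char)s_new;
}

__global__ void init_rng(curandStatePhilox4_32_10_t *rng, unsigned long long seed)
{
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    if (id < LSIZE * LSIZE / 2) curand_init(seed, id, 0, &rng[id]);
}

int main()
{
    unsigned char *spin;
    curandStatePhilox4_32_10_t *rng;
    cudaMalloc(&spin, LSIZE * LSIZE);
    cudaMemset(spin, 0, LSIZE * LSIZE);                /* ordered initial condition */
    cudaMalloc(&rng, (LSIZE * LSIZE / 2) * sizeof(*rng));

    int threads = 256, blocks = (LSIZE * LSIZE / 2 + threads - 1) / threads;
    init_rng<<<blocks, threads>>>(rng, 1234ULL);

    float beta = 1.0f;                                 /* inverse temperature */
    for (int sweep = 0; sweep < 1000; ++sweep) {       /* one sweep = both colours */
        update_colour<<<blocks, threads>>>(spin, rng, beta, 0);
        update_colour<<<blocks, threads>>>(spin, rng, beta, 1);
    }
    cudaDeviceSynchronize();                           /* measurements omitted */
    cudaFree(spin); cudaFree(rng);
    return 0;
}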
1) 2D q-state Potts glass
Parallelization strategy: chequerboard approach
Language: CUDA C
Libraries & Tools: MWC RNG
Published?: yes, https://bitbucket.org/ezeferrero/potts-glass
Collaborators: S. Bustingorry, P. Gleiser, F. Roma
2) Quenched Edwards-Wilkinson elastic line
Parallelization strategy: embarrassingly parallel (see the Thrust sketch below this entry)
Language: CUDA C, C++
Libraries & Tools: Thrust, cuFFT, MWC RNG, PHILOX RNG
Published?: yes, https://bitbucket.org/ezeferrero/qew
Collaborators: S. Bustingorry, A. Kolton
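“Embarrassingly parallel” here means that each degree of freedom (or each independent realization) is advanced by its own thread without communicating with any other. With Thrust that pattern maps naturally onto a transform over a device_vector, as in this hypothetical sketch with a toy force term (it is not the dynamics of the actual model):

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <cstdio>

/* Independent overdamped Euler step applied to every element:
   no element reads any other, which is what makes the update
   embarrassingly parallel. */
struct euler_step
{
    float dt;
    euler_step(float dt_) : dt(dt_) {}

    __host__ __device__
    float operator()(float u) const
    {
        float force = -u;              /* toy restoring force, placeholder for the real model */
        return u + dt * force;
    }
};

int main()
{
    const int n = 1 << 20;             /* one thread per degree of freedom */
    thrust::device_vector<float> u(n, 1.0f);

    for (int step = 0; step < 100; ++step)
        thrust::transform(u.begin(), u.end(), u.begin(), euler_step(0.01f));

    float u0 = u[0];
    std::printf("u[0] after 100 steps: %f\n", u0);
    return 0;
}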
3) Disordered scalar Phi4 model with local interactions
Parallelization strategy: embarrassingly parallel
Language: CUDA C, C++
Libraries & Tools: Thrust, cuFFT, MWC RNG, PHILOX RNG
Published?: Yes, https://bitbucket.org/ezeferrero/phi4/
Collaborators: S. Bustingorry, A. Kolton
4) 2D Phi4 model with dipolar interactions (long range)
Parallelization strategy: pseudo-spectral method + embarrassingly parallel (see the cuFFT sketch below this entry)
Language: CUDA C, C++
Libraries & Tools: Thrust, cuFFT, MWC RNG, PHILOX RNG
Published?: Yes, https://bitbucket.org/ezeferrero/phi4/
Collaborators: S. Bustingorry, A. Kolton
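The pseudo-spectral trick used for the long-range interactions deserves a sketch: a convolution that is expensive in real space becomes a simple mode-by-mode multiplication in Fourier space. Below is a minimal, hypothetical cuFFT example with a placeholder kernel A(q); the actual simulation would use the proper dipolar kernel and a time-stepping scheme, which are omitted here.

#include <cufft.h>
#include <cuda_runtime.h>

#define NX 256
#define NY 256

/* Multiply each Fourier mode by a (placeholder) interaction kernel A(q).
   Long-range interactions become local in Fourier space, which is the
   essence of the pseudo-spectral method. */
__global__ void apply_kernel(cufftComplex *phi_q)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= NX * NY) return;

    int kx = i % NX;  if (kx > NX / 2) kx -= NX;
    int ky = i / NX;  if (ky > NY / 2) ky -= NY;
    float q = sqrtf((float)(kx * kx + ky * ky));

    float A = -q;                      /* placeholder kernel, stands in for the dipolar one */
    phi_q[i].x *= A / (NX * NY);       /* also undo cuFFT's unnormalized transform pair */
    phi_q[i].y *= A / (NX * NY);
}

int main()
{
    cufftComplex *phi;
    cudaMalloc(&phi, NX * NY * sizeof(cufftComplex));
    cudaMemset(phi, 0, NX * NY * sizeof(cufftComplex));    /* the field would be loaded here */

    cufftHandle plan;
    cufftPlan2d(&plan, NX, NY, CUFFT_C2C);

    cufftExecC2C(plan, phi, phi, CUFFT_FORWARD);            /* phi(r) -> phi(q)    */
    int threads = 256, blocks = (NX * NY + threads - 1) / threads;
    apply_kernel<<<blocks, threads>>>(phi);                 /* multiply by A(q)    */
    cufftExecC2C(plan, phi, phi, CUFFT_INVERSE);            /* back to real space  */
    cudaDeviceSynchronize();

    cufftDestroy(plan);
    cudaFree(phi);
    return 0;
}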
5) 2D Coulomb Glass (long range interactions)
Parallelization strategy: parallel rejection KMC
Language: CUDA C, C++
Libraries & Tools: Thrust, PHILOX RNG
Published?: Yes, https://bitbucket.org/ezeferrero/coulomb_glass
Collaborators: A. Kolton, M. Palassini
6) 1D driven polymer in a 2D disordered medium (on a lattice)
Parallelization strategy: partially parallel search strategy; ideal for dynamic parallelism (not implemented)
Language: CUDA C, C++
Libraries & Tools: Thrust, cuFFT, PHILOX RNG
Published?: Should be soon
Collaborators: A. Rosso, A. Kolton
7) 2D Mesoplastic model
Parallelization strategy: pseudo-spectral method + embarrassingly parallel + 1-thread KMC
Language: CUDA C, C++
Libraries & Tools: Thrust, cuFFT, PHILOX RNG
Published?: Yes, https://bitbucket.org/ezeferrero/epm/
Collaborators: K. Martens, J-L Barrat