Thu 16 Jun 2016 17:00 - 17:30 at Grand Ballroom San Rafael - Parallelism I. Chair(s): Tony Hosking

Prefix sums are an important parallel primitive, especially in massively-parallel programs. This paper discusses two orthogonal generalizations of prefix sums, which we call higher-order and tuple-based prefix sums. It also describes and evaluates SAM, a GPU-friendly algorithm for computing prefix sums and other scans that directly supports higher orders and tuple values. Its templated CUDA implementation unifies all of these computations in a single 100-statement kernel. SAM is communication-efficient in the sense that it minimizes main-memory accesses. When computing prefix sums of a million or more values, it outperforms Thrust and CUDPP on both a Titan X and a K40 GPU. On the Titan X, SAM reaches memory-copy speed for large input sizes, a throughput that cannot be surpassed. SAM outperforms CUB, the currently fastest conventional prefix-sum implementation, by up to a factor of 2.9 on eighth-order prefix sums and by up to a factor of 2.6 on eight-tuple prefix sums.
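To make the two generalizations concrete, here is a minimal sequential Python sketch. It is not SAM's CUDA kernel, and it rests on assumed definitions: an order-k prefix sum applies the ordinary inclusive prefix sum k times, and a k-tuple prefix sum treats a flat array as interleaved k-tuples and scans each tuple component independently. The function name `generalized_scan` and its parameters are hypothetical. Because the two parameters are independent, the sketch also reflects the claim that the generalizations are orthogonal and can be combined.

```python
import operator


def generalized_scan(values, order=1, tuple_size=1, op=operator.add):
    """Sequential reference for higher-order and tuple-based inclusive scans.

    Assumed definitions (illustrative, not taken from SAM's code):
      - order k: apply the inclusive scan k times in a row.
      - tuple_size k: the input is a flat array of interleaved k-tuples, and each
        component is scanned on its own, i.e. out[i] = op(out[i - k], out[i]).
    `op` should be associative (e.g. addition or max) for a well-defined scan.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    if tuple_size < 1:
        raise ValueError("tuple_size must be at least 1")
    out = list(values)
    for _ in range(order):
        # One inclusive scan pass with stride tuple_size (stride 1 = ordinary scan).
        for i in range(tuple_size, len(out)):
            out[i] = op(out[i - tuple_size], out[i])
    return out


if __name__ == "__main__":
    print(generalized_scan([1, 1, 1, 1]))                        # [1, 2, 3, 4]
    print(generalized_scan([1, 1, 1, 1], order=2))               # [1, 3, 6, 10]
    print(generalized_scan([1, 10, 2, 20, 3, 30], tuple_size=2)) # [1, 10, 3, 30, 6, 60]
```

A real GPU implementation would avoid making one full pass over memory per order, since reducing main-memory traffic is the point the abstract stresses. The loop-per-order form above is only meant to pin down what the result should be.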

Thu 16 Jun
Times are displayed in time zone: (GMT-07:00) Tijuana, Baja California

17:00 - 18:00: Parallelism I (Research Papers) at Grand Ballroom San Rafael
Chair(s): Tony Hosking, Australian National University, Data61, and Purdue University
17:00 - 17:30
Research Papers
Sepideh Maleki, Texas State University; Annie Yang, Texas State University; Martin Burtscher, Texas State University
17:30 - 18:00
Research Papers
Junghyun Kim, Seoul National University; Gangwon Jo, Seoul National University; Jaehoon Jung, Seoul National University; Jungwon Kim, Oak Ridge National Laboratory; Jaejin Lee, Seoul National University