Gunrock: A High-Performance Graph Processing Library on the GPU
Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, John D. Owens
University of California, Davis
Why is Gunrock fast? How does Gunrock express graph algorithms?
Overview
Gunrock is a stable, powerful, forward-looking, open-source substrate for GPU-based graph-centric research and development. Gunrock offers:
### import libraries
from ctypes import *
gunrock = cdll.LoadLibrary('../../build/lib/libgunrock.so')

### output array
scores = pointer((c_float * nodes)())

### call gunrock function on device
gunrock.bc(scores, nodes, edges, row, col, 0)

### sample results
print('node bc scores:', end=' ')
for idx in range(nodes):
    print(scores[0][idx], end=' ')

// Compute delta values
while (!forward_queue_offsets.empty()) {
    BCEnactor::gunrock::oprtr::advance<BCProblem, BackwardEnactor>();
}
BFSProblem::Extract(); // Get result
(a) Compute BC in Python. (b) Develop BC using Gunrock. Figure: Code snapshot of working with Gunrock and using Gunrock.
What is Gunrock’s Data-centric Programming Model?
BFS: Advance (update label value), then Filter (remove redundant).
SSSP: Advance (update label value), then Filter (remove redundant), then Near/Far Pile.
BC: Advance (accumulate sigma value), then Filter (remove redundant), then Advance (compute BC value).
CC: Filter (for e=(v1,v2), assign c[v1] to c[v2]; remove e when c[v1]==c[v2]), then Filter (for v, assign c[v] to c[c[v]]; remove v when c[v]==c[c[v]]).
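The two CC filter steps can be illustrated with a small sequential sketch. It is not Gunrock's GPU code: the function name `connected_components`, the edge-list input, and the choice to hook the larger component id onto the smaller one are assumptions made for the example, since the poster does not state the direction of the assignment.

```python
def connected_components(num_vertices, edges):
    """Label every vertex with the smallest vertex id in its component."""
    c = list(range(num_vertices))      # c[v] is the component id of v
    edge_frontier = list(edges)

    while edge_frontier:
        # Filter 1 (hooking): for e = (v1, v2), merge the two component ids.
        # Assumption: the larger id is always hooked onto the smaller one.
        for v1, v2 in edge_frontier:
            hi, lo = max(c[v1], c[v2]), min(c[v1], c[v2])
            if hi != lo:
                c[hi] = lo

        # Filter 2 (pointer jumping): for v, replace c[v] by c[c[v]];
        # drop v from the vertex frontier once c[v] == c[c[v]].
        vertex_frontier = list(range(num_vertices))
        while vertex_frontier:
            still_active = []
            for v in vertex_frontier:
                if c[v] != c[c[v]]:
                    c[v] = c[c[v]]
                    still_active.append(v)
            vertex_frontier = still_active

        # Remove every edge whose endpoints now share a component id.
        edge_frontier = [(a, b) for a, b in edge_frontier if c[a] != c[b]]

    return c


labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
```

The loop ends when the edge frontier is empty, which is the same termination condition the frontier-based description implies: no edge connects two different component ids any more.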
Figure: Speedup chart (log-scale axis with ticks 0.1, 1, 10, 100, 1000; labeled values 7435.21 and 4155.72) for BFS, SSSP, BC, PageRank, and CC on the soc, bitc, kron, and roadnet datasets. Series: Speedup-BGL, Speedup-Ligra, Speedup-MapGraph, Speedup-Cusha, Speedup-hardwired GPU.
Block cooperative Advance of large neighbor lists; Warp cooperative Advance of medium neighbor lists; Per-thread Advance of small neighbor lists.
Contact Information
• Gunrock Website: http://gunrock.github.io/
• Author’s Email: [email protected]
Funding Agencies
DARPA XDATA W911QX-12-C-0059, STTR D14PC00023; NSF OCI-1032859, CCF-1017399.
Future Work
- Scale to multiple GPUs/nodes;
- Asynchronous model;
- Out-of-core and streaming support;
- Expand core operators and new primitives;
- In-depth performance characterization.
PR: Advance (distribute PR value to neighbors), then Filter (update PR value; remove when PR value converges).
Figure: Several graph primitives in Gunrock.
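The PR primitive pairs an advance that distributes each vertex's PR value to its neighbors with a filter that updates the value and removes converged vertices. The sketch below shows that pairing sequentially. The function name `pagerank`, the damping factor 0.85, the tolerance, and the iteration cap are assumptions for illustration. As a simplification, every vertex keeps distributing its value in each iteration, and the frontier only tracks which vertices have not yet converged.

```python
def pagerank(row_offsets, col_indices, damping=0.85, tol=1e-6, max_iters=100):
    """Sequential sketch of PageRank on a CSR graph without dangling vertices."""
    n = len(row_offsets) - 1
    rank = [1.0 / n] * n
    frontier = list(range(n))          # vertices whose value has not converged

    for _ in range(max_iters):
        if not frontier:
            break

        # Advance: every vertex distributes its PR value to its neighbors.
        incoming = [0.0] * n
        for v in range(n):
            degree = row_offsets[v + 1] - row_offsets[v]
            if degree == 0:
                continue
            share = rank[v] / degree
            for e in range(row_offsets[v], row_offsets[v + 1]):
                incoming[col_indices[e]] += share

        # Filter: update each PR value and keep only unconverged vertices.
        new_rank = [(1.0 - damping) / n + damping * s for s in incoming]
        frontier = [v for v in range(n) if abs(new_rank[v] - rank[v]) > tol]
        rank = new_rank

    return rank


# Directed 3-cycle 0 -> 1 -> 2 -> 0: every vertex ends with the same value.
cycle_rank = pagerank([0, 1, 2, 3], [1, 2, 0])
```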
(a) Load-balanced traversal. (b) Dynamic-grouped traversal. Figure: Two core load-balancing strategies in Gunrock.
Primitives in Gunrock:
- traversal-based: breadth-first search, single-source shortest path;
- node-ranking: HITS, SALSA, PageRank, betweenness centrality;
- global: connected components, minimum spanning tree.
A frontier is a compact queue of nodes or edges. Gunrock’s three operators (below) manipulate frontiers.
- advance: generate a new frontier from the edges or vertices of the current frontier;
- filter: generate a new frontier from a current frontier using a user-specified predicate;
- compute: run a user-specified computation in parallel on each element in the current frontier.
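A sequential Python sketch can make the three frontier operators concrete. It only illustrates the programming model and is not Gunrock's API: the names `advance`, `filter_frontier`, `compute`, and `bfs`, and the CSR argument names, are assumptions made for the example. The BFS driver accepts a vertex more than once per level on purpose, to mimic the duplicates that concurrent GPU threads can produce and that the filter step then removes.

```python
def advance(row_offsets, col_indices, frontier, functor):
    """Visit each outgoing edge of the frontier; keep accepted destinations."""
    out = []
    for src in frontier:
        for e in range(row_offsets[src], row_offsets[src + 1]):
            dst = col_indices[e]
            if functor(src, dst):
                out.append(dst)
    return out


def filter_frontier(frontier, predicate):
    """Keep only the frontier elements that satisfy the predicate."""
    return [v for v in frontier if predicate(v)]


def compute(frontier, op):
    """Apply a user-specified operation to every frontier element."""
    for v in frontier:
        op(v)


def bfs(row_offsets, col_indices, source):
    """Breadth-first search built from advance and filter; returns depths."""
    labels = [-1] * (len(row_offsets) - 1)
    labels[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1

        def update_label(src, dst):
            # Accept unvisited vertices, and also vertices already labeled at
            # this depth, which stands in for a benign GPU race.
            if labels[dst] in (-1, depth):
                labels[dst] = depth
                return True
            return False

        candidates = advance(row_offsets, col_indices, frontier, update_label)

        seen = set()

        def first_visit(v):
            # Remove redundant copies of the same vertex.
            if v in seen:
                return False
            seen.add(v)
            return True

        frontier = filter_frontier(candidates, first_visit)
    return labels


# Edges: 0->1, 0->2, 1->3, 2->3, 3->4; vertex 5 is isolated.
demo_offsets = [0, 2, 3, 4, 5, 5, 5]
demo_columns = [1, 2, 3, 3, 4]
depths = bfs(demo_offsets, demo_columns, 0)
```

Unreachable vertices keep the label -1 in this sketch.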
### read in input CSR arrays from files
row_list = [int(x.strip()) for x in open('path/to/rowoffsets/r_file')]
col_list = [int(x.strip()) for x in open('path/to/columnindices/c_file')]

### convert CSR graph inputs for gunrock input
row = pointer((c_int * len(row_list))(*row_list))
col = pointer((c_int * len(col_list))(*col_list))
nodes = len(row_list) - 1
edges = len(col_list)
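The Python snippet assumes the row-offset and column-index arrays already exist in files. As a hedged illustration of what those arrays contain, the sketch below builds them from a directed edge list and wraps them in ctypes arrays the same way. The helper name `edges_to_csr` and the sample graph are assumptions for the example. No Gunrock call is made, so it runs without the shared library.

```python
from ctypes import c_int, pointer


def edges_to_csr(num_vertices, edge_list):
    """Return (row_offsets, col_indices) for a directed edge list."""
    counts = [0] * num_vertices
    for src, _ in edge_list:
        counts[src] += 1

    # row_offsets[v] is the index of the first outgoing edge of vertex v.
    row_offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_offsets[v + 1] = row_offsets[v] + counts[v]

    # Place each destination in the next free slot of its source's range.
    col_indices = [0] * len(edge_list)
    cursor = row_offsets[:-1]          # slicing makes an independent copy
    for src, dst in edge_list:
        col_indices[cursor[src]] = dst
        cursor[src] += 1
    return row_offsets, col_indices


row_list, col_list = edges_to_csr(4, [(0, 1), (0, 2), (1, 2), (2, 3)])

# Same conversion as in the poster: pointers to fixed-size C int arrays.
row = pointer((c_int * len(row_list))(*row_list))
col = pointer((c_int * len(col_list))(*col_list))
nodes = len(row_list) - 1
edges = len(col_list)
```

Indexing a ctypes pointer with `[0]` dereferences it, so `row[0][i]` reads element `i` of the underlying array, which is the same pattern the poster uses for `scores[0][idx]`.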
- the best performance on GPU graph analytics;
- a high-level abstraction for graph algorithms on the GPU; and
- the widest range of primitives.
BCProblem::Init(); // Initialization
// Accumulate sigma values
while (frontier_queue_length > 0) {
    forward_queue_offsets.push(new_offsets);
    // Get neighbors and update scores
    BCEnactor::gunrock::oprtr::advance<BCProblem, ForwardEnactor>();
    // Generate new vertex frontier
    BCEnactor::gunrock::oprtr::filter<BCProblem, ForwardEnactor>();
}
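The forward phase (accumulate sigma values) and backward phase (compute delta values) in the C++ snapshot appear to follow the two passes of Brandes' betweenness centrality algorithm. The sequential sketch below shows those two passes for a single source on an unweighted CSR graph. The function name `betweenness_from_source` is an assumption, and the `levels` list stands in for the saved per-level frontiers that the snapshot pushes onto `forward_queue_offsets`.

```python
def betweenness_from_source(row_offsets, col_indices, source):
    """Dependency scores contributed by one source (unweighted Brandes)."""
    n = len(row_offsets) - 1
    depth = [-1] * n
    sigma = [0] * n                    # number of shortest paths from source
    depth[source] = 0
    sigma[source] = 1
    levels = [[source]]                # one saved frontier per BFS level

    # Forward phase: accumulate sigma values level by level.
    while levels[-1]:
        next_level = []
        for v in levels[-1]:
            for e in range(row_offsets[v], row_offsets[v + 1]):
                w = col_indices[e]
                if depth[w] == -1:
                    depth[w] = depth[v] + 1
                    next_level.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
        levels.append(next_level)

    # Backward phase: compute delta values from the deepest level upward.
    delta = [0.0] * n
    scores = [0.0] * n
    while levels:
        for v in levels.pop():
            for e in range(row_offsets[v], row_offsets[v + 1]):
                w = col_indices[e]
                if depth[w] == depth[v] + 1:
                    delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if v != source:
                scores[v] = delta[v]
    return scores


# Directed diamond: 0->1, 0->2, 1->3, 2->3.
diamond_scores = betweenness_from_source([0, 2, 3, 4, 4], [1, 2, 3, 3], 0)
```

Summing the returned scores over every source gives the full betweenness centrality. For an undirected graph stored with both edge directions, each pair is counted twice.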
Powerful load-balancing capabilities that effectively address the inherent irregularity in graphs:
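The dynamic-grouped strategy assigns block-cooperative, warp-cooperative, or per-thread Advance according to neighbor-list size. A minimal sketch of that classification follows. The function name `group_by_neighbor_list_size` and the thresholds (32 threads per warp, 256 threads per block) are assumptions for illustration, as the poster does not give the actual cut-offs.

```python
def group_by_neighbor_list_size(row_offsets, frontier,
                                warp_size=32, block_size=256):
    """Bin frontier vertices by how many threads should expand each one."""
    groups = {"thread": [], "warp": [], "block": []}
    for v in frontier:
        degree = row_offsets[v + 1] - row_offsets[v]
        if degree > block_size:
            groups["block"].append(v)      # large list: whole block cooperates
        elif degree > warp_size:
            groups["warp"].append(v)       # medium list: one warp cooperates
        else:
            groups["thread"].append(v)     # small list: a single thread
    return groups


# Four vertices with 3, 40, 300, and 0 neighbors respectively.
demo_row_offsets = [0, 3, 43, 343, 343]
bins = group_by_neighbor_list_size(demo_row_offsets, [0, 1, 2, 3])
```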