Gunrock: A High-Performance Graph Processing Library on the GPU

Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, John D. Owens

University of California, Davis

Why is Gunrock fast? How does Gunrock express graph algorithms?

Overview

Gunrock is a stable, powerful, forward-looking, open-source substrate for GPU-based graph-centric research and development. Gunrock offers:

    ### import libraries
    from ctypes import *
    gunrock = cdll.LoadLibrary('../../build/lib/libgunrock.so')

    // Compute delta values
    while (!forward_queue_offsets.empty()) {
        BCEnactor::gunrock::oprtr::advance<BCProblem, BackwardEnactor>();
    }

    ### output array
    scores = pointer((c_float * nodes)())
    ### call gunrock function on device
    gunrock.bc(scores, nodes, edges, row, col, 0)
    ### sample results
    print('node bc scores:', end=' ')
    for idx in range(nodes):
        print(scores[0][idx], end=' ')

    BCProblem::Extract(); // Get result

(a) Compute BC in Python. (b) Develop BC using Gunrock.
Figure: Code snapshots of using Gunrock from Python and developing with Gunrock.

What is Gunrock’s Data-centric Programming Model?

BFS: Advance (update label value); Filter (remove redundant).
SSSP: Advance (update label value); Filter (remove redundant); Near/Far Pile.
BC: Advance (accumulate sigma value); Filter (remove redundant); Advance (compute BC value).
CC: Filter (for e=(v1,v2), assign c[v1] to c[v2]; remove e when c[v1]==c[v2]); Filter (for v, assign c[v] to c[c[v]]; remove v when c[v]==c[c[v]]).
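The two CC filter steps can be pictured with a small sequential sketch in plain Python. This is only an illustration under stated assumptions, not Gunrock code: the function name `connected_components` is hypothetical, and the edge filter here always points the larger label at the smaller one (an assumption made so the sequential loop is guaranteed to terminate), whereas the poster only says "assign c[v1] to c[v2]".

```python
def connected_components(num_vertices, edges):
    """Label every vertex with the smallest vertex id in its component."""
    c = list(range(num_vertices))       # c[v]: current component label of v
    edge_frontier = list(edges)

    while edge_frontier:
        # Edge filter (hooking): remove e when c[v1] == c[v2]; otherwise
        # point the larger label at the smaller one (assumed tie-break).
        kept = []
        for v1, v2 in edge_frontier:
            a, b = c[v1], c[v2]
            if a == b:
                continue
            c[max(a, b)] = min(a, b)
            kept.append((v1, v2))
        edge_frontier = kept

        # Vertex filter (pointer jumping): assign c[c[v]] to c[v] and
        # remove v from the frontier once c[v] == c[c[v]].
        vertex_frontier = list(range(num_vertices))
        while vertex_frontier:
            remaining = []
            for v in vertex_frontier:
                c[v] = c[c[v]]
                if c[v] != c[c[v]]:
                    remaining.append(v)
            vertex_frontier = remaining
    return c


labels = connected_components(6, [(0, 1), (1, 2), (3, 4)])
print(labels)
```

Both loops run until their frontier is empty, which matches the idea that a filter shrinks the frontier until no work is left.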

[Figure: speedup chart on a log-scale axis (0.1 to 1000; data labels 4155.72 and 7435.21). Series: Speedup-BGL, Speedup-Ligra, Speedup-MapGraph, Speedup-Cusha, Speedup-hardwired GPU. Primitives: BFS, SSSP, BC, PageRank, CC. Datasets: soc, bitc, kron, roadnet.]

- Block cooperative Advance of large neighbor lists;
- Warp cooperative Advance of medium neighbor lists;
- Per-thread Advance of small neighbor lists.

Contact Information
• Gunrock Website: http://gunrock.github.io/
• Author's Email: [email protected]

Funding Agencies
DARPA XDATA W911QX-12-C-0059, STTR D14PC00023; NSF OCI-1032859, CCF-1017399.

PR: Advance (distribute PR value to neighbors); Filter (update PR value; remove when PR value converges).

Figure: Several graph primitives in Gunrock.

Future Work
- Scale to multiple GPUs/nodes;
- Asynchronous model;
- Out-of-core and streaming support;
- Expand core operators and new primitives;
- In-depth performance characterization.

(a) Load-balanced traversal. (b) Dynamic-grouped traversal.
Figure: Two core load-balancing strategies in Gunrock.

Primitives in Gunrock:
- traversal-based: breadth-first search, single-source shortest path;
- node-ranking: HITS, SALSA, PageRank, betweenness centrality;
- global: connected components, minimum spanning tree.

A frontier is a compact queue of nodes or edges. Gunrock's three operators manipulate frontiers:
- advance: generate a new frontier from the edges or vertices of the current frontier;
- filter: generate a new frontier from the current frontier using a user-specified predicate;
- compute: run a user-specified computation in parallel on each element in the current frontier.
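As a rough illustration of how the three operators compose, the sketch below expresses BFS as repeated advance and filter steps over a frontier. It is sequential plain Python, not the Gunrock API: the names `advance`, `filter_frontier`, `compute`, and `bfs` are hypothetical, and the graph is assumed to be an adjacency list indexed by vertex id.

```python
def advance(graph, frontier, functor):
    """Visit each outgoing edge of the frontier; emit accepted destinations."""
    return [dst for src in frontier for dst in graph[src] if functor(src, dst)]


def filter_frontier(frontier, predicate):
    """Keep only the frontier elements that satisfy the predicate."""
    return [v for v in frontier if predicate(v)]


def compute(frontier, functor):
    """Apply a user-specified computation to every frontier element."""
    for v in frontier:
        functor(v)


def bfs(graph, source):
    depth = [0] * len(graph)

    def reset(v):
        depth[v] = -1

    compute(range(len(graph)), reset)   # compute: mark all vertices unvisited
    depth[source] = 0
    frontier, level = [source], 0
    while frontier:
        level += 1

        def update_label(src, dst):
            # Accept unvisited vertices, and vertices another edge reached in
            # this same step (mimics duplicates from a parallel expansion).
            if depth[dst] in (-1, level):
                depth[dst] = level
                return True
            return False

        seen = set()

        def first_occurrence(v):
            if v in seen:
                return False
            seen.add(v)
            return True

        frontier = advance(graph, frontier, update_label)        # update label value
        frontier = filter_frontier(frontier, first_occurrence)   # remove redundant
    return depth


print(bfs([[1, 2], [3], [3], []], 0))
```

The advance step deliberately emits a vertex once per incoming edge, so the filter step has real duplicates to remove, mirroring the "Advance, then Filter" pairing listed for BFS.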

    ### read in input CSR arrays from files
    row_list = [int(x.strip()) for x in open('path/to/rowoffsets/r_file')]
    col_list = [int(x.strip()) for x in open('path/to/columnindices/c_file')]
    ### convert CSR graph inputs for gunrock input
    row = pointer((c_int * len(row_list))(*row_list))
    col = pointer((c_int * len(col_list))(*col_list))
    nodes = len(row_list) - 1
    edges = len(col_list)
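The listing reads a graph in CSR form: a row-offsets array with one entry per node plus one, and a column-indices array with one entry per edge, which is why `nodes = len(row_list) - 1` and `edges = len(col_list)`. The sketch below shows one way such arrays could be built from an edge list. The helper name `edges_to_csr` is hypothetical and is not part of Gunrock.

```python
def edges_to_csr(num_nodes, edge_list):
    """Return (row_offsets, col_indices) for a directed edge list."""
    degree = [0] * num_nodes
    for src, _ in edge_list:
        degree[src] += 1

    # row_offsets[v] is where the neighbor list of v starts in col_indices.
    row_offsets = [0] * (num_nodes + 1)
    for v in range(num_nodes):
        row_offsets[v + 1] = row_offsets[v] + degree[v]

    col_indices = [0] * len(edge_list)
    cursor = row_offsets[:-1]            # next free slot per node (a copy)
    for src, dst in edge_list:
        col_indices[cursor[src]] = dst
        cursor[src] += 1
    return row_offsets, col_indices


row_list, col_list = edges_to_csr(3, [(0, 1), (0, 2), (1, 2), (2, 0)])
print(row_list, col_list)
```

The neighbors of node `v` are then `col_list[row_list[v]:row_list[v + 1]]`.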

- the best performance on GPU graph analytics;
- a high-level abstraction for graph algorithms on the GPU; and
- the widest range of primitives.

    BCProblem::Init(); // Initialization
    // Accumulate sigma values
    while (frontier_queue_length > 0) {
        forward_queue_offsets.push(new_offsets);
        // Get neighbors and update scores
        BCEnactor::gunrock::oprtr::advance<BCProblem, ForwardEnactor>();
        // Generate new vertex frontier
        BCEnactor::gunrock::oprtr::filter<BCProblem, ForwardEnactor>();
    }
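To make the two phases of the BC listing concrete, here is a sequential sketch in plain Python of a frontier-based Brandes-style computation from a single source: a forward pass that accumulates sigma values while saving each frontier, and a backward pass that computes delta values over the saved frontiers in reverse. It is an illustration under assumptions, not Gunrock's implementation; the function name `bc_from_source` and the adjacency-list input are hypothetical.

```python
def bc_from_source(graph, source):
    """Dependency (delta) of every vertex on shortest paths from source."""
    n = len(graph)
    sigma = [0.0] * n        # number of shortest paths from source
    depth = [-1] * n
    delta = [0.0] * n
    sigma[source] = 1.0
    depth[source] = 0

    # Forward phase: accumulate sigma values, remembering each frontier.
    levels = []
    frontier = [source]
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for v in frontier:
            for w in graph[v]:
                if depth[w] == -1:               # first discovery of w
                    depth[w] = depth[v] + 1
                    next_frontier.append(w)
                if depth[w] == depth[v] + 1:     # v precedes w on a shortest path
                    sigma[w] += sigma[v]
        frontier = next_frontier

    # Backward phase: compute delta values from the deepest frontier upward.
    while levels:
        for v in levels.pop():
            for w in graph[v]:
                if depth[w] == depth[v] + 1:
                    delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[source] = 0.0      # the source itself is not counted
    return delta


print(bc_from_source([[1], [0, 2], [1]], 0))
```

Summing these per-source values over all sources gives the usual betweenness centrality score.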

Gunrock's powerful load-balancing capabilities effectively address the inherent irregularity in graphs.
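The poster's grouping of Advance work into block-cooperative, warp-cooperative, and per-thread handling by neighbor-list size could be sketched as follows. The thresholds are assumptions chosen for illustration (a warp of 32 threads and a block of 256 threads), and `group_by_neighbor_list_size` is a hypothetical helper, not a Gunrock function.

```python
WARP_SIZE = 32      # assumed cutoff between small and medium neighbor lists
BLOCK_SIZE = 256    # assumed cutoff between medium and large neighbor lists


def group_by_neighbor_list_size(row_offsets, frontier):
    """Bucket frontier vertices by the length of their CSR neighbor list."""
    groups = {"per_thread": [], "warp": [], "block": []}
    for v in frontier:
        degree = row_offsets[v + 1] - row_offsets[v]
        if degree > BLOCK_SIZE:
            groups["block"].append(v)        # large list: whole block cooperates
        elif degree > WARP_SIZE:
            groups["warp"].append(v)         # medium list: one warp cooperates
        else:
            groups["per_thread"].append(v)   # small list: a single thread
    return groups


# Vertices 0, 1, 2 have 3, 100, and 1000 neighbors respectively.
print(group_by_neighbor_list_size([0, 3, 103, 1103], [0, 1, 2]))
```

Grouping this way keeps a single high-degree vertex from stalling one thread while the rest of its warp sits idle.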
