Parallel HDF5
Quincey Koziol, The HDF Group
[email protected]
CScADS'11, July 25, 2011
www.hdfgroup.org

What is HDF?
•  HDF stands for Hierarchical Data Format
•  A file format for managing any kind of data
•  Software system to manage data in the format
•  Designed for high volume or complex data
•  Designed for every size and type of system
•  Open format and software library, tools
•  There are two HDFs: HDF4 and HDF5
•  Today we focus on HDF5


Brief History of HDF

1987: At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library: AEHOO (All Encompassing Hierarchical Object Oriented format). Became HDF.

Early 1990s: NASA adopted HDF for Earth Observing System project.

1996: DOE's ASC (Advanced Simulation and Computing) Project began collaborating with the HDF group (NCSA) to create "Big HDF". (Increase in computing power of DOE systems at LLNL, LANL and Sandia National labs required bigger, more complex data files.) "Big HDF" became HDF5.

1998: HDF5 was released with support from National Labs, NASA, NCSA.

2006: The HDF Group spun off from University of Illinois as non-profit corporation.


The HDF Group
•  Established in 1988
    •  18 years at University of Illinois’ National Center for Supercomputing Applications
    •  5 years as independent non-profit company, “The HDF Group”
•  The HDF Group owns HDF4 and HDF5
    •  Basic HDF4 and HDF5 formats, libraries, and tools are open and free
•  Currently employ 36 FTEs


Goals of The HDF Group
•  Maintain and evolve HDF for sponsors and communities that depend on it
•  Provide support to the HDF communities through consulting, training, tuning, development, research
•  Sustain the company for the long term to assure data access over time


HDF5 Philosophy
A single platform with multiple uses
•  One general format
•  One library, with
    •  Options to adapt I/O and storage to data needs
    •  Layers on top and below
•  Ability to interact well with other technologies
•  Attention to past, present, future compatibility


HDF5 Data Model
•  Groups – provide structure among objects
•  Datasets – where the primary data goes
    •  Data arrays
    •  Rich set of datatype options
    •  Flexible, efficient storage and I/O
•  Attributes, for metadata

Everything else is built essentially from these parts.
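
To make the three building blocks concrete, here is a minimal in-memory sketch in Python. It is not the HDF5 API and stores nothing on disk. The class names `Group` and `Dataset`, the `resolve` helper, the path `/foo/temp`, and the `units` attribute are all hypothetical, chosen only to mirror the group/dataset/attribute vocabulary above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Dataset:
    """Toy stand-in for a dataset: a data array, a datatype, and attributes."""
    data: List[float]
    dtype: str
    attrs: Dict[str, str] = field(default_factory=dict)


@dataclass
class Group:
    """Toy stand-in for a group: named links to other groups or datasets."""
    members: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, str] = field(default_factory=dict)

    def resolve(self, path: str) -> Union["Group", Dataset]:
        """Walk a slash-separated path such as '/foo/temp' from this group."""
        node: Union[Group, Dataset] = self
        for name in filter(None, path.split("/")):
            if not isinstance(node, Group):
                raise KeyError(f"{name!r} lies below a dataset, not a group")
            node = node.members[name]
        return node


# Hypothetical layout: root group "/" holding a group "foo" with one dataset.
root = Group()
root.members["foo"] = Group(attrs={"description": "example group"})
root.members["foo"].members["temp"] = Dataset(
    data=[3.1, 4.2, 3.6], dtype="float64", attrs={"units": "degrees C"}
)

temp = root.resolve("/foo/temp")
print(temp.dtype, temp.attrs["units"], temp.data)
```

Groups carry only structure, datasets carry the arrays and their datatype, and attributes hang small metadata on either. That is the sense in which everything else can be built from these parts.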


Structures to organize objects

[Figure: groups ("/" (root) and "/foo") organize datasets in a tree. The datasets shown are a 3-D array, a 2-D array, a palette, two raster images, and a table.]

Table dataset shown in the figure:

lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6


Users of HDF5 Software

Layers, from top to bottom:
•  Tools & Applications: most data consumers are here. Scientific/engineering applications. Domain-specific libraries/API, tools.
•  HDF5 Application Programming Interface: applications and tools use this API to create, read, write, query, etc. Power users (consumers).
•  Virtual file layer (VFL): modules to adapt I/O to specific features of system, or do I/O in some special way.
•  File system, MPI-IO, SAN, other layers
•  HDF5 File: file could be on parallel system, in memory, collection of files, etc.


Layers – parallel example

I/O flows through many layers from application to disk.

•  Application
•  Parallel computing system (Linux cluster) with multiple compute nodes
•  I/O library (HDF5)
•  Parallel I/O library (MPI-I/O)
•  Parallel file system (GPFS)
•  Switch network/I/O servers
•  Disk architecture & layout of data on disk


Parallel HDF5 Now
•  New DOE Funding:
    •  “ExaHDF5” – Project w/LBNL & PNL to enhance HDF5 and aim for exascale platforms
    •  “Scalable HDF5” – Contract w/LLNL to enhance HDF5 and explore high-performance, non-MPI-I/O solutions
    •  “Damsel” – Project w/ANL, NWU & ORNL to design and implement a next generation file format and I/O middleware package


ExaHDF5 Tasks
•  Remove “collective” restriction for metadata modifications (if possible)*
    •  Including supporting compressed datasets
•  Add metadata and raw data indexing to HDF5
•  Add support for asynchronous parallel I/O
•  Design and implement file system autotuning mechanism
•  Support “ordered updates” in parallel


Single-Writer/Multiple-Reader Access
•  Situation: A long-running process is modifying an HDF5 file and simultaneously other processes want to inspect data in the file.
•  Solution: Single-Writer/Multiple-Reader (SWMR) File Access, using “ordered updates”
    •  Allows simultaneous reading of HDF5 file while the file is being modified by another process
    •  No inter-process coordination necessary
•  Bonus! Crash-proofs file also!
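
The slide does not say how ordered updates work internally. One plausible reading is that the writer always writes new data before it updates the metadata that points to that data, so a reader never follows a pointer into unfinished content. The toy sketch below illustrates that ordering on an in-memory buffer. It is an assumption-laden illustration, not HDF5's implementation or API, and `ToyFile`, `HEADER`, and `RECORD` are hypothetical names.

```python
import struct

HEADER = struct.Struct("<Q")  # header: number of committed records
RECORD = struct.Struct("<d")  # one little-endian float64 per record


class ToyFile:
    """In-memory byte buffer standing in for a shared file (illustration only)."""

    def __init__(self) -> None:
        self.buf = bytearray(HEADER.pack(0))

    def append(self, value: float) -> None:
        """Writer side: write the data first, then publish it via the header."""
        count = HEADER.unpack_from(self.buf, 0)[0]
        offset = HEADER.size + count * RECORD.size
        # Step 1: write the record just past the committed region
        # (this also overwrites any leftover bytes from an interrupted write).
        self.buf[offset:offset + RECORD.size] = RECORD.pack(value)
        # Step 2: only now bump the count, making the record visible to readers.
        HEADER.pack_into(self.buf, 0, count + 1)

    def read_all(self) -> list:
        """Reader side: trust only what the header says is committed."""
        count = HEADER.unpack_from(self.buf, 0)[0]
        return [
            RECORD.unpack_from(self.buf, HEADER.size + i * RECORD.size)[0]
            for i in range(count)
        ]


f = ToyFile()
f.append(3.1)
f.append(4.2)
# Simulate a writer that crashed after step 1 but before step 2.
f.buf += RECORD.pack(9.9)
print(f.read_all())  # prints [3.1, 4.2]
```

Because the header is updated last, a reader sees the file either before or after each append, never in between. The same ordering is why an interrupted write leaves the committed part of the file readable, which matches the "crash-proof" bonus on the slide.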


Scalable HDF5 Tasks
•  Explore and implement alternate scalable I/O approaches:
    •  “Poor man’s parallel I/O” (PMPIO) (from LLNL)
    •  “Reduced-Blocking I/O” (rbIO) (from ANL)
•  Design new Virtual File Drivers tuned for “modern” parallel file systems
•  Metadata aggregation & alignment in file
•  Advanced page buffering within library
•  Deferred/staged/segregated object creation
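
The slide only names PMPIO. As it is commonly described, the approach splits the ranks into groups, gives each group its own file, and lets the ranks in a group write one at a time by passing a "baton". The sketch below computes only such a rank-to-file plan. The function name `pmpio_plan` and the round-robin grouping are assumptions for illustration, not part of any library.

```python
def pmpio_plan(num_ranks: int, num_files: int) -> dict:
    """Assign each rank to one file; ranks sharing a file write in list order."""
    if not 1 <= num_files <= num_ranks:
        raise ValueError("need 1 <= num_files <= num_ranks")
    plan = {file_index: [] for file_index in range(num_files)}
    for rank in range(num_ranks):
        # Round-robin grouping is an assumption; any fixed partition would do.
        plan[rank % num_files].append(rank)
    return plan


plan = pmpio_plan(num_ranks=8, num_files=3)
for file_index, ranks in plan.items():
    print(f"file {file_index}: baton order {ranks}")
```

With this scheme, at most `num_files` ranks write at any moment, and each writer does ordinary serial I/O to its own file, so no MPI-I/O support is needed.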


Other Planned HDF5 Tasks
•  Design and implement “Virtual Object Layer” within HDF5
    •  Allows creation of plugins operating at a higher level of abstraction than the Virtual File Layer
    •  HDF5 data model, without using HDF5 file
    •  Can we merge HDF5 with [parallel] file system?
•  Expand HDF5 data model
    •  Support “shared” dataspaces
    •  Attributes on datatypes and dataspaces (allows units on datatypes, etc.)
•  “Append-only” library and file format optimizations


Parallel HDF5 Challenges
•  We are implementing a file system on top of MPI-I/O!
    •  Not enough support in MPI for necessary locking operations, etc.
    •  Difficult to create production-quality software in a portable and cost-effective way
•  Need more funding
•  Support and reach out to HPC application development teams
•  Keep up with research efforts: ADIOS, pnetCDF, etc.
