ExM: system support for extreme-scale, many-task applications

Ian Foster (PI), Ewing Lusk (PI), Ketan Maheshwari, Todd Munson, Michael Wilde (Lead PI), Argonne
Tim Armstrong, Daniel S. Katz (PI), Justin Wozniak, Zhao Zhang, University of Chicago
Sameer Al-Kiswany, Matei Ripeanu (PI), Emalayan Vairavanathan, University of British Columbia

Problem: identify & scale up many-task applications
Exascale computers will enable and demand new problem-solving methods that involve many concurrent, interacting tasks. Methodologies such as rational design, uncertainty quantification, parameter estimation, and inverse modeling all have this “many-task” property, and all will frequently have aggregate computing needs that require exascale computers. For example, proposed next-generation climate model ensemble studies involve 1,000 or more runs, each requiring 10K cores for a week, to characterize model sensitivity to initial-condition and parameter uncertainty. Running many-task applications efficiently, reliably, and easily on extreme-scale computers is challenging. System software designed for today’s mainstream single program multiple data (SPMD) computations is not necessarily a good match to the demands of many-task applications.
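To make the scale of the climate example concrete, the short Python calculation below multiplies out only the figures quoted above: 1,000 runs, 10K cores per run, and one week per run (taken as 168 hours). It is a back-of-the-envelope estimate that ignores queueing, failures, and I/O.

```python
# Back-of-the-envelope sizing of the climate ensemble example.
RUNS = 1_000             # "1,000 or more runs"
CORES_PER_RUN = 10_000   # "10K cores"
HOURS_PER_RUN = 7 * 24   # "for a week", i.e. 168 hours


def ensemble_core_hours(runs: int, cores: int, hours: int) -> int:
    """Total core-hours consumed by an ensemble of identical runs."""
    return runs * cores * hours


total = ensemble_core_hours(RUNS, CORES_PER_RUN, HOURS_PER_RUN)
peak_cores = RUNS * CORES_PER_RUN  # cores needed if every member ran at once

if __name__ == "__main__":
    print(f"{total:.3e} core-hours")          # 1.680e+09 core-hours
    print(f"{peak_cores:,} concurrent cores")  # 10,000,000 concurrent cores
```

That is roughly 1.7 billion core-hours for a single study, or ten million cores if all members ran concurrently.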

Goals
Conduct CS research to achieve the technical advances required to execute many-task applications efficiently, reliably, and easily on petascale and exascale facilities. Create middleware that enables new problem-solving methods and application classes on these extreme-scale systems.

Impact
The ExM project will produce advances in computer science and usable middleware that enable the efficient and reliable use of exascale computers for new classes of applications. The project will both accelerate access to exascale computers by important existing applications and facilitate the broader use of large-scale parallel computing by new application communities for which it is currently out of reach. The project will also train students and postdocs in the development and use of innovative approaches to extreme-scale computing.

Approach
To address these demands, the ExM project will design, develop, apply, and evaluate two new system software components. The ExM data store will allow concurrent and asynchronous application tasks to communicate efficiently and reliably, both with each other and with persistent storage, by reading and writing data objects maintained in node-local storage, including memory, SSD, and local disk. The ExM parallel evaluator will perform rapid, data-aware, and efficient dispatch of billions of small tasks to exascale computing systems, along with fault-tolerant execution of those tasks. These components will be efficiently integrated with current and future extreme-scale system software and made available via parallel scripting languages and APIs.

ExM Architecture
[Figure: ExM architecture. Labeled components: many-task application; ultra-fast task distribution; task graph executor and graph executors on compute nodes; virtual data store; global persistent storage.]
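The Approach describes two ideas that are easy to picture in code: a data store through which tasks communicate by writing and reading data objects held in node-local memory, SSD, or disk, and an evaluator that dispatches tasks in a data-aware way. The Python sketch below illustrates both under assumptions of our own: the `NodeStore` and `VirtualDataStore` classes, write-once objects, fastest-tier-first placement, and "most local input bytes" node selection are all hypothetical and are not the ExM, MosaStore, or AME interfaces.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical tier order, fastest first; the poster names memory, SSD, and local disk.
TIERS = ("memory", "ssd", "disk")


@dataclass
class NodeStore:
    """Hypothetical per-node store: write-once objects placed in the fastest tier with room."""
    node: str
    capacity: Dict[str, int]  # free bytes per tier
    objects: Dict[str, Tuple[str, bytes]] = field(default_factory=dict)  # key -> (tier, data)

    def put(self, key: str, data: bytes) -> str:
        if key in self.objects:
            raise KeyError(f"{key!r} already written")  # write-once semantics (assumed)
        for tier in TIERS:
            if self.capacity.get(tier, 0) >= len(data):
                self.capacity[tier] -= len(data)
                self.objects[key] = (tier, data)
                return tier
        raise MemoryError(f"no local tier on {self.node} can hold {key!r}")

    def get(self, key: str) -> bytes:
        return self.objects[key][1]


class VirtualDataStore:
    """Global namespace layered over node-local stores (illustrative only)."""

    def __init__(self, stores: List[NodeStore]) -> None:
        self.stores = {s.node: s for s in stores}
        self.location: Dict[str, str] = {}  # object key -> node that holds it

    def write(self, node: str, key: str, data: bytes) -> str:
        tier = self.stores[node].put(key, data)
        self.location[key] = node
        return tier

    def read(self, key: str) -> bytes:
        return self.stores[self.location[key]].get(key)

    def pick_node(self, inputs: List[str]) -> str:
        """Data-aware dispatch: choose the node that already holds the most input bytes."""
        local_bytes = {name: 0 for name in self.stores}
        for key in inputs:
            local_bytes[self.location[key]] += len(self.read(key))
        return max(local_bytes, key=lambda name: local_bytes[name])


# Usage: two nodes with a tiny (8-byte) memory tier, to make placement visible.
store = VirtualDataStore([
    NodeStore("n0", {"memory": 8, "ssd": 64, "disk": 1024}),
    NodeStore("n1", {"memory": 8, "ssd": 64, "disk": 1024}),
])
tier_a = store.write("n0", "a", b"0123456789")  # 10 bytes: exceeds memory, placed on SSD
tier_b = store.write("n1", "b", b"xyz")         # 3 bytes: fits in memory
target = store.pick_node(["a", "b"])            # n0 holds 10 of the 13 input bytes
```

Sending a task to the node that already holds most of its inputs trades some load balance for less data movement; a production scheduler would weigh both.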

Accomplishments
The Jets prototype confirms the feasibility of the many-parallel-task (MPI) programming model. The Turbine prototype, using ADLB, showed encouraging scalability and suggests that the exascale goals are feasible. Scaling and data exchange of the AME anyscale many-task engine and store were measured on BG/P up to the 16K-core level. MosaStore on Blue Gene/P and other clusters is creating a model of the virtual data store. ExM tools evaluated on 3 science applications (earthquake simulation, image processing, protein/RNA interaction). 4 publications (at www.mcs.anl.gov/exm).

Next milestones (Oct 2011 – Sep 2012)
Extend the ExM task manager (Turbine) and its intermediate representation to run Swift and PySwift programs on BG/P and Cray XE. Model fault recovery. Integrate the MosaStore and AME data stores into Turbine to provide support for scalable collective data management. Explore HDF5 or NetCDF integration. Evaluate the performance and usability of this integration on DOE and INCITE applications: ParVis climate model analysis; SCEC earthquake simulation; SWAT biofuels land-use impact; power grid modeling; protein structure and interaction prediction; subsurface impact modeling.
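The milestones call for modeling fault recovery, and the Jets prototype was tested for fault tolerance. One simple recovery policy for independent tasks is bounded resubmission: re-run a failed task in the next round until it succeeds or exhausts a retry budget. The Python sketch below assumes that policy, a budget of three attempts, and a simulated failure model; `run_with_retries`, `flaky`, and `TaskFailed` are hypothetical names, and this is not the Jets or Turbine recovery mechanism.

```python
from typing import Callable, Dict, List


class TaskFailed(Exception):
    """Raised by a task whose node or process failed (simulated here)."""


def run_with_retries(tasks: Dict[str, Callable[[int], int]],
                     max_attempts: int = 3) -> Dict[str, object]:
    """Run independent tasks, resubmitting failures, up to max_attempts per task.

    Each task receives its 1-based attempt number so the simulation can decide
    whether that attempt fails. Returns task name -> result, or "FAILED".
    """
    results: Dict[str, object] = {}
    pending: List[str] = list(tasks)
    attempts = {name: 0 for name in tasks}
    while pending:
        retry: List[str] = []
        for name in pending:
            attempts[name] += 1
            try:
                results[name] = tasks[name](attempts[name])
            except TaskFailed:
                if attempts[name] < max_attempts:
                    retry.append(name)  # resubmit in the next round
                else:
                    results[name] = "FAILED"  # retry budget exhausted
        pending = retry
    return results


def flaky(fail_until: int, value: int) -> Callable[[int], int]:
    """Hypothetical task that fails on attempts 1..fail_until, then returns value."""
    def task(attempt: int) -> int:
        if attempt <= fail_until:
            raise TaskFailed(f"attempt {attempt} lost")
        return value
    return task


results = run_with_retries({
    "ok": flaky(0, 1),    # succeeds on the first attempt
    "once": flaky(1, 2),  # fails once, succeeds on the retry
    "dead": flaky(5, 3),  # fails more often than the budget allows
}, max_attempts=3)
```

Resubmission is only safe when tasks are idempotent, which is one reason write-once data objects pair naturally with this kind of recovery.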

[Figure: ExM task and data management. Labeled components: task graph; task graph partitioner; task subgraphs; graph executors; task queue executors.]
ExM distributed task management targets high thread count and utilization, low latency, and resiliency in the face of failing components and interconnects. ExM complex-wide data storage, based on MosaStore, is embedded and distributed across nodes and RAM storage to provide a global namespace and fast data exchange.
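The architecture figure shows a task graph partitioner that splits the application's task graph into subgraphs, each handled by its own graph executor. The Python sketch below illustrates that idea under assumptions of our own: round-robin partitioning stands in for a real partitioner (which would try to keep dependent tasks together), and execution follows a synchronous, round-based dataflow model in which each executor runs the tasks of its subgraph whose inputs are complete. The functions `partition` and `execute` are hypothetical.

```python
from typing import Dict, List, Set

# A task graph maps each task to the tasks whose outputs it consumes.
Graph = Dict[str, List[str]]


def partition(graph: Graph, k: int) -> List[Set[str]]:
    """Split tasks into k subgraphs. Round-robin over sorted names is a placeholder
    policy; a real partitioner would try to keep dependent tasks together."""
    parts: List[Set[str]] = [set() for _ in range(k)]
    for i, task in enumerate(sorted(graph)):
        parts[i % k].add(task)
    return parts


def execute(graph: Graph, parts: List[Set[str]]) -> List[Dict[int, List[str]]]:
    """Synchronous dataflow model: in each round, every executor runs the tasks in its
    subgraph whose inputs are all complete. Returns executor index -> tasks, per round."""
    done: Set[str] = set()
    rounds: List[Dict[int, List[str]]] = []
    while len(done) < len(graph):
        ready = {t for t, deps in graph.items()
                 if t not in done and all(d in done for d in deps)}
        if not ready:
            raise ValueError("task graph has a cycle or an undefined dependency")
        rounds.append({i: sorted(part & ready)
                       for i, part in enumerate(parts) if part & ready})
        done |= ready
    return rounds


# Usage: a diamond-shaped graph (a feeds b and c, which both feed d) on two executors.
diamond: Graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
parts = partition(diamond, 2)       # [{"a", "c"}, {"b", "d"}]
schedule = execute(diamond, parts)  # [{0: ["a"]}, {0: ["c"], 1: ["b"]}, {1: ["d"]}]
```

Here edges a→b and c→d cross partitions, so each of those data transfers would go between executors; reducing such cross-partition edges is what a smarter partitioner would aim for.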

ExM studies DOE, INCITE, and other national-priority exascale candidate applications
[Figure panels: (a) Climate model analysis; (b) Biofuel production impact; (c) Subsurface flows; (d) UQ of electricity and energy economics.]
ExM studies Swift scripts used to specify and execute many-task applications: (a) QA, analysis, and visualization of climate model outputs; (b) impact of biofuel production on hydrology in major US watersheds; (c) subsurface flow of chemicals in groundwater; and (d) uncertainty quantification studies of consumer and industrial electricity usage and related energy and economic factors.
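Studies such as the uncertainty quantification example in panel (d) share a common shape: many independent model evaluations over sampled inputs, followed by a reduction. The Python sketch below shows that fan-out/fan-in pattern, loosely analogous to a Swift `foreach` loop over a parameter array; `sweep` and `toy_model` are hypothetical, the thread pool stands in for a many-task dispatcher, and the five-point grid is illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev
from typing import Callable, List, Sequence


def sweep(model: Callable[[float], float], samples: Sequence[float],
          workers: int = 4) -> List[float]:
    """Run one independent model evaluation per sample concurrently.

    Executor.map returns results in the order of the input samples.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(model, samples))


def toy_model(param: float) -> float:
    """Hypothetical stand-in for an expensive simulation (one ensemble member)."""
    return param * param


samples = [0.0, 0.5, 1.0, 1.5, 2.0]  # uniform grid over one uncertain input
outputs = sweep(toy_model, samples)
summary = {"mean": mean(outputs), "stdev": pstdev(outputs)}
```

In a real study each `toy_model` call would be a full simulation run, which is why dispatch rate and fault tolerance, rather than the reduction, dominate the system-software problem.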

ExM prototypes show encouraging scalability
[Figure panels: (a) Scaling of AME many-task dispatch and RAM-based data store; (b) Turbine data store access rate; (c) Jets fault-tolerance test; (d) ADLB task dispatch scaling; (e) Jets many-parallel-task application scaling (NAMD).]

For more information:
ExM Project: http://www.mcs.anl.gov/exm
Swift parallel scripting language: http://www.ci.uchicago.edu/swift
ADLB: http://www.cs.mtsu.edu/~rbutler/adlb
MosaStore: http://netsyslab.ece.ubc.ca/wiki/index.php/MosaStore
Contact: Michael Wilde, [email protected]
