Sundials/ML: interfacing with numerical solvers∗ Timothy Bourke

Jun Inoue

Marc Pouzet

Inria Paris École normale supérieure, PSL Research University

National Institute of Advanced Industrial Science and Technology

[email protected]

[email protected]

Univ. Pierre et Marie Curie École normale supérieure, PSL Research University Inria Paris

ABSTRACT

OCaml heap Nvector.t ’d cnvec

We describe a comprehensive OCaml interface to the Sundials suite of numerical solvers (version 2.6.2). Noteworthy features include the treatment of the central vector data structure and benchmarking results.

1.

2.

’d -> bool

VECTORS

Vectors in Sundials are represented by an abstract data type that combines a pointer to the underlying array of floats and 25 function pointers to generic operations like nvdotprod, which computes the dot product, and nvclone, which creates a fresh copy of the given vector. There are ∗This work was supported by the ITEA 3 project 11004 MODRIO (Model driven physical systems operation). 1 http://inria-parkas.github.io/sundialsml/ Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

ACM Workshop on ML 2016 September 22, 2016, Nara, Japan c 2016 Copyright held by the owner/author(s).

ACM ISBN 123-4567-24-567/08/06. . . $15.00 DOI: 10.475/123 4

payload

C heap double[] *N_Vector+ content 25 ops

INTRODUCTION

Sundials [3] is a suite of six numerical solvers for ordinary differential equations, differential algebraic equations, and non-linear equations. We wrote an OCaml interface because we needed state-of-the-art solvers for the runtime system of Z´elus [1], but our interface could benefit any project that involves both symbolic manipulation and numeric computation. It augments the C library with the usual features of ML—namely, static and dynamic safety checks, automatic memory management, error propagation via exceptions, callbacks with closures, and a programming model based on types and modules. Although the basic techniques for interfacing OCaml and C are well understood [5, chap. 19], we required several iterations to find a design that minimizes copying, avoids memory leaks, and encompasses all features of the library without unnecessary duplication. The Sundials solvers share common datatypes for vectors and matrices. We present the solution we found for the former and benchmark results. Sourcecode and documentation are available online,1 including the benchmark examples and scripts for analyzing them.

[email protected]

GC root

backlink

Figure 1: Interfacing (serial) vectors five kinds of vectors: serial, OpenMP, Pthreads, parallel, and custom (with user-implemented operations). Vectors are passed to callbacks that perform problem-specific calculations in performance-critical solver loops, so an efficient way of accessing their payload from OCaml is important. Initially, we transparently converted Sundials vectors to and from bigarrays [5, chap. 30], which allow direct sharing of arrays of floats between OCaml and C. This approach is relatively simple and allows users to work directly with bigarrays, but it requires either frequent allocation of bigarray headers on the major heap or awkward caching. Moreover, different vector kinds require different interfaces requiring either code duplication or costly indirection. The approach we eventually settled on exploits features of the vector abstract datatype and polymorphic typing to treat vectors uniformly without caching or code duplication. The opaque Nvector.t type used throughout our interface is represented internally as type (’d, ’k) t = ’d * cnvec * (’d -> bool) where ’d is the type of the payload, a bigarray for serial vectors or a bigarray and an MPI communicator2 for parallel vectors, and ’k is a phantom type for distinguishing vector kinds. Figure 1 shows the memory layout; Nvector.t is pictured at left. The first component references the payload in the OCaml heap, the second points to a Sundials N_Vector in the C heap, and the third is a function for checking dynamic compatibility of vectors, like having equal lengths. The cnvec pointer passes via a custom block (not shown) in anticipation of planned changes to OCaml that forbid the use of naked C pointers as OCaml values. Sundials requires a ‘seed’ vector to create a solver session and allocates all internal storage by replicating the seed with nvclone. We wrote a custom nvclone operation that allocates extra space after the N_Vector structure to store a 2

https://forge.ocamlcore.org/projects/ocamlmpi/.

Parallel and custom vectors have opaque kinds. Functions that accept any of the first three kinds but no others take a vector with constraint ’k = [>Nvector_serial.kind], that is, any kind that includes the ‘Serial constructor. The type of sessions includes a ’k fixed by the initial seed vector.

3.

BENCHMARKS

Sundials is distributed with 56 example programs that use serial, OpenMP, or Pthreads vectors and 21 that use parallel vectors (ignoring duplicates). We reimplemented them in OCaml and compared their performance to the C versions. This process uncovered many bugs in our code and several in Sundials itself. We repeatedly executed examples with manual garbage collector invocations to expose memory-handling errors. After ensuring that the OCaml examples gave the same results as the C counterparts, we optimized running times. Most optimizations were in the example programs themselves where we followed the standard approach of adding type annotations to vector arguments to reduce polymorphism, avoided functions like Bigarray.Array1.sub that allocate fresh bigarrays on the major heap, and wrote numeric expressions and loops to avoid float boxing [4]. Figure 2 shows the ratios of the wall-clock times of the OCaml code against the C code. A value of 2.0 on the left axis means that the OCaml version takes twice as long as the C version. The running times of the parallel examples (not shown) are dominated by process set up and communication costs with ratios mostly close to 1.0 and rarely more than 1.1. The grey bars show the results for ‘customized’

2

1.8

1.6

1.4

1.2

1 kinRoboKin_slu kinRoboKin_dns_alt kinRoboKin_dns kinRoberts_fp kinLaplace_picard_bnd kinLaplace_bnd kinKrylovDemo_ls kinFoodWeb_kry_custom kinFoodWeb_kry kinFerTron_dns idasSlCrank_FSA_dns idasRoberts_FSA_dns idasRoberts_ASAi_dns idasHessian_ASA_FSA idasAkzoNob_dns idasAkzoNob_ASAi_dns idaSlCrank_dns idaRoberts_sps idaRoberts_klu idaRoberts_dns_alt idaRoberts_dns idaKrylovDemo_ls idaHeat2D_kry idaHeat2D_klu idaHeat2D_bnd idaFoodWeb_bnd cvsRoberts_FSA_dns cvsRoberts_ASAi_dns cvsHessian_ASA_FSA cvsFoodWeb_ASAp_kry cvsFoodWeb_ASAi_kry cvsDiurnal_FSA_kry cvsAdvDiff_FSA_non cvsAdvDiff_ASAi_bnd cvRoberts_sps cvRoberts_klu cvRoberts_dns_uw_alt cvRoberts_dns_uw cvRoberts_dnsL cvRoberts_dns cvKrylovDemo_prec cvKrylovDemo_ls cvDiurnal_kry_bp cvDiurnal_kry cvDirectDemo_ls cvAdvDiff_bndL cvAdvDiff_bnd ark_robertson_root ark_robertson ark_heat1D_adapt ark_heat1D ark_brusselator_fp ark_brusselator1D_klu ark_brusselator1D_FEM_slu ark_brusselator1D ark_brusselator ark_analytic_nonlin ark_analytic ark_KrylovDemo_prec ark_brusselator1D_omp

type Nvector_serial.kind = [‘Serial] type Nvector_openmp.kind = [‘OpenMP | ‘Serial] type Nvector_pthreads.kind = [‘Pthreads | ‘Serial]

7.6

2.2

Wall clock: OCaml / C

GC-registered root back to the associated OCaml payload. These roots are used to retrieve the values to pass to callback functions, which thus directly receive a payload rather than an Nvector.t. Extending N_Vector in this way means that the original code for operations like nvdotprod is called directly. The bigarray in the payload and the N_Vector both point to the same content array which is only freed when the bigarray is finalized by the garbage collector. We did not link back to an Nvector.t as this would have created an inter-heap cycle and necessitated special treatment to avoid memory leaks. Our approach thus involves two classes of vector: those created by OCaml code with a lifetime linked to the associated Nvector.t and determined by the garbage collector, and those cloned within Sundials, for which there is no Nvector.t and whose lifetime ends when Sundials explicitly destroys them, though the payload may persist. Sundials only destroys vectors that it cloned itself: data linked from vectors created in OCaml is never prematurely deallocated. Sundials poses two main constraints on vector use. Firstly, only a single kind may be used per solver session, since operations like nvdotprod access the underlying representation of all their arguments. Secondly, certain linear solvers may only be used with kinds that store their data in a single, local array; that is, with serial, OpenMP, or Pthreads vectors, but not with parallel vectors. These rules are enforced statically using the ’k type variable and OCaml’s polymorphic variants [2]. Three of the vector kinds are declared as closed sets of tags:

Figure 2: Serial examples: C (gcc 4.9.2 with -O2) versus OCaml native code (4.03.0). The hashed subbars show the results when compiled with unsafe. (Linux 3.16.0 on 2.60 GHz i7 with 1 MB L2/5 MB L1/8 GB RAM.) examples: the kinFoodWeb kry custom example uses custom vectors with low-level operations implemented in OCaml on float arrays; the ∗ alt examples use a linear solver reimplemented in OCaml. The hashed sub-bars give the ratios when the OCaml versions are compiled without checks on array access and vector compatibility. Nearly all of the examples run in less than 1 ms; the two longest are on the order of 3 s. As we could not profile such short executions directly, we had the examples repeatedly execute their main functions. Since the C code frees its data at each iteration, we also manually triggered a full heap compaction in OCaml before terminating. The fastest examples require 1000s of iterations to become measurable, so we varied the number of repetitions per example to avoid the slower examples taking hours. The OCaml version with the worst ratio is iterated 100 000 times. When major collections are not invoked manually and the minor heap is enlarged (to 128 MB), the OCaml/C ratio of this example drops from 7.6 to 1.8. Otherwise, the graph shows that the OCaml versions are rarely more than 60% slower than the originals and they are often less than 30% slower.

4.

REFERENCES

[1] T. Bourke and M. Pouzet. Z´elus: A synchronous language with ODEs. In HSCC, pages 113–118. ACM Press, Apr. 2013. [2] J. Garrigue. Programming with polymorphic variants. In ML Workshop. ACM, Sept. 1998. [3] A. C. Hindmarsh, P. N. Brown, K. E. Grant, S. L. Lee, R. Serban, D. E. Shumaker, and C. S. Woodward. SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers. ACM Trans. Mathematical Software, 31(3):363–396, Sept. 2005. [4] X. Leroy. Writing efficient numerical code in Objective Caml. http://caml.inria.fr/pub/old caml site/ocaml/ numerical.html, July 2002. [5] X. Leroy, D. Doligez, A. Frisch, J. Garrigue, D. R´emy, and J. Vouillon. The OCaml system: Documentation and user’s manual. Inria, 4.03 edition, Apr. 2016.

Sundials/ML: interfacing with numerical solvers - ML Family Workshop

Sep 22, 2016 - [email protected]. Jun Inoue. National Institute of Advanced. Industrial Science and. Technology. [email protected]. Marc Pouzet. Univ. Pierre et Marie Curie. École normale supérieure,. PSL Research University. Inria Paris. [email protected]. ABSTRACT. We describe a comprehensive OCaml ...

238KB Sizes 0 Downloads 231 Views

Recommend Documents

Sundials/ML: interfacing with numerical solvers - ML Family Workshop
Sep 22, 2016 - 4. REFERENCES. [1] T. Bourke and M. Pouzet. Zélus: A synchronous language with ODEs. In HSCC, pages 113–118. ACM. Press, Apr. 2013.

Mergeable Types - ML Family Workshop
systems with the ability to define and compose distributed ML computations around ... library on a single machine, this implementation behaves as expected.

Arduino programing of ML-style in ATS - ML Family Workshop
binaries generated from ATS source are very close (in terms of size) to those generated from the C counterpart. 2. ATS programming language. ATS is a programming language equipped with a highly expressive type system rooted in the framework Applied T

Tierless Modules - The ML Family Workshop
Web, client/server, OCaml, ML, Eliom, functional, module. 1 INTRODUCTION. Traditional Web applications are composed of several dis- tinct tiers: Web pages ...

Ambiguous pattern variables - The ML Family Workshop
Jul 29, 2016 - Let us define .... where the Bi,k are binding sets, sets of variables found ... new rows bind to a different position. [Bi,1 ... Bi,l. | K(q1,...,qk) pi,2.

Relational Conversion for OCaml - ML Family Workshop
preters (Programming Pearl) // Proceedings of the 2012 Work- shop on Scheme and Functional Programming (Scheme '12). [5] Henk Barendregt. Lambda ...

VOCAL – A Verified OCAml Library - ML Family Workshop
OCaml is the implementation language of systems used worldwide where stability, safety, and correctness are of ... An overview of JML tools and applications.

Relational Conversion for OCaml - The ML Family Workshop
St.Petersburg State University .... Logic in Computer Science (Vol. 2), 1992. [6] William E. ... Indiana University, Bloomington, IN, September 30, 2009. [7] Dmitry ...

Typer: An infix statically typed Lisp - The ML Family Workshop
Oxford, UK, September 2017 (ML'2017), 2 pages. ... the syntax of macro calls is just as exible as that of any other .... Conference on Functional Programming.

VOCAL – A Verified OCAml Library - ML Family Workshop
Libraries are the basic building blocks of any realistic programming project. It is thus of utmost .... verification of object-oriented programs. In 21st International ...

Extracting from F* to C: a progress report - The ML Family Workshop
raphy (ECC) primitives, and on extracting this code to C. ... verification extract the code back to C. .... pointers are made up of a block identifier along with an.

Extracting from F* to C: a progress report - The ML Family Workshop
sub-tree untouched. In short, hyperheaps provide framing guarantees. Each sub-tree is assigned a region-id (rid), and a hyperheap maps an rid to a heap.

OR310552-INTERFACING WITH MICROPROCESSORS.pdf
Write short notes on the following: (a) Relative and based addressing modes of 8086. (b) Interrupt structure of 8086. (c) Use of CALL and RET instructions in executing procedures. 2. (a) Write briefly about the importance of the 8086 LOOP instruction

OR310552-INTERFACING WITH MICROPROCESSORS.pdf
INTERFACING WITH MICROPROCESSORS. ( Common to Computer Science & Engineering and Information. Technology). Time: 3 hours. Max Marks: 70 ... (b) Write an 8086 assembly language program sequence which uses the LOOP ... (a) Bring out the importance of u

Interfacing to External Memory, Interfacing with 8255 Tutorials 1.pdf ...
programmable ROM), EAROM (electrically alterable ROM), or a NOVROM. (nonvolatile ROM). • These memory devices are electrically erasable in the system, ...

GADTs and exhaustiveness: looking for the impossible - ML Family ...
... !env expected_ty) expected_ty k else k (mkpat Tpat_any expected_ty). | Ppat_or (sp1, sp2) -> (* or pattern *) if mode = Check then let state = save_state env in try type_pat sp1 expected_ty k with exn ->. 3The code is available through OCaml's Su

GADTs and exhaustiveness: looking for the impossible - ML Family ...
log's SLD resolution, for which counter-example genera- tion (i.e. construction of a witness term) is known to be only semi-decidable. Another way to see it is that ...

Polymorphism, subtyping and type inference in MLsub - ML Family ...
Sep 3, 2015 - Polymorphism, subtyping and type inference in. MLsub. Stephen Dolan and Alan Mycroft ... We have two tricks for getting around the difficulties: • Define types properly. • Only use half of them. 2 ... Any two types have a greatest c

Billerica Public Schools Family Workshop ...
Mar 8, 2016 - This workshop is an introduction to Google Apps such as Google Docs, Slides, Calendar, and Gmail ... Parents will learn about how to use Aspen to it's greatest potential by reviewing settings, setting up .... home by explaining the posi

Polymorphism, subtyping and type inference in MLsub - ML Family ...
Sep 3, 2015 - Polymorphism, subtyping and type inference in. MLsub. Stephen Dolan and Alan Mycroft ... We have two tricks for getting around the difficulties: • Define types properly. • Only use half of them. 2 ... Any two types have a greatest c

Page 1 Z 7654 ML ML LEAL ML ML 8_2m1L _22.13_ _BML _BML ...
S e e e cl S t L_l cl 1 o. TITLE: ñrch BLE v1.84. Design: v? 32. 31. 29. 28. || 27. 26. 25. 19. En „3 21. En ai 22. En „5 23. En ná 24. 123456789 ...

ml harper, llc
Sep 20, 2016 - Emergency Contact. Contact Name: _Day Phone: Night Phone: Cellular Phone: Alternate Contact: Phone: ... Policy Number: I hereby authorize ...