Ambiguous pattern variables

Gabriel Scherer, Luc Maranget, Thomas Réfis

July 29, 2016

The or-pattern (p | q) matches a value v if either p or q matches v. It may happen that both p and q match certain values, but that they don’t bind their variables at the same places. OCaml specifies that the left pattern p then takes precedence, but users intuitively expect an angelic behavior, making the “best” choice. Subtle bugs arise from this mismatch. When are (p | q) and (q | p) observably different?

To correctly answer this question we had to go back to pattern matrices, the primary technique to compile patterns and analyze them for exhaustivity, redundant clauses, etc. There is a generational gap: pattern matching was actively studied when most ML languages were first implemented, but many of today’s students and practitioners trust our elders to maintain and improve them. Read on for your decadely fix of pattern matching theory!

A bad surprise

Consider the following OCaml matching clause:

    | (Const n, a) | (a, Const n) when is_neutral n -> a

This clause, part of a simplification function on some symbolic monoid expressions, uses two interesting features of OCaml pattern matching: when guards and or-patterns. A clause of the form p when g -> e matches a scrutinee if the pattern p matches and the guard g, an expression of type bool, evaluates to true in the environment enriched with the variables bound in p. Guards occur at the clause level; they cannot occur deep inside a pattern.

The semantics of our above example seems clear: when given a pair whose left or right element is of the form Const n, where n is neutral, it matches and returns the other element of the pair.

Unfortunately, this code contains a subtle bug: when passed an input of the form (Const v, Const n) where v is not neutral but n is, the clause does not match! This goes against our natural intuition of what the code means, but it is easily explained by the OCaml semantics detailed above. A guarded clause p when g -> e matches the scrutinee against p first, and checks g second. Our input matches both sides of the or-pattern; by the specified left-to-right order, the captured environment binds the pattern variable n to the input's value v (not to its value n). The test is_neutral n fails in this environment, so the clause does not match the scrutinee.

A new warning

This is not an implementation bug; the behavior is as specified. It is a usability bug, as our intuition contradicts the specification.

There is no easy way to change the semantics to match user expectations. The intuitive semantics of “try both branches” does not extend gracefully to or-patterns that are in depth rather than at the toplevel of the pattern. Another approach would be to allow when guards in depth inside patterns, but that would be a very invasive change, going against the current design stance of remaining in the pattern fragment that is easy to compile, and that correspondingly has excellent exhaustiveness and usefulness warnings.

The last resort, then, is to at least complain about it: detect this unfortunate situation and warn the user that the behavior may not be the intended one. The mission statement for this new warning was as follows: “warn on (p1 | p2) when g when an input could pass the guard g when matched by p2, and fail when matched by p1”. We introduced this new warning in OCaml 4.03, released in April 2016.
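The surprising clause can be reproduced in a few lines. The sketch below is illustrative only: the `expr` type, the choice of 0 as neutral element, the `option` wrapper and both function names are assumptions made for this example, and only the or-pattern clause itself is taken from the text. The second function shows one possible rewrite, with one guarded clause per alternative, so that each guard sees the `n` bound by its own pattern.

```ocaml
(* Silences warning 57 (ambiguous or-pattern variables under guard) so that
   the sketch also builds under warnings-as-errors; delete this line to see
   the compiler flag [simplify_ambiguous]. *)
[@@@warning "-57"]

(* Hypothetical expression type for a symbolic monoid. *)
type expr =
  | Const of int
  | Var of string

(* Assumption for this example: 0 is the neutral element. *)
let is_neutral n = (n = 0)

(* The clause from the text, followed by a catch-all clause that makes
   "the clause did not match" observable as [None]. *)
let simplify_ambiguous (pair : expr * expr) : expr option =
  match pair with
  | (Const n, a) | (a, Const n) when is_neutral n -> Some a
  | _ -> None

(* One guarded clause per alternative: each guard is checked against the
   binding of n made by its own pattern. *)
let simplify_split (pair : expr * expr) : expr option =
  match pair with
  | (Const n, a) when is_neutral n -> Some a
  | (a, Const n) when is_neutral n -> Some a
  | _ -> None

let () =
  (* Only one side is a constant: both versions agree. *)
  assert (simplify_ambiguous (Const 0, Var "x") = Some (Var "x"));
  assert (simplify_ambiguous (Var "x", Const 0) = Some (Var "x"));
  (* Both sides match: the left alternative wins, n is bound to 5,
     the guard fails and the whole clause is skipped. *)
  assert (simplify_ambiguous (Const 5, Const 0) = None);
  (* The split version tries the second alternative with n bound to 0. *)
  assert (simplify_split (Const 5, Const 0) = Some (Const 5))
```

Splitting the clause duplicates the right-hand side, which is acceptable here because it is a single variable; for larger bodies a local helper function would avoid the duplication.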

Specification and non-examples

A pattern p may or may not match a value v, but if it contains or-patterns it may match it in several different ways. Let us define matches(p, v) as the ordered list of matching environments, binding the free variables of p to sub-parts of v; if it is the empty list, then the pattern does not match the value. A variable x ∈ p is ambiguous if there exists a value v such that distinct environments of matches(p, v) map x to distinct values, and stable otherwise. We must warn when a guard uses an ambiguous variable.

x is stable in ((x, None, _) | (x, _, None)), as it will always bind the same sub-value for any input.

x is stable in ((x, None, _) | (_, Some _, x)), as no value may match both sides of the or-pattern.
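The definition of matches(p, v) can be made executable on a toy pattern language. The sketch below is an illustration under stated assumptions, not the OCaml implementation: the `value` and `pat` types, the encoding of tuples as a constructor named "tuple", and the helper names `matches_args`, `bindings` and `ambiguous_on` are all hypothetical. Note that `ambiguous_on` only checks one given value, so it can exhibit a witness of ambiguity but cannot prove stability, which quantifies over all values.

```ocaml
(* Hypothetical toy values and patterns; constructors are plain names. *)
type value = VInt of int | VCon of string * value list

type pat =
  | PAny                        (* _ *)
  | PVar of string              (* x *)
  | PCon of string * pat list   (* K (p1, ..., pk) *)
  | POr of pat * pat            (* (p | q) *)

type env = (string * value) list

(* matches p v: every matching environment, left alternatives first. *)
let rec matches (p : pat) (v : value) : env list =
  match p, v with
  | PAny, _ -> [ [] ]
  | PVar x, _ -> [ [ (x, v) ] ]
  | POr (p1, p2), _ -> matches p1 v @ matches p2 v
  | PCon (k, ps), VCon (k', vs) when k = k' -> matches_args ps vs
  | PCon _, _ -> []

(* Match argument lists pointwise, combining the environments. *)
and matches_args (ps : pat list) (vs : value list) : env list =
  match ps, vs with
  | [], [] -> [ [] ]
  | p :: ps', v :: vs' ->
    let tails = matches_args ps' vs' in
    List.concat
      (List.map (fun e1 -> List.map (fun e2 -> e1 @ e2) tails) (matches p v))
  | _, _ -> []

(* Values bound to x, one per matching environment that binds it. *)
let bindings x p v =
  List.concat
    (List.map
       (fun e -> match List.assoc_opt x e with Some b -> [ b ] | None -> [])
       (matches p v))

(* Does v witness that x is ambiguous in p? *)
let ambiguous_on x p v =
  match bindings x p v with
  | [] -> false
  | b :: rest -> List.exists (fun b' -> b' <> b) rest

let tuple ps = PCon ("tuple", ps)
let vtuple vs = VCon ("tuple", vs)

(* (Const n, a) | (a, Const n) against (Const 5, Const 0). *)
let p_clause =
  POr (tuple [ PCon ("Const", [ PVar "n" ]); PVar "a" ],
       tuple [ PVar "a"; PCon ("Const", [ PVar "n" ]) ])
let input = vtuple [ VCon ("Const", [ VInt 5 ]); VCon ("Const", [ VInt 0 ]) ]

(* (x, None, _) | (x, _, None) against (1, None, None). *)
let p_stable =
  POr (tuple [ PVar "x"; PCon ("None", []); PAny ],
       tuple [ PVar "x"; PAny; PCon ("None", []) ])
let v_both = vtuple [ VInt 1; VCon ("None", []); VCon ("None", []) ]

let () =
  assert (List.length (matches p_clause input) = 2);
  assert (ambiguous_on "n" p_clause input);
  assert (List.length (matches p_stable v_both) = 2);
  assert (not (ambiguous_on "x" p_stable v_both))
```

The first environment in the returned list is the one OCaml's left-to-right semantics selects, which is why the guard in the motivating clause sees n bound to 5 on this input.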

Pattern matrices

Pattern matrices are a common representation for pattern-matching algorithms. An m×n pattern matrix corresponds to an m-disjunction of patterns on n arguments matched in parallel:

\[
\begin{pmatrix}
p_{1,1} & p_{1,2} & \cdots & p_{1,n} \\
p_{2,1} & p_{2,2} & \cdots & p_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
p_{m,1} & p_{m,2} & \cdots & p_{m,n}
\end{pmatrix}
\quad\text{is}\quad
\begin{array}{l}
\mid (p_{1,1}, p_{1,2}, \cdots, p_{1,n}) \\
\mid (p_{2,1}, p_{2,2}, \cdots, p_{2,n}) \\
\mid \;\cdots \\
\mid (p_{m,1}, p_{m,2}, \cdots, p_{m,n})
\end{array}
\]

A central operation is to split a matrix into sub-matrices along a given column, for example the first column. Consider the matrix

\[
\begin{pmatrix}
K_1(q_{1,1}) & p_{1,2} & \cdots & p_{1,n} \\
K_2(q_{2,1}, q_{2,2}) & p_{2,2} & \cdots & p_{2,n} \\
\_ & p_{3,2} & \cdots & p_{3,n} \\
K_2(q_{4,1}, q_{4,2}) & p_{4,2} & \cdots & p_{4,n}
\end{pmatrix}
\]

The first element of an n-tuple matching some row of the matrix starts with either (1) the head constructor K1, or (2) K2, or (3) another one. The three following sub-matrices thus describe the shape of all possible values matching this pattern, with the head constructor of the first column removed:

\[
(1)\;
\begin{pmatrix}
q_{1,1} & p_{1,2} & \cdots & p_{1,n} \\
\_ & p_{3,2} & \cdots & p_{3,n}
\end{pmatrix}
\qquad
(2)\;
\begin{pmatrix}
q_{2,1} & q_{2,2} & p_{2,2} & \cdots & p_{2,n} \\
\_ & \_ & p_{3,2} & \cdots & p_{3,n} \\
q_{4,1} & q_{4,2} & p_{4,2} & \cdots & p_{4,n}
\end{pmatrix}
\qquad
(3)\;
\begin{pmatrix}
p_{3,2} & \cdots & p_{3,n}
\end{pmatrix}
\]

A variable is stable in a matrix if it is stable in each of its sub-matrices.

If a pattern in the column we wish to split does not start with a head constructor or _, but with an or-pattern, one can simplify it into two rows:

\[
\begin{pmatrix}
(q_1 \mid q_2) & r \\
\vdots & \vdots
\end{pmatrix}
\;\Longrightarrow\;
\begin{pmatrix}
q_1 & r \\
q_2 & r \\
\vdots & \vdots
\end{pmatrix}
\]

After repeated splitting, a column ends up with only nullary constructors or universal patterns _; the next split removes the column. Eventually, repeated splitting terminates on a matrix with several rows but no columns.

Binding sets

When splitting a matrix into sub-matrices, we peel off a layer of head constructors, and thus lose information on any variable bound at this position in the patterns. To correctly compute stable variables, we need to keep track of these binding sites: we enrich pattern matrices with information on what variables were peeled off each row. Our matrices are now of the form

\[
\left(\begin{array}{ccc|cccc}
B_{1,1} & \cdots & B_{1,l} & p_{1,1} & p_{1,2} & \cdots & p_{1,n} \\
\vdots & & \vdots & \vdots & \vdots & & \vdots \\
B_{m,1} & \cdots & B_{m,l} & p_{m,1} & p_{m,2} & \cdots & p_{m,n}
\end{array}\right)
\]

where the $B_{i,k}$ are binding sets, sets of variables found in the same position during pattern traversal. Variables of different columns correspond to different binding positions, so they may bind distinct values.

The type-checker ensures that the two sides of an or-pattern (p | q) bind the same variables, and that patterns are otherwise linear: each variable occurs once. This guarantees that all rows bind the same environment, and that each variable occurs either in a single pattern of the row, or in one of the binding sets.

Variable bindings at the head of the leftmost pattern are moved to the rightmost binding set:

\[
\begin{array}{rcl}
\left(\cdots\; B_{1,l} \;\middle|\; (p \text{ as } x) \;\cdots\right)
& \Rightarrow &
\left(\cdots\; (B_{1,l} \cup \{x\}) \;\middle|\; p \;\cdots\right) \\
\left(\cdots\; B_{1,l} \;\middle|\; x \;\cdots\right)
& \Rightarrow &
\left(\cdots\; (B_{1,l} \cup \{x\}) \;\middle|\; \_ \;\cdots\right)
\end{array}
\]

We insert a new binding set when splitting on the head constructor of the first pattern of a row: head variables of the new rows bind to a different position.

\[
\left(B_{i,1} \;\cdots\; B_{i,l} \;\middle|\; K(q_1, \ldots, q_k) \;\; p_{i,2} \;\cdots\; p_{i,n}\right)
\;\Rightarrow\;
\left(B_{i,1} \;\cdots\; B_{i,l} \;\; \emptyset \;\middle|\; q_1 \;\cdots\; q_k \;\; p_{i,2} \;\cdots\; p_{i,n}\right)
\]

When traversal ends on a matrix with empty rows, we compute stability of this matrix from the binding sets:

\[
\left(\begin{array}{ccc|c}
B_{1,1} & \cdots & B_{1,l} & \\
\vdots & & \vdots & \\
B_{m,1} & \cdots & B_{m,l} &
\end{array}\right)
\]

Binding sets along a given column correspond to variables that are bound at the same position for all possible ways to enter this sub-matrix. The intersection of these sets thus gives the stable variables of the column. Because the variable sets are disjoint, a variable stable for a column cannot appear anywhere else.

Acknowledgments

This subtle bug was brought to our attention by Arthur Charguéraud, Martin Clochard and Claude Marché. François Pottier made the elegant remark that ambiguity corresponds to non-commutative or-patterns, that is, (p | q) different from (q | p).
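As a rough illustration of how matrix splitting and binding sets combine, the following sketch computes the stable variables of a single pattern over a toy pattern type. It is written for this note and is not the OCaml compiler's code: the `pat` type, the row record and every function name are assumptions. It also simplifies in one respect: having no type information, it always explores the default sub-matrix when a column contains wildcard rows, even if the constructors present already cover the whole type, so it may report fewer stable variables than a type-aware analysis would.

```ocaml
module S = Set.Make (String)

(* Hypothetical pattern AST; constructors are identified by name. *)
type pat =
  | PAny                        (* _ *)
  | PVar of string              (* x *)
  | PAlias of pat * string      (* (p as x) *)
  | PCon of string * pat list   (* K (q1, ..., qk) *)
  | POr of pat * pat            (* (p | q) *)

(* One matrix row: binding sets B_1 .. B_l, then the remaining patterns. *)
type row = { bsets : S.t list; pats : pat list }

(* Add x to the rightmost binding set. *)
let rec bind x = function
  | [] -> [ S.singleton x ]
  | [ b ] -> [ S.add x b ]
  | b :: bs -> b :: bind x bs

(* Expand or-patterns and move head variables to the rightmost binding set,
   until the first pattern of the row is _ or a constructor application. *)
let rec normalize r =
  match r.pats with
  | POr (q1, q2) :: rest ->
    normalize { r with pats = q1 :: rest } @ normalize { r with pats = q2 :: rest }
  | PAlias (p, x) :: rest -> normalize { bsets = bind x r.bsets; pats = p :: rest }
  | PVar x :: rest -> [ { bsets = bind x r.bsets; pats = PAny :: rest } ]
  | _ -> [ r ]

let rec wildcards k = if k <= 0 then [] else PAny :: wildcards (k - 1)

(* Keep the rows selected by f; each split appends a fresh empty binding set. *)
let split f rows =
  List.concat
    (List.map
       (fun r ->
         match f r.pats with
         | Some pats -> [ { bsets = r.bsets @ [ S.empty ]; pats } ]
         | None -> [])
       rows)

(* Sub-matrix for head constructor k, and sub-matrix for "another one". *)
let specialize (k, arity) =
  split (function
    | PCon (k', qs) :: rest when k' = k && List.length qs = arity -> Some (qs @ rest)
    | PAny :: rest -> Some (wildcards arity @ rest)
    | _ -> None)

let default = split (function PAny :: rest -> Some rest | _ -> None)

(* Head constructors (with arity) occurring in the first column. *)
let heads rows =
  List.sort_uniq compare
    (List.concat
       (List.map
          (fun r ->
            match r.pats with PCon (k, qs) :: _ -> [ (k, List.length qs) ] | _ -> [])
          rows))

let inter_all = function [] -> S.empty | s :: ss -> List.fold_left S.inter s ss

(* Rows without columns: intersect each binding-set column over all rows,
   then take the union over the columns. *)
let rec stable_leaf (bss : S.t list list) : S.t =
  match bss with
  | [] -> S.empty
  | _ when List.exists (function [] -> true | _ -> false) bss -> S.empty
  | _ -> S.union (inter_all (List.map List.hd bss)) (stable_leaf (List.map List.tl bss))

(* A variable is stable in a matrix if it is stable in each non-empty sub-matrix. *)
let rec stable rows =
  let rows = List.concat (List.map normalize rows) in
  if List.for_all (fun r -> r.pats = []) rows then
    stable_leaf (List.map (fun r -> r.bsets) rows)
  else
    let subs = List.map (fun h -> specialize h rows) (heads rows) @ [ default rows ] in
    inter_all (List.map stable (List.filter (fun m -> m <> []) subs))

let stable_vars p = stable [ { bsets = [ S.empty ]; pats = [ p ] } ]

let tuple ps = PCon ("tuple", ps)
let none = PCon ("None", [])

let () =
  (* ((x, None, _) | (x, _, None)): x is stable. *)
  let p1 = POr (tuple [ PVar "x"; none; PAny ], tuple [ PVar "x"; PAny; none ]) in
  assert (S.elements (stable_vars p1) = [ "x" ]);
  (* ((Const n, a) | (a, Const n)): neither n nor a is stable. *)
  let c v = PCon ("Const", [ PVar v ]) in
  let p2 = POr (tuple [ c "n"; PVar "a" ], tuple [ PVar "a"; c "n" ]) in
  assert (S.is_empty (stable_vars p2))
```

Appending an empty binding set on every split, including the default one, keeps the k-th binding set of every row in a sub-matrix tied to the same traversal position, which is what makes the column-wise intersection at the leaves meaningful.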

Ambiguous pattern variables - The ML Family Workshop
