GADTs and exhaustiveness: looking for the impossible

Jacques Garrigue and Jacques Le Normand

1 Synopsis

Sound exhaustiveness checking of pattern-matching is an essential feature of GADTs. OCaml has supported it from day one, by showing that the remaining cases could never be typed [1]. This lets programmers be confident in the soundness of their code, and it also permits optimizations that make GADTs more efficient. This approach is sound and can prune some simple uses of GADTs, but other uses caused superfluous warnings. In this talk we describe the original approach and how we ensure its soundness. We then show that one can do better by turning the type-checking of extra cases into a backtracking proof search algorithm. We also show that the exhaustiveness problem is undecidable for GADTs, so this proof search must be kept partial.

2 GADTs and exhaustiveness

Checking the exhaustiveness of pattern-matching is a difficult problem. Technically, it means checking whether the matched type has values that none of the cases cover. There are well-known techniques for this problem on algebraic datatypes [3]. However, they do not tackle semantic questions, such as whether such a value can actually be built. For instance, consider the following example:

    type empty = { e : 'a. 'a }
    let f : empty option -> unit = function None -> ()

Since there is no way to build a value of type empty, this match is actually exhaustive, but the checker will still report a missing Some case. For normal types this limitation does not matter: why would one intentionally introduce an empty type? In the case of GADTs, however, the problem becomes acute.

    type _ t =
      | Int : int t
      | Bool : bool t

    let f : type a. a t -> a = function
      | Int -> 1
      | Bool -> true

    let g : int t -> int = function
      | Int -> 1

    let h : type a. a t -> a t -> bool = fun x y ->
      match x, y with
      | Int, Int -> true
      | Bool, Bool -> true

Function f is a classical GADT function, where different branches instantiate the type parameter differently. It is clearly exhaustive.

If we only look at the constructors, the function g is not exhaustive. However, its input type is restricted to int t, which is incompatible with the constructor Bool. The only valid input is therefore Int, making g exhaustive.

Function h is also exhaustive, because its type requires x and y to have the same type, which rules out mixed pairs such as (Int, Bool).

These functions have useful instances, and we want them to be recognized as exhaustive.

3 First implementation

We were in a conundrum: a complete exhaustiveness check seemed very difficult¹, yet we wanted to add GADTs to the OCaml compiler and needed a simple way to check exhaustiveness. Our initial algorithm simply took all the missing patterns from the exhaustiveness checker. It type-checked them one by one to see whether they were actually possible patterns.

Unfortunately, the original exhaustiveness algorithm did not return a complete enough set of patterns. For function h above, it would only return (Int, Bool) as a missing pattern. We also needed to check that (Bool, Int) is an invalid pattern, to remove any doubt that h is actually exhaustive. Consequently, we modified the exhaustiveness checker so that it returns a complete set of missing patterns. However, as we will see, enthusiastic GADT users were more clever than the checker, and they got exhaustiveness warnings where none should be [5].

¹ It turned out to be undecidable.

4 Abstract types

To make matters worse, in OCaml it is impossible to know whether two abstract types exported from other modules are equal or not. Take, for example:

    type (_, _) cmp =
      | Eq : ('a, 'a) cmp
      | Any : ('a, 'b) cmp

    module A : sig type a type b val eq : (a, b) cmp end =
      struct type a type b = a let eq = Eq end

    let f : (A.a, A.b) cmp -> unit = function Any -> ()

This program properly signals that the function f is non-exhaustive. Indeed, even though the types A.a and A.b appear to be different outside of module A, they are in fact the same.

To handle such cases, we introduce a new compatibility relation. When the type checker unifies GADT indices during pattern typing and meets non-unifiable type constructors, it consults this relation instead of immediately raising a unification error. In particular, the relation assumes that abstract types are compatible with all other types. In this example, when typing the missing case Eq, it simply assumes that A.a and A.b are compatible.

In other words, the type checker is far more permissive with GADT indices inside patterns than inside expressions. In doubt, it is better to permit possibly impossible patterns and to reject potentially unsafe expressions. Since exactly the same function type-checks patterns and checks exhaustiveness, any missing pattern reported by the exhaustiveness check will always be accepted by the type checker².
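The functions of Section 2 can be compiled and run as they stand. The sketch below only adds a driver line. Depending on the OCaml version, the compiler may or may not warn about g and h, which is exactly the question discussed above.

```ocaml
(* The GADT of Section 2: each constructor fixes the type index. *)
type _ t =
  | Int : int t
  | Bool : bool t

(* Each branch refines [a] differently: [a = int], then [a = bool]. *)
let f : type a. a t -> a = function
  | Int -> 1
  | Bool -> true

(* Only [Int] has type [int t], so a single case covers every value. *)
let g : int t -> int = function
  | Int -> 1

(* Both arguments share the index [a], so mixed pairs cannot be built. *)
let h : type a. a t -> a t -> bool = fun x y ->
  match x, y with
  | Int, Int -> true
  | Bool, Bool -> true

let () =
  Printf.printf "f Int = %d, f Bool = %b, g Int = %d, h Bool Bool = %b\n"
    (f Int) (f Bool) (g Int) (h Bool Bool)
```

Running it prints f Int = 1, f Bool = true, g Int = 1, h Bool Bool = true.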
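The first implementation (Section 3) can be mimicked with a toy model in which a constructor of _ t is represented only by the type index it forces. This is an illustrative sketch, not the compiler's code. The names ctor, CInt, CBool, index, missing and typable are hypothetical. The function typable hard-codes the single constraint imposed by the type of h: both arguments share the index a.

```ocaml
(* Hypothetical toy model: a constructor of [_ t] is identified by the
   type index it forces. *)
type ctor = CInt | CBool

let index = function CInt -> "int" | CBool -> "bool"

(* Every pair of constructors, and the pairs covered by the cases of h. *)
let all_pairs =
  List.concat_map (fun c1 -> List.map (fun c2 -> (c1, c2)) [ CInt; CBool ])
    [ CInt; CBool ]

let covered = [ (CInt, CInt); (CBool, CBool) ]

(* A complete set of missing patterns, as the modified checker returns. *)
let missing = List.filter (fun p -> not (List.mem p covered)) all_pairs

(* h has type a t -> a t -> bool: a candidate pair is typable only if
   both constructors force the same index for [a]. *)
let typable (c1, c2) = index c1 = index c2

(* Candidates that survive type checking are genuine counter-examples. *)
let genuine = List.filter typable missing

let () =
  Printf.printf "missing: %d, genuine: %d\n"
    (List.length missing) (List.length genuine)
```

Both candidates, (CInt, CBool) and (CBool, CInt), are rejected, so no warning would be issued for h. Had the checker returned only the first one, the second would never have been examined.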
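The compatibility relation of Section 4 can be pictured with a deliberately tiny type language. This sketch is hypothetical: the names ty, Con, Abstract and compatible are ours. Real OCaml types also contain type variables, which ordinary unification handles, so the sketch covers only the clash between type constructors. Its one rule is that an abstract type is never considered incompatible with anything. Under that rule, Eq at type (A.a, A.b) cmp is kept as a possible case.

```ocaml
(* Hypothetical toy types, far simpler than the compiler's. *)
type ty =
  | Con of string * ty list  (* known type constructor, e.g. int or list *)
  | Abstract of string       (* abstract type whose definition is hidden *)

(* [compatible t1 t2] is false only when t1 and t2 are certainly different. *)
let rec compatible t1 t2 =
  match t1, t2 with
  | Abstract _, _ | _, Abstract _ -> true  (* might be equal to anything *)
  | Con (c1, args1), Con (c2, args2) ->
      c1 = c2
      && List.length args1 = List.length args2
      && List.for_all2 compatible args1 args2
```

With this relation, Abstract "A.a" and Abstract "A.b" are compatible, while Con ("int", []) and Con ("bool", []) are not. The length test comes before List.for_all2, which would raise Invalid_argument on lists of different lengths.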

5 Exploding and backtracking

While our original approach seemed mostly satisfactory, there are cases where it fails. For instance, consider the following function:

    let deep : char t option -> char = function None -> 'c'

Since t is only defined for int and bool, char t is actually the empty type. That is, there are no values of the form Some _ at type char t option. To see this, however, one needs to explode _ into its different cases and check them separately. This gives us the following two patterns:

    Some Int
    Some Bool

Then we can call the type checker as before, to verify that they are incompatible with the given type.

Note that as soon as we start doing deeper case analysis, the question changes. We are no longer just checking whether a pattern is typable, but whether a particular type is inhabited by terms of a certain form. Here are a few more examples of the same kind, in order of difficulty.

    type zero = Zero
    type _ succ = Succ
    type (_, _, _) plus =
      | Plus0 : (zero, 'a, 'a) plus
      | PlusS : ('a, 'b, 'c) plus -> ('a succ, 'b, 'c succ) plus

    let trivial : (zero succ, zero, zero) plus option -> bool =
      function None -> false
    let easy : (zero, zero succ, zero) plus option -> bool =
      function None -> false
    let harder : (zero succ, zero succ, zero succ) plus option -> bool =
      function None -> false

zero and succ encode type-level natural numbers. plus is the Peano version of addition, in relational form: there is a term of type (a, b, c) plus if and only if a + b = c.

- trivial is easy to check, as (zero succ, zero, zero) matches neither Plus0 nor PlusS.
- easy is a bit more difficult. It seems to match Plus0, but unification between zero succ and zero fails later.
- For harder, unification with PlusS succeeds. However, the argument then has type (zero, zero succ, zero) plus, which was shown empty for easy.

In deep, trivial and easy, it is sufficient to explode the first _ according to its inferred type. By contrast, harder requires inferring the type of the argument of the GADT constructor PlusS in order to explode it once more.

Another interesting case arises when there is a dependency between components of a tuple.

    let inv_zero : type a b c d. (a, b, c) plus -> (c, d, zero) plus -> bool =
      fun p1 p2 ->
        match p1, p2 with
        | Plus0, Plus0 -> true

Here the extra patterns coming from the basic exhaustiveness algorithm are:

    Plus0, PlusS _
    PlusS _, _

The first pattern is clearly empty, but the second one is typable unless one explodes the second _. To do that, we would first need to infer the type of the second component of the pair, which depends on the freshly generated first component. In this case again, typing patterns (to check emptiness) and exploding wildcards must be interleaved.

The solution to this conundrum is to do all of these simultaneously. We modified the recursive function type_pat³, the main function for type-checking patterns, turning it into a proof-searching function. The basic idea is to make it non-deterministic. However, since this function uses side-effecting unification, returning a list of results would not be easy. Instead, we converted it to continuation-passing style, using backtracking to cancel unifications where needed. In particular, it is sufficient to explode wildcards into or-patterns. These are then interpreted non-deterministically, so that all combinations get checked.

    (* Excerpt; mode is Check or Type, k is the continuation *)
    let rec type_pat mode env spat expected_ty k =
      match spat.ppat_desc with
      | Ppat_any -> (* wild card *)
          if mode = Check && is_gadt expected_ty then
            type_pat mode env (explode_pat !env expected_ty) expected_ty k
          else k (mkpat Tpat_any expected_ty)
      | Ppat_or (sp1, sp2) -> (* or pattern *)
          if mode = Check then begin
            (* try the first branch; on failure, undo its unifications *)
            let state = save_state env in
            try type_pat mode env sp1 expected_ty k
            with _ ->
              set_state state env;
              type_pat mode env sp2 expected_ty k
          end else ... (* old code *)
      | Ppat_pair (sp1, sp2) -> (* pair pattern *)
          let ty1, ty2 = filter_pair env expected_ty in
          type_pat mode env sp1 ty1 (fun p1 ->
            type_pat mode env sp2 ty2 (fun p2 ->
              k (mkpat (Tpat_pair (p1, p2)) expected_ty)))
      | ... (* other cases in CPS *)

² For a long time this was not the case with GHC, but there is some progress in the Haskell world too [2].

³ The code is available through OCaml's Subversion server: svn checkout http://caml.inria.fr/svn/branches/gadt-warnings.
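For comparison with deep: to our knowledge, OCaml releases from 4.03 onward let the programmer demand this kind of proof explicitly. One writes a refutation case, with a period in place of the right-hand side, and the compiler rejects the program if it cannot show that the case is empty. The sketch below assumes such a compiler.

```ocaml
type _ t =
  | Int : int t
  | Bool : bool t

(* [Some _ -> .] is a refutation case: the checker must show that no value
   of type [char t] exists (here, exploding [_] into [Int] and [Bool]
   is enough, since neither has type [char t]). *)
let deep : char t option -> char = function
  | None -> 'c'
  | Some _ -> .

let () = print_char (deep None); print_newline ()
```

Running it prints c.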
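The inhabitation questions raised by trivial, easy and harder can be explored with a tiny resolution engine. Each GADT constructor is read as a clause whose premises are its argument types. This is a swapped-in illustration, not the compiler's algorithm. It uses first-order terms and purely functional substitutions, so undoing a failed choice costs nothing. A depth bound stands in for a heuristic cut-off. All names (term, ctor, unify, solve, inhabited) are hypothetical. A result of false only means that no witness was found within the bound.

```ocaml
(* First-order type terms: type variables and applied type constructors. *)
type term = Var of string | App of string * term list

(* A GADT constructor read as a clause: premises (argument types) and head. *)
type ctor = { args : term list; result : term }

let zero = App ("zero", [])
let succ n = App ("succ", [ n ])
let plus a b c = App ("plus", [ a; b; c ])
let t_of i = App ("t", [ App (i, []) ])
let v x = Var x

let ctors =
  [ { args = []; result = t_of "int" };                    (* Int *)
    { args = []; result = t_of "bool" };                   (* Bool *)
    { args = []; result = plus zero (v "a") (v "a") };     (* Plus0 *)
    { args = [ plus (v "a") (v "b") (v "c") ];             (* PlusS *)
      result = plus (succ (v "a")) (v "b") (succ (v "c")) } ]

(* Follow variable bindings in the substitution [s]. *)
let rec walk s t =
  match t with
  | Var x -> (match List.assoc_opt x s with Some t' -> walk s t' | None -> t)
  | App _ -> t

let rec occurs s x t =
  match walk s t with
  | Var y -> x = y
  | App (_, ts) -> List.exists (occurs s x) ts

(* Unification returns an extended substitution, or None on a clash. *)
let rec unify s t1 t2 =
  match walk s t1, walk s t2 with
  | Var x, Var y when x = y -> Some s
  | Var x, t | t, Var x -> if occurs s x t then None else Some ((x, t) :: s)
  | App (f, ts1), App (g, ts2) ->
      if f <> g || List.length ts1 <> List.length ts2 then None
      else
        List.fold_left2
          (fun acc a b -> match acc with None -> None | Some s -> unify s a b)
          (Some s) ts1 ts2

(* Rename a clause's variables apart before each use. *)
let counter = ref 0

let rec rename sfx = function
  | Var x -> Var (x ^ sfx)
  | App (f, ts) -> App (f, List.map (rename sfx) ts)

let fresh c =
  incr counter;
  let sfx = "_" ^ string_of_int !counter in
  (List.map (rename sfx) c.args, rename sfx c.result)

(* Try every clause for the first goal; [k] receives the final substitution. *)
let rec solve depth s goals k =
  match goals with
  | [] -> k s
  | g :: rest ->
      depth > 0
      && List.exists
           (fun c ->
             let args, result = fresh c in
             match unify s g result with
             | None -> false
             | Some s' ->
                 solve (depth - 1) s' args (fun s'' -> solve depth s'' rest k))
           ctors

let inhabited ?(depth = 10) ty = solve depth [] [ ty ] (fun _ -> true)
```

With this sketch, inhabited (t_of "char") is false, matching deep. The plus types of trivial, easy and harder are also reported empty, while plus (succ zero) (succ zero) (succ (succ zero)) (1 + 1 = 2) is inhabited. For harder, the search has to go one level deeper, through the premise of PlusS, which mirrors the extra exploding step described above.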
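The effect of the continuation-passing, backtracking style used in type_pat can be reproduced on a toy. All names in the sketch below are hypothetical stand-ins for the compiler's type_pat, explode_pat, save_state and set_state. Types are either a t for a type variable a, or pairs. Unification is a destructive update of a hash table, and the only GADT is the two-constructor t of Section 2. Because the rest of the pattern is passed as the continuation k, a failure anywhere later returns control to the innermost or-pattern. That or-pattern restores the saved bindings and tries its second branch.

```ocaml
exception Fail

(* Destructively updated bindings of type variables to indices, standing in
   for the compiler's side-effecting unification. *)
let bindings : (string, string) Hashtbl.t = Hashtbl.create 16

let save_state () = Hashtbl.copy bindings

let set_state saved =
  Hashtbl.reset bindings;
  Hashtbl.iter (fun x c -> Hashtbl.replace bindings x c) saved

(* Bind variable [x] to index [c]; raise [Fail] on a clash. *)
let bind x c =
  match Hashtbl.find_opt bindings x with
  | Some c' -> if c' <> c then raise Fail
  | None -> Hashtbl.replace bindings x c

(* Toy types: [Idx a] stands for [a t]; [Prod] is a pair type. *)
type ty = Idx of string | Prod of ty * ty

(* Toy patterns; [Ctor (name, index)] is a constructor of [_ t]. *)
type pat = Any | Ctor of string * string | Or of pat * pat | Pair of pat * pat

(* Exploding a wildcard of type [_ t] yields an or-pattern of its constructors. *)
let explode = Or (Ctor ("Int", "int"), Ctor ("Bool", "bool"))

let rec type_pat p ty k =
  match p, ty with
  | Any, Idx _ -> type_pat explode ty k  (* GADT type: explode *)
  | Any, Prod _ -> k ()                  (* other types: accept as is *)
  | Ctor (_, idx), Idx a -> bind a idx; k ()
  | Or (p1, p2), _ ->
      let state = save_state () in
      (try type_pat p1 ty k
       with Fail -> set_state state; type_pat p2 ty k)
  | Pair (p1, p2), Prod (ty1, ty2) ->
      type_pat p1 ty1 (fun () -> type_pat p2 ty2 k)
  | _ -> raise Fail

(* A pattern is possible if some choice of or-branches types successfully;
   [env] pre-binds variables, e.g. [("a", "char")] for [char t]. *)
let possible ?(env = []) p ty =
  Hashtbl.reset bindings;
  List.iter (fun (x, c) -> Hashtbl.replace bindings x c) env;
  try type_pat p ty (fun () -> true) with Fail -> false
```

Consider the pattern (Int | Bool), Bool at type a t * a t. The first branch binds a to int, and only the later Bool fails. The exception then returns to the or-pattern, which restores the bindings before trying Bool, so the pattern is found possible. Without set_state, the stale binding a = int would make the second branch fail too. With a pre-bound to char, as in deep, the wildcard explodes into Int and Bool and both fail, so Some _ is impossible.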

6 Undecidability and heuristics

The above definition of type_pat, with all its expressive power, also hints at why exhaustiveness checking of GADTs is undecidable.

One simple way to see it is that GADTs can encode Horn clauses very directly. Each type definition is a predicate, and each constructor is a clause whose arguments are the premises. The type_pat function then precisely implements Prolog's SLD resolution, for which counter-example generation (i.e. construction of a witness term) is known to be only semi-decidable.

Another way to see it is that one can encode the execution traces of an arbitrary Turing machine in a GADT definition. Exhaustiveness checking then becomes equivalent to the halting problem.

This undecidability means that we need a good heuristic for deciding where to abandon the search. The complexity is exponential in the number of wildcard patterns exploded. A simple heuristic, which seems sufficient in most cases, has two parts. First, only explode wildcard patterns whose type has only GADT constructors. Second, do not explode any of the generated subpatterns. Under this heuristic, the harder example above would be flagged non-exhaustive, while all the other examples would be correctly identified as exhaustive. Here is another example which would be incorrectly flagged:

    let deeper : (char t * bool) option -> char = function None -> 'c'

    Warning 8: this pattern-matching is not exhaustive.
    Here is an example of a value that is not matched:
    Some _

Here the wildcard corresponds to a tuple type, so the case analysis stops there. Even with this very limited approach, one can still exhibit exponential behavior:

    type _ t = A : int t | B : bool t | C : char t | D : float t
    type (_, _, _, _) u = U : (int, int, int, int) u

    let f : type a b c d e f g h.
        a t * b t * c t * d t * e t * f t * g t * h t
        * (a, b, c, d) u * (e, f, g, h) u -> unit =
      function A, A, A, A, A, A, A, A, U, U -> ()

The above check takes about 10 seconds to go through all 65536 (4^8) cases. As in Prolog, one can dramatically improve performance by changing the pattern order.

Whatever heuristic is chosen, there will always be cases where one would like the algorithm to try harder. We can think of at least two ways to handle them.

One is to introduce an absurd pattern à la Agda [4]. This pattern would tell the checker to try hard to prove emptiness at that particular position. However, this also requires introducing cases without a right-hand side at the syntactic level.

Another approach would be to use attributes on the match and function constructs to indicate how hard we want the checker to try: function [@exhaust 10] None -> (). In this case we would need a precise definition of the strength of the search.

7 Unused cases

The dual of exhaustiveness checking is the detection of unused cases. Take, for example, the following variant of deep, with t as defined in Section 2:

    let deep' : char t option -> char = function
      | None -> 'c'
      | Some _ -> 'd'

Since we added a pattern at the end of an already exhaustive match, it is clearly redundant. The approach is similar. First, refine the pattern to keep only the subcases not covered by previous cases. Then check whether those subcases are inhabited. Currently this second check is not done; doing it would require making the redundancy algorithm return an explicit list of cases.

Detecting unused cases is technically less important, since there is no direct impact on soundness. Still, accurate warnings would help programmers reason about their programs. Note, however, that we cannot hope to detect all unused cases, just as we cannot guarantee that all counter-examples of exhaustiveness are really inhabited.

References

[1] Jacques Garrigue and Jacques Le Normand. Adding GADTs to OCaml: the direct approach. In OCaml Meeting, September 2011.

[2] Georgios Karachalias, Tom Schrijvers, Dimitrios Vytiniotis, and Simon Peyton Jones. GADTs meet their match. In ICFP, 2015.

[3] Luc Maranget. Les avertissements du filtrage. In Journées Francophones des Langages Applicatifs, 2003.

[4] Ulf Norell. Dependently typed programming in Agda. In AFP 2008, volume 5832 of Springer LNCS, pages 230–266, 2009.

[5] GADT exhaustiveness check incompleteness. OCaml problem report #6437, May 2014. http://caml.inria.fr/mantis/view.php?id=6437.
