Fundamentals of Functional Programming Lecture 1

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Syllabus I

λ-calculus:
■ type-free λ-calculus as an abstract programming language (confluence, reduction strategies, datatype representation); relation to computability (Turing-completeness, infinitary datatypes);
■ simply typed λ-calculi (Church-style, Curry-style);
■ polymorphic λ-calculi (the λ-cube, logical interpretation);
■ proofs as programs.

Syllabus II

Category theory:
■ categories (abstract and concrete categories, product categories, subcategories); basic constructions (morphisms, limits and colimits, exponentials, subobject classifiers, power objects);
■ functors and natural transformations (definitions and examples);
■ adjunctions (definition and examples);
■ models of the λ-calculi (Lambek’s constructions).

Texts I

λ-calculus:
■ [HS] J.R. Hindley and J.P. Seldin, Lambda-Calculus and Combinators: An Introduction, Cambridge University Press (2008). ISBN: 978-0521898850. The lectures are based on this book.
■ [Bar] H.P. Barendregt, The Lambda Calculus: Its Syntax and Semantics, 2nd edition, North-Holland Elsevier (1984). ISBN: 978-0444875082.
■ [Bar2] H.P. Barendregt, Lambda Calculi with Types, Chapter 2 of S. Abramsky, D.M. Gabbay, T.S.E. Maibaum (editors), Handbook of Logic in Computer Science, volume 2, Clarendon Press (1992). ISBN: 978-0198537618.

Texts II

λ-calculus:
■ [Pierce] B.C. Pierce, Types and Programming Languages, MIT Press (2002). ISBN: 978-0262162098.
■ [Sel] P. Selinger, Lecture Notes on the Lambda Calculus, http://www.mscs.dal.ca/~selinger/papers/lambdanotes.pdf

Texts III

Category theory:
■ [Pierce2] B.C. Pierce, Basic Category Theory for Computer Scientists, MIT Press (1991). ISBN: 978-0262660716. The lectures are based on this book.
■ [MacLane] S. Mac Lane, Categories for the Working Mathematician, 2nd edition, Springer-Verlag (1998). ISBN: 978-0387984032.
■ [Goldblatt] R. Goldblatt, Topoi: The Categorial Analysis of Logic, Elsevier (1984). ISBN: 978-0444867117. This book has been republished at a nicer price by Dover (2009). ISBN: 978-0486450261.

Texts IV

Category theory:
■ [Lambek] J. Lambek and P.J. Scott, Introduction to Higher Order Categorical Logic, Cambridge University Press (1988). ISBN: 978-0521356534.
■ [Cats] J. Adamek, H. Herrlich and G. Strecker, Abstract and Concrete Categories: The Joy of Cats, http://www.tac.mta.ca/tac/reprints/articles/17/tr17.pdf
■ [Rydeheard] D.E. Rydeheard and R.M. Burstall, Computational Category Theory, Prentice Hall (1988). ISBN: 978-0131627369. http://www.cs.man.ac.uk/~david/categories/book/book.pdf

Texts V

The ML programming language:
■ [Paulson] L.C. Paulson, ML for the Working Programmer, 2nd edition, Cambridge University Press (1996). ISBN: 978-0521565431. The lectures are based on this book.
■ [Harper] R. Harper, Programming in Standard ML, Carnegie Mellon University (2009). http://www.cs.cmu.edu/~rwh/smlbook/online.pdf

There are no mandatory textbooks: the material we cover is standard, so any reasonable book on the topics of interest will do.

Examination

The examination will be oral. It covers the whole syllabus: students will be asked to solve simple exercises, to prove results seen in the lessons, and to show understanding of the basic concepts of the course. Examinations take place six times per year, on fixed dates. Whoever wants to take the examination must register for the chosen date. Students are required to bring their study material (books, handouts, etc.) to the examination.

Introduction

This course introduces the mathematics of functional programming. Essentially, functional programming is a question of style. A functional programmer thinks of his or her code as a formal, mathematical object which computes the desired result, and does so by avoiding devices such as mutable variables, assignments, and the like. In a word, functional means without state. Also, functional programs tend to be very short, very compact and very flexible. They are also harder to understand and to write, since they require a deep understanding of the problem. This course explains how to think in a functional style and, thus, how to write functional programs.

An example: Quicksort I

The idea of quicksort is to sort an array by choosing an element, called the pivot, and sorting the sub-arrays of the elements strictly less than and strictly greater than the pivot. The sorted array is then the sorted less-than sub-array, followed by the pivot, followed by the sorted greater-than sub-array. A recursive application of this rule sorts any given array. Usually an iterative version of this algorithm is used, since “recursion is bad for efficiency”.

An example: Quicksort II

A precise statement of the algorithm immediately reveals that we have to state the trivial cases of an empty array and of an array containing just one element. We stipulate that arrays of these forms are sorted. Then the algorithm qs, taking A as input, can be easily described as:
■ if A = [], i.e., A is empty, then qs(A) = A;
■ if A = [x], i.e., A contains just one element x, then qs(A) = A;
■ if A = a :: B, i.e., A contains at least one element a followed by another array B, then qs(A) = qs(A1) @ [a] @ qs(A2), where A1 is the array of the elements of B less than or equal to a and A2 is the array of the elements of B greater than a.

An example: Quicksort III

A direct encoding of this description in ML gives:

fun qs [] = []
  | qs [x] = [x]
  | qs (a :: B) =
      let
        fun partition (l, r, []) = (qs l) @ [a] @ (qs r)
          | partition (l, r, x :: Xs) =
              if x <= a then partition (x :: l, r, Xs)
              else partition (l, x :: r, Xs)
      in
        partition ([], [], B)
      end;
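For readers unfamiliar with ML syntax, the slide's qs can be transcribed into Python. This is an illustrative sketch, not the course's code: it uses list comprehensions in place of the tail-recursive partition, so it keeps the structure of the three cases but not the accumulator-passing style.

```python
# A rough Python transcription of the ML qs above (illustrative only).
def qs(a):
    # Trivial cases: empty and one-element lists are already sorted.
    if len(a) <= 1:
        return a
    pivot, rest = a[0], a[1:]
    # Corresponds to the inner ML partition: split `rest` around the pivot,
    # then recursively sort both halves.
    left = [x for x in rest if x <= pivot]
    right = [x for x in rest if x > pivot]
    return qs(left) + [pivot] + qs(right)

print(qs([3, 1, 4, 1, 5, 9, 2, 6]))  # → [1, 1, 2, 3, 4, 5, 6, 9]
```

As in the ML version, elements equal to the pivot go into the left sub-array, so duplicates are preserved.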

An example: Quicksort IV

As you can see, the code is compact: there are no extra variables beyond those needed in the definition. A simple test reveals that the code is fast, too; this is due to the fact that partition is tail-recursive. A simple test also reveals that the code uses more space than the usual iterative version in, say, the C language. This is not a problem on modern computers with gigabytes of RAM and, in any case, a space-efficient version can be developed. The major features toward a clean representation are recursion and pattern matching in the definition.

Another example: Summation I

We want to write a function to calculate the sum of the elements in a given list. A simple solution would be:

fun sum [] = 0
  | sum (x :: xs) = x + (sum xs);

This solution is correct, but unnecessarily complex.

Another example: Summation II

We can introduce an abstract construction:

fun foldr f z [] = z
  | foldr f z (x :: xs) = f (x, foldr f z xs);

This higher-order functional expands as

foldr f z [1, 2, 3, 4, 5] = f(1, f(2, f(3, f(4, f(5, z))))) .

In this way, summation can be defined simply as

val sum = foldr (op +) 0;
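The same right fold can be rendered in Python. This is an illustrative sketch: Python's built-in `functools.reduce` is a left fold, so a small recursive definition is used instead, matching the expansion f(x1, f(x2, ... f(xn, z))) on the slide.

```python
# A Python rendering of the foldr functional from the slide: a right fold
# over a list, where f takes the current element and the folded tail.
def foldr(f, z, xs):
    if not xs:
        return z
    return f(xs[0], foldr(f, z, xs[1:]))

# Summation, as on the slide (sum = foldr (op +) 0):
sum_list = lambda xs: foldr(lambda x, acc: x + acc, 0, xs)

print(sum_list([1, 2, 3, 4, 5]))  # → 15
```

Other instances of the same pattern come for free: for example, `foldr(lambda x, acc: [x] + acc, [], xs)` copies a list, and replacing `+` by `*` with seed 1 gives the product.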

Another example: Summation III

Programming by means of an extensive use of higher-order functionals simplifies the way we encode functions. Higher-order functionals encode the “computational patterns” we need, and so we confine the complexity of a program into a few abstract pieces. In fact, in the sum example, we captured the iterative structure of the sum in foldr, a higher-order functional, which is then used to define our target function. If a computational pattern is frequent, it is very convenient to encode it as a functional. We want to remark that “thinking by functionals” is easier (once one is used to it), since it allows us to concentrate on smaller and well-confined problems.

References and Hints

In this section, students will find references for the material that has been explained. Usually they are the textbooks or research articles the lesson has been taken from. Although it is not compulsory to study these references, they can provide background information or a deeper insight into what has been explained.

Fundamentals of Functional Programming Lecture 2


Outline

This is the first real lesson of this course: our aim is to introduce the λ-calculus in its untyped version. As a side aspect, we also want to introduce a presentation style which is predominant in mathematical expositions, so that students can get used to it. For these reasons, this lesson will be very short.

A programming introduction to λ-calculus

The λ-calculus was invented by Alonzo Church around 1930 as a mathematical tool to describe and study the properties of computable functions. The λ-calculus is extremely simple, with a bare-bone syntax allowing for simple and compact proofs. There are different versions of the λ-calculus:
■ the pure λ-calculus is the simplest and most powerful system;
■ typed λ-calculi are variations over the pure version, where terms have types in some algebra.

Every functional programming language is a (typed) λ-calculus with some additional constructs, added for performance and clarity. We will start by presenting the pure λ-calculus.

Syntax I

Assume we have an infinite (denumerable) set of variables V.

Definition 2.1 (λ-term)
A λ-term is inductively defined as follows, along with FV, the set of its free variables:
■ if x ∈ V then x is a λ-term and FV(x) = { x };
■ if M and N are λ-terms, then (M · N) is a λ-term and FV(M · N) = FV(M) ∪ FV(N); these λ-terms are called applications and, usually, the · operation is not written;
■ if x ∈ V and M is a λ-term, then (λx . M) is a λ-term and FV(λx . M) = FV(M) \ { x }; these λ-terms are called abstractions.

We should think of λ-terms as our programs and data.
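Definition 2.1 can be made concrete as a small program. The sketch below is illustrative (the class names Var, App, Lam are not from the course): it encodes the three term formers and computes FV by the same structural recursion as the definition.

```python
# A sketch of Definition 2.1: λ-terms as data, plus their free variables.
from dataclasses import dataclass

@dataclass
class Var:           # a variable x ∈ V
    name: str

@dataclass
class App:           # an application (M · N)
    fun: object
    arg: object

@dataclass
class Lam:           # an abstraction (λx . M)
    var: str
    body: object

def fv(t):
    # FV(x) = {x};  FV(M N) = FV(M) ∪ FV(N);  FV(λx.M) = FV(M) \ {x}
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, App):
        return fv(t.fun) | fv(t.arg)
    return fv(t.body) - {t.var}

# λx. x y  — the variable y occurs free, x is bound:
print(fv(Lam("x", App(Var("x"), Var("y")))))  # → {'y'}
```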

Syntax II

To simplify notation, capital letters (M, N, P, . . . ) will denote arbitrary λ-terms; also, x, y, z, u, v, w will denote variables. Parentheses are suppressed according to the following rules:
■ we always omit outermost parentheses;
■ we write (λx . P Q) for (λx . (P Q));
■ we write (M1 M2 · · · Mn) for ((. . . (M1 · M2) . . .) · Mn), i.e., application associates to the left;
■ we write (λx1, x2, . . . , xn . M) for (λx1 . (λx2 . (. . . (λxn . M) . . . ))).

Finally, we write M ≡ N to indicate that M and N are syntactically identical, i.e., they are equal as strings. Evidently, M N ≡ P Q implies M ≡ P and N ≡ Q, and λx . M ≡ λy . N implies x ≡ y and M ≡ N. The converse implications hold, too.

Syntax III

Most proofs and definitions are given by induction on the structure of λ-terms. Here are some useful examples.

Definition 2.2 (Occurrence)
For λ-terms P and Q, we say that P occurs in Q iff one of the following cases applies:
■ P ≡ Q (*);
■ Q ≡ M N and P occurs in M or P occurs in N;
■ Q ≡ λx . M and P occurs in M.

An occurrence of P in Q is any place where clause (*) applies.

Definition 2.3 (Scope)
We say that the occurrence of M in λx . M is the scope of λx . M in P if λx . M occurs in P. We also say that x is bound in that scope.

Syntax IV

Definition 2.4 (Substitution)
For any λ-terms M, N and variable x, M[N/x] denotes the λ-term where every free occurrence of x in M is substituted with N. Explicitly:
■ x[N/x] ≡ N;
■ y[N/x] ≡ y where x ≢ y;
■ (P Q)[N/x] ≡ (P[N/x])(Q[N/x]);
■ (λx . P)[N/x] ≡ λx . P;
■ (λy . P)[N/x] ≡ λy . P if x ≢ y and x ∉ FV(P);
■ (λy . P)[N/x] ≡ λy . P[N/x] if x ≢ y, x ∈ FV(P) and y ∉ FV(N);
■ (λy . P)[N/x] ≡ λz . (P[z/y])[N/x] if x ≢ y, x ∈ FV(P), y ∈ FV(N) and z ∉ FV(N P).
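The clauses of Definition 2.4 translate almost literally into code. The sketch below is illustrative (it repeats a hypothetical Var/App/Lam term representation); the interesting part is the last clause, where a binder that would capture a free variable of N is renamed to a fresh z.

```python
# A self-contained sketch of Definition 2.4: capture-avoiding substitution.
from dataclasses import dataclass
import itertools

@dataclass
class Var: name: str
@dataclass
class App: fun: object; arg: object
@dataclass
class Lam: var: str; body: object

def fv(t):
    if isinstance(t, Var): return {t.name}
    if isinstance(t, App): return fv(t.fun) | fv(t.arg)
    return fv(t.body) - {t.var}

def fresh(avoid):
    # pick some z not in `avoid`, as in the last clause of Definition 2.4
    for z in ("z%d" % i for i in itertools.count()):
        if z not in avoid:
            return z

def subst(m, n, x):
    """m[n/x]: replace the free occurrences of x in m by n."""
    if isinstance(m, Var):
        return n if m.name == x else m
    if isinstance(m, App):
        return App(subst(m.fun, n, x), subst(m.arg, n, x))
    if m.var == x:                       # (λx.P)[N/x] ≡ λx.P
        return m
    if m.var not in fv(n) or x not in fv(m.body):
        return Lam(m.var, subst(m.body, n, x))
    z = fresh(fv(n) | fv(m.body))        # rename the binder to avoid capture
    return Lam(z, subst(subst(m.body, Var(z), m.var), n, x))

# (λy. x)[y/x] must rename the binder rather than capture y:
print(subst(Lam("y", Var("x")), Var("y"), "x"))
# → Lam(var='z0', body=Var(name='y'))
```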

Syntax V

We stipulate that programs do not depend on the names of their bound variables. This convention is captured via the following definition.

Definition 2.5 (α-conversion)
We say that P α-converts to Q, notation P ≡α Q, if P and Q are identical except for a renaming of bound variables. In formal terms, P ≡α Q iff one of the following cases applies:
■ P ≡ Q;
■ P ≡ M N, Q ≡ M′ N′ and M ≡α M′, N ≡α N′;
■ P ≡ λx . M, Q ≡ λy . N and M[z/x] ≡α N[z/y], where z ∉ FV(M N).

Syntax VI

Lemma 2.6
1. If P ≡α Q then FV(P) = FV(Q);
2. the relation ≡α is an equivalence, i.e.,
   ■ P ≡α P;
   ■ P ≡α Q implies Q ≡α P;
   ■ P ≡α Q and Q ≡α R implies P ≡α R.

Proof.
The first statement is by induction on the definition of α-conversion. The second statement is easy, except for symmetry, which is treated by induction on the definition of α-conversion. [Exercise] Complete the details.

Operational Semantics I

Definition 2.7 (β-reduction)
Any term of the form (λx . M) N is called a β-redex and the term M[N/x] is called its contractum. A term P β-reduces to Q in one step, notation P ▷1β Q, iff P contains an occurrence of a β-redex (λx . M) N, while Q contains, in the same place, its contractum M[N/x]. In formal terms, if P ≡ P′[(λx . M) N/z] and z occurs only once in P′, then Q ≡ P′[(M[N/x])/z].
We say that P β-reduces to Q, notation P ▷β Q, iff there exists a finite sequence of terms R1, . . . , Rn such that R1 ≡ P, Rn ≡ Q and, for every i in 1 . . . (n − 1), either Ri ▷1β Ri+1 or Ri ≡α Ri+1.

Operational Semantics II

Definition 2.8 (β-normal form)
A term P which contains no β-redex is said to be in β-normal form (β-nf for short); a term Q in β-nf is called a β-normal form of P if P ▷β Q.

The idea behind β-nfs is that a program computes by β-reduction. A program terminates when it has nothing more to compute, i.e., when it is a β-nf. A program P on input M generates the λ-term P M, which computes by β-reduction until it produces a β-nf R, which is its output.

Operational Semantics III

There are terms without a normal form: for example, Ω ≡ (λx . x x)(λx . x x) cannot reduce to anything but itself. The order of reductions matters: for example, calling K ≡ (λx , y . x), the term K K Ω has K as β-nf, but there is also a non-terminating sequence of reductions which insists on contracting the useless Ω.

References and Hints

This lesson is based on Chapter 1 of [HS]. Some definitions and notations are slightly different, although equivalent. Remember that students are required to compute β-reductions without effort, and thus you are invited to do exercises for this purpose, which can be found in any textbook. Also, try to understand why the definitions have been formalised as shown, and what the limit cases are. Finally, do and redo the proofs until their logic is clear to you. They are required to pass the examination and, most of all, what is evaluated is your understanding rather than your memory.

Fundamentals of Functional Programming Lecture 3


Outline

So far we have introduced the syntax of the pure λ-calculus. Also, we stipulated that programs are identified modulo a renaming of local variables (α-conversion). Finally, we have shown how a λ-term computes by means of β-reduction.
In this lecture, we want to prove that β-reduction acts as a “well-behaved” computational paradigm. Moreover, we want to define a notion of equality that identifies terms which produce the same “output”.

Confluence I

β-reduction behaves naturally with respect to substitution.

Lemma 3.1
1. If P ▷β Q then FV(Q) ⊆ FV(P);
2. if P ▷β P′ and Q ▷β Q′ then P[Q/x] ▷β P′[Q′/x].

Proof.
(1) First, we prove that, if M ▷1β N, then FV(N) ⊆ FV(M). So, let M ≡ M′′[(λx . M′) N′/z] and N ≡ M′′[(M′[N′/x])/z]. Thus, FV(N) ⊆ (FV(M′′) \ { z }) ∪ (FV(M′) \ { x }) ∪ FV(N′) = FV(M). Containment can be strict, e.g., consider M ≡ (λx . u) v and N ≡ u. Since P ≡ P1 ▷1β P2 ≡α P2′ ▷1β · · · ▷1β Pn ≡α Q, by induction on n, statement (1) holds because α-conversions do not change the set of free variables.

Confluence II

Proof (continued).
(2) By induction on the length of the reduction P ▷β P′, it suffices to prove the cases P ≡ P′ and P ▷1β P′. The former case is evident, since the reduction Q ▷β Q′ can be replicated in the context P[Q/x], yielding P′[Q′/x]. The latter case is by induction on the length of the reduction Q ▷β Q′. Again it suffices to prove the property when P ▷1β P′ and Q ▷1β Q′. It is safe to assume that P contains no bound variables in FV(x Q) (if not, we can take P∗ ≡α P). Hence, P[Q/x] ▷β P[Q′/x] by reducing the substituted occurrences of Q. But every redex which is present in P will also be present in P[Q′/x], so we can contract it.

Confluence III

Theorem (Church-Rosser)
If P ▷β M and P ▷β N, then there exists a term T such that M ▷β T and N ▷β T.

The property stated in the Church-Rosser Theorem is called confluence. An immediate consequence is that if P has a β-nf Q, then it is unique modulo α-conversion. The meaning of confluence is that a program has a unique output, if any, and every reduction strategy which terminates yields it.

Confluence IV

To prove the Church-Rosser Theorem we need a number of auxiliary definitions and lemmas.

Definition 3.2 (Marked λ-terms)
Call Λ the set of λ-terms. The set of marked λ-terms, Λ∗, is the minimal set such that:
1. if x ∈ V then x ∈ Λ∗;
2. if M, N ∈ Λ∗ then (M N) ∈ Λ∗;
3. if x ∈ V and M ∈ Λ∗ then (λx . M) ∈ Λ∗;
4. if x ∈ V and M, N ∈ Λ∗ then ((λ∗x . M) N) ∈ Λ∗.

Thus, marked λ-terms have some of their redexes marked with a ∗.

Confluence V

Definition 3.3 (β∗-reduction)
Any term in Λ∗ of the form (λx . M) N or (λ∗x . M) N is called a β∗-redex and the term M[N/x] is called its contractum. A term P β∗-reduces to Q in one step, notation P ▷1β∗ Q, iff P contains an occurrence of a β∗-redex (λx . M) N or (λ∗x . M) N, while Q contains in the same place its contractum M[N/x]. We say that P β∗-reduces to Q, notation P ▷β∗ Q, iff there exists a finite sequence of terms R1, . . . , Rn such that R1 ≡ P, Rn ≡ Q and, for every i in 1 . . . (n − 1), Ri ▷1β∗ Ri+1 or Ri ≡α Ri+1.

This definition is the same as ▷β, extended to marked terms.

Confluence VI

Definition 3.4 (Substitution in Λ∗)
Substitution in Λ∗ is defined as in Λ, plus the following rules:
■ (λ∗x . P)[N/x] ≡ λ∗x . P;
■ (λ∗y . P)[N/x] ≡ λ∗y . P if x ≢ y and x ∉ FV(P);
■ (λ∗y . P)[N/x] ≡ λ∗y . P[N/x] if x ≢ y, x ∈ FV(P) and y ∉ FV(N);
■ (λ∗y . P)[N/x] ≡ λ∗z . (P[z/y])[N/x] if x ≢ y, x ∈ FV(P), y ∈ FV(N) and z ∉ FV(N P).

Again, this definition is the same as the one for λ-terms, extended to marked terms.

Confluence VII

Definition 3.5 (Forgetful map)
The function | · | : Λ∗ → Λ maps every marked λ-term into the corresponding unmarked term.

Definition 3.6 (Contraction map)
The function φ : Λ∗ → Λ maps every marked λ-term into the term where its marked redexes are contracted:
■ φ(x) = x;
■ φ(M N) = φ(M) φ(N);
■ φ(λx . M) = λx . φ(M);
■ φ((λ∗x . M) N) = φ(M)[φ(N)/x].

Confluence VIII

Lemma 3.7
For all M, N ∈ Λ and M′ ∈ Λ∗, if |M′| = M and M ▷β N, then there is N′ ∈ Λ∗ such that M′ ▷β∗ N′ and |N′| = N.

Proof.
By induction on the length n of M ▷β N. If n = 0, M ≡ N, so N′ ≡ M′. If n = 1, either N is obtained by contracting a redex in M, and then N′ can be obtained by contracting the same redex in M′, or M ≡α N, and then N′ ≡α M′. If n > 1, the conclusion follows by transitivity.

Confluence IX

Lemma 3.8
Let M, M′, N, L ∈ Λ∗. Then:
1. if x, y ∈ V with x ≢ y and x ∉ FV(L), then (M[N/x])[L/y] = (M[L/y])[N[L/y]/x];
2. φ(M[N/x]) = φ(M)[φ(N)/x];
3. if M ▷β∗ N then φ(M) ▷β φ(N).

Proof.
(1) By induction on the structure of M. (2) By induction on the structure of M, using (1) when M ≡ (λ∗y . P) Q. (3) By induction on the length of the reduction, using (2). [Exercise] Fill in the details.

Confluence X

Lemma 3.9
If M ∈ Λ∗ then there is a reduction |M| ▷β φ(M).

Proof.
By induction on the structure of M.

Lemma 3.10 (Strip lemma)
Let M, N1, N2 ∈ Λ. If M ▷1β N1 and M ▷β N2, then there is N3 ∈ Λ such that N1 ▷β N3 and N2 ▷β N3.

Confluence XI

Proof.
Since M ▷1β N1, the redex R ≡ (λx . P) Q occurs in M and gets contracted in N1. Let M′ ∈ Λ∗ be obtained by replacing R in M with R′ ≡ (λ∗x . P) Q. Then |M′| = M and φ(M′) = N1. By Lemma 3.7 there is N2′ ∈ Λ∗ such that |N2′| = N2 and M′ ▷β∗ N2′. By Lemma 3.8, there is N3 ∈ Λ such that N3 = φ(N2′) and N1 ▷β N3. Finally, by Lemma 3.9, it holds that N2 ▷β N3.

Confluence XII

Theorem 3.11 (Church-Rosser)
If P ▷β M and P ▷β N, then there exists a term T such that M ▷β T and N ▷β T.

Proof.
By induction on the length of the reduction P ▷β M, applying the Strip Lemma at each step: each one-step reduction P ▷1β Pi+1 on the way to M is completed, strip by strip, into a common reduct of M and N. [Exercise] Fill in the details.

β-equality I

The notion of β-reduction naturally generates an equality between terms:

Definition 3.12 (β-equality)
Given M, N ∈ Λ, we say that M =β N, spelt as M is β-equal or β-convertible to N, iff there is a finite sequence P1, . . . , Pn of terms such that P1 ≡ M, Pn ≡ N and, for every i < n, Pi ▷1β Pi+1, or Pi+1 ▷1β Pi, or Pi ≡α Pi+1.

Two terms are β-equal if each can be reached from the other by a chain of reductions performed in either direction, modulo a suitable renaming. If we think of terms as stages of a computation, two terms are equal if they are different stages of the same computation.

β-equality II

Lemma 3.13
β-equality is an equivalence relation.

Proof.
Evident.

Lemma 3.14
If M =β M′ and N =β N′ then M[N/x] =β M′[N′/x].

Proof.
By induction on the definition of M =β M′, via Lemma 3.1.

Thus, β-equality is a congruence, i.e., it is stable under substitution.

β-equality III

Theorem 3.15 (Church-Rosser)
If M =β N then there is a term L such that M ▷β L and N ▷β L.

Proof.
By induction on the generation of =β. Let P1, . . . , Pn be a sequence such that P1 ≡ M, Pn ≡ N and, for every i < n, Pi ▷1β Pi+1, or Pi+1 ▷1β Pi, or Pi ≡α Pi+1. The case n = 1 is trivial. For 1 ≤ i < n, the induction hypothesis gives Ti such that M ▷β Ti and Pi ▷β Ti, since M =β Pi. If Pi ▷1β Pi+1, then Theorem 3.11 provides the required Ti+1 such that M ▷β Ti+1 and Pi+1 ▷β Ti+1. Otherwise, if Pi+1 ▷1β Pi or Pi ≡α Pi+1, then Ti+1 ≡ Ti satisfies the statement.

β-equality IV

Corollary 3.16
1. If P =β Q and Q is a β-nf, then P ▷β Q;
2. if P =β Q and P and Q are β-nfs, then P ≡α Q;
3. if P =β Q then either P and Q have the same β-nf (modulo α-conversion), or they both have no normal form;
4. a term P has at most one normal form, modulo α-conversion.

Proof.
Since β-nfs have no redexes, (1) and (2) are immediate. (3) is a direct consequence of β-equality being an equivalence relation; (4) is evident from (1) and (2).

References and Hints

This lesson is based on Chapter 1 of [HS]. The proof of the Church-Rosser Theorem can be found in Appendix 2 of the same book. Exercises are required during examinations: most of them are routine proofs which you should be able to perform with minimal effort. Finally, get familiar with this style of presenting statements and proofs: although somewhat difficult in the beginning, it is the standard way to present mathematical results, and it is adopted in any (decent) book.

Fundamentals of Functional Programming Lecture 4


Outline

We have proved that the λ-calculus is confluent: if a term reduces in two different ways, the resulting terms can be further reduced to a common result. As a consequence, the output of the computation on a λ-term is unique, if it exists, i.e., if the computation terminates.
The different possibilities in reducing a term are a consequence of the non-deterministic nature of β-reduction. We can force a deterministic behaviour by fixing a strategy, that is, an algorithm which, given a λ-term, chooses a redex to contract.

Combinators I

Definition 4.1 (Combinator)
A combinator is a λ-term M such that FV(M) = ∅.

Combinators are especially important in the λ-calculus since they resemble programs: they are applied to inputs in order to produce outputs. In this view, λ-terms which are not combinators are thought of as intermediate steps in a computation process.

Combinators II

Important combinators are given special names. The following ones are the most used:
■ I = λx . x — identity: I(x) = x;
■ K = λx , y . x — constant functions: Ka(x) = K(a)(x) = a;
■ S = λx , y , z . x z (y z) — stronger composition: S(f , g)(x) = f (x , g(x));
■ B = λx , y , z . x (y z) — function composition: B(f , g)(x) = f (g(x));
■ B′ = λx , y , z . y (x z) — reversed composition: B′(f , g)(x) = g(f (x));
■ C = λx , y , z . x z y — commuting operator: C(f)(x , y) = f (y , x);
■ W = λx , y . x y y — doubling operator: W(f)(x) = f (x , x);
■ Ω = (λx . x x)(λx . x x) — non-terminating operator.
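Since combinators are closed terms, they can be tried out directly as curried functions in any language with first-class functions. The following Python sketch is illustrative (the names mirror the list above; Ω is deliberately omitted, since evaluating it would not terminate):

```python
# The named combinators as curried Python lambdas, one argument at a time.
I = lambda x: x
K = lambda x: lambda y: x
S = lambda x: lambda y: lambda z: x(z)(y(z))
B = lambda x: lambda y: lambda z: x(y(z))        # composition
B_ = lambda x: lambda y: lambda z: y(x(z))       # B′, reversed composition
C = lambda x: lambda y: lambda z: x(z)(y)
W = lambda x: lambda y: x(y)(y)

# S K K behaves like the identity I, the β-equality noted on the next slide:
print(S(K)(K)(42))               # → 42
print(B(str)(len)(["a", "b"]))   # prints 2  (str applied to len of the list)
```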

Combinators III

It is possible to show that only S and K are needed to build all the listed combinators: for example, I =β S K K. As a matter of fact, it is possible to prove that the minimal set of terms closed under application and containing S and K suffices to represent the whole λ-calculus, up to β-equality. Thus, the simplest “programming language” is:
■ S and K are “programs”;
■ if M and N are “programs”, then so is (M · N);

along with the reduction rules:
■ K x y ▷ x for all “programs” x, y;
■ S x y z ▷ (x z)(y z) for all “programs” x, y and z.

Combinators IV

Since the λ-calculus is Turing-complete, i.e., every computable function can be represented in this system, it follows that the “simplest programming language” is Turing-complete as well. We will prove later that the λ-calculus is Turing-complete.
The “simplest programming language” is called Combinatory Logic and, despite its apparent simplicity, it has a deep and involved theory. In practice, Combinatory Logic is often used to compile functional programs into highly efficient intermediate code, which is then executed by a very fast virtual machine based on the fundamental combinators.

Combinators V

It is possible to simplify Combinatory Logic even further, using just one combinator, U = λx . x K S K. The corresponding reduction rule is U x ▷ x ((U U) U)(U (U U))((U U) U). It is easy to prove the equivalence of Combinatory Logic with this reduced system, noticing that (U U) U =β K and U (U U) =β S. [Exercise] Prove the β-equivalences.
We will not develop Combinatory Logic any further in this course: we will just develop some aspects of interest of the theory of combinators inside the λ-calculus.

Fixed Points I

In general, given a transformation f, a fixed point of f is any value x such that x = f (x). For example, the function g(x) = x² from real numbers to real numbers has two fixed points, 0 and 1. In the λ-calculus:

Theorem 4.2 (Fixed-point)
There is a combinator Y such that Y x ▷β x (Y x).

Proof (by Alan Turing).
Let U ≡ λu , x . x (u u x) and let Y ≡ U U. So Y x ≡ (λu , x . x (u u x)) U x, by definition of U; thus Y x ▷β ((λx . x (u u x))[U/u]) x ≡ (λx . x (U U x)) x ▷β x (U U x) ≡ x (Y x).
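Turing's combinator Y ≡ U U from the proof can be run in Python, with one caveat: Python evaluates arguments eagerly (call-by-value), so the inner self-application must be delayed behind a lambda (η-expanded), otherwise constructing Y would already diverge. This is an illustrative sketch, not part of the course material:

```python
# Turing's fixed-point combinator Y ≡ U U, with U ≡ λu.λx. x (u u x).
# The extra `lambda v: ...` wrapper (η-expansion) is needed because Python
# is call-by-value; without it, u(u)(x) would be evaluated forever.
U = lambda u: lambda x: x(lambda v: u(u)(x)(v))
Y = U(U)

# Y f reduces to f (Y f): use it to tie the knot of a recursive definition.
fact = Y(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
print(fact(5))  # → 120
```

This illustrates the computational content of Theorem 4.2: recursion is definable, not primitive, in the λ-calculus.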

Fixed Points II

Corollary 4.3
1. There is a combinator Y such that Y x =β x (Y x);
2. the equation M X =β X has X ≡ Y M as a solution;
3. the equation x y1 . . . yn =β Z, with y1, . . . , yn variables and Z a term, can be solved for x, i.e., there is a term X such that X y1 . . . yn =β Z[X/x].

Proof.
(1) and (2) are evident. For (3), choose X ≡ Y (λx , y1 , . . . , yn . Z).

Reduction Strategies I

Definition 4.4 (Contraction)
Given a λ-term X, a contraction in X is a triple 〈X , R , C〉 where R is a redex occurrence in X and C is the result of contracting R in X. A contraction is written as X ▷R C.

Definition 4.5 (Reduction)
A reduction ρ is a finite or infinite sequence of contractions separated by α-conversions:
ρ ≡ (X1 ▷R1 Y1 ≡α X2 ▷R2 Y2 ≡α X3 ▷R3 Y3 ≡α . . . )
The start of ρ is X1 and the length of ρ is the number of its contractions. If the length of ρ is finite, say n, then Xn+1 is the terminus of ρ.

Reduction Strategies II

Contractions and reductions are nothing more than a formal way to mark reductions explicitly and to study them.

Definition 4.6 (Maximal reduction)
A reduction ρ is maximal iff either it has infinite length, or its terminus contains no redexes.

Thus, a maximal reduction represents a complete computation from its start. We want to classify reductions in order to study their properties and to devise a sound and effective strategy which always produces an output, whenever one exists.

Reduction Strategies III

Definition 4.7 (Leftmost reduction)
An occurrence of a redex R1 in a term X1 is called maximal iff it is not contained in any other redex occurrence in X1. It is called leftmost maximal iff it is the leftmost of the maximal redex occurrences in X1. A maximal reduction ρ such that, for each index i, the contracted redex occurrence Ri is leftmost maximal in Xi, is called the (unique) leftmost reduction of X1.

The leftmost reduction of X operates by searching, from left to right, for the first maximal redex in X and contracting it, repeating until no redexes remain in X.

Reduction Strategies IV

Definition 4.8 (Quasi-leftmost reduction)
A quasi-leftmost reduction ρ of a term X1 is a maximal reduction such that, for each index i, if Xi is not the terminus, then there exists j ≥ i such that Rj is leftmost maximal.

An infinite reduction is quasi-leftmost iff infinitely many of its contractions are leftmost maximal, spread across the whole computation. An infinite reduction is leftmost iff every redex contracted in it is leftmost maximal. A finite reduction is leftmost iff every redex contracted in it is leftmost maximal, and it is quasi-leftmost iff any non-leftmost-maximal contraction is followed, sooner or later, by a leftmost maximal one.

Theorem 4.9 (Quasi-leftmost reduction) If a λ-term X has a normal form X ∗ , then every quasi-leftmost reduction of X is finite and its terminus is X ∗ . [Proof not required] From a computational point of view, the meaning of this result is that any quasi-leftmost reduction on X strategy terminates whenever the program X terminates.

64 of 472
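The leftmost strategy of Definition 4.7 can be sketched as a small interpreter. The encoding below is illustrative and deliberately rough: terms are nested tuples, and substitution is naive (no capture avoidance), which is safe for the simple closed terms tested here but not in general (compare Definition 2.4).

```python
# A rough sketch of leftmost (normal-order) reduction on λ-terms encoded as
# tuples: ('var', x), ('lam', x, body), ('app', f, a).
def subst(m, x, n):
    """Naive m[n/x]: fine for the simple closed terms used below."""
    tag = m[0]
    if tag == 'var':
        return n if m[1] == x else m
    if tag == 'lam':
        return m if m[1] == x else ('lam', m[1], subst(m[2], x, n))
    return ('app', subst(m[1], x, n), subst(m[2], x, n))

def step(m):
    """Contract the leftmost maximal redex; return None if m is in β-nf."""
    if m[0] == 'app':
        f, a = m[1], m[2]
        if f[0] == 'lam':               # the whole application is the redex
            return subst(f[2], f[1], a)
        s = step(f)                     # otherwise search left...
        if s is not None:
            return ('app', s, a)
        s = step(a)                     # ...then right
        return None if s is None else ('app', f, s)
    if m[0] == 'lam':
        s = step(m[2])                  # reduce under the abstraction
        return None if s is None else ('lam', m[1], s)
    return None                         # a variable has no redex

def normalize(m, fuel=1000):
    while fuel > 0:
        n = step(m)
        if n is None:
            return m
        m, fuel = n, fuel - 1
    raise RuntimeError("no β-nf found within fuel")

K = ('lam', 'x', ('lam', 'y', ('var', 'x')))
omega = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))
Omega = ('app', omega, omega)
# K z Ω: the leftmost strategy discards Ω and terminates with z.
print(normalize(('app', ('app', K, ('var', 'z')), Omega)))  # → ('var', 'z')
```

A strategy that contracted inside Ω first would loop forever on the same term, which is exactly the situation Theorem 4.9 rules out for (quasi-)leftmost reductions.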

Reduction Strategies VI

There are other reduction strategies, whose motivation is not mathematical but technical. The call-by-value strategy operates as follows: a term X is represented as a tree and the strategy evaluates it. Evaluation proceeds as follows:
■ if the root element is a variable or an abstraction, then the term evaluates to itself;
■ if the root is an application, then the argument (right branch) is evaluated first, then the function (left branch) is evaluated and, finally, if the left branch is a λ-abstraction, the tree is contracted.

Reduction Strategies VII The call-by-value strategy is natural and it allows for a simple implementation. But it may not terminate even if the term has a β-nf. Most programming languages, e.g., ML, use call-by-value.

Example 4.10 Consider the term K(I w )Ω ≡ (λx , y . x)((λz . z) w )((λu . u u)(λu . u u)). The leftmost strategy yields K(I w )Ω B (λy . I w )Ω B (λz . z)w B w , which is in β-nf. The call-by-value strategy yields K(I w )Ω B K w Ω B (λy . w )Ω B (λy . w )Ω B . . . forever.

66 of 472
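The divergence in Example 4.10 can be reproduced in any call-by-value language by modelling call-by-name arguments as thunks. A sketch in Python (the host language and the names K, I, omega are our choices, not part of the lecture's formalism):

```python
K = lambda x: lambda y: x        # K ≡ λx,y. x
I = lambda z: z                  # I ≡ λz. z

def omega():
    # Ω ≡ (λu. u u)(λu. u u): forcing this thunk never terminates.
    return (lambda u: u(u))(lambda u: u(u))

# Call-by-name: pass Ω unevaluated; K discards its argument, so we terminate.
result = K(I("w"))(omega)        # the thunk `omega` is never called
print(result)                    # → w
```

Calling `K(I("w"))(omega())` instead, which is Python's native call-by-value, would evaluate Ω before K ever runs, and loop forever.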

Reduction Strategies VIII

The call-by-name strategy is similar to the call-by-value strategy: it operates in the same way, except that arguments are not evaluated but directly substituted. Call-by-name always terminates if the term being reduced has a β-nf. But it may produce a result which is not in β-nf, since it does not operate inside abstractions.
Some programming languages, like Haskell, are based on a variation of call-by-name which optimises repeated reductions.

67 of 472

References and Hints

This lesson is based on Chapter 3 of [HS]. References to the call-by-value and call-by-name strategies can be found in Chapter 5 of [Pierce]. For a general overview of reduction strategies, consult http://en.wikipedia.org/wiki/Evaluation_strategy.
The omitted proof of the theorem on quasi-leftmost reductions can be found in the classical text [Bar]. This beautiful book is the standard reference for the untyped λ-calculus, but it is very mathematical and somewhat expensive.

68 of 472

Fundamentals of Functional Programming Lecture 5 — Intermezzo


Outline

Till now, we have studied λ-calculus from a mathematical perspective: we have shown that it is confluent and we have devised strategies to make evaluation of terms deterministic.
The aim of this lesson is to show how a programming language can be developed inside λ-calculus. Specifically, we want to develop the basic structure of a functional language, like ML, as an instance of λ-calculus.
Although this is not perfectly correct, as ML is based on a typed calculus, it shows how one can effectively program using a rather abstract system like the one we have studied so far.

70 of 472

Data Structures I A data structure is a way to represent data in a program. Most data structures can be coded as the set of logical terms generated by a fixed multi-sorted signature. Each instance of the data structure is, then, a logical term.

Definition 5.1 (Data structure)
A data structure D is a pair D = 〈S, F〉 of finite sets of symbols, where S is called the set of sorts and F the set of functions. Each element f ∈ F is decorated with a type, f : s1 × · · · × sn → s, such that s, s1, . . . , sn ∈ S with n ≥ 0.
A sort t ∈ S is a parameter sort if, for every f : s1 × · · · × sn → s ∈ F, s ≢ t.

71 of 472

Data Structures II

Every data structure D can be represented, that is, every term constructed from the language D can be uniquely written as a λ-term in a way that allows us to manage the term and its information content. The representation we use is a type-free version of a result due to Corrado Böhm and Alessandro Berarducci.

Definition 5.2 (Representation of data structures)
Let D = 〈S, F〉 be a data structure, where F = { f1, . . . , fn }. A term t on D has the form f(t1, . . . , tm) with f : s1 × · · · × sm → s and t1, . . . , tm terms of type s1, . . . , sm, respectively. This term is represented by a λ-term

  t ≡ f t1 . . . tm

where f ≡ fi for some index i and

  f ≡ λx1, . . . , xm, f1, . . . , fn. fi s1 . . . sm

with si ≡ xi if si is a parameter sort, and si ≡ (xi f1 . . . fn) otherwise.

72 of 472

Booleans I Booleans form the data structure Bool = 〈{ bool }, { true : bool, false : bool }〉 .

Definition 5.3 (Booleans)
We encode booleans as λ-terms following the general rule. The result is:
■ true ≡ λx, y. x;
■ false ≡ λx, y. y.

Here and in the following, we omit underlines when it is clear from the context whether we are dealing with a logical term or with a λ-term.
[Exercise] Check that the representation is correct.

73 of 472

Booleans II

Since, in our representation, a constant c : s ∈ F is represented as c ≡ λf1, . . . , fn. c, i.e., as the i-th projector, it follows that

  if ≡ λp, x, y. p x y

reduces to a selector when its first argument is a boolean. In fact, (if true a b) =β a and (if false a b) =β b.
By a simple syntactical transformation we get the more usual if c then x else y ≡ if c x y, which is present in most functional languages. Notice how the “else” branch is mandatory.

74 of 472

Booleans III

With these tools, it is easy to define the usual operations on booleans:
■ and ≡ λx, y. if x y false;
■ or ≡ λx, y. if x true y;
■ not ≡ λx. if x false true.

[Exercise] Check that the representation is correct by deriving the truth tables of these operations via β-reduction.

75 of 472
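The boolean encodings of the last three slides can be checked mechanically in any language with first-class functions. A Python sketch (our transcription; decode and the trailing underscores are our additions, the latter to avoid Python keywords):

```python
# Church booleans, rendered verbatim as Python lambdas.
true  = lambda x: lambda y: x                 # true  ≡ λx,y. x
false = lambda x: lambda y: y                 # false ≡ λx,y. y
if_   = lambda p: lambda x: lambda y: p(x)(y) # if ≡ λp,x,y. p x y

and_ = lambda x: lambda y: if_(x)(y)(false)   # and ≡ λx,y. if x y false
or_  = lambda x: lambda y: if_(x)(true)(y)    # or  ≡ λx,y. if x true y
not_ = lambda x: if_(x)(false)(true)          # not ≡ λx. if x false true

# Decode a Church boolean into a native bool, for inspection only.
decode = lambda b: b(True)(False)

assert if_(true)("then")("else") == "then"
assert decode(and_(true)(false)) is False
assert decode(or_(false)(true)) is True
assert decode(not_(true)) is False
```

The asserts β-reduce (here: call) the encodings and derive the expected truth-table entries, which is exactly the exercise on the slide.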

Enumerations I

Enumerations are coded by the following data structure Enum = 〈{ enum }, { e1 : enum, . . . , en : enum }〉 .

Definition 5.4 (Enumerations)
We encode the elements of an enumeration as λ-terms following the general rule. The result is ei ≡ λx1, . . . , xn. xi. Notice that booleans are a special case of enumerations.

76 of 472

Enumerations II

As we did for booleans, we can introduce a selector for enumerations:

  case ≡ λp, x1, . . . , xn. p x1 . . . xn ,

which has the property that case ei a1 . . . an =β ai. So the syntax

  case e
    e1 : a1
    ...
    en : an
  end

is a human-readable presentation of the λ-term (case e a1 . . . an).

77 of 472

Tuples

Tuples are instances of the family of data structures:

  A1 × · · · × An = 〈{ A1, . . . , An, T }, { tuple : A1 × · · · × An → T }〉 .

Definition 5.5 (Tuples)
We encode the tuple constructor as a λ-term following the general rule: tuple ≡ λx1, . . . , xn, y. y x1 . . . xn.

If we call i-th ≡ λy. y (λx1, . . . , xn. xi), then it is immediate to show that i-th (tuple x1 . . . xn) =β xi. Since records are just tuples with named projectors, their representation is immediate.

78 of 472
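For the pair case (n = 2), the tuple representation can be tried out directly. A Python sketch (tuple2, fst and snd are our names for tuple, 1-th and 2-th):

```python
# Church pairs following Definition 5.5, with n = 2.
tuple2 = lambda x1: lambda x2: lambda y: y(x1)(x2)  # tuple ≡ λx1,x2,y. y x1 x2
fst = lambda y: y(lambda x1: lambda x2: x1)         # 1st ≡ λy. y (λx1,x2. x1)
snd = lambda y: y(lambda x1: lambda x2: x2)         # 2nd ≡ λy. y (λx1,x2. x2)

p = tuple2("a")("b")
assert fst(p) == "a" and snd(p) == "b"
```

A record is the same term with fst and snd renamed to the field projectors, which is why the slide calls their representation immediate.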

Natural numbers I The natural numbers can be encoded in many ways. The most elegant one is due to Alonzo Church. It naturally follows from the general representation we adopted. Natural numbers are the elements of the data structure Nat ≡ 〈{ N }, { suc : N → N , 0 : N }〉 .

Definition 5.6 (Natural numbers)
Following the general rule, we get:
■ 0 ≡ λx, y. y;
■ suc ≡ λx, y, z. y (x y z).

79 of 472

Natural numbers II

Let f^n(x) be an abbreviation for f(. . . (f x) . . . ), where f occurs n times. Then, the number k is coded as the λ-term k ≡ suc^k(0) =β λx, y. x^k(y), as it is almost immediate to prove by induction on k.
Moreover, testing for equality with 0 is easy:

  iszero ≡ λn. n (λx. false) true ,

which has the property that iszero 0 =β true and iszero (suc n) =β false.
[Exercise] Verify the β-equivalences.

80 of 472
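Church numerals and iszero behave as claimed; a Python sketch (our transcription; church and to_int are our helpers for moving between Python integers and numerals):

```python
# Church numerals from Definition 5.6.
zero = lambda x: lambda y: y                    # 0 ≡ λx,y. y
suc  = lambda n: lambda y: lambda z: y(n(y)(z)) # suc ≡ λx,y,z. y (x y z)

true  = lambda x: lambda y: x
false = lambda x: lambda y: y
iszero = lambda n: n(lambda x: false)(true)     # iszero ≡ λn. n (λx. false) true

# Helpers (ours): build suc^k(0) and decode k back via k (+1) 0 = (+1)^k(0).
church = lambda k: zero if k == 0 else suc(church(k - 1))
to_int = lambda n: n(lambda m: m + 1)(0)

assert to_int(church(5)) == 5
assert iszero(zero)("yes")("no") == "yes"
assert iszero(suc(zero))("yes")("no") == "no"
```

The last two asserts are exactly the two β-equivalences of the exercise, evaluated on concrete terms.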

Natural numbers III

Basic arithmetical operations are easily defined:
■ add ≡ λx, y, u, v. x u (y u v);
■ mult ≡ λx, y, z. x (y z);
■ expt ≡ λx, y, u, v. y x u v.

Obviously, there is a trivial translation from the usual syntax to combinators:
■ x + y ≡ add x y;
■ x · y ≡ mult x y;
■ x^y ≡ expt x y.

It is simple to verify that (add m n) =β h iff h = m + n is true in arithmetic. Also the other obvious identities are proved similarly. 81 of 472
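These three combinators can be tested on concrete numerals. A Python sketch, reusing the encoding of the previous slides (church and to_int are our decoding helpers, not part of the lecture):

```python
# Church numerals, as before.
zero = lambda x: lambda y: y
suc  = lambda n: lambda y: lambda z: y(n(y)(z))
church = lambda k: zero if k == 0 else suc(church(k - 1))
to_int = lambda n: n(lambda m: m + 1)(0)

# The arithmetic combinators of the slide, transcribed verbatim.
add  = lambda x: lambda y: lambda u: lambda v: x(u)(y(u)(v))  # λx,y,u,v. x u (y u v)
mult = lambda x: lambda y: lambda z: x(y(z))                  # λx,y,z. x (y z)
expt = lambda x: lambda y: lambda u: lambda v: y(x)(u)(v)     # λx,y,u,v. y x u v

m, n = church(3), church(4)
assert to_int(add(m)(n)) == 7
assert to_int(mult(m)(n)) == 12
assert to_int(expt(church(2))(church(5))) == 32
```

Decoding the results confirms (add m n) =β m + n and the analogous identities for mult and expt.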

Lists I The data structure of lists over a given type A is List(A) ≡ 〈{ A, L }, { cons : A × L → L, nil : L }〉 .

Definition 5.7 (Lists)
Following the general rule, we get:
■ nil ≡ λx, y. y;
■ cons ≡ λx, y, u, v. u x (y u v).

The general shape of a list [a1 , . . . , an ] ≡ cons a1 (cons a2 . . . (cons an nil) . . . ) , after a suitable β-reduction, is λx , y . x a1 (x a2 (. . . (x an y ) . . . )) as one can verify by induction on n. 82 of 472
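By the general shape just computed, a Church list is in effect its own right fold: applying [a1, . . . , an] to u and v yields u a1 (u a2 (. . . (u an v) . . . )). A Python sketch (our transcription):

```python
# Church lists from Definition 5.7.
nil  = lambda x: lambda y: y                                  # nil ≡ λx,y. y
cons = lambda x: lambda y: lambda u: lambda v: u(x)(y(u)(v))  # λx,y,u,v. u x (y u v)

K  = lambda a: lambda r: a     # K ≡ λx,y. x
hd = lambda l: l(K)(nil)       # hd ≡ λx. x K nil (from the next slide)

l = cons(1)(cons(2)(cons(3)(nil)))          # [1, 2, 3]

# Folding with u = (+) and v = 0 sums the list; folding with list-cons decodes it.
assert l(lambda a: lambda r: a + r)(0) == 6
assert l(lambda a: lambda r: [a] + r)([]) == [1, 2, 3]
assert hd(l) == 1
```

The hd assert checks the identity hd (cons a l) =β a stated on the following slide.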

Lists II

It is easy to define the head operation, calculating the first element of a given list: hd ≡ λx. x K nil, and it holds that hd (cons a l) =β a.
On the other hand, defining null and tl (for tail) such that null nil =β true, null (cons a l) =β false, and tl (cons a l) =β l, is not immediate and requires some additional knowledge. Specifically, one should write a loop over the internal structure of the λ-term representing the list.
It is evident that all the usual data structures can be encoded in Λ following the rules we have shown. In fact, it is even possible to define a generic datatype construction which automatically compiles into the proper representation and which allows terms to be inductively deconstructed in a standard way. For example, ML has the datatype constructor for such purposes.

83 of 472

Control Structures

In the usual programming paradigms, we have three main control structures: sequence, selection and iteration. In the λ-calculus, as well as in most functional languages, we have the very same control structures, although they are “built-in” in the properties of β-reduction.
For instance, “sequence” is encoded by the B combinator and “selection” by the if combinator operating on booleans. “Iteration”, in its purest form, is controlled by fixed points, and thus implemented via the Y combinator.
No real programming language directly uses this approach: in functional programming, recursion is used instead of explicit iteration and, behind the scenes, it reduces to a clever application of the Y combinator.

84 of 472

References and Hints

The representation of data structures as λ-terms is derived from: Alessandro Berarducci and Corrado Böhm, Automatic Synthesis of Typed Lambda-Programs on Term Algebras, Theoretical Computer Science 39 (1985) 135–154. It is also illustrated in [Bar2].
The examples of representation are, more or less, standard. Most of them can be found in [Paulson] Chapter 9, and in [Pierce] Chapter 5.2. Church numerals are defined in Chapter 4 of [HS].
[Paulson] is also a reference for the datatypes in ML, in particular Chapters 2, 3 and 4.

85 of 472

Fundamentals of Functional Programming Lecture 6


Outline

Apart from iteration, we have seen how the λ-calculus can be used to develop a real functional programming language, with the usual data types and control structures.
But what can we calculate by means of λ-terms? This fundamental question will be answered in this lesson, where we will show that the λ-calculus is Turing-complete.
As a side effect, we will get a uniform way to represent recursive definitions as λ-terms, which solves the open point about iteration.

87 of 472

Representability

Definition 6.1 (Representability)
Let φ be a partial function from Nⁿ into N. A λ-term X is said to represent φ iff, for all m1, . . . , mn ∈ N,
■ if φ(m1, . . . , mn) = p, then X m1 . . . mn =β p;
■ if φ(m1, . . . , mn) is undefined, then X m1 . . . mn has no β-nf.

In the previous statements, m is the Church numeral representing the natural number m. It is easy to check that the combinators suc, add, mult, and expt, as previously defined, represent the corresponding operations.

88 of 472

Primitive recursive functions I

Definition 6.2 (Primitive recursive functions)
The set of primitive recursive functions on natural numbers is defined by induction:
■ The successor function suc is primitive recursive;
■ The number 0 is primitive recursive;
■ For each n ≥ 1 and k ≤ n, the projection function Πⁿₖ is primitive recursive, where Πⁿₖ(m1, . . . , mn) = mk;
■ If n, p ≥ 1 and ψ, ξ1, . . . , ξp are primitive recursive, then so is φ defined by composition:
  φ(m1, . . . , mn) = ψ(ξ1(m1, . . . , mn), . . . , ξp(m1, . . . , mn)) ;

89 of 472

↪

Primitive recursive functions II

↪ (Primitive recursive functions)
■ If ψ and ξ are primitive recursive, then so is φ defined by recursion as follows:
  φ(0, m1, . . . , mn) = ψ(m1, . . . , mn) ,
  φ(k + 1, m1, . . . , mn) = ξ(k, φ(k, m1, . . . , mn), m1, . . . , mn) .

By checking the definition, every primitive recursive function is total.

Example 6.3
The predecessor function pre, defined by pre(0) = 0 and pre(k + 1) = k, is primitive recursive. In fact, by recursion, pre(0) = 0 and pre(k + 1) = Π²₁(k, pre(k)).

90 of 472

Primitive recursive functions III

Theorem 6.4
Every primitive recursive function φ can be represented by a combinator φ.

Proof.
The combinator φ is defined by induction.
■ suc ≡ λu, x, y. x (u x y), as before;
■ 0 ≡ λx, y. y, as before;
■ Πⁿₖ ≡ λx1, . . . , xn. xk;
■ (Composition) φ ≡ λx1, . . . , xn. ψ (ξ1 x1 . . . xn) . . . (ξp x1 . . . xn);
■ (Recursion) φ ≡ λu, x1, . . . , xn. R (ψ x1 . . . xn)(λu, v. ξ u v x1 . . . xn) u, where R is a recursion combinator satisfying R X Y 0 =β X and R X Y (suc^(k+1)(0)) =β Y k (R X Y k).

91 of 472

Primitive recursive functions IV

The recursion combinator R is constructed in steps:
■ Define D ≡ λx, y, z. z (K y) x. It holds that
  D X Y 0 ▷β X ,
  D X Y (k + 1) ▷β Y .
■ Define Q ≡ λy, v. D (suc (v 0)) (y (v 0) (v 1)). It holds that
  Q Y (D n X) ▷β D (n + 1) (Y n X) ,
  (Q Y)^k (D 0 X) ▷β D k W for some term W whose details are irrelevant.
■ Define R ≡ λx, y, u. u (Q y) (D 0 x) 1. It holds that R X Y k ▷β W.

92 of 472

Primitive recursive functions V

Calculating, we get:
■ R X Y 0 ▷β 0 (Q Y) (D 0 X) 1 ▷β D 0 X 1 ▷β X
■ R X Y (k + 1) ▷β (k + 1) (Q Y) (D 0 X) 1
    ▷β (Q Y)^(k+1) (D 0 X) 1
    ▷β (Q Y) ((Q Y)^k (D 0 X)) 1
    ▷β (Q Y) (D k W) 1
    ▷β D (k + 1) (Y k W) 1
    ▷β Y k W

Thus, R X Y (k + 1) =β Y k (R X Y k), as required.

93 of 472

Primitive recursive functions VI

An alternative definition for R uses the fixed-point combinator Y. First, the predecessor function, being primitive recursive, can be represented in λ-calculus, e.g., by the term R 0 K. Let pre be any λ-term representing the predecessor function and consider the equation in R:

  R x y z = if iszero z then x else y (pre z) (R x y (pre z)) .

By the Fixed-Point Theorem, it has a solution R, which is

  R ≡ Y (λu, x, y, z. D x (y (pre z) (u x y (pre z))) z) .

94 of 472
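Under call-by-value, Y itself diverges, so strict languages use the η-expanded variant Z ≡ λf. (λx. f (λv. x x v))(λx. f (λv. x x v)). A Python sketch of solving a recursive equation by a fixed-point combinator (our example; it illustrates the Fixed-Point Theorem at work, not the R construction above):

```python
# Z, the strict fixed-point combinator: Z F =β F (Z F) under call-by-value.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Solve the equation F n = if n = 0 then 1 else n * F(n - 1) as a fixed point.
fact = Z(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
assert fact(5) == 120
```

Exactly as on the slide, the recursive equation is never "solved by hand": the fixed-point combinator manufactures a term satisfying it.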

Primitive recursive functions VII

One can define the cut-off subtraction between natural numbers as

  m − n = p if m ≥ n, where m = n + p,
  m − n = 0 otherwise.

It is an easy exercise (Do it!) to show, by means of the predecessor function, that this subtraction is primitive recursive. So, we can really build a λ-term which computes it.
Having subtraction, we can define the order relations on natural numbers as x < y ≡ iszero(x − y) and x ≥ y ≡ not(x < y). Also, x > y ≡ y < x and x ≤ y ≡ y ≥ x. Finally, equality on naturals is x = y ≡ and (x ≤ y) (y ≤ x).

95 of 472

Iteration I

A non-immediate consequence of the theorems we have proved so far is that we have a construction that allows us to model iteration. In fact, a recursive definition of a function, as far as it is primitive recursive (and this is a “natural” condition), can be represented by a λ-term which uses a recursion combinator.
This fact allows us to model the worst case of procedural programming: the GOTO statement. The following example illustrates the technique, which is easy to generalise and to formalise.
Nevertheless, this way of programming leads to poor style and obscure programs, so it is highly discouraged!

96 of 472

Iteration II

Example 6.5
Consider the following code:

  var x := 0; y := 0; z := 0;
  α: x := x + 1;
  β: if y < z then goto α else y := x + y;
  γ: if z > 0 then begin z := z − x; goto α; end else stop;

In a functional representation:

  α(x, y, z) = β(x + 1, y, z);
  β(x, y, z) = if y < z then α(x, y, z) else γ(x, x + y, z);
  γ(x, y, z) = if z > 0 then α(x, y, z − x) else (x, y, z);

Executing α(0, 0, 0), we get exactly the same computation as for the procedural code. 97 of 472
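The functional representation runs as three mutually recursive functions; a direct Python transcription (a sketch; a Python triple stands for the halting state (x, y, z)):

```python
# Each label of Example 6.5 becomes a function on the whole state (x, y, z).
def alpha(x, y, z):
    return beta(x + 1, y, z)                           # x := x + 1; fall through to β

def beta(x, y, z):
    return alpha(x, y, z) if y < z else gamma(x, x + y, z)  # goto α or y := x + y

def gamma(x, y, z):
    return alpha(x, y, z - x) if z > 0 else (x, y, z)  # goto α or stop

assert alpha(0, 0, 0) == (1, 1, 0)
```

Tracing by hand: α(0,0,0) → β(1,0,0) → γ(1,1,0) → stop, the same computation as the procedural code performs.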

Iteration III

Notice that the presented technique, plus the results on arithmetic we have developed so far, allows us to formalise an assembly language inside the λ-calculus. Since assembly, when full arithmetic is allowed, is Turing-complete, it follows that the λ-calculus is at least as powerful as the class of Turing machines. Or, in other words, everything which is computable is representable in the λ-calculus.
This result can be formalised in a precise way, with the appropriate theorems, but doing so is long and tedious. Thus, since its development would not lead to significant advances in our knowledge, it is left out of our treatment.

98 of 472

Partial recursive functions I

Definition 6.6 (Partial recursive functions)
A function φ from a subset of Nⁿ into N is called partial recursive iff there exist primitive recursive ψ and ξ such that, for all m1, . . . , mn ∈ N,

  φ(m1, . . . , mn) = ψ(µk[ξ(m1, . . . , mn, k) = 0]) ,

where µk[ξ(m1, . . . , mn, k) = 0] is the least k ∈ N such that ξ(m1, . . . , mn, k) = 0, if such a k exists, and is undefined otherwise.
Partial recursive functions, also called Kleene's functions after their discoverer, are equivalent to the functions computed by Turing machines; thus, they capture exactly Turing-computability.

99 of 472

Partial recursive functions II

Theorem 6.7
Every partial recursive function can be represented by a combinator.

Proof. (i)
Let ψ and ξ be primitive recursive and, for all m1, . . . , mn ∈ N, let

  φ(m1, . . . , mn) = ψ(µk[ξ(m1, . . . , mn, k) = 0]) .

By Theorem 6.4, ψ and ξ are representable. Consider the equation

  H x1 . . . xn y = (if ξ x1 . . . xn y = 0 then y else H x1 . . . xn (suc y)) .

By the Fixed-Point Theorem, it has a solution

  H ≡ Y (λu, x1, . . . , xn, y. D y (u x1 . . . xn (suc y)) (ξ x1 . . . xn y)) .

↪
100 of 472

Partial recursive functions III

↪ Proof. (ii)
Thus, if φ(m1, . . . , mn) is defined, it is represented by F ≡ λx1, . . . , xn. ψ (H x1 . . . xn 0). For all m1, . . . , mn ∈ N, we have F m1 . . . mn =β φ(m1, . . . , mn).
Let T ≡ λx. D 0 (λu, v. u (x (suc v)) u (suc v)) and P ≡ λx, y. T x (x y) (T x) y. Then, if X Y =β 0, P X Y =β Y, and if X Y =β m + 1, P X Y =β P X (suc Y).
Define φ ≡ λx1, . . . , xn. P (ξ x1 . . . xn) 0 I (F x1 . . . xn).
First, suppose there is a minimal k such that ξ(m1, . . . , mn, k) = 0. Then, φ m1 . . . mn =β k I (F m1 . . . mn) =β I^k (F m1 . . . mn) =β F m1 . . . mn =β φ(m1, . . . , mn). So, φ represents φ, when φ is defined.

↪
101 of 472

Partial recursive functions IV

↪ Proof. (iii)
Suppose that m1, . . . , mn are such that there is no k with ξ(m1, . . . , mn, k) = 0. Then, ξ being total, for every k there is a pk such that ξ(m1, . . . , mn, k) = pk + 1. Calling X ≡ ξ m1 . . . mn, we have X k ▷β pk + 1.
To show that φ m1 . . . mn has no β-nf, it suffices to find an infinite quasi-leftmost reduction. Consider the following reduction, where G ≡ F m1 . . . mn:

  φ m1 . . . mn ▷β P X 0 I G
    ▷β T X (X 0) (T X) 0 I G   ∗
    ▷β T X (X 1) (T X) 1 I G   ∗
    ▷β . . .
    ▷β T X (X j) (T X) j I G   ∗
    ▷β . . .

The lines marked with ∗ clearly show that this reduction cannot terminate, and each of them contains one leftmost maximal contraction.

102 of 472

Partial recursive functions V

First, the previous result shows that the λ-calculus is Turing-complete, since it can compute all the computable functions, and nothing more.
Second, the result allows us to define recursive functions which are not terminating, thus extending our interpretation of iteration as a restricted form of recursion.

103 of 472

References and Hints

This lesson covers the material of Chapter 4 of [HS]. The model of the GOTO statement is taken from [Paulson] Chapter 2. More information on Kleene’s functions can be found in any standard textbook on Computability Theory or on Recursion Theory. The most authoritative reference is: Piergiorgio Odifreddi, Classical Recursion Theory, North Holland (1992). ISBN: 0444894837

104 of 472

Fundamentals of Functional Programming Lecture 7 — Intermezzo


Outline

We have seen that λ-calculus is Turing-complete. As a side effect, we have seen that a very general form of recursion can be used. In fact, as far as we are able to state a problem as a set of primitive recursive equations, and we ask for a minimal solution, we know that there is a λ-term which satisfies the equations.
In practice, we never solve those equations but, since we know that they can be solved, we can regard the equations as a definition of a function.
In the practice of functional programming, this approach is the common rule and, in this lesson, we want to show how far it can be pushed. We will see that, using these ideas in a rather elementary way, we can represent infinitary data structures and easily develop algorithms to treat them.

106 of 472

Currying I

In functional programming, functions like add(x, y) = x + y are usually written as add x y = x + y. As λ-terms they are different:
■ in the first case, add : N² → N, and add ≡ λw : N². (fst w) + (snd w);
■ in the second case, add : N → (N → N) and add ≡ λx : N, y : N. x + y.

A multiple-argument function f : A1 × · · · × An → B can always be turned into a function g : A1 → (. . . (An → B) . . . ) computing the same value. This operation is called currying, after Haskell B. Curry, one of the fathers of combinatory logic.

107 of 472
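Both readings of add, and the partial application that yields suc, can be written down directly. A Python sketch (our transcription; curry is our helper, not from the slides):

```python
# Uncurried: one argument, a pair.           Curried: one argument at a time.
add_pair = lambda p: p[0] + p[1]             # add : N² → N
add      = lambda x: lambda y: x + y         # add : N → (N → N)

suc = add(1)                                 # partial application: suc ≡ add 1
assert add_pair((3, 4)) == add(3)(4) == 7
assert suc(41) == 42

# A generic curry for two-argument functions (hypothetical helper).
curry = lambda f: lambda x: lambda y: f((x, y))
assert curry(add_pair)(3)(4) == 7
```

The point of the slide is visible in `suc`: a curried function applied to fewer arguments than it expects is itself a useful first-class value.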

Currying II

Currying seems easy, but it is powerful too. In fact, if add x y = x + y, we can define suc ≡ add 1. In general, we can partially apply a curried function, leaving its last arguments open, to obtain new functions. In this way, currying is an essential ingredient when thinking “by combinators”.
Treating functions as combinators allows us to store them in data structures, to pass them as arguments to other functions, and to return them as results of a computation. Moreover, since functions are pieces of data, we can construct new functions at runtime, when needed.
So, in a functional language, functions, treated as combinators, are first-class data objects.

108 of 472

List manipulation I

As arrays are dominant in imperative programming, lists are the main data structure in functional programming. So, to show what thinking of functions as combinators looks like, lists are a natural candidate.
Lists are manipulated by a few basic functions:
■ null L ≡ if (L = []) then true else false;
■ hd (cons X L) = X;
■ tl (cons X L) = L.

We have already seen how null and hd can be written explicitly as λ-terms. In fact, tl can also be written in an explicit form: it is simply the solution of the above equation.

109 of 472

List manipulation II

More complex functions are written starting from these basic bricks. For example, take L n returns the first n elements of the list L. It is defined as the solution to the pair of equations:

  take [] i = []
  take (x :: xs) i = if (i > 0) then x :: (take xs (i − 1)) else [] .

Here, and in the following, we write [] for nil, the empty list, and x :: xs for cons x xs, following the ML syntax.
Currying the take function gives LL ≡ take L, which is a function returning the first n elements of the fixed list L.

110 of 472

List manipulation III

The opposite of take is drop, which leaves out the first n elements from a given list. It is defined as:

  drop [] i = []
  drop (x :: xs) i = if (i > 0) then (drop xs (i − 1)) else (x :: xs) .

Concatenating two lists is performed by append, usually written with the infix syntax @:

  [] @ L = L
  (x :: xs) @ L = x :: (xs @ L) .

Notice how currying the append function gives “prepend”.

111 of 472

List manipulation IV

Reversing a list is easy. We show an efficient algorithm:

  rev = revAux []
  revAux L [] = L
  revAux L (x :: xs) = revAux (x :: L) xs .

This algorithm is efficient since it is tail recursive: in a call-by-value reduction strategy, each recursive call is performed as the last step in the reduction. Notice how currying has been used to define rev from revAux.
It is often convenient to use additional arguments in a function to accumulate intermediate results; then, we can get rid of them via currying.
[Exercise] Write a function which computes the length of a given list.

112 of 472
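The four list functions above translate almost verbatim; a Python sketch over native lists (our transcription, with [] and x :: xs rendered as [] and [x] + xs):

```python
def take(l, i):
    # take [] i = []; take (x::xs) i = if i > 0 then x :: take xs (i-1) else []
    if not l:
        return []
    x, xs = l[0], l[1:]
    return [x] + take(xs, i - 1) if i > 0 else []

def drop(l, i):
    # drop [] i = []; drop (x::xs) i = if i > 0 then drop xs (i-1) else x::xs
    if not l:
        return []
    return drop(l[1:], i - 1) if i > 0 else l

def append(l, r):
    # [] @ L = L; (x::xs) @ L = x :: (xs @ L)
    return r if not l else [l[0]] + append(l[1:], r)

def rev_aux(acc, l):
    # the accumulator version: tail recursive
    return acc if not l else rev_aux([l[0]] + acc, l[1:])

def rev(l):
    return rev_aux([], l)   # rev ≡ revAux [] by partial application

assert take([1, 2, 3, 4], 2) == [1, 2]
assert drop([1, 2, 3, 4], 2) == [3, 4]
assert append([1, 2], [3]) == [1, 2, 3]
assert rev([1, 2, 3]) == [3, 2, 1]
```

Note that Python does not optimise tail calls, so rev_aux is tail recursive only in form here; in ML the same shape runs in constant stack space.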

List functionals I

Functionals are functions taking functions as arguments. The easiest example is map. Its action is

  map f [x1, . . . , xn] = [f(x1), . . . , f(xn)] .

The functional map is defined as

  map f [] = []
  map f (x :: xs) = (f x) :: (map f xs) .

Notice how currying map lifts a function f from elements to lists. So, for example, double ≡ map (λx. x ∗ 2).

113 of 472

List functionals II

Another important example of a functional is filter. Given a predicate p, i.e., a function returning a boolean value, and a list L, it returns the sublist of L whose elements make p true. The functional filter is defined as:

  filter p [] = []
  filter p (x :: xs) = if (p x) then (x :: (filter p xs)) else (filter p xs) .

A small improvement in the efficiency of filter can be achieved by using the let construction, an abbreviation that allows a subterm occurring more than once to be pre-computed:

  filter p [] = []
  filter p (x :: xs) = let R ≡ (filter p xs) in if (p x) then (x :: R) else R .

114 of 472
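map and filter, including the let optimisation that names the recursive call once, transcribe as follows (a Python sketch; the trailing underscores avoid Python's built-in names):

```python
def map_(f, l):
    # map f [] = []; map f (x::xs) = (f x) :: (map f xs)
    return [] if not l else [f(l[0])] + map_(f, l[1:])

def filter_(p, l):
    # filter with the `let R` optimisation: the recursive call is computed once.
    if not l:
        return []
    x, rest = l[0], filter_(p, l[1:])
    return [x] + rest if p(x) else rest

double = lambda l: map_(lambda x: x * 2, l)   # currying map lifts f to lists
assert double([1, 2, 3]) == [2, 4, 6]
assert filter_(lambda x: x % 2 == 0, [1, 2, 3, 4]) == [2, 4]
```

The local binding of `rest` is exactly what `let R ≡ (filter p xs)` expresses on the slide.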

List functionals III

Using map and filter, some complex operations can be coded in a simple and elegant way. For example, matrix transpose, which exchanges rows with columns in a matrix, when the matrix is represented as a list of lists, becomes:

  transpose ([] :: L) = []
  transpose r = (map hd r) :: (transpose (map tl r)) .

115 of 472

List functionals IV

Other interesting functionals on lists are exists and forall. They both take as arguments a predicate p and a list L. The former, exists, is true when there is an element in L satisfying p, while the latter, forall, is true when every element in L satisfies p. They are defined as follows:

  exists p [] = false
  exists p (x :: xs) = (p x) or (exists p xs)

  forall p [] = true
  forall p (x :: xs) = (p x) and (forall p xs) .

116 of 472

List functionals V

As an application, list membership is readily expressed as:

  mem x ≡ exists (λy. x = y) .

We have assumed that = is some function testing equality. This assumption is correct for every datatype we have seen till now: we can write functions testing whether, e.g., two numbers or two lists are equal.
In general, this cannot be done. For example, functions cannot be tested for equality, because this problem, like β-equality, is undecidable. But, since functional languages are typed, there is a class of types that takes care of distinguishing types with an equality function from those without. Functions like mem are, implicitly, defined only on “equality types”.

117 of 472

List functionals VI

More complex functionals are possible. Two of them are fundamental: foldl and foldr, “fold left” and “fold right”. They act as follows:

  foldl f e [x1, . . . , xn] = f xn (f xn−1 (. . . (f x1 e) . . . ))
  foldr f e [x1, . . . , xn] = f x1 (f x2 (. . . (f xn e) . . . )) .

Their definitions are easy:

  foldl f e [] = e
  foldl f e (x :: xs) = foldl f (f x e) xs

  foldr f e [] = e
  foldr f e (x :: xs) = f x (foldr f e xs) .

118 of 472

List functionals VII

A simple application of these functionals is:

  map f ≡ foldr (λy, l. (f y) :: l) [] .

Another one, calculating the length of a list, is:

  length ≡ foldl (K suc) 0 .

A more complex application is to calculate the Cartesian product of two lists, interpreted as sets:

  cartesianproduct xs ys ≡ foldr (λx, p. foldr (λy, l. (x, y) :: l) p ys) [] xs .

119 of 472
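foldl, foldr and the derived functionals can be checked concretely; a Python sketch (our transcription, with f taken as a two-argument function):

```python
def foldl(f, e, l):
    # foldl f e [] = e; foldl f e (x::xs) = foldl f (f x e) xs
    return e if not l else foldl(f, f(l[0], e), l[1:])

def foldr(f, e, l):
    # foldr f e [] = e; foldr f e (x::xs) = f x (foldr f e xs)
    return e if not l else f(l[0], foldr(f, e, l[1:]))

length = lambda l: foldl(lambda x, n: n + 1, 0, l)        # foldl (K suc) 0
map_   = lambda f, l: foldr(lambda y, r: [f(y)] + r, [], l)

assert foldl(lambda x, e: [x] + e, [], [1, 2, 3]) == [3, 2, 1]  # foldl reverses
assert foldr(lambda x, e: [x] + e, [], [1, 2, 3]) == [1, 2, 3]  # foldr rebuilds
assert length([1, 2, 3, 4]) == 4
assert map_(lambda x: x + 1, [1, 2]) == [2, 3]
```

The first two asserts make the left/right bracketing of the two folds visible on a concrete list.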

Sequences I

The idea behind infinite lists, or sequences as they are usually called, is to store their values as a function. More technically, a sequence is a data structure defined as:

  〈{ A, S }, { Cons : A × (1 → S) → S, Nil : S }〉 ,

where 1 is the unit type, containing a single element, denoted by (). This datatype does not follow the previously defined format; in fact, it does not expand into a free algebra of terms. Anyway, it can be easily coded by adjusting the general representation of datatypes:

  Cons ≡ λx, y, u, v. u x (K (y u v))
  Nil ≡ λu, v. v .

We use the same constructors as for lists, to emphasise the analogy.

120 of 472

Sequences II

To understand how the representation of sequences works, consider [a, b] represented as a sequence:

  [a, b] ≡ Cons a (Cons b Nil) ▷ λu, v. u a (K (u b (K v))) .

Compare this representation with the representation of lists. It is immediate to define

  hd (Cons x xs) = x
  tl (Cons x xs) = xs
  null Nil = true
  null (Cons x xs) = false

where the solution for hd is λx. x K Nil, while, as before, tl and null are complex λ-terms.

121 of 472

Sequences III

Consider the function

  from k = Cons k (K (from (k + 1))) ;

it evaluates as

  from 1 ▷ Cons 1 (K (from 2)) ▷ · · · ▷ [1, 2, 3, . . . , K (from n)] ▷ . . .

If we assume that the reduction strategy does not expand K (from 2), the reduction stops after one step. This assumption can be enforced in a call-by-value environment, since evaluation never applies β-reduction inside the scope of a λ.
As a result, we can define a term which encodes the sequence of all natural numbers: Nat = from 0.

122 of 472
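Thunks, i.e., zero-argument functions forced with s(), reproduce this behaviour in a call-by-value host. A Python sketch of Cons, from and take (our rendering; tagged tuples replace the λ-encoding for readability):

```python
# A sequence is either Nil or ("cons", head, thunk), where the thunk
# produces the tail only when called: the call-by-value analogue of K (...).
Cons = lambda x, s: ("cons", x, s)
Nil  = ("nil",)

def from_(k):
    return Cons(k, lambda: from_(k + 1))   # from k = Cons k (K (from (k+1)))

def take(n, seq):
    if n == 0 or seq == Nil:
        return []
    _, x, s = seq
    return [x] + take(n - 1, s())          # force the tail only here

nat = from_(0)                             # the sequence of all naturals
assert take(5, nat) == [0, 1, 2, 3, 4]
```

Building `nat` performs a single step, exactly as the slide argues: the recursive call is frozen inside the lambda until take forces it.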

Sequences IV

With the ability to define infinite data structures, such as the sequence of natural numbers, it is natural to define functions to inspect and manipulate them. We mimic lists, defining (almost) the same functions, but operating on sequences. For example,

  take 0 s = []
  take n (Cons x s) = x :: (take (n − 1) (s ()))

which returns the list of the first n elements of the sequence s. Or also,

  Nil @ y = y
  (Cons x s) @ y = Cons x (K ((s ()) @ y))

which appends the sequence y to the first argument. Notice how x @ y = x if x is infinite.

123 of 472

Sequences V

A useful variant on append is interleave:

  interleave Nil y = y
  interleave (Cons x s) y = Cons x (K (interleave y (s ())))

Its action is best described by an example:

  take 6 (interleave (from 0) (from 10)) = [0, 10, 1, 11, 2, 12] .

As these examples show, it is simple to define functions operating on sequences, since they are very similar to the corresponding functions on lists.

124 of 472

Sequences VI

Most of the functionals on lists can be immediately redefined to operate on sequences:

  map f Nil = Nil
  map f (Cons x s) = Cons (f x) (K (map f (s ())))

  filter p Nil = Nil
  filter p (Cons x s) = if (p x) then (Cons x (K (filter p (s ())))) else (filter p (s ()))

Clearly, exists and forall are not useful, since their full evaluation requires inspecting the whole sequence. Also, foldl is not definable, while foldr has an equivalent:

  iterates f x = Cons x (K (iterates f (f x))) .

For example, take 5 (iterates (λx. x ∗ 2) 2) = [2, 4, 8, 16, 32].

125 of 472

Prime numbers I

A natural number greater than 1 is said to be prime if it can be divided only by 1 and itself. An algorithm which allows us to derive the sequence of all prime numbers is due to Eratosthenes.
It starts with the sequence of natural numbers from 2. At every step, the head of s, the current sequence, is a prime number. Let's call it p. The new sequence to consider is, then, the tail of s where all multiples of p are cancelled.
This algorithm never terminates, because the sequence of prime numbers is infinite, as proved by Euclid. By the way, Euclid's proof (300 B.C.) was the first proof of non-termination in history.

126 of 472

Prime numbers II

The algorithm to generate the prime numbers is simple to encode using sequences. First, we write a function eliminating the multiples of a given number from a sequence:

  sift p = filter (λn. n mod p ≠ 0) .

Then, we iterate the procedure of erasing the multiples of the head from the tail of the sequence:

  sieve (Cons p s) = Cons p (K (sieve (sift p (s ())))) .

So, the sequence of prime numbers is defined and calculated by

  Primes = sieve (from 2) .

127 of 472
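The sieve runs as written once sequences carry their tails as thunks. A Python sketch (our transcription; filter here assumes the sequence is infinite, as Primes is, so it has no Nil case):

```python
# Thunk-based sequences, as in the previous slides.
Cons = lambda x, s: ("cons", x, s)

def from_(k):
    return Cons(k, lambda: from_(k + 1))

def filter_(p, seq):
    # Only the infinite case is needed here: keep forcing tails until a match.
    _, x, s = seq
    if p(x):
        return Cons(x, lambda: filter_(p, s()))
    return filter_(p, s())

def sift(p, seq):
    return filter_(lambda n: n % p != 0, seq)     # sift p = filter (λn. n mod p ≠ 0)

def sieve(seq):
    _, p, s = seq
    return Cons(p, lambda: sieve(sift(p, s())))   # erase multiples of the head

def take(n, seq):
    if n == 0:
        return []
    _, x, s = seq
    return [x] + take(n - 1, s())

primes = sieve(from_(2))                          # Primes = sieve (from 2)
assert take(6, primes) == [2, 3, 5, 7, 11, 13]
```

Only as many primes are computed as take forces, which is the whole point of representing the infinite sequence lazily.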

References and Hints

The material in this lecture comes from [Paulson], Chapters 3 and 5. The interested reader can find hints on the implementation of infinite data structures and lazy evaluation in non-functional languages in http://academic.udayton.edu/saverioperugini/courses/cps343/lecture_notes/lazyevaluation.html.

128 of 472

Fundamentals of Functional Programming Lecture 8


Outline Till now, we have worked within the pure λ-calculus, showing that it can be seen as a Turing-complete programming language. In mathematical practice, as well as in programming practice, the definition of a particular function usually includes a statement of the kinds of inputs it will accept, and the kinds of outputs it will produce. By labelling terms with types, i.e., names for distinguished families of values, we can control the inputs and the outputs of functions.

130 of 472

Church vs Curry Typing There are two very different ways to attach types to terms:
■ the Church-style (also called explicit or rigid typing), where a type is a built-in part of a term;
■ the Curry-style (also called implicit typing), where a type is assigned to a term after the term has been built.

Both typing styles are used in the implementation of functional languages. In a way, the Church-style is closer to what is common in imperative languages, like Java or C, while the Curry-style is usually preferred in functional languages because of its conciseness. For example, ML adopts the Curry-style.

131 of 472

Type Algebras On a different line, the set of allowed types forms an algebraic structure, which has to mimic the application and abstraction of terms. There are many significant type algebras. The simplest one is due to Church and it is called the simple (theory of) types. This algebra can be used in both a Church-style and a Curry-style. By adding more structure to the simple types, one can develop more sophisticated type systems. If the additional structure is developed with some care, these systems allow us to model a computational meaning for many logical systems, thus adding depth to the functional paradigm, which becomes a special way to perform logic programming.

132 of 472

Polymorphism In a simpler way, one can fix an algebra and add variables for types, along with a notion of substitution. This idea leads to polymorphic types. Not all typing systems are "good": in fact, even for very poor algebras, deciding whether a term is correctly typed can be undecidable. Thus, the interesting algebras are limited by the computational power needed to deal with them. Most functional programming languages, like ML, are based on the simple theory of types, in a polymorphic version.

133 of 472

Types Definition 8.1 (Simple types) Let T be a set of symbols, called the atomic types; then τ is a type either if it is atomic, i.e., τ ∈ T , or if it is a function type, i.e., τ ≡ (σ → ρ ) where σ and ρ are types. It is customary to think of a function type σ → ρ as denoting the functions from the domain set σ to the co-domain (or range) set ρ . As usual, we omit the outer parentheses, and we use the abbreviation σ1 → σ2 → · · · → σn → τ for (σ1 → (σ2 → (. . . (σn → τ) . . . ))). The simple theory of types limits type construction to just one binary operation, →, and assumes to have a predefined set T of constant types. Polymorphic systems will also have type variables, and more sophisticated systems will have more operations and axioms on them. 134 of 472
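As a concrete aside (ours, not part of the lecture), the type algebra of Definition 8.1 can be written as a tiny Python datatype; arrows and show are hypothetical helper names, and show prints the abbreviation with the usual right associativity:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Arrow:
    dom: "Atom | Arrow"
    cod: "Atom | Arrow"

def arrows(*ts):
    """Build t1 -> t2 -> ... -> tn, associating to the right."""
    *init, ty = ts
    for t in reversed(init):
        ty = Arrow(t, ty)
    return ty

def show(t):
    if isinstance(t, Atom):
        return t.name
    d = show(t.dom)
    if isinstance(t.dom, Arrow):      # parenthesise a function domain
        d = f"({d})"
    return f"{d} -> {show(t.cod)}"

s, r, q = Atom("s"), Atom("r"), Atom("q")
print(show(arrows(s, r, q)))          # s -> r -> q
print(show(Arrow(Arrow(s, r), q)))    # (s -> r) -> q
```

Note how only the left-nested type needs parentheses, matching the convention on the slide.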

Terms I Definition 8.2 (Typed variables) The set V of typed variables is a denumerable set of pairs x : τ such that
■ x is a symbol and τ is a type;
■ for every x : τ, y : σ ∈ V , if x ≡ y then τ ≡ σ;
■ for each type τ, the set { (x : τ) : (x : τ) ∈ V } is denumerable.

The conditions on typed variables mean that we always have "enough variables" of any given type.

135 of 472

Terms II Definition 8.3 (Simply typed λ-terms) Assume to have an at most denumerable set C of constants, composed of pairs c : τ where c is a symbol and τ a type, such that C ∩ V = ∅. A typed λ-term t of type τ, notation t : τ, is defined as
■ t : τ ∈ V or t : τ ∈ C ;
■ t ≡ (M N) where M : σ → τ and N : σ are typed λ-terms;
■ t ≡ (λx : σ. M) and τ ≡ σ → ρ , with M : ρ a typed λ-term and x : σ a typed variable.

Notation conventions on typed λ-terms are the usual ones. Notice how typed λ-terms have explicit types in the syntax of λ-abstraction. 136 of 472
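The clauses of Definition 8.3 translate directly into a checker that computes the (unique) type of a Church-style term, or fails. This is an illustrative sketch with an ad-hoc tuple representation of our own, not code from the course:

```python
# Church-style typed terms carry their types syntactically:
# types:  ("ty", name) | ("fun", dom, cod)
# terms:  ("var", x) | ("app", M, N) | ("lam", x, dom_type, body)

def type_of(term, env=()):
    """Compute the unique type of a typed term, or raise TypeError."""
    tag = term[0]
    if tag == "var":
        for name, ty in env:          # innermost binding wins
            if name == term[1]:
                return ty
        raise TypeError(f"unbound variable {term[1]}")
    if tag == "app":
        f, a = type_of(term[1], env), type_of(term[2], env)
        if f[0] == "fun" and f[1] == a:
            return f[2]
        raise TypeError("ill-typed application")
    _, x, dom, body = term            # tag == "lam"
    return ("fun", dom, type_of(body, ((x, dom),) + env))

sigma, tau = ("ty", "s"), ("ty", "t")
I_sigma = ("lam", "x", sigma, ("var", "x"))
print(type_of(I_sigma))  # ("fun", sigma, sigma), i.e. σ → σ
print(type_of(("lam", "x", sigma, ("lam", "y", tau, ("var", "x")))))
```

The app clause is exactly the second item of Definition 8.3: an application is well-typed only when the operand type matches the domain of the operator.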

Terms III

Example 8.4 For every type σ, the following is a typed term: Iσ ≡ (λx : σ. x : σ) : σ → σ . Usually, it is abbreviated into the simpler, but equivalent, λx : σ. x. Omitting types is “safe”: one can prove by induction on the construction of typed terms that the type of a term is uniquely determined by its structure and the types of variables and constants.

137 of 472

Substitution I Definition 8.5 (Substitution) The term (M : τ)[(N : σ)/(x : σ)] is defined exactly as for pure λ-calculus. It is routine to check that (M : τ)[(N : σ)/(x : σ)] is a typed term of type τ. Note that the definition of (M : τ)[(N : ρ )/(x : σ)] requires σ ≡ ρ . Intuitively, typed substitution is just pure substitution restricted to a given type σ.

138 of 472

Substitution II Lemma 8.6 1. In M : τ, replacing a subterm P : σ by Q : σ leads to a term of type τ; 2. (α-conversion) If (λx : σ. M : τ) : σ → τ is a typed term, then so is (λy : σ. M[(y : σ)/(x : σ)]) : σ → τ; 3. (β-reduction) If ((λx : σ. M : τ) : σ → τ)(N : σ) is a typed term, then so is (M : τ)[(N : σ)/(x : σ)].

Proof. Straightforward inductions on the structure of typed terms. [Exercise] Develop the details of the proof. All the lemmas on substitution and α-conversion in the pure λ-calculus hold for typed terms, with proofs unchanged. 139 of 472

The formal system I Definition 8.7 (Equality in λβ →) The formal system λβ → has formulae of the form M : τ = N : τ with M : τ, N : τ typed terms, and the following axiom schemes and rules: (α) If y : σ ∉ FV(M : τ), then λx : σ. M : τ = λy : σ. M : τ[y : σ/x : σ]; (β) ((λx : σ. M : τ)N : σ) : τ = M : τ[(N : σ)/(x : σ)]; (ρ ) M : τ = M : τ; (µ) If M : σ = N : σ, then (P : σ → τ)(M : σ) = (P : σ → τ)(N : σ); (ν) If M : σ → τ = N : σ → τ, then (M : σ → τ)(P : σ) = (N : σ → τ)(P : σ); (ξ) If M : τ = N : τ, then λx : σ. M : τ = λx : σ. N : τ; (τ) If M : σ = N : σ and N : σ = P : σ, then M : σ = P : σ; (σ) If M : σ = N : σ, then N : σ = M : σ. 140 of 472

The formal system II Definition 8.8 (Reduction in λβ →) The formal system λβ → has formulae of the form M : τ B N : τ with M : τ, N : τ typed terms, and the following axiom schemes and rules: (α) If y : σ ∉ FV(M : τ), then λx : σ. M : τ B λy : σ. M : τ[(y : σ)/(x : σ)]; (β) ((λx : σ. M : τ)N : σ) : τ B M : τ[(N : σ)/(x : σ)]; (ρ ) M : τ B M : τ; (µ) If M : σ B N : σ, then (P : σ → τ)(M : σ) B (P : σ → τ)(N : σ); (ν) If M : σ → τ B N : σ → τ, then (M : σ → τ)(P : σ) B (N : σ → τ)(P : σ); (ξ) If M : τ B N : τ, then λx : σ. M : τ B λx : σ. N : τ; (τ) If M : σ B N : σ and N : σ B P : σ, then M : σ B P : σ. 141 of 472

The formal system III The notions of redex, contraction, β-reduction, β-conversion and β-nf are defined on typed terms exactly as in pure λ-calculus. It is routine to prove that
■ if M : σ B N : τ, then σ ≡ τ;
■ M : σ B N : σ iff M : σ Bβ N : σ;
■ M : σ = N : σ iff M : σ =β N : σ.
[Exercise] Fill in the proofs. Note that all other properties of reduction and equality hold, with the same proofs as before. In particular, the Church-Rosser Theorem and the uniqueness of normal forms are true. Notice also that (λx : σ. x : σ) : (σ → σ) ≠ (λx : τ. x : τ) : (τ → τ) when σ and τ are distinct. 142 of 472

Normalizability I Definition 8.9 (Weak normalizability, Strong normalizability) A term M is said to be weakly normalisable (WN) iff it has a β-nf; it is called strongly normalisable (SN) iff all reductions starting at M have finite length. Evidently, SN implies WN. Consider the following terms:
■ Ω ≡ (λx . x x)(λx . x x);
■ T2 ≡ (λx . y )Ω;
■ T3 ≡ (λx . y )(λx . x).

The first term, Ω, has no normal form and it has an infinite reduction, so it is neither WN nor SN. The second term, T2 , has an infinite reduction, so it is not SN, but it has a normal form, so it is WN. The third term, T3 , has only finite reductions, thus it is SN and hence WN. 143 of 472
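As an illustration (ours, not the lecture's), a one-step leftmost reducer on untyped terms shows the difference between T2 and Ω concretely. Substitution below is naive, i.e. not capture-avoiding, which is harmless for the closed examples used here:

```python
# Untyped terms: ("var", x) | ("app", M, N) | ("lam", x, M).
# subst is naive (no capture avoidance); the examples below are safe.

def subst(t, x, n):
    if t[0] == "var":
        return n if t[1] == x else t
    if t[0] == "app":
        return ("app", subst(t[1], x, n), subst(t[2], x, n))
    if t[1] == x:                       # ("lam", x, body): x is rebound
        return t
    return ("lam", t[1], subst(t[2], x, n))

def step(t):
    """One leftmost (normal-order) step, or None if t is a beta-nf."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":               # the leftmost redex
            return subst(f[2], f[1], a)
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

omega = ("lam", "x", ("app", ("var", "x"), ("var", "x")))
Omega = ("app", omega, omega)           # neither WN nor SN
T2 = ("app", ("lam", "x", ("var", "y")), Omega)

print(step(T2))              # ('var', 'y'): leftmost reduction finds the nf
print(step(Omega) == Omega)  # True: Omega only reduces to itself
```

One leftmost step takes T2 to its normal form y, while the reduction inside Ω can be repeated forever: this is exactly the WN-but-not-SN phenomenon.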

Normalizability II Theorem 8.10 All terms in λβ → are SN.

Proof. A term M : τ is said to be strongly computable (SC) iff
■ either τ is atomic and M is SN;
■ or τ ≡ σ → ρ and, for every term X : σ, the term (M X ) : ρ is SC.

The proof of the theorem reduces to four lemmas, whose goal is to show that (1) every SC term is SN, and (2) every term is SC.

Corollary 8.11 Every typed term in λβ → has a β-nf.

Corollary 8.12 In the simply-typed λ-calculus, the relation =β is decidable. 144 of 472

Normalizability III Lemma 8.13 1. Each type τ can be written in a unique way in the form τ1 → τ2 → · · · → τn → θ where n ≥ 0 and θ is atomic; 2. If M : τ with τ ≡ τ1 → · · · → τn → θ , then M is SC iff, for all SC terms Y1 : τ1 , . . . , Yn : τn , (M Y1 . . . Yn ) : θ is SN; 3. If X : τ is SC (or SN) and M ≡α N, then N is SC (or SN); 4. If X : σ → ρ is SC and Y : σ is SC, then so is (X Y ) : ρ ; 5. If X : τ is SN, then so is every subterm of X ; 6. If M[N : ρ /x : ρ ] : τ is SC, then so is M : τ.

Proof. Evident from the definition of SC and SN. 145 of 472

Normalizability IV Lemma 8.14 1. every (a X1 . . . Xn ) : τ, with a an atom and X1 , . . . , Xn all SN, is SC; 2. every atomic term a : τ is SC; 3. every SC term of type τ is SN.

Proof. (2) is an instance of (1). We prove (1) and (3) by induction on τ:
■ τ is atomic: since X1 , . . . , Xn are SN, so is a X1 . . . Xn , thus it is SC;
■ τ ≡ ρ → σ: let Y : ρ be SC; by the induction hypothesis (IH), it is SN. Moreover, for the same reason, (a X1 . . . Xn Y ) : σ is SC. Thus, so is a X1 . . . Xn by definition of SC.

For (3), let X : τ ≡ ρ → σ and let x : ρ ∉ FV(X : τ). By IH, x : ρ is SC, so by the previous lemma (X x) : σ is SC. Thus, by IH, (X x) : σ is also SN. But then, by the previous lemma, X is SN as well. 146 of 472

Normalizability V Lemma 8.15 If (M : σ)[N : ρ /x : ρ ] is SC, then so is (λx : ρ. M : σ)(N : ρ ), provided that N : ρ is SC when x : ρ ∉ FV(M : σ).

Proof. (i) Let σ ≡ σ1 → · · · → σn → θ with θ atomic and let M1 : σ1 , . . . , Mn : σn be SC terms. Since (M : σ)[N : ρ /x : ρ ] is SC, it follows that ((M[N/x]) M1 . . . Mn ) : θ is SN, and so are all its subterms. Hence, M is SN by Lemma 8.13. Moreover, N is SN: if it does not occur in M[N/x], this holds by hypothesis and Lemma 8.14; otherwise, N is a subterm of the SN term M[N/x]. ,→ 147 of 472

Normalizability VI ,→ Proof. (ii) So, an infinite reduction of ((λx . M) N M1 . . . Mn ) : θ has the form (λx . M) N M1 . . . Mn Bβ (λx . M′ ) N′ M′1 . . . M′n B1β M′ [N′ /x] M′1 . . . M′n Bβ . . . where M Bβ M′ , N Bβ N′ , etc. But, from M Bβ M′ and N Bβ N′ , we get that M[N/x] Bβ M′ [N′ /x]; hence, we can construct the following reduction: (M[N/x] M1 . . . Mn ) Bβ M′ [N′ /x] M′1 . . . M′n , which may continue forever, contradicting that the starting term is SN. Hence, (λx : ρ. M : σ)(N : ρ ) must be SN. 148 of 472

Normalizability VII Lemma 8.16 For every typed term M : τ: 1. M : τ is SC; 2. for all x1 : ρ1 , . . . , xn : ρn , with n ≥ 1, and all SC terms N1 : ρ1 , . . . , Nn : ρn such that none of the variables x1 , . . . , xi−1 occurs free in Ni , the term M ∗ ≡ M[N1 /x1 ] . . . [Nn /xn ] is SC.

Proof. (i) (1) is an instance of (2), where Ni ≡ xi , since every xi is SC by Lemma 8.14. The proof of (2) is by induction on the structure of M:
■ M ≡ xi and τ ≡ ρi . Then M ∗ ≡ Ni , which is SC by assumption. ,→

149 of 472

Normalizability VIII ,→ Proof. (ii)
■ M is an atom distinct from x1 , . . . , xn . Then M ∗ ≡ M, which is SC by Lemma 8.14.
■ M ≡ M1 M2 . Then M ∗ ≡ M1∗ M2∗ . By induction hypothesis, M1∗ and M2∗ are SC, and so is M ∗ by Lemma 8.13.
■ M ≡ (λx : ρ. M1 : σ) and τ ≡ ρ → σ. By α-conversion, we can safely assume that x does not occur free in any of N1 , . . . , Nn , x1 , . . . , xn . Then, M ∗ ≡ λx . M1∗ . Let N : ρ be SC; then M ∗ N ≡ (λx . M1∗ ) N B1β M1∗ [N/x] ≡ M1 [N/x][N1 /x1 ] . . . [Nn /xn ] , which is SC by the induction hypothesis applied to M1 and the sequence N , N1 , . . . , Nn . Then M ∗ N is SC by Lemma 8.15. So, by definition, M ∗ is SC.

150 of 472

Representability I Since every term in λβ → has a β-nf, every function which can be represented in this system must be total.

Definition 8.17 (Extended polynomials) An extended polynomial is a function Nk → N defined by induction:
■ the projections πi (x1 , . . . , xn ) = xi , the constant functions cm (x) = m, the addition +(x , y ) = x + y and the multiplication ∗(x , y ) = x y are extended polynomials, as well as the functions sg : N → N and s̄g : N → N, where sg(x) = 1 if x ≥ 1, sg(x) = 0 otherwise, and s̄g(x) = 1 − sg(x);
■ the composition f ◦ g of two extended polynomials f and g is an extended polynomial.

151 of 472

Representability II

Theorem 8.18 All extended polynomials are representable in λβ → and, vice versa, the only representable functions Nk → N are extended polynomials. [Proof not required] [Exercise] Prove that all extended polynomials are representable.

152 of 472
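To give an idea of how the representation in Theorem 8.18 might go, here are Church numerals in Python with addition, multiplication and sg; in λβ→ each numeral can be given the type (σ → σ) → σ → σ. This is an untyped, illustrative sketch of ours, not the proof required by the exercise:

```python
# Church numerals: n = "apply f n times", i.e. n f x = f^n x.

def church(n):
    return (lambda f: lambda x: x) if n == 0 else \
           (lambda f: lambda x: f(church(n - 1)(f)(x)))

def to_int(c):
    # decode by counting applications
    return c(lambda k: k + 1)(0)

add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mult = lambda m: lambda n: lambda f: m(n(f))
# sg: apply the constant-1 function n times to 0 -- the result is 0 iff n = 0
sg   = lambda n: n(lambda _: church(1))(church(0))

print(to_int(add(church(2))(church(3))))              # 5
print(to_int(mult(church(2))(church(3))))             # 6
print(to_int(sg(church(0))), to_int(sg(church(4))))   # 0 1
```

The definitions of add, mult and sg are pure λ-terms once the Python sugar is removed, which is the essence of the representability half of the theorem.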

References and Hints This lecture is based on Chapter 10 of [HS]. The proof of strong normalizability of λβ → can be found in Appendix 3 of the same text. Theorem 8.18 has been proved by H. Schwichtenberg in 1976: H. Schwichtenberg, Definierbare Funktionen im λ-Kalkül mit Typen, Archiv für Mathematische Logik, 17 (1976) 113-114. An exposition of the result can also be found in A.S. Troelstra, H. Schwichtenberg, Basic Proof Theory, 2nd edition, Cambridge University Press (2000). ISBN: 0521779111. The techniques used to operate on reductions are more general and not limited to λ-calculi. An account can be found in F. Baader and T. Nipkow, Term Rewriting and All That, Cambridge University Press (1999). ISBN: 0521779200 153 of 472

Fundamentals of Functional Programming Lecture 9

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Outline We have studied a Church-style typed system, λβ →. We have seen that it is decidable, since it has the strong normalizability property, and that its expressive power is limited to a class of computable functions, the extended polynomials. As a matter of fact, introducing types allows a better control of functions, since we can limit the inputs and the outputs, thus constraining the behaviour of our programs, but at the price of limiting the expressive power. In this lecture, we want to show an extended version of λβ →:
■ it differs from the previous one, being a Curry-style system;
■ it enhances the previous one, being a polymorphic type system.

155 of 472

Types Definition 9.1 (Parametric simple types) Let T be a set of symbols, called the atomic types, and let VT be a denumerable set of symbols, called the type variables, disjoint from T ; then τ is a type either if it is atomic, i.e., τ ∈ T ∪ VT , or if it is a function type, i.e., τ ≡ (σ → ρ ) where σ and ρ are types. A type is closed if it contains no type variables, and it is open if it contains only type variables. A type σ → ρ can be thought of as representing a class of functions φ such that x ∈ σ implies that φ(x) is defined and φ(x) ∈ ρ . This is slightly different from the interpretation in λβ →. 156 of 472

156 of 472

Terms and Formulae A term is simply a pure λ-term. This makes the system we are defining Curry-style. The "legal" terms will be the ones that can be typed.

Definition 9.2 (Formula) A type-assignment formula is any expression X : τ where X is a term and τ a type. The term X is referred to as the subject of the formula, while τ is referred to as the predicate of the formula. Informally, a statement X : τ is read as “X is assigned the type τ” or also “X has type τ”. Formulae represent the link between terms and types. Their validity depends on a possibly empty set of assumptions, corresponding to declarations in a programming language, and the validity is checked by a formal system. 157 of 472

Type assignment system I Definition 9.3 (The TA → type assignment system) Given a set Γ of formulae whose subjects are variables, Γ ` M : τ holds if one of the following rules applies:
■ (α rule) M ≡ x and there is x : τ ∈ Γ;
■ (→ E rule) M ≡ P Q and, if Γ ` P : σ → τ and Γ ` Q : σ, then Γ ` P Q : τ;
■ (→ I rule) M ≡ λx . N, τ ≡ σ → ρ , and, if Γ, z : σ ` N[z/x] : ρ , then Γ ` (λx . N) : σ → ρ , where z is a variable new in Γ and N.

When Γ is empty, we write ` M : τ.

158 of 472
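Curry-style typability can be decided by unification. The following is a compact, illustrative sketch of principal-type inference for TA → (Hindley-style inference restricted to simple types); the representation and all names are ours, and the occurs check is what rejects self-application:

```python
import itertools

# Terms: ("var", x) | ("app", M, N) | ("lam", x, M).
# Types: plain integers are type variables; ("fun", a, b) is an arrow.

fresh = itertools.count()

def prune(t, s):
    """Follow substitution links from a type variable."""
    while isinstance(t, int) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = prune(t, s)
    if isinstance(t, int):
        return t == v
    return occurs(v, t[1], s) or occurs(v, t[2], s)

def unify(a, b, s):
    a, b = prune(a, s), prune(b, s)
    if a == b:
        return s
    if isinstance(a, int):
        if occurs(a, b, s):          # rejects e.g. lambda x. x x
            raise TypeError("no simple type exists")
        return {**s, a: b}
    if isinstance(b, int):
        return unify(b, a, s)
    s = unify(a[1], b[1], s)
    return unify(a[2], b[2], s)

def infer(t, env, s):
    """Return (type, substitution) for term t under assumptions env."""
    if t[0] == "var":
        return env[t[1]], s
    if t[0] == "app":                # the (-> E) rule
        f, s = infer(t[1], env, s)
        a, s = infer(t[2], env, s)
        r = next(fresh)
        return r, unify(f, ("fun", a, r), s)
    d = next(fresh)                  # ("lam", x, body): the (-> I) rule
    b, s = infer(t[2], {**env, t[1]: d}, s)
    return ("fun", d, b), s

def resolve(t, s):
    t = prune(t, s)
    if isinstance(t, int):
        return t
    return ("fun", resolve(t[1], s), resolve(t[2], s))

K = ("lam", "x", ("lam", "y", ("var", "x")))
ty, s = infer(K, {}, {})
print(resolve(ty, s))                # an arrow of shape a -> b -> a
```

Running infer on K yields a type of shape a → b → a, the principal version of the σ → τ → σ derived in Example 9.4, while λx . x x is rejected: no simple type σ can satisfy σ ≡ σ → ρ.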

Type assignment system II Example 9.4 We want to prove that ` K : σ → τ → σ. The derivation is almost immediate: from w : σ, z : τ ` w : σ, by (→ I ), we get w : σ ` λy . w : τ → σ; by (→ I ) again, ` λx , y . x : σ → τ → σ. Notice how the (→ I ) rule forces us to generate a large number of new variables. In fact, one can slightly modify the definition of derivation to take care of this aspect, at the price of introducing more complex rules on the use of variables. 159 of 472

Basic properties I Lemma 9.5 (α-invariance) Let Γ ` M : τ be any derivation and let M ≡α N. Then Γ ` N : τ.

Proof. By induction on the proof of Γ ` M : τ:
■ if M ≡ x and x : τ ∈ Γ, then M ≡ N and the conclusion is obvious;
■ if Γ ` P : σ → τ, Γ ` Q : σ and M ≡ P Q, then N ≡ P′ Q′ with P ≡α P′ , Q ≡α Q′ , so, by induction hypothesis, Γ ` P′ : σ → τ and Γ ` Q′ : σ, thus Γ ` N : τ by (→ E );
■ if Γ, z : σ ` P[z/x] : ρ and M ≡ (λx . P), τ ≡ σ → ρ , then N ≡ (λy . P′ ), P[z/x] ≡α P′ [z/y ], with z new in P , P′ , Γ. By induction hypothesis, Γ, z : σ ` P′ [z/y ] : ρ , thus Γ ` N : τ by (→ I ).

160 of 472

Basic properties II Lemma 9.6 (Generation)
■ If Γ ` x : σ, then x : σ ∈ Γ;
■ If Γ ` P Q : τ, then there is σ such that Γ ` P : σ → τ and Γ ` Q : σ;
■ If Γ ` λx . P : τ, then there are σ and ρ such that τ ≡ σ → ρ , and Γ, z : σ ` P[z/x] : ρ for some variable z new in P and Γ.

Proof. By induction on the derivation. This result is useful to show that certain terms have no types. [Exercise] Complete the details. 161 of 472

Basic properties III

Corollary 9.7 (Typability of subterms) Let M′ be a subterm of M and let Γ ` M : τ. Then Γ′ ` M′ : σ for some type σ and some set Γ′ of formulae.

Proof. By induction on the structure of M. Each step in the induction corresponds to an application of the generation lemma. Notice that Γ ⊆ Γ′ . [Exercise] Complete the details.

162 of 472

Basic properties IV

Lemma 9.8 (Substitution)
■ If Γ ` M : σ, then Γ[τ/a] ` M : σ[τ/a], where a is a type variable;
■ If Γ, x : σ ` M : τ and Γ ` N : σ, then Γ ` M[N/x] : τ.

Proof. (1) follows by induction on the derivation Γ ` M : σ. (2) follows by induction on the generation of Γ, x : σ ` M : τ. [Exercise] Complete the details.

163 of 472

Basic properties V Lemma 9.9 (Context)
■ If Γ ⊆ ∆ and Γ ` M : τ, then ∆ ` M : τ;
■ If Γ ` M : τ, then FV(M) ⊆ { x : there is σ such that (x : σ) ∈ Γ };
■ If Γ ` M : τ and ∆ = { (x : σ) : (x : σ) ∈ Γ and x ∈ FV(M) }, then ∆ ` M : τ.

Proof. All statements are proved by induction on the generation of Γ ` M : τ. [Exercise] Complete the details. 164 of 472

Basic properties VI Theorem 9.10 (Subject reduction) If Γ ` M : σ and M Bβ M′ , then Γ ` M′ : σ.

Proof. By induction on the length of the reduction, it suffices to prove the one-step case. If M ≡α M′ , then the result is evident by the α-invariance lemma. So, let M ≡ U[((λx . P) Q)/z] and M′ ≡ U[(P[Q/x])/z] with z occurring only once in U. By the generation lemma, it suffices to prove that Γ ` (λx . P) Q : τ implies Γ ` P[Q/x] : τ. But, if Γ ` (λx . P) Q : τ, by the generation lemma, Γ ` λx . P : σ → τ and Γ ` Q : σ. Applying the generation lemma again, we get Γ, z : σ ` P[z/x] : τ where z is new. Thus, by the substitution lemma, Γ ` P[z/x][Q/z] : τ. Since z is new in P, P[z/x][Q/z] ≡ P[Q/x]. 165 of 472

Basic properties VII

Notice that terms having a type are not closed under expansion: in fact, ` S K : (τ → σ) → (τ → τ), ` λx , y . y : τ → (σ → σ) and S K Bβ λx , y . y , but, evidently, it does not hold that ` S K : τ → (σ → σ). [Exercise] Perform the calculations.

166 of 472

Comparison with λβ → I

Definition 9.11 (Forgetful map) Let ΛT be the collection of typed λ-terms and let Λ be the collection of pure λ-terms. The forgetful map | · | : ΛT → Λ is defined as follows:
■ |x : σ| = x;
■ |(M : σ → τ) (N : σ)| = |M : σ → τ| |N : σ|;
■ |(λx : σ. M : τ) : σ → τ| = λx . |M : τ|.

The forgetful map, as the name suggests, forgets the type decorations of a typed term.

167 of 472

Comparison with λβ → II Theorem 9.12
■ Let M : τ ∈ ΛT and let Γ = C ∪ { (x : σ) : (x : σ) ∈ V and x ∈ FV(M) }, where C and V are the typed constants and variables, respectively. Then Γ ` |M : τ| : τ.
■ If Γ ` M : τ, then there is N ∈ ΛT such that |N : τ| ≡ M and N : τ in λβ →, where the constants and the variables correspond to those in Γ.

Proof. By induction on the given derivations. [Exercise] Complete the details. 168 of 472

Strong normalizability

Theorem 9.13 (Strong normalizability) If Γ ` M : τ, then M is SN.

Proof. Since Γ ` M : τ, we know that there is a term N in λβ → such that |N : τ| ≡ M and N : τ. Thus, by the strong normalizability theorem in λβ →, |N : τ| ≡ M is SN.

169 of 472

References and Hints

The presented material is covered by Chapter 12 of [HS]. Beware that Definition 12.6 in that textbook is wrong. In fact, the (→ I ) rule should be changed as indicated in these slides, or the α-invariance result would be unprovable! The presentation follows [Bar2].

170 of 472

Fundamentals of Functional Programming Lecture 10

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Outline In the previous lectures, the simple theory of types has been introduced both in the Church-style and in the Curry-style. In the second case, type variables were present, although their use has been very limited. Most functional programming languages are based on extensions of the simple theory of types. Extensions act on the type system, where additional constructions are introduced; on the term system, where interaction with types is admitted to some extent, hence implementing real polymorphism; and, finally, on a richer set of basic terms, to exploit the computer's potential for doing arithmetic on integers and floating point numbers, to have pointers and state variables, and so on. In this lecture, we want to introduce a family of type systems, i.e., typed λ-calculi, on which most functional languages are based. 172 of 472

Church-typing and polymorphism The systems we will introduce in this lesson are Church-style typed λ-calculi. They allow types to depend on terms, and terms to depend on types. The idea of a term depending on a type and vice versa is quite easy to understand: a function f : A → B maps all the elements in the set A to some elements in the set B. Sometimes, we are interested in considering the image of f , i.e., the set f (A), as a type. In this case, we want to write functions like g ≡ (λx ∈ f (A). . . . ). Evidently, the type f (A) depends on the term f . The systems we are about to introduce allow terms and types to be mixed in a controlled way. So, a term may have different types, which are related to each other, and a type assumes a deeper meaning than mere classification. 173 of 472

The λ-cube I Definition 10.1 (Pseudoterms) Assume to have a denumerable set of variables V and a set of type constants C . Then pseudoterms are inductively defined as follows:
■ every variable x ∈ V is a pseudoterm, and FV(x) = { x };
■ every constant c ∈ C is a pseudoterm, and FV(c) = ∅;
■ if M and N are pseudoterms, then so is (M N), and FV(M N) = FV(M) ∪ FV(N);
■ if x ∈ V and M and N are pseudoterms, then so is (λx : M . N), and FV(λx : M . N) = (FV(M) ∪ FV(N)) \ { x };
■ if M and N are pseudoterms and x ∈ V is such that x ∉ FV(M), then (Πx : M . N) is a pseudoterm, and FV(Πx : M . N) = (FV(M) ∪ FV(N)) \ { x }.

174 of 472

The λ-cube II A pseudoterm represents a term or a type. It becomes a valid term or type when it can be assigned a type by the system we are working in.

Definition 10.2 (Reduction) Pseudoterms may β-reduce, following the rule: (λx : A. M) N B M[N/x] . The systems we will consider have two special constants ∗ and □, called sorts.

175 of 472

The λ-cube III The systems we will consider define a notion of derivation, notation Γ ` A : B, from a context Γ to a pair of pseudoterms A and B.

Definition 10.3 (Context) Given a notion of derivation, a context is a finite sequence Γ = x1 : A1 , . . . , xn : An such that, for all i,
■ xi ∈ V ;
■ Ai is a pseudoterm;
■ xi ∉ FV(Aj ) for j ≤ i;
■ either Ai is a sort, or x1 : A1 , . . . , xi−1 : Ai−1 ` Ai : s with s a sort.

176 of 472

The λ-cube IV Definition 10.4 (The λ-cube) The eight systems in the λ-cube are formed according to the following derivation rules:
■ (axiom): ` ∗ : □;
■ (start), where x ∉ FV(Γ) ∪ FV(A) and s is a sort: if Γ ` A : s, then Γ, x : A ` x : A;
■ (weakening), where x ∉ FV(Γ) ∪ FV(A) and s is a sort: if Γ ` M : B and Γ ` A : s, then Γ, x : A ` M : B.
177 of 472 ,→

The λ-cube V ,→ (The λ-cube)
■ (application): if Γ ` M : (Πx : A. B) and Γ ` N : A, then Γ ` M N : B[N/x];
■ (abstraction), where x ∉ FV(Γ) ∪ FV(A) and s is a sort: if Γ, x : A ` M : B and Γ ` (Πx : A. B) : s, then Γ ` (λx : A. M) : (Πx : A. B);
■ (product), where x ∉ FV(Γ) ∪ FV(A), s1 , s2 are sorts and (s1 , s2 ) ∈ R : if Γ ` A : s1 and Γ, x : A ` B : s2 , then Γ ` (Πx : A. B) : s2 .
178 of 472 ,→

The λ-cube VI ,→ (The λ-cube)
■ (conversion), where s is a sort: if Γ ` M : A, A =β B and Γ ` B : s, then Γ ` M : B;
■ (α-conversion): if Γ ` M : A and M ≡α N, then Γ ` N : A.

A pseudoterm M is a term iff there are a context Γ and a pseudoterm A such that Γ ` M : A. A pseudoterm A is a type iff there are a context Γ and a pseudoterm M such that Γ ` M : A. ,→ 179 of 472

The λ-cube VII ,→ (The λ-cube)

The product rule characterises the systems in the λ-cube. Specifically, the eight systems are given by the following sets R :

System        R
λ→            (∗, ∗)
λ2            (∗, ∗) (□, ∗)
λP            (∗, ∗) (∗, □)
λP2           (∗, ∗) (□, ∗) (∗, □)
λω̲            (∗, ∗) (□, □)
λω            (∗, ∗) (□, ∗) (□, □)
λP ω̲          (∗, ∗) (∗, □) (□, □)
λP ω = λC     (∗, ∗) (□, ∗) (∗, □) (□, □)

180 of 472

The λ-cube VIII The systems in the λ-cube are usually represented as the vertices of a cube, with λ→ at the base vertex and λC = λP ω at the opposite one: the three directions out of λ→ correspond to adding (□, ∗) (towards λ2), (□, □) (towards λω̲) and (∗, □) (towards λP) to R , and every edge points from a weaker system to a stronger one. 181 of 472

The λ-cube IX Definition 10.5 (Arrow type) The arrow type A → B is defined to be (Πx : A. B) where x 6∈ FV(B). With this definition, it is easy to see that λ → is λβ → with type variables, thus it is equivalent to the Curry-style system TA →.

Definition 10.6 (Falsum) ⊥ ≡ (Πu : ∗. u).

In λ2 one can derive
■ ` (λu : ∗, x : u . x) : (Πu : ∗. u → u), which is the identity parametrised by the type u;
■ ` (λu : ∗, x : ⊥. x u) : (Πu : ∗. ⊥ → u), which is "ex falso quodlibet".

182 of 472

The λ-cube X The system λω̲ allows type constructors, i.e., types parametrised by types. For example, u : ∗ ` (λf : (∗ → ∗). f (f u)) : (∗ → ∗) → ∗: the term (λf : (∗ → ∗). f (f u)) takes a function f , regarded as a type constructor, and produces a type as its output. The system λP allows us to define types depending on terms. It has been used to code the first logical frameworks, computer systems able to represent generic mathematical theories and their derivations. The system λω allows us to define an internal notion of logical conjunction: interpreting → as implication, and calling and ≡ λu : ∗, v : ∗. Πw : ∗. (u → v → w ) → w , one can derive the usual rules for logical conjunction: and x y → x, and x y → y and x , y ` and x y , assuming that x : ∗, y : ∗. 183 of 472
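The computational content of the `and` encoding, once all types and type abstractions are erased, is just Church pairing; a small untyped sketch of ours:

```python
# Church pairing: pair x y = a function waiting for a consumer k,
# to which it hands both components -- the erasure of the "and" terms.
pair   = lambda x: lambda y: lambda k: k(x)(y)
first  = lambda p: p(lambda x: lambda y: x)   # erasure of "and x y -> x"
second = lambda p: p(lambda x: lambda y: y)   # erasure of "and x y -> y"

p = pair(1)(2)
print(first(p), second(p))  # 1 2
```

In λω the consumer's result type w is abstracted over with Π, which is what makes the two projections typable for every pair of formulae u, v.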

The λ-cube XI The system λP2 is related to second-order intuitionistic predicate logic. The system λP ω̲ corresponds to a weak version of higher-order intuitionistic logic and has been discovered "by symmetry" in the λ-cube. The system λC is the "calculus of constructions", the basis for Coq, one of the most powerful proof assistants. A proof assistant is a computer system which allows mathematicians and computer scientists to develop formal proofs and programs, and, to some extent, to write very abstract programming systems for special purposes, e.g., verifying the correctness of critical systems, like nuclear power-plants or avionics. 184 of 472

Elementary properties I Lemma 10.7 (Free variables) If x1 : A1 , . . . , xn : An ` M : A then FV(A) ∪ FV(M) ⊆ { x1 , . . . , xn }.

Proof. Induction on the derivation.

Lemma 10.8 (Transitivity) If x1 : A1 , . . . , xn : An ` M : A and Γ ` xi : Ai for all i, then Γ ` M : A.

Proof. Induction on the derivation x1 : A1 , . . . , xn : An ` M : A. [Exercise] Complete the details. 185 of 472

Elementary properties II Lemma 10.9 (Substitution) If Γ, x : A, ∆ ` M : B and Γ ` N : A, then Γ, ∆[N/x] ` M[N/x] : B[N/x].

Proof. By induction on the derivation of Γ, x : A, ∆ ` M : B.

Lemma 10.10 (Thinning) If Γ ⊆ ∆ with ∆ a context, and Γ ` M : A, then ∆ ` M : A.

Proof. By induction on the derivation Γ ` M : A. [Exercise] Complete the details. 186 of 472

Elementary properties III Lemma 10.11 (Generation)
■ If Γ ` s : A, with s a sort, then s ≡ ∗ and A =β □;
■ If Γ ` x : A, with x a variable, then there are a sort s and a B =β A such that x : B ∈ Γ and Γ ` B : s;
■ If Γ ` (Πx : A. B) : C , then there are sorts s1 and s2 such that (s1 , s2 ) ∈ R , Γ ` A : s1 , Γ, x : A ` B : s2 and C =β s2 ;
■ If Γ ` (λx : A. M) : C , then there are a sort s and a term B such that Γ ` (Πx : A. B) : s, Γ, x : A ` M : B and C =β (Πx : A. B);
■ If Γ ` M N : C , then there are A and B such that Γ ` M : (Πx : A. B), Γ ` N : A and C =β B[N/x].

Proof. Induction on the derivation of the main term. 187 of 472

Elementary properties IV

Lemma 10.12 (Subterm) If A is a term (type), and B is a subterm of A, then B is a term (type).

Proof. Suppose A is a term; then Γ ` A : M for some Γ and M. By induction on Γ ` A : M, via the generation lemma, one proves that some B′ =β B appears as the subject of a derived term.

188 of 472

Elementary properties V Theorem 10.13 (Subject-reduction) If Γ ` M : A and M Bβ M′ , then Γ ` M′ : A. [Proof not required]

Corollary 10.14 If Γ ` M : A and A Bβ A′ , then Γ ` M : A′ .

Proof. By induction on the derivation Γ ` M : A, one proves that A ≡ s or Γ ` A : s for some sort s. In the first case, we are done; in the second case, apply the subject-reduction theorem to obtain Γ ` A′ : s, thus an application of the conversion rule proves the corollary. 189 of 472

Elementary properties VI

Lemma 10.15 (Unicity of types) If Γ ` M : A and Γ ` M : A′ , then A =β A′ .

Proof. By induction on the structure of A.

190 of 472

Elementary properties VII Theorem 10.16 (Strong normalization) If x1 : A1 , . . . , xn : An ` M : B, then A1 , . . . , An , M and B are SN. [Proof not required] The proof is a major result. It suffices to prove the result for λC , but SN of λC reduces to SN of λω, modulo a suitable translation of types and terms. Nevertheless, proving SN of λω is still not elementary.

Corollary 10.17 In the λ-cube, type checking (Is Γ ` M : A correct?) and typability (Given M, find A and Γ such that Γ ` M : A) are decidable. Notice that checking whether a type A is inhabited (Given A, find M and Γ such that Γ ` M : A) is NOT decidable in λC , even if it is for λ→. 191 of 472

References and Hints

The content of this lesson comes from Chapter 13E of [HS]. The proofs, also the omitted ones, can be found in [Bar2].

192 of 472

Fundamentals of Functional Programming Lecture 11

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Outline Till now, we have interpreted the λ-calculi as programming systems, or as type systems, i.e., mathematical theories based on a type algebra and a notion of reduction. This view is significant since it allows us to develop the fundamental ideas of functional programming, providing at the same time a solid and rigorous foundation. In this lesson, we want to introduce a different interpretation, where types are seen as logical formulae and terms as their proofs. The ultimate goal is to understand the "representation power" of the typed systems we have introduced so far. 194 of 472

Universal quantification I Definition 11.1 (Universal quantifier) The universal quantifier ∀ is an abbreviation for the product (Π) operator. In Logic, ∀x : A. B is a formula if A is a logical sort and B is a formula. We prefer to use the word "type" instead of "logical sort", and we assume that every λ-term M : ∗ is a formula. In this way, the rule which governs the formation of universally quantified formulae is just the product rule in the λ-cube: if Γ ` A : s and Γ, x : A ` B : ∗, then Γ ` (∀x : A. B) : ∗. 195 of 472

Universal quantification II A universally quantified formula obeys two inference rules, one to introduce the quantification, the other to eliminate it:

Γ, p : A ` B
-------------- ∀I
Γ ` ∀x : A. B

Γ ` ∀x : A. B
-------------- ∀E
Γ ` B[t/x]

where t is a term of type A and p ∉ FV(Γ ∪ { A }), i.e., p is an eigenvariable. If we compare the ∀I rule with the abstraction rule in the λ-cube,

Γ, x : A ` M : B    Γ ` (Πx : A. B) : ∗
---------------------------------------
Γ ` (λx : A. M) : (Πx : A. B)

we see that the logical rule is the same thing as the type rule, as far as we check that the conclusion is, indeed, a formula.

196 of 472

Universal quantification III In the previous rule, the term λx : A. M encodes the proof of Πx : A. B ≡ ∀x : A. B, where M is the proof of B assuming x : A. Note how abstraction in the proof-term takes care of modelling eigenvariables in the proof. We have to check that B is a formula, but this does not suffice, since the type A can be incompatible with B if the logical system is not powerful enough. Thus, the product rule, controlling the formation of universal formulae, allows for first-order or higher-order logics. Similarly, the ∀E rule is just the application rule in the λ-cube:

Γ ` M : (Πx : A. B)    Γ ` t : A
--------------------------------
Γ ` M t : B[t/x]

where we have to prove that t has type A.

197 of 472

Implication I Definition 11.2 (Implication) The logical implication, denoted by ⊃, corresponds to the type constructor →. It is defined as A ⊃ B ≡ A → B ≡ F A B, where F ≡ λu : ∗, v : ∗. (Πx : u. v). The formation rule for implications says that, if A and B are formulae, then so is A ⊃ B. The behaviour of implication is described by its inference rules:

Γ, A ` B
---------- ⊃I
Γ ` A ⊃ B

Γ ` A ⊃ B    Γ ` A
------------------- ⊃E
Γ ` B

198 of 472

Implication II Lemma 11.3 In the λ-cube the following rule holds:

Γ ` A : ∗    Γ ` B : ∗
----------------------
Γ ` A → B : ∗

Proof. (i) We have to prove that Γ ` F A B : ∗. By a double use of the application rule and the hypotheses, this reduces to Γ ` F : (Πy : ∗, z : ∗. ∗). By the thinning lemma, it suffices to prove ` (λu : ∗, v : ∗. (Πx : u. v)) : (Πy : ∗, z : ∗. ∗). But, applying twice the abstraction rule and the thinning lemma, this goal reduces to proving u : ∗, v : ∗ ` (Πx : u. v) : ∗, ` (Πz : ∗. ∗) : ∗ and ` (Πy : ∗, z : ∗. ∗) : ∗. Applying the product rule to the first goal, we see immediately that it holds. ,→
199 of 472

Implication III ,→ Proof. (ii) The second goal reduces to the axiom after an application of the product rule. The third goal reduces to the second goal after thinning and an application of the product rule.

Lemma 11.4 The following rules hold in the λ-cube, when x ∉ FV(Γ ∪ { B }):

Γ ` M : A → B    Γ ` N : A
---------------------------
Γ ` M N : B

Γ, x : A ` M : B    Γ ` A → B : ∗
---------------------------------
Γ ` (λx : A. M) : A → B

Proof. Standard. (These proofs, like the previous one, are essentially mechanical and we will not develop them in the future.) The first lemma (11.3) corresponds to the formation rule for implication, while the second (11.4) derives the ⊃I and ⊃E rules.
200 of 472

Conjunction I Definition 11.5 (Conjunction) Let
■ ∧ ≡ λu : ∗, v : ∗. (Πw : ∗. (u → v → w) → w);
■ D ≡ λu : ∗, v : ∗, x : u, y : v, w : ∗, z : (u → v → w). z x y;
■ fst ≡ λu : ∗, v : ∗, x : (∧ u v). x u (λy : u, z : v. y);
■ snd ≡ λu : ∗, v : ∗, x : (∧ u v). x v (λy : u, z : v. z).
We write A ∧ B for ∧ A B when A and B are intended as logical formulae, and A × B when A and B are intended as types. The ∧ combinator represents the logical conjunction, while D, fst and snd are the corresponding proof-terms. They model the idea of pairs of proofs along with the pair projectors ([Exercise] compare these definitions with the datatypes of pairs, i.e., 2-tuples).
201 of 472
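The exercise can be approached by erasing the types: in an untyped language, D, fst and snd become ordinary higher-order functions. A minimal sketch in Python (type arguments dropped, since Python is untyped; the names are ours):

```python
# Church-encoded pairs: a pair feeds its two components to whatever
# consumer z it is applied to (the term D x y with types erased).
pair = lambda x: lambda y: lambda z: z(x)(y)   # D
fst  = lambda p: p(lambda x: lambda y: x)      # first projection
snd  = lambda p: p(lambda x: lambda y: y)      # second projection

p = pair(3)("proof")
print(fst(p), snd(p))  # the introduce-then-eliminate detour reduces away
```

Running `fst(pair(3)("proof"))` performs exactly the β-reductions of Lemma 11.7: the detour disappears by computation.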

Conjunction II Lemma 11.6 In λ2 and stronger systems, it holds that
■ u : ∗, v : ∗ ` u ∧ v : ∗;
■ u : ∗, v : ∗ ` D u v : u → v → u ∧ v;
■ u : ∗, v : ∗ ` fst u v : u ∧ v → u;
■ u : ∗, v : ∗ ` snd u v : u ∧ v → v.
Proof. Standard. The first statement is the formation rule for conjunction: if A and B are formulae, so is A ∧ B. The second statement is the introduction rule:

Γ ` A    Γ ` B
--------------- ∧I
Γ ` A ∧ B

202 of 472

Conjunction III The third and the fourth statements are the elimination rules:

Γ ` A ∧ B
---------- ∧E1
Γ ` A

Γ ` A ∧ B
---------- ∧E2
Γ ` B

Lemma 11.7 In λ2 and stronger systems, it holds that
■ u : ∗, v : ∗, x : u, y : v ` fst u v (D u v x y) : u;
■ u : ∗, v : ∗, x : u, y : v ` snd u v (D u v x y) : v;
■ fst u v (D u v x y) =β x and snd u v (D u v x y) =β y.
Proof. Standard. The first and the second statement show that introducing and then eliminating a conjunction is a redundant detour. The third statement tells us that the detour is eliminated by reducing the proof-term.

203 of 472

Disjunction I Definition 11.8 (Disjunction) Let
■ ∨ ≡ λu : ∗, v : ∗. (Πw : ∗. (u → w) → ((v → w) → w));
■ inl ≡ λu : ∗, v : ∗, x : u, w : ∗, f : u → w, g : v → w. f x;
■ inr ≡ λu : ∗, v : ∗, y : v, w : ∗, f : u → w, g : v → w. g y;
■ case ≡ λu : ∗, v : ∗, z : (∨ u v), w : ∗, f : u → w, g : v → w. z w f g.
We write A ∨ B as an abbreviation for ∨ A B; when dealing with types, we also write A + B instead of A ∨ B. The ∨ combinator represents the logical disjunction, while inl, inr and case are the corresponding proof-terms. They model the datatype representing the disjoint union of pairs of proofs, along with the standard injections.
204 of 472
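Erasing the types as before, the disjunction combinators become the familiar tagged-union operations. A hypothetical untyped sketch in Python (names ours):

```python
# Church-encoded sums: an injected value waits for two handlers and
# applies the one matching its tag (type arguments erased).
inl  = lambda x: lambda f: lambda g: f(x)      # left injection
inr  = lambda y: lambda f: lambda g: g(y)      # right injection
case = lambda s: lambda f: lambda g: s(f)(g)   # eliminator: z w f g, types erased

# an eliminator: double a number, or take the length of a string
double_or_len = lambda s: case(s)(lambda n: 2 * n)(lambda t: len(t))
print(double_or_len(inl(21)))     # left branch applies
print(double_or_len(inr("abc")))  # right branch applies
```

Reducing `case` applied to `inl`/`inr` selects the matching handler, exactly the detour elimination of Lemma 11.10.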

Disjunction II Lemma 11.9 In λ2 and stronger systems, the following facts are true:
■ u : ∗, v : ∗ ` inl u v : u → u ∨ v;
■ u : ∗, v : ∗ ` inr u v : v → u ∨ v;
■ u : ∗, v : ∗ ` case u v : u ∨ v → (Πw : ∗. (u → w) → ((v → w) → w)).
Proof. Standard. Thus, inl and inr represent the proofs of disjunction introduction:

Γ ` A
---------- ∨Il
Γ ` A ∨ B

Γ ` B
---------- ∨Ir
Γ ` A ∨ B

205 of 472

Disjunction III Similarly, case represents the proof of disjunction elimination:

Γ ` A ∨ B    Γ, x : A ` C    Γ, x : B ` C
------------------------------------------ ∨E
Γ ` C

where, as usual, x ∉ FV(Γ ∪ { A, B, C }). As before, introducing and immediately eliminating a disjunction is a detour, since a more compact proof can be obtained by combining the left (right) introduction rule with the second (third, respectively) assumption in the ∨E rule. The following lemma shows that these detours are correct and they are eliminated by reducing the proof-terms.
206 of 472

Disjunction IV

Lemma 11.10 The following facts are true in λ2 and stronger systems:
■ u : ∗, v : ∗, w : ∗, x : u, y : v, f : u → w, g : v → w ` case u v (inl u v x) w f g : w;
■ u : ∗, v : ∗, w : ∗, x : u, y : v, f : u → w, g : v → w ` case u v (inr u v y) w f g : w;
■ case u v (inl u v x) w f g =β f x;
■ case u v (inr u v y) w f g =β g y.
Proof. Standard.

207 of 472

Falsity I Definition 11.11 (Falsity) The void type is defined as void ≡ Πu : ∗. u. We write ⊥ for void when dealing with the logical interpretation.

Lemma 11.12 In every system in the λ-cube,
■ ` ⊥ : ∗;
■ x : ⊥, u : ∗ ` x u : u.
Proof. Standard. As usual, the first statement is a formation rule: ⊥ is a formula.
208 of 472

Falsity II The second statement corresponds to the ⊥E inference rule:

Γ ` ⊥
------ ⊥E
Γ ` A

for any formula A.

Theorem 11.13 (Consistency) In the λ-cube there is no closed term M such that ` M : ⊥.

Proof. (i) Suppose there is such an M. Then, by the application rule, we can derive u : ∗ ` M u : u and, since the systems in the λ-cube are strongly normalising, M u =β N for some N in β-nf. ,→
209 of 472

Falsity III ,→ Proof. (ii) By the generation lemma, N cannot be an abstraction, so N ≡ a N1 . . . Nn, where a is a constant or a variable, u : ∗ ` a : (Πy1 : A1, . . . , yn : An. ∗) and, for each i, u : ∗ ` Ni : Ai. If a is a constant, then a : (Πy1 : A1, . . . , yn : An. ∗) must be an axiom, which is impossible. Otherwise, if a is a variable, then a ≡ u and n = 0, but the only possible type for u is ∗, not u. This result shows that all the systems in the λ-cube are logically consistent, i.e., they cannot prove the false proposition. This is the syntactical counterpart of soundness, which says that any provable proposition is true.
210 of 472

Existential quantification I Definition 11.14 (Existential quantifier) Let
■ Σ ≡ λu : ∗, v : u → ∗. (Πw : ∗. (Πx : u. v x → w) → w);
■ D′ ≡ λu : ∗, v : u → ∗, x : u, y : v x, w : ∗, z : (Πx : u. v x → w). z x y;
■ proj ≡ λu : ∗, v : u → ∗, w : ∗, z : (Πx : u. v x → w), y : (∃x : u. v x). y w z.
We write ∃x : A. B as an abbreviation for Σ A (λx : A. B). In Logic, ∃x : A. B is a formula if A is a type and B is a formula. The behaviour of the ∃ quantifier is fixed by the following inference rules:

Γ ` B(t)
----------------- ∃I
Γ ` ∃x : A. B(x)

Γ ` ∃x : A. B(x)    Γ, B(p) ` C
-------------------------------- ∃E
Γ ` C

where t is a term of type A and p ∉ FV(Γ ∪ { C }).

211 of 472
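Once the types are erased, the Σ encoding is again a pair: D′ packs a witness together with a body, and proj hands the package to an eliminator that may use the witness only abstractly. A sketch in Python (the names and the counter example are ours):

```python
# pack x y  ~  D' x y : the package feeds witness and body to an eliminator z.
pack   = lambda x: lambda y: lambda z: z(x)(y)
# unpack  ~  proj : apply the package to the eliminator.
unpack = lambda package: lambda z: package(z)

# A hypothetical abstract counter: the witness 0 (an int) stays hidden;
# clients may only use the supplied operations inc and read.
counter = pack(0)({"inc": lambda n: n + 1, "read": lambda n: n})
value = unpack(counter)(lambda n: lambda ops: ops["read"](ops["inc"](n)))
print(value)
```

The eliminator receives the hidden witness n and the operations, mirroring how ∃E lets C be derived without revealing the witness t.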

Existential quantification II Lemma 11.15 In λ2 and stronger systems it holds that
■ u : ∗, v : u → ∗ ` (∃x : u. v x) : ∗;
■ u : ∗, v : u → ∗ ` D′ u v : (∀t : u. v t ⊃ (∃x : u. v x));
■ u : ∗, v : u → ∗ ` proj u v : (Πw : ∗. (∀y : u. v y → w) ⊃ ((∃x : u. v x) ⊃ w)).
Proof. Standard. As usual, the statements encode the formation rule and the two inference rules, as it is immediate to see.

212 of 472

Existential quantification III Lemma 11.16 In λ2 and stronger systems, it holds that
■ u : ∗, v : u → ∗, w : ∗, x : u, y : v x, z : Πx : u. v x → w ` proj u v w z (D′ u v x y) : w;
■ proj u v w z (D′ u v x y) =β z x y.
Proof. Standard. This lemma shows the correctness of the detour of introducing and immediately eliminating an existential quantification. It also shows that the reduction of proof-terms takes care of eliminating the detour.
213 of 472

Equality I Definition 11.17 (Equality) The typed equality M =A N is defined to be Q A M N where Q ≡ λu : ∗, x : u, y : u. (Πz : u → ∗. z x ⊃ z y). This definition makes sense when A : ∗ has already been proved or assumed in the context. In Logic, M = N is a formula if M and N are both terms of type A. The behaviour of equality is controlled by the rules:

--------- refl
` x = x

Γ ` t = s    Γ ` B(t)
---------------------- subst
Γ ` B(s)

214 of 472

Equality II Lemma 11.18 In λ2 and stronger systems, it holds that
■ ` Q : Πu : ∗. u → u → ∗;
■ u : ∗, x : u ` (λz : u → ∗, w : z x. w) : (x =u x);
■ u : ∗, x : u, y : u, m : (x =u y), z : u → ∗, n : z x ` m z n : z y.
Proof. Standard. The first statement encodes the formation rule, while the second and the third statements are the formalisation of the refl and subst inference rules, respectively.
215 of 472

Expressivity I The expressive power of the systems in the λ-cube is given by the expressive power of the corresponding logical system. The logical systems are intuitionistic, and precisely:

System       propositional   ∀,→ fragment only   order
λ→           ✓               ✓                   I
λ2           ✓               ×                   II
λP           ×               ✓                   I
λP2          ×               ×                   II
λω (weak)    ✓               ✓                   weakly higher
λω           ✓               ×                   higher
λPω (weak)   ×               ✓                   weakly higher
λPω = λC     ×               ×                   higher

216 of 472

Expressivity II

Since in λ2 and stronger systems it is possible to represent Peano arithmetic, it follows that those systems cannot be logically complete, because Gödel's Incompleteness Theorem applies to them. By the same argument, in λ2 and stronger systems all the partial recursive functions can be represented. On the other hand, since these systems are subsystems of the pure λ-calculus, no other functions can be represented. Thus, λ2 and stronger systems are Turing-complete.

217 of 472

References and Hints

The content of this lesson has been taken from [HS], Chapter 13G. The detours in logical proofs and their elimination are the basis for constructing normalisation proofs in Logic. This topic will not be analysed any further in this course, but the interested reader may find an in-depth treatment in A.S. Troelstra, H. Schwichtenberg, Basic Proof Theory, 2nd edition, Cambridge University Press (2000). ISBN: 0521779111

218 of 472

Fundamentals of Functional Programming Lecture 12 — Intermezzo

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Outline This lesson, which concludes the first part of the course, introduces exceptions and their implementation in a functional language. Exceptions, as in Java, are used in a variety of ways, although they are mainly employed for error reporting. This highly procedural feature can be smoothly incorporated into a functional language by means of a proper evaluation semantics. What is surprising is that this semantics has a logical counterpart, relating classical proofs to intuitionistic ones, in a way which is “natural” with respect to the formulae-as-types interpretation of typed programs.

220 of 472

Exceptions I Exceptions are a common way to treat special cases in a computation. In most functional languages they are implemented via two keywords: catch and throw. The construction catch j in M operates by evaluating M in the usual way. If a subterm throw j N of M gets evaluated, the value of the whole expression “catch j in M” becomes the value of N.

221 of 472

Exceptions II

To model exceptions, we start with the simply typed λ-calculus and extend it with a new term constructor. Then, we interpret the calculus so obtained as a computational structure by giving a suitable notion of reduction, which extends the usual β-reduction to take care of the new constructor. Our aim is to show that there is a “natural” interpretation of the calculus in Logic, establishing a variant of the formulae-as-types interpretation.

222 of 472

Syntax I Definition 12.1 (Types) Given a set C of atomic types, the set of types, T , is inductively defined as the smallest set such that
■ ⊥ ∈ T and ⊥ ∉ C ;
■ C ⊆ T ;
■ if t1, t2 ∈ T then t1 → t2 ∈ T .
As usual, we abbreviate α → ⊥ as ¬α. The set of types is the same as for the simply typed λ-calculus, except for the requirement of a special atomic type, ⊥.

223 of 472

Syntax II Definition 12.2 (Terms) Given a set of types T and a set V of variables such that there is a denumerable quantity of them for each type in T , the set of terms, ΛC , is inductively defined as the smallest set such that
■ x : τ ∈ V implies x ∈ ΛC , x : τ and FV(x) = { x };
■ if M, N ∈ ΛC , M : α → β and N : α, then (M · N) ∈ ΛC , (M · N) : β and FV(M · N) = FV(M) ∪ FV(N);
■ if M ∈ ΛC , M : β and x : α ∈ V , then (λx : α. M) ∈ ΛC , (λx : α. M) : α → β and FV(λx : α. M) = FV(M) \ { x };
■ if M ∈ ΛC and M : ¬¬α, then C (M) ∈ ΛC , C (M) : α and FV(C (M)) = FV(M).
We say that a term t is a value if t is a variable or a λ-abstraction.
224 of 472

Syntax III The operational meaning of the C term constructor is: whenever the subterm C (M) is reduced in the term E [C (M)], the whole term reduces to M applied to the procedural abstraction of the initial context. Formally,

E [C (M)] B M (λz. A (E [z])) ,

where z ∉ FV(E [M]) and

A (N) ≡ C (λd. N) ,

with d ∉ FV(N).

225 of 472

Syntax IV Since E [A (M)] B (λd. M)(λz. A (E [z])) B M, the evaluation of A (M) in a context E throws away the current context and continues by evaluating its argument M. So, A aborts the current context E and continues by evaluating its argument M. Similarly, C (M) throws away the current context E , but the context can be recovered if the term M is a λ-abstraction: in this case the abstracted variable x is instantiated by β-reduction to the context deprived of the redex; applying x to some term t evaluates to the current context applied to the term t. So C allows one to control the current context by passing it as a parameter to its argument M.

226 of 472

Semantics I Definition 12.3 (Evaluation context) Fixed a set of terms, an evaluation context E is an object satisfying one of the following clauses:
■ E ≡ [];
■ E ≡ E′ N where N ∈ ΛC and E′ is an evaluation context;
■ E ≡ V E′ where V is a value and E′ is an evaluation context.
If E is an evaluation context and M a term, then E [M] denotes the term that results from replacing M in the hole [] of E . Intuitively, an evaluation context is useful to identify the position of the β-redex we want to reduce. Because of its definition, an evaluation context always identifies the outermost leftmost redex with its hole.
227 of 472

Semantics II Lemma 12.4 Any closed term M is either a value or it can be written in a unique way as M = E [R], where R is a β-redex.

Proof. Choose R as the outermost-leftmost redex of M. This lemma shows that defining reductions in terms of evaluation contexts forces reduction to follow the leftmost strategy. Moreover, the definition of value and its interaction with the definition of evaluation context forces the call-by-value paradigm.

228 of 472

Semantics III Definition 12.5 (C -reduction) The reduction relation is defined as the reflexive and transitive closure of the following rules:
■ C (λk. E [(λx. M) V ]) B C (λk. E [M[V /x]]), where k ∉ FV(E [(λx. M) V ]) (β-reduction);
■ C (λk. E [C (M)]) B C (λk. M (λz. A (E [z]))), where k, z ∉ FV(E [C (M)]) (C -reduction);
■ C (λk. k V ) B V , where V is a value and k ∉ FV(V ) (cleanup).
Instead of evaluating an expression M : α, we start by evaluating C (λk : ¬α. k M).

229 of 472

Semantics IV This reduction relation allows every term, even one containing a C constructor, to be evaluated call-by-value, yielding a value. If the term M does not contain the C operator, it reduces in the usual way, following β-reductions until it reaches a value: at that point the cleanup rule is applied, yielding the result. If the term M contains an instance of the C operator and it reduces via the C -reduction, then the surrounding context is thrown away and passed to the argument as a parameter, as previously explained. The initial context of M, I ≡ C (λk. k []), is used to “protect” the evaluation of M, giving a reference context where every exception is captured.
230 of 472

Catch and throw I

The catch/throw mechanism that implements exceptions is modelled as follows:
■ the expression catch j in M is equivalent to the term C (λj. j M);
■ the expression throw j N is equivalent to the term j N.

231 of 472

Catch and throw II If there is no throw j N expression in the scope of a catch, the corresponding term reduces as follows:

E [C (λj. j M)] B (λj. j M)(λz. A (E [z])) B A (E [M]) B E [M] ,

as expected. If there is a throw j N expression, it reduces to N, as required. For example, consider the program

P ≡ λx. catch e in (if x = 0 then throw e “error” else 1000/x)

Then, P 0 B “error” and P 2 B 500.
232 of 472
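The behaviour of the program P can be mimicked with a host language's native exceptions; this is only an analogy, not the C-reduction machinery itself. A sketch in Python (the helper names are ours):

```python
# A tagged exception plays the role of invoking the continuation j.
class Throw(Exception):
    def __init__(self, tag, value):
        self.tag, self.value = tag, value

def catch(tag, thunk):
    # catch j in M: evaluate M; a matching throw aborts the current
    # context, and its value becomes the value of the whole expression.
    try:
        return thunk()
    except Throw as t:
        if t.tag == tag:
            return t.value
        raise  # a different tag escapes to an enclosing catch

def throw(tag, value):
    raise Throw(tag, value)

P = lambda x: catch("e", lambda: throw("e", "error") if x == 0 else 1000 // x)
print(P(0), P(2))  # "error" and 500, as in the slides
```

The `raise`/`try` pair discards the pending computation between the throw and the matching catch, just as C-reduction discards the evaluation context E.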

Logical interpretation I Consider the implicational fragment of classical propositional logic:

Γ ` A → B    Γ ` A
------------------- →E
Γ ` B

Γ, A ` B
---------- →I
Γ ` A → B

Γ ` ¬¬A
-------- ⊥C
Γ ` A

The corresponding translation in the λ-calculus is defined as follows:
■ an assumption A is mapped into the term x : A;
■ if the proof Γ ` A → B corresponds to the term M : A → B and Γ ` A corresponds to N : A, then an application of the →E rule is mapped into (M N) : B;
■ if Γ, A ` B corresponds to M : B, where the assumption A corresponds to x : A, then an application of the →I rule is mapped into (λx : A. M) : A → B;
■ if Γ ` ¬¬A corresponds to M : ¬¬A, then an application of the ⊥C rule is mapped into C (M) : A.
233 of 472

Logical interpretation II In this way, every proof in the implicational fragment of classical propositional logic is assigned a corresponding typed λ-term M : α. Differently from the corresponding intuitionistic fragment, the output of the computation may not be of type α. This happens exactly when the classical rule is used, or, in computational terms, when the C operator is evaluated. What we want to show is that it is possible to embed the considered classical fragment into the corresponding intuitionistic one and, moreover, that the embedding preserves the computational content.

234 of 472

Logical interpretation III Definition 12.6 (CPS-translation) Given a term t, its CPS-translation t̄ is inductively defined as:
■ the variable x translates to λk. k x;
■ λx. M translates to λk. k (λx. M̄), where M̄ is the translation of M;
■ M N translates to λk. M̄ (λm. N̄ (λn. m n k));
■ C (M) translates to λk. M̄ (λm. m (λz, d. k z) (λx. x));
where the abstracted variables we introduce are all new. Also, given a type τ, its CPS-translation is inductively defined as:
■ τ∗ = τ if τ is atomic;
■ (α → β)∗ = α∗ → (β∗ → ⊥) → ⊥.

235 of 472
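The term part of Definition 12.6 can be tried out directly: in an untyped setting, each clause is a higher-order function, and running a translated term means applying it to a top-level continuation. A sketch in Python (the helper names are ours; the C clause is omitted for brevity):

```python
# CPS-translation of the pure fragment: a translated term is a
# function expecting a continuation k for its value.
var = lambda x: lambda k: k(x)     # translation of a variable/value x
lam = lambda f: lambda k: k(f)     # translation of λx.M (f is the translated body)
app = lambda M, N: lambda k: M(lambda m: N(lambda n: m(n)(k)))  # M N

# (λx. x) 5 : translate, then run with the identity continuation.
ident = lam(lambda x: var(x))
result = app(ident, var(5))(lambda r: r)
print(result)
```

Every intermediate value is passed forward to a continuation, which is exactly why the translated term inhabits the double-negated type (α∗ → ⊥) → ⊥.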

Logical interpretation IV Theorem 12.7 If M : α is a term in ΛC , then its CPS-translation M̄ has type (α∗ → ⊥) → ⊥ in the simple theory of types. [Proof not required] Equivalently, in logical terms, the theorem can be formulated as

Theorem 12.8 If Π is a proof of Γ ` A in the implicational fragment of classical propositional logic corresponding to the term M : A, then there is an intuitionistic proof Π̄ of Γ∗ ` ¬¬A∗ that corresponds to M̄, where Γ∗ = { B∗ : B ∈ Γ }.

236 of 472

Logical interpretation V Corollary 12.9 The CPS-translation is an embedding of the implicational fragment of classical propositional logic into the implicational fragment of intuitionistic propositional logic.

Proof. Evident from the theorem and the fact that ` A ↔ ¬¬A holds in the classical fragment.

Corollary 12.10 The evaluation of every well-typed term M : A in ΛC is finite.

Proof. Suppose there is an infinite reduction of M : A. Then, by the theorem, there would be an infinite reduction of its CPS-translation M̄, which is impossible, since every simply-typed term is SN.
237 of 472

Conclusion As a result of this analysis, we have coded the exception mechanism in a suitable extension of a functional language, defining an appropriate reduction strategy. Then, we have interpreted the language in logical terms, obtaining a bijective correspondence with the implicational fragment of classical propositional logic. Finally, via the CPS-translation, we have shown that the whole language can be embedded into the simple theory of types, via its logical interpretation as the implicational fragment of intuitionistic propositional logic. As a side result, we have thus constructed an implementation of the exception mechanism inside the simple theory of types. In this lesson, we have shown how to use the mathematics of the λ-calculus to encode the exception mechanism smoothly into a functional language, using logic as a bridge.
238 of 472

References and Hints

The material in this lesson has been taken from T.G. Griffin, A Formulae-as-Types Notion of Control, in Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, ACM Press, pp. 47–58 (1990). In that paper, the interested reader may find an outline of the omitted proof.

239 of 472

Fundamentals of Functional Programming Lecture 13

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Outline I In the second part of this course, we will study Category Theory. In the first part, we have seen how a functional language is constructed starting from two basic operations: application and abstraction. As we have seen in Combinatory Logic, abstraction is not essential and can be simulated by means of application. As a matter of fact, most functional languages are constructed in this way: they are, more or less, a typed λ-calculus with a number of syntactical decorations to simplify the work of coding; programs are compiled into an intermediate code, which is executed by an abstract machine according to a formal semantics, that of β-reduction. To gain performance, the intermediate code can be expressed in Combinatory Logic, or in some other formalism suited to a formal model of the λ-calculus. E.g., ML uses a “continuation-based” semantics, which is rooted in Domain Theory.
241 of 472

Outline II In Category Theory, we change point of view: instead of application and abstraction, we will use just one basic operation, function composition. It turns out that the mathematical world we will construct is deeper and much more powerful than the λ-calculus. And, as a bonus, we get an abstract model for typed λ-calculi which is, in a sense, more elementary. Category Theory is a hard piece of Mathematics. Its difficulty lies in the general way it poses problems and solutions. Its study requires a “shift of view”, abandoning the traditional framework of set-based notions in favour of a more abstract style of reasoning. It turns out to be the “right” style also for writing functional programs, and this is the main reason to embark on its explanation.
242 of 472

Categories I Definition 13.1 (Category) A category C is a structure C = 〈O, A, dom, cod, ◦, id〉 such that
■ O is a collection of objects, denoted as Obj C;
■ A is a collection of arrows;
■ dom is an operation assigning to each arrow f an object dom f , its domain;
■ cod is an operation assigning to each arrow f an object cod f , its codomain;
■ ◦ is an operation, called composition, assigning to each pair of arrows f and g such that cod f = dom g , an arrow g ◦ f such that dom(g ◦ f ) = dom f and cod(g ◦ f ) = cod g . Moreover, ◦ satisfies the associative law: for any arrows f , g , h such that the following composition is defined, h ◦ (g ◦ f ) = (h ◦ g ) ◦ f ;
243 of 472

,→

Categories II ,→ (Category)
■ id is an operation, called identity, assigning to each object P an arrow idP such that dom(idP ) = P = cod(idP ). Moreover, idP satisfies the identity law: for any arrow f with dom f = P and cod f = Q, idQ ◦ f = f = f ◦ idP .
If P and Q are objects, we write Hom(P, Q) or C(P, Q) for the collection of arrows whose domain is P and whose codomain is Q. We write f : P → Q if f ∈ Hom(P, Q).

Definition 13.2 Given a category C = 〈O, A, dom, cod, ◦, id〉, we say that
■ C is small if O and A are sets;
■ C is locally small if, for every P, Q ∈ O, Hom(P, Q) is a set;
■ C is large otherwise.

244 of 472

Concrete categories I Example 13.3 (Set) The category Set has sets as objects and (total) functions between them as arrows. Specifically, Set = 〈O, A, dom, cod, ◦, id〉 where
■ O is the proper class of all sets;
■ A is the proper class of all f : D → C where f is a total function from D, its domain, to C , its codomain;
■ ◦ is the usual composition of functions: given f : D → E and g : E → C , g ◦ f : D → C is (g ◦ f )(x) = g (f (x)), for all x ∈ D;
■ idP : P → P is the identity on P, i.e., for all x ∈ P, idP (x) = x.
It is immediate to see that the associative law and the identity law both hold. [Exercise] Prove it.
245 of 472
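The exercise amounts to checking the two laws pointwise. A quick sanity check for some sample functions in Python (the functions chosen are arbitrary):

```python
# Composition and identity in Set, realised as Python functions.
compose  = lambda g, f: lambda x: g(f(x))
identity = lambda x: x

f = lambda x: x + 1   # f : Z -> Z
g = lambda x: 2 * x   # g : Z -> Z
h = lambda x: x - 3   # h : Z -> Z

for x in range(-5, 6):
    # associative law: h o (g o f) = (h o g) o f
    assert compose(h, compose(g, f))(x) == compose(compose(h, g), f)(x)
    # identity law: id o f = f = f o id
    assert compose(identity, f)(x) == f(x) == compose(f, identity)(x)
print("both laws hold on the sample")
```

Of course this only tests sample points; the actual proof unfolds the definitions, which is exactly what the assertions compute.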

Concrete categories II There is a subtlety in the definition of Set: each function corresponds to many arrows in Set. In fact, given f : D → C and g : D → E such that for all x ∈ D, f (x) = g (x), it does not follow that f = g as arrows, unless C = E. Usually, arrows are defined along with their domain and codomain, as we did. Also, most of the time, composition and identities are obvious from the context; in these cases, it is customary to define a category by specifying only the objects and the arrows. Sometimes, when the arrows (objects) are also clear from the context, just the objects (arrows) are specified.

246 of 472

Concrete categories III Definition 13.4 (Poset) A preorder is a pair 〈P, ≤P 〉 such that ≤P is a binary relation on P which is reflexive and transitive. A partially ordered set, or poset, is a preorder 〈P, ≤P 〉 where the relation is also anti-symmetric. An order-preserving function, or monotone function, from the poset (preorder) 〈P, ≤P 〉 to the poset (preorder) 〈Q, ≤Q 〉 is a function f : P → Q such that, if p ≤P p′, then f (p) ≤Q f (p′). Posets and preorders are examples of structured sets, that is, a set, the universe, plus some additional structure (here, a relation) on the universe.

247 of 472

Concrete categories IV

Example 13.5 (Poset) The category Poset has all the posets as objects and all the monotone functions between them as arrows.

Example 13.6 (Preorder) The category Preorder has all the preorders as objects and all the monotone functions between them as arrows. [Exercise] Check that Poset and Preorder are categories.

248 of 472

Concrete Categories V Example 13.7 (Mon) The category Mon has all the monoids as objects and their homomorphisms as arrows. Thus, an object of Mon has the form 〈M , ·M , eM 〉, where ·M is a binary operation on M, and eM ∈ M, such that (i) ·M is associative and (ii) eM is the unit of ·M . An arrow f : 〈M , ·M , eM 〉 → 〈N , ·N , eN 〉 of Mon is a function f : M → N preserving the product and the unit, i.e., (i) f (x ·M y ) = f (x) ·N f (y ) for all x , y ∈ M, and (ii) f (eM ) = eN .

Example 13.8 (Grp) The category Grp has all groups as objects and their homomorphisms as arrows. A group is a monoid with an inverse operation, and group homomorphisms are the monoid homomorphisms preserving inverses. 249 of 472

Concrete categories VI In all the previous examples, the objects are sets with some additional structure and the arrows are functions preserving that structure. In general, this pattern gives rise to a category which is called concrete. Examples of concrete categories are:

Category   Objects              Arrows
Set        sets                 total functions
Pfn        sets                 partial functions
FinSet     finite sets          total functions
Mon        monoids              homomorphisms
Grp        groups               homomorphisms
Poset      posets               monotone functions
Rng        rings                homomorphisms
Vect       vector spaces        linear transformations
Top        topological spaces   continuous functions

250 of 472

Abstract Categories I Abstract categories are not concrete. Many interesting and useful examples of categories are abstract.

Example 13.9 (0) The category 0 has no objects and no arrows.

Example 13.10 (1) The category 1 has a single object and only one arrow, the identity.

Example 13.11 (2) The category 2 has two objects and three arrows: the two identities and an arrow from one object to the second one. 251 of 472

Abstract Categories II

Example 13.12 (Discrete categories) A discrete category is one whose only arrows are the identities. Note that 0 and 1 are discrete categories. Note also that every set can be seen as a discrete category whose objects are its elements.

252 of 472

Abstract Categories III

Example 13.13 (Preorder categories) A preorder 〈P, ≤P 〉 can be seen as a category P whose set of objects is P and whose arrows are defined as follows: whenever p, q ∈ P and p ≤P q, then there is a unique arrow p → q. By reflexivity and transitivity, it follows that P is a category. Note that every poset, being a preorder, gives rise to a category.

253 of 472

Abstract Categories IV

Example 13.14 (Monoid categories) A monoid 〈M, ·M , eM 〉 can be seen as a category M whose set of objects is a singleton, and whose set of arrows is M. Composition is the monoid product ·M and the identity is the monoid unit eM . Since all groups are monoids, every group can be thought of as a category. The category Set, like most of the concrete categories, is locally small. All the examples of abstract categories we have seen are small.

254 of 472
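A one-object category carries no information in its objects; all the structure sits in the arrows. Taking the monoid of integers under addition, the category laws are exactly the monoid laws, which a few lines of Python can check:

```python
# The monoid (int, +, 0) viewed as a one-object category:
# arrows are integers, composition is addition, the identity arrow is 0.
compose = lambda g, f: g + f
ident = 0

assert compose(5, compose(3, 2)) == compose(compose(5, 3), 2)  # associative law
assert compose(ident, 7) == 7 == compose(7, ident)             # identity law
print("monoid laws are the category laws")
```

Since every pair of arrows shares the unique object as domain and codomain, every pair is composable, which is exactly the totality of the monoid product.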

Opposite category Definition 13.15 (Opposite category) Given a category C, the opposite or dual category Cop has the same objects as C, but its arrows are the opposites of the arrows of C: if f : A → B is an arrow of C, then f : B → A is an arrow of Cop . Composition and identities are defined from C in the obvious way. The dual category provides a duality principle: any statement S about a category C can be transformed into a dual statement S op by exchanging the words “domain” and “codomain”, and replacing each composite g ◦ f by f ◦ g . If the statement S holds in C, then S op holds in Cop . Since (Cop )op = C, if a statement S is true in any category, then so is S op . Moreover, any construction x can be “dualised”: the dual construction, whose arrows are reversed, is called co-x.
255 of 472
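Mechanically, passing to the opposite category just swaps the arguments of composition. For a concrete category this can be expressed in Python as follows (a sketch; the names are ours):

```python
# Composition in C, and composition in C^op obtained by swapping arguments.
compose    = lambda g, f: lambda x: g(f(x))
compose_op = lambda g, f: compose(f, g)

f = lambda x: x + 1   # in C: f : A -> B   |  in C^op: f : B -> A
g = lambda x: 2 * x   # in C: g : B -> C   |  in C^op: g : C -> B
# In C^op, f composes after g; the composite is still computed in C:
assert compose_op(f, g)(10) == compose(g, f)(10) == 22
print("duality: composites are reversed")
```

Note that the underlying functions never change; only the bookkeeping of domains, codomains and composition order is reversed.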

Product category and subcategories I

Definition 13.16 (Product category) Given a pair C, D of categories, the product category C × D has as objects the pairs (A, B) where A ∈ Obj C and B ∈ Obj D, and as arrows the pairs (f , g ) where f is a C-arrow and g is a D-arrow. Composition and identities are defined pointwise: (f , g ) ◦ (h, i) = (f ◦ h, g ◦ i) and id(A,B) = (idA , idB ).

256 of 472
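The pointwise definition can be transcribed directly. A sketch in Python for the product of two concrete categories (the names are ours):

```python
# Arrows of C x D are pairs of arrows; composition acts componentwise.
compose1 = lambda g, f: lambda x: g(f(x))   # composition in C (and in D)

def compose_pair(gp, fp):
    # (g1, g2) o (f1, f2) = (g1 o f1, g2 o f2)
    return (compose1(gp[0], fp[0]), compose1(gp[1], fp[1]))

fp = (lambda x: x + 1, lambda s: s + "!")     # an arrow of C x D
gp = (lambda x: 2 * x, lambda s: s.upper())   # another arrow of C x D
h = compose_pair(gp, fp)
print(h[0](10), h[1]("hi"))
```

Since each component obeys the category laws separately, the pairs inherit associativity and identities, which is the content of the definition.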

Product category and subcategories II

Definition 13.17 (Subcategory) A category C is a subcategory of a category D if
■ Obj C ⊆ Obj D;
■ for all A, B ∈ Obj C, HomC (A, B) ⊆ HomD (A, B);
■ composites and identities are the same in C as in D.
Moreover, C is full if, for all A, B ∈ Obj C, HomC (A, B) = HomD (A, B).

257 of 472

Product category and subcategories III

For example, Poset is a full subcategory of Preorder; similarly, Grp is a full subcategory of Mon; evidently, FinSet is also a full subcategory of Set. But Set is a subcategory of Pfn which is not full. It is immediate to see that the product category of two discrete categories C and D is the discrete category such that Obj(C × D) = (Obj C) × (Obj D). [Exercise] Check it. Also, it is easy to see that, for any category C, 0 × C = 0 = C × 0.

258 of 472

References and Hints

This lesson follows Chapter 1.1 of [Pierce2]. It is very important to fully understand the definition of category and to remember the examples we have presented: they will often recur.

259 of 472

Fundamentals of Functional Programming Lecture 14

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Outline The definition of category is very flexible and it captures most mathematical theories, especially those of interest for Computer Science. Category Theory, at the most superficial level, provides a uniform language to describe Mathematics, offering a unifying view of its problems and techniques. Nevertheless, it rapidly becomes difficult to manage categorical properties without an agile notation. The first goal of this lesson is to introduce such a notation. We have seen that categories can be used to construct new categories. But it is also possible, and extremely significant, to search for categorical structures inside a given category. In this lesson, we will state the basics in this direction. 261 of 472

Diagrams Definition 14.1 (Diagram) A diagram in a category is a directed multi-graph whose vertexes are the objects and whose edges are the arrows. A diagram is said to commute if, for every pair of paths with a common origin and target, the compositions of the arrows along the two paths yield the same result. For example, saying that the following diagram commutes

          f
      A -----> B
      | \      |
    h |  \ k   | g
      v   \    v
      C -----> D
          i

means that g ◦ f = k = i ◦ h. 262 of 472

Monics and epics I

Definition 14.2 (Monomorphism) An arrow f : B → C in a category C is a monomorphism, or monic, or is mono, if, for every pair g , h : A → B of arrows in C, f ◦ g = f ◦ h implies g = h. In a diagram, a monomorphism f is denoted by a tailed arrow: B >--f--> C .

263 of 472

Monics and epics II

Dualising the notion of monomorphism, we obtain

Definition 14.3 (Epimorphism) An arrow f : A → B in a category C is an epimorphism, or epic, or is epi, if, for every pair g , h : B → C of arrows in C, g ◦ f = h ◦ f implies g = h. In a diagram, an epimorphism f is denoted by a two-headed arrow: A --f-->> B .

264 of 472

Monics and epics III In Set, it is easy to verify that monomorphisms are exactly the injective functions, and that epimorphisms are exactly the surjective functions. [Exercise] Check it. In general, many categorical notions are built by abstraction over the relevant notions of Set, or some other interesting category. This abstraction process takes place by trying to express the concepts in terms of objects and arrows only. In this way, one can try to replicate the proofs involving the abstracted concept in the categorical framework. This exercise allows one to transport the result to other, apparently unconnected, categories. So, this abstraction process leads to very general notions, which may behave unexpectedly. 265 of 472
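On finite sets the [Exercise] above can even be verified by brute force: "monic" is a statement quantified over all parallel pairs g , h, and for functions a two-element test domain suffices. A Python sketch of that check (ours, purely illustrative):

```python
from itertools import product

# Check, on small finite sets, that "f is monic" (in the sense of
# Definition 14.2, tested against all parallel pairs g, h from a fixed
# two-element domain A) coincides with "f is injective".

A = (0, 1)                      # a two-element test domain suffices

def is_mono(f, B, C):
    """f is a dict B -> C; test that f.g = f.h implies g = h."""
    funcs = [dict(zip(A, vals)) for vals in product(B, repeat=len(A))]
    for g, h in product(funcs, repeat=2):
        if all(f[g[a]] == f[h[a]] for a in A) and g != h:
            return False
    return True

def is_injective(f):
    return len(set(f.values())) == len(f)

B, C = (0, 1, 2), (0, 1, 2, 3)
for vals in product(C, repeat=len(B)):
    f = dict(zip(B, vals))
    assert is_mono(f, B, C) == is_injective(f)
```

A two-element domain is enough because a failure of injectivity, f (b1) = f (b2) with b1 ≠ b2, is witnessed already by the two constant functions picking b1 and b2.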

Monics and epics IV Example 14.4 Both 〈Z, +, 0〉 and 〈N, +, 0〉 are objects in Mon. Consider the inclusion i : 〈N, +, 0〉 → 〈Z, +, 0〉, n ↦ n. Being an injection, i is mono. But it is also epi, although not surjective. In fact, let f , g : 〈Z, +, 0〉 → 〈M , ∗, e 〉 be such that f ◦ i = g ◦ i, and let z ∈ Z. If z ≥ 0 then f (z) = f (i (z)) = g (i (z)) = g (z). If z < 0, then f (z) = f (z) ∗ e = f (z) ∗ g (0) = f (z) ∗ g (−z + z) = f (z) ∗ g (−z) ∗ g (z) = f (z) ∗ g (i (−z)) ∗ g (z) = f (z) ∗ f (i (−z)) ∗ g (z) = f (z) ∗ f (−z) ∗ g (z) = f (z + −z) ∗ g (z) = f (0) ∗ g (z) = e ∗ g (z) = g (z). So f = g . 266 of 472

Isomorphisms I Definition 14.5 (Isomorphism) An arrow f : A → B is an isomorphism, or is iso, if there is an arrow f −1 : B → A, the inverse of f , such that f −1 ◦ f = idA and f ◦ f −1 = idB . In this case, A is said to be isomorphic to B, notation A ≅ B. It is easy to prove that, if it exists, f −1 is unique, and that the “being isomorphic” relation is an equivalence. As one would expect, in Set, isomorphisms are exactly the bijective functions. Also, in concrete categories, isomorphisms are the usual invertible homomorphisms. Notice that, in Mon, by the previous example, an arrow which is mono and epi need not be iso. 267 of 472

Isomorphisms II In general, Category Theory considers objects and arrows up to isomorphism, meaning that isomorphic objects are considered indistinguishable.

Definition 14.6 (Subobject) If f : A → B is mono in the category C, then f is a subobject of B. In Set, subobjects denote subsets, up to isomorphisms. In fact, if A ⊆ B then the canonical injection i : A → B is mono and, thus, i is a subobject of B. Its image, i (A), is identical to A, so it is also isomorphic to A. More generally, consider a mono f : A ↣ B: its image f (A) ≅ A is a subset of B, so “A through f ” is a subset of B; but, since A is the domain of f , it is more correct to say that f is a subobject of B. 268 of 472

Initial and terminal objects I Definition 14.7 (Initial object) Given a category C, 0 ∈ Obj C is initial if, for every A ∈ Obj C, there is a unique arrow !: 0 → A. [Exercise] Show that 0, if it exists, is unique up to isomorphisms. Dually,

Definition 14.8 (Terminal object) Given a category C, 1 ∈ Obj C is terminal if, for every A ∈ Obj C, there is a unique arrow !: A → 1. A notational convention denotes unique arrows by !. 269 of 472

Initial and terminal objects II In Set, ∅ is the unique initial object, while any singleton is a terminal object. For this reason, an arrow f : 1 → A is said to be a (global) element of A. In Set, this amounts to saying that, if f is an element, then its image, f (1), is a singleton, thus f denotes a unique member (element) of the set A. In a preorder category P, the initial object, if it exists, is the minimum, and the terminal object is the maximum. In Grp, any trivial group T, containing just the unit, is the initial object, as well as the terminal object. An object which is both initial and terminal is said to be a zero object. Notice that, in Grp, this phenomenon can be spelt out as 0 = 1! 270 of 472

Products and coproducts I Definition 14.9 (Product) Given a category C, the product of A, B ∈ Obj C is an object A × B, together with two arrows π1 : A × B → A and π2 : A × B → B, its projections, such that the diagram

      A <--π1-- A × B --π2--> B

is universal, that is, for any object C and pair of arrows f : C → A and g : C → B, there is a unique arrow 〈f , g 〉 : C → A × B making the following diagram commute:

               C
             / | \
          f /  |   \ g
           /   |〈f ,g 〉\
          v    v      v
      A <--π1-- A × B --π2--> B

271 of 472
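In Set the universal property is just pairing. The following Python sketch (ours; f and g are arbitrary illustrative functions) shows the mediating arrow 〈f , g 〉 and checks that the product triangle commutes:

```python
# Illustrative sketch in Set: the product A x B with its projections,
# and the unique pairing <f, g> : C -> A x B induced by f and g.

def pi1(p): return p[0]
def pi2(p): return p[1]

def pairing(f, g):
    """The mediating arrow <f, g> of the universal property."""
    return lambda c: (f(c), g(c))

C = ["x", "yy"]
f = lambda c: len(c)        # f : C -> A  (here A = N, an arbitrary choice)
g = lambda c: c.upper()     # g : C -> B  (here B = strings)

fg = pairing(f, g)
# The diagram commutes: pi1 . <f,g> = f  and  pi2 . <f,g> = g.
assert all(pi1(fg(c)) == f(c) and pi2(fg(c)) == g(c) for c in C)
```

Uniqueness holds because any arrow u with π1 ◦ u = f and π2 ◦ u = g must send c to the pair (f (c), g (c)).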

Products and coproducts II Definition 14.10 (Coproduct) Given a category C, the coproduct of A, B ∈ Obj C is an object A + B, together with two arrows i1 : A → A + B and i2 : B → A + B, its injections, such that the following diagram is co-universal:

      A --i1--> A + B <--i2-- B

Thus, dually to products, in a coproduct the following diagram commutes for each appropriate C , f and g , with [f , g ] unique:

      A --i1--> A + B <--i2-- B
        \         |         /
       f \     [f ,g ]     / g
          v       v       v
                  C

[Exercise] Prove that, if (C , f , g ) is itself a coproduct of A and B, then [f , g ] is an isomorphism. By duality, the analogous statement holds for products and 〈f , g 〉.

272 of 472

Products and coproducts III In Set, the product A × B is just the Cartesian product, with π1 and π2 the canonical projections; the coproduct object A + B is the disjoint union A ⊔ B and i1 , i2 are its canonical injections. In a poset category P, the product object A × B, when it exists, is the greatest lower bound of A and B, while, symmetrically, A + B is the least upper bound of A and B. In analogy with the set-theoretic interpretation of products and coproducts, it is worth extending the definitions from binary operations to operations of arbitrary arity.

273 of 472
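The disjoint union can be realised with explicit tags, which also makes the co-universal arrow [f , g ] computable. A Python sketch (ours; the tags "l"/"r" are an arbitrary encoding):

```python
# Illustrative sketch in Set: the coproduct A + B as a tagged disjoint
# union, with injections i1, i2 and the copairing [f, g] : A + B -> C.

def i1(a): return ("l", a)
def i2(b): return ("r", b)

def copairing(f, g):
    """[f, g]: act by f on the left summand and by g on the right one."""
    def fg(tagged):
        tag, x = tagged
        return f(x) if tag == "l" else g(x)
    return fg

A, B = [1, 2], ["a", "bb"]
f = lambda a: a * 10        # f : A -> C
g = lambda b: len(b)        # g : B -> C

fg = copairing(f, g)
# The coproduct diagram commutes: [f,g] . i1 = f  and  [f,g] . i2 = g.
assert all(fg(i1(a)) == f(a) for a in A)
assert all(fg(i2(b)) == g(b) for b in B)
```

The tags are what make A + B a *disjoint* union: even when A and B overlap, i1 and i2 have disjoint images.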

Products and coproducts IV Definition 14.11 (Product) The product of a family {Ai }i ∈I of objects indexed by the set I is an object Πi ∈I Ai and a family {πi : Πi ∈I Ai → Ai }i ∈I of arrows such that the following diagram is universal for every j , k ∈ I :

      Aj <--πj-- Πi ∈I Ai --πk--> Ak

If |I | = 2, this definition of product reduces to the one for binary products. If |I | = 1, the notion of product reduces to Πi ∈I Ai = Ax , where x is the unique element of I , with πx = idAx . If |I | = 0, the notion of product reduces to the definition of terminal object. 274 of 472

Products and coproducts V

A category where, for any collection U of objects, the product ΠU exists, is said to have arbitrary products. If the property holds only when U is finite, the category is said to have finite products. Dually for coproducts. For example, Set has arbitrary products and coproducts. A poset category P having finite products and coproducts is a bounded lattice; if it has arbitrary products and coproducts, it is a complete and bounded lattice.

275 of 472

Limits and colimits I In the definitions of products we used the notion of universal construction. Properly speaking, universal constructions are limits.

Definition 14.12 (Cone) Let C be a category and D a diagram in C. A cone for D is an X ∈ Obj C together with a family {fi : X → Di }Di ∈Obj D of arrows indexed by the objects in D such that, for each arrow g : Di → Dj in D, the following diagram commutes:

           X
          / \
       fi/   \fj
        v     v
       Di --g--> Dj

276 of 472

Limits and colimits II Definition 14.13 (Limit) A limit for a diagram D in a category C is a cone {fi : X → Di } for D with the property that, if {fi ′ : X′ → Di } is a cone for D, then there is a unique arrow k : X′ → X such that the diagram

      X′ --k--> X
        \      /
      fi ′\    /fi
          v  v
           Di

commutes for every Di in D.

277 of 472

Limits and colimits III

Example 14.14 Given A, B ∈ Obj C, consider the diagram D whose vertexes are A and B, with no edges. The limit of D is, if it exists, the product A × B. Similarly, the product ΠU, for U ⊆ Obj C, if it exists, is the limit of the diagram D with U as the set of vertexes and no edges.

Example 14.15 Consider the empty diagram D. Hence, a cone for D is any object and, thus, its limit is a terminal object, if it exists.

278 of 472

Limits and colimits IV Dualising the notion of limit, we obtain the concept of colimit.

Definition 14.16 (Colimit) A cocone for a diagram D in a category C is an X ∈ Obj C and a collection {fi : Di → X} of arrows indexed by the objects in D such that fj ◦ g = fi for each g : Di → Dj in D. A colimit, or direct limit, for D is a cocone {fi : Di → X} for D with the co-universal property that, for any cocone {fi ′ : Di → X′} for D, there is a unique arrow k : X → X′ such that k ◦ fi = fi ′ for every object Di in D. [Exercise] Show that coproducts and initial objects are colimits. 279 of 472

References and Hints

This lesson follows Chapter 1.2, 1.3, 1.4, 1.5, 1.6 and 1.9 of [Pierce2]. Useful hints and explanations can be found in Chapter 3 of [Goldblatt]. Some of the examples have been taken from this highly approachable text. Beware that Exercise 1.3.10.2 in [Pierce2] is wrong. In fact, the last sentence should be read as “Also, if g ◦ f is monic then so is g ”. You may want to prove that the actual text is wrong by providing a counterexample.

280 of 472

Fundamentals of Functional Programming Lecture 15

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Outline

Besides products, coproducts, terminal and initial objects, there are a number of interesting and useful structures which can be described as limits or colimits. In this lesson, we will introduce them. Most of these structures arise from relevant constructions in Set, while a few others derive from algebraic categories, like Grp or Vect. These constructions are the essential ingredients of the categorical language and they have a meaning in most mathematical theories.

282 of 472

Equalisers and coequalisers I Definition 15.1 (Equaliser) An arrow e : X → A in a category C is an equaliser of a pair of arrows f , g : A → B if
■ f ◦ e = g ◦ e;
■ whenever e′ : X′ → A satisfies f ◦ e′ = g ◦ e′, there is a unique arrow k : X′ → X such that e ◦ k = e′.

In other words, an equaliser is the limit of the diagram consisting of the parallel pair f , g : A → B.

283 of 472

Equalisers and coequalisers II Definition 15.2 (Coequaliser) A coequaliser in a category C for a pair of arrows f , g : A → B is the colimit of the diagram consisting of the parallel pair f , g : A → B.

Example 15.3 Let f , g : A → B in Set, and let X = {x ∈ A : f (x) = g (x)}. Then, the inclusion e : X → A is an equaliser of f , g . Also, let S = {(f (x), g (x)) : x ∈ A} ⊆ B × B and let R be the minimal equivalence relation containing S. Call [y ]R the equivalence class containing y ∈ B. Then, the map fR : B → B/R given by b ↦ [b]R is the coequaliser of f , g . 284 of 472
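Example 15.3 can be computed directly on small finite sets. A Python sketch (ours; f and g are arbitrary illustrative functions, and the quotient B/R is built with a naive union-find):

```python
# Illustrative sketch of Example 15.3 in Set: the equaliser as the
# subset where f and g agree, and the coequaliser as the quotient of B
# by the least equivalence relation identifying f(x) with g(x).

A = [0, 1, 2, 3]
B = [0, 1, 2, 3, 4]
f = lambda x: x
g = lambda x: x % 2

# Equaliser: X = {x in A : f(x) = g(x)}, with e the inclusion into A.
X = [x for x in A if f(x) == g(x)]
assert X == [0, 1]

# Coequaliser: union-find over the pairs (f(x), g(x)) gives B/R.
parent = {b: b for b in B}
def find(b):
    while parent[b] != b:
        b = parent[b]
    return b
for x in A:
    parent[find(f(x))] = find(g(x))
classes = {b: find(b) for b in B}

# Here 2 ~ 0 and 3 ~ 1, so B/R has three classes: {0,2}, {1,3}, {4}.
assert len(set(classes.values())) == 3
assert classes[2] == classes[0] and classes[3] == classes[1]
```

The map b ↦ classes[b] plays the role of fR : B → B/R, and it visibly coequalises f and g.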

Equalisers and coequalisers III As usual, equalisers and coequalisers are unique up to isomorphisms.

Lemma 15.4 Every equaliser is monic.

Proof. Suppose i : E → A equalises f , g : A → B. Let i ◦ j = i ◦ l , where j , l : C → E , and let h : C → A be h = i ◦ j. We have f ◦ h = f ◦ i ◦ j = g ◦ i ◦ j = g ◦ h, so there is a unique k : C → E with i ◦ k = h. Since h = i ◦ j, the uniqueness of k forces k = j; but also i ◦ l = i ◦ j = h, so k = l . Hence j = l .

Corollary 15.5 Every coequaliser is epic. 285 of 472

Existence of limits I Theorem 15.6 (Limit) Let D be a diagram in a category C, with set V of vertexes and set E of edges. If every V -indexed and every E -indexed family of objects in C has a product and every parallel pair of arrows in C has an equaliser, then D has a limit.

Proof. (i) Form the two products ΠDi ∈V Di and Π(e : Di →Dj )∈E Dj , the first indexed by the vertexes and the second by the edges, where the component of the second product at the edge (e : Di → Dj ) ∈ E is the codomain Dj of e. Call πi : ΠDi ∈V Di → Di and πe : Π(e : Di →Dj )∈E Dj → Dj their projections.

286 of 472 ,→

Existence of limits II ,→ Proof. (ii) Since Π(e : Di →Dj )∈E Dj is a product, there is a unique p : ΠDi ∈V Di → Π(e : Di →Dj )∈E Dj such that πe ◦ p = πj for every edge (e : Di → Dj ) ∈ E . For the same reason, there is a unique q : ΠDi ∈V Di → Π(e : Di →Dj )∈E Dj such that πe ◦ q = e ◦ πi for every such edge.

287 of 472 ,→

Existence of limits III ,→ Proof. (iii) Let h : X → ΠDi ∈V Di be an equaliser of p , q, and call fi = πi ◦ h. Consider {fi : X → Di }Di ∈V . It holds that e ◦ fi = e ◦ πi ◦ h = πe ◦ q ◦ h = πe ◦ p ◦ h = πj ◦ h = fj , so {fi : X → Di }Di ∈V is a cone for D. ,→

288 of 472

Existence of limits IV ,→ Proof. (iv) Assume that {fi ′ : X′ → Di }Di ∈V is a cone for D. By the universal property of products, there is a unique arrow h′ : X′ → ΠDi ∈V Di such that πi ◦ h′ = fi ′ for each Di ∈ V . So, for any (e : Di → Dj ) ∈ E , πe ◦ p ◦ h′ = πj ◦ h′ = fj ′ = e ◦ fi ′ = e ◦ πi ◦ h′ = πe ◦ q ◦ h′. By the universal property of the product Π(e : Di →Dj )∈E Dj , this implies that p ◦ h′ = q ◦ h′, or, in other terms, h′ equalises p , q.

289 of 472 ,→

Existence of limits V ,→ Proof. (v) Thus, by the universal property of equalisers, there is a unique k : X′ → X such that h ◦ k = h′. It follows that fi ◦ k = πi ◦ h ◦ k = πi ◦ h′ = fi ′. So k : X′ → X is the mediating arrow required to show that {fi : X → Di }Di ∈V is a limit. But we must prove that k is unique. ,→

290 of 472

Existence of limits VI ,→ Proof. (vi) Suppose k′ : X′ → X satisfies fi ◦ k′ = fi ′. Then, since πi ◦ h′ = fi ′ = fi ◦ k′ = πi ◦ h ◦ k′ for all Di ∈ V , the universal property of the product ΠDi ∈V Di guarantees that h ◦ k′ = h′. But the unique arrow with this property is k, so k′ = k.

Corollary 15.7 A category with equalisers and arbitrary products has all limits.

Corollary 15.8 A category with equalisers and finite products has all finite limits. 291 of 472

Pullbacks and pushouts I Definition 15.9 (Pullback) The pullback of a pair of arrows f : A → C and g : B → C is the limit of the diagram

      A --f--> C <--g-- B

Dually,

Definition 15.10 (Pushout) The pushout of a pair of arrows f : C → A and g : C → B is the colimit of the diagram

      A <--f-- C --g--> B

292 of 472

Pullbacks and pushouts II Example 15.11 Let f : B → C in Set and let A ⊆ C . Then the following is a pullback:

      f⁻¹(A) ---⊆---> B
         |            |
         |            | f
         v            v
         A  ---⊆----> C

where the left arrow is f restricted to f⁻¹(A) = {x ∈ B : f (x) ∈ A}, and the horizontal arrows are the canonical inclusions.

293 of 472

Pullbacks and pushouts III Example 15.12 In Set, the following pullback defines intersection, all the arrows being the canonical inclusions of subsets of C :

      A ∩ B ---⊆---> B
        |            |
        |⊆           |⊆
        v            v
        A  ---⊆----> C

Example 15.13 In Set, let f : A → C and let g : B → C . Then,

      P = {(a, b) ∈ A × B : f (a) = g (b)}

is the pullback object, while its projections are πf : P → B, (a, b) ↦ b, and πg : P → A, (a, b) ↦ a. 294 of 472
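Example 15.13 is directly computable on finite sets. A Python sketch (ours; f, g and the carrier sets are arbitrary illustrative choices):

```python
from itertools import product

# Illustrative sketch of Example 15.13: the pullback object in Set,
# computed by brute force as P = {(a, b) in A x B : f(a) = g(b)}.

A = [0, 1, 2, 3]
B = ["a", "bb", "ccc"]
f = lambda a: a % 3          # f : A -> C, with C = {0, 1, 2}
g = lambda b: len(b) % 3     # g : B -> C

P = [(a, b) for a, b in product(A, B) if f(a) == g(b)]

# The square commutes: f composed with the A-projection equals g
# composed with the B-projection, on every element of P.
assert all(f(a) == g(b) for a, b in P)
assert (1, "a") in P and (0, "ccc") in P and (0, "a") not in P
```

Any other commuting cone (i , j) factors through P by c ↦ (i (c), j (c)), which is forced, hence unique.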

Pullbacks and pushouts IV Example 15.14 In any category with a terminal object, the following is a pullback:

      A × B --π2--> B
        |           |
     π1 |           | !
        v           v
        A  ---!---> 1

Example 15.15 In any category, if

      X --e--> A
      |        |
    e |        | g
      v        v
      A --f--> B

is a pullback, then e is an equaliser of f , g . 295 of 472

Pullbacks and pushouts V Lemma 15.16 (Pullback) Consider a commutative diagram made of two adjacent squares, α on the left and β on the right:

      • ----> • ----> •
      |   α   |   β   |
      v       v       v
      • ----> • ----> •

1. If both the inner squares α and β are pullbacks, then so is the outer rectangle;
2. If the β square and the outer rectangle are pullbacks, then so is the α square.

296 of 472

Pullbacks and pushouts VI Proof. (i) First, notice that, in both cases, the diagram is commutative. For (1), call P the vertex of the α square and C the vertex of the β square, with e : D → A along the bottom, and consider an object X with arrows f : X → B and g : X → D making the outer boundary commute. Since β is a pullback and e ◦ g : X → A, f : X → B, there is a unique k : X → C making the diagram commute. But α is a pullback and k : X → C , g : X → D, so there is a unique h : X → P making the diagram commute. ,→

297 of 472

Pullbacks and pushouts VII ,→ Proof. (ii) For (2), let P be the vertex of the α square, with c : P → C and d : P → D, let a : D → A and j : C → A close the α square, and let i : C → B. Consider an object X with arrows f : X → C and g : X → D making the relevant diagram commute. Since the outer rectangle is a pullback and g : X → D, i ◦ f : X → B, there is a unique h : X → P such that i ◦ f = i ◦ c ◦ h and g = d ◦ h. Then, α is a pullback if f = c ◦ h. From β being a pullback and a ◦ g : X → A, i ◦ f : X → B, there is a unique k : X → C such that i ◦ f = i ◦ k and a ◦ g = j ◦ k. But both f and c ◦ h satisfy the equations for k, so, by uniqueness of k, f = c ◦ h.

298 of 472

Pullbacks and pushouts VIII Lemma 15.17 If the following square is a pullback and g is monic, then g ′ is monic too:

      P  --f ′--> B
      |           |
   g ′|           | g
      v           v
      A  --f----> C

Proof. Let h, k : X → P be such that g ′ ◦ h = g ′ ◦ k. Then, g ◦ f ′ ◦ h = f ◦ g ′ ◦ h = f ◦ g ′ ◦ k = g ◦ f ′ ◦ k. But g is mono, so f ′ ◦ h = f ′ ◦ k. Since the square is a pullback and g ′ ◦ h = g ′ ◦ k : X → A, f ′ ◦ h = f ′ ◦ k : X → B, there is a unique l : X → P through which this pair factors; both h and k satisfy the defining equations for l , thus h = k.

299 of 472

Pullbacks and pushouts IX

[Exercise] By duality, state and prove the corresponding results for pushouts. [Exercise] Characterise pullbacks and pushouts in a poset category. Equalisers and pullbacks, as well as their dual counterparts, are typical constructions in Category Theory. Unlike products and initial and terminal objects, they are unfamiliar from the viewpoint of set-theoretic reasoning. Invest some effort in understanding them and their manipulation.

300 of 472

References and Hints

This lesson covers Chapters 1.7, 1.8 and 1.9 of [Pierce2]. Some examples and hints have been taken from [Goldblatt].

301 of 472

Fundamentals of Functional Programming Lecture 16

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Outline

In this lesson, we want to introduce some more constructions. These structures are closely related to the functional interpretation of categories, and they have strong links with the λ-calculus. Moreover, these constructions are central in Category Theory since they provide, when present, a sort of “completeness” of the category. With these tools, we will be ready to build the advanced instruments of Category Theory, which allow one to exploit the full power of this approach to (functional) reasoning.

303 of 472

Kernels I Definition 16.1 (Kernel relation) In Set, let f : A → B be a function. The kernel relation Rf associated to f is defined as: x Rf y iff f (x) = f (y ). In categorical terms, the kernel relation is defined by the pullback

      Rf --π2--> A
      |          |
   π1 |          | f
      v          v
      A  --f --> B

where π1 and π2 are the projections of

      Rf = {(x , y ) ∈ A × A : f (x) = f (y )}.

304 of 472
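The kernel relation is easy to compute and to test on a finite example. A Python sketch (ours; f is an arbitrary illustrative function):

```python
# Illustrative sketch of Definition 16.1 in Set: the kernel relation of
# f, computed directly, and checked to be an equivalence relation.

A = range(6)
f = lambda x: x % 3

Rf = {(x, y) for x in A for y in A if f(x) == f(y)}

assert all((x, x) in Rf for x in A)                # reflexive
assert all((y, x) in Rf for (x, y) in Rf)          # symmetric
assert all((x, z) in Rf                            # transitive
           for (x, y) in Rf for (y2, z) in Rf if y == y2)
assert (0, 3) in Rf and (0, 1) not in Rf
```

Note that Rf is exactly the pullback of f along itself, as computed in Example 15.13 with A = B and f = g.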

Kernels II Definition 16.2 (Kernel) Let C be a category with a zero (initial and terminal) object. Then, the kernel K of f : A → B is defined by the pullback

      K  -----> A
      |         |
    ! |         | f
      v         v
      0  --!--> B

In Mon, Grp and Vect, as in many other categories derived from algebra, K = {x ∈ A : f (x) = e }, where e is the unit of the algebraic structure. 305 of 472

Exponentiation I Definition 16.3 (Exponentiation) A category C with all products has exponentiation if, for any A, B ∈ Obj C, there is B^A ∈ Obj C, the exponential object, and ev : B^A × A → B, the evaluation arrow, such that, for any C ∈ Obj C and g : C × A → B, there is a unique h : C → B^A making the following diagram commute:

      C × A --〈h,idA 〉--> B^A × A
           \              /
          g \            / ev
             v          v
                  B

306 of 472

Exponentiation II

Since g = ev ◦ 〈h, idA 〉, and since from g we can construct h and vice versa, it follows that Hom(C × A, B) ≅ Hom(C , B^A ). Hence, the interpretation of having exponentiation is that the category allows currying of its arrows.

307 of 472
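In Set this bijection is literal currying. A Python sketch (ours; g is an arbitrary illustrative function on pairs):

```python
# Illustrative sketch of Hom(C x A, B) ~ Hom(C, B^A) in Set: currying
# and uncurrying are mutually inverse, and ev . <h, id_A> = g holds.

def curry(g):
    """Turn g : C x A -> B into h : C -> B^A."""
    return lambda c: (lambda a: g((c, a)))

def uncurry(h):
    return lambda ca: h(ca[0])(ca[1])

g = lambda ca: ca[0] * 10 + ca[1]     # an arbitrary g : C x A -> B
pairs = [(c, a) for c in range(3) for a in range(3)]

# Evaluation of the curried form agrees with g on every pair ...
assert all(curry(g)(c)(a) == g((c, a)) for c, a in pairs)
# ... and uncurrying recovers g exactly.
assert all(uncurry(curry(g))((c, a)) == g((c, a)) for c, a in pairs)
```

Here curry(g) plays the role of the unique h, and function application plays the role of ev.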

Exponentiation III

Definition 16.4 (Cartesian closed category) A category C is complete if every diagram in C has a limit; dually, it is co-complete if every diagram in C has a colimit. A finite diagram is one having finitely many vertexes and edges. A category C is finitely complete (finitely co-complete) if it has a limit (colimit, respectively) for any finite diagram. Sometimes, finitely complete is referred to as Cartesian. A finitely complete category with exponentiation is said to be Cartesian closed.

308 of 472

Exponentiation IV

Example 16.5 The category Set is Cartesian closed since it is finitely complete and B A = Hom(A, B). It is also complete and co-complete.

Example 16.6 The category Grp is finitely complete, since it has finite products, equalisers and a terminal object, but it does not admit exponentiation, thus it is not Cartesian closed.

309 of 472

Exponentiation V Theorem 16.7 A category C having a terminal object and all pullbacks is finitely complete.

Proof. (i) Considering the pullback

      A × B --π2--> B
        |           |
     π1 |           | !
        v           v
        A  ---!---> 1

we see that C has all binary products. By an easy induction, starting from the terminal object, it follows that C has all finite products. ,→

310 of 472

Exponentiation VI ,→ Proof. (ii) Let f , g : A → B; then 〈idA , f 〉, 〈idA , g 〉 : A → A × B. Forming their pullback

      E  ----p----> A
      |             |
    q |             | 〈idA ,g 〉
      v             v
      A --〈idA ,f 〉--> A × B

we see that 〈q , f ◦ q 〉 = 〈idA , f 〉 ◦ q = 〈idA , g 〉 ◦ p = 〈p , g ◦ p 〉, so p = q and f ◦ q = g ◦ p. Moreover, being a pullback, for every i , j : X → A such that 〈idA , g 〉 ◦ j = 〈idA , f 〉 ◦ i, there is a unique h : X → E such that i = q ◦ h and j = p ◦ h. Thus, i = j and E , with q : E → A, is an equaliser of f , g . 311 of 472

Subobject classifiers I Definition 16.8 (Subobject classifier) In a category C with a terminal object, a subobject classifier is an object Ω with an arrow ⊤ : 1 → Ω satisfying the Ω-axiom: for each subobject f : A ↣ B, there is a unique χf : B → Ω, the characteristic arrow, such that

      A  --f--> B
      |         |
    ! |         | χf
      v         v
      1  --⊤--> Ω

is a pullback square. It is easy to show that the subobject classifier, when it exists, is unique up to isomorphisms, and that the arrow ⊤ : 1 → Ω is monic. 312 of 472

Subobject classifiers II Lemma 16.9 If A and B are isomorphic and f : A ↣ C , g : B ↣ C are subobjects of C , then χf = χg , and vice versa.

Proof. Consider the diagram whose inner square is the pullback defining χf , namely

      A  --f--> C
      |         |
    ! |         | χf
      v         v
      1  --⊤--> Ω

and whose outer square is obtained by precomposing with an arrow k : B → A, so that its top edge is g : B ↣ C . If χf = χg , the outer square commutes, so, the inner square being a pullback, there is a unique k : B → A such that g = f ◦ k. But k is iso because the outer square is itself a pullback, so A ≅ B. Vice versa, if A ≅ B and k : B → A is iso, it is easy to see that the outer square is a pullback, thus χf = χg . 313 of 472

Subobject classifiers III Definition 16.10 (Topos) An elementary topos, usually abbreviated to topos, is a category E such that
■ E is finitely complete;
■ E is finitely co-complete;
■ E has exponentiation;
■ E has a subobject classifier.

It can be shown, by means of a very technical proof, that being finitely co-complete is implied by the other conditions, so a topos is a Cartesian closed category with a subobject classifier. 314 of 472

Subobject classifiers IV Example 16.11 Set is a topos since it is Cartesian closed and its subobject classifier is the set Ω = { 0, 1 } with ⊤ : 1 → Ω defined by ⊤(x) = 1. In fact, let A ⊆ B, with the inclusion i : A → B, i (x) = x:

      A  --i--> B
      |         |
    ! |         | χi
      v         v
      1  --⊤--> Ω

with χi (x) = 1 if x ∈ A, and χi (x) = 0 otherwise. Notice that FinSet is a topos too, by the same argument. 315 of 472
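The pullback property of the classifying square can be checked on a finite example. A Python sketch (ours; A and B are arbitrary illustrative sets):

```python
# Illustrative sketch of Example 16.11: in Set, Omega = {0, 1} and the
# characteristic arrow chi of a subset A of B.  The pullback property
# says exactly that A is the inverse image of 1 (i.e. of "true").

B = range(10)
A = {2, 3, 5, 7}                       # a subobject of B via inclusion

chi = lambda x: 1 if x in A else 0     # chi : B -> Omega

# The classifying square is a pullback: A = chi^{-1}(true).
assert {x for x in B if chi(x) == 1} == A

# Uniqueness: any other map classifying A must agree with chi on all of B.
other = lambda x: 1 if x in {2, 3, 5, 7} else 0
assert all(chi(x) == other(x) for x in B)
```

This is the familiar correspondence between subsets of B and Boolean-valued functions on B.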

Power objects I Definition 16.12 (Power object) A category C with products has power objects if, for each A ∈ Obj C, there are ℘(A), ∈A ∈ Obj C and a monic ∈ : ∈A ↣ ℘(A) × A such that, for any B ∈ Obj C and monic r : R ↣ B × A, there is a unique fr : B → ℘(A) for which the following diagram is a pullback:

      R  ---r---> B × A
      |             |
      |             | fr × idA
      v             v
     ∈A ---∈---> ℘(A) × A

316 of 472

Power objects II Example 16.13 In Set, ℘(A) = { U : U ⊆ A }, ∈A is the relation x ∈ U with U ⊆ A, that is, the set ∈A = {(U , x) ∈ ℘(A) × A : x ∈ U }, and ∈ is the canonical inclusion. In fact, given a relation R ⊆ B × A, we can define fR (x) = {y ∈ A : x R y }, so that R can be thought of as ∪x ∈B {x } × fR (x), or, equivalently, as mapping into ∈A via h : R → ∈A with h (x , y ) = (fR (x), y ). Then

      R  --------> B × A
      |              |
    h |              | fR × idA
      v              v
     ∈A ---∈---> ℘(A) × A

commutes and, indeed, it is a pullback square. 317 of 472

Power objects III

Theorem 16.14 A category E is a topos iff E is finitely complete and has power objects. [Proof not required] Power objects are a useful concept, but somewhat difficult to work with; hence we, like most topos-theorists, prefer the definition of a topos as a Cartesian closed category with a subobject classifier.

318 of 472

Slice categories I In order to conclude our set of basic constructions, we need one last tool:

Definition 16.15 (Slice category) Let C be a category and A ∈ Obj C. The slice category C/A on A has as objects the arrows of C with codomain A, and as arrows from f : B → A to g : C → A the arrows h : B → C of C making the following triangle commute:

      B  --h--> C
        \      /
       f \    / g
          v  v
           A

319 of 472

Slice categories II Example 16.16 Let C be a small discrete category with I = Obj C. Then Set/I is the slice category corresponding to the bundles on I . Let p : A → I be a function in Set; the fibre of p over i ∈ I is the set Ai = p⁻¹({ i }), and the bundle of p on I is the set { Ai : i ∈ I }, i.e., the set A partitioned by the inverse images of p. Thus, the slice category Set/I is the category whose objects are partitioned sets and whose arrows are partition-preserving functions. It can be shown that Set/I is, essentially, a topos. To state this last assertion precisely, and to prove it, we need a different view, which we will undertake in the next lecture. 320 of 472
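The bundle reading of Set/I can be made concrete. A Python sketch (ours; I, p, q and h are arbitrary illustrative choices):

```python
# Illustrative sketch of Example 16.16: an object of Set/I is a function
# p : A -> I, i.e. a bundle; its fibres partition A.  An arrow p -> q is
# a function h with q . h = p, i.e. a fibre-preserving map.

I = ["red", "blue"]
A = range(6)
p = lambda a: "red" if a % 2 == 0 else "blue"      # a bundle over I

fibres = {i: {a for a in A if p(a) == i} for i in I}
assert fibres["red"] == {0, 2, 4} and fibres["blue"] == {1, 3, 5}

# An arrow of Set/I into another bundle q : A' -> I (here A' is a set
# of integers, with q sending those below 10 to "red"):
q = lambda a: "red" if a < 10 else "blue"
h = lambda a: a if p(a) == "red" else 10 + a       # h : A -> A'
assert all(q(h(a)) == p(a) for a in A)             # the triangle commutes
```

The commuting triangle q ◦ h = p says precisely that h sends the fibre of p over i into the fibre of q over i, i.e., that h preserves the partition.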

References and Hints This lesson is taken from Chapter 1.10 of [Pierce2]. The definition of kernel, the proof of Theorem 16.7, and the notions of subobject classifier, power object and slice category come from Chapters 3 and 4 of [Goldblatt]. Also, the omitted proof can be found in that textbook. If you want to try the exercises in [Pierce2], it is better to do 1.10.5.1 after 1.10.5.6 and 1.10.5.7: this should give you a hint on how to construct the required counterexample; you also want to look at the concept of non-distributive lattice.

321 of 472

Fundamentals of Functional Programming Lecture 17 — Intermezzo

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Outline

The purpose of this lecture is to develop Category Theory as a programming instrument. Specifically, we will define data structures and procedures in a functional language to code categories and their basic constructions. In this way, we will be able to use a mathematical theory, expressed in the categorical framework, as a set of computational constructions which allow one to calculate interesting and possibly difficult tasks.

323 of 472

Categories I Our first task is to define categories as elements of a datatype. We adopt an ML-like syntax.

datatype (o,a)Cat = cat (a → o) × (a → o) ×
                        (o → a) × (a → a → a);

The meaning is as follows:
■ o and a are type variables, representing parametric types, which stand for objects and arrows, respectively;
■ cat is a type constructor with four arguments;
■ the first two arguments are the dom and cod maps, respectively;
■ the third argument, given an object A, calculates idA ;
■ the fourth argument is just the composition operation.

324 of 472

Categories II As usual, we can define destructors for this datatype:

fun dom (cat X ) = π1 X ;
fun cod (cat X ) = π2 X ;
fun id (cat X ) A = π3 X A;
fun compose (cat X ) f g = π4 X g f ;

We have used the keyword fun to indicate the solution in the first variable of the equation in the λ-calculus behind the functional syntax. So, fixing X and considering dom a variable, the first definition is equivalent to asking for the solution of the equation dom (cat X ) = π1 X 325 of 472

Categories III There is no control over the correctness of an instance of the Cat type. We assume that identities behave as units for composition and that composition is associative. This assumption is reasonable: the proofs of these facts have no computational meaning by themselves, but they ensure that identities and composition behave as they should. We stipulate that we never promote a quadruple of appropriate values to the type Cat unless we have checked in advance that it forms, indeed, a category.

326 of 472

FinSet As an example, we can code FinSet, the category of finite sets, as:

datatype o SetArrow = setarrow (o Set) × (o → o) × (o Set);
let
  fun setdom (setarrow(x ,f ,y )) = x ;
  fun setcod (setarrow(x ,f ,y )) = y ;
  fun setid (A : o Set) = setarrow(A, λx . x , A);
  fun setcomp (setarrow(c ,g ,d )) (setarrow(a,f ,b)) =
    if b = c then setarrow(a, λx . g (f (x)), d )
    else raise non-composable-pair;
in
  FinSet = cat(setdom, setcod, setid, setcomp);

We used the ML keyword raise to generate an exception which simulates a partial operation. This is slightly simpler than constructing an internally coherent mechanism to cope with non-composable pairs. 327 of 472

Categories IV We can construct categories from categories as functions over the datatype Cat. For example, the opposite category is defined as:

fun opCat (cat(s,t ,i ,c)) = cat(t ,s,i ,λg , f . c f g );

Since limits are colimits in the opposite category, we can construct the former from the latter, or vice versa, whichever is simpler to code. In order to code the basic constructions, we expand the convention on categories: each construction is coded as a type and, although we will not necessarily check it, we assume that every instance satisfies the intended properties of the construction. 328 of 472

Initial object I An initial object consists of two pieces of data: an object I and a (co-)universal function which, given an object A in the category, returns the unique arrow I → A. This description is directly coded as a datatype:

datatype (o,a)InitialObj = initialobj o × (o → a);

where the first component is the initial object and the second component is the family of arrows from the initial object to each object of the category.

329 of 472

Initial object II

Since initial objects are always isomorphic, we can code this fact as a function returning an isomorphism, i.e., a pair of arrows, one the inverse of the other:

fun isoinitial(initialobj(A,univA), initialobj(A′,univA′)) = (univA(A′), univA′(A));

330 of 472

FinSet

In FinSet, the initial object is the empty set, and the arrow f : ∅ → A, for every A, is the empty function, defined nowhere. So the initial object is coded as:

nilfn = λx. raise nil-function;
setinit = initialobj(∅, λA. setarrow(∅,nilfn,A));

331 of 472

Binary coproducts I Similarly to initial objects, we can code binary coproducts:

datatype (o,a)CoproductCocone = coproductcocone (o × a × a) × (o × a × a → a);
datatype (o,a)Coproduct = coproduct (o × o → (o,a)CoproductCocone);

The meaning of the second declaration is that a coproduct is an operation which takes two objects and calculates the corresponding cocone. The cocone is (a + b, f, g), where f : a → a + b and g : b → a + b are the injections, together with the universal arrow u : a + b → c′ determined by any competing cocone (c′, f′, g′), so that u ∘ f = f′ and u ∘ g = g′.

332 of 472

FinSet In FinSet, we define binary coproducts as follows:

datatype (a)Tag = l a | r a;
fun setcoprod(A,B) =
  let s = (mapset l A) ∪ (mapset r B);
      fun u(c,setarrow(_,f,_),setarrow(_,g,_)) =
        let fg(l x) = f x;
            fg(r x) = g x;
        in setarrow(s,fg,c);
  in coproductcocone((s, setarrow(A,l,s), setarrow(B,r,s)), u);
setcoproduct = coproduct setcoprod;
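The same construction can be sketched in Python, with lists standing in for finite sets and tagged pairs ("l", x) / ("r", y) playing the role of the Tag datatype; all names here are ours.

```python
def set_coprod(xs, ys):
    """The coproduct object A + B: the tagged disjoint union."""
    return [("l", x) for x in xs] + [("r", y) for y in ys]

def univ(f, g):
    """The universal arrow [f, g] : A + B -> C determined by f and g,
    acting by case analysis on the tag."""
    def fg(t):
        tag, v = t
        return f(v) if tag == "l" else g(v)
    return fg
```

Any competing cocone (C, f, g) factors through the injections via `univ(f, g)`, which is exactly the content of the universal property.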

333 of 472

Diagrams and colimits I Finite graphs and diagrams may be directly represented as types:

datatype Graph = graph (N Set) × (E Set) × (E → N) × (E → N);
datatype (o,a)Diagram = diagram Graph × (N → o) × (E → a);

A graph is a set of nodes and a set of edges, together with maps giving the source and the target of each edge. A diagram is a graph together with a map from nodes to objects and a map from edges to arrows of a given category. Since the definition does not limit the number of edges between a pair of nodes, we are really working with multi-graphs. Also, graphs and diagrams are finite, since the primitive type Set stands for finite sets. 334 of 472

Diagrams and colimits II A cocone is represented as datatype (o,a)Cocone = cocone o × (o,a)Diagram × (N → a); Thus a cocone is an object A together with a diagram, its base, and a family of arrows indexed by the nodes N of the diagram, in such a way that each arrow in the family has A as codomain. In keeping with the categorical dictum of defining arrows as well as structures, we define arrows between cocones with the same base: datatype (o,a)CoconeArrow = coconearrow (o,a)Cocone × a × (o,a)Cocone;

335 of 472

Diagrams and colimits III Hence, colimits are represented as types: datatype (o,a)ColimitingCocone = colimcocone (o,a)Cocone × ((o,a)Cocone → (o,a)CoconeArrow); datatype (o,a)Colimit = colimit ((o,a)Diagram → (o,a)ColimitingCocone); whose meaning is fairly evident. A finitely cocomplete category has a colimit for every finite diagram: datatype (o,a)CocompleteCat = cocompletecat (o,a)Cat × (o,a)Colimit;

336 of 472

Calculating colimits I

To calculate the colimit of a diagram in a given category C, we assume the category to have initial objects, binary coproducts, and coequalisers:

datatype (o,a)IOCPCECat = iocpcecat (o,a)Cat × (o,a)InitialObj × (o,a)Coproduct × (o,a)Coequalizer;

[Exercise] Define the type (o,a)Coequalizer.

337 of 472

Calculating colimits II The function which calculates the colimit of a finite diagram in a given category is:

fun finitecolimit (iocpcecat(C,init,bcoprod,coeq)) (diagram(graph(N,E,s,t),fo,fa)) =
  let cC = iocpcecat(C,init,bcoprod,coeq);
      d = diagram(graph(N,E,s,t),fo,fa);
  in if E = ∅
     then finitecoproduct (C,init,bcoprod) d
     else let { e } ∪ E1 = E;
              d1 = diagram(graph(N,E1,s,t),fo,fa);
          in addedge (C,coeq) ((finitecolimit cC d1),e);

338 of 472

Calculating colimits III The logic is as follows: if the diagram has no edges, its colimit is the finite coproduct of its objects; otherwise, we obtain the colimit by a construction (addedge) which adds an edge e to the colimit of the diagram deprived of e. Of course, we have to write the functions finitecoproduct and addedge. Their types are:

finitecoproduct: (o,a)Cat × (o,a)InitialObj × (o,a)Coproduct → ((o,a)Diagram → (o,a)ColimitingCocone);
addedge: (o,a)Cat × (o,a)Coequalizer → ((o,a)ColimitingCocone × Edge → (o,a)ColimitingCocone);

339 of 472

Calculating colimits IV

When the diagram has no nodes, finitecoproduct reduces to:

let initialobj(i,u) = init;
    icocone = cocone(i,nildiagram,nilfn);
in colimcocone(icocone, λc. coconearrow(icocone, u (coapex c), c));

where

nildiagram = diagram(graph(∅, ∅, nilfn, nilfn), nilfn, nilfn);
coapex (cocone(a,-,-)) = a;

340 of 472

Calculating colimits V When the diagram D = diagram(graph(N,E,s,t),fo,fa) is non-empty, finitecoproduct operates as:

let { n } ∪ N1 = N;
    colimcocone(c,uc) = finitecoproduct (C,init,bcoprod) (diagram(graph(N1,E,s,t),fo,fa));
    coproductcocone((b,f,g), ucp) = bcoprod(coapex c, fo n);
    resultcocone = cocone(b,D,λm. if m = n then g else compose C (f, sides c m));
    universal = λc1. let u = coapexarrow(uc c1);
                         v = ucp(coapex c1, u, sides c1 n);
                     in coconearrow(resultcocone,v,c1);
in colimcocone(resultcocone,universal); 341 of 472

Calculating colimits VI In the previous code:
■ sides returns, for a given cocone, the family of arrows from the objects in the base to the apex, indexed by nodes;
■ coapexarrow extracts the arrow between the apices of a given cocone arrow.

Apart from the complexity of the details in constructing all the pieces of the resulting cocone, the logic is clear: finite coproducts are built recursively out of binary coproducts until no nodes are left, at which point the colimit is the initial object.

342 of 472

Calculating colimits VII The function addedge is defined as

fun addedge (C,coeq) ((c,u),e) =
  let diagram(graph(N,E,s,t),fo,fa) = base c;
      ((b,h),ceu) = coeq(sides c (s e), compose C (sides c (t e), fa e));
      resultdiagram = diagram(graph(N,{ e } ∪ E,s,t),fo,fa);
      resultcocone = cocone(b,resultdiagram, λn. compose C (h, sides c n));
      universal = λc1. let w = coapexarrow(u c1);
                           v = ceu(coapex c1, w);
                       in coconearrow(resultcocone,v,c1);
  in colimcocone(resultcocone,universal);

343 of 472

Calculating colimits VIII In the previous code:
■ base returns the diagram which forms the base of a given cocone.

Again, apart from the intricacies of the construction of all the involved arrows, the logic is clear. As a matter of fact, the code calculating finite colimits is in one-to-one correspondence with the proof of the lemma stating that a category with an initial object, binary coproducts, and coequalisers has all finite colimits.

344 of 472

References and Hints

This lesson is derived from selected material in Chapters 3 and 4 of [Rydeheard].

345 of 472

Fundamentals of Functional Programming Lecture 18

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Outline Categories are the subject of Category Theory, which is a mathematical theory. Categories are formal models of mathematical theories. Thus, it makes sense to imagine a “category of categories” having categories as objects. The aim of this lesson is to develop a notion of arrows between categories. Also, in order to study the behaviour of these arrows, we need another construction, which embodies the idea of “canonical” transformations of arrows. These notions are called functors and natural transformations, and they form the real core of Category Theory. 347 of 472

Functors I Definition 18.1 (Functor)
Let C and D be categories. A functor F : C → D is formed by
■ a map F : Obj C → Obj D;
■ a family of maps F : HomC(A, B) → HomD(F(A), F(B)), one for each pair of objects A, B ∈ Obj C;
such that
■ F(idA) = idF(A) for all A ∈ Obj C;
■ F(g ∘ f) = F(g) ∘ F(f) for all f ∈ HomC(A, B), g ∈ HomC(B, C).

Definition 18.2 (Presheaf)
Any functor F : Cop → Set is called a presheaf. 348 of 472
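The two functor laws can be checked concretely on the list functor (the action of a map on every element of a list); this small Python sketch, with names of our choosing, does exactly that.

```python
def fmap(f, xs):
    """The list functor's action on an arrow f: apply f elementwise."""
    return [f(x) for x in xs]

def preserves_id(xs):
    """First law: F(id) = id."""
    return fmap(lambda x: x, xs) == xs

def preserves_comp(f, g, xs):
    """Second law: F(g . f) = F(g) . F(f)."""
    return fmap(lambda x: g(f(x)), xs) == fmap(g, fmap(f, xs))
```

Both checks hold for every f, g and every list; the functions above only test them on given inputs, of course.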

Functors II

Example 18.3 (Forgetful functors)
Let U : Mon → Set be defined as
■ for 〈M, ∗, e〉 ∈ Obj Mon, U(〈M, ∗, e〉) = M;
■ for f : 〈M, ∗, e〉 → 〈M′, ∗′, e′〉, U(f) = f : M → M′;
where the homomorphism f is thought of as a plain function. Functors of this kind, which forget some of the structure of the source category, are called forgetful functors.

349 of 472

Functors III

Example 18.4 (Identity functor) For any category C, the identity functor, IdC : C → C is defined as IdC (A) = A for A ∈ Obj C, and IdC (f ) = f for f ∈ HomC (A, B).

350 of 472

Functors IV

Example 18.5 (Product functors)
Let C be a category with products. Then each A ∈ Obj C determines a pair of functors: (− × A): C → C, the right product functor, and (A × −): C → C, the left product functor, such that
■ for B ∈ Obj C, (− × A)(B) = B × A and (A × −)(B) = A × B;
■ for f ∈ HomC(B, C), (− × A)(f) = 〈f, idA〉 and (A × −)(f) = 〈idA, f〉.

351 of 472
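In Set, the action of the product functors on arrows is easy to write down explicitly; this Python sketch (our own naming) sends f to f × id on the appropriate side of the pair.

```python
def right_prod(f):
    """(- x A) on arrows: f : B -> C goes to f x id_A on pairs (b, a)."""
    return lambda p: (f(p[0]), p[1])

def left_prod(f):
    """(A x -) on arrows: f : B -> C goes to id_A x f on pairs (a, b)."""
    return lambda p: (p[0], f(p[1]))
```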

Functors V

Definition 18.6 (Contravariant functor) A contravariant functor F : C → D is the same thing as a functor F : Cop → D. Contravariant functors are a useful concept when dealing with presheaves, since such a functor operates like a normal (covariant) functor on objects, but it reverses the direction of arrows.

352 of 472

Functors VI Example 18.7 (Hom functors)
Let C be a locally small category. Then each A ∈ Obj C determines a pair of functors Hom(A, −): C → Set and Hom(−, A): Cop → Set:
■ for B ∈ Obj C, Hom(A, −)(B) = HomC(A, B) and Hom(−, A)(B) = HomC(B, A);
■ for f ∈ HomC(B, C), Hom(A, −)(f): HomC(A, B) → HomC(A, C) is g ↦ f ∘ g, while Hom(−, A)(f): HomC(C, A) → HomC(B, A) is g ↦ g ∘ f.
Notice how Hom(−, A) is a contravariant functor.

353 of 472
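In Set, the two Hom functors act by post- and pre-composition; a minimal Python sketch (our own naming) makes the direction reversal of Hom(−, A) visible.

```python
def hom_cov(f):
    """Hom(A, f) : Hom(A, B) -> Hom(A, C), sending g to f . g."""
    return lambda g: (lambda a: f(g(a)))

def hom_con(f):
    """Hom(f, A) : Hom(C, A) -> Hom(B, A), sending g to g . f.
    Note the reversed direction: f : B -> C, but the map goes
    from Hom(C, A) to Hom(B, A)."""
    return lambda g: (lambda b: g(f(b)))
```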

Functors VII Example 18.8 (Subobject functor)
Let C be a category with pullbacks. Then C determines a contravariant functor Sub: Cop → Set defined as
■ for A ∈ Obj C, Sub(A) = { B ∈ Obj C : B is a subobject of A };
■ for f ∈ HomC(A, B), Sub(f): Sub(B) → Sub(A), which assigns to a subobject g : C ↣ B the subobject h : D ↣ A obtained by pulling g back along f.

354 of 472

Functors VIII

Definition 18.9 (Functor composition)
Let F : C → D and G : D → E be functors. Then their composition G ∘ F is the functor defined as
■ for A ∈ Obj C, (G ∘ F)(A) = G(F(A));
■ for f ∈ HomC(A, B), (G ∘ F)(f) = G(F(f)): G(F(A)) → G(F(B)).

It is immediate to check that functor composition is associative and that the identity functor acts as a unit.

355 of 472

Functors IX

Definition 18.10 (Cat) The category Cat has all small categories as objects, and all the functors between them as arrows. We cannot form a category of locally small, or large, categories since it would be too big, leading to an analogue of Russell's paradox.

356 of 472

Functors X Definition 18.11 (Full and faithful functors)
Given a functor F : C → D, we say
■ F is faithful if F : HomC(A, B) → HomD(F(A), F(B)) is injective for all A, B ∈ Obj C;
■ F is full if F : HomC(A, B) → HomD(F(A), F(B)) is surjective for all A, B ∈ Obj C;
■ F is surjective if F : Obj C → Obj D is surjective;
■ F is essentially surjective if, for every D ∈ Obj D, there is a C ∈ Obj C such that D ≅ F(C).

357 of 472

Functors XI

Example 18.12 (Inclusion functor) A functor F : C → D is an inclusion functor, notation F : C ↪ D, if, for all A ∈ Obj C, F(A) = A, and, for all f ∈ HomC(A, B), F(f) = f. If F is an inclusion functor, then C is a subcategory of D. Evidently, F is faithful, so that the image of C through F is a copy of C inside D. Moreover, if F is also full, then C is a full subcategory of D: every arrow of D between objects in the image lies in the image, so any arrow of D outside the image must have its source or its target outside the image.

358 of 472

Natural transformations I Definition 18.13 (Natural transformation)
Let C and D be categories and let F, G : C → D be functors. A natural transformation α from F to G, notation α : F ⇒ G, is a family of arrows { αA : F(A) → G(A) }, indexed by the objects A of C, such that, for any f ∈ HomC(A, B), the following naturality square commutes in D:

αB ∘ F(f) = G(f) ∘ αA.

The collection of all natural transformations from F to G is denoted by Nat(F, G). 359 of 472
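A concrete instance: the "safe head" family, one function per type A, from lists of A to an optional A, is natural in A. This Python sketch (our own naming; `None` plays the role of the missing value) checks the naturality square on given inputs.

```python
def hd(xs):
    """The component hd_A : list A -> option A."""
    return xs[0] if xs else None

def opt_map(f, x):
    """The option functor's action on an arrow f."""
    return None if x is None else f(x)

def natural(f, xs):
    """Naturality square: opt_map(f) . hd == hd . map(f)."""
    return opt_map(f, hd(xs)) == hd([f(x) for x in xs])
```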

Natural transformations II

Definition 18.14 (Natural isomorphism)
A natural transformation α : F ⇒ G is a natural isomorphism if each component αA : F(A) → G(A) is an isomorphism. In this case, we write F ≅ G and say that F and G are naturally isomorphic.

360 of 472

Natural transformations III

Definition 18.15 (Equivalence) Two categories C and D are said to be equivalent, notation C ≃ D, if there are functors F : C → D and G : D → C such that F ∘ G ≅ IdD and G ∘ F ≅ IdC.

Theorem 18.16 A functor is part of an equivalence of categories iff it is full, faithful and essentially surjective. [Proof not required — It uses the Axiom of Choice]

361 of 472

Natural transformations IV

Example 18.17 (Identity transformation) Let F : C → D be a functor; then ι = { idF(A) : A ∈ Obj C } is an evident natural transformation, called the identity transformation.

362 of 472

Natural transformations V

Definition 18.18 (Vertical composition) Let C and D be categories, and let F, G, H : C → D be functors. If σ : F ⇒ G and τ : G ⇒ H are natural transformations, then (τ ∘ σ): F ⇒ H is the natural transformation defined as (τ ∘ σ)A = τA ∘ σA. It is simple to check that vertical composition is associative and that the identity transformation is a unit for it.

363 of 472

Natural transformations VI Definition 18.19 (Horizontal composition)
Let C, D and E be categories, and let S, T : C → D and S′, T′ : D → E be functors. If σ : S ⇒ T and τ : S′ ⇒ T′ are natural transformations, then, for every A ∈ Obj C, the following square commutes:

T′(σA) ∘ τS(A) = τT(A) ∘ S′(σA) : (S′ ∘ S)(A) → (T′ ∘ T)(A).

The horizontal composition (σ • τ): S′ ∘ S ⇒ T′ ∘ T is the natural transformation defined as the diagonal of this square: (σ • τ)A = T′(σA) ∘ τS(A) = τT(A) ∘ S′(σA). 364 of 472

Natural transformations VII

Again, it is easy to show that the horizontal composition is associative and that the identity transformation is a unit for it. We will not use horizontal composition in this course, so when we speak about composition of natural transformations, we really mean vertical composition.

365 of 472

Natural transformations VIII Example 18.20 (Evaluation)
Let C be a category with exponentiation, and let A ∈ Obj C. Then FA : C → C, defined as FA(B) = B^A × A for each B ∈ Obj C, and FA(f) = 〈(f ∘ −), idA〉 for each f ∈ HomC(B, C), is a functor. Thus ev : FA ⇒ IdC, the evaluation transformation, is a natural transformation, since the naturality square commutes for every g : C → B:

g ∘ evC = evB ∘ FA(g), where FA(g) = 〈(g ∘ −), idA〉 : C^A × A → B^A × A.

366 of 472

Natural transformations IX

Definition 18.21 (Functor category) Let C and D be categories, then DC , the functor category, is a category whose objects are the functors C → D, and whose arrows are the corresponding natural transformations, with vertical composition.

367 of 472

Natural transformations X

Theorem 18.22 The category Cat is Cartesian closed.

Proof. The category 1 is a terminal object in Cat; Cat has binary products, so it has all finite products as well; it also has equalisers, as is easy to verify. Exponentiation is given by the functor category.

368 of 472

Natural transformations XI

Example 18.23 (Evaluation functor) The functor Ev : DC × C → D, called the evaluation functor, is defined as Ev(G, A) = G(A), with A ∈ Obj C and G : C → D, and Ev(α, f) = G(f) ∘ αA = αB ∘ F(f), with α : F ⇒ G, F, G : C → D and f ∈ HomC(A, B). The evaluation functor is the ev arrow of Cat.

369 of 472

References and Hints

This lesson roughly follows [Pierce2], Chapters 2.1 and 2.3. Most examples come from [Goldblatt], while the definitions follow [MacLane]. The omitted proof can be found in [MacLane].

370 of 472

Fundamentals of Functional Programming Lecture 19 — Intermezzo

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Outline Functors and natural transformations are essential tools in the theoretical analysis of categories. They allow us to look at a given category from the outside, studying how the category relates to other categories. From a computational point of view, most canonical constructions are naturally coded as functors. Representing functors as maps between categories in some functional language adds a level of abstraction to programming which is useful to model complex situations in a very compact way.

372 of 472

Functors I Functors, consisting as they do of two functions, one on objects and the other on arrows, can be represented quite simply:

datatype (oA,aA,oB,aB)Functor = functor (oA,aA)Cat × (oA → oB) × (aA → aB) × (oB,aB)Cat;

As before, we adopt the convention that a pair of maps is promoted to an instance of the type Functor if and only if we have actually proved that they form a functor.

Example 19.1 (Identity functor)
The identity functor, mapping objects and arrows to themselves, is defined as

fun I(C) = functor(C, λx. x, λx. x, C); 373 of 472

Functors II Example 19.2 (Product functor)
The functor P : FinSet → FinSet mapping A ↦ A × A and f ↦ 〈f, f〉, where 〈f, f〉(x, y) = (f x, f y), is defined as

let fun cartprod(A,B) =
      if A = ∅ then ∅
      else let { a } ∪ A′ = A;
           in (mapset (λb. (a, b)) B) ∪ cartprod(A′,B);
    fun prodarrow(setarrow(A,f,B),setarrow(C,g,D)) =
      setarrow(cartprod(A,C), λ(x, y). (f x, g y), cartprod(B,D));
in functor(FinSet, λA. cartprod(A,A), λf. prodarrow(f,f), FinSet);

374 of 472
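The cartprod helper and the arrow action f ↦ 〈f, f〉 translate directly to Python on lists; the names below are ours.

```python
def cartprod(xs, ys):
    """Cartesian product of two finite sets represented as lists."""
    return [(x, y) for x in xs for y in ys]

def prod_arrow(f):
    """The product functor's action on an arrow: f goes to <f, f>."""
    return lambda p: (f(p[0]), f(p[1]))
```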

Natural transformations I Natural transformations are easily coded as a datatype:

datatype (oA,aA,oB,aB)NatTransform = nattransform (oA,aA,oB,aB)Functor × (oA → aB) × (oA,aA,oB,aB)Functor;

To simplify the basic definitions on natural transformations, we define a couple of auxiliary functions:

fun ofo (functor(C,fo,fa,D)) A = fo A;
fun ofa (functor(C,fo,fa,D)) g = fa g;

They define the application of a functor to an object or to an arrow. 375 of 472

Natural transformations II In this way, we can define the identity natural transformation:

fun id(A,cat(-,-,i,-)) F = nattransform(F, λx. i(ofo F x), F);

Also, we can define vertical composition:

fun vcomp (A,cat(-,-,-,c)) (nattransform(-,β,H)) (nattransform(F,α,-)) = nattransform(F, λx. c(β x, α x), H);

376 of 472

Natural transformations III

Similarly, we define horizontal composition:

fun hcomp (cat(-,-,-,c)) (nattransform(G,β,G′)) (nattransform(F,α,F′)) =
  nattransform(FunComp G F,
               λx. c(β (ofo F′ x), ofa G (α x)),
               FunComp G′ F′);

where FunComp is functor composition ([Exercise] Define it!).

377 of 472

Natural transformations IV With these instruments, we can construct the functor category which, given two categories A and B, has as objects the functors from A to B, and the corresponding natural transformations as arrows. In code:

fun functorcategory(A,B) =
  cat(λ nattransform(s,-,-). s,
      λ nattransform(-,-,t). t,
      id(A,B),
      vcomp(A,B));

378 of 472

A Different Application I As we did with colimits, it is possible to develop functions to code functorial constructions of any sort, together with their arrows, which will be natural transformations. Instead, we want to use Category Theory to show that a software-engineering method for merging experts' evaluations in a risk analysis is impossible to achieve. More specifically, we want to show how to use the categorical framework, and functors in particular, to prove that it is impossible to construct a "most general" common metric to measure risk in a system when two experts use different metrics in which some values are identified.

379 of 472

A Different Application II We will not introduce and discuss all the preliminaries of this application, but only the core of the result, where Category Theory plays an essential role. This application has been chosen because it is simple without being a toy example of the techniques it wants to suggest. The presentation first introduces some algebraic definitions, which are used to describe what we mean by a metric; then we state the problem we want to solve; finally, we develop the results needed to show that the proposed problem is, indeed, unsolvable.

380 of 472

Metrics I A metric is a structure whose values are used to assign a measure to the risk a system, or one of its parts, is exposed to. Since these values are combined using maxima and minima, it is natural to organise them as an algebraic structure based on orders.

Definition 19.3 (Lattice) A lattice is a partial order 〈O, ≤〉 such that every pair x, y ∈ O has a unique least upper bound (lub), denoted by x ∨ y, and a unique greatest lower bound (glb), denoted by x ∧ y. A lattice is finite if O is a finite set.

381 of 472

Metrics II Definition 19.4 (Complete lattice, bounded lattice)
Given a lattice 〈O, ≤〉, let U ⊆ O. Then ⋁U and ⋀U are, respectively, the lub and the glb of the elements of U, when they exist. A lattice is complete if every subset U ⊆ O has a lub and a glb. A lattice is bounded if there are two distinct elements ⊤ and ⊥ in O such that ⊥ = ⋀O and ⊤ = ⋁O.

In a bounded lattice, it is immediate to see that ⋁∅ = ⊥ and, by duality, ⋀∅ = ⊤.

Lemma 19.5 A finite lattice is bounded and complete. [Proof not required] 382 of 472

Metrics III Definition 19.6 (Metric) A metric is a finite lattice.

Definition 19.7 (Met) The category Met has all the metrics as objects, and its arrows are the functions f preserving the order, ⊥ and ⊤, i.e., x ≤ y implies f(x) ≤ f(y), f(⊥) = ⊥ and f(⊤) = ⊤. It is easy to see that Met is a category. It is important to notice that it is not the category of finite lattices: its arrows are not necessarily homomorphisms of lattices, because they need not preserve lubs and glbs. 383 of 472

The problem The problem we would like to solve is: given two metrics A and B in which some values are identified via e1 : E → A and e2 : E → B, we want to find the most general metric, up to isomorphism, containing both A and B, in which the elements e1(x) and e2(x) are identified for each x ∈ E. In categorical terms, this amounts to saying that we want to find the pushout of the pair of arrows e1 : E → A and e2 : E → B. We will prove that such a pushout does not always exist, and we will provide a way to construct it whenever it is possible. 384 of 472

The solution I Lemma 19.8 Met has an initial object and binary coproducts.

Proof. Let 0 = 〈{⊥, ⊤}, ≤〉 with ⊥ ≤ ⊤. Then 0 is a metric and it is obviously initial. Also, let A and B be metrics and define C = 〈A ⊔⊥,⊤ B, ≤〉, where A ⊔⊥,⊤ B is the disjoint union of A and B with tops and bottoms identified, and the order is naturally defined as the union of the orders on A and B. Then C is a metric, and it is immediate to show that the embeddings jA : A ↣ C and jB : B ↣ C are its injections, making C the coproduct of A and B.

385 of 472
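The carrier of A ⊔⊥,⊤ B can be sketched in Python; the `"BOT"`/`"TOP"` markers, the boundedness predicates, and the tags are our own devices, and only the underlying set is built here, not the order.

```python
def glue(is_bound_a, is_bound_b, xs, ys):
    """Disjoint union of two bounded lattices' carriers with their
    tops and bottoms identified: keep one shared pair of bounds and
    tag the remaining (middle) elements by origin."""
    mid_a = [("a", x) for x in xs if not is_bound_a(x)]
    mid_b = [("b", y) for y in ys if not is_bound_b(y)]
    return ["BOT", "TOP"] + mid_a + mid_b
```

For instance, gluing the chains 0 < 1 < 3 and 0 < 2 < 3 (bounds 0 and 3) yields a four-element carrier: the shared bounds plus one middle element from each side.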

The solution II Lemma 19.9 In a category having initial objects, binary coproducts and coequalisers, every pushout is the coequaliser of a coproduct.

Proof. Immediate from the colimit construction. In particular, if the square formed by e1 : E → A, e2 : E → B, pA : A → P and pB : B → P is a pushout, then P is the coequaliser of the parallel pair jA ∘ e1, jB ∘ e2 : E ⇉ A + B.

386 of 472

The solution III Lemma 19.10
Let U : Met → Set be the forgetful functor. Then every pushout P of the pair f : A → B, g : A → C in Met yields a pushout U(P) of the pair U(f) : U(A) → U(B), U(g) : U(A) → U(C) in Set.

Proof. Elementary calculation.

Put more simply, this lemma says that the forgetful functor U : Met → Set preserves pushouts. It is extremely useful since pushouts in Set are easy to calculate: the pushout of U(B) and U(C) is the set P = (U(B) ⊔ U(C))/σ, where σ is the minimal equivalence relation relating f(x) and g(x) for all x ∈ U(A). 387 of 472
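The Set pushout just described can be computed naively in Python: build the tagged disjoint union and merge equivalence classes along f(x) ~ g(x). The list-based union-find and all names are ours; this is a sketch, not an efficient implementation.

```python
def pushout(f, g, apex, bs, cs):
    """Quotient of the tagged disjoint union of bs and cs by the
    minimal equivalence relation with f(x) ~ g(x) for x in apex.
    Returns the list of equivalence classes."""
    classes = [[("b", b)] for b in bs] + [[("c", c)] for c in cs]
    for x in apex:
        u, v = ("b", f(x)), ("c", g(x))
        cu = next(cl for cl in classes if u in cl)
        cv = next(cl for cl in classes if v in cl)
        if cu is not cv:
            classes = [cl for cl in classes if cl is not cu and cl is not cv]
            classes.append(cu + cv)
    return classes
```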

The solution IV Consider a pushout square in Met formed by e1 : E → A, e2 : E → B, pA : A → P and pB : B → P. By Lemma 19.9, it is generated by coequalising in A + B. Hence, to prove that Met does not have all pushouts, that is, that our initial problem is unsolvable, it suffices to show that Met does not have all coequalisers.

388 of 472

The solution V Consider the parallel pair jA ∘ e1, jB ∘ e2 : E ⇉ A + B, and suppose that P is its coequaliser. By Lemma 19.10, P must be a finite lattice whose universe is P = (A ⊔ B)/σ, where σ = (ρ ∪ ρ⁻¹)* and ρ = { (e1(x), e2(x)) : x ∈ E }. Also, the order of P is fully determined by the fact that P is a coequaliser: if q : A + B → P preserves the order of A + B, it must be the case that ≤P = ≤A+B /σ, as elementary algebra tells us. Notice how the calculation of P is effective: if it exists, then we are able to determine it uniquely. 389 of 472

The solution VI To construct a counterexample, take as E the four-element lattice with bottom ⊥, top ⊤, and two incomparable elements α and β. Also, take as A and B two copies of the six-element lattice with elements ⊥, a, b, c, d, ⊤, in which a, b and d cover ⊥, the join of a and b is c, and c and d lie below ⊤. 390 of 472

The solution VII Thus A + B is the disjoint union of the two copies with tops and bottoms identified; its middle elements are aA, bA, cA, dA and aB, bB, cB, dB. Let eY : E → Y, with Y either A or B, be defined by eY(⊤) = ⊤, eY(α) = aY, eY(β) = bY, and eY(⊥) = ⊥. 391 of 472

The solution VIII So the candidate coequaliser is fully determined: it is the order obtained from A + B by identifying aA with aB and bA with bB. This order is not a lattice: the identified pair of elements has two incomparable minimal upper bounds, cA and cB. Thus, there is no coequaliser for the parallel pair jA ∘ eA, jB ∘ eB : E ⇉ A + B and, consequently, no pushout for the pair of arrows eA : E → A and eB : E → B. 392 of 472

References and Hints

The content of the first part of this lesson has been taken from [Rydeheard], Chapters 3 and 5. The content of the second part is taken from M. Benini, S. Sicari, A Mathematical Derivation of a Risk Assessment Procedure, IAENG Journal of Applied Mathematics 40(2), pp. 52-62 (2010). That paper also contains the omitted proof.

393 of 472

Fundamentals of Functional Programming Lecture 20

Prof. M. Benini [email protected] http://www.dicom.uninsubria.it/~mbenini

Laurea Magistrale in Informatica Facoltà di Scienze MM.FF.NN. di Varese Università degli Studi dell’Insubria

a.a. 2010/11

Outline The notion of adjunction, which we will explore in this lesson, is a deep and complex concept. It arises everywhere, and recognising it leads to a deeper understanding of theories, as well as broad possibilities for building canonical constructions. In this second aspect, constructions, adjoints are of particular interest in Computer Science and, specifically, in functional programming, since they allow for the solution of difficult problems whose "inverse problems" are easy to describe. With adjunctions, we will finish our development of Category Theory, except for a few specific topics strictly related to the categorical interpretation of λ-calculi. 395 of 472

Adjunctions I Definition 20.1 (Adjunction)
An adjunction consists of
■ a pair of categories C and D;
■ a pair of functors F : C → D and G : D → C;
■ a natural isomorphism between the functors HomD(F(−), −), HomC(−, G(−)): Cop × D → Set, i.e., a family of bijections HomD(F(A), B) ≅ HomC(A, G(B)) natural in A ∈ Obj C and B ∈ Obj D.
The functor F is said to be left adjoint to G, while G is said to be right adjoint to F, and we write F ⊣ G.

396 of 472

Adjunctions II The definition of adjunction is somewhat cryptic. The situation it models is this: whenever we make a move f : A → G(B) in C, we can replicate the move in D, choosing a suitable (and unique) g : F(A) → B, and vice versa.

397 of 472

Adjunctions III The situation is formalised by the naturality of θA,B : HomD(F(A), B) ≅ HomC(A, G(B)).

Naturality in A ∈ Obj C means that, having fixed B ∈ Obj D, the following square commutes for every f ∈ HomC(A′, A):

θA′,B(g ∘ F(f)) = θA,B(g) ∘ f for every g ∈ HomD(F(A), B).

In informal terms: in C, whenever f : A′ → A and we make a move A → G(B), then we can also make a move A′ → G(B), and, in both cases, the moves can be replicated in D.

398 of 472

Adjunctions IV The naturality in B ∈ Obj D is symmetric. Their combination produces naturality in both A and B. It is simpler to decompose θA,B into two natural transformations.

Definition 20.2 (Unit and co-unit)
Given an adjunction 〈F, G, θ〉, its unit is

ηA = θA,F(A)(idF(A)): A → G(F(A)),

while its co-unit is

εB = θ⁻¹G(B),B(idG(B)): F(G(B)) → B.

The unit and the co-unit of an adjunction are strictly linked to each other: their constructions from θA,B are symmetric, so we can focus on the unit, leaving the properties of the co-unit to be derived by symmetry.

The unit and the co-unit of an adjunction are strictly linked to each other: their constructions from θA,B are symmetric, so we can focus on unit, leaving the properties of co-unit to be derived by symmetry.

399 of 472

Adjunctions V Lemma 20.3
Given an adjunction 〈F, G, θ〉, for each f ∈ HomD(F(A), B), it holds that θA,B(f) = G(f) ∘ ηA.

Proof. (i)
For each f ∈ HomD(F(A), B), the naturality of θ makes the following square commute: for every h ∈ HomD(F(A), F(A)),

θA,B(f ∘ h) = G(f) ∘ θA,F(A)(h).

↪ 400 of 472

Adjunctions VI ↪ Proof. (ii)
Specialising it to idF(A) ∈ HomD(F(A), F(A)), we obtain

θA,B(f) = θA,B(f ∘ idF(A)) = G(f) ∘ θA,F(A)(idF(A)) = G(f) ∘ ηA.

401 of 472

Adjunctions VII Lemma 20.4
Given an adjunction 〈F, G, θ〉, for each g ∈ HomC(A, G(B)), there is a unique f ∈ HomD(F(A), B) such that g = G(f) ∘ ηA.

Proof.
Since θA,B is a bijection, to g corresponds a unique arrow f ∈ HomD(F(A), B) such that g = θA,B(f). So g = G(f) ∘ ηA by the previous lemma.

For the co-unit εB the same results are phrased as:
■ for each g ∈ HomC(A, G(B)), θ⁻¹A,B(g) = εB ∘ F(g);
■ for each f ∈ HomD(F(A), B), there is a unique g ∈ HomC(A, G(B)) such that f = εB ∘ F(g).

402 of 472

Adjunctions VIII Theorem 20.5
Let F : C → D and G : D → C be functors. If there are natural transformations η : IdC ⇒ G ∘ F and ε : F ∘ G ⇒ IdD such that, for every f ∈ HomD(F(A), B), there is a unique g ∈ HomC(A, G(B)) such that f = εB ∘ F(g), and, for each g ∈ HomC(A, G(B)), there is a unique f ∈ HomD(F(A), B) such that g = G(f) ∘ ηA, then 〈F, G, θ〉, with θA,B(f) = G(f) ∘ ηA, is an adjunction.

Proof.
Let τA,B(g) = εB ∘ F(g) for every g ∈ HomC(A, G(B)). Then τA,B and θA,B are inverse to each other, so θA,B is an isomorphism. Also, it is immediate to show that it is natural in both A and B. 403 of 472

Adjunctions IX The interest in adjunctions lies in their properties. For our purposes, the main results are:

Theorem 20.6 Let C and D be categories. Then C ≃ D via the functor F : C → D iff F is part of an adjunction 〈G, F, θ〉 with G ⊣ F such that its unit η and co-unit ε are natural isomorphisms: η : IdD ≅ F ∘ G and ε : G ∘ F ≅ IdC. [Proof not required]

Theorem 20.7 If F a G then G preserves limits, while F preserves colimits. [Proof not required]

404 of 472

Adjunctions X Example 20.8 (Initial and terminal objects) Let C be a category and let ! : C → 1 be the only possible functor to the category 1, which has a single object, ∗, and a single morphism, id∗. If F ⊣ ! for some functor F, then C has an initial object. In fact, consider an arrow g ∈ Hom1(∗, !(A)): necessarily g = id∗, and, by adjunction, there is a unique arrow f : F(∗) → A in C, for each A ∈ Obj C. Thus, F(∗) is an initial object of C. The unit of the adjunction is η∗ = id∗ and the co-unit εA is the unique arrow ! : 0 → A, with 0 = F(∗). For similar reasons, if ! ⊣ G, then C has a terminal object.


Adjunctions XI It can be shown that the limit and the colimit of any type of diagram in a category C arise, when they exist, from right and left adjoints of a diagonal functor C → CJ , where J is a canonical category having the “shape” of the diagram. The unit for the left adjoint is the universal co-cone, while the co-unit of the right adjoint is the universal cone.

Example 20.9 (Product)
Consider the discrete category J with two objects, and define ∆ : C → CJ as ∆(A) = (A, A) and ∆(f : A → B) = 〈f, f〉. Note that the definition of ∆ uses the fact that CJ ≅ C × C. Then C has binary products iff ∆ has a right adjoint; moreover, C has binary co-products iff ∆ has a left adjoint.
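In Set, this adjunction can be made concrete: an arrow ∆(A) → (B, C) in C × C is just a pair of functions, and the right adjoint packages it as a single function into the Cartesian product. A hypothetical Python sketch (names are ours, chosen for illustration) checks the bijection pointwise:

```python
# Sketch, in Set: the diagonal functor Delta(A) = (A, A) has the
# Cartesian product as right adjoint.  The bijection
# Hom_{CxC}(Delta(A), (B, C)) = Hom(A, B x C) pairs and unpairs maps.

def pair_to_arrow(f, g):
    """A pair of functions (f, g) becomes <f, g> : A -> B x C."""
    return lambda a: (f(a), g(a))

def arrow_to_pair(h):
    """A function h : A -> B x C becomes the pair (pi1 . h, pi2 . h)."""
    return (lambda a: h(a)[0], lambda a: h(a)[1])

f, g = (lambda n: n + 1), (lambda n: n * 2)
h = pair_to_arrow(f, g)
f2, g2 = arrow_to_pair(h)
# The two translations are mutually inverse (checked pointwise):
assert all(h(n) == (f(n), g(n)) for n in range(5))
assert all((f2(n), g2(n)) == (f(n), g(n)) for n in range(5))
```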

Adjunctions XII

When there is an equivalence or an isomorphism between categories or between collections of arrows, it is often the case that there is an adjunction behind it.

Example 20.10 (Exponentiation)
If the category C has exponentiation, then HomC(C × A, B) ≅ HomC(C, B^A). Thus, the product functor (− × A) has a right adjoint, the functor (−)^A. The converse also holds: if the product functor has a right adjoint F, then the category has exponentiation and F(B) is the exponential object, for each B ∈ Obj C. Note that the co-unit εB is precisely the evaluation arrow.
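In Set this bijection is exactly currying. The following sketch (our own illustration, not from the text) exhibits θ, its inverse, and the evaluation arrow as the co-unit:

```python
# Sketch, in Set: theta : Hom(C x A, B) -> Hom(C, B^A) is currying,
# its inverse is uncurrying, and the co-unit is evaluation.

def curry(f):
    """theta: turn f : C x A -> B into C -> B^A."""
    return lambda c: (lambda a: f((c, a)))

def uncurry(g):
    """theta^{-1}: turn g : C -> B^A back into C x A -> B."""
    return lambda ca: g(ca[0])(ca[1])

def ev(pair):
    """The co-unit eps_B : B^A x A -> B, the evaluation arrow."""
    g, a = pair
    return g(a)

add = lambda ca: ca[0] + ca[1]
assert curry(add)(3)(4) == 7                       # theta(add)
assert uncurry(curry(add))((3, 4)) == add((3, 4))  # inverses on the nose
assert ev((curry(add)(3), 4)) == 7                 # evaluation as co-unit
```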

Adjunctions XIII

The forgetful functors offer a way to generate interesting objects: in fact, they usually have a left adjoint, which produces free objects.

Example 20.11
Consider the forgetful functor U : Mon → Set. Its left adjoint F exists, and it maps a set A to the free monoid F(A) generated by the elements of A. Note that we should always check that a forgetful functor admits a left adjoint: for example, the category of fields has an obvious forgetful functor to Set, but it does not have a left adjoint. In fact, there is no such thing as a "free field" in Algebra.
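A minimal sketch of this adjunction, assuming the usual concrete presentation of the free monoid as finite lists with concatenation (our own illustration):

```python
# Sketch: the free monoid on a set A is the monoid of finite lists over
# A, with concatenation and the empty list.  The unit of the adjunction
# embeds a generator, and any map into a monoid (M, mult, e) extends
# uniquely to a monoid homomorphism on lists.

def unit(a):
    """eta_A : A -> U(F(A)), a generator as a one-element word."""
    return [a]

def extend(f, mult, e):
    """The unique monoid homomorphism [A] -> M extending f : A -> M."""
    def hom(word):
        acc = e
        for a in word:
            acc = mult(acc, f(a))
        return acc
    return hom

# Extend the string-length map into the additive monoid (int, +, 0):
h = extend(len, lambda x, y: x + y, 0)
assert h(["ab", "c", "def"]) == 6
assert h(unit("xyz")) == len("xyz")   # h . eta = f, the defining property
```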


Yoneda Lemma I

When we consider functors from a locally small category to Set, there is a nice result that allows us to characterise their natural transformations. It is also very useful, as we will see.

Theorem 20.12 (Yoneda lemma) Let F : C → Set be a functor from a locally small category C, and let A ∈ Obj C. Then, there is a bijective correspondence θF ,A : Nat(Hom(A, −), F ) ∼ = F (A) .

Proof. (i)
For a given natural transformation α : Hom(A, −) → F, we define θF,A(α) = αA(idA). Moreover, given a ∈ F(A), for each B ∈ Obj C, τ(a)B : HomC(A, B) → F(B) is defined as τ(a)B(f) = F(f)(a), with f ∈ HomC(A, B).

Yoneda Lemma II

Proof. (ii)
This class of mappings defines a natural transformation τ(a) : Hom(A, −) → F since, for every g ∈ HomC(B, C) and f ∈ HomC(A, B),
(F(g) ◦ τ(a)B)(f) = F(g)(τ(a)B(f)) = F(g)(F(f)(a)) = (F(g) ◦ F(f))(a) = F(g ◦ f)(a) = τ(a)C(g ◦ f) = τ(a)C(Hom(A, g)(f)) = (τ(a)C ◦ Hom(A, g))(f).
In a diagram: the naturality square with horizontal arrows τ(a)B : HomC(A, B) → F(B) and τ(a)C : HomC(A, C) → F(C), left vertical arrow Hom(A, g) = g ◦ − and right vertical arrow F(g) commutes.

Yoneda Lemma III

Proof. (iii)
Now, θF,A and τ are mutually inverse. In fact, letting a ∈ F(A),
(θF,A ◦ τ)(a) = θF,A(τ(a)) = τ(a)A(idA) = F(idA)(a) = idF(A)(a) = a.
Also, if α : Hom(A, −) → F and f ∈ HomC(A, B), by naturality of α,
((τ ◦ θF,A)(α))B(f) = τ(θF,A(α))B(f) = τ(αA(idA))B(f) = F(f)(αA(idA)) = αB(Hom(A, f)(idA)) = αB(f ◦ idA) = αB(f).

Corollary 20.13
The bijections θF,A of the Yoneda lemma are natural in A. Moreover, if C is small, they are also natural in F. [Proof not required]
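The two directions of the Yoneda bijection can be sketched concretely in Set, taking F to be the list functor (so F(f) = map f); the encoding below is our own illustration, with a natural transformation represented as a single polymorphic function:

```python
# Sketch, in Set, with F the list functor (F(f) = map f).  theta extracts
# an element of F(A) from a natural transformation alpha : Hom(A,-) -> F;
# tau rebuilds the transformation from that element.

def theta(alpha):
    """theta(alpha) = alpha_A(id_A): evaluate the A-component at id_A."""
    return alpha(lambda x: x)

def tau(a):
    """tau(a)_B(f) = F(f)(a) = map(f, a), for f : A -> B."""
    return lambda f: [f(x) for x in a]

a = [1, 2, 3]                 # an element of F(A), with A = int
assert theta(tau(a)) == a     # theta . tau = id on F(A)
assert tau(a)(str) == ["1", "2", "3"]   # the B = str component of tau(a)
```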

Yoneda Lemma IV

Definition 20.14 (Yoneda functors)
The functor Y : C^op → Set^C, defined as
■ for each A, B ∈ Obj C, (Y(A))(B) = HomC(A, B);
■ for each f ∈ HomC(A, B), Y(f) = Hom(f, −) : HomC(B, −) → HomC(A, −);
is called the (contravariant) Yoneda functor. Its dual is the (covariant) Yoneda functor: it sends each f ∈ HomC(A, B) to Y(f) = Hom(−, f) : HomC(−, A) → HomC(−, B). The covariant Y functor goes from C to the category of presheaves on C: Y : C → Set^(C^op).


Yoneda Lemma V Lemma 20.15 The Yoneda functors are full and faithful.

Proof. Direct consequence of the Yoneda lemma.

The importance of the Yoneda functors lies in the fact that, given any small category C, we can "complete" it. In fact, the image of C through Y is an isomorphic copy of C, with no extra arrows, Y being full and faithful. But the category Set^(C^op) is a topos, so it has all finite limits and co-limits, as well as exponentials and a subobject classifier. So, we can think of C as a full subcategory of Set^(C^op) and work in the larger category; in this way, we "add" to C the finite categorical constructions it may lack.

References and Hints

This lesson comes from [Pierce2], Chapter 2.4. The definition of adjunction is slightly different, although equivalent: it comes from [Goldblatt]. The proofs of most theorems are expanded versions of the ones in [MacLane]. The Yoneda Lemma is taken from [MacLane], while its proof can be found in F. Borceux, Handbook of Categorical Algebra I: Basic Category Theory, Cambridge University Press (1994). ISBN: 978-0521061193. All the omitted proofs can be found either in [MacLane] or in Borceux's book.

Fundamentals of Functional Programming Lecture 21 — Intermezzo


Outline

Adjunctions are extremely useful in contemporary mathematical practice. In Theoretical Computer Science, because of its proximity to Mathematics, many applications have been developed. In this lesson, we want to introduce a particular instance of adjunctions: Galois connections. Essentially, a Galois connection is an adjunction in the category of posets. It turns out that Galois connections provide a simple yet elegant and powerful way to give a formal semantics to imperative languages, which has been fruitfully used to verify programs and to synthesise them from formal specifications.


Relational semantics I

Procedural (imperative) computer programs can be seen as operating on a finite or infinite state space X whose elements are vectors, each component of which has an appropriate datatype: these components are the values which the program's variables can take. Having fixed a program P, it is helpful to distinguish between a set G ⊆ X of initial states and a set M ⊆ X of final states. Then, a non-deterministic program P is modelled by a relation R ⊆ G × M: if R(x, y) holds, then the program may reach the final state y when started from the initial state x. A deterministic program is just a non-deterministic program whose relation is a function.


Relational semantics II

Definition 21.1 (Predicate)
Given a set X, a predicate (over X) is a statement taking a value true, denoted by ⊤, or a value false, denoted by ⊥. Equivalently, a predicate p is a function from X to { ⊤, ⊥ }. We write P(X) for the set of predicates on X and order it by implication: for p, q ∈ P(X),

p → q  iff  { x ∈ X : p(x) = ⊤ } ⊆ { x ∈ X : q(x) = ⊤ } .

So, 〈P(X), →〉 is a partially ordered set. Define a map φ : P(X) → ℘(X) by φ(p) = { x ∈ X : p(x) = ⊤ }. Then φ is an order isomorphism between 〈P(X), →〉 and 〈℘(X), ⊆〉, the collection of subsets of X ordered by inclusion.

Relational semantics III

We think of a subset Y ⊆ M, the set of final states of a program P, as specifying a property of final states true precisely on Y . These properties are called postconditions. Similarly, a predicate on initial states is called a precondition. Note how we use the previously defined φ isomorphism to move from subsets to predicates and vice versa. Since φ is an isomorphism, we will never make its use explicit in the future.


Weakest precondition I We may stipulate what a program should do by giving conditions on the state of the system before and after program’s execution. Now, fix G and M, initial and final states, and consider a relation R ∈ ℘(G × M) modelling a program P, possibly non-deterministic but always terminating. For a given postcondition Y , the weakest precondition wpR (Y ) is the set of input states x such that P is guaranteed to terminate in a state in Y when it is started from x. We may regard wpR as a map from P(M) to P(G ).


Weakest precondition II

The map wpR preserves implication: in fact, if Y ⊆ Z then
wpR(Y) = { x ∈ G : ∀y ∈ M. R(x, y) → y ∈ Y } ⊆ { x ∈ G : ∀z ∈ M. R(x, z) → z ∈ Z } = wpR(Z).

Any implication-preserving map f : P(M) → P(G), where G, M ⊆ X, is called a predicate transformer. Given a predicate transformer τ : P(M) → P(G), we may associate a relation Rτ to it by

Rτ(x, y) ≡ ∀Y ∈ ℘(M). x ∈ τ(Y) → y ∈ Y,

for any x ∈ G and y ∈ M. It is immediate to verify ([Exercise] do it!) that, for R ∈ ℘(G × M) and τ : P(M) → P(G), R ⊆ Rτ iff τ → wpR.

Weakest precondition III

Taking these ideas one step further, the program itself may be modelled by a predicate transformer, namely its weakest precondition. Thus, we can move back and forth between the relational and the predicate-transformer semantics by means of the connection R ⊆ Rτ iff τ → wpR. We are not interested in developing the relational semantics of programs, or the predicate-transformer semantics, any further in this course; but we wish to show that the above connection is, indeed, an adjunction.


Galois connections I

Definition 21.2 (Galois connection)
Let P and Q be ordered sets. A pair (B, C) of maps B : P → Q and C : Q → P is a Galois connection if, for all p ∈ P and q ∈ Q,

Bp ≤ q  iff  p ≤ Cq .

The map B is called the lower adjoint of C, and the map C is called the upper adjoint of B.
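A concrete numeric instance (our own example, not from the text) can be checked exhaustively on an initial segment of the naturals:

```python
# A Galois connection between (N, <=) and (N, <=): lower adjoint
# B(p) = 3p, upper adjoint C(q) = q // 3 (integer division).

B = lambda p: 3 * p
C = lambda q: q // 3

# The defining condition: B p <= q  iff  p <= C q, for all p, q.
assert all((B(p) <= q) == (p <= C(q)) for p in range(30) for q in range(30))

# The closure inequalities that follow (cf. Lemma 21.3):
assert all(p <= C(B(p)) for p in range(30))
assert all(B(C(q)) <= q for q in range(30))
# B preserves joins (max) and C preserves meets (min) (cf. Lemma 21.5):
assert B(max(4, 7)) == max(B(4), B(7))
assert C(min(4, 7)) == min(C(4), C(7))
```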


Galois connections II Any partially ordered set P can be formalised as a category P whose objects are the elements of P and whose arrows p → q mean that p ≤ q. Let P and Q be two posets and let f : P → Q be order preserving, that is, if a ≤P b then f (a) ≤Q f (b). In the categorical setting, f induces a functor f from P to Q . In fact, it maps objects of P into objects of Q , and arrows in HomP (a, b) into arrows in HomQ (f (a), f (b)). Moreover, it trivially preserves composition of arrows and identities.


Galois connections III

Hence, a Galois connection (B, C) between P and Q is a pair of functors B: P → Q , C: Q → P such that HomQ (Bp , q) ∼ = HomP (p , Cq) , for all p ∈ P and q ∈ Q. It is simple to verify that this bijection is natural in both p and q. So, by definition, a Galois connection (B, C) is an adjunction B a C between the categories P and Q .


Galois connections IV

Lemma 21.3
Assume (B, C) is a Galois connection between ordered sets P and Q. Let p, p1, p2 ∈ P and q, q1, q2 ∈ Q. Then
1. p ≤ C ◦ Bp and B ◦ Cq ≤ q;
2. p1 ≤ p2 implies Bp1 ≤ Bp2, and q1 ≤ q2 implies Cq1 ≤ Cq2;
3. Bp = B ◦ C ◦ Bp and Cq = C ◦ B ◦ Cq.
Conversely, a pair of maps B : P → Q and C : Q → P satisfying (1) and (2) for all p, p1, p2 ∈ P and all q, q1, q2 ∈ Q sets up a Galois connection between P and Q.

Proof. (i)
For p ∈ P, we have Bp ≤ Bp by reflexivity so, (B, C) being a Galois connection, p ≤ C ◦ Bp. Dually for B ◦ Cq ≤ q. This establishes (1).

Galois connections V

Proof. (ii)
For (2), suppose p1 ≤ p2; then p1 ≤ C ◦ Bp2 by (1) and transitivity, which is equivalent to Bp1 ≤ Bp2, (B, C) being a Galois connection. Dually for q1 ≤ q2.
For (3), from (1), p ≤ C ◦ Bp, we obtain Bp ≤ B ◦ C ◦ Bp by (2). But (B, C) is a Galois connection, so C ◦ Bp ≤ C ◦ Bp, which holds by reflexivity, implies B ◦ C ◦ Bp ≤ Bp. Thus, Bp = B ◦ C ◦ Bp by anti-symmetry. Dually for q.
Lastly, assume (1) and (2) hold universally, and let Bp ≤ q. By (2), C ◦ Bp ≤ Cq; but p ≤ C ◦ Bp by (1), so p ≤ Cq by transitivity. The reverse implication follows in the same way.

Galois connections VI

Definition 21.4
Let φ : P → Q be a map between ordered sets. We say that φ preserves existing joins if, whenever ⋁S exists in P for some S ⊆ P, then ⋁φ(S) exists in Q and φ(⋁S) = ⋁φ(S). Preservation of existing meets is defined dually.

Lemma 21.5 Let (B, C) be a Galois connection between P and Q. Then B preserves existing joins and C preserves existing meets.

Proof. Since a join is a categorical colimit (namely the coproduct of the objects in S) and a meet is a limit (namely their product), the left adjoint B preserves colimits, i.e., existing joins, while the right adjoint C preserves limits, i.e., existing meets, by Theorem 20.7.

Refinement I

The development of a computer program often starts from a specification of the task the program is to perform. In practice, the transformation from specifications to executable programs is carried out in stages. The objective is to do this by applying fixed rules which guarantee correctness at each step. This process is known as stepwise refinement.


Refinement II

A suitable setting in which to work is provided by a specification space S = 〈S, ⊑〉 of commands, relative to some fixed state space X. This space consists of a chosen imperative programming language augmented by specifications, that is, descriptions of computations not cast in an executable form. It is worthwhile to include in S commands far removed from executable code, such as magic, which miraculously meets every specification: magic provides a top for the order S.


Refinement III

Admitting commands for arbitrary non-deterministic choice corresponds to the existence of arbitrary meets in S; so S becomes a complete lattice. The presence of such a strong mathematical structure in the specification space means that the full power of the theory of Galois connections is available: in particular, maps from S to S preserving arbitrary meets (joins) possess lower (upper) adjoints. The existence of such adjoints guarantees the existence of commands that may assist in program development. Furthermore, the calculational rules obeyed by Galois connections supply laws governing these commands.

Refinement IV

The ⊑ relation expresses the idea of "is refined by". It is clearly reflexive and transitive and, by a suitable quotient of the specification space in which we identify specifications with the same end result, it also becomes anti-symmetric. The instances in which refinement is most productive are those in which one ordered structure 〈A, ≤〉 is refined by another 〈C, ≤〉 and there exists a Galois connection (B, C) between A (abstract) and C (concrete).


Refinement V

We think of the orders on A and C as having the interpretation “is less informative than”, so that x ≤ y means that y serves every purpose that x does. Assume that a programming command is described by c in C and suppose we wish to show that c implements some specification a in the more abstract level A. We can either prove that Ba ≤ c in the concrete model, or, equivalently, prove that a ≤ Cc in the abstract model.


Refinement VI

We have that B ◦ Cc ≤ c for any c ∈ C : this expresses the fact that, in general, abstraction results in a loss of information. In the context of relational and predicate transformer models for imperative programs, such Galois maps are provided by the map taking a relation to the associated weakest precondition transformer and its upper adjoint. This Galois connection can be extended from models of programs to models of specifications to yield a powerful refinement calculus.


References and Hints

This lesson is taken from Chapter 7 of B.A. Davey and H.A. Priestley, Introduction to Lattices and Order, 2nd edition, Cambridge University Press (2002). ISBN: 0521784514. This very readable text covers the essential aspects of the algebra of orders and lattices in a clear and understandable style, with many examples devoted to computer scientists.
For those interested, the idea of using the weakest precondition to model programs and specifications dates back to E.W. Dijkstra, and is explained in his book A Discipline of Programming, Prentice Hall (1976). ISBN: 013215871X.


Fundamentals of Functional Programming Lecture 22


Outline The purpose of this lesson is to show a correspondence between Cartesian closed categories and a typed λ-calculus. In this way, we can associate to each Cartesian closed category a λ-theory and, vice versa, we can interpret any λ-theory in a suitable Cartesian closed category. Thus, Cartesian closed categories are really the models of our λ-calculus, and the interpretation is both natural and complete. Similar results can be obtained for all the typed λ-calculi we considered, but the technical aspects are more involved, so we omit them in the present course.


Typed λ-calculus I

First, we define the typed λ-calculus we want to model.

Definition 22.1 (Types)
Given a set K of type constants, types are defined as follows:
■ each k ∈ K is a type;
■ 1 is a type, informally denoting the empty product;
■ if A and B are types, then so is A × B;
■ if A and B are types, then so is A → B.

So, our λ-calculus is the simple theory of types extended with explicit product types.

Typed λ-calculus II

Definition 22.2 (Terms)
Given a class of types, a set V of typed variables such that there is a denumerable quantity of them for each type, and a set F of typed constants, terms are defined as follows:
■ if x : A ∈ V then x : A is a term and FV(x) = { x };
■ if f : A → B ∈ F and t : A is a term, then f(t) : B is a term and FV(f(t)) = FV(t);
■ ∗ : 1 is a term and FV(∗) = ∅;
■ if s : A and t : B are terms, then 〈s, t〉 : A × B is a term and FV(〈s, t〉) = FV(s) ∪ FV(t);
■ if t : A × B is a term, then fst(t) : A and snd(t) : B are terms and FV(fst(t)) = FV(snd(t)) = FV(t);

Typed λ-calculus III

(Terms, continued)
■ if t : B is a term and x : A ∈ V, then (λx : A. t) : A → B is a term and FV(λx : A. t) = FV(t) \ { x };
■ if s : A → B and t : A are terms, then (s t) : B is a term and FV(s t) = FV(s) ∪ FV(t).

There is a subtlety in the definition: for formal purposes, which will become clear later, it is useful to distinguish between the application of a function symbol to a term and the application of a term to a term.
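The term grammar above can be rendered as a small abstract syntax tree, with FV computed by structural recursion exactly as in the definition. The Python encoding below is our own (type annotations on binders are omitted for brevity):

```python
# Sketch: the term formers of Definition 22.2 as frozen dataclasses,
# with the free-variable function FV by structural recursion.

from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Star:            # * : 1
    pass

@dataclass(frozen=True)
class Pair:
    left: object
    right: object

@dataclass(frozen=True)
class Fst:
    body: object

@dataclass(frozen=True)
class Snd:
    body: object

@dataclass(frozen=True)
class Lam:             # lambda x : A. t (the type A is omitted here)
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def FV(t):
    """Free variables of a term, clause by clause as in Definition 22.2."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Star):
        return set()
    if isinstance(t, Pair):
        return FV(t.left) | FV(t.right)
    if isinstance(t, (Fst, Snd)):
        return FV(t.body)
    if isinstance(t, Lam):
        return FV(t.body) - {t.var}
    if isinstance(t, App):
        return FV(t.fun) | FV(t.arg)
    raise TypeError(t)

t = Lam("x", Pair(Var("x"), Var("y")))   # lambda x. <x, y>
assert FV(t) == {"y"}
assert FV(App(t, Var("z"))) == {"y", "z"}
```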


Typed λ-calculus IV

Definition 22.3 (Equality) The only formulae we allow in the formation of a λ-theory, which is a set of formulae, are equalities in a context: X . t =A s, where t : A and s : A are terms, and the context X ⊆ V is a finite set of variables such that FV(t) ∪ FV(s) ⊆ X . Equalities are the axioms of a λ-theory. Often, we do not write contexts explicitly: in those cases, it is assumed to be the canonical one, that is, FV(t) ∪ FV(s).


Typed λ-calculus V

Definition 22.4 (Calculus)
The rules of inference of the λ-calculus are:
■ (subst1): from X . s = t infer Y . s[r/x] = t[r/x];
■ (refl): X . x = x;
■ (sym): from X . x = y infer X . y = x;
■ (trans): from X . x = y and X . y = z infer X . x = z;

Typed λ-calculus VI

(Calculus, continued)
■ (subst2): from X . s = t infer Y . r[s/x] = r[t/x];
■ (unit): X . x =1 ∗;
■ (fst): X . fst(〈x, y〉) = x;
■ (snd): X . snd(〈x, y〉) = y;
■ (pair): X . 〈fst(z), snd(z)〉 = z;

Typed λ-calculus VII

(Calculus, continued)
■ (β): X . (λy : A. s) t = s[t/y], where y ∉ FV(t);
■ (η): X . λy : A. t y = t, where y ∉ FV(t);
■ (λ): from X, y : A . s =B t infer X . (λy : A. s) = (λy : A. t).

Interpretation I

Definition 22.5 (λ-structure)
Let C be a Cartesian closed category. A λ-structure M in C is defined by
■ a function M : K → Obj C mapping the type constants to objects of C;
■ a function M from F, the set of function symbols, to the arrows of C such that M(f : A → B) : M A → M B.

The function M : K → Obj C is extended to arbitrary types as
■ M 1 = 1C, the terminal object of C;
■ M(A × B) = M A ×C M B, the product in C;
■ M(A → B) = (M B)^(M A), the exponential object of C.

We omit subscripts, for clarity.

Interpretation II

Definition 22.6 (Interpretation)
If M is a λ-structure in a Cartesian closed category C, we assign to each term in a context X . t : B an interpretation
[[X . t]]M : M A1 × · · · × M An → M B,
where X = { x1 : A1, . . . , xn : An } and M A abbreviates M A1 × · · · × M An, in the following way:
■ if t ≡ xi then [[X . t]] = πi, the i-th projection;
■ if t ≡ f(t′) then [[X . t]] = M f ◦ [[X . t′]];
■ if t ≡ ∗ then [[X . t]] is the unique morphism M A → 1;
■ if t ≡ 〈t′, t″〉 and B ≡ B1 × B2 then [[X . t]] = 〈[[X . t′]], [[X . t″]]〉 : M A → M B1 × M B2;

Interpretation III

(Interpretation, continued)
■ if t ≡ fst(t′) then [[X . t]] = π1 ◦ [[X . t′]];
■ if t ≡ snd(t′) then [[X . t]] = π2 ◦ [[X . t′]];
■ if t ≡ λz : C. t′, where we assume z : C ∉ X, and B ≡ C → D, then [[X . t]] = h, with h : M A → (M D)^(M C) defined as the exponential transpose of [[X, z : C . t′]] : M A × M C → M D, i.e., the unique arrow such that ev ◦ 〈h, idM C〉 factors [[X, z : C . t′]] through the evaluation arrow ev : (M D)^(M C) × M C → M D;

Interpretation IV

(Interpretation, continued)
■ if t ≡ t′ t″ then [[X . t]] = ev ◦ 〈[[X . t′]], [[X . t″]]〉.

Moreover, [[X . t =A t′]] is the equaliser of the parallel pair [[X . t]], [[X . t′]] : M A → M B.
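Definition 22.6 can be read in the Cartesian closed category Set, where a context becomes a set of environment tuples, λ becomes the exponential transpose (currying) and application becomes ev ◦ 〈−, −〉. A hypothetical Python rendering (the term encoding and names are our own):

```python
# Sketch: the clauses of the interpretation, in Set.

def interp(t, names):
    """Map a term (nested tuples) to a function from environment tuples
    to values; names lists the context variables x1, ..., xn in order."""
    tag = t[0]
    if tag == "var":    # [[xi]] = the i-th projection (innermost binding wins)
        i = len(names) - 1 - names[::-1].index(t[1])
        return lambda env: env[i]
    if tag == "star":   # [[*]] = the unique morphism into the terminal object
        return lambda env: ()
    if tag == "pair":
        f, g = interp(t[1], names), interp(t[2], names)
        return lambda env: (f(env), g(env))
    if tag == "fst":
        f = interp(t[1], names)
        return lambda env: f(env)[0]
    if tag == "snd":
        f = interp(t[1], names)
        return lambda env: f(env)[1]
    if tag == "lam":    # exponential transpose (currying) of [[X, z:C . t']]
        f = interp(t[2], names + [t[1]])
        return lambda env: (lambda v: f(tuple(env) + (v,)))
    if tag == "app":    # ev . <[[t']], [[t'']]>
        f, g = interp(t[1], names), interp(t[2], names)
        return lambda env: f(env)(g(env))
    raise ValueError(tag)

# (lambda z. <snd(z), fst(z)>) <x, y> swaps the pair; beta-equal terms
# receive equal interpretations.
swap = ("app",
        ("lam", "z", ("pair", ("snd", ("var", "z")), ("fst", ("var", "z")))),
        ("pair", ("var", "x"), ("var", "y")))
assert interp(swap, ["x", "y"])((1, 2)) == (2, 1)
assert interp(("pair", ("var", "y"), ("var", "x")), ["x", "y"])((1, 2)) == (2, 1)
```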

Interpretation V

Definition 22.7 (Validity) An equality X . s =A t is valid in a λ-structure M in a Cartesian closed category C iff [[X . s =A t]] = idM A1 ×···×M An with X = { x1 :A1 , . . . , xn :An }.

Definition 22.8 (Model) A λ-structure M in a Cartesian closed category C is a model for a λ-theory T iff each axiom X . t =A s in T is valid in M.


Soundness Theorem 22.9 (Soundness) If X . s =A t is derivable in a λ-theory T , then it is valid in all models for T in every Cartesian closed category.

Proof. We need to check that the rules of the λ-calculus preserve validity:
■ the axioms (refl), (sym), (unit) and the rule (trans) are obvious;
■ rules (subst1) and (subst2) are proved to preserve validity by an easy induction showing that [[X . t[s/y]]] = [[Y . t]] ◦ [[X . s]];
■ rules (fst), (snd) and (pair) are straightforward from the definition of interpretation and the properties of the product;
■ rules (β) and (η) are straightforward from the definition of interpretation and the properties of evaluation.
[Exercise] Fill in the details.

Completeness I

Definition 22.10 (Syntactic category)
Let T be a λ-theory. We define a category CT as follows:
■ the objects of CT are the types of the language of T;
■ the arrows of CT are equivalence classes [x : A. t] of terms in contexts, where [x . s] = [x . t] iff x . t = s is provable in T, and [x . t] = [y . t[y/x]]. The substitution and equality rules ensure that this definition does not depend on the choice of representative;
■ the identity morphism is [x . x] : A → A;
■ composition is given by substitution: given [x . t] : A → B and [y . s] : B → C, [y . s] ◦ [x . t] = [x . s[t/y]].

Note that we do not need contexts with more than one variable, since we have product types.

Completeness II

Lemma 22.11
The category CT is Cartesian closed.

Proof.
■ the terminal object is the type 1 and, for each object A, the unique arrow A → 1 is [x : A. ∗];
■ the product of A and B is the type A × B, with projections [z . fst(z)] and [z . snd(z)]; the morphism C → A × B induced by [w . s] : C → A and [w . t] : C → B is [w . 〈s, t〉];
■ the exponential B^A is the type A → B, with evaluation map (A → B) × A → B given by [w . fst(w) snd(w)]; given any [z . t] : C × A → B, its exponential transpose C → (A → B) is [w . λx : A. t[〈w, x〉/z]].

Completeness III

Theorem 22.12 (Completeness)
The Cartesian closed category CT contains a λ-structure MT which validates exactly the equalities derivable from the λ-theory T. Moreover, for any Cartesian closed category D, there is a bijection between natural isomorphism classes of Cartesian closed functors CT → D and isomorphism classes of T-models in D.

Proof. (i)
The structure MT sends types to themselves and each primitive function symbol f : A → B to [x : A. f(x)]. By an easy induction we get that [[x . t]]MT = [x . t]. Hence, the equalities in a context valid in MT are exactly those provable in T.

Completeness IV

Proof. (ii)
Given a model N in D, the corresponding functor FN : CT → D sends A to N A for each type A, and [x . t] to [[x . t]]N. It is clear that FN is a Cartesian closed functor and that FN(MT) = N. In the opposite direction, since any Cartesian closed functor F : CT → D must preserve interpretations of arbitrary terms in a context, it is easily seen to be naturally isomorphic to FN where N = F(MT).


References and Hints

This lecture has been taken from [Pierce2], Chapter 3.1. Theorems and proofs follow the treatment given in Chapter D4.2 of P. Johnstone, Sketches of an Elephant: A Topos Theory Compendium, volume 2, Oxford University Press (2002). ISBN: 978-0198515982.
Johnstone's text is the definitive reference work on Topos Theory and contains almost everything which is known on this fascinating topic. It is also a very difficult book, although very well written in a crystal-clear style, requiring a mature mathematical attitude before approaching it.


Fundamentals of Functional Programming Lecture 23


Outline

This lesson introduces a class of models for the λ-calculus in its pure, type-free version. As for the typed λ-calculus, the models we consider are of a categorical nature, and we will prove that they form a sound and complete semantics for the λ-calculus. In contrast with the typed case, for the pure version we follow an algebraic approach instead of a purely categorical construction. This way is less abstract, but slightly more demanding from the technical point of view.


C-monoids I

The categorical structures that we will use as models are called C-monoids. Intuitively, they are Cartesian closed categories deprived of the terminal object.

Definition 23.1 (C-monoid)
A C-monoid is an algebraic structure 〈M, ·, 1, π1, π2, ε, ∗, 〈〉〉 where 〈M, ·, 1〉 is a monoid, π1, π2, ε ∈ M, (−)∗ is a unary operation on M, and 〈−, −〉 is a binary operation on M. These operations are required to satisfy the following axioms for all a, b, c, h and k in M:
■ π1 · 〈a, b〉 = a and π2 · 〈a, b〉 = b;
■ 〈π1 · c, π2 · c〉 = c;
■ ε · 〈h∗ · π1, π2〉 = h;
■ (ε · 〈k · π1, π2〉)∗ = k.
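A pointwise plausibility check of these axioms (our own construction, not a genuine C-monoid): represent elements as unary Python functions, read · as composition, ε as application of the first component of a pair to the second, and (−)∗ as currying, and test equality on sample inputs only:

```python
# Sketch: a dynamically typed "model" of the C-monoid operations.

compose = lambda a, b: (lambda x: a(b(x)))            # the operation .
one     = lambda x: x                                 # the unit 1
pi1     = lambda p: p[0]
pi2     = lambda p: p[1]
pair    = lambda a, b: (lambda x: (a(x), b(x)))       # <a, b>
eps     = lambda z: z[0](z[1])                        # eps <f, v> = f v
star    = lambda h: (lambda x: (lambda y: h((x, y)))) # h* = curried h

a, b = (lambda x: x + 1), (lambda x: x - 1)
c = lambda x: (x, x + 1)
h = lambda p: p[0] * 10 + p[1]
samples = [(i, j) for i in range(3) for j in range(3)]

# pi1 . <a, b> = a  and  pi2 . <a, b> = b
assert all(compose(pi1, pair(a, b))(x) == a(x) for x in range(5))
assert all(compose(pi2, pair(a, b))(x) == b(x) for x in range(5))
# <pi1 . c, pi2 . c> = c  (for c landing in pairs)
assert all(pair(compose(pi1, c), compose(pi2, c))(x) == c(x) for x in range(5))
# eps . <h* . pi1, pi2> = h
lhs = compose(eps, pair(compose(star(h), pi1), pi2))
assert all(lhs(p) == h(p) for p in samples)
# (eps . <k . pi1, pi2>)* = k, checked pointwise, with k = h*
k = star(h)
kk = star(compose(eps, pair(compose(k, pi1), pi2)))
assert all(kk(x)(y) == k(x)(y) for x in range(3) for y in range(3))
```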


C-monoids II

The intuition behind the definition is that 〈−, −〉, π1 and π2 define the product with its projections, while ε stands for functional application, precisely ε = λz. fst(z) snd(z), and (−)∗ stands for currying, precisely h∗ = λx. λy. h 〈x, y〉.
The relation with Cartesian closed categories can be recovered by thinking of C-monoids as monoids, i.e., categories with a single object, having some constraints on arrows: it is possible to construct a Cartesian closed category by using the C-monoid arrows as objects and defining an appropriate notion of arrow. We will not explore this connection, as it does not shed light on the semantics of λ-calculus.


C-monoids III

Definition 23.2 (Product and exponential)
In any C-monoid, a × b ≡ 〈a · π1, b · π2〉 and g^f ≡ (g · ε · 〈π1, f · π2〉)∗.

In a monoid, the product · allows one to construct sequences of elements, called words. If we add a variable standing for elements, we get:

Definition 23.3 (Polynomials)
Given a C-monoid M, we construct another C-monoid M[x]: the elements of M[x] are polynomials, i.e., words built up from the symbol x and the elements of M using the C-monoid operations, modulo the smallest congruence relation which satisfies the axioms of C-monoids. The structure of M[x] is determined by the map h : M → M[x] sending every element of M to itself, thought of as a constant polynomial of M[x].

C-monoids IV

Theorem 23.4 (Functional completeness)
If φ(x) is a polynomial in the variable x over a C-monoid M, there exists a unique f in M such that f · 〈(x · π2)∗, 1〉 = φ(x) in M[x].

Proof. (i)
Define ρx · φ(x) by induction on the structure of φ(x):
■ ρx · k ≡ k · π2, if k is an element of M;
■ ρx · x ≡ ε;
■ ρx · (ξ(x) · ψ(x)) ≡ ρx · ξ(x) · 〈π1, ρx · ψ(x)〉;
■ ρx · 〈ψ(x), ξ(x)〉 ≡ 〈ρx · ψ(x), ρx · ξ(x)〉;
■ ρx · ψ(x)∗ ≡ (ρx · ψ(x) · α)∗, where α ≡ 〈π1 · π1, 〈π1 · π1, π2〉〉.

First, we show that, if φ(x) = ψ(x), then ρx · φ(x) = ρx · ψ(x).

C-monoids V

Proof. (ii)
It suffices to prove the fact for the C-monoid axioms, since = is the smallest congruence relation satisfying them. But, if A = B is a C-monoid axiom, it is easy to check that ρx · A = ρx · B.
Also, ρx · φ(x) · 〈(x · π2)∗, 1〉 = φ(x) by direct calculation. So, f = ρx · φ(x) satisfies the statement of the Theorem.
Finally, suppose f · 〈(x · π2)∗, 1〉 = φ(x) for some constant f of M; then ρx · φ(x) = ρx · (f · 〈(x · π2)∗, 1〉) = f · π2 · 〈(x · π2)∗, 1〉 = f · 1 = f. So f = ρx · φ(x) is unique, as required.


C-monoids VI

Definition 23.5 (Application and abstraction)
The application of g to a, elements of a C-monoid M, is defined as g • a ≡ ε · 〈g · (α · π2)∗, 1〉. The abstraction of a variable x from a polynomial φ(x) over M is defined as λx. φ(x) ≡ (ρx · φ(x))∗.

Corollary 23.6
If φ(x) is a polynomial in the variable x over a C-monoid M, then there exists a unique g in M such that g • x = φ(x).

Proof. Take g = λx. φ(x).

λ-calculus I

Definition 23.7 (Terms)
Given a denumerable set V of variables, and a possibly empty set K of constants, the set of terms is freely generated by the following rules:
■ if x ∈ V then x is a term and FV(x) = { x };
■ if k ∈ K then k is a term and FV(k) = ∅;
■ if f and t are terms, then so is (f t) and FV(f t) = FV(f) ∪ FV(t);
■ if a and b are terms, then so is 〈a, b〉 and FV(〈a, b〉) = FV(a) ∪ FV(b);
■ if t is a term, then so are fst(t) and snd(t), and FV(fst(t)) = FV(snd(t)) = FV(t);
■ if t is a term and x ∈ V, then (λx. t) is a term and FV(λx. t) = FV(t) \ { x }.

λ-calculus II

Definition 23.8 (Equality)
Equality between terms is defined as usual, as a congruence relation between terms, i.e., an equivalence relation which is stable under substitution of equals for equals, and which extends α-equivalence, β-equality, η-equality and the usual rules for product and projections.

An important note: the product is definable inside the λ-calculus, as we have seen, but, as proved by H.P. Barendregt, it is impossible to give a definition which satisfies surjective pairing, that is, 〈fst(t), snd(t)〉 = t. Since surjective pairing is a natural property when working with products, it is usually incorporated in the definition of terms.

λ-calculus III

Lemma 23.9
Every C-monoid M gives rise to a λ-calculus L(M).

Proof. The constants of L(M) are the elements of M, variables are chosen from a denumerable set V, and the terms of L(M) are interpreted as follows:
■ fst(t) ≡ π1 · t and snd(t) ≡ π2 · t;
■ 〈a, b〉 ≡ 〈λx. a, λx. b〉 • 1, where x ∉ FV(a) ∪ FV(b);
■ application and abstraction are defined as before.
Finally, a = b holds in L(M) iff a = b holds in M[FV(a) ∪ FV(b)]. It is immediate to show that L(M) is a λ-calculus from the properties of C-monoids.

λ-calculus IV

Lemma 23.10
Every λ-calculus L gives rise to a C-monoid M(L ).

Proof. (i)
The elements of M(L ) are equivalence classes of equivalent closed terms of L , and
■ 1 ≡ λx . x;
■ g · f ≡ λx . g (f x);
■ π1 ≡ λx . fst(x) and π2 ≡ λx . snd(x);
■ 〈f , g〉 ≡ λx . 〈f x , g x〉;
■ ε ≡ λz . fst(z) snd(z);
■ h* ≡ λx y . h 〈x , y〉.

Moreover, a = b in M(L ) iff a = b holds in L .

467 of 472
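The defining clauses of Lemma 23.10 can be read over ordinary functions on pairs rather than equivalence classes of closed terms. The sketch below does so with Python closures: an illustrative set-theoretic model, not the syntactic construction of the proof, with ε taken as evaluation and all names ours:

```python
one     = lambda x: x                               # 1 ≡ λx . x
compose = lambda g, f: lambda x: g(f(x))            # g · f ≡ λx . g (f x)
pi1     = lambda x: x[0]                            # π1 ≡ λx . fst(x)
pi2     = lambda x: x[1]                            # π2 ≡ λx . snd(x)
fork    = lambda f, g: lambda x: (f(x), g(x))       # ⟨f, g⟩ ≡ λx . ⟨f x, g x⟩
eps     = lambda z: z[0](z[1])                      # ε applies fst(z) to snd(z)
curry   = lambda h: lambda x: lambda y: h((x, y))   # h* ≡ λx y . h ⟨x, y⟩

# Unit and projection laws, checked pointwise on a sample:
double = lambda n: 2 * n
assert compose(one, double)(21) == compose(double, one)(21) == 42
assert pi1(fork(double, one)(5)) == double(5)
assert pi2(fork(double, one)(5)) == one(5)
```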


λ-calculus V

Proof. (ii)
The axioms for a C-monoid are easily checked. For example:

ε · 〈h* · π1 , π2〉 · x = ε · 〈h* · π1 · x , π2 · x〉 = ε · 〈h* · fst(x), snd(x)〉 = (h* · fst(x)) · snd(x) = h · 〈fst(x), snd(x)〉 = h · x,

so ε · 〈h* · π1 , π2〉 = h, using surjective pairing in the last step. The other axioms are left as exercises.

468 of 472
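The axiom checked in the proof above, ε · ⟨h* · π1 , π2⟩ = h, can likewise be verified pointwise in the functional reading (same illustrative conventions as before; the snippet is self-contained and all names are ours):

```python
compose = lambda g, f: lambda x: g(f(x))            # g · f
pi1, pi2 = (lambda x: x[0]), (lambda x: x[1])       # the two projections
fork  = lambda f, g: lambda x: (f(x), g(x))         # ⟨f, g⟩
eps   = lambda z: z[0](z[1])                        # evaluation: ε⟨f, a⟩ = f a
curry = lambda h: lambda x: lambda y: h((x, y))     # h*

h   = lambda p: 10 * p[0] + p[1]                    # an arbitrary map on pairs
lhs = compose(eps, fork(compose(curry(h), pi1), pi2))

# ε · ⟨h* · π1 , π2⟩ agrees with h on every sampled pair
assert all(lhs((a, b)) == h((a, b)) for a in range(5) for b in range(5))
```

Unfolding `lhs((a, b))` reproduces the equational chain of the proof step by step: fork the curried map with the second projection, then evaluate.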

λ-calculus VI

Theorem 23.11
The maps M and L of the previous lemmas establish a one-to-one correspondence between C-monoids M and λ-calculi L : (M ◦ L)(M ) = M and (L ◦ M)(L ) = L . [Proof not required]

469 of 472

λ-calculus VII

Definition 23.12 (CMon)
The category CMon has C-monoids as objects and C-homomorphisms as arrows. A C-homomorphism is a monoid homomorphism preserving the C-monoid structure.

Definition 23.13 (λ-Calc)
The category λ-Calc has λ-calculi as objects and translations as arrows. A translation is a mapping sending variables to variables and closed terms to closed terms, which preserves the term-forming operations and the axioms.

Corollary 23.14
The category CMon is isomorphic to the category λ-Calc. [Proof not required]

470 of 472

References and Hints

The material of this lecture is taken from [Lambek], Chapters I.15 and I.17. In that book, the interested reader may find the omitted proofs.

471 of 472

Conclusion

This lecture is the last one of the course. Students who have found the subjects developed in this course interesting may consider writing their dissertation on them: the lecturer is doing active research on some of these themes. If you simply want to deepen your knowledge of the subject, ask the lecturer: he can point you to books, articles and other references matching your interests. If you would like to develop your dissertation abroad, in another European university, on these themes, ask the lecturer: he has direct contacts with researchers in the centres where these subjects are studied.

The End 472 of 472
