Monday, March 21, 2016

Agda is not a purely functional language

One of the more surprising things I've learned as a lecturer is the importance of telling lies to students. Basically, the real story regarding nearly everything in computer science is diabolically complicated, and if we try to tell the whole truth up front, our students will get too confused to learn anything.

So what we have to do is to start by telling a simple story which is true in broad outline, but which may be false in some particulars. Then, when they understand the simple story, we can point out the places where the story breaks down, and then use that to tell a slightly less-false story, and so on and so on, until they have a collection of models, which range from the simple-but-inaccurate to the complex-but-accurate. Then our students can use the simplest model they can get away with to solve their problem.

The same thing happens when teaching physics, where we start with Newtonian mechanics, and then refine it with special relativity and quantum mechanics, and then refine it further (in two incompatible ways!) with general relativity and quantum field theories. The art of problem solving is to pick the least complicated model that will solve the problem.

But despite how essential lies are, it's always a risk that our students discover that we are telling lies too early -- because their understanding is fragile, they don't have a sense of the zone of applicability of the story, and so an early failure can erode the confidence they need to try tackling problems. Happily, though, one advantage that computer science has over physics is that computers are more programmable than reality: we can build artificial worlds which actually do exactly obey the rules we lay out.

This is why my colleague Martín Escardó says that when teaching beginners, he prefers teaching Agda to teaching Haskell -- he has to tell fewer lies. The simple story that we want to start with when teaching functional programming is that data types are sets, and computer functions are mathematical functions.

In the presence of general recursion, though, this is a lie. Martín has done a lot of work on more accurate stories, such as data types are topological spaces and functions are continuous maps, but this is not what we want to tell someone who has never written a computer program before (unless they happen to have a PhD in mathematics)!

Agda's termination checking means that every definable function is total, and so in Agda it's much closer to true that types are sets and functions are functions. But it's not quite true!

Here's a little example, as an Agda program. First, let's get the mandatory bureaucracy out of the way -- we're defining the program name, and importing the boolean and identity type libraries.

module wat where 

open import Data.Bool
open import Relation.Binary.PropositionalEquality using (_≡_; refl)

Next, let's define the identity function on booleans.

f : Bool  Bool
f true = true
f false = false

In fact, let's define it a second time!

g : Bool  Bool
g true = true
g false = false

So now we have two definitions of the identity function f, and g, so they must be equal, right? Let's see if we can write a proof this fact.

wat : f  g 
wat = refl

Oops! We see that refl is in red -- Agda doesn't think it's a proof. In other words, two functions which are identical except for their name are not considered equal -- α-equivalence is not a program equivalence as far as Agda is concerned!

This is somewhat baffling, but does have an explanation. Basically, an idea that was positively useful in non-dependent functional languages has turned toxic in Agda. Namely, type definitions in ML and Haskell are generative -- two declarations of the same type will create two fresh names which cannot be substituted for each other. For example, in Ocaml the following two type definitions

type foo = A | B 
type bar = A | B 

are not interchangeable -- Ocaml will complain if you try to use a foo where a bar is expected:

OCaml version 4.02.3

let x : foo = A 

let bar_id (b : bar) = b

let _ = bar_id x;;

Characters 96-97:
  let _ = bar_id x;;
Error: This expression has type foo but an
        expression was expected of type bar

In other words, type definitions have an effect -- they create fresh type names. Haskell alludes to this fact with the newtype keyword, and in fact that side-effect is essential to its design: the type class mechanism permits overloading by type, and the generativity of type declarations is the key feature that lets users define custom overloads for their application-specific types.

Basically, Agda inherited this semantics of definitions from Haskell and ML. Unfortunately, this creates problems for Agda, by complicating its equality story. So we can't quite say that Agda programs are sets and functions. We could say that they are sets and functions internal to the Schanuel topos, but (a) that's something we may not want to hit students with right off the bat, and (b) Agda doesn't actually include the axioms that would let you talk about this fact internally. (In fact, Andy Pitts tells me that the right formulation of syntax for nominal dependent types is still an open research question.)

Thursday, March 17, 2016

Datafun: A Functional Datalog

Together with Michael Arntzenius, I have a new draft paper, Datafun: a Functional Datalog
Datalog may be considered either an unusually powerful query language or a carefully limited logic programming language. It has been applied successfully in a wide variety of problem domains thanks to its "sweet spot" combination of expressivity, optimizability, and declarativeness. However, most use-cases require extending Datalog in an application-specific manner. In this paper we define Datafun, an analogue of Datalog supporting higher-order functional programming. The key idea is to track monotonicity via types.

I've always liked domain specific languages, but have never perpetrated one before. Thanks to Michael, now I have! Even better, it's a higher-order version of Datalog, which is the language behind some of my favorite applications, such as John Whaley and Monica Lam's BDDBDDB tool for writing source code analyses.

You can find the Github repo here, as well. Michael decided to implement it in Racket, which I had not looked at closely in several years. It's quite nice how little code it took to implement everything!