Making Complete Programs

So far, we have technically only written Haskell modules, and not complete Haskell programs. This is fine for prototyping or developing using ghci, but if you want to create a final product which is fully compiled with ghc you need to deal with input and output.

In Haskell, as in most languages, I/O programming is different from ordinary programming, and not easy for beginners. Also, since Haskell is often used for prototyping, I/O can be an unwelcome distraction from the main thread of development, and it isn't necessary for absorbing the essential ideas about declarative programming.

We are not going to describe Haskell's I/O facilities here. If you need I/O, you can pick it up from a book. Instead, we are just going to make a few comments for your interest, and provide a small collection of examples.

Modules

A Haskell program consists of a number of modules, each of which is usually in its own file with a matching name (e.g. module M is in file M.hs). As with other languages, this matching of the names is important so that tools can automatically find the modules they need.

A complete Haskell program must have one module called Main. If you have no module declaration at the top of a file, the default is to assume that the module is called Main. Tools don't usually need to search for a Main module, so it is quite common amongst Haskell programmers to have a Main module inside a file which is not called Main.hs.

If you are using ghci, this allows you to have several complete Haskell programs in one directory. However, if you are using ghc, you can't have several programs in one directory because they will all want to create files called Main.hi and Main.o. In my opinion, this was a bad choice in the Haskell design. (Things are slightly better with version 6.2 of ghc, which has a -main-is flag to say which module and which function to use to start execution.)

If you just want to use ghc for efficiency rather than for production programming, you can still avoid I/O. You can compile just the modules where efficiency is a problem, and continue using ghci to run them, because ghci is capable of handling a mixture of compiled and interpreted modules.

The main function

Inside the Main module, a complete Haskell program must have a function called main where execution begins. This is similar to other languages, and is a relatively harmless convention.
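As a minimal sketch of such a program (the message text is an arbitrary choice, not taken from these pages), a Main module needs little more than a main function of type IO ():

```haskell
module Main where

-- The text to print, kept as an ordinary value.
message :: String
message = "Hello, world"

-- Execution of the compiled program begins here.
main :: IO ()
main = putStrLn message
```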

However, it leads to an irritating feature of ghci, which is that if it loads a module which appears to be a Main module, and it does not find main, it complains.

To deal with this, we have suggested in these pages that each Haskell file should start with a module declaration. As well as keeping ghci happy by ensuring that the module is proper standard Haskell, this also allows the module to be tested (e.g. for automatic marking of assignments) by adding a separate Main.hs file which imports it and carries out tests.
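A separate test file of this kind might look roughly like the sketch below. To keep it self-contained, the standard Data.List module stands in for the module under test; in practice the import would name your own module. The names tests and report are hypothetical helpers introduced only for this illustration.

```haskell
module Main where

-- Data.List stands in for the module under test; normally this line
-- would import your own module instead.
import Data.List (sort)

-- Each test pairs a description with a Boolean check.
tests :: [(String, Bool)]
tests =
  [ ("sorts numbers", sort [3, 1, 2 :: Int] == [1, 2, 3])
  , ("empty list",    sort ([] :: [Int]) == [])
  , ("sorts chars",   sort "hask" == "ahks")
  ]

-- Render one test result as a line of text.
report :: (String, Bool) -> String
report (name, ok) = (if ok then "PASS " else "FAIL ") ++ name

-- Print one line per test.
main :: IO ()
main = mapM_ (putStrLn . report) tests
```

The checks themselves are ordinary pure expressions; only the final printing involves I/O.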

Functional Input and Output

Input and output appear at first sight not to fit the functional view of the world, where there is no sequencing, assignment or state. In fact, they can be made to fit. Haskell has a sophisticated system of input and output which is purely functional. There are also graphics, multimedia and networking libraries (though these are not part of the Haskell standard).

What is more, various language features make some I/O tasks quite painless. For example, all lists in Haskell, including strings, are lazily evaluated. This means that if you want to handle a stream of characters coming from a keyboard or other device, it just has type String, whereas in other languages you might have to decide between a raw stream, a buffered stream, an array, a linked list, or whatever.
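As a small illustration of treating an input stream as a String, the sketch below uses the standard interact function, which hands the whole of the standard input to a pure function as a lazy String and writes out the String it returns. The function name shout is a hypothetical choice for this example, and how promptly output appears when typing at a keyboard depends on buffering.

```haskell
module Main where

import Data.Char (toUpper)

-- Pure logic: the whole input is one (lazy) String, and so is the output.
shout :: String -> String
shout = unlines . map (map toUpper) . lines

-- interact passes the standard input to the function as a lazy String
-- and writes the resulting String to the standard output.
main :: IO ()
main = interact shout
```

Because the string is lazy, the program does not need to read all of the input before it starts producing output.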

However, to match current operating systems efficiently, Haskell ensures that I/O operations are single threaded. What does that mean? Well, suppose that fs0 represents the state of the file system at the beginning of a program. You could make a definition fs1 = f1 fs0 which applies a function to change the state, e.g. deleting a file. That's OK. But what if you also defined fs2 = f2 fs0, which made some other change to the initial state (not the altered state fs1)? Now you have a real problem, because you have to represent two file system states at the same time, and there is only one actual file system. This is thought of as having two threads of change, each starting with fs0.
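The problem can be seen in a toy model, sketched below under the assumption that a file system is nothing more than a list of file names. The type FileSystem and the file names are invented for this illustration; this is not how Haskell represents real files.

```haskell
module Main where

-- A toy model: the "file system" is just a list of file names.
type FileSystem = [String]

fs0 :: FileSystem
fs0 = ["notes.txt", "todo.txt"]

-- f1 deletes a file; f2 creates one. Both are ordinary pure functions.
f1, f2 :: FileSystem -> FileSystem
f1 = filter (/= "todo.txt")
f2 = ("draft.txt" :)

-- Two threads of change, both starting from fs0.
fs1, fs2 :: FileSystem
fs1 = f1 fs0
fs2 = f2 fs0

-- In the model, all three states can be inspected side by side.
main :: IO ()
main = do
  print fs0
  print fs1
  print fs2
```

In the model, fs0, fs1 and fs2 all exist at once without difficulty. A real disk can only be in one of those states, which is why unrestricted access of this kind cannot be offered for the actual file system.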

Haskell's solution to this is to avoid direct access to the file system, but instead have an abstract data type representing operations on the file system. As well as primitive operations which affect files, a restricted set of combining functions is provided. These only allow you to combine operations sequentially, so the file system is changed in a single threaded way. This very effectively keeps I/O purely functional.
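In standard Haskell the abstract type is called IO, and the two basic combining functions are the operators >> and >>=. The sketch below glues three primitive operations into one; it uses the terminal rather than files to stay small, and the function name reply is a hypothetical choice for this example.

```haskell
module Main where

-- Pure logic, kept separate from the I/O.
reply :: String -> String
reply name = "Hello " ++ name

-- Three primitive operations combined into one by the sequencing operators:
-- (>>) performs one operation after another, and (>>=) also passes the
-- result of the first operation on to the second.
main :: IO ()
main = putStrLn "Name?" >> getLine >>= (putStrLn . reply)
```

There is no way to write an expression here that branches into two rival futures from the same starting state: the combinators only ever produce a single chain of operations.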

The problem with this is that there is now what amounts to a separate sub-language of combining functions which you have to learn to do I/O in Haskell. If you are going to be using Haskell's I/O for substantial production programs, it is worth learning this sub-language, but otherwise it probably isn't.

My own opinion on this issue is that the I/O facilities in Haskell are no harder to learn than the special I/O libraries in other languages, which often also amount to an I/O sub-language with a steep learning curve. However, Haskell is very often used casually at infrequent intervals, rather than on an everyday basis, and that makes it very unfortunate that the separate I/O sub-language uses different conventions from "normal" Haskell programming. Part of the reason for this situation is that current operating systems have a procedural design (and a poor and over-complex one at that, for historical reasons). If operating systems were redesigned to be more declarative-language-friendly, or (more realistically) more effort were put into implementing declarative-language-friendly interfaces to existing operating systems, I believe the situation could be improved dramatically.

Examples

The trick in I/O programming is not to mix I/O with the logic of the program (which is good advice in any language), and not to mix the I/O sub-language programming style with the normal programming style. In each I/O function, the right hand side starts with the do keyword, which in these examples is used only for I/O. Then there is a sequence of actions, one per line. Output actions are just calls. Input actions use the <- symbol to give a name to the item read in; again, in these examples this is used only for I/O.
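The shape of such an I/O function can be sketched as follows. This is an illustration written for this page rather than one of the example files, and the function name describe is a hypothetical choice.

```haskell
module Main where

-- Pure logic: no I/O here, just ordinary Haskell.
describe :: String -> String
describe line = show (length (words line)) ++ " word(s)"

-- I/O: a sequence of actions, one per line.
main :: IO ()
main = do
  putStrLn "Type a line:"       -- output action: just a call
  line <- getLine               -- input action: <- names the item read in
  putStrLn (describe line)      -- output again, using the pure function
```

All of the interesting work is done by describe, which can be tested in ghci without running any I/O at all.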

Add.hs Output only (a program which solves a fixed problem)
Greet.hs Input from command line (a non-interactive program with a few simple parameters)
Lines.hs Input from a file (a non-interactive program with larger amounts of input)
Echo.hs Input typed by user (an interactive line-by-line program, with no state)
Total.hs Input typed by user (an interactive line-by-line program, with state)

Taking Greet.hs as an example, you can compile it and run it with ghc like this:

> ghc Greet.hs -o greet
> greet Ian Holyer
Hello Ian Holyer
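
The source of Greet.hs is not reproduced on this page. A minimal sketch of what such a program might look like is given below, assuming that greet is a pure function taking the list of command line words; the actual file may well differ. getArgs comes from the standard System.Environment module.

```haskell
module Main where

import System.Environment (getArgs)

-- Pure logic: build the greeting from the command line words.
-- (Assumed signature; the real Greet.hs may define greet differently.)
greet :: [String] -> String
greet names = unwords ("Hello" : names)

-- I/O: fetch the arguments, then write the greeting.
main :: IO ()
main = do
  args <- getArgs
  putStrLn (greet args)
```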

You can also run it with ghci using the :set args command to simulate command line arguments, and then explicitly executing main:

> ghci Greet.hs
*Main> :set args Ian Holyer
*Main> main
Hello Ian Holyer

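You may notice that this is different from evaluating an expression. For example, assuming that greet takes the list of argument strings (the exact call depends on how Greet.hs defines it):

*Main> greet ["Ian", "Holyer"]
"Hello Ian Holyer"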

In this case, there are quotes round the result. This is because you are not running the program, so much as debugging it. You are asking for the result of calling the greet function, and that result is a string, and a string presented in programmer's source text form has quotes round it. In the previous execution, we were asking for an I/O operation to be performed. That I/O operation was asked to take the characters of the string one by one and explicitly write them to the screen, so there were no quotes in that case.
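
The difference can be reproduced inside a program. ghci displays the result of an expression using the standard show function, which gives the source text form, whereas putStrLn writes the characters themselves. The sketch below, with a hypothetical value called greeting, prints the string both ways.

```haskell
module Main where

greeting :: String
greeting = "Hello Ian Holyer"

main :: IO ()
main = do
  putStrLn greeting         -- writes the characters: Hello Ian Holyer
  putStrLn (show greeting)  -- writes the source form: "Hello Ian Holyer"
  print greeting            -- print is putStrLn . show, so also quoted
```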