Part I

File Input and Output

When we design Scheme functions, we develop small examples and test the functions with these examples. Typically, we apply the functions directly to the data, or we add some test expressions to the bottom of the Definitions window. In many cases, however, the goal of developing a function is not to process small pieces of data but to have users enter data and possibly large pieces of data. Generally these users don't understand Scheme (or any other language) and just want to write down their data and run a program. In short, no program is an island; useful programs require extensions for entering data and for presenting data.

In this part, we discuss several methods for programs to read and print data or, technically speaking, file input and output. The first section concerns reading and printing S-expressions from files, including the text interface to a program. An external S-expression is a parenthesized form of data for which Scheme has a standard, natural translation into nested lists. This form of representing and entering data is as old as Lisp (1958) and is clearly the most effective method for reading and printing data. The second section presents reading and printing XML -- a modern form of parenthesized data. The third section covers reading and printing plain strings. The final section is about binary input and output.

1  Input and Output: S-expressions

Roughly speaking, an S-expression is a parenthesized collection of information.2 The atomic pieces of information are numbers, symbols, and strings. Placing a pair of parentheses around any number of S-expressions forms another S-expression. Here are three examples of atomic S-expressions:

a-symbol      10.20      "hello world"

The first corresponds to a symbol, the second is a number, and the third is a Scheme string. Here are three examples of parenthesized S-expressions:

(world hello)      ()      (life ((is) 1) "mess")

The first represents a sequence of symbols in parentheses. The second one is a pair of parentheses around the empty sequence of S-expressions. The third one consists of three S-expressions, including a string; its second component is a complex S-expression that contains a number.

Every S-expression naturally corresponds to a piece of Scheme data. Here are the Scheme data for the six examples of S-expressions above:

S-expression               Scheme data
a-symbol                   'a-symbol
10.20                      10.20
"hello world"              "hello world"
(world hello)              (list 'world 'hello)
()                         (list)
(life ((is) 1) "mess")     (list 'life (list (list 'is) 1) "mess")

In general, each atomic sequence of keyboard characters that is neither a number nor a string is represented by a quoted symbol, and every other atomic S-expression is represented by itself. To translate a parenthesized S-expression into Scheme, we add list to the right of ``('' and then translate the rest.
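The translation can be checked mechanically. The following is a minimal sketch, assuming a Scheme of the PLT family (written here for Racket) in which open-input-string turns a string into a source of characters for read. The helper read-from-string is our own name for illustration; it is not part of the text's program.

```scheme
#lang racket
;; Sketch: read an external S-expression from a string instead of the
;; keyboard and compare it with the list built by hand.
;; ASSUMPTION: open-input-string is available (it is in Racket).

;; read-from-string : str -> internal-S-expression   (hypothetical helper)
;; to translate the first S-expression in s into Scheme data
(define (read-from-string s)
  (read (open-input-string s)))

(define from-text (read-from-string "(life ((is) 1) \"mess\")"))
(define by-hand   (list 'life (list (list 'is) 1) "mess"))

(equal? from-text by-hand)                ; both routes yield the same value
(equal? (read-from-string "()") (list))   ; () becomes the empty list
(symbol? (read-from-string "a-symbol"))   ; bare characters become a symbol
```

All three expressions at the end should evaluate to true.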

Scheme's simplest form of input and output reads from the keyboard and prints S-expressions to the screen. We discuss this form of reading and writing data in the first subsection. The second subsection explains how to replace these default devices for reading and printing data with files. In the third subsection, we take a closer look at the relationship between Scheme programs and a computer's files.

1.1  Basic Input and Output

Help Desk: input, output, read, write, display, newline

Let's take a look at the toy example of computing the average of a non-empty list of numbers. Figure 1 contains the definitions that an expert (HtDP: Part IV) or a beginner (HtDP: Part II) may design. That is, the expert version relies on two of Scheme's numerous built-in ``loops,'' which are really functions. The beginner version defines everything from scratch. The contract for both specifies that the function consumes a non-empty list and produces a number. The purpose statement specifies what the function computes.

;; average : (cons num (listof num)) -> num
;; to compute the average of a non-empty list of numbers

;; The expert version
(define (average alon)
  (/ (apply + alon)
     (length alon)))

;; The beginner version
(define (average alon)
  (/ (sum alon)
     (how-many alon)))

;; sum : (listof num) -> num
(define (sum alon)
  (cond
    [(empty? alon) 0]
    [else (+ (first alon) (sum (rest alon)))]))

;; how-many : (listof num) -> num
(define (how-many alon)
  (cond
    [(empty? alon) 0]
    [else (+ (how-many (rest alon)) 1)]))

Figure 1:  Computing the average of a list of numbers

If we wish to use the function, we apply it to a list of numbers in DrScheme's Interactions window:

> (average (list 1 2 3))
2

Others may use the program, too, as long as they know what it means to apply a function and to form a list of numbers in Scheme. Clearly, this severely limits the audience for our program.

To get around this limitation, we must add Scheme code that reads the sequence of numbers from the keyboard and prints the result. That is, we need to add a function that reads a list of numbers, that applies average to that sequence, and that prints the result. Here is the simplest such function:

;; main : -> void
;; to read an S-expression of the shape (num ... num), and
;; to write its average on a new line
(define (main)
  (write (average (read))))

It composes read with average and write. The first and the last function are Scheme's built-in input and output functions; they read and print S-expressions in the expected manner. These two functions, together with display and newline, are specified in figure 2.

;; A first data definition for internal-S-expression:
internal-S-expression = 
 (union bool
        num
        str
        (listof internal-S-expression))

;; read : -> internal-S-expression
;; to read an S-expression typed at the keyboard and
;; to produce it in its standard Scheme representation

;; write : internal-S-expression -> void
;; to print a Scheme value as an S-expression on a single line

;; display : internal-S-expression -> void
;; to print a Scheme value as an S-expression on a single line,
;; stripping a string's quotes

;; newline : -> void
;; to end a line of printed matter

Figure 2:  Scheme's basic functions for input and output

If we now wish to use the average function, we type (main) in the Interactions window and then type in a parenthesized S-expression that consists of numbers:

> (main)
(1 2 3)
2

Here the function responds by printing 2, the average of 1, 2, and 3. More concretely, the application (read) intercepts what we type at the keyboard -- the S-expression (1 2 3) -- and converts it into (list 1 2 3) -- the natural Scheme representation for the S-expression. Then main applies average to this list, which produces 2. Finally, write is applied to 2 and, in turn, prints 2 to the screen. Figure 3 contains a screen dump of the action.
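The three steps can also be observed one at a time. The sketch below assumes a Racket-style Scheme with open-input-string, which stands in for the keyboard; the names the-input and the-average are introduced only for this illustration.

```scheme
#lang racket
;; Sketch: the three steps that (main) performs, spelled out separately.
;; ASSUMPTION: open-input-string (available in Racket) plays the role
;; of the keyboard, so that the example runs without user interaction.

;; average : (cons num (listof num)) -> num
(define (average alon)
  (/ (apply + alon)
     (length alon)))

;; Step 1: read converts the typed characters into a list.
(define the-input (read (open-input-string "(1 2 3)")))

;; Step 2: average consumes the list.
(define the-average (average the-input))

;; Step 3: write prints the result.
(write the-average)
(newline)
```

Here the-input is (list 1 2 3), the-average is 2, and the last step prints 2.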

Figure 3:  Input and output in DrScheme's Interactions window

Although using main is slightly simpler for a user without DrScheme knowledge, it is still obscure because the user must also know that the program is started with (main) and that once it is started it immediately waits. It is much more polite to prompt the user with a short phrase such as ``Enter a list of numbers.'' Here is a variant of main that achieves just this effect:

;; main-with-prompt-draft : -> void
;; to prompt the user for an S-expression of the shape (num ... num),
;; to read the expression,
;; and to print its average on a new line
(define (main-with-prompt-draft)
  (begin
    (write "Enter a list of numbers: ")
    (write (average (read)))
    (newline)))

When we now use main-with-prompt-draft in the Interactions window, we get the following kind of dialog:

> (main-with-prompt-draft)
"Enter a list of numbers: " (1 2 3)
2

That is, the function prints the short phrase between the doublequotes, and then waits for the user to enter a parenthesized S-expression on the same line. Once the user hits Enter, the program reads the list of numbers, computes the average, prints it, and terminates the line with (newline).

While main-with-prompt-draft is an improvement over main, the doublequotes around the prompt string are unappealing. We can avoid them by using display instead of write:

;; main-with-prompt : -> void
;; to prompt the user for an S-expression of the shape (num ... num),
;; to read the expression,
;; and to print its average on a new line
(define (main-with-prompt)
  (begin
    (display "Enter a list of numbers: ")
    (write (average (read)))
    (newline)))

Figure 3 also contains the screen dump for an interaction with this function. It shows that display doesn't print doublequotes around strings, which is what we want for ordinary user interactions. For the second print effect, namely the printing of the average, we can use write or display because the effect is to print a number and the two functions print numbers in the same way.
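The difference between write and display can be made visible by collecting the printed characters in a string. This is a small sketch, assuming string ports (open-output-string and get-output-string, as found in Racket); printed-by is a hypothetical helper of our own.

```scheme
#lang racket
;; Sketch: compare what write and display print for the same value.
;; ASSUMPTION: string ports (open-output-string, get-output-string)
;; are available, and write/display accept a port as second argument.

;; printed-by : (X port -> void) X -> str   (hypothetical helper)
;; to collect the characters that printer produces for v
(define (printed-by printer v)
  (let ([out (open-output-string)])
    (printer v out)
    (get-output-string out)))

(printed-by write   "Enter a list of numbers: ")     ; keeps the double quotes
(printed-by display "Enter a list of numbers: ")     ; drops the double quotes
(equal? (printed-by write 2) (printed-by display 2)) ; numbers print alike
```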

At this point, the user must still know how to start a DrScheme program. Ideally, the user should be able to walk up to a computer and just use it as an ``average computing machine.'' That is, someone should start a program once and that program should compute averages until the user gets tired of it. Here is such a program:

;; main-forever : -> void
;; to prompt the user for an S-expression of the shape (num ... num)
;; continuously, to read the expression,
;; and to print its average on a new line
(define (main-forever)
  (begin
    (display "Enter a list of numbers: ")
    (let ([the-input (read)])
      (cond
        [(eq? 'x the-input)
         (begin
           (display "Good bye.")
           (newline))]
        [else
         (begin
           (display (average the-input))
           (newline)
           (main-forever))]))))

Following a rather common convention, the program prompts the user for data and deals with two different kinds of user inputs: the symbol 'x and a list of numbers. If it reads 'x, the program stops -- or exits. Otherwise it computes an average and starts over. The last screen dump in figure 3 illustrates how this function works.
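To watch the loop run through a whole session without typing, we can script the user's input. The sketch below assumes Racket's with-input-from-string, which makes a string act as the keyboard for the duration of a call; the input text is made up for the illustration.

```scheme
#lang racket
;; Sketch: a scripted session with main-forever.
;; ASSUMPTION: with-input-from-string (from Racket's racket/port)
;; supplies the characters that a user would otherwise type.

;; average : (cons num (listof num)) -> num
(define (average alon)
  (/ (apply + alon)
     (length alon)))

;; main-forever : -> void
(define (main-forever)
  (begin
    (display "Enter a list of numbers: ")
    (let ([the-input (read)])
      (cond
        [(eq? 'x the-input)
         (begin
           (display "Good bye.")
           (newline))]
        [else
         (begin
           (display (average the-input))
           (newline)
           (main-forever))]))))

;; Two lists followed by x, as a user might type them.
(with-input-from-string "(1 2 3) (10 20) x" main-forever)
```

The session prints the prompt three times, followed by 2, by 15, and by Good bye. respectively. Because nothing is typed at a real keyboard, each answer appears directly after its prompt.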

By now, we have a pretty useful program. Ideally, we should also eliminate the need for the start-up user to know anything about DrScheme. Clicking on some icon or typing in ``turn-on-average-machine'' at some shell prompt should suffice. Naturally, we can convert Scheme programs into such stand-alone scripts; doing this is the topic of part IV.

Bad Inputs

According to the data definition in figure 2 and the contract for average in figure 1, (read) may produce many more values than average may consume. Indeed, the definition of main-forever exploits this fact. If a user types x at the keyboard, main-forever quits. This suggests that we write a checked version of average (HtDP: Part I):

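;; checked-average : internal-S-expression -> num
;; to compute the average of x if x is a non-empty list of numbers,
;; and to signal an error otherwise
(define (checked-average x)
  (cond
    [(and (list? x) (pair? x) (andmap number? x)) (average x)]
    [else (error 'average "expects a non-empty list of numbers")]))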

Now we can substitute checked-average for average in main, main-with-prompt, and main-forever. If the user types anything other than a parenthesized S-expression, or if the S-expression contains a non-number, the program will fail gracefully with an informative error message.

In the case of main and main-with-prompt a failure is just fine. The user tried to compute the average of a single list of numbers, failed to use the program properly, and the program stopped. If, however, we started the ``average-computing machine'' with (main-forever), a user who types

  (1 2 q 4 5 6 8 9 12)

by accident instead of

  (1 2 2 4 5 6 8 9 12)

would certainly just like to fix the inputs. Stopping the machine is a questionable thing. Instead, the program should print a message and prompt the user for another list.

;; main-forever2 : -> void
;; to prompt the user for an S-expression of the shape (num ... num)
;; continuously, to read the expression,
;; and to print its average on a new line;
;; a bad input produces a message and a new prompt
(define (main-forever2)
  (begin
    (display "Enter a list of numbers: ")
    (let ([the-input (read)])
      (cond
        [(eq? 'x the-input)
         (begin
           (display "Good bye.")
           (newline))]
        ;; (pair? the-input) rules out (), for which average is undefined
        [(and (list? the-input) (pair? the-input) (andmap number? the-input))
         (begin
           (display (average the-input))
           (newline)
           (main-forever2))]
        [else
         (begin
           (display "Expected a non-empty list of numbers. Given: ")
           (display the-input)
           (newline)
           (main-forever2))]))))

Figure 4:  Computing the averages of a list of numbers with protection

We can overcome the problem with main-forever in two ways. First, we can integrate the check from checked-average into the main function: see figure 4. Second, we can use checked-average in the revision of main-forever together with an exception handler, that is, a Scheme construct that deals with failures such as applications of error. We will discuss exception handling and its uses in part [***].
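As a preview of the second approach, here is a minimal sketch that assumes Racket's exception facilities (with-handlers, exn:fail?, and exn-message); the construct discussed later in the text may look different. The function safe-average is a hypothetical name, and the version of checked-average used here also rejects the empty list.

```scheme
#lang racket
;; Sketch: catching the failure signaled by checked-average.
;; ASSUMPTION: with-handlers, exn:fail?, and exn-message are Racket's
;; exception tools; they stand in for the handler the text alludes to.

;; average : (cons num (listof num)) -> num
(define (average alon)
  (/ (apply + alon)
     (length alon)))

;; checked-average : internal-S-expression -> num
;; (variant that also rejects the empty list)
(define (checked-average x)
  (cond
    [(and (list? x) (pair? x) (andmap number? x)) (average x)]
    [else (error 'average "expects a non-empty list of numbers")]))

;; safe-average : internal-S-expression -> (union num str)   (hypothetical)
;; to produce the average of x, or the error message if x is bad
(define (safe-average x)
  (with-handlers ([exn:fail? (lambda (e) (exn-message e))])
    (checked-average x)))

(safe-average (list 1 2 3))      ; a good input yields the average
(safe-average (list 1 2 'q 4))   ; a bad input yields a message, not a crash
```

A loop such as main-forever could display the message and prompt again instead of stopping.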

 

Now suppose you have become the head grader for Rice's famous Comp210 course. One of your chores is to produce grade-point averages for all of the students in class. To do that, you keep track of homework grades in a file and, since Scheme is the only language you know, the file has an S-expression format:

  ((Adam 78 88 69 66)
   (Brad 88 87 86 22)
   (Cath 99 88 88 90)
   (Dave 77 78 77 78)
   (Fawn 90 89 81 60)
   (Gege 67 78 81 85))

Assuming this expression was quoted (HtDP: Intermezzo 2), writing the program that produces the averages would be a straightforward exercise. Figure 5 contains the complete definition. There is only one problem left, namely, the grades are in a file and, before we can apply our function, we need to get the data from the file and convert them into an internal S-expression.

;; Data Definitions:
;; line = (cons sym (listof num))
;; name+average = (list sym num)
;; NOTE: the numbers are integers between 0 and 100.

;; compute-averages : (listof line) -> (listof name+average)
;; to compute the homework gpa for each item in lines
(define (compute-averages lines)
  (map compute-one-average lines))

;; compute-one-average : line -> name+average
;; to compute the homework gpa for a-line
(define (compute-one-average a-line)
  (list (first a-line) (average (rest a-line))))

;; average : (cons num (listof num)) -> num
;; to compute the average of a non-empty list of numbers
(define (average alon)
  (/ (apply + alon) 
     (length alon)))

Figure 5:  Computing the homework gpa for a course
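For instance, the functions of figure 5 can be applied to a quoted excerpt of the grade sheet. The sketch assumes Racket, where dividing exact integers produces an exact fraction; the name grades is introduced for illustration only.

```scheme
#lang racket
;; Sketch: applying compute-averages to quoted grade data.
;; ASSUMPTION: Racket; its exact arithmetic yields fractions such as
;; 301/4 where a teaching language would show 75.25.

;; average : (cons num (listof num)) -> num
(define (average alon)
  (/ (apply + alon)
     (length alon)))

;; compute-one-average : line -> name+average
(define (compute-one-average a-line)
  (list (first a-line) (average (rest a-line))))

;; compute-averages : (listof line) -> (listof name+average)
(define (compute-averages lines)
  (map compute-one-average lines))

;; An excerpt of the grade sheet, quoted so that it is data.
(define grades
  '((Adam 78 88 69 66)
    (Brad 88 87 86 22)))

(compute-averages grades)   ; one (name average) pair per line
(exact->inexact 301/4)      ; the decimal form of Adam's average
```

Adam's average is 301/4, which is the same number as 75.25.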

Fortunately, we can solve half the problem with read. Using read, we can write a program that reads in our S-expression of homework grades and can apply compute-averages to the result:

;; grader : -> void
;; to read an S-expression of the shape ((sym num ... num) ...),
;; and to print an S-expression of the shape ((sym num) ...)
;; the output shape has as many items as the input shape, the syms
;; are the same for the ith item, and the second item is the
;; average of the list of nums associated with the ith item
(define (grader)
  (display (compute-averages (read))))

And voilà, we have all the averages we need. To use grader, we evaluate (grader) in the Interactions window and paste the grade information into the read window.

While this solves most of our problem, it still leaves us with cutting and pasting the S-expression into DrScheme. If we manage a dozen students or so, that's fine. But, if we manage a section with 100 or 500 students, cutting and pasting won't do. We could lose or corrupt data. To overcome this problem, we need to know how to read data from files and that's the topic of the next section.


Exercises

Exercise 1.1.1.  


1.2  Redirecting Input and Output

Help Desk: input, output, with-input-from-file, with-output-to-file

Suppose we have a program that interacts with the user via the default devices. That is, it reads data from the keyboard and prints data to the screen. But suppose we actually want a program that reads from an existing file and prints its results to a new file. In that case we need to redirect the reading and printing actions of the program. Ideally, we should do that without changing the existing program and, indeed, we can do that in Scheme easily with the two functions in figure 6: with-input-from-file and with-output-to-file.3

;; with-input-from-file : str (-> X) -> X
;; to open the file f for reading
;; as if it were the default input device
;; during the evaluation of (thunk)
(define (with-input-from-file f thunk) 
  ...)

;; with-output-to-file : str (-> X) -> X
;; to open the file f for printing
;; as if it were the default output device
;; during the evaluation of (thunk)
(define (with-output-to-file f thunk)
  ...)

Figure 6:  Scheme's functions for redirecting input and output

Using the function with-input-from-file, we can avoid the cutting and pasting of the grade file into the Interactions window. Let's say the grades are in a file called "grades-for-2001s.dat" (in the current directory). Then we just evaluate

> (with-input-from-file "grades-for-2001s.dat" grader)
... 

That is, we apply with-input-from-file to a filename and grader. The latter is a function of no arguments, which is what with-input-from-file requires. This forces the read expression that is evaluated due to the evaluation of (grader) to read a parenthesized S-expression from the file "grades-for-2001s.dat". It still prints the results in the Interactions window, and we have to copy them into a file from there.
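Any function of no arguments will do as the second argument, including read itself. The sketch below assumes Racket; make-temporary-file and the #:exists keyword are Racket-specific and serve only to set up a scratch file, and the names grade-file and grades-read are our own.

```scheme
#lang racket
;; Sketch: read is a function of no arguments, so it can be handed to
;; with-input-from-file directly.
;; ASSUMPTIONS: Racket; make-temporary-file and #:exists are
;; Racket-specific and used only to prepare a scratch file.

(define grade-file (make-temporary-file))   ; an empty scratch file

;; Setup: put a small grade sheet into the scratch file.
;; #:exists 'truncate is needed because the scratch file already exists.
(with-output-to-file grade-file #:exists 'truncate
  (lambda ()
    (write '((Adam 78 88 69 66) (Brad 88 87 86 22)))))

;; During the call, (read) takes its characters from the file.
(define grades-read (with-input-from-file grade-file read))

grades-read                                 ; the list stored in the file
(delete-file grade-file)                    ; tidy up
```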

Of course, cutting and pasting data from the Interactions window to some file is as burdensome as copying some other data into the Interactions window. We should save our results to a new4 file. To overcome this last problem, we use with-output-to-file:

> (with-output-to-file "final-grades-for-2001s.dat"
    (lambda () 
      (with-input-from-file "grades-for-2001s.dat" grader)))
>

Recall that (lambda () ...) is a (nameless) procedure of no arguments, which is precisely what with-output-to-file consumes. The expression first creates a file named "final-grades-for-2001s.dat". From then on, the evaluation of a (display ...) or a (newline) expression affects that file. In the end, the file contains the following data:

   ((Adam 75.25) (Brad 70.75) (Cath 91.25) ...

That is, all of the results are on a single line. While putting everything on one line is acceptable if the file is read by other (Scheme) programs, it is bad for human consumption. Human beings are better off reading nicely indented S-expressions.
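That the one-line output is indeed fine for other Scheme programs can be checked by reading it back. The following sketch assumes Racket (make-temporary-file, #:exists); it also converts each average with exact->inexact so that decimals such as 75.25 are printed, which is a change of ours and not part of figure 5.

```scheme
#lang racket
;; Sketch: redirect both input and output, then read the result back.
;; ASSUMPTIONS: Racket; scratch files come from make-temporary-file;
;; exact->inexact is added so that averages print as decimals.

(define (average alon)
  (/ (apply + alon) (length alon)))

(define (compute-one-average a-line)
  (list (first a-line) (exact->inexact (average (rest a-line)))))

(define (compute-averages lines)
  (map compute-one-average lines))

;; grader : -> void
(define (grader)
  (display (compute-averages (read))))

(define in-file  (make-temporary-file))
(define out-file (make-temporary-file))

;; Setup: the grade sheet that grader is going to read.
(with-output-to-file in-file #:exists 'truncate
  (lambda () (write '((Adam 78 88 69 66) (Brad 88 87 86 22)))))

;; The nested redirection from the text, with scratch files.
(with-output-to-file out-file #:exists 'truncate
  (lambda ()
    (with-input-from-file in-file grader)))

;; Another program can recover the data with a single read.
(define results (with-input-from-file out-file read))
results

(delete-file in-file)
(delete-file out-file)
```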


Exercises

Exercise 1.2.1.   Modify grader so that it returns a thunk that, when applied to no arguments, performs the computation:

;; grader : -> (-> void)
;; to read an S-expression of the shape ((sym num ... num) ...),
;; ...
(define (grader) ...)

Then simplify the file-redirection expression. What does this suggest for programs that perform i/o operations? 

Exercise 1.2.2.   Define the function ia-grader, which consumes two strings. The first is the name of an input file that contains an S-expression with the homework grades. The second is the name of an output file, which doesn't exist yet. 

Exercise 1.2.3.   Define a function that protects grader from erroneous inputs. 

;; double : -> void
;; to read a parenthesized S-expression
;; and to print it with each item of the 'list' doubled
(define (double)
  (write 
   (apply append
          (map (lambda (x) (list x x)) (read)))))

;; count : -> number
;; to read a parenthesized S-expression
;; and to count the number of items in the 'list'
(define (count)
  (length (read)))

;; pipe : (-> Y) (-> X) -> (-> X)
;; to evaluate (f) and (g) such that the standard file output of (f)
;; is read as the standard input by (g); produce g's result
(define (pipe f g)
  (lambda ()
    (let ([tmp-file (string-append "tmp" (number->string (random 10000)))])
      ;; assume that the file named by tmp-file doesn't exist
      (with-output-to-file tmp-file f)
      (with-input-from-file tmp-file g))))

Figure 7:  Composing programs

Exercise 1.2.4.   Consider the program in figure 7. It contains two functions that process standard input files. The third function can compose such functions in such a way that the output of the first is the input of the second.

Use pipe to connect double and count.

Also design a simple test that relates the result of the connected functions to the result of (with-input-from-file "foo.ss" count).

Note: Composing input/output functions in this manner creates a temporary file. If this temporary file already exists, pipe fails. We can avoid this with additional file manipulation functions (see part V). Alternatively, we can avoid the creation of a temporary file altogether with string ports (see part II). 
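The string-port alternative mentioned in the note can be sketched as follows, assuming Racket's with-output-to-string and with-input-from-string. The name pipe/string and the two stand-in functions emit-squares and total are hypothetical; they are deliberately not the functions of figure 7.

```scheme
#lang racket
;; Sketch: composing two i/o functions without a temporary file.
;; ASSUMPTION: with-output-to-string and with-input-from-string
;; (from Racket's racket/port) are available.

;; pipe/string : (-> Y) (-> X) -> (-> X)   (hypothetical variant of pipe)
;; to evaluate (f) and (g) such that everything (f) prints
;; is read as the standard input by (g); produce g's result
(define (pipe/string f g)
  (lambda ()
    (let ([printed (with-output-to-string f)])
      (with-input-from-string printed g))))

;; emit-squares : -> void   (stand-in producer)
;; to print the list of the squares of 1, 2, and 3
(define (emit-squares)
  (write (map (lambda (n) (* n n)) (list 1 2 3))))

;; total : -> num   (stand-in consumer)
;; to read a list of numbers and to add them up
(define (total)
  (apply + (read)))

((pipe/string emit-squares total))   ; 1 + 4 + 9
```

The final expression produces 14. Since the intermediate text lives in a string, no file name has to be invented and nothing is left behind on disk.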


1.3  Pretty Printing

Help Desk: pretty print, pretty-print

Collection: pretty.ss

Printing S-expressions on a single line is only useful if the S-expressions are short. No human being can inspect a lot of data that is printed on a single line. Quite the opposite: we should always assume that a human being may have to read the output of a program.

Because human beings need ``pretty'' output, Scheme provides a library for pretty printing. The most useful function from this library is pretty-print, which consumes an S-expression and formats it in a manner that takes the line width into account. Thus, if we define grader using pretty-print:

;; grader : -> void
;; to read an S-expression of the shape ((sym num ... num) ...),
;; and to print an S-expression of the shape ((sym num) ...)
;; the output shape has as many items as the input shape, the syms
;; are the same for the ith item, and the second item is the
;; average of the list of nums associated with the ith item
(define (grader)
  (pretty-print (compute-averages (read))))

and if we then evaluate the expression

> (with-output-to-file "final-grades-for-2001s.dat"
    (lambda () 
      (with-input-from-file "grades-for-2001s.dat" grader)))
>

then the evaluation of this expression terminates without printing anything to the screen. Instead it prints the following S-expression to the file "final-grades-for-2001s.dat":

  ((Adam 75.25)
   (Brad 70.75)
   (Cath 91.25)
   (Dave 77.5)
   (Fawn 80)
   (Gege 77.75))
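The effect of the line width can be observed on a small scale. The sketch below assumes Racket's pretty-printing library: pretty-print-columns sets the line width, and pretty-write is used because Racket's own pretty-print may add a leading quote mark. The names averages and pretty-text are ours, and the width of 20 is chosen only to force line breaks.

```scheme
#lang racket
;; Sketch: the same data on one line and pretty printed.
;; ASSUMPTIONS: Racket's racket/pretty; pretty-print-columns sets the
;; line width; pretty-write prints in the style of write.

(define averages
  '((Adam 75.25) (Brad 70.75) (Cath 91.25)))

;; Everything on a single line:
(display averages)
(newline)

;; Pretty printed for a narrow, 20-column line:
(define pretty-text
  (with-output-to-string
    (lambda ()
      (parameterize ([pretty-print-columns 20])
        (pretty-write averages)))))

(display pretty-text)

;; The pretty-printed text still denotes the same S-expression.
(equal? (with-input-from-string pretty-text read) averages)
```

Because the data do not fit into 20 columns, the pretty-printed version is spread over several lines, while reading it back yields the original list.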


Exercises

Exercise 1.3.1.  



2 See Intermezzo 2 in ``How to Design Programs''.

3 The actual reading and writing is managed by the operating system, the layer of software that manages the connection between applications and the actual pieces of the computer. Fortunately, the designers of operating systems have arranged the connections between programs such as ours and devices such as the keyboard in a flexible way. More technically, these devices are dealt with as if they were files, and hence, it is possible to switch one file for another.

4 The function with-output-to-file creates a new file. If a file with the given name already exists in the working directory, it signals an error. We will learn to deal with this aspect of the problem in part V.