When we design Scheme functions, we develop small examples and test the functions with these examples. Typically, we apply the functions directly to the data, or we add some test expressions to the bottom of the Definitions window. In many cases, however, the goal of developing a function is not to process small pieces of data but to have users enter data, possibly large amounts of it. Generally these users don't understand Scheme (or any other language) and just want to write down their data and run a program. In short, no program is an island; useful programs require extensions for entering data and for presenting data.
In this part, we discuss several methods for programs to read and print data or, technically speaking, file input and output. The first section concerns reading and printing S-expressions from files, including the text interface to a program. An external S-expression is a parenthesized form of data for which Scheme has a standard, natural translation into nested lists. This form of representing and entering data is as old as Lisp (1958) and is clearly the most effective method for reading and printing data. The second section presents reading and printing XML -- a modern form of parenthesized data. The third section covers reading and printing plain strings. The final section is about binary input and output.
Roughly speaking, an S-expression is a parenthesized collection of information.2 The atomic pieces of information are numbers, symbols, and strings. Placing a pair of parentheses around any number of S-expressions forms another S-expression. Here are three examples of atomic S-expressions:
  a-symbol   10.20   "hello world"
The first corresponds to a symbol, the second is a number, and the third is a Scheme string. Here are three examples of parenthesized S-expressions:
  (world hello)   ()   (life ((is) 1) "mess")
The first represents a sequence of symbols in parentheses. The second one is a pair of parentheses around the empty sequence of S-expressions. The third one consists of three S-expressions, including a string; its second component is a complex S-expression that contains a number.
Every S-expression naturally corresponds to a piece of Scheme data. Here are the Scheme data for the six examples of S-expressions above:
  S-expression              Scheme data
  a-symbol                  'a-symbol
  10.20                     10.20
  "hello world"             "hello world"
  (world hello)             (list 'world 'hello)
  ()                        (list)
  (life ((is) 1) "mess")    (list 'life (list (list 'is) 1) "mess")
In general, each atomic sequence of keyboard characters is represented by
a symbol, and every other atomic S-expression is represented by itself.
To translate a parenthesized S-expression into Scheme, we add
list to the right of ``('' and then translate the rest.
Scheme's simplest form of input and output reads from the keyboard and prints S-expressions to the screen. We discuss this form of reading and writing data in the first subsection. The second subsection explains how to replace these default devices for reading and printing data with files. In the third subsection, we take a closer look at the relationship between Scheme programs and a computer's files.
Help Desk: input, output, read, write, display, newline
Let's take a look at the toy example of computing the average of a non-empty list of numbers. Figure 1 contains the definitions that an expert (HtDP: Part IV) or a beginner (HtDP: Part II) may design. That is, the left one relies on two of Scheme's numerous built-in ``loops,'' which are really functions. The right one defines everything from scratch. The contract for both specifies that the function consumes a non-empty list and produces a number. The purpose statement specifies what the function computes.
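Since figure 1 is not reproduced in this text, here is a minimal sketch of what the two definitions might look like. Both are assumptions rather than the original figure: the first follows the definition of average in figure 5 and relies on apply and length; the second spells out the two traversals with the hypothetical helpers sum and how-many. The names average-expert and average-beginner are for illustration only, and the sketch assumes a language level that provides empty?, first, and rest.

```scheme
;; average-expert : (cons num (listof num)) -> num
;; to compute the average of a non-empty list of numbers
;; (relies on the built-in functions apply and length)
(define (average-expert alon)
  (/ (apply + alon) (length alon)))

;; average-beginner : (cons num (listof num)) -> num
;; to compute the average of a non-empty list of numbers
;; (everything is defined from scratch)
(define (average-beginner alon)
  (/ (sum alon) (how-many alon)))

;; sum : (listof num) -> num
;; to add up the numbers on alon
(define (sum alon)
  (cond
    [(empty? alon) 0]
    [else (+ (first alon) (sum (rest alon)))]))

;; how-many : (listof num) -> num
;; to count the items on alon
(define (how-many alon)
  (cond
    [(empty? alon) 0]
    [else (+ 1 (how-many (rest alon)))]))

(average-expert (list 1 2 3))   ; produces 2
(average-beginner (list 1 2 3)) ; produces 2
```

Both versions divide by the length of the list, which is why the contract insists on a non-empty list.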
If we wish to use the function, we apply it to a list of numbers in DrScheme's Interactions window:
  > (average (list 1 2 3))
  2
Others may use the program, too, as long as they know what it means to apply a function and to form a list of numbers in Scheme. Clearly, this severely limits the audience for our program.
To get around this limitation, we must add Scheme code that reads the
sequence of numbers from the keyboard and prints the result. That is, we
need to add a function that reads a list of numbers, that applies
average to that sequence, and that prints the result. Here is the
simplest such function:
  ;; main : -> void
  ;; to read an S-expression of the shape (num ... num), and
  ;; to write its average on a new line
  (define (main)
    (write (average (read))))
It composes read with average and write.
The first and the last function are Scheme's built-in input and output
functions; they read and print S-expressions in the expected manner.
The two functions, plus a third one, are specified in
figure 2.
  ;; A first data definition for internal-S-expression:
  ;;   internal-S-expression = (union bool num str sym (listof internal-S-expression))

  ;; read : -> internal-S-expression
  ;; to read an S-expression typed at the keyboard and
  ;; to produce it in its standard Scheme representation

  ;; write : internal-S-expression -> void
  ;; to print a Scheme value as an S-expression on a single line

  ;; display : internal-S-expression -> void
  ;; to print a Scheme value as an S-expression on a single line,
  ;; stripping a string's quotes

  ;; newline : -> void
  ;; to end a line of printed matter
Figure 2: Scheme's basic functions for input and output
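The data definition in figure 2 can be turned into a predicate by following its shape clause by clause. The following is a minimal sketch; internal-S-expression? is a hypothetical name, and the sketch accepts symbols as well, because the text counts symbols among the atomic S-expressions. It assumes andmap, which the document's own code uses.

```scheme
;; internal-S-expression? : any -> bool
;; to determine whether v belongs to the class internal-S-expression
;; (hypothetical helper, one cond clause per clause of the data definition)
(define (internal-S-expression? v)
  (cond
    [(boolean? v) #t]
    [(number? v) #t]
    [(string? v) #t]
    [(symbol? v) #t]
    [(list? v) (andmap internal-S-expression? v)]
    [else #f]))

(internal-S-expression? (list 'life (list (list 'is) 1) "mess")) ; true
(internal-S-expression? (list 1 (vector 2 3)))                   ; false: vectors are not covered
```

The second example is a reminder that this is only a first data definition: Scheme has values that it does not describe.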
If we now wish to use the average function, we type (main) in the
Interactions window and then type in a parenthesized S-expression that consists of
numbers:
  > (main)
  (1 2 3)
  2
Here the function responds by printing 2, the average of
1, 2, and 3. More concretely, the application
(read) intercepts what we type at the keyboard -- the S-expression
(1 2 3) -- and converts it into (list 1 2 3) -- the
natural Scheme representation for the S-expression. Then main applies
average to this list, which produces 2. Finally,
write is applied to 2 and, in turn, prints 2 to
the screen. Figure 3 contains a screen dump of the
action.
Figure 3: Input and output in DrScheme's Interactions window
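The conversion that read performs can also be observed without typing at the keyboard. The following minimal sketch assumes that the Scheme implementation provides string ports through open-input-string, which is not among the functions of figure 2; read-from-string is a hypothetical helper name.

```scheme
;; read-from-string : str -> internal-S-expression
;; to read the first S-expression spelled out in a-string
;; (hypothetical helper; assumes open-input-string is available)
(define (read-from-string a-string)
  (read (open-input-string a-string)))

(read-from-string "(1 2 3)")                   ; the same as (list 1 2 3)
(read-from-string "a-symbol")                  ; the same as 'a-symbol
(read-from-string "(life ((is) 1) \"mess\")")  ; the same as (list 'life (list (list 'is) 1) "mess")
```

Here read receives the port as an argument; in main it is applied to no arguments and therefore reads from the default input device.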
Although using main is slightly simpler for a user without DrScheme
knowledge, it is still obscure because the user must also know that the
program is started with (main) and that once it is started it
immediately waits. It is much more polite to prompt the user with a short
phrase such as ``Enter a list of numbers.'' Here is a variant of
main that achieves just this effect:
  ;; main-with-prompt-draft : -> void
  ;; to prompt the user for an S-expression of the shape (num ... num),
  ;; to read the expression,
  ;; and to print its average on a new line
  (define (main-with-prompt-draft)
    (begin
      (write "Enter a list of numbers: ")
      (write (average (read)))
      (newline)))
When we now use main-with-prompt-draft in the Interactions window, we get the following
kind of dialog:
  > (main-with-prompt-draft)
  "Enter a list of numbers: " (1 2 3)
  2
That is, the function prints the short phrase between the doublequotes, and
then waits for the user to enter a parenthesized S-expression on the same
line. Once the user hits Enter, the program reads the list of
numbers, computes the average, prints it, and terminates the line with
(newline).
While main-with-prompt-draft is an improvement over main,
the doublequotes around the prompt string are unappealing. We can avoid
them by using display instead of write:
  ;; main-with-prompt : -> void
  ;; to prompt the user for an S-expression of the shape (num ... num),
  ;; to read the expression,
  ;; and to print its average on a new line
  (define (main-with-prompt)
    (begin
      (display "Enter a list of numbers: ")
      (write (average (read)))
      (newline)))
Figure 3 also contains the screen dump for an
interaction with this function. It shows that display doesn't print
doublequotes around strings, which is what we want for ordinary user
interactions. For the second print effect, namely the printing of the
average, we can use write or display because the effect
is to print a number and the two functions print numbers in the same way.
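The difference between the two printing functions is easy to see side by side. The following lines are a small sketch that uses only the functions of figure 2; the sample values are made up for illustration.

```scheme
(write "hello world")            ; prints "hello world", including the quotes
(newline)
(display "hello world")          ; prints hello world
(newline)
(write (list 'life "mess"))      ; prints (life "mess")
(newline)
(display (list 'life "mess"))    ; prints (life mess)
(newline)
(write 2)                        ; prints 2
(newline)
(display 2)                      ; prints 2, exactly as write does
(newline)
```

Note that display also strips the quotes from strings inside a list, so a program that reads the output back cannot tell the string "mess" from the symbol mess.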
At this point, the user must still know how to start a DrScheme program. Ideally, the user should be able to walk up to a computer and just use it as an ``average computing machine.'' That is, someone should start a program once and that program should compute averages until the users get tired of it. Here is such a program:
  ;; main-forever : -> void
  ;; to prompt the user for an S-expression of the shape (num ... num)
  ;; continuously, to read the expression,
  ;; and to print its average on a new line
  (define (main-forever)
    (begin
      (display "Enter a list of numbers: ")
      (let ([the-input (read)])
        (cond
          [(eq? 'x the-input)
           (begin
             (display "Good bye.")
             (newline))]
          [else
           (begin
             (display (average the-input))
             (newline)
             (main-forever))]))))
Following a rather common convention, the program prompts the user for
data and deals with two different kinds of user inputs: the symbol
'x and a list of numbers. If it reads 'x, the program
stops -- or exits. Otherwise it computes an average and starts over. The
last screen dump in figure 3 illustrates how this
function works.
By now, we have a pretty useful program. Ideally, we should also eliminate the need for the person who starts the program to know anything about DrScheme. Clicking on some icon or typing in ``turn-on-average-machine'' at some shell prompt should suffice. Naturally, we can convert Scheme programs into such stand-alone scripts; doing this is the topic of part IV.
According to the data definition in figure 2 and the
contract for average in figure 1, (read)
may produce many more values than average may consume. Indeed,
the definition of main-forever exploits this fact. If a user
types x at the keyboard, main-forever quits. This suggests
that we write a checked version of average (HtDP: Part I):
  ;; checked-average : internal-S-expression -> number
  (define (checked-average x)
    (cond
      [(and (list? x) (andmap number? x)) (average x)]
      [else (error 'average "expects a list of numbers")]))
Now we can substitute checked-average for average in main,
main-with-prompt, and main-forever. If the user types anything
other than a parenthesized S-expression or if the S-expression contains a
non-number, the program will fail gracefully with an informative error
message.
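A short usage sketch shows what the check catches and one case that it lets through. The definition of average is copied from figure 5 so that the block stands on its own. The variant checked-average/non-empty is a hypothetical name and an assumption of this sketch: it also rejects the empty list, which the contract of average excludes but which checked-average passes on to average.

```scheme
;; average : (cons num (listof num)) -> num
(define (average alon)
  (/ (apply + alon) (length alon)))

;; checked-average : internal-S-expression -> number
(define (checked-average x)
  (cond
    [(and (list? x) (andmap number? x)) (average x)]
    [else (error 'average "expects a list of numbers")]))

;; checked-average/non-empty : internal-S-expression -> number
;; (hypothetical variant) like checked-average, but also rejects ()
(define (checked-average/non-empty x)
  (cond
    [(and (pair? x) (list? x) (andmap number? x)) (average x)]
    [else (error 'average "expects a non-empty list of numbers")]))

(checked-average (list 1 2 3))          ; produces 2
;; (checked-average 'x)                 ; signals the error in the else clause
;; (checked-average (list 1 'q 3))      ; signals the same error
;; (checked-average (list))             ; passes the check; average then divides by zero
;; (checked-average/non-empty (list))   ; signals the error in the else clause
```

The failing applications are commented out so that the block can be evaluated as a whole; uncommenting any one of them stops the evaluation with the respective error.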
In the case of main and main-with-prompt a failure is just
fine. The user tried to compute the average of a single list of numbers,
failed to use the program properly, and the program stopped. If, however,
we started the ``average-computing machine'' with (main-forever),
a user who types
(1 2 q 4 5 6 8 9 12)
by accident instead of
(1 2 2 4 5 6 8 9 12)
would certainly just like to fix the inputs. Stopping the machine is a questionable thing. Instead, the program should print a message and prompt the user for another list.
  ;; main-forever2 : -> void
  ;; to prompt the user for an S-expression of the shape (num ... num)
  ;; continuously, to read the expression,
  ;; and to print its average on a new line
  (define (main-forever2)
    (begin
      (display "Enter a list of numbers: ")
      (let ([the-input (read)])
        (cond
          [(eq? 'x the-input)
           (begin
             (display "Good bye.")
             (newline))]
          [(and (list? the-input) (andmap number? the-input))
           (begin
             (display (average the-input))
             (newline)
             (main-forever2))]
          [else
           (begin
             (display "Expected a list of numbers. Given: ")
             (display the-input)
             (newline)
             (main-forever2))]))))
Figure 4: Computing the averages of a list of numbers with protection
We can overcome the problem with main-forever in two ways. First,
we can integrate checked-average into the main function: see
figure 4. Second, we can use
checked-average and an exception handler in the revision of
main-forever, that is, a Scheme construct that deals with
failures such as applications of error. We will discuss exception
handling and its uses in part [***].
Now suppose you have become the head grader for Rice's famous Comp210 course. One of your chores is to produce grade-point averages for all of the students in class. To do that, you keep track of homework grades in a file and, since Scheme is the only language you know, the file has an S-expression format:
  ((Adam 78 88 69 66)
   (Brad 88 87 86 22)
   (Cath 99 88 88 90)
   (Dave 77 78 77 78)
   (Fawn 90 89 81 60)
   (Gege 67 78 81 85))
Assuming this expression was quoted (HtDP: Intermezzo 2), writing the program that produces these results would be a straightforward exercise. Figure 5 contains the complete definition. There is only one problem left, namely, the grades are in a file and, before we can apply our function, we need to get the data from the file and convert them into an internal S-expression.
  ;; Data Definitions:
  ;;   line = (cons sym (listof num))
  ;;   name+average = (list sym num)
  ;; NOTE: the numbers are integers between 0 and 100.

  ;; compute-averages : (listof line) -> (listof name+average)
  ;; to compute the homework gpa for each item in lines
  (define (compute-averages lines)
    (map compute-one-average lines))

  ;; compute-one-average : line -> name+average
  ;; to compute the homework gpa for a-line
  (define (compute-one-average a-line)
    (list (first a-line) (average (rest a-line))))

  ;; average : (cons num (listof num)) -> num
  ;; to compute the average of a non-empty list of numbers
  (define (average alon)
    (/ (apply + alon) (length alon)))
Figure 5: Computing the homework gpa for a course
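To see the functions of figure 5 at work before any file is involved, we can apply compute-averages to a quoted S-expression. The sketch below repeats the three definitions so that it is self-contained and uses the first two lines of the grade file; sample-lines is a hypothetical name. It assumes a language level that provides first and rest. Whether an average prints as a decimal or as a fraction depends on the language level, so the comment states the values rather than their printed form.

```scheme
;; the definitions of figure 5
(define (average alon)
  (/ (apply + alon) (length alon)))

(define (compute-one-average a-line)
  (list (first a-line) (average (rest a-line))))

(define (compute-averages lines)
  (map compute-one-average lines))

;; sample-lines : (listof line)
;; two lines of the grade file as a quoted S-expression (hypothetical name)
(define sample-lines
  '((Adam 78 88 69 66)
    (Brad 88 87 86 22)))

(compute-averages sample-lines)
;; produces a list of two name+average items:
;; Adam with 301/4, that is, 75.25, and Brad with 283/4, that is, 70.75
```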
Fortunately, we can solve half the problem with read. Using
read, we can write a program that reads in our S-expression of
homework grades and can apply compute-averages to the result:
  ;; grader : -> void
  ;; to read an S-expression of the shape ((sym num ... num) ...),
  ;; and to print an S-expression of the shape ((sym num) ...);
  ;; the output shape has as many items as the input shape, the syms
  ;; are the same for the ith item, and the second item is the
  ;; average of the list of nums associated with the ith item
  (define (grader)
    (display (compute-averages (read))))
And voilà, we have all the averages we need. To use grader, we evaluate
(grader) in the Interactions window and paste the grade information into the
read window.
While this solves most of our problem, it still leaves us with cutting and pasting the S-expression into DrScheme. If we manage a dozen students or so, that's fine. But, if we manage a section with 100 or 500 students, cutting and pasting won't do. We could lose or corrupt data. To overcome this problem, we need to know how to read data from files and that's the topic of the next section.
Exercises
Help Desk: input, output, with-input-from-file, with-output-to-file
Suppose we have a program that interacts with the user via the default
devices. That is, it reads data from the keyboard and prints data to the
screen. But suppose we actually want a program that reads from an existing
file and prints its results to a new file. In that case we need to re-direct the reading and printing actions of the program. Ideally, we
should do that without changing the existing program and, indeed, we can
do that in Scheme easily with the two functions in
figure 6: with-input-from-file and
with-output-to-file.3
  ;; with-input-from-file : str (-> X) -> X
  ;; to open the file f for reading
  ;; as if it were the default input device
  ;; during the evaluation of (thunk)
  (define (with-input-from-file f thunk) ...)

  ;; with-output-to-file : str (-> X) -> X
  ;; to open the file f for printing
  ;; as if it were the default output device
  ;; during the evaluation of (thunk)
  (define (with-output-to-file f thunk) ...)
Figure 6: Scheme's basic functions for input and output
Using the function with-input-from-file, we can avoid the cutting
and pasting of the grade file into the Interactions window. Let's say the grades are in a file
called "grades-for-2001s.dat" (in the current directory). Then
we just evaluate
> (with-input-from-file "grades-for-2001s.dat" grader) ...
That is, we apply with-input-from-file to a filename and
grader. The latter is a function of no arguments, which is what
with-input-from-file requires. This forces the read
expression that is evaluated due to the evaluation of (grader) to
read a parenthesized S-expression from the file
"grades-for-2001s.dat". It still prints the results in the Interactions window,
and we have to copy them into a file from there.
Of course, cutting and pasting data from the Interactions window to some file is as
burdensome as copying some other data into the Interactions window. We should save our
results to a new4 file. To overcome this last problem,
we use with-output-to-file:
  > (with-output-to-file "final-grades-for-2001s.dat"
      (lambda ()
        (with-input-from-file "grades-for-2001s.dat" grader)))
  >
Recall that (lambda () ...) is a (nameless) procedure of no
arguments, which is precisely what with-output-to-file consumes.
The expression first creates a file with name
"final-grades-for-2001s.dat". From then on, the evaluation of a
(display ...) or a (newline) expression affects that
file. In the end, the file contains the following data:
((Adam 75.25) (Brad 70.75) (Cath 91.25) ...
That is, all of the results are on a single line. While putting everything on one line is acceptable if the file is read by other (Scheme) programs, it is bad for human consumption. Human beings are better off reading nicely indented S-expressions.
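The remark that single-line output is acceptable for other Scheme programs can be checked with a round trip: write an S-expression to a file and read it back. The following is a minimal sketch; write-then-read and the file name "round-trip-example.dat" are hypothetical, and the sketch assumes that no file of that name exists yet, because with-output-to-file signals an error otherwise (see footnote 4). Evaluating the block a second time therefore fails unless the file is removed first.

```scheme
;; write-then-read : str internal-S-expression -> internal-S-expression
;; to write an-sexp to the new file f and then to read it back
;; ASSUMPTION: no file named f exists yet
(define (write-then-read f an-sexp)
  (begin
    (with-output-to-file f (lambda () (write an-sexp)))
    ;; read consumes no arguments here, so it can serve as the thunk
    (with-input-from-file f read)))

(write-then-read "round-trip-example.dat" '((Adam 75.25) (Brad "late")))
;; produces a list equal to the one that was written
```

The sketch uses write rather than display: write keeps the quotes around strings, so read reproduces "late" as a string instead of a symbol.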
Exercises
Exercise 1.2.1.
Modify grader so that it returns a thunk that when applied to no
arguments, performs the computation:
  ;; grader : -> (-> void)
  ;; to read an S-expression of the shape ((sym num ... num) ...),
  ;; ...
  (define (grader) ...)
Then simplify the file-redirection expression. What does this suggest for programs that perform i/o operations?
Exercise 1.2.2.
Define the function ia-grader, which consumes two strings. The
first is the name of an input file that contains an S-expression with the
homework grades. The second is the name of an output file, which doesn't
exist yet.
Exercise 1.2.3.
Define a function that protects grader from erroneous inputs.
  ;; double : -> void
  ;; to read a parenthesized S-expression
  ;; to double the number of items in the 'list'
  (define (double)
    (write (apply append (map (lambda (x) (list x x)) (read)))))

  ;; count : -> number
  ;; to read a parenthesized S-expression
  ;; and to count the number of items in the 'list'
  (define (count)
    (length (read)))

  ;; pipe : (-> Y) (-> X) -> (-> X)
  ;; to evaluate (f) and (g) such that the standard file output of (f)
  ;; is read as the standard input by (g); produce g's result
  (define (pipe f g)
    (lambda ()
      (let ([tmp-file (string-append "tmp" (number->string (random 10000)))])
        ;; assume that the file named by tmp-file doesn't exist
        (with-output-to-file tmp-file f)
        (with-input-from-file tmp-file g))))
Figure 7: Composing programs
Exercise 1.2.4. Consider the program in figure 7. It contains two functions that process standard input files. The third function can compose such functions in such a way that the output of the first is the input of the second.
Use pipe to connect double and count.
Also design a simple test that relates the result of the connected
functions to the result of (with-input-from-file "foo.ss"
count).
Note: Composing input/output functions in this manner
creates a temporary file. If this temporary file already exists,
pipe fails. We can avoid this with additional file manipulation
functions (see part V). Alternatively, we can avoid the
creation of a temporary file altogether with string ports (see
part II).
Help Desk: pretty print, pretty-print
Collection: pretty.ss
Printing S-expressions on a single line is only useful if the S-expressions are short. No human being can inspect a lot of data that is printed on a single line. Quite the opposite: we should always assume that a human being may have to read the output of a program.
Because human beings need ``pretty'' output, Scheme provides a library for
pretty printing. The most useful function from this library is
pretty-print, which consumes an S-expression and formats it in a
manner that takes the line width into account. Thus, if we define
grader using pretty-print:
  ;; grader : -> void
  ;; to read an S-expression of the shape ((sym num ... num) ...),
  ;; and to print an S-expression of the shape ((sym num) ...);
  ;; the output shape has as many items as the input shape, the syms
  ;; are the same for the ith item, and the second item is the
  ;; average of the list of nums associated with the ith item
  (define (grader)
    (pretty-print (compute-averages (read))))
and if we then evaluate the expression
  > (with-output-to-file "final-grades-for-2001s.dat"
      (lambda ()
        (with-input-from-file "grades-for-2001s.dat" grader)))
  >
then the evaluation terminates without printing anything to
the screen. Instead it prints the following S-expression to the file
"final-grades-for-2001s.dat":
((Adam 78 88 69 66) (Brad 88 87 86 22) (Cath 99 88 88 90) (Dave 77 78 77 78) (Fawn 90 89 81 60) (Gege 67 78 81 85))
Exercises
2 See Intermezzo 2 in ``How to Design Programs''.
3 The actual reading and writing is managed by the operating system, the layer of software that manages the connection between applications and the actual pieces of the computer. Fortunately, the designers of operating systems have arranged the connections between programs such as ours and devices such as the keyboard in a flexible way. More technically, these devices are dealt with as if they were files, and hence, it is possible to switch one file for another.
4 The function with-output-to-file creates
a new file. If a file with the given name already exists in the working
directory, it signals an error. We will learn to deal with this aspect of
the problem in part V.