Think Java: How to Think Like a Computer Scientist by Allen B. Downey - HTML preview

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub, Kindle for a complete version.

Chapter 1

The way of the program

The goal of this book is to teach you to think like a computer scientist. I

like the way computer scientists think because they combine some of the best

features of Mathematics, Engineering, and Natural Science. Like mathemati-

cians, computer scientists use formal languages to denote ideas (specifically

computations). Like engineers, they design things, assembling components

into systems and evaluating tradeoffs among alternatives. Like scientists,

they observe the behavior of complex systems, form hypotheses, and test

predictions.

The single most important skill for a computer scientist is problem-solving.

By that I mean the ability to formulate problems, think creatively about

solutions, and express a solution clearly and accurately. As it turns out,

the process of learning to program is an excellent opportunity to practice

problem-solving skills. That’s why this chapter is called “The way of the

program.”

On one level, you will be learning to program, which is a useful skill by itself.

On another level you will use programming as a means to an end. As we go

along, that end will become clearer.

1.1

What is a programming language?

The programming language you will be learning is Java, which is relatively

new (Sun released the first version in May, 1995). Java is an example of a

2

Chapter 1. The way of the program

high-level language; other high-level languages you might have heard of

are Python, C or C++, and Perl.

As you might infer from the name “high-level language,” there are also low-

level languages, sometimes called machine language or assembly language.

Loosely-speaking, computers can only run programs written in low-level lan-

guages. Thus, programs written in a high-level language have to be trans-

lated before they can run. This translation takes time, which is a small

disadvantage of high-level languages.

The advantages are enormous. First, it is much easier to program in a high-

level language: the program takes less time to write, it’s shorter and easier

to read, and it’s more likely to be correct. Second, high-level languages are

portable, meaning that they can run on different kinds of computers with

few or no modifications. Low-level programs can only run on one kind of

computer, and have to be rewritten to run on another.

Due to these advantages, almost all programs are written in high-level lan-

guages. Low-level languages are only used for a few special applications.

There are two ways to translate a program; interpreting and compiling.

An interpreter is a program that reads a high-level program and does what

it says. In effect, it translates the program line-by-line, alternately reading

lines and carrying out commands.

A compiler is a program that reads a high-level program and translates it

all at once, before running any of the commands. Often you compile the

program as a separate step, and then run the compiled code later. In this

case, the high-level program is called the source code, and the translated

program is called the object code or the executable.

Java is both compiled and interpreted. Instead of translating programs into

machine language, the Java compiler generates byte code. Byte code is

easy (and fast) to interpret, like machine language, but it is also portable,

like a high-level language. Thus, it is possible to compile a program on one

machine, transfer the byte code to another machine, and then interpret the

byte code on the other machine. This ability is an advantage of Java over

many other high-level languages.

index-23_1.png

1.2. What is a program?

3

Although this process may seem complicated, in most program development

environments these steps are automated for you. Usually you will only have

to write a program and press a button or type a single command to compile

and run it. On the other hand, it is useful to know what steps are happening

in the background, so if something goes wrong you can figure out what it is.

1.2

What is a program?

A program is a sequence of instructions that specifies how to perform a com-

putation1. The computation might be something mathematical, like solving

a system of equations or finding the roots of a polynomial, but it can also be

a symbolic computation, like searching and replacing text in a document or

(strangely enough) compiling a program.

The instructions, which we will call statements, look different in different

programming languages, but there are a few basic operations most languages

perform:

input: Get data from the keyboard, or a file, or some other device.

output: Display data on the screen or send data to a file or other device.

math: Perform basic mathematical operations like addition and multiplica-

tion.

testing: Check for certain conditions and run the appropriate sequence of

statements.

1This definition does not apply to all programming languages; for alternatives, see

http://en.wikipedia.org/wiki/Declarative_programming.

4

Chapter 1. The way of the program

repetition: Perform some action repeatedly, usually with some variation.

That’s pretty much all there is to it. Every program you’ve ever used, no

matter how complicated, is made up of statements that perform these oper-

ations. Thus, one way to describe programming is the process of breaking a

large, complex task up into smaller and smaller subtasks until the subtasks

are simple enough to be performed with one of these basic operations.

1.3

What is debugging?

For whimsical reasons, programming errors are called bugs and the process

of tracking them down and correcting them is called debugging.

There are a three kinds of errors that can occur in a program, and it is useful

to distinguish them to track them down more quickly.

1.3.1

Syntax errors

The compiler can only translate a program if the program is syntactically

correct; otherwise, the compilation fails and you will not be able to run your

program. Syntax refers to the structure of your program and the rules about

that structure.

For example, in English, a sentence must begin with a capital letter and end

with a period. this sentence contains a syntax error. So does this one

For most readers, a few syntax errors are not a significant problem, which is

why we can read the poetry of e e cummings without spewing error messages.

Compilers are not so forgiving. If there is a single syntax error anywhere in

your program, the compiler will print an error message and quit, and you

will not be able to run your program.

To make matters worse, there are more syntax rules in Java than there are in

English, and the error messages you get from the compiler are often not very

helpful. During the first weeks of your programming career, you will probably

spend a lot of time tracking down syntax errors. As you gain experience, you

will make fewer errors and find them faster.

1.3. What is debugging?

5

1.3.2

Run-time errors

The second type of error is a run-time error, so-called because the error does

not appear until you run the program. In Java, run-time errors occur when

the interpreter is running the byte code and something goes wrong.

Java tends to be a safe language, which means that the compiler catches a

lot of errors. So run-time errors are rare, especially for simple programs.

In Java, run-time errors are called exceptions, and in most environments

they appear as windows or dialog boxes that contain information about what

happened and what the program was doing when it happened. This infor-

mation is useful for debugging.

1.3.3

Logic errors and semantics

The third type of error is the logic or semantic error. If there is a logic error

in your program, it will compile and run without generating error messages,

but it will not do the right thing. It will do something else. Specifically, it

will do what you told it to do.

The problem is that the program you wrote is not the program you wanted

to write. The semantics, or meaning of the program, are wrong. Identifying

logic errors can be tricky because you have to work backwards, looking at

the output of the program and trying to figure out what it is doing.

1.3.4

Experimental debugging

One of the most important skills you will acquire in this class is debugging.

Although debugging can be frustrating, it is one of the most interesting,

challenging, and valuable parts of programming.

Debugging is like detective work. You are confronted with clues and you

have to infer the processes and events that lead to the results you see.

Debugging is also like an experimental science. Once you have an idea what

is going wrong, you modify your program and try again. If your hypothesis

was correct, then you can predict the result of the modification, and you

take a step closer to a working program. If your hypothesis was wrong, you

6

Chapter 1. The way of the program

have to come up with a new one. As Sherlock Holmes pointed out, “When

you have eliminated the impossible, whatever remains, however improbable,

must be the truth.” (from A. Conan Doyle’s The Sign of Four).

For some people, programming and debugging are the same thing. That is,

programming is the process of gradually debugging a program until it does

what you want. The idea is that you should always start with a working

program that does something, and make small modifications, debugging them

as you go, so that you always have a working program.

For example, Linux is an operating system that contains thousands of lines

of code, but it started out as a simple program Linus Torvalds used to ex-

plore the Intel 80386 chip. According to Larry Greenfield, “One of Linus’s

earlier projects was a program that would switch between printing AAAA

and BBBB. This later evolved to Linux” (from The Linux Users’ Guide Beta

Version 1).

In later chapters I make more suggestions about debugging and other pro-

gramming practices.

1.4

Formal and natural languages

Natural languages are the languages that people speak, like English, Span-

ish, and French. They were not designed by people (although people try to

impose order on them); they evolved naturally.

Formal languages are languages designed by people for specific applica-

tions. For example, the notation that mathematicians use is a formal lan-

guage that is particularly good at denoting relationships among numbers and

symbols. Chemists use a formal language to represent the chemical structure

of molecules. And most importantly:

Programming languages are formal languages that have

been designed to express computations.

Formal languages have strict rules about syntax. For example, 3 + 3 = 6 is

a syntactically correct mathematical statement, but 3$ = is not. Also, H2O

is a syntactically correct chemical name, but 2Zz is not.

1.4. Formal and natural languages

7

Syntax rules come in two flavors, pertaining to tokens and structure. Tokens

are the basic elements of the language, like words and numbers and chemical

elements. One of the problems with 3$ = is that $ is not a legal token in

mathematics (at least as far as I know). Similarly, 2Zz is not legal because

there is no element with the abbreviation Zz.

The second type of syntax rule pertains to the structure of a statement; that

is, the way the tokens are arranged. The statement 3$ = is structurally

illegal, because you can’t have an equals sign at the end of an equation.

Similarly, molecular formulas have to have subscripts after the element name,

not before.

When you read a sentence in English or a statement in a formal language,

you have to figure out what the structure of the sentence is (although in a

natural language you do this unconsciously). This process is called parsing.

Although formal and natural languages have features in common—tokens,

structure, syntax and semantics—there are differences.

ambiguity: Natural languages are full of ambiguity, which people deal with

by using contextual clues and other information. Formal languages

are designed to be unambiguous, which means that any statement has

exactly one meaning, regardless of context.

redundancy: To make up for ambiguity and reduce misunderstandings, nat-

ural languages are often redundant. Formal languages are more concise.

literalness: Natural languages are full of idiom and metaphor. Formal lan-

guages mean exactly what they say.

People who grow up speaking a natural language (everyone) often have a

hard time adjusting to formal languages. In some ways the difference between

formal and natural language is like the difference between poetry and prose,

but more so:

Poetry: Words are used for their sounds as well as for their meaning, and

the whole poem together creates an effect or emotional response. Am-

biguity is common and deliberate.

Prose: The literal meaning of words is more important and the structure

contributes more meaning.

8

Chapter 1. The way of the program

Programs: The meaning of a computer program is unambiguous and literal,

and can be understood entirely by analysis of the tokens and structure.

Here are some suggestions for reading programs (and other formal languages).

First, remember that formal languages are much more dense than natural

languages, so it takes longer to read them. Also, the structure is important,

so it is usually not a good idea to read from top to bottom, left to right.

Instead, learn to parse the program in your head, identifying the tokens and

interpreting the structure. Finally, remember that the details matter. Little

things like spelling errors and bad punctuation, which you can get away with

in natural languages, can make a big difference in a formal language.

1.5

The first program

Traditionally the first program people write in a new language is called

“Hello, World.” because all it does is display the words “Hello, World.”

In Java, this program looks like this:

class Hello {

// main: generate some simple output

public static void main(String[] args) {

System.out.println("Hello, world.");

}

}

This program includes features that are hard to explain to beginners, but it

provides a preview of topics we will see in detail later.

Java programs are made up of class definitions, which have the form:

class CLASSNAME {

public static void main (String[] args) {

STATEMENTS

}

}

1.6. Glossary

9

Here CLASSNAME indicates a name chosen by the programmer. The class

name in the example is Hello.

main is a method, which is a named collection of statements. The name

main is special; it marks the place in the program where execution begins.

When the program runs, it starts at the first statement in main and ends

when it finishes the last statement.

main can have any number of statements, but the example has one. It is a

print statement, meaning that it displays a message on the screen. Confus-

ingly, “print” can mean “display something on the screen,” or “send some-

thing to the printer.” In this book I won’t say much about sending things

to the printer; we’ll do all our printing on the screen. The print statement

ends with a semi-colon (;).

System.out.println is a method provided by one of Java’s libraries. A

library is a collection of class and method definitions.

Java uses squiggly-braces ({ and }) to group things together. The outermost

squiggly-braces (lines 1 and 8) contain the class definition, and the inner

braces contain the definition of main.

Line 3 begins with //. That means it’s a comment, which is a bit of English

text that you can put a program, usually to explain what it does. When the

compiler sees //, it ignores everything from there until the end of the line.

1.6

Glossary

problem-solving: The process of formulating a problem, finding a solution,

and expressing the solution.

high-level language: A programming language like Java that is designed

to be easy for humans to read and write.

low-level language: A programming language that is designed to be easy

for a computer to run. Also called “machine language” or “assembly

language.”

formal language: Any of the languages people have designed for specific

purposes, like representing mathematical ideas or computer programs.

All programming languages are formal languages.

10

Chapter 1. The way of the program

natural language: Any of the languages people speak that have evolved

naturally.

portability: A property of a program that can run on more than one kind

of computer.

interpret: To run a program in a high-level language by translating it one

line at a time.

compile: To translate a program in a high-level language into a low-level

language, all at once, in preparation for later execution.

source code: A program in a high-level language, before being compiled.

object code: The output of the compiler, after translating the program.

executable: Another name for object code that is ready to run.

byte code: A special kind of object code used for Java programs. Byte code

is similar to a low-level language, but it is portable, like a high-level

language.

statement: A part of a program that specifies a computation.

print statement: A statement that causes output to be displayed on the

screen.

comment: A part of a program that contains information about the pro-

gram, but that has no effect when the program runs.

method: A named collection of statements.

library: A collection of class and method definitions.

bug: An error in a program.

syntax: The structure of a program.

semantics: The meaning of a program.

parse: To examine a program and analyze the syntactic structure.

syntax error: An error in a program that makes it impossible to parse (and

therefore impossible to compile).

1.7. Exercises

11

exception: An error in a program that makes it fail at run-time. Also called

a run-time error.

logic error: An error in a program that makes it do something other than

what the programmer intended.

debugging: The process of finding and removing any of the three kinds of

errors.

1.7

Exercises

Exercise 1.1. Computer scientists have the annoying habit of using common

English words to mean something other than their common English meaning.

For example, in English, statements and comments are the same thing, but

in programs they are different.

The glossary at the end of each chapter is intended to highlight words and

phrases that have special meanings in computer science. When you see fa-

miliar words, don’t assume that you know what they mean!

1. In computer jargon, what’s the difference between a statement and a

comment?

2. What does it mean to say that a program is portable?

3. What is an executable?

Exercise 1.2. Before you do anything else, find out how to compile and run

a Java program in your environment. Some environments provide sample

programs similar to the example in Section 1.5.

1. Type in the “Hello, world” program, then compile and run it.

2. Add a print statement that prints a second message after the “Hello,

world!”. Something witty like, “How are you?” Compile and run the

program again.

3. Add a comment to the program (anywhere), recompile, and run it again.

The new comment should not affect the result.

12

Chapter 1. The way of the program

This exercise may seem trivial, but it is the starting place for many of the

programs we will work with. To debug with confidence, you have to have

confidence in your programming environment. In some environments, it is

easy to lose track of which program is executing, and you might find your-

self trying to debug one program while you are accidentally running another.

Adding (and changing) print statements is a simple way to be sure that the

program you are looking at is the program you are running.

Exercise 1.3. It is a good idea to commit as many errors as you can think

of, so that you see what error messages the compiler produces. Sometimes

the compiler tells you exactly what is wrong, and all you have to do is fix it.

But sometimes the error messages are misleading. You will develop a sense

for when you can trust the compiler and when you have to figure things out

yourself.

1. Remove one of the open squiggly-braces.

2. Remove one of the close squiggly-braces.

3. Instead of main, write mian.

4. Remove the word static.

5. Remove the word public.

6. Remove the word System.

7. Replace println with Println.

8. Replace println with print. This one is tricky because it is a logic

error, not a syntax error. The statement System.out.print is legal,

but it may or may not do what you expect.

9. Delete one of the parentheses. Add an extra one.