01: Welcome and Java Review

Welcome

Hello and welcome!

Welcome to our time of learning together.

Welcome to the many first years in this classroom, who are attending their first day of college classes today. Welcome to returning students.

Welcome to people of all ages, all colors, all cultures, abilities, sexual orientations and gender identities.

Welcome to those who identify as biracial, multi-racial, or multi-ethnic.

Welcome to people from Massachusetts, from other states, and from countries all around the world.

Welcome to people of all political persuasions – or who abstain from politics. Welcome to people of all religions and of no religion.

Welcome to military veterans.

Welcome to people who live with mental illness.

Welcome to those of you who are financially broke, or those broken in spirit.

It is my firm belief that you all belong here, and I want you to feel welcome. Whoever you are, wherever you are on your journey in computer science, you are welcome here.

I’m Marc Liberatore (liberato@cs.umass.edu) and I’m your instructor for this course, COMPSCI 186. Please call me Marc.

The most important thing to know today: the course web site is at http://people.cs.umass.edu/~liberato/courses/2019-spring-compsci186/. It includes the syllabus for this class and you are expected to read it in its entirety; it also includes all assignments, readings, and so on.

The course is not just me (of course). We have two fine TAs, Liam Rothschild-Shea and Ben Kushigian. TAs are graduate students in the department who have the side job of helping run courses. They will be holding office hours, running discussions, grading quizzes, answering questions online, and so on.

We also have about ten (!!) undergraduate course assistants, who will also be answering questions, holding office hours, and grading some of your work. But they haven’t all been hired yet, so you’ll just have to live with the mystery for now.

Other important announcements

There is a homework due Thursday before the start of class!

There is a (short, simple) programming assignment due Friday night!

There will be a quiz in discussion Monday!

(It appears that I love exclamation points!!!)

Generally, there is something due All The Time in this course – get in the habit of thinking about 186 all the time!

What is this course?

What is this course? It’s a course about the who, what, when, and why of commonly-used data structures – we’ll learn their names and behavior well enough to know when to use which structures. We’ll only briefly touch on the how – how data structures are implemented – as 187 concerns itself deeply with this topic and is the next course in the COMPSCI sequence.

Speaking of 187, let’s confront the elephant in the room right now. 186 is an optional course between 121 and 187. Why does it exist, and why does it exist now? (And by implication, why are you here?) Two big reasons:

  • The “casualty rate” between 121 and 187 has in the past been unacceptably high. Students who got less than an A- in 121 were not likely to succeed in 187 – by which I mean, they were likely to get less than a C on their first attempt at 187. The only option they had was to re-take 187, which nobody (us or you) likes.
  • Much of the content (the “how”) in 187 is, perhaps, overkill for students intending to pursue the Informatics minor.

186 is an attempt to kill these two birds with one stone.

First, we conjecture that many students who do OK in 121 (which you can also read as “get less than a 4 on the CS AP exam”) could pass 187 – if only they had a little more practice programming and exposure to various parts of the computer science ecosystem (especially the practical bits of Java). 186 is intended to guide you on a path toward programming mastery that’s more gentle than the current trajectory of 187. 187 has a bit of what I call the “eat your vegetables first” problem. It also has a bit of the “pie eating contest, where first prize is more pie” problem. Together this is a recipe for a lot of spinach pies, which maybe isn’t great if you’ve not been training in competitive spinach pie eating since you were a kid. 186 is an attempt to provide more reasonable portion sizes and to balance the diet.

Moving on from mixed metaphors, 187 is a prerequisite for all of the 200-level COMPSCI classes. Some upper-level COMPSCI classes are reasonable fits for Informatics majors (such as 326: Web Programming), but have prerequisites of either 187 or 220/230. The reason for these prerequisites is, in some cases, programming maturity: we think you need more than just 121 to be ready for them. 186 will be, we hope, an alternative to 187 in giving you the experience you need to be ready for these courses. (This is still in flux, but it’s the current plan for Informatics majors.)

We’ll start off with a review of some of the material from 121, and work our way up to more complicated programs. We’ll learn about and use many of the data structures available in the Java API to build these programs. Along the way, we’ll learn about the tooling you can expect to use as a working Java programmer (in 187, or in Informatics courses, or in internships). This is the part where I’m also supposed to say, “and we’ll have fun doing it,” but let’s not over-egg the pudding, shall we?

121 review: variables and values

Bits and bytes

We’ll spend most of the next two weeks on a review of some 121 material. Let’s get started.

The fundamental unit of information is the bit – a single, binary digit, of value 0 or 1.

Bits are organized into bytes: 8 bits in a byte. Bits (the smaller unit) are abbreviated b; bytes (the bigger unit) are abbreviated B.

When we talk about computer memory and RAM, we’re talking about a large number of bytes that we use to store data. Data is the thing that enables nontrivial, “stateful” programs – without (changeable) state, behavior is predetermined. But computers are generally useful to us because we can vary their behavior: we need data, and we need to be able to manipulate it.

How many bytes are we talking about? How much memory does your PC or Mac have? 4 GB? What’s a giga-?

(on board)

Kilo-, mega-, and giga- are metric prefixes: multipliers of 10^3 (1,000), 10^6, and 10^9, respectively. Confusingly, in the land of computer science, they’re multipliers of 2^10 (1,024), 2^20, and 2^30 – each slightly larger than the corresponding metric prefix. Even more confusingly, that only applies to memory; network bandwidth and disk sizes usually use the metric meaning. (This fact is one of the reasons why when you buy a 500GB drive it usually shows up as significantly less: 500 x 10^9 ~= 465 x 2^30.)
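To make the drive arithmetic concrete, here is a small sketch in Java. The class and method names are made up for illustration; the only facts it relies on are the two multipliers above.

```java
class PrefixDemo {
    // Metric giga: 10^9. Binary "giga" (as used for memory): 2^30.
    static final long METRIC_GIGA = 1_000_000_000L;
    static final long BINARY_GIGA = 1L << 30;

    // Converts a size in metric gigabytes to whole binary gigabytes, rounding down.
    static long metricToBinaryGiga(long metricGigabytes) {
        long bytes = metricGigabytes * METRIC_GIGA;
        return bytes / BINARY_GIGA;
    }

    public static void main(String[] args) {
        System.out.println(BINARY_GIGA);             // prints 1073741824
        System.out.println(metricToBinaryGiga(500)); // prints 465
    }
}
```

Note the use of long rather than int: 500 x 10^9 is far too big to fit in a 32-bit int.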

These several billion bytes are like an enormous canvas that you, the programmer, can paint upon. Except the computer isn’t you, and can’t see them all at once: you have to precisely name the place in memory you care about. Modern CPUs number the bytes, starting at zero (‘cuz we start at zero in CS, of course) and working their way up to 4 x 2^30 - 1 (if you have 4GB of “addressable” memory). This number is called the byte’s address.

Modern CPUs usually work on larger units of information, called words. Words on modern CPUs are typically 32 or 64 bits (4 or 8 bytes), and some special instructions can work on larger units still. Because an address has to fit in a word, a computer can address at most as many bytes as there are distinct word values. Thus, the size of the address space is usually 2^(word size in bits): 2^32 bytes (4 x 2^30) for a 32-bit word.
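If you want to check the relationship between word size and address space yourself, a rough sketch might look like this. The names are hypothetical, and the helper only works for word sizes up to 62 bits, because a Java long is itself a signed 64-bit value.

```java
class AddressSpaceDemo {
    // Number of distinct addresses that fit in a word of the given size.
    // Assumes 0 <= wordBits <= 62 so the result fits in a positive long.
    static long addressCount(int wordBits) {
        return 1L << wordBits;
    }

    public static void main(String[] args) {
        long count = addressCount(32);
        System.out.println(count);     // prints 4294967296 (that is, 4 x 2^30)
        System.out.println(count - 1); // prints 4294967295, the highest address
    }
}
```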

Right about now you’re probably like, “did I accidentally sign up for a Computer Systems Engineering course?” And the answer is no. But I do want to make sure you have some intuition for stuff that’s going to come up later in the course and 187. But I promise this is about as deep as we’ll go into computer organization.

In-class exercise

Clickers

I brought my clicker to class today.

Another reminder: register your clicker in Moodle; there will be a box in the lower-right you can use to associate your clicker’s serial number (on the back) with your UMass ID.

Variables, data types, and assignment

OK, so another thing you might be thinking is, “Huh, I’ve never written any Java where I’ve worried about addressing memory directly.” To which I respond, “Eff yes! Ain’t it great?”

(Arguably) one of the greatest success stories of computer science is the development of high-level languages and runtimes to free programmers from worrying (too much) about nitty-gritty details like those above. Of course, sometimes you will need to do so, but for problems that don’t push the boundaries of what a computer can do, and that don’t need to scale to millions of machines, and so on, you can effectively ignore many little details and still be an extremely productive programmer.

For example, suppose we’re writing a web app for the PVTA: http://bustracker.pvta.com/infopoint/

We don’t need to worry about painstakingly laying out four bytes to represent a bus number, then eight bytes to represent a distance, and then remember their memory address each time we want to use them.

(on board)

Instead we might write:

int busNumber;
double distanceTraveled;

(on board)

And the computer (the compiler and the runtime) generates code that lays out memory for us, gives each variable a name, and even knows something about the variable – its type. The compiler can do “typechecking” to prevent us from attempting some impossible things, like, say, adding together a boolean and an integer. It can also use type information to make our lives easier: adding a floating-point number and an integer is actually non-trivial, but it’s transparent in Java. Similarly, we can “add” integers to strings to build a new string with the integer inserted.

There are roughly two kinds of types in Java: primitive types, and objects. (Arrays are kinda in between, but are actually objects.)
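Here is a short sketch of what that type information buys us. The class, the method names, and bus number 31 are all made up for illustration.

```java
class TypeDemo {
    // The int is quietly converted to a double before the addition happens.
    static double addMixed(int i, double d) {
        return i + d;
    }

    // "Adding" an int to a String builds a new String containing the number.
    static String label(int busNumber) {
        return "Bus " + busNumber;
    }

    public static void main(String[] args) {
        System.out.println(addMixed(2, 0.5)); // prints 2.5
        System.out.println(label(31));        // prints Bus 31

        // The typechecker rejects nonsense at compile time. Uncommenting
        // the next two lines makes the program fail to compile:
        // boolean b = true;
        // int bad = b + 1;
    }
}
```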

What are the primitive types you know about?

  • byte: 8-bit signed
  • short: 16-bit signed
  • int: 32-bit signed
  • long: 64-bit signed
  • float: 32-bit floating point
  • double: 64-bit floating point
  • boolean: true or false
  • char: 16-bit Unicode character

Whenever you declare a variable of one of these types, Java lays out memory of the correct size and remembers the address, using the value stored in that memory address whenever you reference the variable. We blur the difference sometimes when speaking, but keeping the idea of a variable (a particular location in memory) and a value (in this context, the contents of a memory location) separate is very important.
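You don’t have to take the sizes in that list on faith: the wrapper classes in java.lang each expose a SIZE constant giving the width in bits. A minimal sketch (the class name and the bytesFor helper are my own, not part of any API):

```java
class PrimitiveSizes {
    // Converts a width in bits to a width in bytes (8 bits per byte).
    static int bytesFor(int bits) {
        return bits / 8;
    }

    public static void main(String[] args) {
        System.out.println(Byte.SIZE);      // prints 8
        System.out.println(Short.SIZE);     // prints 16
        System.out.println(Integer.SIZE);   // prints 32
        System.out.println(Long.SIZE);      // prints 64
        System.out.println(Float.SIZE);     // prints 32
        System.out.println(Double.SIZE);    // prints 64
        System.out.println(Character.SIZE); // prints 16

        // The "four bytes for a bus number, eight for a distance" from earlier:
        System.out.println(bytesFor(Integer.SIZE)); // prints 4
        System.out.println(bytesFor(Double.SIZE));  // prints 8
    }
}
```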

A fundamental thing you can do with variables is assign to them. You can assign a literal value, like i = 3; to write a value directly to a memory location. You can assign from one variable to another, like i = j;. But let’s be clear about what’s happening: the computer isn’t “copying j into i,” even though that’s how you might say it. It’s looking up the value stored at the address of j, then storing that value in the address of i. This is more clear if you think about the result of a computation, like “i = j + k;”.

The primitive types support various kinds of computation using “operators”, which are built into the language (things like addition and subtraction).

What happens in the following code?

int i = 2;
int j;
j = i;
j = j + 1;

(on board) A memory location to hold an int is allocated, and initialized with the value 2. Another (different!) memory location is allocated. The value from the first location (i) is looked up; it’s then copied into the second location (j). Then, the value from the second location (j) is looked up, incremented by 1, and written back into that location.
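The same four lines, wrapped in a hypothetical class so you can run them and check the walkthrough; the comments restate each step in terms of memory locations.

```java
class AssignmentDemo {
    static String trace() {
        int i = 2;     // one memory location, holding the value 2
        int j;         // a second, separate location (nothing assigned yet)
        j = i;         // look up the value in i (2) and store it in j
        j = j + 1;     // look up the value in j (2), add 1, store 3 back in j
        return "i=" + i + " j=" + j;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // prints i=2 j=3
    }
}
```

Notice that i is still 2 at the end: changing j never touches i’s memory location.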

In-class exercise

Primitive value assignment

int i = 5;
int j = i;
j = j + 2;
i = i + j;

What are the values of i and j at the end of this code?

More administrivia

My office hours are Monday mornings, 9:30–11:30, in CS 318.

Two TAs: Liam Rothschild-Shea (office hours: Tuesday and Friday, 12–1, LGRT 220) and Ben Kushigian (office hours: Thursday, 10:30–12:30, LGRT 220). We’ll announce UCA office hours once they are final. For those of you who are new to college, office hours are typically drop-in (no appointment necessary), first-come, first-served chances to come talk to a professor or TA about anything in the course.

Usually, students come with specific questions, but general questions about course material are fine too. Come see us if something from lecture or an assignment is unclear, please! But don’t expect to just open your laptop and say, “my program doesn’t work, can you help me find the problem?” Be ready to tell us what you’ve tried and why you’re stuck.

Please use Piazza for questions! You would be shocked (or maybe not) to know that many students will ask the same question; it saves everyone time if you check Piazza to see if your question has already been asked and answered. And it saves us time if we only have to answer one question. Thus, we’re able to answer more questions, which everyone likes.

There will also almost certainly be ExSEL sessions, which are run in the LRC, located in the tall tower library. Look into this if you need extra help in the course.

Please register your iClicker on Moodle. There is a block on the right (by default) that you can use to do so. It’s OK if you haven’t yet; Moodle will backfill the data for you once you do.

Now back to computer science.

Arrays

Arrays are a built-in form of “container” type – the first (non-primitive) data structure you likely learned about, and possibly the first reference type. In particular, arrays are a linear sequence of values, all of the same type. In our examples today, they’ll be primitive types, but arrays can hold objects (actually references to objects) as well, as we’ll see. An array type is denoted with the [] suffix after a type. For example, an array of ints might be declared as int[] busNumbers;

(on board) The array type doesn’t tell you how many of the thing are in the array. This information exists only once the array is instantiated – the memory allocated for it: int[] busNumbers = new int[5]; What’s happening here? First, new int[5] creates a new space in memory for five integers, one after another. Then, the address of this memory space is stored in busNumbers – arrays are a reference type, so they hold a reference (or address) to the thing, not the thing itself.

If you want to access a particular piece of the array, you must address it correctly, indexing from zero; the array indexing operator ([]) looks up the memory address of the array, then finds the offset to the value you are interested in.

(on board) So, for example, busNumbers[1] = busNumbers[3]; means, in English, to look up the memory address of the array, then look up the value stored in the third (starting from zero, which we call the “zero-eth”) slot of the array, and store it in the first slot of the array.

Arrays also support some useful methods and properties, such as .length.

You can have arrays of arrays; for example, int[][] sudoku is an array of arrays of ints. This kind of two (or more) dimensional array is sometimes a more intuitive way to represent a problem than a one-dimensional array.

In-class exercise
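Putting the array pieces together, here is a sketch you can run. The class name, the alias variable, and the bus numbers 31 and 45 are made up for illustration; the point is that assigning one array variable to another copies the reference, not the contents.

```java
import java.util.Arrays;

class ArrayDemo {
    static int[] demo() {
        int[] busNumbers = new int[5];  // five ints in a row, all initially 0
        busNumbers[3] = 31;
        busNumbers[1] = busNumbers[3];  // copy the value in slot 3 into slot 1

        int[] alias = busNumbers;       // copies the reference, not the five ints
        alias[0] = 45;                  // so this change shows up in busNumbers too
        return busNumbers;
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println(result.length);           // prints 5
        System.out.println(Arrays.toString(result)); // prints [45, 31, 0, 31, 0]

        int[][] sudoku = new int[9][9];              // an array of 9 arrays of 9 ints
        System.out.println(sudoku.length + " x " + sudoku[0].length); // prints 9 x 9
    }
}
```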

Reference types

What’s the most true about the two variables int busNumber and int[] busNumbers?

Methods and scope

Nothing lasts forever. Variables (the names, not the values) only live for as long as they are “in scope.” Suppose we have two methods:

int add(int i, int j) {
  return i + j;
}

void print() {
  System.out.println(i + " " + j);
}

Will this compile? No. Why not? Because i and j are not defined within the print() method – in other words, they are not in scope. Scope means the portion of the program where a variable (again, not necessarily the value) is valid.

Within methods, any parameters are in scope for the entire method. A variable that’s declared inline:

void aMethod() {
  ...some stuff...
  int x = 12;
  ... some more stuff
}

is only valid from where it’s declared until the end of its current block – which as you may recall from 121 is marked by the closing curly brace } that matches the block’s opening {.

Variables declared by some constructs (such as for loops) are only valid within the body of those constructs.
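A small sketch of all three scope rules in one method (the class and method names are hypothetical). The commented-out line is the interesting one: uncomment it and the compiler will complain.

```java
class ScopeDemo {
    static int sumTo(int n) {            // n is in scope for the entire method
        int total = 0;                   // in scope from here to the method's closing brace
        for (int k = 1; k <= n; k++) {   // k is in scope only inside the loop
            total = total + k;
        }
        // System.out.println(k);        // would not compile: k is out of scope here
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(4)); // prints 10
    }
}
```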

The stack and the heap

When the JVM is executing some code, for example:

int compute(int x) {
  return doubled(x);
}

int doubled(int x) {
  return 2 * x;
}

How does it keep track of which x is which, and how do values move around? I’m going to simplify somewhat here, but essentially there are two regions of memory used to store values (and whose sections are named by variables): the stack and the heap.

As the JVM executes the code, it goes line-by-line, expression-by-expression, evaluating each expression and performing each statement. If someone somewhere called compute(3), the first thing that would happen is the JVM would lay out some memory in the stack for the result of the computation, and for each of the parameters, and for any variables that are in-scope for the whole method (none here, it turns out). Then the parameters would be copied into their spaces. Then the method would start.

Next, doubled(x) would be called. “On top” of the stack, more memory would be laid out, for the return value and for x, which would be copied in. Then doubled would execute, and copy the result into the right spot on the stack. Then control would return to compute, which would copy the value off the top of the stack and into the right spot. And so on. Notice that variables and values are automatically removed/reclaimed on the stack, and that we need only look at the top of the stack to find the current variable and value it holds.

“But Marc,” you might be thinking, “can a value that’s not the return value exist after a method ends?” Yes, those live on the heap, and the JVM is responsible for managing them dynamically via its garbage collection system. We’ll talk about this (again, at a high level) later.
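One way to convince yourself that the two x’s really are separate spots on the stack is the sketch below. It’s a lightly modified version of the example above (I changed compute to return a String and had doubled reassign its own x, purely for illustration).

```java
class StackDemo {
    static String compute(int x) {
        int result = doubled(x);   // the value of x is copied into doubled's own x
        // This method's x is untouched, even though doubled reassigned its copy.
        return "x=" + x + " doubled=" + result;
    }

    static int doubled(int x) {
        x = 2 * x;                 // changes only the x belonging to this call
        return x;
    }

    public static void main(String[] args) {
        System.out.println(compute(3)); // prints x=3 doubled=6
    }
}
```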
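As a preview, here’s a hedged sketch of a value outliving the method that created it. The names are made up, and the comments use the same simplified stack-and-heap picture as above rather than describing exactly what any particular JVM does.

```java
class HeapDemo {
    static int[] makeRoute() {
        int[] stops = new int[3];  // the array itself is created on the heap
        stops[0] = 7;
        return stops;              // only the reference is handed back to the caller
    }

    public static void main(String[] args) {
        int[] route = makeRoute(); // makeRoute has ended, but its array lives on
        System.out.println(route[0]);     // prints 7
        System.out.println(route.length); // prints 3
    }
}
```

The variable stops disappears when makeRoute returns; the array it referred to sticks around for as long as something (here, route) still refers to it.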

Stack allocation

double quadratic(double a, double b, double x) {
  double firstTerm = 0.0;
  double result = 0.0;

  firstTerm = a * Math.pow(x, 2); // first computation
  result = firstTerm + b; // second computation
  return result;
}

How many doubles worth of space are allocated on the stack in just this method, including the return value?

Note our use of the Math class and its static method pow to square x.

More administrivia

First, some words about assignments and grading.

There will be:

  • in-class exercises, like what we just did. These serve to give you a self-check of material we’re covering, as well as give me a sense of how much of the class is following what I’m doing. These are graded pass/fail, and you should bring paper and a writing implement to each class to complete them. You will also do exercises in some discussion sections.
  • written assignments (“homework”), which are short (<30 minute) worksheets due at the start of each class, handed in electronically. They’re short, but take them seriously, since they do contribute to your grade nontrivially. We may also do some of these electronically via Moodle, but (of course) we’re having technical difficulties, so maybe not.
  • programming assignments, where you’ll be asked to engage in the practice of programming, and will be able to submit your work to an online grader for immediate feedback.
  • about seven quizzes and a final exam. Some discussion section meetings will have quizzes (about one every other week) which will be written to be taken in 25 minutes (but I’ll give you the full amount of time), and there will be a final exam. You cannot pass the course unless you pass the final.

Several things to note:

Discussion and lecture attendance is not optional! Absences will only be excused with written documentation. But I will drop your lowest two in-class exercise grades.

Assignments (written or programming assignments) have a due date, clearly marked on the course web site. Late assignments will not be accepted. Requests for extensions need to be made at least a day in advance. If you want to request an extension after a due date, I will expect a reasonable and well-documented excuse.

Of the above, you can collaborate on homework and in-class exercises. Even though programming assignments are take-home, you are expected to work on them alone (any exceptions will be clearly noted). 187 routinely starts the academic dishonesty process with something like 15% of its enrolled students, who apparently never believe us when we tell them we check for cheating on the programming assignments. Please help us stop this mayhem!

End-of-class reminders

Read the pages on the course web site titled Syllabus.

Assignments and their due dates will go up on the web site as they become available. This includes the first programming assignment and the homework assignment, both of which are now available!

Suggested reading and lecture notes will also be posted to the course web site.

Please ask questions on Piazza (and check to make sure the question hasn’t already been asked). We’ll get office hours scheduled soon.