CMPSCI 187: Programming With Data Structures
============================================

Today's topics
--------------

-   administrivia
-   variables and scope redux
-   data structures: containers, stacks, spec/impl
-   running time analysis

Administrivia
=============

Office hours and help
---------------------

TA office hours and ours are posted. Mostly Wed, with some Mon/Tue
coverage. Remember cs187help@, and you can make appointments to see
me/John.

When asking for help with code, please bring (or attach to email) all
relevant source code.

Assignments
-----------

We won't violate constraints that we promise, nor need you check them.

Reading is hard, but programming is harder. Read the assignments
completely.

A01 will be posted later today. Last one that doesn't require new 187
knowledge.

A00 difficulty
--------------

Not to be a jerk, but if you found the programming hard (as opposed to
Eclipse setup, which was the real new thing in this assignment),
consider dropping. It's only going to get (much) harder.

OTOH, we want you to succeed, but you have to come see us or email us
(in advance of due dates) if you want help! Don't wait until Tue/Wed to
start the assignments.

Variables (primitive and class) redux
=====================================

Variables hold values
---------------------

Vars of primitive types (int, etc.) hold the value.

Vars of class type are references, and hold the *value of the address of
the object in memory*.

Remember, new objects are created with `new` followed by a call to the
class's constructor; the associated variable actually holds the
*address* of (aka a reference to) the object.

Aliasing and equality
---------------------

Vars of primitive type are easy. They only support `==`, and when
assigned (`x = y + 10`), the value is copied from the expression on the
right into the variable on the left.

Vars of reference type require some thought. When assigned, the value is
copied, but *the value is the address, not the object.*

-   `==` still checks for value equality, but here it means "contains
    the same address"; in other words, the two variables refer to
    exactly the same object in memory. Think "identical."
-   `.equals()` is a method on objects that is class-specific. Usually
    means the objects represent the same value (that is, they are
    "equivalent" but not necessarily "identical").

To deal with aliasing, follow the ABDs: Always Be Drawing.

-   For each `new`, make a new object in memory.
-   For each variable declaration, make a new variable in memory.
-   For each assignment `=`, draw an arrow from the variable to the
    object.

You can read your diagram to see what's going on.

Example with airplanes / flight numbers:

``` {.java}
Flight a;
a = new Flight(1234);
// draw

Flight b = new Flight(77);
// draw

System.out.println(a == b);
System.out.println(a.equals(b));

b.setNumber(1234); // draw

b = a; // draw

System.out.println(a == b);
System.out.println(a.equals(b));
```
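The example above assumes a `Flight` class that the notes don't show. A
minimal sketch of what it might look like follows; the field name, the
constructor, and the choice to compare flights by number in `equals()`
are all assumptions made for illustration.

```java
class Flight {
  private int number;  // hypothetical field: the flight number

  Flight(int number) {
    this.number = number;
  }

  void setNumber(int number) {
    this.number = number;
  }

  // Assumption: two flights are "equivalent" when their numbers match.
  @Override
  public boolean equals(Object other) {
    if (!(other instanceof Flight)) {
      return false;
    }
    return this.number == ((Flight) other).number;
  }

  // Keep hashCode consistent with equals.
  @Override
  public int hashCode() {
    return Integer.hashCode(number);
  }
}

class FlightDemo {
  public static void main(String[] args) {
    Flight a = new Flight(1234);
    Flight b = new Flight(77);
    System.out.println(a == b);       // false: two distinct objects
    System.out.println(a.equals(b));  // false: numbers differ

    b.setNumber(1234);
    System.out.println(a == b);       // false: still two objects
    System.out.println(a.equals(b));  // true: equivalent, not identical

    b = a;                            // b now aliases a's object
    System.out.println(a == b);       // true: identical
    System.out.println(a.equals(b));  // true
  }
}
```

Without the `equals()` override, `Flight` would inherit the version from
`Object`, which behaves like `==`.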

Clicker question: equality / aliasing 1
---------------------------------------

Clicker question: equality / aliasing 2
---------------------------------------

Scope (class, instance, method) redux
=====================================

Local variables
---------------

Variables have a scope: a lifetime for which they're valid. Generally,
they live as long as the thing that encloses them.

``` {.java}
int sumAll(int[] values) {
  int sum = 0;
  for (int v : values) {
    sum = sum + v;
  }
  return sum;
}
```

`sum` is a variable that is only in scope inside this method; new space
is allocated for it when the method is called, and it's thrown away when
the method exits. (What about `v`?)

Instance variables
------------------

``` {.java}
class Person {
  private String name;
  ...
}
```

`name` is an instance variable. Each object of class `Person` that is
created gets its own unique copy, which lives as long as the object
does. `name` is visible to methods on that object (and more, depending
upon modifier `private` vs `public` etc.)

Shadowing
---------

If two variables have the same name, the one in the smaller (more local)
scope wins. The outer variable is being *shadowed*. This is usually a
bad programming practice. Notable exception: arguments to a constructor
often shadow instance variables.

Sometimes you can refer to the outer variable (if it's an instance
variable, using `this.`), sometimes you can't.
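A small sketch of shadowing, using a fleshed-out `Person` class. The
constructor and the methods `rename`, `brokenRename`, and `getName` are
made up for illustration; they are not from the assignments.

```java
class Person {
  private String name;

  // The parameter `name` shadows the instance variable `name`.
  Person(String name) {
    this.name = name;  // left side: instance variable; right side: parameter
  }

  // Deliberate bug: both sides refer to the parameter, so the
  // instance variable never changes.
  void brokenRename(String name) {
    name = name;
  }

  // Correct version: `this.` reaches past the shadowing parameter.
  void rename(String name) {
    this.name = name;
  }

  String getName() {
    return name;  // nothing shadows it here, so this is the instance variable
  }
}

class ShadowDemo {
  public static void main(String[] args) {
    Person p = new Person("Ada");
    p.brokenRename("Grace");
    System.out.println(p.getName());  // Ada
    p.rename("Grace");
    System.out.println(p.getName());  // Grace
  }
}
```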

Clicker question: variable scope
--------------------------------

Class variables
---------------

``` {.java}
class Math {
  public static double pi = 3.14;
  ...
}
```

`pi` is a class (or `static`) variable. Only one copy ever exists,
and it can be accessed by anyone (`public`) by referencing the *class*,
not an object: `Math.pi`.

The real `Math.PI` is also `final` -- a constant / immutable value.
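To see "one copy per class" next to "one copy per object", here is a
sketch with a hypothetical `Ticket` class (the class and its members are
invented for this example).

```java
class Ticket {
  // Class variable: a single copy shared by every Ticket.
  private static int issued = 0;

  // Instance variable: each Ticket object gets its own copy.
  private final int id;

  Ticket() {
    issued = issued + 1;  // updates the one shared counter
    this.id = issued;     // remembers this particular ticket's number
  }

  int getId() {
    return id;
  }

  // Class method: called on the class, as in Ticket.getIssued().
  static int getIssued() {
    return issued;
  }
}

class TicketDemo {
  public static void main(String[] args) {
    Ticket first = new Ticket();
    Ticket second = new Ticket();
    System.out.println(first.getId());       // 1
    System.out.println(second.getId());      // 2
    System.out.println(Ticket.getIssued());  // 2
  }
}
```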

Methods
-------

Methods can also be instance or class methods, and are called the same
way as the corresponding variables: instance methods on an object, class
methods on the class. For example, `Integer.toString(int)` is a class
method used to convert an `int` to a `String`, but without creating an
interim `Integer` object.
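A quick sketch contrasting the two kinds of call. The wrapper class
`MethodKinds` and its method names are invented here; the `Integer`
calls are the standard library's.

```java
class MethodKinds {
  // Uses the class (static) method: no Integer object is involved.
  static String viaClassMethod(int n) {
    return Integer.toString(n);
  }

  // Uses the instance method: first make an Integer object, then ask it.
  static String viaInstanceMethod(int n) {
    Integer boxed = Integer.valueOf(n);
    return boxed.toString();
  }

  public static void main(String[] args) {
    System.out.println(viaClassMethod(42));     // 42
    System.out.println(viaInstanceMethod(42));  // 42
  }
}
```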

Clicker question: class members
-------------------------------

Data structures
===============

Containers
----------

A container terminal at a seaport: (on board)

Containers arrive, then are put on trucks/trains.

All containers identical, but have labels about contents.

Between unloading and loading, we have a collection of containers. This
is a useful model for pieces of data, like an array. ("All models are
wrong...")

Collections
-----------

In Java, a *Collection* is a group of objects of the same type:

-   usually, we can insert or remove elements
-   sometimes we organize elements in particular ways (e.g., so that the
    next one removed is "most important")
-   how the Collection is implemented (arranged internally) affects how
    quickly/easily we can perform operations
-   different kinds of collections might have different kinds of
    operations (more later)
-   we'll use generics, so that for type T we have a Collection\<T\> ("a
    Collection with elements of type T")

Container arrangement 101
-------------------------

Put them in a line, add on one end, remove on the other. A queue!

Put them in a pile, then take from the top of the pile. A stack!

Many other things to do, most make more sense in context of computer
memory rather than 5t shipping containers (or baby blocks).

Container arrangement 102
-------------------------

Put the objects into an array. Questions:

-   How can we tell where an empty spot is?
-   How can we tell if the array is full?
-   What if we want to keep the array sorted? Buffers? Full reordering?

Clicker question: runtime
-------------------------

Stacks
======

A Stack and its operations
--------------------------

A Stack\<T\> is a collection of objects of type T.

We can create an empty stack.

We can `push()` an object of type T onto the stack, storing it.

We can check if a stack `isEmpty()`.

If the stack is not empty, we can `pop()` the top-most element off,
removing and returning it.

If the stack is not empty, we can `peek()` at the top-most element, but
not remove it.
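The operations above, tried out on the standard library's
`java.util.Stack`. This is only a sketch of the behavior; the class name
`StackOps` and the method `demo` are made up for the example.

```java
import java.util.Stack;

class StackOps {
  // Runs each operation once and records what it returned.
  static String demo() {
    Stack<String> stack = new Stack<>();  // create an empty stack
    StringBuilder log = new StringBuilder();

    log.append(stack.isEmpty());              // true: nothing pushed yet
    stack.push("a");
    stack.push("b");
    stack.push("c");                          // "c" is now on top
    log.append(' ').append(stack.peek());     // c (still on the stack)
    log.append(' ').append(stack.pop());      // c (removed)
    log.append(' ').append(stack.pop());      // b (removed)
    log.append(' ').append(stack.isEmpty());  // false: "a" remains
    return log.toString();
  }

  public static void main(String[] args) {
    System.out.println(demo());  // true c c b false
  }
}
```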

Specification to implementation
-------------------------------

Once we know what behavior we want (a specification) we can think about
*implementing* the behavior.

Key idea: any user of Stack\<T\> can count on operations behaving
correctly ("obeying their contract") without knowing how they're
implemented. (But efficiency might depend upon the implementation.)

Implementation?
---------------

Key idea implies Stack\<T\> should be an *interface* not a class (since
we only care about which methods exist, not how they work inside).

Java already provides a stack in `java.util.Stack<E>`, though that one
is a class rather than an interface; DJW defines an interface,
`StackInterface<T>`.
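A minimal sketch of the interface/implementation split. `SimpleStack`
and `ArrayListStack` are hypothetical names, not DJW's
`StackInterface<T>` or anything from `java.util`, and throwing
`EmptyStackException` on an empty stack is an assumption about what the
contract should say.

```java
import java.util.ArrayList;
import java.util.EmptyStackException;

// The specification: what a stack can do, not how.
interface SimpleStack<T> {
  void push(T element);
  T pop();
  T peek();
  boolean isEmpty();
}

// One possible implementation: the top of the stack is the end of the list.
class ArrayListStack<T> implements SimpleStack<T> {
  private final ArrayList<T> elements = new ArrayList<>();

  public void push(T element) {
    elements.add(element);
  }

  public T pop() {
    if (isEmpty()) {
      throw new EmptyStackException();  // assumption: empty pop is an error
    }
    return elements.remove(elements.size() - 1);
  }

  public T peek() {
    if (isEmpty()) {
      throw new EmptyStackException();
    }
    return elements.get(elements.size() - 1);
  }

  public boolean isEmpty() {
    return elements.isEmpty();
  }
}
```

Code that declares its variable as `SimpleStack<Integer>` keeps working
if `ArrayListStack` is later swapped for a different implementation.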

Implementing any data structure
-------------------------------

Stacks, queues, lists, graphs, trees, etc., are *conceptual* ways to
organize data.

Actual implementations in Java (objects/classes) are built up of smaller
pieces.

Java provides two basic ways to group data.

Grouping data
-------------

Arrays we've already seen. A linear structure with constant-time access.

Pointers are the other way.

``` {.java}
public class Chain {
  private int contents;
  private Chain next;
  ...
}
```

(LL on board)

A `Chain` contains a value `contents` and a pointer to the next `Chain`,
which contains the same, etc., down to the last which contains nothing
(`null`).

Java pointers are implicit (references) so we often say "contains the
object" rather than "contains a pointer to the object." This can be
confusing, particularly if you're used to explicit pointers (C/C++).
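Here is one way the `Chain` class might be filled in, as a sketch. The
constructor and the `length` and `sum` methods are assumptions added for
illustration; only the two fields come from the notes.

```java
class Chain {
  private int contents;
  private Chain next;  // reference to the next link, or null at the end

  Chain(int contents, Chain next) {
    this.contents = contents;
    this.next = next;
  }

  // Follow the references link by link until we fall off the end.
  int length() {
    int count = 0;
    for (Chain c = this; c != null; c = c.next) {
      count++;
    }
    return count;
  }

  // Same walk, but adding up the contents along the way.
  int sum() {
    int total = 0;
    for (Chain c = this; c != null; c = c.next) {
      total += c.contents;
    }
    return total;
  }
}

class ChainDemo {
  public static void main(String[] args) {
    // Build 1 -> 2 -> 3 -> null, from the back to the front.
    Chain chain = new Chain(1, new Chain(2, new Chain(3, null)));
    System.out.println(chain.length());  // 3
    System.out.println(chain.sum());     // 6
  }
}
```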

Clicker question: chains
------------------------

Multiple implementations
------------------------

We'll see (later) how to implement many data structures using these two
building blocks (arrays or references) and examine the tradeoffs.

DJW's first example that they spend a looooong time on is the
`StringLog` class. You should read this section carefully.

Algorithm Analysis
==================

WTF is it?
----------

Talkin' about the resources (time, space) used by an algorithm as a
function of input size.

Cost may differ based on particular inputs. For this class we're
considering *worst case.*

*Time complexity* is a function with the number of input bits as its
input, and worst case runtime (seconds, or clock cycles, or
instructions, etc.) as output.

Example: sums
-------------

Suppose we want to add the integers 1 through n. We can sum them
one-by-one, or use the closed-form expression.

``` {.java}
public int loopSum(int n) {
  int sum = 0;
  for (int i = 1; i <= n; i++) {
    sum += i;
  }
  return sum;
}

public int algebraicSum(int n) {
  return (n * (n + 1)) / 2;
}
```

Clicker question: loops
-----------------------

Example: Phone book
-------------------

Before the Internet, we had large books mapping names to phone
numbers.

To find a person's phone number, you looked in the book.

One way to do so would be to look at the first name, then the second,
then the third, etc., until you found the right name, then check the
number.

This *linear search* is correct. (Why is it called a linear search?) But
if names are alphabetized you can use this to your advantage.

Phone books and binary search
-----------------------------

Choose a range where the name could be. Check the name in the middle. If
it's right, stop. If not, bisect the range and begin again. (demo)

Binary search takes time proportional to the logarithm of n. This is
awesome, no joke.
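Both searches, sketched over an array of names. The class and method
names are invented for this example, and the binary search assumes the
array is already sorted alphabetically.

```java
class PhoneBookSearch {
  // Linear search: check each name in order. Returns the index, or -1.
  static int linearSearch(String[] names, String target) {
    for (int i = 0; i < names.length; i++) {
      if (names[i].equals(target)) {
        return i;
      }
    }
    return -1;
  }

  // Binary search: assumes sortedNames is in ascending order.
  // Each pass halves the range [low, high] that could hold the target.
  static int binarySearch(String[] sortedNames, String target) {
    int low = 0;
    int high = sortedNames.length - 1;
    while (low <= high) {
      int mid = low + (high - low) / 2;
      int cmp = target.compareTo(sortedNames[mid]);
      if (cmp == 0) {
        return mid;        // found it
      } else if (cmp < 0) {
        high = mid - 1;    // target sorts before the middle name
      } else {
        low = mid + 1;     // target sorts after the middle name
      }
    }
    return -1;             // range is empty: not in the book
  }

  public static void main(String[] args) {
    String[] names = {"Ada", "Bob", "Cy", "Dee", "Eve"};
    System.out.println(linearSearch(names, "Dee"));  // 3
    System.out.println(binarySearch(names, "Dee"));  // 3
    System.out.println(binarySearch(names, "Zed"));  // -1
  }
}
```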

Functions and asymptotic growth
-------------------------------

Consider the two functions:

    f(n) = 100n
    g(n) = 0.01n^2

Which is "bigger"? For small values of n, f is larger. But for a big
enough n (and forever after) g would be larger.

Key point: in any polynomial, the term with the largest degree will
eventually be bigger than any other term. So in an asymptotic sense,
*only the biggest term matters*.
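To find where g overtakes f, we can just search for it. In this sketch
both functions are multiplied by 100 so everything stays in exact
integer arithmetic; the class and method names are made up for the
example.

```java
class Crossover {
  // 100 * f(n), where f(n) = 100n
  static long scaledF(long n) {
    return 10000L * n;
  }

  // 100 * g(n), where g(n) = 0.01n^2
  static long scaledG(long n) {
    return n * n;
  }

  // Smallest n >= 1 at which g(n) is strictly larger than f(n).
  static long firstNWhereGExceedsF() {
    long n = 1;
    while (scaledG(n) <= scaledF(n)) {
      n++;
    }
    return n;
  }

  public static void main(String[] args) {
    System.out.println(firstNWhereGExceedsF());  // 10001
  }
}
```

The two functions are equal at n = 10000, and g stays ahead from 10001
onward.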

Clicker question: largest value
-------------------------------

Classes of growth functions
---------------------------

*Constant* functions are often written as O(1).

O(f) means "grows proportionally with f" or more exactly "is bounded
above by something that grows proportionately with f".

Others include logarithmic, n log n, n\^2, n\^3, 2\^n. DJW has a table
p45.

Growth Behavior
---------------

Consider doubling input size (n -\> 2n). How does it affect runtime?

-   constant function O(1)? No change.
-   linear function n? -\> 2n; runtime doubles.
-   quadratic function n\^2? -\> 4n\^2; runtime quadruples.
-   exponential function 2\^n? -\> 2\^(2n); it *squares*.

Looking at it another way. Imagine you speed up your algorithm by a
factor of 10.

-   linear runtime algorithm that used to handle n can now handle 10n in
    the same time. (10x speedup)
-   quadratic runtime can handle about 3x as much (sqrt 10)
-   exponential algorithm can handle three or four more inputs (log\_2
    10); speedup barely matters
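The 10x speedup claims can be checked with a little arithmetic. This
sketch assumes the cost of an input of size n is exactly n, n\^2, or
2\^n steps, and asks for the largest n that fits in a step budget; the
class and method names are invented for illustration.

```java
class SpeedupDemo {
  // Largest n with n <= budget.
  static long maxLinear(long budget) {
    return budget;
  }

  // Largest n with n^2 <= budget.
  static long maxQuadratic(long budget) {
    long n = 0;
    while ((n + 1) * (n + 1) <= budget) {
      n++;
    }
    return n;
  }

  // Largest n with 2^n <= budget (assumes budget >= 1).
  static long maxExponential(long budget) {
    long n = 0;
    long nextCost = 2;  // cost of size n + 1, i.e. 2^(n+1)
    while (nextCost <= budget) {
      n++;
      nextCost *= 2;
    }
    return n;
  }

  public static void main(String[] args) {
    long slow = 1_000_000L;   // steps available before the speedup
    long fast = 10_000_000L;  // 10x as many steps in the same time
    System.out.println(maxLinear(slow) + " -> " + maxLinear(fast));            // 1000000 -> 10000000
    System.out.println(maxQuadratic(slow) + " -> " + maxQuadratic(fast));      // 1000 -> 3162
    System.out.println(maxExponential(slow) + " -> " + maxExponential(fast));  // 19 -> 23
  }
}
```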

Determining runtime from code
-----------------------------

Seeing something is constant is usually easy. Make sure the behavior
does not depend upon input size. (Note: Finding the exact constant is
usually hard.)

``` {.java}
for (int i=0; i<n; i++) { 
  whatever();
}
```

`whatever()` runs at most n times. Other steps happen, but are constant
for each run through loop (so we don't care for O() analysis).

O() arithmetic is non-intuitive:

    O(n) * O(1) + O(n) * O(1) =
    O(n) + O(n) =
    O(n)

(more details later and in 311)

Nested loops
------------

``` {.java}
for (int i=0; i<n; i++) {
  for (int j=0; j<n; j++) {
    whatever();
  }
}
```

How long does the `j` loop take? O(n)

How long does the `i` loop take? O(n\^2)

What about if we change the `j` loop to `j < i`? Still O(n\^2). `j` loop
runs 0, 1, ... n-1 times; which is not constant, but is bounded by n.
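Counting the calls directly makes the two cases concrete. In this sketch
a counter increment stands in for `whatever()`; the class and method
names are made up for the example.

```java
class LoopCounts {
  // Inner loop always runs n times: n * n calls in total.
  static long countSquare(int n) {
    long calls = 0;
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        calls++;  // stands in for whatever()
      }
    }
    return calls;
  }

  // Inner loop runs i times: 0 + 1 + ... + (n - 1) = n(n - 1)/2 calls.
  static long countTriangle(int n) {
    long calls = 0;
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < i; j++) {
        calls++;  // stands in for whatever()
      }
    }
    return calls;
  }

  public static void main(String[] args) {
    System.out.println(countSquare(100));    // 10000
    System.out.println(countTriangle(100));  // 4950
  }
}
```

n(n - 1)/2 is about half of n\^2, and a constant factor of one half
disappears in O() notation.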

Clicker question: runtime
-------------------------

Administrivia
=============

Reminders for next week
-----------------------

Discussion is graded; next week there will be a worksheet to complete
and hand in.

A01 will be posted later today; download it and get started soon.

If, after looking at A01, you have no idea what to do, consider
dropping.

Start reading Chapter 2 (through the end of 2.4, at least).