Run this command from the terminal:
echo 'export SBT_OPTS="-Xmx512m -Xss100m"' >> ~/.profile
This increases the stack size limit for SBT to 100MB and sets the memory limit to 512MB.
For this assignment, you will measure the time it takes to execute typical operations on several represetations of integer-sets: ordered association lists, (unbalanced) binary search trees, and AVL trees. We provide the data structures, but you have to develop the benchmarking framework.
You already know how to implement sets using ordered lists and binary search trees. However, you probably are not familiar with AVL trees. We strongly recommend reading the Sets Appeal chapter from Programming and Programming Languages. It briefly describes how to represent sets using ordered-lists and binary search trees, but spends most of its time discussing AVL trees.
With this assignment, we will cease using the scala220
program and start
using professional build tools: sbt and ScalaTest. These are covered in the
reading for the week.
As discussed in the sbt tutorial, you should create a series of directories that look like this:
./measurement |-- build.sbt `-- src |-- main | `-- scala | `-- implementation goes here `-- test `|-- scala `-- test cases go here
Your build.sbt
file must have exactly these lines:
name := "measurement"
scalaVersion := "2.11.2"
libraryDependencies += "edu.umass.cs" %% "cmpsci220" % "1.1"
libraryDependencies += "org.scalatest" %% "scalatest" % "2.2.1" % "test"
When we grade your work, we will delete your build.sbt
file and use the
one above. So, if you change it, you risk getting a zero.
You should use this template for your solution. You have to fill in several functions. The rest of this page will walk you through them.
Write a higher-order function time(f, a)
that calculates the time
needed to execute f(a)
in milliseconds.
def time[A, B](f : A => B, a : A) : Long
Use the the builtin function System.currentTimeMillis()
, which returns the
current time in milliseconds (since January 1, 1970).
Write a higher-order function averageTime(n, f, a)
that calculates the time
needed to execute f(a)
in milliseconds, averaged over n
executions.
def averageTime[A, B](n: Int, f: A => B, a: A) : Double
An easy way to test these functions is to use Thread.sleep(n)
,
which pauses execution for n
milliseconds. Therefore,
time((n:Int) => Thread.sleep(n), n)
should be approximately
n
.
You’ll be benchmarking these data structures on two types of inputs: ordered keys and randomly generated keys.
Write a recursive function revOrder(n)
that produces the list of numbers
from n
down to 1
:
// Assume that n >= 0
def revOrder(n : Int) : List[Int]
test("revOrder(5)") {
assert(revOrder(5) == List(5, 4, 3, 2, 1))
}
test("revOrder(0)") {
assert(revOrder(0) == Empty())
}
Write a recursive function randomInts(n)
that produces a list of n
randomly
chosen integers.
def randomInts(n : Int) : List[Int]
You will need to use the builtin function util.Random.nextInt()
.
The cmpsci220.hw.measurement
package defines several functions to insert
and lookup elements into ordered lists, binary search trees, and AVL trees.
But, to benchmark these data structures, you need to write functions to
insert/lookup several elements.
You need to write the following six functions (don’t start yet– finish reading this section).
// Map all keys to the provided value
def insertAllOrdList(values: List[Int]): OrdList
def insertAllBST(values: List[Int]): BST
def insertAllAVL(values: List[Int]): AVL
// Returns true only if all keys are mapped to the provided value
def isMemberAllOrdList(values: List[Int], lst: OrdList) : Boolean
def isMemberAllBST(values: List[Int], bst: BST) : Boolean
def isMemberAllAVL(values: List[Int], bst: AVL) : Boolean
Notice that all the insert
functions follow the same pattern and so
do all the isMemberAll
functions.
You actually only need to write the following higher-order functions and use them to define the functions above:
def insertAll[T](empty : T, insert : (Int, T) => T, values: List[Int]): T
def isMemberAll[T](isMember : (Int, T) => Boolean, values: List[Int], container: T): Boolean
If you’re not confident writing these functions immediately, you should first write some of the simpler functions above until the pattern becomes clear.
Once you have insertAll
and isMemberAll
defined, you can benchmark any of
the three data structures by creating a growing sequence of ordered/random keys,
picking a number of trials, and appling insertAll
or isMemberAll
to the list
of integers:
For example, here is some code to benchmark the time needed to insert consecutive values into the three data structures:
/**
* Calculates the average time needed to insert values into the set
*
* @param insertAll a function that inserts a list of values into an empty set
* @param trials the number of trials to run
* @param values the list of values to insert
* @returns A pair (n, t), where n is the number of values and t is the
* average time needed to insert all values, divided by the
* number of values
*/
def timeInsertAll[A](insertAll: List[Int] => A, trials: Int)(values: List[Int]): (Double, Double) = {
val n = length(values).toDouble
val t = averageTime(trials, insertAll, values)
(n, t / n)
}
test("timing insertAllAVL on random input") {
// You may need to tweak this data to suit your computer
val data = map((x: Int) => randomInts(math.pow(2, x + 1).toInt), revOrder(16))
val timing = map(timeInsertAll(insertAllAVL, 5), data)
println(timing)
}
Notice that the test above doesn’t actually check the results. You could enter the printed data into a spreadsheet, plot the graph, and examine the result.
Instead, the cmpsci220.hw.measurement
package has a function to calculate
linear regression, which lets you fit a line to a set of points.
For example, here is are two real tests that try to fit the output a a line and a log-curve:
test("timing insertAllBST on ordered input") {
// You may need to tweak this data to suit your computer
val data = map((x: Int) => revOrder(math.pow(2, x + 1).toInt), revOrder(10))
val timing = map(timeInsertAll(insertAllBST, 5), data)
val line = linearRegression(timing)
assert(line.rSquared >= 0.85)
}
def isNonZeroTime(pt: (Double, Double)): Boolean = pt._2 != 0
test("timing insertAllAVL on ordered input") {
// You may need to tweak this data to suit your computer
val data = map((x: Int) => revOrder(math.pow(2, x + 1).toInt), revOrder(14))
val timing = map(timeInsertAll(insertAllAVL, 5), data)
// Remove points with 0 as the y-coordinate, so that log is defined
val zeroesRemoved = filter(isNonZeroTime, timing)
// Put the X-axis on a log scale, so that we can fit a line
val logScaleX = map((xy: (Double, Double)) => (math.log(xy._1), xy._2), zeroesRemoved)
val line = linearRegression(logScaleX)
assert(line.rSquared >= 0.85)
}
Here is a trivial test suite that simply checks to see if you’ve defined
the Solution
object with the right type:
class TrivialTestSuite extends org.scalatest.FunSuite {
test("The solution object must be defined") {
val obj : cmpsci220.hw.measurement.MeasurementFunctions = Solution
}
}
You should place this test suite in src/test/scala/TrivialTestSuite.scala
.
If this test suite does not run as-is, you risk getting a zero.
First, quit sbt if it running. In the project
sub-directory of your
homework, create a file called plugins.sbt
with exactly
this line:
addSbtPlugin("edu.umass.cs" % "cmpsci220" % "2.2")
This adds a “submit” task to sbt that first ensures that your tests pass and
then creates a file called submission.tar.gz
, which you can upload to
Moodle. E.g., your interaction with sbt might look like this:
> submit
Created submission.tar.gz. Upload this file to Moodle.
[success] Total time: 0 s, completed Sep 29, 2014 1:17:45 PM