# Exam

Summary of exam scores:

| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|------|---------|--------|------|---------|------|
| 34.00 | 63.75 | 73.50 | 74.48 | 88.25 | 100.00 |

# Continuous variables

Mostly we've looked at discrete variables:

- binary (true/false)
- categorical (red/blue/green)
- ordinal (d6)

Continuous variables were handled with discretization. Today we'll talk about handling them in their native, continuous domains.

# Consider Naive Bayes

We've already seen how to use the NBC to predict not just labels but (discrete) distributions.

P(class | vars) = alpha * P(vars | class) * P(class)

Each P(var | class) term is its own small distribution (bar graph on board). What if, instead, the domain were continuous? (histogram)

# One approach: fit to a distribution

If the distribution "looks like" a well-known distribution, you can fit this parametric model. (Caution: "looks like" is a tricky one! See Anscombe's quartet.)

Parametric models have a finite number of parameters. E.g., normal (Gaussian) distributions are characterized by mean and variance (mu and sigma); exponential distributions by lambda.

Extensions: mixture models (weighted sums of distributions), etc. Parametric distributions can be "summed" or "multiplied" symbolically. See an ML or stats class for more details. (A Gaussian Naive Bayes sketch appears at the end of these notes.)

# Another approach: estimate density

A nonparametric approach, which is more general but computationally intensive, is to estimate the density of the distribution from the observed data. (To board: sum of mini-distributions.)

Different "kernels" are used: triangles, uniform boxes, normals. The normal kernel has nice mathematical properties, though it is not optimal the way the Epanechnikov kernel is. (Sketch at end of notes.)

# Another approach: linear models

Treat the output as a linear function of one or more variables. In other words, each variable has a coefficient (or weight) that determines its importance.

In the simplest case, a single variable: h(x) = w1 * x + w0.

# Example

Given a set of training data (points on board), finding the optimal weights is called linear regression.

Task: find [w0, w1] such that error is minimized. Often we use the sum of squared errors (L2 loss). (No need to search; this can be done analytically. Sketch at end of notes.)

# To generalize

Can add more variables and corresponding weights. Note that the features need not be linear in the original variable (e.g., a feature x2 could be t^2 or the like). So you could model the position of a ball thrown upward as:

h = w2 * t^2 + w1 * t + w0

which is the linear model

h = w2 * x2 + w1 * x1 + w0, where x1 = t and x2 = t^2.

Can also make each variable (including the prediction variable) a vector.

# Mixing and matching classification and regression

Learn a decision tree on a discretized version of the variables, then prune, then fit a regression on the leaves. This is called a regression tree.

Learn a line to divide categorical values. This is called a linear classifier. (on board) We will return to this later, as it corresponds to perceptron updates.
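
# Sketch: Gaussian Naive Bayes

To make the parametric approach concrete, here is a minimal sketch of Gaussian Naive Bayes: for each (feature, class) pair we fit a normal by estimating mu and sigma from the training data, then combine the per-feature likelihoods with the class prior. The data layout (NumPy arrays `X`, `y`) and the function names `fit_gaussian_nb` / `predict_gaussian_nb` are illustrative, not from the lecture.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Fit one normal per (feature, class) pair plus a class prior.

    Returns {class: (mean_vector, std_vector, prior)}."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.std(axis=0, ddof=1), len(Xc) / len(X))
    return params

def predict_gaussian_nb(params, x):
    """Pick the class maximizing log P(class) + sum_i log P(x_i | class)."""
    def log_posterior(c):
        mu, sigma, prior = params[c]
        # log of the normal pdf for each feature, summed (naive independence)
        log_lik = -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)
        return np.log(prior) + log_lik.sum()
    return max(params, key=log_posterior)

# toy usage: two classes with clearly separated feature means
X = np.array([[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.5, 6.2]])
y = np.array([0, 0, 1, 1])
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, np.array([1.1, 2.1])))   # expect 0
```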
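
# Sketch: kernel density estimation

For the density-estimation approach, a minimal sketch of a kernel density estimate with a normal kernel: each observation contributes a small bump, and the estimate averages them. The bandwidth value, toy data, and function name are assumptions for illustration.

```python
import numpy as np

def kde(samples, xs, bandwidth=1.0):
    """Gaussian-kernel density estimate of `samples`, evaluated at points `xs`.

    Each observation contributes a small normal bump centered on it; the
    estimate is the average of those bumps (the "sum of mini-distributions")."""
    samples = np.asarray(samples, dtype=float)[:, None]   # shape (n, 1)
    xs = np.asarray(xs, dtype=float)[None, :]             # shape (1, m)
    bumps = np.exp(-0.5 * ((xs - samples) / bandwidth) ** 2)
    bumps /= bandwidth * np.sqrt(2.0 * np.pi)
    return bumps.mean(axis=0)                             # shape (m,)

# toy usage: density of a small bimodal sample on a grid
data = [1.0, 1.3, 0.8, 4.0, 4.2, 3.9]
grid = np.linspace(0, 5, 11)
print(kde(data, grid, bandwidth=0.5))
```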
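
# Sketch: analytic linear regression

For the single-variable regression example, a sketch of the analytic least-squares fit using NumPy's `lstsq`; the toy data points are made up for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of h(x) = w1*x + w0, solved analytically (no search)."""
    A = np.column_stack([np.ones_like(x), x])   # design matrix: columns [1, x]
    (w0, w1), *_ = np.linalg.lstsq(A, y, rcond=None)
    return w0, w1

# toy data: points lying roughly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
print(fit_line(x, y))   # roughly (1.0, 2.0)
```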
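
# Sketch: nonlinear features, linear model

And a sketch of the "ball thrown upward" generalization: the quadratic model is fit as an ordinary linear regression over the expanded features 1, t, t^2. The simulated trajectory values are invented for illustration.

```python
import numpy as np

def fit_quadratic(t, h):
    """Fit h = w2*t^2 + w1*t + w0 by treating 1, t, and t^2 as linear features."""
    A = np.column_stack([np.ones_like(t), t, t**2])   # features: x0=1, x1=t, x2=t^2
    (w0, w1, w2), *_ = np.linalg.lstsq(A, h, rcond=None)
    return w0, w1, w2

# simulated heights of a ball thrown upward (coefficients invented for illustration)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0, 9)
h = 1.0 + 10.0 * t - 4.9 * t**2 + rng.normal(0.0, 0.05, t.shape)
print(fit_quadratic(t, h))   # roughly (1.0, 10.0, -4.9)
```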
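
# Sketch: linear classifier at prediction time

Finally, a tiny sketch of what a learned linear classifier does when predicting: label a point by which side of the line w.x + b = 0 it falls on. The weights here are picked by hand for illustration; learning them is the perceptron-update topic we return to later.

```python
import numpy as np

def linear_classify(w, b, X):
    """Label each row of X by which side of the line w.x + b = 0 it falls on."""
    return np.sign(X @ w + b)

# weights chosen by hand for illustration; learning them is the later topic
w, b = np.array([1.0, 1.0]), -3.0
X = np.array([[0.5, 0.5], [2.5, 2.0], [0.2, 1.0], [3.0, 2.5]])
print(linear_classify(w, b, X))   # [-1.  1. -1.  1.]
```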