DATA100-L15: Cross Validation, Regularization

Cross Validation

The holdout method

/datal15/image.png

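import numpy as np
from sklearn.utils import shuffle
# Shuffle the rows, then split them 80% / 20% into a training set and a validation (dev) set.
training_set, dev_set = np.split(shuffle(data), [int(.8*len(data))])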

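Compare the validation error with the training error for each candidate model, and select the best model (typically the one with the lowest validation error).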
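
As a minimal sketch of this comparison, the snippet below fits polynomial models of increasing degree on a holdout split and records both errors. The synthetic data, the range of degrees, and all variable names are assumptions made for illustration; they are not from the lecture.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy quadratic.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200).reshape(-1, 1)
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 0] ** 2 + rng.normal(0, 1, size=200)

# Holdout split: shuffle the row indices, keep 80% for training, 20% for validation.
idx = rng.permutation(len(x))
cut = int(0.8 * len(x))
train_idx, dev_idx = idx[:cut], idx[cut:]

# Fit one model per polynomial degree and record (training MSE, validation MSE).
errors = {}
for degree in range(1, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x[train_idx], y[train_idx])
    train_mse = mean_squared_error(y[train_idx], model.predict(x[train_idx]))
    dev_mse = mean_squared_error(y[dev_idx], model.predict(x[dev_idx]))
    errors[degree] = (train_mse, dev_mse)

# Select the degree with the lowest validation error.
best_degree = min(errors, key=lambda d: errors[d][1])
print(best_degree, errors[best_degree])
```

The training error can only go down as the degree grows, so it is the validation error that signals when extra complexity stops helping.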

K-fold cross validation

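/datal15/image-1.png

The holdout method corresponds to evaluating a single train/validation split. K-fold cross validation repeats this once per fold, so that each fold serves as the validation set exactly once, and averages the K validation errors.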
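
The procedure can be sketched with scikit-learn's `KFold`. The synthetic data, the choice of 5 folds, and the linear model are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic regression data (an assumption for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.5, size=100)

# Split the rows into 5 folds; each fold is used once as the validation set.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_errors = []
for train_idx, val_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_errors.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

# The cross-validation error is the average of the per-fold validation errors.
cv_error = float(np.mean(fold_errors))
print(fold_errors, cv_error)
```

Compared with a single holdout split, every row is used for validation once, at the cost of fitting the model K times.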

Test sets

A test set provides an unbiased estimate of the model's performance on new, unseen data, as long as it is never used for training or for model selection.

/datal15/image-2.png
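
One way to obtain separate training, validation, and test sets is to split twice. This is a sketch under assumed proportions (60% / 20% / 20%) with a hypothetical 50-row array standing in for real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 50 rows, 2 columns.
data = np.arange(100).reshape(50, 2)

# First set aside 20% of the rows as the test set.
train_dev, test = train_test_split(data, test_size=0.2, random_state=0)

# Then split the remainder into training and validation sets
# (25% of the remaining 80% is 20% of the original data).
train, dev = train_test_split(train_dev, test_size=0.25, random_state=0)

print(len(train), len(dev), len(test))
```

The test set is touched only once, after the model and its hyperparameters have been chosen using the training and validation sets.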

Regularization

L2 regularization (Ridge)

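/datal15/image-3.png

The smaller the ball, the simpler the model.

/datal15/image-4.png

Following the idea of Lagrange multipliers, the constraint can be moved into the objective as a penalty term: the larger $\alpha$ is, the stronger the constraint and the simpler the model.

/datal15/image-5.png

This is ridge regression.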
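
The effect of $\alpha$ can be checked numerically. The sketch below assumes scikit-learn's form of the ridge objective, $\lVert y - X\theta \rVert_2^2 + \alpha \lVert \theta \rVert_2^2$ with no intercept, which may be scaled differently from the formula in the lecture slides. The data and the list of $\alpha$ values are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(2)
X = rng.normal(size=(80, 5))
y = X @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(0, 1.0, size=80)

# Fit ridge regression for increasing alpha and record the L2 norm of the coefficients.
alphas = [0.01, 1.0, 100.0, 10000.0]
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha, fit_intercept=False).fit(X, y)
    norms.append(float(np.linalg.norm(model.coef_)))
print(norms)

# Closed-form solution of the same objective: (X^T X + alpha I)^{-1} X^T y.
alpha = 1.0
theta_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
theta_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_
print(np.allclose(theta_closed, theta_sklearn, atol=1e-6))
```

The coefficient norm shrinks as $\alpha$ grows, which is the "smaller ball" in the constrained picture.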

Scaling data for regularization

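Standardize the data so that all features are on the same scale. The penalty treats every coefficient equally, so without scaling, features measured in small units (which need large coefficients) are penalized more heavily.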
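
A common way to do this in scikit-learn is to put a `StandardScaler` in front of the regularized model. The two features below, on deliberately different scales, and the choice `alpha=1.0` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two hypothetical features on very different scales (standard deviations 1 and 1000).
rng = np.random.default_rng(3)
X = np.column_stack([rng.normal(0, 1, 200), rng.normal(0, 1000, 200)])
y = 2.0 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.1, 200)

# Standardizing gives every column mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))

# In a pipeline, the scaler is fit on the training data and applied before the ridge step.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
coefs = model.named_steps["ridge"].coef_
print(coefs)
```

After scaling, the two coefficients are directly comparable, and the penalty no longer depends on the units each feature was recorded in.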

L1 regularization (Lasso)

/datal15/image-6.png
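
A well-known property of the L1 penalty is that it can set some coefficients exactly to zero, which acts as a form of feature selection. The sketch below illustrates this with scikit-learn's `Lasso`, whose objective is $\frac{1}{2n}\lVert y - X\theta \rVert_2^2 + \alpha \lVert \theta \rVert_1$. The synthetic data, in which only the first two features matter, and the value `alpha=0.5` are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption): only the first two of six features affect y.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 6))
true_theta = np.array([4.0, -3.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_theta + rng.normal(0, 0.5, size=200)

# Fit lasso and count how many coefficients are exactly zero.
model = Lasso(alpha=0.5).fit(X, y)
n_zero = int(np.sum(model.coef_ == 0.0))
print(model.coef_, n_zero)
```

Ridge regression, by contrast, shrinks coefficients toward zero but generally does not make them exactly zero.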

Summary

/datal15/image-7.png