Lec9-Normalization, Dropout, + Implementation

HHZZ published on 2024-09-28 included in CMU-10-414-714

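Normalization and Regularization
Normalization and Initialization: look at the weight variance curve, it barely changes. Where the idea of norm comes from. layer normalization. batch normalization. Seen this way, batch_norm really is strange, odd! 😢
Regularization: L2 Regularization. Is it aimed at overfitting? In fact, any operation that shrinks the function class is a form of regularization, and it turns out that weight decay and L2 regularization are connected!
dropout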
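The link between weight decay and L2 regularization mentioned in this excerpt can be checked numerically. The following is a minimal sketch, assuming plain SGD and a penalty of the form (lam / 2) * ||w||^2; the function names, the learning rate, and the sample values are illustrative and do not come from the lecture code.

```python
def sgd_l2_step(w, grad, lr, lam):
    """One SGD step on loss + (lam / 2) * ||w||^2.

    The penalty contributes lam * w to the gradient of each weight.
    """
    return [wi - lr * (gi + lam * wi) for wi, gi in zip(w, grad)]


def sgd_weight_decay_step(w, grad, lr, lam):
    """Shrink each weight by (1 - lr * lam), then take a plain gradient step."""
    return [(1 - lr * lam) * wi - lr * gi for wi, gi in zip(w, grad)]


# Illustrative values only
w = [0.5, -1.0, 2.0]
grad = [0.1, 0.2, -0.3]

a = sgd_l2_step(w, grad, lr=0.1, lam=0.01)
b = sgd_weight_decay_step(w, grad, lr=0.1, lam=0.01)

# Both updates agree up to floating-point rounding
same = all(abs(x - y) < 1e-12 for x, y in zip(a, b))
print(same)
```

The two updates are the same expression rearranged, which is why the regularizer can live in the optimizer instead of the loss. The equivalence shown here is for plain SGD only; it is not claimed for adaptive optimizers.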

Lec8-NN Library Implementation

HHZZ published on 2024-09-28 included in CMU-10-414-714

Neural Networks lib implementation

refreshment

```python
import needle as ndl
```

```python
def data(self):
    # Return a tensor detached from the computational graph
    return self.detach()
```

data does not carry grad.

numerical stability

Softmax is invariant to shifting its input: the numerator and denominator are divided by the same factor, so the result does not change.

```python
import numpy as np

def softmax(x):
    # Subtract the max so that np.exp never overflows
    x = x - np.max(x)
    z = np.exp(x)
    return z / np.sum(z)
```

nn.Module

Parameters

```python
class Parameter(ndl.Tensor):
    def __init__(self, data: np.ndarray, requires_grad=True, dtype="float32"):
        super().__init__(data, requires_grad=requires_grad, dtype=dtype)

w = Parameter([2, 1], dtype="float32")
isinstance(w, Parameter)  # True
```

# recursive function to get all parameters
def _get_params(value: ndl.
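To see why the shift inside softmax matters, here is a small sketch comparing a naive version with the shifted one. It uses only the standard library on plain Python lists so it runs without needle or NumPy; `softmax_naive`, `softmax_stable`, and the sample inputs are illustrative names and values, not part of the lecture code.

```python
import math


def softmax_naive(x):
    """Direct definition: exp(x_i) / sum_j exp(x_j)."""
    z = [math.exp(v) for v in x]
    s = sum(z)
    return [v / s for v in z]


def softmax_stable(x):
    """Same function, but shift by max(x) first so every exponent is <= 0."""
    m = max(x)
    z = [math.exp(v - m) for v in x]
    s = sum(z)
    return [v / s for v in z]


small = [1.0, 2.0, 3.0]
large = [1000.0, 1001.0, 1002.0]  # same values shifted by 999

# On small inputs the two versions agree
agree_small = all(
    abs(a - b) < 1e-12
    for a, b in zip(softmax_naive(small), softmax_stable(small))
)

# On large inputs math.exp overflows in the naive version
try:
    softmax_naive(large)
    naive_survived = True
except OverflowError:
    naive_survived = False

# The stable version gives the same answer for both inputs
shift_invariant = all(
    abs(a - b) < 1e-12
    for a, b in zip(softmax_stable(small), softmax_stable(large))
)
```

With NumPy arrays the naive version would produce `inf` and `nan` instead of raising, but the cause is the same.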

Lec7-Neural Network Library Abstractions

HHZZ published on 2024-09-28 included in CMU-10-414-714

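Neural Networks Abstraction
Programming Abstraction: the core idea is that the host language is one language, while the computational graph can be executed and optimized in another. This is somewhat similar to SQL and an RDBMS 🤔
declarative: this is probably the more natural idea, from Google's "scalable computational systems". Describe the graph ==> specify the machine to run on ==> run ==> result.
imperative: define by run. Friendly to fused operators; specifying particular values gives the same effect as the declarative approach above.
High level modular lib components: the classic sandwich. The loss function is a special case of a "module". Regularization: either part of the loss function or part of the optimizer. Initialization: contained in nn.Module.
Summary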
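The difference between the declarative and imperative styles in this excerpt can be illustrated with a toy example. This is only a sketch: `Node`, `placeholder`, and `run` are hypothetical helpers written for this illustration and do not correspond to the API of needle or any real framework.

```python
class Node:
    """A symbolic graph node: nothing is computed until run() is called."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = inputs
        self.name = name

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def placeholder(name):
    """An input whose value is supplied only at run time."""
    return Node("placeholder", name=name)


def run(node, feed):
    """Evaluate the graph recursively, reading placeholder values from feed."""
    if node.op == "placeholder":
        return feed[node.name]
    left, right = (run(i, feed) for i in node.inputs)
    return left + right if node.op == "add" else left * right


# Declarative: describe the whole graph first, then execute it
x, y = placeholder("x"), placeholder("y")
graph = x * y + x
declarative_result = run(graph, {"x": 3.0, "y": 4.0})

# Imperative: every line executes as soon as it is written
xv, yv = 3.0, 4.0
imperative_result = xv * yv + xv
```

In the declarative version the full graph exists before any value is computed, so a runtime could in principle rewrite or fuse it before running. In the imperative version intermediate values are available immediately, which makes ordinary host-language control flow and debugging easier.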