logistic regression model, continued

sklearn demo: go see the lecture code!
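The lecture code isn't reproduced here, so as a stand-in, a minimal sketch of fitting sklearn's `LogisticRegression` (the dataset and all names are made up for illustration, not from lecture):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# toy binary-classification data (made up for illustration)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LogisticRegression()  # uses L2 regularization by default
model.fit(X_train, y_train)
print(model.predict_proba(X_test[:5]))  # predicted class probabilities
```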
MLE: high-level, detailed (recorded)

linear separability and regularization

Linear separability: if there exists a hyperplane that can split the dataset into two parts, the data are linearly separable. The hyperplane lives in the same space as the data (in a d-dimensional feature space it has dimension d−1). Pay attention to what "push" means here: on linearly separable data, MLE keeps pushing the weights toward infinity, since larger weights always drive the loss lower.
Another angle on regularization: it is there to stop the optimal parameters from diverging (the infinite-theta situation above), so we add a regularization term to the loss up front, which penalizes exactly the parameters that would make that happen.
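A quick sketch of this effect (my own illustration, not from lecture): in sklearn, `C` is the inverse of the regularization strength, so larger `C` means weaker regularization and larger fitted weights on separable data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# perfectly separable 1-D data (made up for illustration)
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])

# smaller C = stronger regularization = smaller weights;
# as C grows, the weight grows toward the unregularized
# "infinite theta" MLE solution
for C in [0.1, 1.0, 1000.0]:
    model = LogisticRegression(C=C).fit(X, y)
    print(f"C={C:>7}: theta = {model.coef_[0][0]:.2f}")
```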
performance metrics

accuracy

```python
# using sklearn
model.score(X_test, y_test)
```

imbalanced data, precision, recall

Accuracy is not a good metric for imbalanced data, use precision and recall instead!!!

$$
\text{acc} = \frac{TP+TN}{n} \qquad
\text{precision} = \frac{TP}{TP+FP} \qquad
\text{recall} = \frac{TP}{TP+FN}
$$

adjusting the classification threshold

a case study: changing the threshold is often motivated by imbalanced data.
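As a sketch (reusing `model`, `X_test`, `y_test` from above; sklearn's metric functions are real, the threshold value is arbitrary):

```python
from sklearn.metrics import precision_score, recall_score

y_pred = model.predict(X_test)          # default 0.5 threshold
print(precision_score(y_test, y_pred))  # TP / (TP + FP)
print(recall_score(y_test, y_pred))     # TP / (TP + FN)

# adjusting the classification threshold:
# predict class 1 only when P(y=1 | x) exceeds a custom cutoff
threshold = 0.3
y_pred_custom = (model.predict_proba(X_test)[:, 1] >= threshold).astype(int)
print(recall_score(y_test, y_pred_custom))  # lower threshold -> higher recall
```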
SQL II

SQL and pandas

How to connect SQL to Python:
```python
import pandas as pd
import sqlalchemy

# create an engine pointing at a local SQLite database, then open a connection
engine = sqlalchemy.create_engine('sqlite:///mydatabase.db')
connection = engine.connect()

# run a query and get the result back as a DataFrame
pd.read_sql("""
    SELECT *
    FROM mytable
    GROUP BY column1, column2
""", connection)
```

LIKE and CAST

LIKE: search for a pattern in a column
```sql
SELECT *
FROM mytable
WHERE column1 LIKE '%value%'
```

CAST: convert a value to another data type

SQL Joins

Cross Join

```sql
SELECT *
FROM table1
CROSS JOIN table2
```

Inner Join

```sql
SELECT *
FROM table1
INNER JOIN table2
    ON table1.column1 = table2.column1  -- placeholder column names for the join key
```
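As a minimal sketch of CAST (table and column names are made up; reusing the `connection` from above):

```python
# CAST converts a column's type inside the query, e.g. TEXT -> INTEGER
pd.read_sql("""
    SELECT CAST(column1 AS INTEGER) AS column1_int
    FROM mytable
""", connection)
```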
recap and Goals

approximate factorization

$W,\ L \rightarrow (W+L)/2$: lowering the rank makes us lose information.

So $M_{100 \times 4} = N_{100 \times P} \times Q_{P \times 4}$, and $P$ should preferably not be smaller than the rank of the original matrix, otherwise the factorization can only approximate $M$.
singular value decomposition (SVD)

low rank approximation: not bad! seems good!
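A minimal numpy sketch of the idea (the 100×4 shape matches the notes; the data and its rank are made up):

```python
import numpy as np

# a 100x4 matrix of rank 2: the last two columns are
# linear combinations of the first two
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 2))
M = np.column_stack([A[:, 0], A[:, 1],
                     A[:, 0] + A[:, 1],
                     (A[:, 0] + A[:, 1]) / 2])

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# keep the top P singular values: M ≈ N @ Q with N (100 x P), Q (P x 4)
P = 2
N = U[:, :P] * s[:P]  # 100 x P
Q = Vt[:P, :]         # P x 4
print(np.allclose(M, N @ Q))  # True: P equals the rank, so nothing is lost
```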
SVD theory

Verifying the orthonormal set: V @ V.T = I. Multiplying by an orthonormal matrix is essentially a rotation: it preserves lengths and does not stretch anything.
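A quick check in numpy (continuing from the SVD sketch above):

```python
# the rows of Vt are the right singular vectors and form an orthonormal set
V = Vt.T
print(np.allclose(V @ V.T, np.eye(V.shape[0])))  # True

# multiplying by an orthonormal matrix preserves lengths: rotation, no stretching
x = rng.normal(size=V.shape[0])
print(np.allclose(np.linalg.norm(V @ x), np.linalg.norm(x)))  # True
```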
Principal Components

Zero-center the data first, then look at PCA.

Principal Components and Variance

PCA example

Why is it useful? 🤔
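A minimal sketch of PCA via SVD of the centered data (reusing `M` from above; the variance formula is the standard one):

```python
# zero-center the data, then take the SVD
M_centered = M - M.mean(axis=0)
U, s, Vt = np.linalg.svd(M_centered, full_matrices=False)

# principal components: projections of the centered data onto the right singular vectors
pcs = M_centered @ Vt.T  # equivalently U * s

# fraction of total variance captured by each component
variance_ratio = s**2 / np.sum(s**2)
print(variance_ratio)  # the first couple of PCs capture nearly all the variance
```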