Outline progress: sheet
SQL I

pros and cons

relational terminology and concepts
  database: set of named relations
  relation (table):
    schema: description ("metadata"); fixed, with unique attribute names and atomic types
    instance: set of data conforming to the schema; changes often and may contain duplicates; a multiset of tuples ("rows")
  attribute (column, field)
  tuple (row, record); I suspect some Python concepts come from here as well

DDL (Data Definition Language)

    CREATE TABLE myTable (
        ID INTEGER,
        myName CHAR(50),
        Age INTEGER,
        Salary FLOAT,
        PRIMARY KEY (ID, myName),
        FOREIGN KEY (ID) REFERENCES myOtherTable(ID),
        FOREIGN KEY (myName) REFERENCES myOtherTable(myName)
    );

    SELECT [DISTINCT] <column expression list>
    FROM <single_table>
    [WHERE <predicate>]

ORDER BY: lexicographic (dictionary) order by default
LIMIT

Aggregation functions
  AVG: average
  COUNT: count the number of rows
  MAX: maximum value
  MIN: minimum value
  SUM: sum of values

    SELECT AVG(Salary)
    FROM myTable;

GROUP BY
HAVING
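To make the clauses above concrete, here is a minimal sketch that runs them against an in-memory SQLite database through Python's built-in `sqlite3` module. The table and column names follow the notes; the sample rows are made up for illustration, and the two FOREIGN KEY constraints are left out because `myOtherTable` is not defined anywhere in these notes.

```python
import sqlite3

# In-memory database; the rows below are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE myTable (
        ID INTEGER,
        myName CHAR(50),
        Age INTEGER,
        Salary FLOAT,
        PRIMARY KEY (ID, myName)
    )
    """
)
rows = [
    (1, "Ann", 30, 100.0),
    (2, "Bob", 30, 200.0),
    (3, "Cat", 40, 300.0),
    (4, "Dan", 40, 500.0),
    (5, "Eve", 50, 700.0),
]
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)", rows)

# Aggregate over the whole table: a single value comes back.
avg_salary = conn.execute("SELECT AVG(Salary) FROM myTable").fetchone()[0]

# GROUP BY yields one output row per Age; HAVING filters groups
# after aggregation (WHERE would filter rows before it).
grouped = conn.execute(
    """
    SELECT Age, COUNT(*), MAX(Salary)
    FROM myTable
    GROUP BY Age
    HAVING COUNT(*) > 1
    ORDER BY Age
    """
).fetchall()

# ORDER BY ... DESC combined with LIMIT keeps only the top rows.
top_two = conn.execute(
    "SELECT myName FROM myTable ORDER BY Salary DESC LIMIT 2"
).fetchall()

print(avg_salary)  # 360.0
print(grouped)     # [(30, 2, 200.0), (40, 2, 500.0)]
print(top_two)     # [('Eve',), ('Dan',)]
```

The Age 50 group disappears from `grouped` because it has only one row, which is the difference between HAVING (a filter on groups) and WHERE (a filter on rows).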
introduction to clustering: no labels at all 😢
K-means clustering: animated demo of the algorithm
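For readers without the animation, the following is a rough pure-Python sketch of the usual K-means loop (assign each point to its nearest center, then move each center to the mean of its points). The helper names `sq_dist`, `mean`, `assign`, `kmeans`, and `best_of` are hypothetical, the data is made up, and picking initial centers by random sampling is only one possible initialization. It also reports the inertia (sum of squared distances to the assigned center) and keeps the lowest-inertia run out of several random restarts, since a single run may stop in a local optimum.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, n_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: random points as initial centers
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:  # centers stopped moving: converged
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, inertia

def best_of(points, k, n_restarts=10):
    """Run K-means from several seeds and keep the lowest-inertia result."""
    runs = [kmeans(points, k, seed=s) for s in range(n_restarts)]
    return min(runs, key=lambda run: run[2])

# Two well-separated blobs (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels, inertia = best_of(points, k=2)
print(labels, inertia)
```

Comparing inertia across restarts is one simple way to judge which clustering is better for a fixed K; it does not guarantee the global optimum.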
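K-Means vs KNN
minimizing inertia
convex?? The loss function is not necessarily convex, so gradient descent has a hard time.
how to see which one is better ❓ Finding the global optimum, however, is very hard.
agglomerative clustering: for a demo, see the link above and the lecture code!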
Similar to the minimum spanning tree from CS61B: at each step, merge the two closest points (clusters), until the stopping condition is met.
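The merge loop can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the distance between two clusters is taken to be the distance between their closest pair of points (single linkage, which is the variant closest in spirit to building a minimum spanning tree), and the stopping condition is reaching a target number of clusters. The function names and the data are hypothetical, not taken from the lecture code.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def single_linkage(c1, c2):
    """Assumed cluster distance: the closest pair of points across the two clusters."""
    return min(sq_dist(p, q) for p in c1 for q in c2)

def agglomerative(points, n_clusters):
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    # Stopping condition (assumption): a target number of clusters.
    while len(clusters) > n_clusters:
        # Find the pair of clusters (i < j) that are closest to each other.
        i, j = min(
            ((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
            key=lambda ab: single_linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        # Merge cluster j into cluster i; deleting j does not shift i since j > i.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Made-up data: two small groups and one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (25.0, 25.0)]
clusters = agglomerative(points, n_clusters=3)
print(clusters)
```

With three clusters requested, the far-away point is never close enough to be merged and ends up alone.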
Outliers: sometimes ignored, or treated as a cluster of their own.
picking K: max S? can S be negative?
summary
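Assuming S here refers to the silhouette score (the notes do not spell this out), a small sketch helps with the second question. For each point, A is the mean distance to the other points in its own cluster and B is the smallest mean distance to the points of any other cluster, with S = (B - A) / max(A, B). The function name `silhouette_scores`, the data, and the convention that a single-point cluster scores 0 are assumptions made for this example; it needs at least two clusters.

```python
import math

def silhouette_scores(points, labels):
    """Per-point silhouette scores; requires at least two distinct labels."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # A: mean distance to the other members (the distance to itself is 0).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # B: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Made-up 1-D data: two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_scores(points, [0, 0, 1, 1])
bad = silhouette_scores(points, [0, 1, 1, 1])  # the point at 1.0 is misassigned
print(good)
print(bad)
```

Under this definition S lies between -1 and 1, and it goes negative when a point is on average closer to another cluster than to its own, as with the misassigned point above. One common heuristic, again under the same assumption, is to pick the K whose clustering has the highest mean S.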