Contents

DATA100-L19: SQL II and PCA I

SQL II

sql and pandas

how to connect sql to python

1
2
3
4
5
6
7
8
9
import pandas as pd
import sqlalchmey
engine = sqlalchemy.create_engine('sqlite:///mydatabase.db')
connection = engine.connect()

pd.read_sql("""
SELECT * FROM mytable
GROUP BY column1, column2
""", connection)

LIKE and CAST

/datal19/image.png LIKE: search for a pattern in a column

1
2
SELECT * FROM mytable
WHERE column1 LIKE '%value%'

CAST: convert data type /datal19/image-1.png

SQL Joins

/datal19/image-2.png

Cross Join

1
2
3
SELECT *
FROM table1
CROSS JOIN table2

Inner Join

1
2
3
4
SELECT *
FROM table1
INNER JOIN table2
ON table1.column1 = table2.column1
1
2
3
SELECT *
FROM t1, t2
WHERE t1.id = t2.id

/datal19/image-3.png

left/right/full outer join

/datal19/image-4.png

other join conditions

/datal19/image-5.png

PCA

dimensionality and rank of data

/datal19/image-6.png dimensionality <===> rank

two interpretations of matrix multiplication

  • matrices as linear operations
  • ~coordinate transformation /datal19/image-7.png

matrix decomposition and rank

/datal19/image-8.png 尽可能的保留主成分,而舍弃无关的成分 ===> rank

manual matrix decomposition exercise

lin alg有的时候不能分析出真正的“rank",需要 /datal19/image-9.png /datal19/image-10.png

singular value decomposition (high level look)