DATA100-L3: Pandas Ⅰ
DataFrames: a data structure for tabular data

API
Always remember to consult GPT, Google, or the documentation.
indexing and loc/iloc
generate subsets:
-
loc: an operator that selects items by label
df.loc[row_indexer, column_indexer]
row_indexer: a single label, a list of labels, or a slice of labels (closed interval: both endpoints are included)
column_indexer: same as row_indexer
returns a DataFrame or Series
-
iloc: an operator that selects items by integer position
df.iloc[row_indexer, column_indexer]
row_indexer: a single integer position, a list of integer positions, or a slice; slices follow classic Python indexing (start included, end excluded)
column_indexer: same as row_indexer
returns a DataFrame or Series
In most cases, we use loc for indexing.
-
.head(6) and .tail() return the first or last few rows of a DataFrame (syntactic sugar); with no argument, both return 5 rows
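To make the difference between label-based and position-based selection concrete, here is a minimal sketch. The DataFrame, its column names, and its values are made up for illustration and do not come from the lecture.

```python
import pandas as pd

# Hypothetical toy data; names and numbers are invented for this example.
df = pd.DataFrame(
    {
        "name": ["Ada", "Ben", "Cal", "Dee", "Eve"],
        "year": [2019, 2020, 2021, 2022, 2023],
        "score": [88, 92, 79, 95, 85],
    },
    index=["a", "b", "c", "d", "e"],
)

# loc selects by label; a label slice includes BOTH endpoints.
by_label = df.loc["b":"d", ["name", "score"]]  # rows b, c, d

# iloc selects by position; a slice excludes the end, like a Python list.
by_position = df.iloc[1:3, [0, 2]]  # rows at positions 1 and 2 (b, c)

# A single column label gives a Series; a list of labels gives a DataFrame.
one_column = df.loc[:, "score"]
one_column_frame = df.loc[:, ["score"]]

# head(n) and tail(n) are shortcuts for the first and last n rows.
first_two = df.head(2)
last_two = df.tail(2)
```

Note that the same-looking slice returns three rows with loc and two rows with iloc, which is the main pitfall when switching between the two.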
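[ ]: a context-sensitive operator; what it selects depends on what is passed in (a column label, a list of column labels, a row slice, or a boolean array)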
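A short sketch of how [ ] changes behavior with its argument. The data is hypothetical, and the DataFrame uses the default integer index.

```python
import pandas as pd

# Hypothetical toy data with the default RangeIndex (0, 1, 2, 3).
df = pd.DataFrame(
    {
        "name": ["Ada", "Ben", "Cal", "Dee"],
        "score": [88, 92, 79, 95],
    }
)

col = df["score"]              # single column label -> Series
cols = df[["name", "score"]]   # list of column labels -> DataFrame
rows = df[1:3]                 # slice -> rows, end excluded (Ben, Cal)
filtered = df[df["score"] > 80]  # boolean Series -> rows where it is True
```

Because one operator does four different jobs, loc and iloc are usually the clearer choice when the intent is not obvious from the argument.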

Series: a data structure for 1D labeled data
index: an array-like object that labels the rows of a DataFrame (the column labels in df.columns are stored in an Index object as well)
columns: the column labels; two columns usually do not share the same name.
Conversion:
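The notes do not spell out which conversions are meant here. Assuming they refer to moving between a DataFrame, a Series, and an Index, a minimal sketch with made-up data could look like this:

```python
import pandas as pd

# Hypothetical toy data; names and numbers are invented for this example.
df = pd.DataFrame({"name": ["Ada", "Ben", "Cal"], "score": [88, 92, 79]})

# DataFrame -> Series: select a single column.
scores = df["score"]

# Series -> DataFrame: to_frame() wraps it as a one-column DataFrame.
scores_frame = scores.to_frame()

# Column -> index: set_index() moves a column into the row labels.
by_name = df.set_index("name")

# Index -> column: reset_index() moves the row labels back into a column.
restored = by_name.reset_index()

# Series or Index -> plain Python list.
score_list = scores.tolist()
name_list = by_name.index.tolist()
```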

conditional selection
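Conditional selection usually means building a boolean Series (a mask) and passing it to [ ] or loc. The sketch below uses hypothetical data. Conditions are combined with & (and) and | (or), and each one is wrapped in parentheses.

```python
import pandas as pd

# Hypothetical toy data; names and numbers are invented for this example.
df = pd.DataFrame(
    {
        "name": ["Ada", "Ben", "Cal", "Dee", "Eve"],
        "year": [2019, 2020, 2021, 2022, 2023],
        "score": [88, 92, 79, 95, 85],
    }
)

# A comparison on a column produces a boolean Series, one entry per row.
mask = df["score"] > 80

# Passing the mask keeps only the rows where it is True.
high = df[mask]

# loc accepts the same mask and can pick columns at the same time.
high_names = df.loc[mask, ["name"]]

# Combine conditions with & and |, each wrapped in parentheses.
high_and_recent = df[(df["score"] > 80) & (df["year"] >= 2021)]
low_or_first = df[(df["score"] < 80) | (df["year"] == 2019)]

# isin() tests membership in a list of values.
chosen = df[df["name"].isin(["Ada", "Eve"])]
```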

Be type-aware!
Which of the three data structures (DataFrame, Series, Index) are you working with???
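describe()
returns summary statistics (count, mean, std, min, quartiles, max) for the numeric columns.
sample()
returns a random sample of rows (one row by default).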
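A minimal sketch of describe() and sample() on hypothetical data. The random_state argument is used here only so that the sample is reproducible.

```python
import pandas as pd

# Hypothetical toy data; names and numbers are invented for this example.
df = pd.DataFrame(
    {
        "name": ["Ada", "Ben", "Cal", "Dee", "Eve"],
        "score": [88, 92, 79, 95, 85],
    }
)

# describe() summarizes numeric columns by default: count, mean, std, etc.
summary = df.describe()
mean_score = summary.loc["mean", "score"]

# sample(n=...) draws n random rows without replacement.
picked = df.sample(n=2, random_state=0)

# sample(frac=1) returns all rows in random order (a shuffle).
shuffled = df.sample(frac=1, random_state=0)
```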
value_counts()
returns a Series with the count of each unique value in the column, sorted by count in descending order.
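A small sketch of value_counts() on a hypothetical column of state codes:

```python
import pandas as pd

# Hypothetical toy data; the state codes are invented for this example.
df = pd.DataFrame({"state": ["CA", "NY", "CA", "TX", "CA", "NY"]})

# Count how often each value appears; the most frequent value comes first.
counts = df["state"].value_counts()

# normalize=True returns proportions instead of raw counts.
shares = df["state"].value_counts(normalize=True)
```

The result is indexed by the values themselves, so counts["CA"] looks up the count for one value.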
unique()
returns a NumPy array with the unique values in the column, in order of first appearance.
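A small sketch of unique() on a hypothetical integer column. An integer column is used so that the result is a plain NumPy array; other column types may return a different array type depending on the pandas version.

```python
import pandas as pd

# Hypothetical toy data; the years are invented for this example.
df = pd.DataFrame({"year": [2019, 2020, 2019, 2021, 2020]})

# unique() keeps the first occurrence of each value, in order of appearance.
unique_years = df["year"].unique()

# nunique() returns only the number of distinct values.
n_years = df["year"].nunique()
```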
sort_values()
returns a new DataFrame with the rows sorted by the values in a column (ascending by default); the original DataFrame is left unchanged.
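A minimal sketch of sort_values() on hypothetical data, showing ascending, descending, and multi-column sorts:

```python
import pandas as pd

# Hypothetical toy data; names and numbers are invented for this example.
df = pd.DataFrame(
    {
        "name": ["Ada", "Ben", "Cal", "Dee"],
        "year": [2019, 2020, 2019, 2020],
        "score": [88, 92, 79, 95],
    }
)

# Ascending sort by one column (the default direction).
by_score = df.sort_values("score")

# Descending sort.
by_score_desc = df.sort_values("score", ascending=False)

# Sort by year ascending, then by score descending within each year.
by_year_then_score = df.sort_values(["year", "score"], ascending=[True, False])
```

Each call returns a new DataFrame, so df keeps its original row order unless the result is assigned back to it.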
reference
https://www.textbook.ds100.org/ch/a04/ref_pandas.html
https://pandas.pydata.org/pandas-docs/stable/reference/index.html