Contents

DATA100-L4: Pandas Ⅱ

lambda function

1
lambda x: x**2

非显式定义函数,可以直接使用lambda表达式来定义一个函数。 This is a lambda function that takes in one argument x and returns the square of x.

再看sort_values()

注意传递key

1
df.sort_values(by='column_name', keys=lambda x: x.str.lower(), ascending=True)

add move modify and so on

sort by length of string

  • approach1: create a new column and add to original df
1
2
3
newdf = df["Name"].str.len() # create a new column with length of each string
df['length'] = newdf
df.sort_values(by='length', ascending=True)

drop column

/datal4/image.png

groupby.agg

never use loops in this class!

/datal4/image-1.png

1
df.groupby('column_name').agg(f)
  • f is a dictionary of functions to apply to each column.
    • The function can be a lambda function or a named function.

groupby

type: pandas.core.groupby.generic.DataFrameGroupBy

filter

1
df.groupby('column_name').filter(lambda x: x['column_name'].mean() > 10)
  • This will return a new DataFrame with only the groups that have a mean value greater than 10. /datal4/image-2.png

multi index 多维索引

using pivot_table()

1
df.pivot_table(index=['column1', 'column2'], columns='column3', values='column4', aggfunc='mean')
  • This will create a multi-index pivot table with the specified index and columns, and the mean of column4 for each group.

using groupby()

1
df.groupby(['column1', 'column2']).agg({'column3': 'mean', 'column4':'sum'})
  • This will group the DataFrame by column1 and column2, and calculate the mean and sum of column3 and column4 for each group.

joining tables

1
left.merge(right, on='column_name', how='inner', lefton='column_name_left', righton='column_name_right')