DATA100-disc02
Discussion 2: Pandas Practice
We will begin our discussion of Pandas. You will practice:
- Selecting columns
- Filtering with boolean conditions
- Counting with
value_counts
|
|
Pandas Practise
In the first Pandas question, we will be working with the elections
dataset from lecture.
|
|
Year | Candidate | Party | Popular vote | Result | % | |
---|---|---|---|---|---|---|
0 | 1824 | Andrew Jackson | Democratic-Republican | 151271 | loss | 57.210122 |
1 | 1824 | John Quincy Adams | Democratic-Republican | 113142 | win | 42.789878 |
2 | 1828 | Andrew Jackson | Democratic | 642806 | win | 56.203927 |
3 | 1828 | John Quincy Adams | National Republican | 500897 | loss | 43.796073 |
4 | 1832 | Andrew Jackson | Democratic | 702735 | win | 54.574789 |
Question 5
We want to select the “Popular vote” column as a pd.Series
. Which of the following lines of code will error?
elections['Popular vote']
elections.iloc['Popular vote']
elections.loc['Popular vote']
elections.loc[:, 'Popular vote']
elections.iloc[:, 'Popular vote']
Run each line in the cell below and see for yourself!
|
|
0 151271
1 113142
2 642806
3 500897
4 702735
...
173 62984828
174 732273
175 4489235
176 65853514
177 1457226
Name: Popular vote, Length: 178, dtype: int64
Question 6
Write one line of Pandas code that returns a pd.DataFrame
that only contains election results from the 1900s.
|
|
Year | Candidate | Party | Popular vote | Result | % | |
---|---|---|---|---|---|---|
54 | 1900 | John G. Woolley | Prohibition | 210864 | loss | 1.526821 |
55 | 1900 | William Jennings Bryan | Democratic | 6370932 | loss | 46.130540 |
56 | 1900 | William McKinley | Republican | 7228864 | win | 52.342640 |
57 | 1904 | Alton B. Parker | Democratic | 5083880 | loss | 37.685116 |
58 | 1904 | Eugene V. Debs | Socialist | 402810 | loss | 2.985897 |
... | ... | ... | ... | ... | ... | ... |
146 | 1996 | Harry Browne | Libertarian | 485759 | loss | 0.505198 |
147 | 1996 | Howard Phillips | Taxpayers | 184656 | loss | 0.192045 |
148 | 1996 | John Hagelin | Natural Law | 113670 | loss | 0.118219 |
149 | 1996 | Ralph Nader | Green | 685297 | loss | 0.712721 |
150 | 1996 | Ross Perot | Reform | 8085294 | loss | 8.408844 |
97 rows × 6 columns
Question 7
Write one line of Pandas code that returns a pd.Series
, where the index is the Party, and the values are how many times that party won an election. Hint: use value_counts
.
|
|
Party
Democratic 46
Republican 40
Prohibition 11
Libertarian 11
Socialist 10
Independent 6
Whig 6
Green 6
Progressive 4
Populist 3
Constitution 3
American Independent 3
American 2
National Republican 2
Democratic-Republican 2
Reform 2
Free Soil 2
Anti-Masonic 1
National Union 1
Constitutional Union 1
National Democratic 1
Union Labor 1
Greenback 1
Anti-Monopoly 1
Liberal Republican 1
Southern Democratic 1
Northern Democratic 1
Farmer–Labor 1
Dixiecrat 1
States' Rights 1
Communist 1
Union 1
Taxpayers 1
New Alliance 1
Citizens 1
Natural Law 1
Name: count, dtype: int64
Grading Assistance (Bonus)
Fernando is writing a grading script to compute grades for students in Data 101. Recall that many factors go into computing a student’s final grade, including homework, discussion, exams, and labs. In this question, we will help Fernando compute the homework grades for all students using a DataFrame, hw_grades
, provided by Gradescope.
The Pandas DataFrame hw_grades
contains homework grades for all students for all homework assignments, with one row for each combination of student and homework assignment. Any assignments that are incomplete are denoted by NaN (missing) values, and any late assignments are denoted by a True boolean value in the Late column. You may assume that the names of students are unique. Below is a sample of hw_grades
.
|
|
Name | Assignment | Grade | Late | |
---|---|---|---|---|
28 | Sid | Homework 9 | 82.517998 | True |
11 | Ash | Homework 2 | 78.264844 | True |
10 | Ash | Homework 1 | 98.421049 | False |
41 | Emily | Homework 2 | 62.900313 | False |
2 | Meg | Homework 3 | 89.785619 | False |
Question 8a
Assuming there is a late penalty that causes a 10% grade reduction to the student’s
current score (i.e. a 65% score would become a 65% - 6.5% = 58.5%), write a line
of Pandas code to calculate all the homework grades, including the late penalty if
applicable, and store it in a column named ’LPGrade’
.
|
|
Name | Assignment | Grade | Late | LPGrade | |
---|---|---|---|---|---|
0 | Meg | Homework 1 | NaN | False | NaN |
1 | Meg | Homework 2 | 64.191844 | False | 64.191844 |
2 | Meg | Homework 3 | 89.785619 | False | 89.785619 |
3 | Meg | Homework 4 | 74.420033 | False | 74.420033 |
4 | Meg | Homework 5 | 74.372434 | True | 66.935190 |
Question 8b
Which of the following expressions outputs the students’ names and number of late assignments, from least to greatest number of late assignments?
hw_grades.groupby([’Name’]).sum().sort_values()
hw_grades.groupby([’Name’, ’Late’]).sum().sort_values()
hw_grades.groupby([’Name’]).sum()[’Late’].sort_values()
hw_grades.groupby([’Name’]).sum().sort_values()[’Late’]
|
|
Name
Sid 1
Emily 2
Meg 2
Ash 3
Smith 3
Name: Late, dtype: int64
Question 8c
If each assignment is weighted equally, fill in the blanks below to calculate each student’s overall homework grade, including late penalties for any applicable assignments.
Hint: Recall that incomplete assignments have NaN values. How can we use fillna
to replace these null values?
|
|
|
|
Name
Ash 80.830657
Emily 84.297725
Meg 69.218137
Sid 63.020729
Smith 58.332233
Name: LPGrade, dtype: float64
Question 8d
Of all the homework assignments, which are the most difficult in terms of the median grade? Order by the median grade, from lowest to greatest. Do not consider incomplete assignments or late penalties in this calculation.
Fill in the blanks below to answer this question.
Hint: Recall that incomplete assignments have NaN values. How can we use dropna
to remove these null values?
|
|
|
|
Assignment
Homework 2 64.160918
Homework 10 66.366211
Homework 5 74.372434
Homework 8 76.362904
Homework 4 78.207572
Homework 3 78.348163
Homework 9 82.517998
Homework 6 84.369535
Homework 1 85.473281
Homework 7 92.200688
Name: Grade, dtype: float64