3) Use pandas references to figure out how to complete the following tasks with the Iris dataset.  (5 points)  

  1. The Iris dataset is included in your HW10 repository. Load it into a pandas DataFrame and print the first 10 rows to the screen.
  2. How many rows does it contain? How many columns?

  3. Compute the average petal length and print it to the screen. Also, do this for each class.

  4. Compute the average of each numerical column and print it to the screen.

  5. Compute the average of each numerical column for each class of Iris and print it to the screen.

  6. Extract the petal length outliers (defined as those rows whose petal length is more than 2 standard deviations away from the mean petal length for the full set of data). Print these rows to the screen.

  7. Compute the standard deviation of each numerical column, both for the full set of data and for each Iris species.

  8. Extract the petal length outliers for each class (i.e. those rows whose petal length is more than 2 standard deviations away from the mean petal length of their own class of Iris). There are many ways to do this; you may want to explore groupby(), aggregate(), and merge(). Print these rows to the screen.
  9. Investigate seaborn.pairplot and use it to make the pairplot for the Iris dataset. Save the pairplot as Iris.pairplot.png

  10. Want to be an A student? Make the pairplot again, but this time draw the outliers from part 8 in a different color on the off-diagonal scatter plots. Hint: you may need to make some new class types in your pandas DataFrame.
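
As a starting point for the grouped statistics and the two outlier definitions in parts 6 and 8, here is a minimal sketch on a small made-up table. It is not a solution for the Iris file: the column names `species` and `petal_length`, the toy values, and the helper `flag_outliers` are all assumptions chosen for illustration, and the columns in your HW10 file may be named differently. It uses `groupby().transform()`, which is one of several possible routes alongside the `aggregate()` and `merge()` approach mentioned in part 8.

```python
import pandas as pd

# Hypothetical toy data standing in for the real file; names and values are made up.
df = pd.DataFrame({
    "species": ["a"] * 10 + ["b"] * 10,
    "petal_length": [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 1.0, 1.0, 3.0,
                     5.0, 5.1, 4.9, 5.0, 5.1, 4.9, 5.0, 5.0, 5.0, 2.0],
})

# Mean of one column for each class.
class_means = df.groupby("species")["petal_length"].mean()


def flag_outliers(frame, column, by=None, n_std=2.0):
    """Return a boolean Series marking rows more than n_std standard
    deviations from the mean of `column` (hypothetical helper).

    With by=None the mean and std come from the full column; otherwise
    they are computed within each group named by `by`.
    """
    values = frame[column]
    if by is None:
        mean, std = values.mean(), values.std()
    else:
        grouped = frame.groupby(by)[column]
        # transform() returns per-group statistics aligned to the original rows.
        mean, std = grouped.transform("mean"), grouped.transform("std")
    return (values - mean).abs() > n_std * std


global_mask = flag_outliers(df, "petal_length")
class_mask = flag_outliers(df, "petal_length", by="species")

print(class_means)
print(df[global_mask])   # outliers relative to the full set of data
print(df[class_mask])    # outliers relative to each row's own class

# One way to create a new class label for plotting: keep the original
# label for ordinary rows and mark per-class outliers separately.
df["label"] = df["species"].where(~class_mask, "outlier")
```

On this toy table the two definitions disagree: the values 3.0 and 2.0 are unremarkable against the full column but far from the rest of their own class, so only the per-class mask selects them. A column like `label` could then be passed as the `hue` argument of `seaborn.pairplot`.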

Push your results to GitHub: [MC_pi.py], [combine_pi.py], pi_100.png, pi_slurm.sh, and HW07.ipynb.