

Please read ALL the instructions carefully, especially those on the main homework page.   

Read the whole assignment before you begin.


For this assignment, you may use a Python program or a JupyterLab notebook. 

(0) Completion of Labs and Reading

  • If you have not yet completed the in-class work or the weekly reading, then you may want to finish that first. Recent lecture notes on Canvas may also be useful... 

(1) Check out the repository:


Use this link to accept the assignment and create your repository on GitHub:   https://classroom.github.com/a/NKjGQ23O

After you accept the assignment and the repository exists in your GitHub account, clone it into your working area on Rivanna.

Processing a data file (reading and writing)

For this example, we will examine the full Iris data set mentioned in class.

In this data set, there are 5 columns of information (attributes):

1. sepal length in cm

2. sepal width in cm

3. petal length in cm

4. petal width in cm

5. class:

-- Iris Setosa

-- Iris Versicolour

-- Iris Virginica




1) 5 Points: Write a program (or notebook) iris_parse.py (or .ipynb) that performs the following actions:

  • reads the data file, iris.data, one line at a time. Note that it is in your repository. 

  • writes each line to one of three files, depending on the class: "Setosa.out", "Versicolour.out", and "Virginica.out". These files only need to include the 4 numbers. Don't include the class name in the output file; it would be repetitive and would complicate the next step.
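The two steps above can be sketched as follows. This is only a minimal outline, not a reference solution. It assumes that each line of iris.data holds five comma-separated fields with the class label last, and that the labels are spelled "Iris-setosa", "Iris-versicolor", and "Iris-virginica" as in the UCI copy of the data set; check both assumptions against the file in your repository. The function name split_by_class and the OUTPUT_NAMES table are hypothetical names chosen for illustration.

```python
from pathlib import Path

# Assumed label spellings (UCI style) mapped to the required output files.
# Verify these against the iris.data file in your repository.
OUTPUT_NAMES = {
    "Iris-setosa": "Setosa.out",
    "Iris-versicolor": "Versicolour.out",
    "Iris-virginica": "Virginica.out",
}


def split_by_class(data_path, out_dir="."):
    """Read data_path one line at a time and write the 4 numbers per class.

    Returns a dict with the number of lines written for each class label.
    """
    out_dir = Path(out_dir)
    handles = {label: open(out_dir / name, "w")
               for label, name in OUTPUT_NAMES.items()}
    counts = {label: 0 for label in OUTPUT_NAMES}
    try:
        with open(data_path) as infile:
            for line in infile:
                fields = line.strip().split(",")
                if len(fields) != 5:
                    continue  # skip blank or malformed lines
                *numbers, label = fields
                if label not in handles:
                    continue  # skip labels we do not recognise
                # Write only the 4 numbers, separated by spaces.
                handles[label].write(" ".join(numbers) + "\n")
                counts[label] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Calling split_by_class("iris.data") from the repository directory would create the three output files there. Writing the numbers separated by spaces is a design choice that lets np.loadtxt read the files later with its default settings.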


2) 4 Points: Write a second program (or notebook) iris_loadtxt.py (or .ipynb) that reads each of these files into a NumPy array using the function np.loadtxt (you will have 3 NumPy arrays, each with 4 columns of values). Use these columns with NumPy functions to print summary statistics to the screen. Make sure it is clear what you are printing to the screen.
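One possible shape for this step is sketched below. It assumes the .out files hold four whitespace-separated numbers per line, which is what np.loadtxt expects by default. The helper names summarize and print_summary, and the choice of mean, standard deviation, minimum, and maximum as the statistics, are illustrative assumptions; the assignment does not prescribe which statistics to report.

```python
import numpy as np

COLUMNS = ["sepal length", "sepal width", "petal length", "petal width"]


def summarize(path):
    """Load one class file and return per-column summary statistics."""
    # ndmin=2 keeps the array 2-D even if the file has a single row.
    data = np.loadtxt(path, ndmin=2)
    return {
        "mean": data.mean(axis=0),
        "std": data.std(axis=0),  # population SD; pass ddof=1 for sample SD
        "min": data.min(axis=0),
        "max": data.max(axis=0),
    }


def print_summary(name, stats):
    """Print a labelled block of statistics for one iris class."""
    print(f"Summary statistics for {name} (all values in cm)")
    for i, column in enumerate(COLUMNS):
        print(f"  {column:<12}  "
              f"mean = {stats['mean'][i]:.3f}  "
              f"std = {stats['std'][i]:.3f}  "
              f"min = {stats['min'][i]:.3f}  "
              f"max = {stats['max'][i]:.3f}")
```

For example, print_summary("Iris Setosa", summarize("Setosa.out")) would print one labelled line per attribute, so the reader can tell which number belongs to which column.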


2b) Want to be an A student?  Yes? Then, try this part! (1 point)

Make a table of summary statistics for each type of iris: the average and standard deviation of the 4 attributes for each flower class. Put in the extra effort to make your table look nice and easy to read and understand.

Your output might look something like this:


Class               sepal length    sepal width    petal length    petal width
Iris Setosa         Avg +- SD       ...
Iris Versicolour    ...
Iris Virginica


Make sure your output is well-organized and easy to read, then write it to a file called summary.txt. Formatting matters!
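A fixed-width layout is one way to keep the columns aligned. The sketch below assumes you already have the per-class means and standard deviations as sequences of four numbers. The names format_table and write_table, the 18-character column width, and the two-decimal precision are illustrative choices, not requirements of the assignment.

```python
COLUMNS = ["sepal length", "sepal width", "petal length", "petal width"]
WIDTH = 18  # assumed column width; adjust to taste


def format_table(stats_by_class):
    """Build an aligned text table of "avg +- sd" cells.

    stats_by_class maps a class name to a (means, stds) pair,
    each holding one value per attribute.
    """
    header = f"{'Class':<{WIDTH}}" + "".join(f"{c:>{WIDTH}}" for c in COLUMNS)
    rows = [header, "-" * len(header)]
    for name, (means, stds) in stats_by_class.items():
        cells = ""
        for mean, std in zip(means, stds):
            cell = f"{mean:.2f} +- {std:.2f}"
            cells += f"{cell:>{WIDTH}}"  # right-align each cell under its header
        rows.append(f"{name:<{WIDTH}}" + cells)
    return "\n".join(rows) + "\n"


def write_table(stats_by_class, path="summary.txt"):
    """Write the formatted table to a text file."""
    with open(path, "w") as outfile:
        outfile.write(format_table(stats_by_class))
```

Building the table as a string first makes it easy to print it to the screen for a quick check before writing the same text to summary.txt.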


Upload the two programs and outputs to your repository: Setosa.out, Versicolour.out, Virginica.out, summary.txt, iris_parse.py, and iris_loadtxt.py (or iris_parse.ipynb and iris_loadtxt.ipynb if you use a Notebook).

Start your work early, so you can get assistance if needed.
