See the FAQ that will be updated on Friday June 12th!
Final Exam Due Tuesday, Saturday. July 13th by 3:15!
Late exams will be penalized by 20% per day! Please turn it in on time!
Rules:
- Issues: If there are technical issues on Rivanna please let Prof. Group know ASAP via Piazza AND email so he can follow up quickly.
- Individual Work: All students must work individually; no collaboration is allowed. Last year two students worked together and it was obvious. Their course grades were severely reduced. Don't cheat! It will be obvious if your solution is similar to someone else's.
- Allowed Resources: Class texts, class notes, class web page, Collab page (including solutions), and web links provided directly from the class web pages. Official documentation of any modules used in class: NumPy, pandas, matplotlib, scikit-learn, etc. Cite your sources for any methods, functions, etc. that come from resources not taught in class.
- Questions: You can post questions on Piazza, but responses will be limited to conceptual assistance or clarifications. Pose all questions by Thursday June 11 at midnight.
- Help: No more office hours. Post questions to Piazza only. If you are unsure whether your question gives anything away, make it private.
- Narrative: Your descriptions/justifications are important for these problems. In addition to your code, you need to provide a narrative that includes your response to each question. Add your code and narration using the Jupyter Notebook included in the GitHub repository, and provide descriptions accompanying your solutions where needed. There are many questions inline as well; answer those in the document. Additional discussions of results and issues you discovered are also allowed and encouraged.
- GitHub: Since the assignment is in GitHub, I strongly recommend that you make commits early and often! You don't want to lose your work or break something that was working.
- Pledge your notebook.
- Watch for the final FAQ email on July 12th covering appropriate questions with any necessary clarifications. After this, there will be no more discussion during the exam period.
General Tips and Advice:
Don't jump into coding before you have a good idea about how you want to write your code. Plan your work carefully.
Your code should produce comprehensible, useful output that is neither too abundant nor too sparse.
Submit documented partial solutions if you don't have full solutions to the problems - we'll give partial credit for reasonable yet incomplete work. Explain how you would solve the rest.
An amazing image of deep space from the Sloan Digital Sky Survey
Classification of Astrophysical Objects (100 points total - 10 points per part)
Background
For your final exam problem, we are going to study some astronomy data from the Sloan Digital Sky Survey. I told you the project would be out of this world!
The Sloan Digital Sky Survey (SDSS) has created the most detailed three-dimensional maps of the Universe ever made, with deep multi-color images of one-third of the sky, and spectra for more than three million astronomical objects. You can learn more about all phases and surveys of the SDSS (past, present, and future) at https://www.sdss.org/
At the very least, you should watch this short (3 min) video to learn about SDSS:
For our final project, we are going to work with a reduced Sloan dataset that I obtained here:
https://www.kaggle.com/datasets/lucidlenn/sloan-digital-sky-survey
In the dataset, there are data from three classes of astrophysical objects: stars, galaxies, and quasars (QSO).
The reduced SkyServer dataset consists of 10,000 observations of space taken by the SDSS. Every observation is described by 6 feature columns and 1 class column that identifies it as a star, galaxy, or quasar.
The feature labels are u, g, r, i, z, and redshift. The first five (u, g, r, i, z) represent the response of the 5 bands of the telescope. Each band selects different colors (wavelengths) of light to get a different view of the cosmos. That is, these are different filters, used in the Sloan Digital Sky Survey to focus on light of different wavelengths. This shows how the bands are split by wavelength:
The 6th feature is "redshift", which involves the Doppler Effect. This is what you hear when an emergency vehicle passes by: the siren sounds high-pitched as it approaches, and then lower-pitched after it passes you. While this effect is familiar from sound, it holds for light as well! When objects move away from us the light wave gets "stretched out". Thus the wavelength appears shifted towards red on the visible spectrum, so we say objects that are moving away from us are "red-shifted".
So redshift is a measure of how much the wavelength has been shifted. Because our Universe is expanding (the Big Bang!), almost all objects that we see in the night sky are moving away from us, and thus are red-shifted! One thing we can measure by looking at the light from distant objects is how much it has been red-shifted. It is observed that the farther away an object is from us, the more it is red-shifted. In other words, things that are farther away are moving away from us faster! This is the famous Hubble's Law!
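If it helps to see the idea in numbers, here is a minimal sketch. It is not part of the assignment and does not use the dataset. It assumes the standard definition of redshift as the fractional stretch of the wavelength, and it uses a round, commonly quoted value of 70 km/s/Mpc for the Hubble constant. The function names and the example wavelengths are made up for illustration, and the velocity estimate only holds for small redshifts.

```python
# Illustrative sketch only; the numbers below are assumptions, not dataset values.
C_KM_S = 299_792.458  # speed of light in km/s
H0 = 70.0             # assumed Hubble constant in km/s/Mpc (round illustrative value)


def redshift(observed_nm: float, rest_nm: float) -> float:
    """Fractional stretch of the wavelength: z = (observed - rest) / rest."""
    return (observed_nm - rest_nm) / rest_nm


def hubble_distance_mpc(z: float) -> float:
    """Rough distance from Hubble's Law (v = H0 * d), using v ~ c * z.

    The v ~ c * z approximation is only reasonable for small z.
    """
    velocity_km_s = C_KM_S * z
    return velocity_km_s / H0


# Hypothetical example: a spectral line emitted at 500 nm but observed at 525 nm.
z = redshift(525.0, 500.0)
print(f"z = {z:.3f}")
print(f"recession velocity ~ {C_KM_S * z:.0f} km/s")
print(f"distance ~ {hubble_distance_mpc(z):.0f} Mpc")
```

A larger z means a more stretched wavelength, a faster recession, and (by Hubble's Law) a more distant object.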
Anyway, you don't need to understand any of that; you just need to know that redshift is a feature that is measured for each sample.
Assignment
Ok, now that you understand a little bit about the dataset that we are working with, we can get started!
Unless specified otherwise, you may use any tool we learned about to accomplish the tasks below. For example, in part a) you can use a simple readlines() command and then loop over the lines in the file to calculate the requested quantities, or you can use a more advanced tool from NumPy or pandas if you are aware of one. Use whatever you are most comfortable with to get the job done.
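To make the readlines() approach concrete, here is a rough sketch of the pattern, not a solution to any part. It assumes a comma-separated file with a header row and a column literally named "class". The column names, the three sample rows, and the count_classes helper are all made up so the example runs on its own; check the real file's layout before adapting it.

```python
import os
import tempfile
from collections import Counter

# Hypothetical stand-in for the data file: the header names and the three
# rows below are invented for illustration and are not taken from the dataset.
SAMPLE = """u,g,r,i,z,redshift,class
19.47,17.04,15.95,15.50,15.23,-0.000009,STAR
18.66,17.21,16.68,16.51,16.39,0.123111,GALAXY
19.03,18.82,18.66,18.60,18.57,1.424659,QSO
"""


def count_classes(path: str) -> Counter:
    """Count observations per class using readlines() and a plain loop."""
    with open(path) as handle:
        lines = handle.readlines()
    header = lines[0].strip().split(",")
    class_index = header.index("class")  # assumes a column named "class"
    counts = Counter()
    for line in lines[1:]:
        fields = line.strip().split(",")
        if len(fields) != len(header):
            continue  # skip blank or malformed lines
        counts[fields[class_index]] += 1
    return counts


# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write(SAMPLE)
    sample_path = tmp.name

class_counts = count_classes(sample_path)
os.remove(sample_path)
print(class_counts)
```

If you prefer pandas, something like pandas.read_csv(path)["class"].value_counts() should give the same kind of tally in one line, under the same assumption about the column name.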
You may accept the assignment here and create your repository:
https://classroom.github.com/a/4_BpnEWk
Good luck! Don't panic - just do your best.