Class lectures: Mondays & Wednesdays 10:30-11:50 in Wean Hall 7500. (Campus Map)

Recitations: Thursdays 5:00-6:20 in Gates Hillman Center 6115

It is hard to imagine anything more fascinating than automated systems that improve their own performance. The study of learning from data is commercially and scientifically important. This course is designed to give a graduate-level student a thorough grounding in the methodologies, technologies, mathematics and algorithms currently needed by people who do research in learning and data mining or who may need to apply learning or data mining techniques to a target problem. The topics of the course draw from classical statistics, from machine learning, from data mining, from Bayesian statistics and from statistical algorithmics.

Students entering the class should have a pre-existing working knowledge of probability, statistics and algorithms, though the class has been designed to allow students with a strong numerate background to catch up and fully participate.

Mailing lists and discussion forum




Homework policy

Important Note: As we often reuse problem set questions from previous years, covered by papers and webpages, we expect the students not to copy, refer to, or look at the solutions in preparing their answers. Since this is a graduate class, we expect students to want to learn and not google for answers. The purpose of problem sets in this class is to help you think about the material, not just give us the right answers. Therefore, please restrict attention to the books mentioned on the webpage when solving problems on the problem set. If you do happen to use other material, it must be acknowledged clearly with a citation on the submitted solution.

Collaboration policy

Homeworks will be done individually: each student must hand in their own answers. In addition, each student must write their own code in the programming part of the assignment. It is acceptable, however, for students to collaborate in figuring out answers and helping each other solve the problems. We will be assuming that, as participants in a graduate course, you will be taking the responsibility to make sure you personally understand the solution to any work arising from such collaboration. You also must indicate on each homework with whom you collaborated. The final project may be completed individually or in teams of two students.

Late homework policy

Homework regrades policy

If you feel that we have made an error in grading your homework, please turn in your homework with a written explanation to Michelle, and we will consider your request. Please note that regrading of a homework may cause your grade to go up or down.


You are expected to complete a term project during the class. This will give you an opportunity to apply machine learning in your own research or to investigate aspects of machine learning that interest you, both practical and theoretical. The project has the following requirements. For the project milestone, students are expected to have roughly half of the project work completed. A short write-up will be required, and we will provide feedback. At the end of the semester, students will have the opportunity to present their work in a poster session. Specific due dates will be announced later during the semester.

Note to people outside CMU

Feel free to use the slides and materials available online here. Please email the instructors with any corrections or improvements. Additional slides and software are available at the Machine Learning textbook homepage and at Andrew Moore's tutorials page.