Class lectures: Mondays & Wednesdays 10:30-11:50 in Wean Hall 7500. (Campus Map)
Recitations: Thursdays 5:00-6:20 in Gates Hillman Center 6115
It is hard to imagine anything more fascinating than automated systems that improve their own performance. The study of learning from data is commercially and scientifically important. This course is designed to give a graduate-level student a thorough grounding in the methodologies, technologies, mathematics and algorithms currently needed by people who do research in learning and data mining or who may need to apply learning or data mining techniques to a target problem. The topics of the course draw from classical statistics, from machine learning, from data mining, from Bayesian statistics and from statistical algorithmics.
Students entering the class should have a preexisting working knowledge of probability, statistics and algorithms, though the class has been designed to allow students with a strong numerate background to catch up and fully participate.
Mailing lists and discussion forum
 Class announcements will be broadcast using a group email list: 10701announce@cs.cmu.edu
 If you are registered for the course, you have automatically been added to the mail group. If you are for some reason NOT receiving these announcements, you can subscribe via the 10701announce list page.
 For changes (incl. additions or removal) to your membership in the course list, please make changes directly via the list administration page.
 Discussion about the homeworks, projects and lectures will take place in a Google group called 10701F09. On the group home page you can change membership information and read archived messages.
 We have set up a Google calendar which contains due dates, exams, office hours and recitations (note: due dates are subject to change): use this link or the calendar ID: ert23qflcke4pc9j6ei0ns5u68@group.calendar.google.com
Textbooks
 Textbook: Pattern Recognition and Machine Learning, Chris Bishop.
 Secondary textbook: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Trevor Hastie, Robert Tibshirani, Jerome Friedman. 2nd edition.
 Optional textbook: Machine Learning, Tom Mitchell.
 Optional textbook: Information Theory, Inference, and Learning Algorithms, David MacKay.
Grading
 Midterms (15%)
 Homeworks (5 assignments, 35%)
 Final project (25%)
 Final exam (25%)
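
As a rough illustration of how the weights above combine, the sketch below computes an overall score as a weighted sum. The names `WEIGHTS` and `course_score` are hypothetical, and the sketch assumes each component is scored on a 0 to 100 scale; the course staff's actual grade computation (curving, rounding, letter-grade cutoffs) is not specified here.

```python
# Component weights taken from the grading breakdown above.
# The dictionary keys are illustrative names, not official ones.
WEIGHTS = {
    "midterms": 0.15,
    "homeworks": 0.35,
    "final_project": 0.25,
    "final_exam": 0.25,
}


def course_score(scores):
    """Return the weighted sum of component scores (each assumed to be 0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Example with made-up component scores.
example = {
    "midterms": 80,
    "homeworks": 90,
    "final_project": 85,
    "final_exam": 70,
}
overall = course_score(example)
print(round(overall, 2))
```

With these made-up scores the weighted sum is 0.15 x 80 + 0.35 x 90 + 0.25 x 85 + 0.25 x 70, which comes to about 82.25.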
Auditing

If you are a student, and you don't want to take the class for
credit, you must register to audit the class. To satisfy the
auditing requirement, you must either:
 Do *two* homeworks, and get at least 75% of the points in each; or
 Take the final, and get at least 50% of the points; or
 Do a class project and do *one* homework, and get at least 75% of the points in the homework.
 Like any class project, it must address a topic related to machine learning and you must have started the project while taking this class (can't be something you did last semester). You will need to submit a project proposal with everyone else, and present a poster with everyone. You don't need to submit a milestone or final paper. You must get at least 80% on the poster presentation part of the project.
 Please send us an email saying that you will be auditing the class and what you plan to do.
 If you are not a student and want to sit in the class, please get authorization from the instructor.
Homework policy
Important Note: As we often reuse problem set questions from previous years, covered by papers and webpages, we expect the students not to copy, refer to, or look at the solutions in preparing their answers. Since this is a graduate class, we expect students to want to learn and not google for answers. The purpose of problem sets in this class is to help you think about the material, not just give us the right answers. Therefore, please restrict attention to the books mentioned on the webpage when solving problems on the problem set. If you do happen to use other material, it must be acknowledged clearly with a citation on the submitted solution.
Collaboration policy
Homeworks will be done individually: each student must hand in their own answers. In addition, each student must write their own code in the programming part of the assignment. It is acceptable, however, for students to collaborate in figuring out answers and helping each other solve the problems. We will be assuming that, as participants in a graduate course, you will be taking the responsibility to make sure you personally understand the solution to any work arising from such collaboration. You also must indicate on each homework with whom you collaborated. The final project may be completed individually or in teams of two students.
Late homework policy
 Homeworks are due at the beginning of class, unless otherwise specified.

You will be allowed 3 total late days without penalty for the entire
semester. For instance, you may be late by 1 day on three different
homeworks or late by 3 days on one homework. Each late day
corresponds to 24 hours or part thereof. Once those days are used,
you will be penalized according to the policy below:
 Homework is worth full credit at the beginning of class on the due date.
 It is worth half credit for the next 48 hours.
 It is worth zero credit after that.
 You must turn in all of the 5 homeworks, even if for zero credit, in order to pass the course.
 Turn in all late homework assignments to Michelle.
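
The late-day arithmetic above can be sketched in a few lines. This is only one possible reading of the policy, not an official calculator: it assumes that a late submission is covered by free late days only if enough of them remain to cover the whole delay, and that otherwise no late days are spent and the half-credit and zero-credit schedule is measured from the original due time. The function names `late_days_needed` and `credit_fraction` are hypothetical. When in doubt, the course staff's interpretation applies.

```python
import math
from typing import Tuple

FREE_LATE_DAYS = 3  # total free late days per semester, from the policy above


def late_days_needed(hours_late: float) -> int:
    """Each late day covers 24 hours or part thereof, so round up."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def credit_fraction(hours_late: float, late_days_left: int) -> Tuple[float, int]:
    """Return (fraction of credit, late days left afterwards).

    Assumption: late days are spent only when they cover the whole delay;
    otherwise the penalty schedule applies from the original due time.
    """
    needed = late_days_needed(hours_late)
    if needed <= late_days_left:
        return 1.0, late_days_left - needed  # on time, or covered by late days
    if hours_late <= 48:
        return 0.5, late_days_left  # half credit for the next 48 hours
    return 0.0, late_days_left  # zero credit after that


# A homework handed in 30 hours late needs 2 late days.
print(late_days_needed(30))
print(credit_fraction(30, FREE_LATE_DAYS))
print(credit_fraction(30, 0))
```

Under this reading, a homework that is 30 hours late costs two late days if they are available, and earns half credit if none remain.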
Homework regrades policy
If you feel that we have made an error in grading your homework, please turn in your homework with a written explanation to Michelle, and we will consider your request. Please note that regrading of a homework may cause your grade to go up or down.
Project
You are expected to complete a term project during the class. This will give you an opportunity to apply machine learning in your own research or to investigate aspects of machine learning that interest you, both practical and theoretical. Students are expected to successfully complete the following requirements for the project:
 Project Proposal
 Project Milestone
 Poster Session
 Final Paper