Machine Learning (W4771)

Final Project

Due Date: Three-sentence proposal is due April 7 by email to devans@cs.columbia.edu. Write-up is due Thursday April 27.

Reading: Chapters 6 and 5 of "Machine Learning" by Tom M. Mitchell.

Your assignment:

Select a machine learning project that interests you. You may work alone or in a group of up to 3 people. Be careful to select something interesting but manageable in only a few weeks (with a week to do the writing). You will write a paper of 5 to 13 pages describing your work. This paper will include at least one paragraph on each of at least two research papers that you will select.

Write a description of what you plan to do and email it as soon as possible, and before the deadline above. This should also include the two or more papers you will read. Feel free to ask us for further direction, insights and ideas.

Selecting research papers. The ability to assimilate papers written at the research level is a necessary part of your machine learning education because the field is new, and most applications of machine learning will require insight and experimentation that is at the borderline or even center of research issues. In other words, to use learning, you'll need to think for yourself a lot, and you'll want to consult other people's investigations.

A list of available papers and on-line search engines will be posted.

Project types. In general, experimental projects fall into one of the following categories:

modify or improve a learning paradigm
- different search technique
- different approach to structural credit assignment
- different approach to temporal credit assignment
- etc.
apply a learning paradigm to a new hypothesis representation
apply a learning paradigm to a new problem and/or data set
- real-world empirical data
- simulation, e.g., robotics or games (such as lunar lander)
- artificial test problem, e.g., pre-specified target function
some combination of the above

Project ideas. Here is a small, non-exhaustive list of possible project ideas:

Metalearning. Have a classifier make decisions based on an instance as well as what other classifiers would say about that instance. We have a system to do this available on-line, and have data that has been input separately to multiple classifiers whose outputs have not been combined this way. I will provide both if you are interested.
Competition averts greedy parameter tuning. Certain learning parameters, such as crossover rate, can be encoded in GA genomes. However, since crossover hurts more often than it helps, the population may evolve to stop crossover completely (thus unfortunately eliminating the cases where it does help). A dynamic, competitively coevolving environment/fitness measure may avert this behavior. I do not think anyone has investigated this.
Further work on Project Othello
Naive Bayes learning system, applied to text categorization, as described in the text. System and data available on-line.
Induce (e.g., with GA or GP) cellular automata rules for boolean functions such as majority, parity, symmetry, etc.
Induce (e.g., with GP) a competitor for a video game, such as Blockade (i.e., those motorcycles in the movie "Tron").
Standard GP plus indexed memory on large input arrays, e.g., for standard boolean functions. I don't think this has been done.
Please go online and check out the example final projects by following the link from the web page for this course. Read through the titles and glance at a couple in detail. Do the same for the final projects of the course "Advanced Intelligent Systems", which you can get to from Eric's home page; most (not all) of these are machine learning projects.
etc.
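
To give a feel for the scale of the Naive Bayes text categorization idea above, here is a minimal sketch of such a classifier. It is only an illustration under stated assumptions, not the on-line system or data mentioned above: it assumes a bag-of-words model with add-one (Laplace) smoothing, and the function names and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Train on a list of (words, label) pairs.

    Returns (log_priors, word_counts, vocab). All names are hypothetical.
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    n_docs = len(docs)
    log_priors = {c: math.log(k / n_docs) for c, k in class_counts.items()}
    return log_priors, word_counts, vocab


def classify(words, model):
    """Return the label with the highest log posterior score."""
    log_priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, log_prior in log_priors.items():
        total = sum(word_counts[label].values())
        score = log_prior
        for w in words:
            if w not in vocab:
                continue  # skip words never seen in training
            # Add-one smoothing so unseen (word, label) pairs get nonzero mass.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data, for illustration only.
TOY_DOCS = [
    ("the rocket launch reached orbit".split(), "space"),
    ("nasa plans a lunar orbit mission".split(), "space"),
    ("the team won the hockey game".split(), "sports"),
    ("a late goal decided the game".split(), "sports"),
]

model = train_naive_bayes(TOY_DOCS)
print(classify("lunar rocket mission".split(), model))
```

A project built on this idea would replace the toy data with a real corpus, hold out test documents, and report accuracy under different design choices (for example, vocabulary pruning or a different smoothing constant).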

Deliverable:

Your write-up should contain a written description of the experiments you conducted. Break the paper into several sections, as follows. The introduction should start with the motivation, and then give an overview of the solution. A separate "approach" section gives the details of your modification/experiment. The results section gives the numerical results and some analysis. The related work section describes the two or more research papers you read (there will be no penalty if they do not end up being related to the project you do, although you should try to choose papers that are). Finally, the conclusions section places the capital P on "perspective".

You are required to read the extensive guideline on writing your paper, which is linked from the web page of this course. Also read the guideline on preparing your final presentation.

email: evs at cs dot columbia dot edu