Machine Learning (W4771)
Final Project
Due Date:
A three-sentence proposal is due April 7 by email to devans@cs.columbia.edu.
Write-up is due Thursday April 27.
Reading:
Chapters 6 and 5 of "Machine Learning" by Tom M. Mitchell.
Your assignment:
Select a machine learning project that interests you.
You may work in a group consisting of 1-3 people total.
Be careful to select something interesting, but manageable in only a few
weeks (with a week to do the writing).
You will write a paper of 5 to 13 pages describing your work.
This paper will include at least one paragraph each on at least two
research papers that you will select.
Write a description of what you plan to do and email
it as soon as possible, and no later than the proposal deadline above.
This description should also list the two or more papers you will read.
Feel free to ask us for further direction, insights and ideas.
Selecting research papers. The ability to assimilate papers written
at the research level is a necessary part of your machine learning
education because the field is new, and most applications of machine
learning will require insight and experimentation that is at the borderline
or even center of research issues. In other words, to use learning, you'll
need to think for yourself a lot, and you'll want to consult other
people's investigations.
A list of available papers and on-line search engines will be posted.
Project types. In general, experimental projects fall into
one of the following categories:
- modify or improve a learning paradigm
  - different search technique
  - different approach to structural credit assignment
  - different approach to temporal credit assignment
  - etc.
- apply a learning paradigm to a new hypothesis representation
- apply a learning paradigm to a new problem and/or data set
  - real world empirical data
  - simulation, e.g., robotics or games (such as lunar lander)
  - artificial test problem, e.g., pre-specified target function
- some combination of the above
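As an illustration of the "artificial test problem" category, here is a minimal sketch in Python. It picks a pre-specified target function (parity, chosen only as an example), enumerates every input, and splits the labeled examples into training and test sets. The names `parity`, `make_dataset`, and `accuracy` are hypothetical helpers invented for this sketch, not part of any course software.

```python
import itertools
import random


def parity(bits):
    """Pre-specified target function: 1 if an odd number of bits are set."""
    return sum(bits) % 2


def make_dataset(target, n_bits, train_fraction=0.5, seed=0):
    """Label every n_bits-long boolean input with target, then split it.

    Returns (train, test), each a list of (input_tuple, label) pairs.
    """
    examples = [(bits, target(bits))
                for bits in itertools.product((0, 1), repeat=n_bits)]
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(examples)
    cut = int(len(examples) * train_fraction)
    return examples[:cut], examples[cut:]


def accuracy(hypothesis, examples):
    """Fraction of examples on which hypothesis agrees with the label."""
    return sum(hypothesis(x) == y for x, y in examples) / len(examples)


train, test = make_dataset(parity, n_bits=4)
```

Because the target function is known exactly, any learned hypothesis can be scored with `accuracy(learned, test)` without worrying about noisy labels.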
Project ideas. Here is a small, non-exhaustive
list of possible project ideas:
- Metalearning. Have a classifier make decisions based on an instance as
well as what other classifiers would say about that instance. We have
a system to do this available on-line, and have data that has been input
separately to multiple classifiers, which have not been combined this way.
I will provide these if you are interested.
- Competition averts greedy parameter tuning. Certain learning
parameters can be encoded in GA genomes, such as crossover rate.
However, since crossover
hurts more often than it helps, the population may decide to stop crossover
completely (thus unfortunately eliminating the cases where it does help).
However, a dynamic, competitively coevolving environment/fitness measure
may avert this behavior. I do not think anyone has investigated this.
- Further work on Project Othello
- Naive Bayes learning system, applied to text categorization, as
described in the text. System and data available on-line.
- Induce (e.g., with GA or GP) cellular automata rules for
boolean functions such as majority, parity, symmetry, etc.
- Induce (e.g., with GP) a competitor for a video game, such
as Blockade (i.e., those motorcycles in the movie "Tron").
- Standard GP plus indexed memory on large input arrays, e.g., for
standard boolean functions. I don't think this has been done.
- Please go online and check out the example final projects --
follow the link from the web page for this course. Read through
the titles and glance at a couple in detail. Do the same
for the final projects of the course "Advanced Intelligent Systems",
which you can get to from Eric's home page -- most (not all) of these
are machine learning projects.
- etc.
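To give a feel for the Naive Bayes text categorization idea listed above, here is a minimal sketch in Python. It is not the on-line system mentioned in the list. It assumes whitespace tokenization and add-one (Laplace) smoothing, and the function names `train_naive_bayes` and `classify`, as well as the four toy documents, are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs is a list of (text, label) pairs.

    Returns (log_priors, log_likelihoods, vocab), where
    log_likelihoods[label][word] is a smoothed estimate of log P(word | label).
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        words = text.lower().split()  # assumption: whitespace tokenization
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)

    log_priors = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # Add-one smoothing so unseen words never get probability zero.
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return log_priors, log_likelihoods, vocab


def classify(text, model):
    """Return the label with the highest posterior log-score for text."""
    log_priors, log_likelihoods, vocab = model
    # Words never seen in training are skipped.
    words = [w for w in text.lower().split() if w in vocab]
    scores = {
        c: log_priors[c] + sum(log_likelihoods[c][w] for w in words)
        for c in log_priors
    }
    return max(scores, key=scores.get)


# Toy data, invented for this sketch.
docs = [
    ("the team won the game", "sports"),
    ("a great game and a great win", "sports"),
    ("the election results are in", "politics"),
    ("voters back the new policy", "politics"),
]
model = train_naive_bayes(docs)
```

With this toy model, `classify("who won the game", model)` picks "sports" and `classify("election policy", model)` picks "politics". A real project would need a much larger corpus and a held-out test set.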
Deliverable:
Your write-up should contain
a written description of the experiments you conducted.
Break the paper into several sections, as follows.
The introduction
should start with the motivation, and then give an overview of the
solution. A separate "approach" section gives
the details of your modification/experiment.
The results section gives the numerical results and some analysis.
The related work section describes the two or more research papers
you read (there will be no penalty if they do not end up being related to
the project you do, although you should try to choose papers that are).
Finally, the conclusions section places the capital P on "perspective".
You are required to read the extensive guideline on writing
your paper, which is linked from the web page of this course.
Also read the guideline on preparing your final presentation.
email: evs at cs dot columbia dot edu