CSI 5387: Machine Learning
Project Description
Project Proposal (3-5 pages) Due: Week 6
Final Project Report Due: Last Day of Classes
Presentation: Weeks 12, 13 and 14
Demo (optional): By Appointment
Introduction
In this project, you are expected (1) to select a particular area of Machine
Learning that interests you, (2) to conduct a literature search on this area,
(3) to focus on a specific problem in the area you selected, and (4a) to
design and implement a novel learning scheme or (4b) to extend an existing
scheme to deal with the problem you have identified. Alternatively (4c), you
can compare the performance of different existing schemes on the specific
problem you have identified in (1), (2) and (3) or on a particular
real-world data set. Such a data set must be of interest to industry or
research; benchmark data sets such as those in the UCI repository do not
qualify.
It is important to start working on this project as soon as the semester
begins. I suggest that you start reading the textbook, some of its suggested
follow-up material, conference proceedings, journals, and papers available
from the Web, early enough to settle quickly on a
subject of interest to you. I will be available for discussions both before the
project proposal is due and after that, during the development of your research.
To help you select a topic, here is a list of project suggestions,
though you are more than welcome to propose your own idea.
Project Suggestions
- Design a scheme for combining learning methods that have
different strengths and weaknesses. This scheme should benefit from the
different learning methods' advantages but not suffer from their
individual weaknesses.
- Ensemble-based combination schemes often perform more accurately
than a single "best classifier". Investigate the relationship between
the accuracy of the individual combined classifiers and that of their
combination.
- Identify an area of Natural Language Processing that could be handled
by a machine learning method (for example, the translation of certain
prepositions from one language to another), propose a method for
automatically constructing a training set for that problem from raw
text and a lexicon, and apply one or several learning algorithms to that
data set.
- Implement a program for detecting domain-specific keywords in
a collection of texts written for that domain.
- If you have a data set of interest to you (for example, from a past or
present job, or another academic project), evaluate the performance
of standard learning techniques on that set, identify particular
properties of your data set that may negatively affect the learning
performance, and devise and implement a scheme for addressing these
properties.
- Design a method for generating new features and selecting the most
useful ones for a given learning task.
- Use the Mixture-of-Experts Framework with different learning schemes.
Is it a useful scheme for combining different classifiers?
- Design and implement a concept-learner (or extend an existing concept-
learner) for dealing with class imbalance, the situation where a training
set contains many more examples of one class than of the other.
- Design and implement a concept-learner (or extend an existing concept-
learner) for dealing with the case of small disjuncts and rare cases.
- Compare the performance of a number of unsupervised classifiers used
in supervised mode to the performance of supervised classifiers.
- Compare the performance of combination methods such as bagging or
boosting when used with different learning methods.
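
As a possible starting point for the suggestion on ensemble accuracy, the
sketch below computes the accuracy of a simple majority vote under a strong
simplifying assumption that is not part of the project description: every
classifier has the same accuracy p and the classifiers make their errors
independently. The function name majority_vote_accuracy is hypothetical.
Real classifiers are rarely independent, so treat this as an idealized
baseline rather than a prediction.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Accuracy of a majority vote over n classifiers.

    Assumes (idealization) that each classifier is correct with the same
    probability p and that their errors are independent. n must be odd so
    that the vote cannot tie.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd to avoid ties")
    # The vote is correct when more than half of the classifiers are correct,
    # so sum the binomial probabilities for k = n//2 + 1, ..., n.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    # Print the vote accuracy for several individual accuracies and sizes.
    for p in (0.4, 0.5, 0.6, 0.7):
        row = [round(majority_vote_accuracy(p, n), 3) for n in (1, 3, 11, 51)]
        print(p, row)
```

Under these assumptions, adding classifiers helps only when p is above 0.5
and hurts when p is below 0.5. A project could measure how far real
ensembles depart from this baseline once the classifiers' errors are
correlated.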