CSI 5387: Machine Learning
Project Description
Project Proposal (3-5 pages) Due: Last class before the mid-term break
Final Project Report Due: Last day of classes
Presentation: Last weeks of classes
Demo (optional): By Appointment
Introduction
In this project, you are expected (1) to select a particular area of Machine
Learning that interests you, (2) to conduct a literature search on this area,
(3) to focus on a specific problem in the area you selected, and (4a) to design
and implement a novel learning scheme or (4b) to extend an existing scheme to
deal with the problem you have identified. Alternatively (4c), you can compare
the performance of different existing schemes on the specific problem you have
identified in (1), (2) and (3), or on a particular real-world data set. This
should not be one of the benchmark data sets such as those in the UCI
repository; the data set must be of interest to industry or research.
It is important to start working on this project as soon as the semester
begins. I suggest that you start reading the textbook, some of its suggested
follow-up material, conference proceedings, journals, and papers available from
the Web, early enough to settle quickly on a subject of interest to you. I will
be available for discussions both before the project proposal is due and after
that, during the development of your research.
To help you select a topic, here is a list of project suggestions,
though you are more than welcome to propose your own idea.
Project Suggestions
- Design a scheme for combining learning
methods that have different strengths and
weaknesses. This scheme should benefit from the different learning
methods' advantages but not suffer from their individual weaknesses.
- Ensemble-based combination schemes often perform more
accurately than a single "best classifier". Investigate the
relationship between the accuracy of the individual combined classifiers
and that of their combination.
- Identify an area of Natural Language Processing that
could be handled by a machine learning method (for example, the translation of
certain prepositions from one language to another), propose a method for
automatically constructing a training set for
that problem from raw text and a lexicon, and apply one or several
learning algorithms to that data set.
- Implement a program for detecting domain-specific
keywords in a collection of texts written for that domain.
- If you have a data set of interest to you (for example,
from a past or present job, or another academic project), evaluate the
performance of standard learning techniques on that set, identify
particular properties of your data set that may negatively affect the
learning performance, and devise and implement a scheme for addressing
these deficiencies.
- Design a method for generating new features and
selecting the most useful ones for a given learning task.
- Use the Mixture-of-Experts Framework with different
learning schemes. Is it a useful scheme for combining different classifiers?
- Design and implement a concept-learner (or extend an
existing concept-learner) for dealing with class imbalance, the situation
where a training set contains many more positive than
negative examples, or the other way around.
- Design and implement a concept-learner (or extend an
existing concept-learner) for dealing with the case of small disjuncts and rare cases.
- Compare the performance of a number of unsupervised
classifiers used in supervised mode to the performance of supervised
classifiers.
- Compare the performance of combination methods such as
bagging or boosting when used with different learning methods.
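
As a possible starting point for the suggestion on ensemble accuracy above, the short sketch below computes the accuracy of a simple majority vote as a function of the accuracy of its members. It rests on strong simplifying assumptions that are not part of this project description: a two-class problem, an odd number of classifiers, every classifier having the same accuracy p, and errors that are statistically independent. Real classifiers rarely satisfy the independence assumption, so this is only an idealized baseline against which experimental results could be compared. The function name majority_vote_accuracy is illustrative.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Accuracy of a majority vote over n classifiers.

    Assumptions (idealized, for illustration only): two classes, n is odd,
    each classifier is correct with the same probability p, and the
    classifiers' errors are independent of one another.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number to avoid ties")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    # Sum the binomial probabilities of k_min, ..., n correct votes.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    # Tabulate the ensemble accuracy for several member accuracies and sizes.
    for p in (0.4, 0.5, 0.6, 0.7):
        row = [round(majority_vote_accuracy(p, n), 3) for n in (1, 5, 25)]
        print(f"p = {p}: n = 1, 5, 25 -> {row}")
```

Under these assumptions, adding members helps only when each member is better than chance (p > 0.5) and hurts when members are worse than chance. An empirical study would examine how far real ensembles depart from this baseline when the members' errors are correlated.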