CSI 5387: Machine Learning
Project Description

Project Proposal (3-5 pages) Due: Week 6
Final Project Report Due: Last Day of Classes
Presentation: Weeks 12, 13 and 14
Demo (optional): By Appointment

Introduction

In this project, you are expected (1) to select a particular area of Machine Learning that interests you, (2) to conduct a literature search on this area, (3) to focus on a specific problem in the area you selected, and (4a) to design and implement a novel learning scheme or (4b) to extend an existing scheme to deal with the problem you have identified. Alternatively (4c), you can compare the performance of different existing schemes, either on the specific problem you identified in steps (1) to (3) or on a particular real-world data set. Such a data set must be of interest to industry or research; benchmark data sets such as those in the UCI repository do not qualify.

It is important to start working on this project as soon as the semester begins. I suggest that you start reading the textbook, some of its suggested follow-up material, conference proceedings, journals, and papers available from the Web, early enough to settle quickly on a subject of interest to you. I will be available for discussions both before the project proposal is due and after that, during the development of your research.

To help you select a topic, here is a list of project suggestions, though you are more than welcome to propose your own idea.

Project Suggestions

Design a scheme for combining learning methods that present different strengths and weaknesses. This scheme should benefit from the different learning methods' advantages but not suffer from their individual weaknesses.
Ensemble-based combination schemes often perform more accurately than a single "best classifier". Investigate the relationship between the accuracy of the individual combined classifiers and that of their combination.
Identify an area of Natural Language Processing that could be handled by a machine learning method (for example, the translation of certain prepositions from one language to another), propose a method for automatically constructing a training set for that problem from raw text and a lexicon, and apply one or several learning algorithms to that data set.
Implement a program for detecting domain-specific keywords in a collection of texts written for that domain.
If you have a data set of interest to you (for example, from a past or present job, or another academic project), evaluate the performance of standard learning techniques on that set, identify particular properties of your data set that may negatively affect the learning performance, and devise and implement a scheme for addressing these deficiencies.
Design a method for generating new features and selecting the most useful ones for a given learning task.
Use the Mixture-of-Experts Framework with different learning schemes. Is it a useful scheme for combining different classifiers?
Design and implement a concept-learner (or extend an existing concept-learner) for dealing with class imbalance, the situation where a training set contains substantially more positive than negative data, or the other way around.
Design and implement a concept-learner (or extend an existing concept-learner) for dealing with the case of small disjuncts and rare cases.
Compare the performance of a number of unsupervised classifiers used in supervised mode to the performance of supervised classifiers.
Compare the performance of combination methods such as bagging or boosting when used with different learning methods.
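
As a starting point for the suggestion on the relationship between the accuracy of individual classifiers and that of their combination, the following is a minimal sketch, not a prescribed method. It assumes an idealized setting that real classifiers rarely satisfy: a binary task, an odd number of classifiers, each correct with the same probability p, and errors that are fully independent. Under those assumptions, the accuracy of a simple majority vote is a binomial tail probability. The function name majority_vote_accuracy is hypothetical and introduced only for illustration.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Accuracy of a majority vote over n classifiers.

    Assumptions (idealized, for illustration only):
      - binary classification,
      - n is odd, so ties cannot occur,
      - every classifier is correct with probability p,
      - classifier errors are statistically independent.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    # Sum the binomial probabilities of k_min, ..., n correct votes.
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    # Tabulate ensemble accuracy for a few individual accuracies and sizes.
    for p in (0.4, 0.5, 0.6, 0.7):
        row = ", ".join(
            f"n={n}: {majority_vote_accuracy(p, n):.3f}" for n in (1, 3, 11, 51)
        )
        print(f"p={p:.1f} -> {row}")
```

In this idealized model, voting helps only when the individual classifiers are better than chance (p > 0.5) and hurts when they are worse. A project could examine how far real ensembles depart from this picture once the classifiers' errors are correlated.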
