CSI 5180: Statistical NLP
Project Description
Project Proposal (2-3 pages) Due: Oct 28
Final Project Report Due: Dec 15
Presentation: Week 13
Demo (optional during presentation)
Introduction
In this project, you are expected (1) to select a particular area of
Statistical NLP that interests you, (2) to conduct a literature search
on this area, (3) to focus on a specific problem in the area you
selected, and (4a) to design and implement a novel learning scheme or
(4b) to extend an existing scheme to deal with the problem you have
identified. Alternatively (4c), you can compare the performance of
different existing schemes, on different corpora, on the specific
problem you identified in steps (1) to (3).
It is important to start working on this project as soon as the
semester begins. I suggest that you start reading the textbook, some of
its suggested follow-up material, conference proceedings, journals, and
papers available from the Web, early enough to settle quickly on a
subject of interest to you. I will be available for discussions both
before the project proposal is due and after that, during the
development of your research.
To help you select a topic, here is a list of project suggestions,
though you are more than welcome to propose your own idea.
Project Suggestions
- Compare the performance of several keyword extraction systems on
several corpora. Describe the strengths and weaknesses of each of them.
- Implement a system for automatic classification and information
extraction from medical articles.
- Compare the performance of various machine learning tools on
different representations of the Reuters text categorization data set
(e.g., bag-of-words representation, keyword representation,
bag-of-words representation of summaries of the text, etc.).
- Implement a program for detecting domain-specific keywords in a
collection of texts written for that domain.
- Design a method for establishing the degree of similarity between
two documents.
- Design a system that summarizes several documents into a single
summary.
- Design a system that makes use of a bilingual corpus to perform
word sense disambiguation.
- Design a system that improves (in some way, such as word order,
verb tense, choice of preposition, word sense disambiguation, etc.)
on the translations produced by an existing system (for example,
BabelFish).
- Design a system that detects proper nouns and/or geographical
entities in text (or other kinds of entities and relations between
entities).
- Design a system that classifies customer reviews as positive or
negative (detects opinion).
- Compare the performance of several part-of-speech taggers.
- Compare the performance of several parsers.
- Implement a program that detects base noun phrases (NP chunks).
- Develop any of the above systems for languages other than English.
French is of special interest.