Projects for undergraduate students -- CSI 4900
Guidelines for writing your final report
Fall 2016
Project code: inkpen18
Title: Detecting early signs of mental illness from ReachOut forum messages
Description: In this project you will design a text classifier for ReachOut mental health forum posts. A small corpus of posts was
labelled with a red/amber/green semaphore that indicates how urgently a post
needs moderator attention. A text classifier will be developed to predict the
label for unlabeled posts.
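The project description does not name a learning algorithm, so the following is only a minimal sketch of one possible baseline: a multinomial Naive Bayes classifier with add-one smoothing. The class name `NaiveBayes`, the whitespace tokenization, and the toy posts are all assumptions made for illustration; they are not part of the ReachOut corpus.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.label_counts = Counter()            # number of posts per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()                       # all words seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                count = self.word_counts[label][tok] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy posts, invented for this sketch only.
train_texts = [
    "i feel hopeless and cannot cope",
    "i feel stressed about exams",
    "thanks everyone i had a great day",
]
train_labels = ["red", "amber", "green"]
model = NaiveBayes().fit(train_texts, train_labels)
```

Calling `model.predict(post)` on a new post returns the label with the highest posterior score. A real system would need a much larger labelled corpus and better features than raw word counts.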
Project code: inkpen17
Title: Detecting early signs of mental illness from Twitter messages
Description: In this project you will design a text classifier that predicts, based on a social media user's tweets, the level of risk that the user shows signs of mental illness.
Winter 2016
Project code: inkpen16
Title: Information retrieval using a formal semantic language
Description: In this project you will design a visualizer of semantic representations
for words and phrases. The semantic representation language will allow more
precise information retrieval.
Winter 2015
Project code: inkpen15
Title: Web opinion mining for product reviews
Description: This project includes a web crawler to collect product reviews from the Internet, and a classifier to detect positive and negative opinions.
Winter 2014
Project code: inkpen14
Title: Annotation tool for error correction
Description: In this project you will develop a tool that allows teachers to annotate errors made by language learners, to customize the process, to insert their own error tags, to include feedback, etc. The tool should work for any language, but it will be used by teachers of English and French as a second language.
Fall 2013
Project code: inkpen13
Title: Automatic processing of poetry
Description: In this project you will develop a tool that finds similar poems by detecting similar fragments of text, similar themes, and similar structures. A graph will be automatically produced to represent links between similar poems.
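As an illustration of the fragment-matching part only, here is a rough sketch that compares poems by their shared word trigrams (Jaccard overlap) and returns graph edges for pairs above a threshold. The similarity measure, the threshold value, and the function names are assumptions; the project description does not specify how similarity is computed, and themes and structure are not covered here.

```python
from itertools import combinations


def word_ngrams(text, n=3):
    """Return the set of word n-grams (as tuples) in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_graph(poems, n=3, threshold=0.2):
    """Return edges (title_a, title_b, score) for sufficiently similar pairs.

    `poems` maps a title to the poem's text.
    """
    shingles = {title: word_ngrams(text, n) for title, text in poems.items()}
    edges = []
    for a, b in combinations(sorted(poems), 2):
        score = jaccard(shingles[a], shingles[b])
        if score >= threshold:
            edges.append((a, b, score))
    return edges


# Hypothetical toy texts, invented for this sketch only.
poems = {
    "A": "the rose is red and sweet",
    "B": "the rose is red and wild",
    "C": "winter snow falls on silent hills",
}
edges = similarity_graph(poems)
```

Each edge links two poems that share enough fragments; the resulting edge list could then be passed to any graph-drawing tool.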
Fall 2012
Project code: inkpen12
Title: Blog classification
Description: In this project you will apply automatic text classification algorithms in order to classify blogs by the opinions expressed in the texts
(positive/negative) and by the types of emotions expressed (happy/surprised/sad/angry/scared/disgusted).
Fall 2010
Project code: inkpen11
Title: Voice control for robots
Description: In this project you will program a robot to execute commands spoken by a user. You will install a voice recognition program and implement a natural language understanding module that extracts the information about what move the robot is asked to perform. Then you will program the robot to execute the move. The project can be done individually or in a group of two students. The robots will be available in the Robotics Lab of Prof. Emil Petriu.
Project code: inkpen10
Title: Synonyms and semantic similarity processing for French texts
Description: In this project you will implement tools for processing a corpus of
French texts and develop a program that can choose the best word in a context.
Fall 2008
Project code: inkpen9
Title: Video and text information retrieval
Description: In this project you will build an information retrieval system that can find video clips and dialog text that answer a given query. The project can be done individually or in a group of two students.
Project code: inkpen8
Title: Grapheme-to-phoneme conversion tool for French
Description: Transforming words from written form into phonetic form is useful in Text-to-Speech systems and in language learning support tools. In this project a tool will be developed for French words. The tool will learn pronunciation from data, using machine learning approaches. Training data and starter Java code will be provided.
Winter 2007
Project code: inkpen7
Title: Information retrieval experiments
Description: In this project the performance of several information retrieval systems will be compared, and several query expansion methods will be tried.
Fall 2006
Project code: inkpen6
Title: Tools for French text processing
Description: Many natural language processing tools exist for English texts. In this project some tools will be developed to work on a corpus of French texts. The corpus will be provided. The tools include an automatic phonetic transcriber, an automatic syllabifier, etc.
Project code: inkpen5
Title: Information extraction for financial information
Description: Financial information about companies is available on the Web, but the user needs to know how to find and interpret it in order to decide which companies to invest in. This project will provide a user with various financial ratios and advice. The user inputs the company name through a GUI implemented in Java. The program fetches relevant webpages from Yahoo! Finance and other sites, and navigates through them to find the desired pages. Then it automatically extracts the information from the pages, calculates ratios, and displays results to the user.
Fall 2005
Project code: inkpen4
Title: Intelligent thesaurus using Roget synonyms
Description: A thesaurus assists a writer with a list of words that are similar to a given word. The writer has to choose one of the words. An intelligent thesaurus assists the user by indicating the best choices. The project will focus on the automatic choice of the best alternative in the context of writing. Roget's Thesaurus will be used as a source of synonyms and similar words, in order to allow for wide coverage of the English language. The implementation will be done in Java.
Winter 2004
Project code: inkpen3
Title: Intelligent thesaurus
Description: A thesaurus assists a writer with a list of words that are similar to a given word. The writer has to choose one of the words without being offered explanations about the differences in nuances of meaning between the possible choices. This project will develop an intelligent thesaurus that offers, in addition to the list of similar words, explanations about the differences between them. Moreover, it will be context-sensitive: it will order the possible choices by their suitability to the writing context. A knowledge base of differences between synonyms will be provided. It also includes knowledge about the collocations of synonyms (what words they combine well with and what words they do not). The implementation will be done in Java.
Fall 2003
Project code: inkpen2
Title: Language models for the texts of the Web
Description: A language model reflects the distribution of the words in a large collection of texts. It computes probabilities of occurrence of individual words (unigrams) and pairs of consecutive words (bigrams). There are tools that compute language models for a given collection of texts. This project will modify such a tool to work with word co-occurrence counts collected from the Web. In this way, the probabilities of rare words will be computed more accurately. The implementation will be done in C++, Java, or Perl (to be determined).
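To make the unigram and bigram probabilities concrete, here is a minimal sketch of maximum-likelihood estimation from a small local corpus. It does not use Web counts, it is written in Python rather than one of the languages named for the project, and the function names and three-sentence corpus are assumptions made for illustration.

```python
from collections import Counter


def train(sentences):
    """Count unigrams, bigrams, and bigram contexts in tokenized sentences."""
    unigrams = Counter()
    bigrams = Counter()
    contexts = Counter()  # words that have a following word
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
        contexts.update(tokens[:-1])
    return unigrams, bigrams, contexts


def unigram_prob(unigrams, word):
    """P(word) = count(word) / total number of tokens."""
    total = sum(unigrams.values())
    return unigrams[word] / total if total else 0.0


def bigram_prob(bigrams, contexts, prev, word):
    """P(word | prev) = count(prev, word) / count(prev as a context)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


# Hypothetical toy corpus, invented for this sketch only.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
unigrams, bigrams, contexts = train(corpus)
p_the = unigram_prob(unigrams, "the")
p_cat_given_the = bigram_prob(bigrams, contexts, "the", "cat")
```

With unsmoothed estimates like these, any bigram not seen in the corpus gets probability zero, which is the sparsity problem that larger Web-derived counts are meant to reduce.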
Project code: inkpen1
Title: Natural language interface for animation
Description: This project implements a natural language interface that allows a human to communicate with an animated character using natural language (English in this case). The focus of the project is on translating from natural language into a simplified script-like animation language. An example of input text is: “Walk five steps to the right, jump three times, and then run back”. This text needs to be translated into something like: “walk steps:5, direction:East, speed:slow; jump; jump; jump; walk steps:5, direction:West, speed:fast”. Then the character will execute this simple animation script, by moving around on the screen in the required sequence. The implementation will be done in Java.
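The translation step can be sketched with a few keyword rules. This is only an illustrative rule-based sketch in Python (the project itself calls for Java). The vocabulary tables, the comma-based clause splitting, the handling of "back" as "repeat the last move in the opposite direction", and the normalized spacing of the script syntax are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny vocabularies for the sketch.
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
DIRECTIONS = {"right": "East", "left": "West"}
OPPOSITE = {"East": "West", "West": "East"}


def translate(text):
    """Translate a comma-separated English command into a script string."""
    script = []
    last_move = None  # (steps, direction) of the most recent walk or run
    for clause in text.lower().split(","):
        words = re.findall(r"[a-z]+", clause)
        # First number word in the clause, defaulting to 1.
        count = next((NUMBERS[w] for w in words if w in NUMBERS), 1)
        if "jump" in words:
            script.extend(["jump"] * count)
        elif "walk" in words or "run" in words:
            speed = "fast" if "run" in words else "slow"
            if "back" in words and last_move is not None:
                # "back": same distance, opposite direction.
                steps, direction = last_move[0], OPPOSITE[last_move[1]]
            else:
                steps = count
                direction = next(
                    (DIRECTIONS[w] for w in words if w in DIRECTIONS), "East"
                )
            script.append(
                f"walk steps:{steps}, direction:{direction}, speed:{speed}"
            )
            last_move = (steps, direction)
    return "; ".join(script)


command = "Walk five steps to the right, jump three times, and then run back"
result = translate(command)
```

Keyword matching like this breaks down quickly on free-form input; it is meant only to show the shape of the mapping from clauses to script commands.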