CSI5180: Topics in Artificial Intelligence
Natural Language Processing, A Statistical Approach
Winter 2012
Instructor: Diana Inkpen
Office: SITE 5015
E-mail: diana@site.uottawa.ca
Telephone: 562-5800 ext. 6711
Announcements
- Assignment 2 is posted.
- The due date for Assignment 1 was extended by one week, because
three students from the class have many co-op interviews this week.
If you finish it on time, you can submit it and then focus on choosing a
paper for presentation and a topic for the project.
- Assignment 1 is posted.
Meeting Times and Locations
- Mon 16:00-19:00, in Simard 429
Office Hours: Fri 12:30-13:30, or by appointment arranged by email, in SITE 5015.
Overview
Natural Language Processing (NLP) is the subfield of Artificial
Intelligence concerned with building computer systems such as natural
language interfaces to databases or the World-Wide Web, automatic
machine-translation systems, text analysis systems, speech understanding
systems, or computer-aided instruction systems.
Until recently, NLP was mainly approached by rule-based or symbolic
methods. In the past few years, however, statistical methods have been
given a lot of attention as they seem to address many of the
bottlenecks encountered by the symbolic methods.
This course will focus mainly on statistical approaches. In particular,
we will concentrate on approaches such as n-gram models and
Markov models. If time permits, we will consider applications such as
information retrieval, text categorization, clustering, and statistical
machine translation.
Pre-Requisites
Students should have reasonable exposure to Artificial Intelligence and
some programming experience in a high-level language. If you are unsure
whether your background is sufficient, please check with the instructor.
Evaluation
Students will be evaluated on:
- Two written and programming assignments (40%: 20% for A1, 20% for A2)
- One in-class presentation (15%)
- Class participation (5%)
- A Final Project (40%)
Required Textbook
Foundations of Statistical Natural Language Processing, by
Chris Manning and Hinrich Schütze, MIT Press, 1999.
Timetable (late assignments will not be considered)
- Assignment 1, due Fri, Feb 10, 21:00, extended until Feb 17, 21:00.
- Paper Presentation - See Schedule
- Project outline (2-3 pages), due Mon, Feb 27, in class
- Assignment 2, due Fri, March 23, 21:00.
- Project Presentations, Mon, April 2, in class
- Project Reports, due Fri, April 27, 21:00, by email
Assignments
The programming part should be done in Perl or Java. If you don't know
Perl, it is very easy to learn enough Perl to do the assignments. Here
is a Perl tutorial that we might discuss in class if time allows. Here is a
very simple Perl script. Here are some more sample Perl scripts:
t4.pl
t5.pl
t6.pl
Course Support:
Useful Links:
Syllabus (subject to minor modifications)
(The lecture slides will be in PDF format; you can read them with
Acrobat Reader.)
Week 1: Jan 9
Preliminaries
Introduction to Statistical NLP
Readings: Ch1
Links: Webster, LDOCE, WordNet, Slides, Tom Sawyer, Connexor parser and tagger demo, Stanford parser demo
Week 2: Jan 16
Linguistics Essentials
Mathematical Foundations I: Probability Theory
Readings: Ch2, 3
Links: FrameNet, More slides on Probability Theory and Information Theory, Online demos, Penn Treebank tagset
Week 3: Jan 23
Mathematical Foundations II: Information Theory
Corpus-Based Work
Readings: Ch2, 4
Week 4: Jan 30
Collocations
Readings: Ch5
Week 5: Feb 6
Statistical Inference: N-gram Models
Readings: Ch6
Links: Statistical Language Modeling Toolkit
Week 6: Feb 13
Word Sense Disambiguation
Readings: Ch7
Links: Senseval, WSD tutorial
Week 7: Feb 20
Reading week (no classes)
Week 8: Feb 27
Lexical Acquisition
Semantic Similarity
Readings: Ch8
Links: Corpus-based Similarity Demo, Dekang Lin's Demos, WordNet::Similarity
Week 9: Mar 5
Hidden Markov Models
Readings: Ch9
Extra slides on HMM
Week 10: Mar 12
Part-of-Speech Tagging
Readings: Ch 10
Week 11: Mar 19
Text Categorization
Text Clustering
Readings: Ch 16
Links: Weka
Week 12: Mar 26
Information Retrieval
Latent Semantic Indexing
Probabilistic Retrieval
Readings: Ch15
Links: TREC, Textbook errata p560-563, Extra slides
Week 13: Apr 2
Statistical Alignment & Machine Translation
Readings: Ch13
Slides by George Foster (NRC)
Statistical MT tutorial
Possible extra topic: Question Answering
Links to IBM's Watson, Deep QA, Answers, Ottawa Citizen article
Student presentations for projects (April 2)