E-mail: diana.inkpen@uottawa.ca
Lectures: see BrightSpace. Office hours: Fri,
1:30pm-2:30pm in SITE 5015
Basic principles of Information Retrieval. Indexing methods. Query
processing. Linguistic aspects of Information Retrieval. Agents and artificial
intelligence approaches to Information Retrieval. Relation of Information
Retrieval to the World Wide Web. Search engines. Servers and clients. Browser
and server-side programming for Information Retrieval.
Prerequisites: (CSI3103 or ELG3300), (CSI3125 or CSI2115 or
SEG2101), or permission from the instructor.
Note: Everything
will be submitted electronically through BrightSpace.
Introduction to
Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan and
Hinrich Schütze, Cambridge University Press, 2008 (online version available)
Pretrained
Transformers for Text Ranking: BERT and Beyond
Other books:
Information Retrieval, by D. Grossman and O. Frieder, Springer, 2004
(second edition).
Another online book: Information
Retrieval, by C. J. van Rijsbergen (1979)
Modern Information Retrieval, by Ricardo Baeza-Yates and Berthier
Ribeiro-Neto, 1999. Companion website
to this book.
Course
notes (additional reading, pdf file)
Week 1:
Preliminaries. Introduction: Goals and
history of IR. The impact of the web on IR. The role of artificial intelligence
(AI) in IR.
The Internet and the WWW: History of
Internet. TCP/IP. IP addresses. WWW. HTTP. HTML. Web servers and clients.
Links: Top search engines in US in 2010; Search Engine Watch; TREC; CLEF; FIRE
Week 2:
Basic
IR Models: Boolean and vector-space retrieval models; ranked retrieval;
text-similarity metrics; TF-IDF (term frequency/inverse document
frequency) weighting; cosine similarity.
Slides on Implementation of Vector Space Model
Example discussed in class. Solution to the example.
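A minimal sketch of the vector-space model listed above, assuming raw term counts for TF and log(N/df) for IDF. This is one common weighting variant and not necessarily the one used in the class slides; the function names and sample documents are illustrative only.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts (assumed TF variant)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy collection, tokenized by whitespace.
docs = [
    "information retrieval with vector space models".split(),
    "vector space retrieval and ranking".split(),
    "cooking pasta at home".split(),
]
vecs = tf_idf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # documents sharing terms
sim_unrelated = cosine(vecs[0], vecs[2])  # documents with no shared terms
```

Documents that share no terms get similarity 0, and ranking a query against a collection amounts to sorting documents by this score.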
Week 3:
Experimental
Evaluation of IR: Performance metrics: recall, precision, and
F-measure; Evaluations on benchmark text collections.
Interpolated Precision. Example discussed in class. Solution to the example.
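The metrics above can be sketched as follows. This is a minimal illustration, assuming set-based relevance judgments and the usual definition of interpolated precision (the maximum precision at any rank whose recall is at least the given level); the document identifiers are made up.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and balanced F-measure."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

def interpolated_precision(ranking, relevant, recall_level):
    """Highest precision at any rank where recall >= recall_level."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    best, hits = 0.0, 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        if hits / len(relevant) >= recall_level:
            best = max(best, hits / rank)
    return best

# Hypothetical ranked result list and relevance judgments.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d5"}
p, r, f = precision_recall_f1(ranking[:3], relevant)
ip_half = interpolated_precision(ranking, relevant, 0.5)
ip_full = interpolated_precision(ranking, relevant, 1.0)
```

Evaluating `interpolated_precision` at the standard recall levels 0.0, 0.1, ..., 1.0 gives the points of an 11-point interpolated precision-recall curve.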
Week 4:
Query
Operations and Languages: Relevance feedback; Query expansion; Query
languages.
Example discussed in class. Solution (do it by yourself first)
Links: WordNet; Corpus-based Similarity Demo; Dekang Lin's Demos;
WordNet::Similarity
Text
Representation: Word statistics; Zipf's law; Porter stemmer;
morphology; index term selection; using thesauri. Metadata and markup languages
(SGML, HTML, XML).
More slides on Web markup languages: HTML, XML, XHTML, RDF, OWL
Other materials: Semantic Web and Linked Data
Links: Semantic Web; Linked Data video
Example:
term frequencies in Tom Sawyer
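A rank-frequency table like the Tom Sawyer one can be produced with a few lines of code. The sketch below assumes a simple lowercase, letters-and-apostrophes tokenizer and uses a toy sentence in place of the novel. Zipf's law predicts that rank times frequency stays roughly constant on a large text, which a sample this small cannot show.

```python
import re
from collections import Counter

def rank_frequency_table(text, top=5):
    """Return (rank, word, frequency, rank * frequency) for the top words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top)
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(counts, start=1)]

# Toy input; replace with the full text of a book to observe Zipf's law.
sample = "the cat and the dog and the bird"
table = rank_frequency_table(sample, top=3)
for row in table:
    print(row)
```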
Week 5:
Web
Search: Search engines; spidering; metacrawlers;
directed spidering; link analysis (e.g. hubs and authorities, Google PageRank);
shopping agents.
Link
Analysis: the hubs and authorities algorithm, and PageRank algorithm.
PageRank
Hubs and authorities example discussed in class. Solution
(do it by yourself first). PageRank examples
Links: Google - Parallel architecture; Slides about the Google 1998 paper
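A minimal sketch of PageRank by power iteration, assuming a damping factor of 0.85, a fixed number of iterations, and a graph in which every link target also appears as a key. The graph and parameter values are illustrative and not taken from the class examples.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1 - damping) / n + damping * dangling / n
                    for p in pages}
        for p in pages:
            for q in links[p]:
                # Each page splits its rank equally among its outlinks.
                new_rank[q] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank

# Hypothetical four-page web graph.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
best_page = max(ranks, key=ranks.get)
```

In this graph page C collects links from A, B, and D, so it ends up with the highest score, while D, which nothing links to, keeps only the baseline share.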
Week 6:
Text Categorization: Categorization algorithms: decision trees; Rocchio;
k-nearest neighbor; Naive Bayes. Introduction to Deep Learning.
Links: Weka data mining tool; Scikit-learn; TensorFlow; PyTorch; Keras
Other materials: Extra slides on Naive Bayes; SVM; Sentiment Analysis
Week 7: Feb 19-24 Study
Break (Reading Week, no classes)
Week 8:
Feb 28, Midterm revision; Mar 1, during class: Midterm
Week 9:
Advanced IR Models: Neural Information Retrieval
Word embeddings; Transformers & BERT
Probabilistic models and LSI. Extra slides on LSI.
Language
Models for Information Retrieval.
Week 10:
Text Clustering: Clustering algorithms: agglomerative clustering; k-means.
Applications to information filtering and organization.
Examples of text classification and clustering discussed in class. Solution
(do it by yourself first)
Week 11:
Learning
to Rank.
Week 12:
Question Answering: Retrieving
precise short answers to natural language queries.
Other material: Slides about IBM's
Watson. Links to IBM's Watson Deep QA
Week 13:
Cross-Language IR; Image Information Retrieval
Links: Content-based image
retrieval
Week 14:
Exam revision