Managing Gigabytes: Compressing and Indexing Documents and Images, 2nd edition.
Christopher Manning and Prabhakar Raghavan
Note: this is the Fall 2004 course website; the current Fall 2005 CS276 website is at: http://cs276.stanford.edu
Lecture: 3 units, TuTh 4:15-5:30 Gates B1
TAs: Louis Eisenberg, Daniel Gindikin
Staff e-mail: [email protected]
Lectures are also available online and on television through SCPD/SITN.
Course Description:
Text information retrieval systems; efficient text indexing; Boolean, vector space, and probabilistic retrieval models; ranking and rank aggregation; evaluating IR systems. Text clustering and classification methods: latent semantic indexing, taxonomy induction, cluster labeling; classification algorithms and their evaluation, text filtering and routing.
A note on structure: This year, we're teaching a two-quarter sequence (CS276A/B) on information retrieval, text, and web page mining, much as in 2002-03, whereas in 2003-04 there was a compressed one-quarter course (CS276). The organization this year is a little different, however: the first course will focus on information retrieval and on the text mining problems of text clustering and classification. This course will have homeworks, practical exercises, and exams, but no large project. The second course will focus on areas like the web and XML, and will be a large project course.
Textbooks:
For CS276A, we're not having an official textbook (there isn't one with good coverage of all and only the topics we'll discuss), but the books listed remain good references. Managing Gigabytes is particularly good for technical IR in the first part of the course, but doesn't cover topics in the second half of the course.
Prerequisites:
CS 103B and CS 107, and any one of CS 121, CS 145, or CS 161, or equivalent background.
Programming experience will be necessary for the two practical exercises.
Announcements:
- No office hours Thu, Nov 4. Instead we'll have extra office hours on Wed, Nov 17 1-2 in Gates B26a.
- The class number for Axess is 26099.
Additional Information:
Assignments:
Problem set #1
- doc, PDF
- solutions: doc, PDF
- statistics: mean 63 (out of 70), standard deviation 4.5, max 71
Practical exercise #1
- grading criteria: doc, PDF
- statistics: mean 89, standard deviation 6.5, max 104
Problem set #2
- doc, PDF
- solutions: doc, PDF
- statistics: mean 80 (out of 90), standard deviation 7, max 90
Practical exercise #2
Exams:
Midterm
- doc, PDF
- solutions: doc, PDF
- statistics: mean 67, standard deviation 13, max 91
- practice midterm: doc, PDF (solutions: doc, PDF)
Final
- doc, PDF
- solutions: PDF
- statistics: mean in the 120s, max about 180
- practice final: PDF (solutions: PDF)