Research Seminar: Statistical Natural Language Processing

Class Objectives (Learning Outcomes)

After completing the course, participants will be able to:


Good knowledge of the R language, statistics, and the tm package.

Teaching and Learning Methods

Reading seminar and R-based mini labs.


Instructor: Kurt Hornik (Kurt.Hornik _AT_

Assistant: Stefan Theussl (Stefan.Theussl _AT_


Unit  Date    Time            Topic                                                                                       Slides
1     08.10.  15:00 -- 18:00  Preliminary Talk                                                                            --
2     22.10.  15:00 -- 18:00  tm and plugins: Ingo, Stefan (Chapter 2/11)                                                 [1, 2, R]
3     05.11.  09:00 -- 12:00  Raw Text/Tagging: Karl, Norbert / Kamran (Chapter 3/5)                                      [1, 2, R]
4     26.11.  09:00 -- 12:00  Classification/Information extraction: Paul, Thomas, Willy / Angela, Mathias (Chapter 6/7)  [1, 2]
5     10.12.  09:00 -- 12:00  Data retrieval/Sentiment Analysis: Mario                                                    [1]
6     14.01.  09:00 -- 12:00  Project discussion                                                                          --
7     28.01.  09:00 -- 12:00  Project discussion                                                                          --

The seminar will take place in the Besprechungsraum (meeting room) of the Institute for Statistics and Mathematics, UZA2, Ebene 4 (level 4).


Participants should pick one project from the following list, preferably one related to the topic they presented in the first half of the course.

A package vignette (eight to twelve pages) is to be delivered at the end of the project.

Criteria for Successful Completion

Presentations and package vignette.


Additional Resources

See also

