Introduction to Data Technologies
This course will cover several computer technologies that are useful for working with data prior to and subsequent to the actual data analysis. Specifically, we will look at HTML for statistical reports and HTML forms for data collection. We will discuss a variety of storage formats, with extra time spent on XML and databases, and we will look at SQL for accessing information stored in databases. Finally, we will look at tools for processing data that has been stored as text, including an introduction to regular expressions.
The R language and environment for statistical computing and graphics will be used to provide the implementation of several of these technologies and as the glue to make them all work together.
Each week there will be homework specific to the technology being discussed, and there will be an overall project that combines everything to produce a dynamic, database-driven web site.
This course requires a web browser (e.g., Firefox), SQLite (public domain), and R (GPL). We will also use the following R add-on packages: R2HTML, Rpad, RSQLite, and XML.
Resources
Here you will find all the relevant links for this course:
- Paul Murrell's email address
- The course text: Introduction to Data Technologies (in PDF and HTML formats).
- Material for week 1.
- Material for week 2.
- Material for week 3.
- Material for week 4.
- Material for week 5.
- Material for week 6:
  - Lecture slides: Text Processing and R (handout)
  - Files that were used in demonstrations during the lecture:
    - R code for processing text
    - Source text for the impromptu Python challenge
  - Assignment 6.
  NOTE: Model answers for previous assignments are now available with the material for the relevant week (see links above).
- A description of the overall project
- Installation instructions for the required software
- Official lecture outline
Last change: 2006-03-28 by Stefan Theußl