Preface

The basic premise of this book is that scientists are required to perform many tasks with data other than statistical analyses. A lot of time and effort is usually invested in getting data ready for analysis: collecting the data, storing the data, transforming and subsetting the data, and transferring the data between different operating systems and applications.

Many scientists acquire data management skills in an ad hoc manner, as problems arise in practice. In most cases, skills are self-taught or passed down, guild-like, from master to apprentice. This book aims to provide a more formal and more complete introduction to the skills required for managing data.

The focus of this book is on computational tools that make the management of data faster, more accurate, and more efficient. The intention is to improve the awareness of what sorts of tasks can be achieved and to describe the correct approach to performing these tasks.

This book promotes a philosophy for working with data technologies that emphasizes interaction via written computer languages.

This book will not turn the reader into a web designer, or a database administrator, or a software engineer. However, it does explain how to collect and publish information via the World Wide Web, how to access information stored in different formats, and how to write small programs to automate simple, repetitive tasks.

This book is intended to improve the work habits of individual researchers. It aims to provide a level of understanding that enables a scientist to access and interact with data sets no matter where or how they are stored.

This book is designed to be accessible and practical, with an emphasis on useful, applicable information. Each topic is covered in three ways. First, the basic ideas are introduced in an appropriate order, using trivial examples, to give a quick, easy-to-read overview of the topic. Next, case studies combine ideas and techniques and demonstrate more sophisticated, real-life use. Finally, separate reference chapters contain almost no examples, just the bare information for easy look-up.

This book is written primarily for statisticians, and this is reflected in the broad range of data sets used in the examples. However, the content is relevant for anyone whose work involves the collection, preparation, or analysis of data.

Paul Murrell

This document is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.