Overview
OpenRefine is a powerful open source tool for working with messy data. Our most common use for it is to clean and transform data.
Installation
The OpenRefine website has the latest stable release. It also has extensive installation instructions, including some important tips for Mac users on how to circumvent security warnings when launching the file.
Although you launch OpenRefine as an application, it opens in your web browser, which serves as its interface.
Walkthroughs
These are detailed demos with illustrated step-by-step instructions for commonly used features of OpenRefine. If you are here to learn how to use OpenRefine, start with these.
When going through these demos, know that numbered lists are steps for you to do:
1. Do this thing
2. And then this thing
Regular bullet lists are just information:
- This is just information
- It might be important. Or not.
Case studies
These are examples where I’ve used OpenRefine to solve a problem. They are not tutorials, but discussions of how OpenRefine was used in some specific instances with key points explained.
- AHRQ diagnostic codes pulled from a PDF.
- Austin State Hospital cemetery burials where records are on more than one line.
Resources
- OpenRefine.org has great documentation, tutorials, FAQs and more, including:
  - OpenRefine’s user manual
  - Expressions
  - GREL functions
About me
By Christian McDonald
Associate Professor of Practice
School of Journalism and Media, Moody College of Communication
University of Texas at Austin