Introduction to OpenRefine
Introduction to OpenRefine (15 minutes) (Lucia)
What is OpenRefine?
OpenRefine is a free, open source tool used to clean and transform messy data. It can also be used to parse data from websites.
Row vs Record
Why use OpenRefine?
Insert possible scenarios that need the use of OpenRefine. It can be used to remove duplicate records, separate multiple values contained in the same field, and
- Open source application that can be used to clean ‘messy’ data
- User interface runs through your browser
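To make the scenarios above concrete, here is a minimal sketch in plain Python of what "removing duplicate records" and "separating multiple values contained in the same field" mean for the data. It is not OpenRefine code and does not show how OpenRefine implements these operations. The sample rows, the "|" separator, and the function names remove_duplicates and split_multivalued are assumptions made up for illustration.

```python
# Illustrative sketch in plain Python, not OpenRefine itself.
# The sample rows and the "|" separator are assumptions for this example.

rows = [
    {"name": "Ada Lovelace", "subjects": "math|computing"},
    {"name": "Ada Lovelace", "subjects": "math|computing"},  # exact duplicate
    {"name": "Grace Hopper", "subjects": "computing"},
]


def remove_duplicates(rows):
    """Keep the first occurrence of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        # A row is identified by all of its field/value pairs.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def split_multivalued(rows, field, separator="|"):
    """Emit one row per value found in a multi-valued field."""
    result = []
    for row in rows:
        for value in row[field].split(separator):
            # Copy the row, replacing the field with a single value.
            result.append({**row, field: value.strip()})
    return result


deduplicated = remove_duplicates(rows)
cleaned = split_multivalued(deduplicated, "subjects")
for row in cleaned:
    print(row)
```

In the workshop these steps are carried out through the OpenRefine interface; the sketch only shows the shape of the data before and after.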
Installing OpenRefine (provide a link to a document in advance)
Best practices in managing your data
- Consider using software such as Git, a free and open source version control system that can be used to manage project data and to keep track of file versions.
- Keep a copy of the master (raw) data
- Regularly back up your working files
- After performing an operation on the data, review the data to make sure the operation behaved as expected and values make sense
- Keep a record of the operations performed on the data as you clean it (see Basic - Tracking Operations)
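Two of the practices above, keeping the raw data untouched and keeping a record of operations, can be sketched in a few lines of Python. This is one possible approach under stated assumptions, not a prescribed workflow: the file names (survey.csv, operations.jsonl), the folder layout, and the helper names start_working_copy and log_operation are all hypothetical.

```python
import json
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def start_working_copy(raw_path, work_dir):
    """Copy the raw file into work_dir so the original is never edited."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    working_path = work_dir / Path(raw_path).name
    shutil.copy2(raw_path, working_path)
    return working_path


def log_operation(log_path, description):
    """Append one timestamped entry per cleaning operation (JSON Lines)."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "operation": description,
    }
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(entry) + "\n")


# Demonstration in a temporary directory, using made-up file names.
demo_dir = Path(tempfile.mkdtemp())
raw_file = demo_dir / "raw" / "survey.csv"
raw_file.parent.mkdir()
raw_file.write_text("name,city\nAda,London\n", encoding="utf-8")

working_file = start_working_copy(raw_file, demo_dir / "working")
log_file_path = demo_dir / "operations.jsonl"
log_operation(log_file_path, "Trimmed whitespace in the 'name' column")
```

The raw file stays in its own folder and is only ever read, while each cleaning step adds one line to the log, so the sequence of operations can be reviewed later.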