Guidelines for Handling Messy Data

By Anthony M. Wanjohi

There are no well-defined guidelines on how to handle messy data, and the appropriate approach may differ depending on whether the data are qualitative or quantitative. One approach is to use the right application from the onset. Messy data in this case could be stacked (joined) data or unstacked data entered in list format in spreadsheets such as MS Excel. Applications such as SSC Statistics (an Excel plugin) and Genstat, among others, could help address the problem of messy data.
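As an illustration of the stacked and unstacked layouts mentioned above, here is a minimal sketch in plain Python. It assumes that "unstacked" means one column of values per group and "stacked" means a single list of (group, value) rows. The group names and numbers are hypothetical, and the helper functions `stack` and `unstack` are written for this example only; they are not part of Excel, SSC Statistics or Genstat.

```python
# Hypothetical unstacked data: one column of measurements per group.
unstacked = {
    "control": [4.1, 3.8, 5.0],
    "treatment": [5.6, 6.1, 5.9],
}


def stack(columns):
    """Convert {group: [values]} into a list of (group, value) rows."""
    return [(group, value) for group, values in columns.items() for value in values]


def unstack(rows):
    """Convert (group, value) rows back into {group: [values]}."""
    columns = {}
    for group, value in rows:
        columns.setdefault(group, []).append(value)
    return columns


stacked = stack(unstacked)       # one row per observation
restored = unstack(stacked)      # back to one column per group
```

Converting in both directions and checking that the result matches the original is a simple way to confirm that no values were lost or misassigned while reshaping.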

There are other issues in messy data, including ‘lazy’ or fake responses and non-sampling errors arising from the measurement tool, respondents, interviewers, data entry agents, computers, etc. These could be addressed through double entry, data validation using syntax in the relevant software, data cleaning (viewed as unethical by some), omission of the messy variables, a review of the literature, or even use of the data for piloting purposes.
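Two of the remedies above, double entry and data validation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (`age`, `sex`) and the validation rules are hypothetical, and `compare_entries` and `validate` are helper names invented for this example rather than features of any statistical package.

```python
def compare_entries(first, second):
    """Return (record_id, field, value_1, value_2) for every disagreement
    between two independent entries of the same records."""
    mismatches = []
    for record_id in sorted(set(first) | set(second)):
        a = first.get(record_id, {})
        b = second.get(record_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                mismatches.append((record_id, field, a.get(field), b.get(field)))
    return mismatches


def validate(record, rules):
    """Return the names of the fields whose values fail their rule."""
    return [field for field, is_valid in rules.items() if not is_valid(record.get(field))]


# Hypothetical data keyed by record id, typed in by two different agents.
entry_1 = {1: {"age": 34, "sex": "F"}, 2: {"age": 27, "sex": "M"}}
entry_2 = {1: {"age": 34, "sex": "F"}, 2: {"age": 72, "sex": "M"}}

# Hypothetical validation rules: plausible age range and allowed codes.
rules = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "sex": lambda v: v in {"F", "M"},
}

mismatches = compare_entries(entry_1, entry_2)
invalid = validate({"age": 270, "sex": "M"}, rules)
```

Here the double-entry check flags record 2, where the two agents typed different ages, and the validation check flags an age outside the assumed plausible range. Neither check decides which value is correct; that still requires going back to the source questionnaire.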

Handling messy data can be as messy as the data themselves and can therefore lead to erroneous conclusions. Thus, researchers and their assistants need to document all data-cleaning decisions.
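One possible way to document data-cleaning decisions is to keep a simple log that can be exported alongside the data set. The sketch below is an assumption about what such a log might contain; the column names, the example decision and the helpers `log_decision` and `to_csv` are hypothetical and are not prescribed by the source.

```python
import csv
import io
from datetime import date

# Hypothetical columns for the cleaning log.
FIELDS = ["date", "variable", "problem", "action", "decided_by"]


def log_decision(log, variable, problem, action, decided_by, when=None):
    """Append one data-cleaning decision to the log (a list of dicts)."""
    log.append({
        "date": (when or date.today()).isoformat(),
        "variable": variable,
        "problem": problem,
        "action": action,
        "decided_by": decided_by,
    })


def to_csv(log):
    """Render the log as CSV text so it can be saved with the data set."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(log)
    return buffer.getvalue()


decisions = []
log_decision(
    decisions,
    variable="age",
    problem="value 270 outside plausible range",
    action="set to missing",
    decided_by="research assistant",
    when=date(2024, 1, 15),  # hypothetical date
)
report = to_csv(decisions)
```

Recording who made each decision and why makes it possible for others to review, repeat or reverse the cleaning steps later.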

Further reading

Leahey, E. (2006). Approaches to Handling Messy Data. Retrieved from