McMaster LibGuides: Data Visualization Guide: Selecting, preparing and exploring your data

Types of Data

The next step is selecting the data that you will use for your visualization. At this stage you should already have that data. It may be your own data, for example from a survey or an experiment that you conducted, or data from an external source, such as historical data found in databases. You will want to figure out what type of data you have:

Categorical: Categorical variables contain a finite number of categories or distinct groups. Categorical data might not have a logical order. Qualitative data is often categorical.
Continuous: Continuous variables are numeric variables that have an infinite number of values between any two values. A continuous variable can be numeric or date/time. Continuous data is always quantitative.
Discrete: Discrete variables are numeric variables that have a countable number of values between any two values. A discrete variable is always numeric.
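
As a rough illustration of the three types above, the following Python sketch guesses the type of a column from its values. The function name `guess_variable_type` and the rule it applies (any non-numeric value makes the column categorical, whole numbers only make it discrete, anything else is continuous) are assumptions made for this example, not a standard. Date/time values and numeric codes that stand for categories would need separate handling.

```python
def guess_variable_type(values):
    """Guess whether a column is categorical, discrete, or continuous.

    Hypothetical helper for illustration only: it applies a simple
    heuristic and does not recognize date/time values.
    """
    values = list(values)
    if not values:
        raise ValueError("cannot guess the type of an empty column")

    numbers = []
    for value in values:
        try:
            numbers.append(float(value))
        except (TypeError, ValueError):
            # At least one value is not a number, so treat the column as categories.
            return "categorical"

    if all(number.is_integer() for number in numbers):
        # Whole numbers only, for example counts.
        return "discrete"
    return "continuous"


print(guess_variable_type(["red", "blue", "red"]))
print(guess_variable_type([0, 2, 1, 3]))
print(guess_variable_type([61.5, 70.2, 58.0]))
```

A heuristic like this is only a starting point. A column of postal codes or rating labels stored as numbers would be reported as discrete even though it is better treated as categorical, so it is worth checking the result against what you know about how the data was collected.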

Preparing your Data

Once you have the data, think about how to prepare it. Preparation takes a lot of energy and is a large part of the process that is often underestimated. The kind of preparation you need to do depends on the visualization tool you are going to use and the state of your dataset.

Some basic steps are rearranging and cleaning your data so that it is ready to bring into a visualization tool. For example, you might need to rename variables or column headers, normalize values, delete headers and footers, etc. You might also have to figure out how to deal with missing values or outliers. This is another stage where you need to consider the type of data you have. For example, you might want to think about how you can register responses that do not fit into the categories you have provided, even and especially if they are “edge cases” and “outliers.”
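
The cleaning steps described above can be sketched in a few lines of Python using only the standard library. Everything in this example is made up for illustration: the sample export in `RAW`, the column names in `RENAME`, and the choice to record a blank answer as `None` rather than filling it in. Your own dataset will need its own rules.

```python
import csv
import io

# Made-up survey export for illustration: a title line above the header,
# inconsistent capitalization, a blank answer, and a footer line.
RAW = """Survey export 2023
Resp ID,Fav Colour,Hours/Week
1, Blue ,4
2,blue,
3,RED,12
Total responses: 3
"""

# Hypothetical mapping from the original column headers to tidy names.
RENAME = {"Resp ID": "id", "Fav Colour": "colour", "Hours/Week": "hours"}


def clean(raw_text):
    """Return the rows of the export as a list of tidy dictionaries."""
    lines = raw_text.splitlines()
    # Delete the title line (first) and the footer line (last).
    body = "\n".join(lines[1:-1])

    rows = []
    for record in csv.DictReader(io.StringIO(body)):
        # Rename the columns and trim stray whitespace around each value.
        row = {RENAME[key]: value.strip() for key, value in record.items()}
        # Normalize category labels so " Blue " and "blue" count as one category.
        row["colour"] = row["colour"].lower()
        # Keep missing values explicit as None instead of guessing a number.
        row["hours"] = float(row["hours"]) if row["hours"] else None
        rows.append(row)
    return rows


cleaned = clean(RAW)
for row in cleaned:
    print(row)
```

Keeping the missing answer as `None` is one possible design choice. It leaves the decision about whether to drop, flag, or estimate that value to a later, deliberate step instead of hiding it during cleaning.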

For some types of visualizations, you might want to pre-calculate some statistics, such as means or medians, to use as summaries in your visualizations. Several tools can help you clean data. One is Tableau, a very popular data visualization tool; you can download the free version and use it to clean your data as well. Another tool is OpenRefine. Excel is also a good option.
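
If you prefer to script the pre-calculated summaries mentioned above, a minimal Python sketch might look like the following. The records, the `summarize` function, and the decision to skip missing values are all assumptions made for this example. Other ways of handling missing values or outliers may suit your data better.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up records for illustration; None marks a missing value.
records = [
    {"colour": "blue", "hours": 4.0},
    {"colour": "blue", "hours": None},
    {"colour": "blue", "hours": 6.0},
    {"colour": "red", "hours": 12.0},
    {"colour": "red", "hours": 8.0},
    {"colour": "red", "hours": 40.0},  # a possible outlier
]


def summarize(rows, group_key, value_key):
    """Return the count, mean, and median of value_key for each group."""
    groups = defaultdict(list)
    for row in rows:
        # Assumption: missing values are skipped rather than filled in.
        if row[value_key] is not None:
            groups[row[group_key]].append(row[value_key])
    return {
        group: {"n": len(values), "mean": mean(values), "median": median(values)}
        for group, values in groups.items()
    }


summary = summarize(records, "colour", "hours")
for group, stats in summary.items():
    print(group, stats)
```

In this made-up data the single large value pulls the mean for the "red" group well above its median, which is one reason to compute both before deciding which summary to show.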

Here is an excellent Library Carpentry module that walks through using OpenRefine to clean your data.