
Data Visualization: 1. Planning

Process

There are three main steps in creating a data visualization. Steps 1 and 2 are interchangeable and may sometimes occur simultaneously.

1) Formulating the question you want your visualization to answer or the story you want your visualization to tell

2) Gathering, understanding, and sorting the data

3) Applying the visual representation

A good overview of the process can be found on the lynda.com course "Data Visualization Fundamentals." 

 

Image credit: "step" by tomo908us (Attribution, some rights reserved)

Gather Data

If you already have the data that you want to use for your data visualization project, skip this step.

If not, a few quick links have been provided below to get you started. Check out this libguide for a more comprehensive list of resources.

 

United States

www.census.gov

www.bls.gov

www.data.gov

World

http://data.worldbank.org/

http://www.who.int/research/en/

News

http://developer.nytimes.com

Grab Bag

http://www.visualizing.org/data/browse

http://www.google.com/publicdata/directory

http://datamarket.com/

http://datamob.org/datasets

Tools for Statistical Analysis

Many of these tools are installed on computers throughout Lauinger Library. See this chart for more specific information.

You can also download some of this software for free onto your personal laptop through UIS. See the UIS webstore here.

 

Easy to Moderate

     

Excel: Without a doubt, the most familiar tool for data analysis. Excel is useful for sorting and preparing your data, but the out-of-the-box charts and visualizations are ugly. However, with a little customization and imagination, Excel can produce surprisingly good results. Users can also export charts from Excel and use them as a starting point in Illustrator or Inkscape.

 

R: Open-source software for statistical computing and graphics that runs on UNIX platforms, Windows, and macOS. Requires some programming.

 

Wolfram Mathematica: An integrated technical computing system covering symbolic and numeric computation, visualization, and programming. Runs on Windows, Mac, and Linux operating systems.

 

Advanced

SPSS: IBM SPSS Statistics is an integrated family of products that addresses the entire analytical process, from planning to data collection to analysis, reporting and deployment.

 

Stata: Data Analysis and Statistical Software for Professionals. 

 

SAS: SAS Analytics provide an integrated environment for predictive and descriptive modeling, data mining, text analytics, forecasting, optimization, simulation, experimental design and more. 

Understanding and Sorting the Data

The following summarizes the stages of the data familiarization and preparation process outlined in Andy Kirk's book Data Visualization: A Successful Design Process. Click here to read the full chapter.

Examining the Data

  • Do you have all the data you need? Does it include all the variables that you are interested in?
  • Are there any obvious errors in your data? Is there any data that is missing?
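
These checks can be partly automated. The following is a minimal sketch using only the Python standard library; the sample data, the column names, and the helper `examine` are hypothetical and stand in for your own file and variables.

```python
import csv
import io

# Hypothetical sample data; in practice this would come from your own file.
SAMPLE_CSV = """city,year,population
Springfield,2010,30720
Shelbyville,2010,
Ogdenville,,12500
"""

REQUIRED_COLUMNS = ["city", "year", "population"]  # variables you expect to need


def examine(csv_text, required_columns):
    """Return (missing_columns, missing_counts) for a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    # Columns you wanted but the file does not contain.
    missing_columns = [c for c in required_columns if c not in fieldnames]
    # Number of empty cells found in each column.
    missing_counts = {name: 0 for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            value = row[name]
            if value is None or value.strip() == "":
                missing_counts[name] += 1
    return missing_columns, missing_counts


missing_columns, missing_counts = examine(SAMPLE_CSV, REQUIRED_COLUMNS)
print("Missing columns:", missing_columns)
print("Empty cells per column:", missing_counts)
```

A report like this only finds structural gaps. Values that are present but wrong (for example, a misplaced decimal point) still need a manual look.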

Understanding the Data Types

  • What type of data have you acquired?

  • What is the range of values for each type of data?
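
As a rough illustration of both questions, the sketch below guesses a type for each column and reports its range. The functions `infer_type` and `value_range`, the column names, and the assumption that dates are written as YYYY-MM-DD are all hypothetical choices made for this example.

```python
from datetime import datetime


def infer_type(values):
    """Guess whether a list of strings holds integers, decimals, dates, or text."""
    def all_parse(parser):
        try:
            for v in values:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    # Assumes ISO-style dates; adjust the format string for other layouts.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "text"


def value_range(values, kind):
    """Return (min, max) for numeric columns, or the sorted distinct values otherwise."""
    if kind == "integer":
        numbers = [int(v) for v in values]
        return min(numbers), max(numbers)
    if kind == "decimal":
        numbers = [float(v) for v in values]
        return min(numbers), max(numbers)
    return sorted(set(values))


# Hypothetical columns, as they might be read from a spreadsheet export.
columns = {
    "population": ["30720", "18400", "12500"],
    "growth_rate": ["1.2", "0.8", "-0.4"],
    "region": ["North", "South", "North"],
}

for name, values in columns.items():
    kind = infer_type(values)
    print(name, kind, value_range(values, kind))
```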


Transforming for Quality

  • Do you need to clean up your data? Do you need to fix any errors or fill in any gaps in your data?
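
One way such a cleanup might look is sketched below. The rows and the helper `clean_rows` are hypothetical, and filling a gap with the median is only one possible choice made for this example; depending on your data, leaving the gap empty or dropping the row may be more honest.

```python
from statistics import median

# Hypothetical raw rows with inconsistent spelling, stray spaces, and a gap.
raw_rows = [
    {"city": " springfield ", "population": "30,720"},
    {"city": "SHELBYVILLE", "population": ""},
    {"city": "Ogdenville", "population": "12500"},
]


def clean_rows(rows):
    """Tidy text, parse numbers, and flag (rather than hide) any filled gaps."""
    cleaned = []
    for row in rows:
        # Remove thousands separators and surrounding spaces before parsing.
        text = row["population"].replace(",", "").strip()
        cleaned.append({
            "city": row["city"].strip().title(),
            "population": int(text) if text else None,
            "estimated": False,
        })
    known = [r["population"] for r in cleaned if r["population"] is not None]
    fill_value = median(known)  # assumption: median fill; not always appropriate
    for r in cleaned:
        if r["population"] is None:
            r["population"] = fill_value
            r["estimated"] = True
    return cleaned


cleaned = clean_rows(raw_rows)
for row in cleaned:
    print(row)
```

Keeping an `estimated` flag lets you mark filled-in values in the final visualization instead of presenting them as measured data.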

Transforming for Analysis

  • Parsing (splitting up) any variables, such as extracting year from a date value
  • Merging variables to form new ones, such as creating a whole name out of title, forename, and surname
  • Converting qualitative data/free-text into coded values or keywords
  • Deriving new values out of others, such as gender from title or a sentiment out of some qualitative data
  • Creating calculations for use in analysis, such as percentage proportions
  • Removing redundant data for which you have no planned use (be careful though!)
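
Several of the transformations listed above can be sketched in a few lines of Python. The records and field names here are hypothetical and serve only to illustrate parsing, merging, and calculating; redundant fields are dropped simply by not copying them into the new rows.

```python
from datetime import datetime

# Hypothetical records used only for illustration.
records = [
    {"title": "Dr", "forename": "Ada", "surname": "Lovelace",
     "joined": "2012-03-15", "donation": 50},
    {"title": "Mr", "forename": "Alan", "surname": "Turing",
     "joined": "2013-11-02", "donation": 150},
]

total = sum(r["donation"] for r in records)

transformed = []
for r in records:
    transformed.append({
        # Merging: build a whole name from title, forename, and surname.
        "full_name": f'{r["title"]} {r["forename"]} {r["surname"]}',
        # Parsing: extract the year from a date value.
        "join_year": datetime.strptime(r["joined"], "%Y-%m-%d").year,
        # Calculating: each record's share of the total, as a percentage.
        "donation_pct": round(100 * r["donation"] / total, 1),
    })

for row in transformed:
    print(row)
```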

Tools for Data Conversion

Mr. Data Converter - Will convert your Excel data into one of several web-friendly formats, including HTML, JSON and XML.

Dataink.com - Use the supplied Excel spreadsheet to convert your data into a data table that will be compatible with the Google Code Playground for Google Charts. 

DataWrangler - A fantastic tool developed by the Stanford Visualization Group for wrangling and cleaning up your data into a format that can be interpreted by Tableau, R, etc.
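
For small tables, a similar spreadsheet-to-JSON conversion can also be done by hand. The sketch below uses only the Python standard library and is not how any of the tools above is implemented; the table contents and column names are hypothetical.

```python
import csv
import io
import json

# Hypothetical table, as it might look after saving a spreadsheet as CSV.
csv_text = """country,year,value
Freedonia,2011,4.2
Sylvania,2011,3.9
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    # CSV stores everything as text, so convert numeric columns explicitly.
    row["year"] = int(row["year"])
    row["value"] = float(row["value"])

json_text = json.dumps(rows, indent=2)
print(json_text)
```

To read from a real file instead, replace `io.StringIO(csv_text)` with an opened file object, for example `open("mydata.csv", newline="")`, where the file name is a placeholder for your own.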

Finding a Story

Below is a list of different types of data stories (and examples of each) that you can convey using your data:

"1. Measurement  (The simplest story — counting or totaling something)
‘Local councils across the country spent a total of £x billion on paper clips last year’


But it’s often difficult to know if that’s a lot or a little. For that, you need context — which can be provided by:

2. Proportion
‘Last year local councils spent two-thirds of their stationery budget on paper clips’
Or

3. Internal comparison
‘Local councils spend more on paper clips than on providing meals-on-wheels for the elderly’
Or

4. External comparison
‘Council spending on paper clips last year was twice the nation’s overseas aid budget’
Or there are other ways of exploring the data in a contextual or comparative way:

5. Change over time
‘Council spending on paper clips has trebled in the past four years’
Or

6. ‘League tables’
These are often geographical or by institution, and you must make sure the basis for comparison is fair, e.g. taking into account the size of the local population.

‘Borsetshire Council spends more on paper clips for each member of staff than any other local authority, at a rate four times the national average’

Or you can divide the data subjects into groups:

7. Analysis by categories
‘Councils run by the Purple Party spend 50% more on paper clips than those controlled by the Yellow Party’

Or you can relate factors numerically

8. Association
‘Councils run by politicians who have received donations from stationery companies spend more on paper clips, with spending increasing on average by £100 for each pound donated’


But, of course, always remember that correlation and causation are not the same thing.
So if you’re investigating paper clip spending, are you also getting the following figures:

  •     Total spending to provide context?
  •     Geographical/historical/other breakdowns to provide comparative data?
  •     The additional data you need to ensure comparisons are fair, such as population size?
  •     Other data which might provide interesting analysis to compare or relate the spending to?"

This information was copied from http://datajournalismhandbook.org/1.0/en/understanding_data_5.html and shared under a Creative Commons Attribution-ShareAlike license
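
To make a few of these story types concrete, the sketch below computes a measurement, a proportion, a change over time, and a league table from a small table. All figures and field names are hypothetical and were invented for this example.

```python
# Hypothetical figures in pounds; none of these numbers come from real data.
councils = [
    {"name": "Borsetshire", "clips_now": 8000, "clips_4y_ago": 4000,
     "stationery_budget": 12000, "staff": 200},
    {"name": "Exampleshire", "clips_now": 9000, "clips_4y_ago": 3000,
     "stationery_budget": 30000, "staff": 900},
]

# 1. Measurement: total spending across all councils.
total = sum(c["clips_now"] for c in councils)

for c in councils:
    # 2. Proportion: share of the stationery budget spent on paper clips.
    c["share_of_budget"] = c["clips_now"] / c["stationery_budget"]
    # 5. Change over time: spending now relative to four years ago.
    c["growth_factor"] = c["clips_now"] / c["clips_4y_ago"]
    # 6. League table basis: spending per member of staff, so that
    #    councils of different sizes are compared fairly.
    c["per_staff"] = c["clips_now"] / c["staff"]

league_table = sorted(councils, key=lambda c: c["per_staff"], reverse=True)

print("Total spending:", total)
for c in league_table:
    print(c["name"], round(c["share_of_budget"], 2),
          c["growth_factor"], c["per_staff"])
```

In this made-up table, Exampleshire spends more in total, but Borsetshire tops the league table once spending is divided by staff numbers, which is why the basis for comparison matters.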