
I’ve been having some days! Between sick kids, myself getting sick, deadlines and development promises, I haven’t had a lot of time to make any kind of blog post. I have been following along with my #tidytuesday analysis and visualization though, so this week you get a double feature!

First up is last week's Australian tax data. It was interesting to see other approaches to the visualizations and what people presented. A lot of people represented the data as straight-up job salary and framed it as the inequity between men and women. In reality it's a bit murkier than that: the data was actually taxable income (which includes investments, independent work, etc.), so the scale of the differences in certain jobs is greatly magnified. This doesn't mean that inequity doesn't exist. It DEFINITELY does. What it does mean is that further investigation, digging and cleaning are needed to get to a highly accurate analysis. It also means that visualizations using coarse data are useful for highlighting the existence of a problem or pattern.

The first thing I did was explore the data to get a sense of how big the equity gap was. I tidied up the data, classified each job by dividing male taxable income by female taxable income, and visualized the resulting distribution as a waffle plot. This shows the jobs in which men report more income in blue, and the jobs in which women report more income in pink.
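For anyone who wants to try the classification step, here is a minimal sketch in base R. The data frame, its column names (`income_male`, `income_female`) and the numbers are all made up for illustration; they are not the real tax figures or necessarily how the original dataset is laid out.

```r
# Toy data: made-up values for illustration only, not the real tax figures.
# Column names are assumptions, not the names used in the original dataset.
jobs <- data.frame(
  occupation    = c("Job A", "Job B", "Job C", "Job D"),
  income_male   = c(90000, 60000, 45000, 70000),
  income_female = c(75000, 62000, 40000, 55000)
)

# Ratio of male to female taxable income: above 1 means men report more.
jobs$ratio <- jobs$income_male / jobs$income_female

# Classify each job by who reports more income.
# (An exact tie, ratio == 1, would fall into "women" here; handle it
# separately if your data contains ties.)
jobs$higher <- ifelse(jobs$ratio > 1, "men", "women")

# The count in each class is what a waffle plot draws as squares.
gap_counts <- table(jobs$higher)
print(gap_counts)
```

The two counts can then be handed to whatever waffle-style plotting approach you prefer, one colour per class.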

I decided to show the income difference between positions with a simple scatter plot, plotting each job as income earned by women vs income earned by men. I highlighted the jobs where women have greater income than men to illustrate the relative size of this equity problem. I also wanted people to be able to explore the data, and to give them the underlying information for each job. Providing that level of detail, even simply labeling each job, would put far too much clutter on the plot and obscure any patterns. This is a problem that can usually be handled by adding interactivity to the plot, in this case a custom tooltip that provides the underlying information on demand. There are packages that add this in R quite readily, mainly from the htmlwidgets family. My favourite of those is ggiraph, which provides interactive versions of ggplot geoms that can be driven with JavaScript. If you're interested in this, go check out the package on GitHub. There has been a pile written on the use of interactivity and interactive visualizations, but my favourite overview has to be in Andy Kirk's (@visualisingdata) Data Visualisation: A Handbook for Data Driven Design.
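Here is a rough sketch of what a scatter plot with on-demand tooltips can look like using ggiraph. It assumes ggplot2 and ggiraph are installed, and it uses made-up data with hypothetical column names, so treat it as a starting point rather than the exact code behind my plot.

```r
library(ggplot2)
library(ggiraph)

# Toy data: made-up values for illustration only, not the real tax figures.
jobs <- data.frame(
  occupation    = c("Job A", "Job B", "Job C", "Job D"),
  income_male   = c(90000, 60000, 45000, 70000),
  income_female = c(75000, 62000, 40000, 55000)
)

# Flag the jobs where women report more income so they can be highlighted.
jobs$women_higher <- jobs$income_female > jobs$income_male

# Build the tooltip text shown when hovering over a point.
fmt <- function(x) format(x, big.mark = ",", scientific = FALSE, trim = TRUE)
jobs$tip <- paste0(
  jobs$occupation,
  "\nWomen: $", fmt(jobs$income_female),
  "\nMen: $", fmt(jobs$income_male)
)

p <- ggplot(jobs, aes(x = income_male, y = income_female)) +
  # Dashed line of equal income: points above it are jobs where women earn more.
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  # Interactive drop-in for geom_point(); tooltip and data_id are ggiraph aesthetics.
  geom_point_interactive(
    aes(tooltip = tip, data_id = occupation, colour = women_higher),
    size = 3
  ) +
  labs(x = "Taxable income, men", y = "Taxable income, women",
       colour = "Women report more")

# Wrap the ggplot object in an htmlwidget for viewing in a browser or R Markdown.
widget <- girafe(ggobj = p)
```

Printing `widget` in an interactive session (or including it in an R Markdown document) renders the plot, and hovering over a point shows that job's details without cluttering the chart with labels.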