Building the scatter plot using population growth versus child mortality

Here's how to get started with building the scatter plot. If you are using the Analyst client, you can use Spotfire's AI-driven recommendations engine (web client instructions follow):

  1. Open the data panel and select the column called Annual Rate of Natural Population Increase (% increase per year).
  2. The recommendations will pop out to the right. Scroll down until you find this one (showing the relationship between Child mortality rate and region):
  3. Add it to the analysis, then click on the visualization you just added to hide the data panel.

Now for the web clients:

  1. Open the data panel and select the following three columns:
    • Annual Rate of Natural Population Increase (% increase per year) 
    • Child mortality rate (% of children dying under the age of 5)
    • Region
  2. Locate the scatter plot recommendation and click MORE LIKE THIS:
  3. Once again, click MORE LIKE THIS on the scatter plot recommendation. In this case, we are telling Spotfire that we definitely want a scatter plot, but the various columns are not on the correct axes yet.
  4. Add the bottom visualization to the analysis:
  5. Hide the data panel by clicking on the visualization you just added to your analysis.

As we might expect, there's definitely a relationship between mortality rate, population increase, and region, but it all looks a bit confusing because data is shown for all years. There is a marker for every single data point. However, we can still see the progression of each of the countries' data through the years. We can make things a lot clearer by continuing to work with the analysis.

Region is useful to show because there's a strong relationship between region and both child mortality and rate of population increase. However, the data becomes even more interesting if we switch to show each country as a marker on the scatter plot and add the dimension of population to the visualization. A scatter plot visualization is particularly appropriate for viewing lots of dimensions of data in one go! Let's complete the configuration of the scatter plot:

  1. From the legend of the visualization, select Entity for Marker by:
  2. Take a look at the axis selectors. Spotfire has automatically applied an aggregation method to each of the axes because it must group the data per entity (country). Each marker on the scatter plot now has multiple rows of data behind it.
  3. It's good practice to check and adjust the aggregation method that's being used. We can do this by pulling out the axis selector. Let's do that now and set both axes to Avg (Average) if they are showing Sum:

Aggregation methods: Spotfire's default aggregation method is Sum. My practical experience of real-world data suggests that Avg (Average) is more appropriate in most cases. Spotfire X added a user preference for this. I usually set mine to Avg. You can set this via Tools | Options | Visualization (Analyst clients only), or your server administrator can set it for all users in your organization.
Aggregation is the method by which Spotfire calculates measures over groups in the data. It groups data and applies the aggregation method to that group. There are lots of different aggregation methods, from Sum to Average, Cumulative Sum, and many more.
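To make the idea of aggregation concrete, here is a minimal sketch in plain Python. It is not how Spotfire implements aggregation internally; the rows, the entity names A and B, and the aggregate function are all made up for illustration. It simply groups values per entity and applies either Sum or Avg to each group.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (entity, year, rate_of_increase_percent).
# The entities and values are invented for this illustration.
rows = [
    ("A", 1970, 2.0),
    ("A", 1971, 3.0),
    ("B", 1970, 1.0),
    ("B", 1971, 2.0),
]

def aggregate(rows, method):
    """Group the values by entity, then apply the aggregation method to each group."""
    groups = defaultdict(list)
    for entity, _year, value in rows:
        groups[entity].append(value)
    return {entity: method(values) for entity, values in groups.items()}

sum_by_entity = aggregate(rows, sum)   # Sum: adds the yearly rates together
avg_by_entity = aggregate(rows, mean)  # Avg: a typical rate per entity

print(sum_by_entity)
print(avg_by_entity)
```

With these made-up numbers, Sum reports a rate for entity A that is larger than any single year's rate, while Avg stays within the range of the underlying values. That is the reason for preferring Avg on axes that show rates.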
  4. To show population as another dimension in the data, click the Size by legend item and choose Total population (Gapminder). Adjust the aggregation method to Sum:
At this stage, we are using Avg for the aggregations on the x- and y-axes of the scatter plot and Sum for the size by. Since the x- and y-axes are showing percentages, it doesn't make sense to sum them: adding percentage rates across years does not produce a meaningful rate. Similarly, each marker on the scatter plot currently contains the data for multiple years, so it doesn't make sense to sum the populations either. However, we are about to filter the data to an individual year, so summing the population data for each country will then be valid.
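The effect of filtering before summing can be sketched in a few lines of plain Python. This is only an illustration under assumed data: the rows, the entity names, and the sum_population helper are hypothetical and are not part of Spotfire.

```python
# Hypothetical rows: (entity, year, total_population).
rows = [
    ("A", 1973, 100),
    ("A", 1974, 110),
    ("B", 1973, 50),
    ("B", 1974, 55),
]

def sum_population(rows, year=None):
    """Sum population per entity, optionally keeping only rows for one year."""
    totals = {}
    for entity, row_year, population in rows:
        if year is not None and row_year != year:
            continue  # row is filtered out, like an unselected item in an item filter
        totals[entity] = totals.get(entity, 0) + population
    return totals

all_years = sum_population(rows)          # adds the same country's population across years
single_year = sum_population(rows, 1974)  # one row per entity, so Sum is just that year's value

print(all_years)
print(single_year)
```

Without the filter, each entity's total counts its population once per year, which overstates it. With a single year selected, there is only one row behind each marker, so Sum returns that year's population.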
  5. Open the filter panel (by clicking on the funnel icon in the toolbar) and find the Year column or open the filter from the data panel on the left (by clicking on the column name and then the funnel icon).
  6. Right-click on the filter and change its type from a Range Filter to an Item Filter:
  7. The data is most complete from the middle of the 20th century, so choose a year from that range, for example, 1974:
  8. Labelling marked data is also a nice thing to do. We can configure this by accessing the visualization properties and choosing to label by Entity (the web client is shown here):
  9. Here is the same dialog in an Analyst client:

  10. It's good practice to give visualizations a meaningful title, so double-click on the title bar and enter a new title. I suggest Population Increase Rate vs Child Mortality Rate.
  11. You should end up with something that looks like this:

The chart shows the relationships between child mortality, rate of population increase, total population, and region for each country. The size of the markers indicates the total population and the color indicates the region. Combining the x- and y-axes along with marker size and color allows us to visualize four dimensions in the data simultaneously. It's also possible to add more dimensions to a scatter plot with shape, rotation, drawing order, and others, but we are not covering these in this chapter.
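One way to think about the four dimensions is as a mapping from each row of data to the visual attributes of a marker. The following sketch is purely illustrative: the rows, the color palette, the linear size scaling, and the choice of which measure goes on which axis are all assumptions, and Spotfire's own scaling and color assignment may well differ.

```python
# Hypothetical palette and data, invented for this illustration.
REGION_COLORS = {"Asia": "red", "Europe": "blue"}

rows = [
    {"entity": "A", "region": "Asia", "growth": 2.0, "mortality": 10.0, "population": 800},
    {"entity": "B", "region": "Europe", "growth": 0.5, "mortality": 1.0, "population": 200},
]

def to_markers(rows, max_size=40.0):
    """Map each row to marker attributes: x, y, size, and color."""
    largest = max(row["population"] for row in rows)
    return [
        {
            "label": row["entity"],
            "x": row["growth"],        # assumed axis: rate of population increase
            "y": row["mortality"],     # assumed axis: child mortality rate
            "size": max_size * row["population"] / largest,  # assumed linear scaling
            "color": REGION_COLORS[row["region"]],
        }
        for row in rows
    ]

markers = to_markers(rows)
for marker in markers:
    print(marker)
```

Each marker carries four independent pieces of information, which is why a single scatter plot can be read along four dimensions at once.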

Try selecting China and India by marking them—hold down the Ctrl key as you click on them. They are the largest markers on the scatter plot:

Note how much they stand out, not only because their markers are larger than those of other countries (as you would expect from their population sizes), but also because their child mortality is still comparatively high, as is their population growth. It's also interesting to inspect the countries at the bottom-left of the plot, which tend to be developed European countries. They have small, shrinking populations and very low child mortality.

As a general good practice, hide any visualization configuration options that are not being used. To do this for the current example, right-click on the legend and uncheck the Shape by option.

By this stage, you have learned how to construct a scatter plot that shows multiple dimensions of data all at once. Along the way, you've learned some best practices around aggregation methods, visualization titling, and more.