Monday, April 13, 2020

Different types of graphs

Line Graphs

Figure: Line graph of interest in digital marketing over time

Line charts, or line graphs, are powerful visual tools that illustrate trends in data over a period of time or a particular correlation. For example, one axis of the graph might represent a variable value, while the other axis often displays a timeline.

Each value is plotted on the chart, then the points are connected to display a trend over the compared time span. Multiple trends can be compared by plotting lines of various colors.

For example, interest in digital marketing over time can easily be shown with a line graph. Simply plot the number of searches at each point along the timeline to view the trend.

Bar Graphs

Figure: Bar graph of social media platform usage

The simplest and most straightforward way to compare various categories is the classic bar graph. The universally recognized graph features a series of bars of varying lengths.

One axis of a bar graph features the categories being compared, while the other axis represents the value of each. The length of each bar is proportional to the numerical value or percentage that it represents.

For example, $4 could be represented by a rectangular bar four units long, while $5 would equate to a five-unit-long bar. With one quick glance, audiences learn exactly how the various items size up against one another.
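To make the proportional-length idea concrete, here is a minimal Python sketch that draws a plain-text bar chart. The function name `text_bar_chart` and its `unit` parameter are illustrative only and do not come from any charting library.

```python
def text_bar_chart(data, unit=1.0):
    """Draw one row per category; each bar is value / unit characters long."""
    width = max(len(label) for label in data)  # pad labels so the bars line up
    rows = []
    for label, value in data.items():
        bar = "#" * round(value / unit)  # bar length is proportional to the value
        rows.append(f"{label.ljust(width)} | {bar} {value}")
    return "\n".join(rows)


print(text_bar_chart({"$4 item": 4, "$5 item": 5}))
```

The $4 item gets a four-character bar and the $5 item a five-character bar; a real charting tool does the same scaling, just in pixels.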

Bar graphs work great for visually presenting nearly any type of data, but they hold particular power in the marketing industry. The graphs are ideal for comparing any sort of numeric value, including group sizes, inventories, ratings and survey responses.

Pie Charts

Figure: Pie chart of the most frequently used visuals

Pie charts are the simplest and most efficient visual tool for comparing parts of a whole. For example, a pie chart can quickly and effectively compare various budget allocations, population segments or market-research question responses.

Marketing content designers frequently rely on pie charts to compare the size of market segments. For example, a simple pie graph can clearly illustrate how the most popular mobile-phone manufacturers compare based on the sizes of their user bases.
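Behind every pie chart is one simple calculation: each slice's angle is its share of the whole multiplied by 360 degrees. The sketch below assumes a hypothetical budget split and an illustrative helper named `pie_slices`.

```python
def pie_slices(parts):
    """Return (label, start_angle, sweep_angle) in degrees for each slice."""
    total = sum(parts.values())
    slices, start = [], 0.0
    for label, value in parts.items():
        sweep = 360.0 * value / total  # each slice's angle is its share of the whole
        slices.append((label, start, sweep))
        start += sweep
    return slices


budget = {"Marketing": 50, "R&D": 30, "Operations": 20}  # hypothetical allocation
for label, start, sweep in pie_slices(budget):
    print(f"{label}: starts at {start:.0f} deg, spans {sweep:.0f} deg")
```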

Mosaic or Mekko Charts

Figure: Mosaic (Mekko) chart of smartphone users

Basic line, bar and pie charts are excellent tools for comparing one or two variables across a few categories, but what happens when you need to compare multiple variables or multiple categories at the same time?

What if those variables aren’t even numeric? A mosaic, or Mekko, chart might be the better choice.

Perhaps a market analyst, for example, wants to compare more than the size of various mobile-phone markets. What if, instead, he or she needs to compare the sizes of the user bases, as well as the age groups within each user base?

A mosaic chart would allow the analyst to illustrate all of these variables in a clear and straightforward manner.

In the above example, one axis of the chart represents the categories being compared – mobile phone manufacturers – while the other axis lists various age ranges.

The size and color of each cross-section of the chart correspond to the market segment it represents, as depicted in the chart's legend.
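One common way to lay out a mosaic chart, sketched here under the assumption of a unit-square canvas, is to make each column's width its share of all users and each segment's height its share within that column, so every rectangle's area equals its share of the grand total. The brand names and counts below are hypothetical.

```python
def mosaic_rects(table):
    """Lay out a mosaic chart in a unit square.

    table maps each column category (e.g. a manufacturer) to a dict of
    row-category counts (e.g. users per age range).
    Returns (column, row, x, y, width, height) tuples.
    """
    grand_total = sum(sum(rows.values()) for rows in table.values())
    rects, x = [], 0.0
    for column, rows in table.items():
        column_total = sum(rows.values())
        width = column_total / grand_total  # column width = share of all users
        y = 0.0
        for row, count in rows.items():
            height = count / column_total  # segment height = share within the column
            rects.append((column, row, x, y, width, height))
            y += height
        x += width
    return rects


users = {  # hypothetical counts, in thousands
    "Brand A": {"18-34": 30, "35+": 10},
    "Brand B": {"18-34": 20, "35+": 40},
}
for rect in mosaic_rects(users):
    print(rect)
```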

Population Pyramids

Market segments are often divided by age and gender, and a population pyramid, which plots the age distribution of each gender on opposite sides of a shared axis, is an ideal way to visualize both.

The graph classically takes on the shape of a pyramid when a population is healthy and growing: the youngest groups are the largest, and each gender dwindles at a roughly equal rate as the population ages, leaving the smallest groups at the top of the graph.

A population pyramid that veers away from its classic shape might indicate an irregularity in a population during a particular period, such as a famine or an economic boom that led to an increase in deaths or births.

Of course, population pyramids aren’t always used to compare populations by age, and therefore don’t always take on the graph’s namesake shape.

A marketer, for example, might use the design to compare a population by income, weight or IQ, in which case the smallest groups will often be at both the top and bottom. Regardless, the graph clearly depicts population trends while comparing the sizes of two related groups.

Spider Charts

Figure: Spider chart of customer satisfaction

When a statistician needs to visually compare three or more quantitative variables, he or she might choose to use a radar chart, also known as a spider or star chart.

The chart usually consists of a series of radii, each representing a different category, that splay out from a center point like spokes.

The length of each “spoke” is proportional to the value being compared. For each item being compared, the points on the spokes are then connected with a line of a designated pattern or color, forming a star-like shape with as many points as there are categories.

The result is a graphic representation that can reveal trends and compare categories all at the same time.
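The geometry is straightforward: spokes are spaced evenly around the center, and each vertex sits along its spoke at a distance proportional to its value. This sketch assumes the first spoke points straight up and spokes proceed clockwise; the helper name and the scores are hypothetical.

```python
import math


def radar_vertices(values, max_value):
    """Return (x, y) polygon vertices for one item on a radar chart.

    Spokes are spaced evenly around the center, starting straight up and
    moving clockwise; each vertex sits at a distance proportional to its value.
    """
    n = len(values)
    vertices = []
    for i, value in enumerate(values):
        angle = 2 * math.pi * i / n   # angle of the i-th spoke
        radius = value / max_value    # spoke length scaled to [0, 1]
        vertices.append((radius * math.sin(angle), radius * math.cos(angle)))
    return vertices


# Hypothetical satisfaction scores (out of 5) for five categories
print(radar_vertices([4, 3, 5, 2, 4], max_value=5))
```

Drawing a closed polyline through these vertices, one per item in its own color, produces the overlaid star shapes described above.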

Stock Charts

One of the most vital of all financial graphs, stock charts help investors track the markets to determine profits and losses, as well as make buying and selling decisions.

While a variety of graphs are used to represent market changes, the most common is likely the basic line graph.

The line simply tracks changes in a particular stock’s or overall market’s value over a period of time. Multiple stocks can be tracked and compared at the same time by transforming the line graph into a stacked area chart or simply using multiple lines of various colors.

Flow Charts

Figure: Flow chart of a SaaS referral program

Oftentimes in business – as well as other industries – a process must be diagrammed. A flow chart allows a process to be sequenced step-by-step, from beginning to end, for the purpose of analyzing, designing, documenting or managing it.

These flow charts can even feature multiple beginnings and ends, with countless pathways and journeys in between.

While a simple flow chart can certainly document a basic process from A to B to C, the diagrams are more frequently used to illustrate more complex sequences with multiple decisions or conditions along the way.

At each decision point, the chart diagrams the available options, and the path then continues along each choice.

Gantt Charts

Gantt charts are special types of bar graphs used to diagram projects and schedules. Colored bars of varying lengths reflect not only a project’s start and end dates, but also important events, tasks, milestones and their timeframes.

Modern Gantt charts can also illustrate activities’ dependency relationships.

If Team 3’s completion of task C, for example, depends on Team 2 first completing task B, the chart can reflect not only that relationship but also the scheduled dates and deadlines for each.
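The start and end dates behind a dependency-aware Gantt chart can be derived by starting each task as soon as all of its prerequisites finish. This minimal sketch assumes durations in whole days, no calendar or resource constraints, and no circular dependencies; the task names and durations are hypothetical.

```python
def schedule(tasks):
    """Compute the earliest (start_day, end_day) for each task.

    tasks maps a task name to (duration_in_days, [prerequisite task names]).
    A task starts as soon as all of its prerequisites have finished.
    Assumes the dependencies contain no cycles.
    """
    times = {}

    def finish(name):
        if name not in times:
            duration, prerequisites = tasks[name]
            start = max((finish(p) for p in prerequisites), default=0)
            times[name] = (start, start + duration)
        return times[name][1]

    for name in tasks:
        finish(name)
    return times


plan = {
    "Task A (Team 1)": (3, []),
    "Task B (Team 2)": (5, []),
    "Task C (Team 3)": (2, ["Task B (Team 2)"]),  # C waits for B
}
for task, (start, end) in schedule(plan).items():
    print(f"{task}: day {start} to day {end}")
```

Each (start, end) pair maps directly to the left and right edges of a task's bar.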

Control Charts

Also commonly known as a process-behavior chart, a control chart helps determine whether a process's measurements stay within a predetermined control range around the mean.

Frequently used in quality control processes, a typical control chart plots sample measurements as points on two axes, usually in the order the samples were taken.

The mean of all the sample points is calculated, and a center line is drawn across the graph at that value. Then the standard deviation of the samples from the mean is calculated.

Finally, upper and lower control limits are determined and drawn to mark the points beyond which deviation exceeds the expected standard. These limits are commonly set three standard deviations above and below the mean.
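The steps above can be sketched in a few lines of Python. This is a simplified version that assumes the limits come from a baseline of in-control samples and uses the plain sample standard deviation; practical control charts often estimate sigma from subgroup or moving ranges instead. The measurements are hypothetical.

```python
import statistics


def control_limits(baseline, k=3):
    """Return (center_line, lower_limit, upper_limit) from in-control baseline samples."""
    center = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)  # sample standard deviation
    return center, center - k * sigma, center + k * sigma


def out_of_control(points, lower, upper):
    """Return the measurements that fall outside the control limits."""
    return [p for p in points if p < lower or p > upper]


baseline = [10.0, 10.2, 9.8, 10.1, 9.9]  # hypothetical part diameters, mm
center, lcl, ucl = control_limits(baseline)
print(f"center={center:.2f}, LCL={lcl:.2f}, UCL={ucl:.2f}")
print(out_of_control([10.3, 9.4, 10.6], lcl, ucl))
```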

Waterfall Charts

Particularly useful in accounting and quantitative analysis, waterfall charts illustrate how an initial value is affected positively and negatively by various factors.

For example, a waterfall chart could clearly and efficiently communicate how an opening balance changes month by month over the course of a year.

Because their bars often appear to float in mid-air, waterfall charts are sometimes referred to as flying bricks charts or Mario charts.
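The floating effect comes from drawing each bar between the running total before and after its change. Here is a minimal sketch of that calculation, using a hypothetical opening balance and monthly net changes.

```python
def waterfall_bars(opening, changes):
    """Return (label, bottom, top, running_total) for each change.

    Each bar "floats" between the running total before and after its change,
    which is what gives waterfall charts their suspended look.
    """
    bars, running = [], opening
    for label, delta in changes:
        new_total = running + delta
        bars.append((label, min(running, new_total), max(running, new_total), new_total))
        running = new_total
    return bars


# Hypothetical opening balance and monthly net changes
for bar in waterfall_bars(1000, [("Jan", 200), ("Feb", -150), ("Mar", 50)]):
    print(bar)
```

A charting tool would typically color increases and decreases differently and add a final bar anchored at zero for the closing balance.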

Hierarchy Diagrams

Figure: Project hierarchy diagram

Similar in appearance to a flow chart, a hierarchical diagram, also known as an organizational chart or an organigram, illustrates the structure of an organization, as well as the relationships within it.

A typical company organigram, for example, lists the CEO at the top, followed by presidents, vice presidents, managers and so on.

An organizational chart can illustrate the chain of command from any employee all the way to the top. Hierarchy diagrams are similarly used to represent pedigrees, scientific classifications, demographics and any data set with a similar breakdown.

In the project hierarchy diagram above, for example, a project team is organized into a hierarchy so that everyone knows who their supervisor is.

Scatter Plots

Also known as a scattergram, a scatter plot consists of two axes, each representing a set of data. For example, one axis might represent the number of miles driven by a vehicle, while the second axis displays the total gallons of gas used.

Each vehicle sampled is represented by a dot plotted at its miles driven and gallons used, so the dot's position reflects its miles-per-gallon average. Once multiple dots are plotted, trends can be spotted, and if the dots are color-coded, different groups of samples can be compared.

Trellis Plots

Sometimes a statistician will need to compare more data sets than can be represented by a single graph. What if, for example, a graph needs to compare not only miles driven and gallons used, but also the number of gears and cylinders contained in each vehicle sample?

A trellis plot, also called a lattice graph or plot, can display and compare all of those variables. While the above example uses a series of scatter charts, trellis plots commonly feature series of bar or line graphs, as well.

Function Plots

Figure: Graph of a probability density function

Mathematicians, engineers and statisticians often need to visualize an equation by graphing it. The graph of a function is the set of all points whose coordinates satisfy the equation.

Therefore, an equation in the variables x and y is drawn on a graph with an x-axis and a y-axis. Likewise, an equation that also includes a variable z must be drawn on a three-dimensional graph with a third axis.

The graphs of common functions have recognizable shapes, such as the parabola of a quadratic equation, that become visually associated with their algebraic formulas.

Binary Decision Diagrams

Figure: Binary decision diagram

A binary decision is a choice between two alternatives, so a binary-decision diagram illustrates the path from one decision to another.

In computer science, binary decisions correspond to the Boolean data type, whose two values, true and false, can trigger different actions within a process flow.

Outside of computer science, a binary-decision diagram can still be used to illustrate any process by which actions are based on a decision between two values, whether those conditions be yes or no, true or false, 1 or 0 or any other opposing choices.

Ultimately, the path taken shows how the process flowed from beginning to end.
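A binary decision diagram can be represented as nodes that each ask one yes/no question and point to the next node (or a final outcome) for each answer. This minimal sketch follows one path and records it; the `Decision` class and the approval process are hypothetical, and it omits the node-sharing reduction that dedicated BDD libraries perform.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """One node of a binary decision diagram: a yes/no test on a variable."""
    variable: str
    if_false: "Decision | bool"  # next node (or final outcome) when the answer is no
    if_true: "Decision | bool"   # next node (or final outcome) when the answer is yes


def follow(node, answers):
    """Walk from the root to a terminal outcome, recording each decision taken."""
    path = []
    while isinstance(node, Decision):
        answer = answers[node.variable]
        path.append((node.variable, answer))
        node = node.if_true if answer else node.if_false
    return node, path


# Hypothetical process: approve only if the form is complete AND the budget is available
approve = Decision("form_complete", False, Decision("budget_available", False, True))
print(follow(approve, {"form_complete": True, "budget_available": False}))
```

The returned path is exactly the route a reader would trace through the diagram for that set of answers.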

Timelines

Figure: Timeline of the history of Van Gogh

Possibly the most self-explanatory of data visualizations, a timeline tracks data over a time period. Significant dates and events are highlighted at the point at which they appear on a chronological scale. Timelines can be used alone or in conjunction with other visualizations.

Tree Diagrams

A form of hierarchical diagram, a genealogical tree illustrates the structure of a family. It typically begins with an ancestor, then diagrams his or her descendants, their marriages and children, and so on.

A pedigree chart, on the other hand, begins with an individual and charts their ancestry, from parents to grandparents, and continues up.

Sunburst Charts

A type of multi-level pie chart, a sunburst chart is used to illustrate hierarchical data using concentric circles. Each ring of the “sunburst” represents a level in the hierarchy, with the root node represented by the center circle, and the hierarchy moving outward.

While a sunburst chart can be used to illustrate a familial or company hierarchy, it can also break data down by time periods, creating a historical hierarchy.

Various branches of an organization can be represented by designated hues, with different levels often taking on varying shades of the same color family. Rings can also be divided further to represent multiple divisions within the same organizational level.

In fact, a traditional, complex color wheel, such as that used by paint stores, is another form of sunburst chart.
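Laying out a sunburst chart amounts to giving each node an arc in its ring, carved out of its parent's arc in proportion to the node's share. The recursive sketch below assumes the hierarchy is a nested dict with numbers at the leaves; the department names and headcounts are hypothetical.

```python
def node_total(node):
    """A leaf is a number; a branch is a dict of children."""
    if isinstance(node, dict):
        return sum(node_total(child) for child in node.values())
    return node


def sunburst_arcs(tree, start=0.0, sweep=360.0, ring=1):
    """Return (name, ring, start_angle, sweep_angle) for every node.

    Ring 1 surrounds the center circle; each child's arc is carved out of
    its parent's arc in proportion to the child's share of the parent's total.
    """
    arcs = []
    whole = node_total(tree)
    for name, child in tree.items():
        child_sweep = sweep * node_total(child) / whole
        arcs.append((name, ring, start, child_sweep))
        if isinstance(child, dict):
            arcs.extend(sunburst_arcs(child, start, child_sweep, ring + 1))
        start += child_sweep
    return arcs


# Hypothetical headcount by department and team
company = {"Sales": {"East": 30, "West": 20}, "Engineering": 50}
for arc in sunburst_arcs(company):
    print(arc)
```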

Line Graphs

Figure: Line graph of wildfire deaths

Because a timeline is itself a form of graph, it makes sense that historians often use one to display other data. Plotting immigration levels against a timeline, for example, produces a basic line graph that illustrates population trends over a century or longer.

Stacked Area Charts

Stacked area charts are frequently used to diagram changes in multiple variables over time. Multiple lines can be drawn, for example, to track the population changes of various states over time.

The area below each line can be colored a different hue to represent the state it signifies, resulting in a graph that clearly represents population trends, while at the same time displaying each state’s data in order from least to most populous.
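The stacking works by adding each series on top of the running total of the series beneath it. This sketch assumes every series has one value per time point and orders them from smallest to largest total; the state names and figures are hypothetical.

```python
def stack_series(series):
    """Turn {label: [value per time point]} into {label: [(bottom, top), ...]} bands.

    Series are stacked from smallest to largest total, so the least populous
    category sits on the bottom.
    """
    order = sorted(series, key=lambda label: sum(series[label]))
    length = len(next(iter(series.values())))
    baseline = [0] * length
    bands = {}
    for label in order:
        tops = [bottom + value for bottom, value in zip(baseline, series[label])]
        bands[label] = list(zip(baseline, tops))
        baseline = tops  # the next series is drawn on top of this one
    return bands


# Hypothetical populations (millions) at three census dates
print(stack_series({"State B": [5, 6, 7], "State A": [2, 3, 3]}))
```

Shading the region between each (bottom, top) pair produces one colored band per state.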

Stacked Bar Graphs

When studying groups of people, it’s common to compare multiple variables at once. After all, it’s enormously more useful to examine racial backgrounds, ages and gender in addition to total population.

A stacked bar graph combines elements of the traditional bar graph and the pie graph to communicate totals, trends and proportions in a single illustration.

Rather than simply illustrating changes in global population over time with a traditional column bar graph, a stacked bar graph can also represent the racial makeup of the population during each year and how those proportions have changed during the same period.

Trellis Bar Graphs

When presenting data with three variables, a designer might try to create a three-dimensional bar graph, but adding an axis can make the graph appear cluttered and unclear, especially in printed form.

Instead, additional variables can be presented in a trellis – or lattice – format.

By combining a series of bar graphs in a modular design, additional sets of data can be easily compared. For example, a single bar graph could illustrate the political breakdown of Poland’s national elections over a period of five years.

But a trellis bar graph could depict the same data set for 16 European nations.

Stacked Area Charts

Stacked area charts are ideal for comparing values that would normally require multiple line graphs. Each line represents a different category, and the area below each line is generally shaded a designated color so each data set can be easily compared.

For example, in an area chart with one axis representing a numeric value and the other serving as a timeline, data for various categories can be tracked and compared over time in a single graphic.

Multi-level Pie Charts

All too often, a designer finds himself or herself with more sets of data than can be presented in a single standard graph. Fortunately, in the case of a pie chart, multiple layers of data can be presented without the need for multiple images or a trellis design.

A multi-level pie chart, for example, consists of tiers, with each layer representing a separate set of data, and can be the perfect solution.

So while it would take three traditional pie graphs to illustrate the various sources of recorded words for three different decades, a multi-level pie graph can not only take the place of all three, but also offer a clearer visual comparison of each decade’s results.

Venn Diagrams

Figure: Venn diagram of sustainable development

The classic Venn diagram, also known as a logic diagram, illustrates all possible logical relationships between a designated collection of sets.

In the example above, three overlapping circles visually represent the similarities and differences between the social, economic and environmental areas of sustainable development.

The more circles used, the more logical relationships their overlaps can represent. The combined set of all data in the diagram is known as the union, while the areas where circles overlap are called intersections.

A Venn diagram in which the relative size and area of each shape is proportional to the size of the group it represents is known as an area-proportional or scaled Venn diagram.
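Each region of a Venn diagram holds the elements that belong to exactly one combination of sets. Python's built-in sets make that easy to compute; the initiative names below are hypothetical tags, not data from the diagram.

```python
def venn_regions(sets):
    """Map each combination of set names to the elements found in exactly those sets."""
    regions = {}
    for element in set().union(*sets.values()):  # the union: every element in the diagram
        membership = frozenset(name for name, members in sets.items() if element in members)
        regions.setdefault(membership, set()).add(element)
    return regions


# Hypothetical initiatives tagged by area of sustainable development
areas = {
    "social": {"fair wages", "public health", "green jobs"},
    "economic": {"fair wages", "profitability", "green jobs"},
    "environmental": {"green jobs", "clean water"},
}
for membership, elements in venn_regions(areas).items():
    print(sorted(membership), sorted(elements))
```

The region keyed by all three names is the central intersection; in an area-proportional Venn diagram, the size of each region's set would also drive its drawn area.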

Science

Scattergrams

Scattergrams, also known as scatter plots, are graphs that show the relationship between two or more variables. The plots use mathematical coordinates to represent two variables of a data set.

Data is displayed in a scattergram as a collection of points, each representing the values of two variables plotted on the horizontal and vertical axes. If the points are color-coded, an additional variable can be represented in a single chart.

By plotting certain data sets, scientists can discover trends of which they might not otherwise be aware. For example, a scattergram might allow a doctor to plot patients’ resting heart rates against their body-mass index figures.

The resulting graph reveals that a higher heart rate correlates with a higher BMI.
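A scatter plot shows a correlation visually; the Pearson correlation coefficient puts a number on it, from -1 (perfect falling line) to +1 (perfect rising line). The sketch below computes it by hand on hypothetical patient values (Python 3.10 and later also provide `statistics.correlation`).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient: +1 for a perfect rising line, -1 for a falling one."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical patients: body-mass index vs. resting heart rate (bpm)
bmi = [21, 24, 27, 30, 33]
heart_rate = [62, 66, 65, 72, 75]
print(round(pearson_r(bmi, heart_rate), 2))
```

A value near zero would mean the dots show no linear trend, even if they look busy.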

Trellis Line Graphs

Trellis graphs allow scientists to examine complex, multi-variable data sets, comparing a great deal of information at once.

While a single line graph can illustrate monthly UFO sightings in Tennessee over an 18-year period, a trellis line graph can display the same data for all 50 states in a single graphic.

A trellis line graph is based on the same principle as its simpler counterpart, plotting trends in a dataset consisting of two variables – numbers of UFO sightings and dates – through use of connecting points on two axes.

But by combining multiple line graphs in a modular format, an additional variable – location – is represented.

Pareto Charts

Figure: Pareto chart of food defects

Sometimes a basic graph doesn’t display enough information to draw the necessary conclusion. A Pareto chart combines a bar graph with a line graph to illustrate not only categories’ individual values, but also the cumulative total of the entire set.

Pareto charts are designed to highlight the most important of a set of factors.

In a Pareto chart that tracks the type and frequency of food defects, the bars show the total occurrences of each type of defect, measured on one of the chart’s axes, while the line charts the cumulative percentage of all categories, ordered from most to least prevalent.

The result is a graph that clearly reflects the most common food defects and what percentage of the whole each represents.
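The data behind a Pareto chart is a sorted table: categories ordered from most to least frequent, each with a running cumulative percentage. Here is a minimal sketch with hypothetical defect counts.

```python
def pareto_table(counts):
    """Sort categories from most to least frequent and add a cumulative percentage.

    The counts become the bars; the cumulative percentages become the line.
    """
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        running += count
        rows.append((category, count, 100 * running / total))
    return rows


# Hypothetical defect counts from one production run
defects = {"Underweight": 10, "Discolored": 50, "Broken seal": 25, "Mislabeled": 15}
for category, count, cumulative in pareto_table(defects):
    print(f"{category:12} {count:3}  {cumulative:5.1f}%")
```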

Radar Charts

A radar chart, also commonly referred to as a spider chart or a star chart, displays data sets consisting of three or more variables on a two-dimensional graphic. Each variable’s quantitative value is plotted along an axis that usually starts at the chart’s center point.

As each item’s variables are charted, a line connects the points on each axis, forming an irregular polygon that may or may not resemble a star or spider web.

Multiple data sets can be compared on a single radar graph by representing each with a different color, identified by labels or in an accompanying key.

A radar chart can, for example, clearly compare and illustrate the costs and outcomes of various medical procedures as they relate to multiple conditions – all in a single graphic.

Spherical Contour Graphs

Plotting planetary conditions on a basic two-axis graph can pose a problem. The Earth, after all, is a sphere. Instead, data can be plotted on a three-axis field using variables of x, y and z. The resulting plot, if completed, will take the form of a sphere.

A spherical plot can, for example, reveal global temperature or rainfall trends by assigning each value range a particular color, then plotting the data with points of the corresponding hue.

Health and Wellness

Multi-Line Graphs

Just as medical symptoms are rarely isolated, neither is the analysis of biometric data. After all, rarely does one statistic paint the entire medical picture.

Line graphs can reflect multiple data sets with lines of varying patterns or colors. For example, a multi-line graph can illustrate changes in life expectancy not just for the population in general, but also for each gender and multiple racial backgrounds.

Stacked Bar Graphs

Stacked bar graphs aren’t useful only for illustrating parts of a whole. They can also be used to display additional variables.

While a basic bar graph could represent what portion of a population is classified as overweight over a designated time period, a stacked bar graph can also track how much of the total is obese.

Flow Charts

Figure: “Should you nap?” flow chart

Following the proper process is probably more important in medicine than in any other field. After all, if the surgeon forgets a step, you might very well bleed to death while you sleep.

Flow charts are frequently used by hospitals, clinics and other medical facilities to ensure proper procedures are uniformly followed.

Pictograms

In a pictogram, or pictograph, images and symbols are used to illustrate data. For example, a basic pictogram might use an image of the sun to signify each fair-weather day in a month and a rain cloud to symbolize each stormy day.

Because images are known to hold more emotional power than raw data, pictograms are often used to present medical data.

An illustration that shades four of 20 person symbols to represent a 20-percent death rate, for example, carries a more powerful message than a bar, line or pie chart that illustrates the same data.
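The arithmetic behind a pictogram is simply rate multiplied by the number of symbols. This text-only sketch assumes "X" for a shaded figure and "o" for an unshaded one; the helper name is illustrative.

```python
def pictogram(rate, icons=20, filled="X", empty="o"):
    """Shade round(rate * icons) of the person symbols to show a percentage."""
    shaded = round(rate * icons)
    return filled * shaded + empty * (icons - shaded)


print(pictogram(0.20))  # a 20-percent rate on 20 symbols shades four of them
```

With 20 symbols, each figure stands for five percent, so a 20-percent rate shades four figures.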

Anatomical Diagrams

Figure: “Amazing fact” anatomical diagram

Medical diagrams are often used to illustrate anatomy, procedures or disease pathology in order to explain treatments to patients and others without an extensive biomedical background.

While medical diagrams are considered a combination of science and art, they can be just as technical as any other quantitative graph. And no matter how detailed the drawing, anatomical diagrams are designed to clearly and efficiently present data.

And just as with a complex contour diagram, the diagrams focus on key information, even if it was selected from voluminous amounts of medical or scientific data.

Multi-Pie Charts

Just as in the cases of multi-level pie graphs, stacked bar graphs and trellis plots, multi-pie graphs paint a more detailed portrait of the data sets they illustrate.

While a single pie chart can display what portion of the total population has a particular condition, a multi-pie graph can break those statistics down to illustrate not only the portion of men and the portion of women, but also how the two groups compare.

Scatter Plots

It can be difficult to graphically represent medical data sets that consist of hundreds of patients or more, as is the case in most medical studies.

But a scatter plot allows for the representation of each subject, plotted on the graph according to the variables on the chart’s two axes.

The pattern formed by the plotted dots can clearly reveal trends in the data. By analyzing a scatter plot, for example, a researcher could easily identify a correlation between longer life expectancy and higher household income.

Meteorology and Environment

Contour Plots

Contour plots allow for the analysis of three variables in a two-dimensional format. In addition to plotting data along two main axes, the graph presents a third value through shading or color.

Just as a topographical map plots longitude, latitude and elevation in a two-dimensional design, a contour graph illustrates values of x, y and z.

With a contour graph, for example, a climatologist can plot not only an ocean’s salinity on different dates, but also its salinity at various depths on those dates.

Heat Maps

A type of contour graph, a heat map specifically charts varying temperatures at different geographical points. While the graph’s two axes are a map’s latitude and longitude, the third variable – temperature – is represented by a spectrum of color.

While most commonly used to illustrate weather, heat maps also can represent web traffic, financial indicators and almost any other three-dimensional data.

Scatter-Line Combo

By combining a line graph with a scatter plot, meteorologists and other statisticians can illustrate the relationship between two data sets.

For example, the high and low temperatures of each day in a month can be displayed in a scatter plot, then a line graph can be added to plot the historic average high and low temperatures over the same period.

The resulting combination graph clearly displays how the temperature range each day compares to the historic average, and it even indicates how those measurements trend over the examined time period.

Sunday, April 12, 2020

Sankey flow diagram

Many clinical trials collect prospective categorical data from participants to chart changes in the study population over time. Common examples would be quality of life questionnaires or risk scales, which provide a quick, standardized assessment of participant outcomes at a given time point.

A popular method for reporting prospective categorical data is to show results in a stacked bar chart. Consider the stacked bar chart below, which reports the number of risk factors participants exhibited at each of a series of visits.

Figure: Stacked bar chart of risk factors reported at each visit

This stacked bar chart is useful for quickly identifying trends in the overall study population - in this case, we can observe an increase in risk factors reported over time - but it does not provide much information about subgroups in the study. In the era of personalized and precision medicine, subgroup analysis is increasingly important for identifying which groups of people are most likely (or least likely) to respond to a particular treatment.

In our example above, we can see that there is a sizable increase in participants reporting 3 risk factors (dark green bar) from the 30-month visit to the 60-month visit. Where did these high-risk factor participants come from? We might assume they came from the group who had previously reported 2 or more risk factors, but the bar graph alone does not answer this question.

One solution is to overlay a Sankey flow diagram on the chart to shed some light on this mystery. Sankey diagrams were popularized by Matthew Henry Phineas Riall Sankey, a 19th-century Irish engineer, who created flow diagrams in which the width of the arrow between two nodes is proportional to the magnitude of the flow.

With a Sankey bar chart, we get the following visualization of our data:

Figure: Sankey bar chart of risk factors reported at each visit

Now we can see how our data flow between each time point, which helps us identify patterns in our data.

Let's revisit our question from earlier. Where did the 29% of high-risk factor participants at 60 months come from? According to the diagram, some came from the groups reporting 2 and 3 risk factors at 12 months, but more than half came from the groups previously reporting 0 or 1 risk factor, which is not what we might have expected from just looking at the bar chart.
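The flows in a Sankey overlay come from participant-level data: for each pair of visits, count how many participants moved from each category to each other category. This sketch assumes records keyed by participant ID and visit month, and skips anyone who missed either visit; the participants and values are hypothetical.

```python
from collections import Counter


def sankey_flows(records, from_visit, to_visit):
    """Count participants moving between categories from one visit to another.

    records maps a participant ID to {visit_month: number_of_risk_factors}.
    Each (from_category, to_category) count sets the width of one flow band.
    Participants missing either visit are skipped.
    """
    return Counter(
        (visits[from_visit], visits[to_visit])
        for visits in records.values()
        if from_visit in visits and to_visit in visits
    )


# Hypothetical participant-level data
records = {
    "p1": {30: 0, 60: 3},
    "p2": {30: 2, 60: 3},
    "p3": {30: 1, 60: 1},
    "p4": {30: 0, 60: 3},
    "p5": {30: 2},  # missed the 60-month visit
}
flows = sankey_flows(records, 30, 60)
# Where did the 3-risk-factor group at 60 months come from?
print({source: n for (source, target), n in flows.items() if target == 3})
```

Filtering the flows by their destination answers exactly the kind of "where did they come from?" question the bar chart alone cannot.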

For those wanting to really dive into their data, we can provide an interactive version allowing users to explore the chart by selecting individual bar sections or flows and isolating the data for those sections.

Figure: Interactive Sankey bar chart

Like all good data visualizations, the Sankey bar chart is designed to communicate the story behind the data. The bar chart alone tells part of the story, but adding a Sankey overlay provides a richer and more detailed understanding of our data.

Alternatives to pie charts

Every time I see a 3D pie chart made in Excel, I die a little on the inside.

Working in data visualization, you hear all sorts of opinions on pie charts. Some people really like them. Some people feel they should never be used. Mathematician John Tukey felt that there was no data displayed in a pie-chart that couldn’t be better displayed in another type of chart.

Unlike Tukey and design theorist Edward Tufte—who said, “The only worse design than a pie chart is several of them”—I am not of the opinion that pie charts should never be used. I just think they should be used less often.

I have sensed similar feelings toward Excel spreadsheets. They have even earned the nickname “walls of data.” The connection here is that pie charts and Excel spreadsheets are both overused and stretched to do things they were not meant to do. However, just like you wouldn’t remove colors from the painter’s palette and say, “No more green for you!” I don’t think the solution is to delete Excel and pie charts from everyone’s computer. Perhaps it’s more about making sure the painter has more colors to pick from.

Most of the existing content on this subject will direct you to use a bar chart or line chart instead. But I have challenged myself to show you five unusual alternatives to boring data visualization. Before you cook up another pie chart, consider these alternatives:

The dumbbell chart

One of the most common abuses of pie charts is to use many of them together to display change over time or across categories. If the primary message you want to send to your viewer is variance, it’s helpful to know that humans are really good at detecting and valuing the distance between objects. The dumbbell chart, also known as the DNA chart, is a great way to show change by using visual lengths.

Technically this chart is a tri-bell rather than a dumbbell, but the point is that it gives the information some dimension.

From a visual perspective with the dumbbell/tri-bell presentation, it is easy to see that in 2018, furniture had a lower sales distribution than office supplies and technology. By contrast, the pies all look like peace signs and it is really hard to know both the rank across the categories and how they have changed year over year.

Here’s a great dumbbell chart example that reflects the increase of women in the House of Representatives as it relates to party:

Visualization by Katie Kilroy, with data from Congressional Research Service

The bump chart

Variance may not be important to you. Maybe you want to show a ranking among the categories over time. Then I would point you to a special version of a line chart called the bump chart. Here’s the same information as in the previous example expressed a bit differently:

The greatest pro for the bump chart is that it’s really effective at visualizing ranks. But, for the cons, they can get noisy if ranks change a lot or if you have many categories. And like the dumbbell chart, viewers likely won’t realize you are comparing parts with the whole.
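A bump chart plots ranks rather than raw values, so the first step is converting each period's values into positions. This sketch assumes rank 1 is the highest value and leaves ties in their input order; the category shares are hypothetical.

```python
def ranks_by_period(values):
    """Convert {period: {category: value}} into {period: {category: rank}}, rank 1 = highest.

    Ties keep their input order, which a real chart might want to handle explicitly.
    """
    ranks = {}
    for period, by_category in values.items():
        ordered = sorted(by_category, key=by_category.get, reverse=True)
        ranks[period] = {category: position for position, category in enumerate(ordered, start=1)}
    return ranks


# Hypothetical sales shares (percent) by category
sales = {
    2017: {"Furniture": 30, "Office Supplies": 25, "Technology": 45},
    2018: {"Furniture": 20, "Office Supplies": 35, "Technology": 45},
}
print(ranks_by_period(sales))
```

Connecting each category's rank from one period to the next gives the crossing lines that make a bump chart easy to read.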

Here’s an effective bump chart example that displays the popularity ranks of new car colors and how they’ve changed over 16 years:

Visualization by Matt Chambers and inspired by Datagraver

The donut

The first two suggestions are certainly different approaches to variance and ranking, but sometimes you need a simple way to convey the parts with the whole. It may be important for a viewer to quickly know that something adds up to 100 percent. And maybe you just like the shape of circles because they symbolize many good things, like the sun or wheels—or donuts.

In the example below, even though it’s the same shape as a pie chart, the donut conveys information a bit differently:

Because people are so overexposed to pie charts early and often throughout their lifetimes, there’s a key advantage in translating the info to a donut: it shortens the time it takes the viewer to decode the parts and the whole of the visualization.

(On a side note, do you ever wonder if there is a correlation between people who like donut charts and stuffed-crust pizza? I do. Please send me that data set.)

The pros of a donut chart are that it’s effective at showing parts within a whole and, unlike a pie chart, it frees up white space at the core to throw in a total, call out a number, or add another data marker. It can also be used as a gauge to call out a single percentage.

The cons are that it’s hard to interpret things like variance and rank, and humans are generally worse at comparing the filled-in arcs of a ring than at comparing lengths in easy-to-read formats like bar charts.

It can be done, though. Here’s an example of a donut that is effective at using the ring’s shading to display salaries in proportion to each other:

Visualization by Ryan Sleeper with data from Sean Lahman

The treemap

A primary argument against the pie chart is that humans are not good at detecting differences between angle sizes. Treemaps alleviate this by using area instead of angles to designate proportion. Using the same data as in the donut format above, this version uses sized rectangles:

In addition to the pro of displaying data with area space rather than angles, treemaps are more useful than pie charts when there are more than five categories (avoiding the sometimes hard-to-label pie slivers) and in visualizing subcategories within categories. The main con is that people are much less familiar with this format.

Here’s another treemap example that aims to show a lot of comparative information in its visualization of the weekly volume of Google searches of four football players across years:

The waffle chart

The waffle chart is a really fun chart and probably my favorite alternative to pie charts—and not just because it’s also named after food. Because it’s typically made with 100 squares representing the whole, it can be shaded or filled based on the relation of several parts to a whole, just like a pie chart—but it’s also good for displaying a single percentage.
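Turning percentages into exactly 100 whole squares takes a little care, because rounding each share separately can leave you with 99 or 101 squares. A minimal sketch, assuming a 10-by-10 grid filled row by row and using the largest-remainder method to hand out leftover squares; the function names and sample shares are hypothetical.

```python
import math


def waffle_counts(shares, total_squares=100):
    """Allocate whole squares per category so the counts sum to total_squares.

    Largest-remainder method: floor each exact allocation, then give the leftover
    squares to the categories with the largest fractional parts.
    """
    total = sum(shares.values())
    exact = {k: v * total_squares / total for k, v in shares.items()}
    counts = {k: math.floor(x) for k, x in exact.items()}
    leftover = total_squares - sum(counts.values())
    # Biggest fractional remainder first; ties keep the input order.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def waffle_grid(counts, rows=10, cols=10):
    """Fill a rows x cols grid row by row with category labels."""
    cells = [k for k, n in counts.items() for _ in range(n)]
    return [cells[r * cols:(r + 1) * cols] for r in range(rows)]


# Hypothetical survey shares, for illustration only.
grid = waffle_grid(waffle_counts({"Yes": 62, "No": 31, "Unsure": 7}))
```

With three equal shares, naive rounding gives 33 + 33 + 33 = 99 squares; the largest-remainder step assigns the 100th square to one of them.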

The key pro is its versatility. It can show individual parts of a whole and compare single percentages, but another advantage—similar to treemaps—is that proportions are more clearly represented by area instead of angles.

The cons are that it becomes too complicated when too many segments are involved, and the individual squares don’t leave a good spot to put numbers or much text within the visual itself.

Here’s another waffle chart example that neatly displays comparative survival rates for different types of cancer:

Visualization by Gwendoline Tan with data from Our World in Data

Other alternatives

These are only a handful of diverse and creative ways you can visualize data. I also considered other unusual diagram alternatives: Marimekko charts, Sankey flow diagrams, radial pie charts, and sunburst charts.

Let me just leave you with one last 3D pie chart:

Treemap

What is a Treemap?

Treemaps are ideal for displaying large amounts of hierarchically structured (tree-structured) data. The space in the visualization is split up into rectangles that are sized and ordered by a quantitative variable.

The levels in the hierarchy of the treemap are visualized as rectangles containing other rectangles. Each set of rectangles on the same level in the hierarchy represents a column or an expression in a data table. Each individual rectangle on a level in the hierarchy represents a category in a column. For example, a rectangle representing a continent may contain several rectangles representing countries in that continent. Each rectangle representing a country may in turn contain rectangles representing cities in these countries. You can create a treemap hierarchy directly in the visualization, or use an already defined hierarchy.
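To make the nesting concrete, here is a minimal sketch that turns flat table rows into a continent > country > city tree, where every parent’s size is the sum of its children’s sizes. The function name, column names, and sales figures are made up for illustration; this is not how Spotfire stores hierarchies internally.

```python
def build_hierarchy(rows, levels, size_column):
    """Nest flat rows into a tree, one tree level per column in `levels`.

    Each node is {"size": total, "children": {...}}. A parent's size is the sum of
    its children's sizes, which is what sizes the rectangle that encloses them.
    """
    root = {"size": 0, "children": {}}
    for row in rows:
        value = row[size_column]
        node = root
        node["size"] += value
        for level in levels:
            # Create the child node for this category on first sight, then descend.
            node = node["children"].setdefault(row[level], {"size": 0, "children": {}})
            node["size"] += value
    return root


# Made-up sales figures, for illustration only.
rows = [
    {"Continent": "Asia", "Country": "India", "City": "Calcutta", "Sales": 10},
    {"Continent": "Asia", "Country": "India", "City": "Bangalore", "Sales": 5},
    {"Continent": "Asia", "Country": "China", "City": "Hong Kong", "Sales": 4},
    {"Continent": "Africa", "Country": "Morocco", "City": "Casablanca", "Sales": 40},
]
tree = build_hierarchy(rows, ["Continent", "Country", "City"], "Sales")
asia = tree["children"]["Asia"]
# India's size is Calcutta + Bangalore; Asia's size is China + India.
```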

A number of different algorithms can be used to determine how the rectangles in a treemap should be sized and ordered. The treemap in Spotfire uses a squarified algorithm.
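Spotfire’s own implementation is not shown here. As a rough illustration of what a squarified layout does, the sketch below follows the general approach described by Bruls, Huizing and van Wijk: sort the values in descending order, keep adding rectangles to the current row along the shorter side while the worst aspect ratio in that row improves, and start a new row once it would get worse. The function names are hypothetical, values are assumed to be positive, and y grows downward (screen coordinates), so the largest rectangle ends up in the top left corner.

```python
def worst_ratio(row, length):
    """Worst (largest) aspect ratio of a row of areas laid along a side of `length`."""
    s = sum(row)
    return max(max(length * length * a / (s * s), (s * s) / (length * length * a)) for a in row)


def layout_row(row, x, y, width, height, out):
    """Place one row of (label, area) pairs along the shorter side; return the free space."""
    s = sum(a for _, a in row)
    if width >= height:
        # Shorter side is vertical: stack the row as a column on the left edge.
        col_w = s / height
        cy = y
        for label, a in row:
            h = a / col_w
            out.append((label, x, cy, col_w, h))
            cy += h
        return x + col_w, y, width - col_w, height
    # Shorter side is horizontal: lay the row across the top edge.
    row_h = s / width
    cx = x
    for label, a in row:
        w = a / row_h
        out.append((label, cx, y, w, row_h))
        cx += w
    return x, y + row_h, width, height - row_h


def squarify(values, x, y, width, height):
    """Squarified layout of {label: value} into the rectangle (x, y, width, height)."""
    total = sum(values.values())
    items = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    # Scale values so their areas exactly fill the rectangle.
    pending = [(label, v * width * height / total) for label, v in items]
    rects, row = [], []
    while pending:
        length = min(width, height)
        areas = [a for _, a in row]
        if not row or worst_ratio(areas + [pending[0][1]], length) <= worst_ratio(areas, length):
            row.append(pending.pop(0))  # adding the next item keeps shapes squarer
        else:
            x, y, width, height = layout_row(row, x, y, width, height, rects)
            row = []
    if row:
        layout_row(row, x, y, width, height, rects)
    return rects  # list of (label, x, y, w, h)
```

For a nested hierarchy, the same function could be called again inside each parent rectangle with the sizes of its children, which gives the rectangles-inside-rectangles pattern.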

The rectangles in the treemap are arranged by size from the top left corner of the visualization to the bottom right corner, with the largest rectangle positioned in the top left corner and the smallest rectangle in the bottom right corner. For hierarchies, that is, when the rectangles are nested, the same ordering is repeated inside each rectangle in the treemap. This means that the size, and thereby also the position, of a rectangle that contains other rectangles is decided by the sum of the areas of the rectangles it contains.

Example:

Below is a treemap where the rectangles represent cities and are sized and colored by the column Sales. In this case, the aggregation method Sum was selected for the Sales column. This treemap only contains data on one level.

The sizes and positions of the rectangles, as well as the coloring, indicate that Casablanca and Cannes have the highest total sum of sales, while Hong Kong and Bangalore have the lowest.

To compare sum of sales for entire countries or continents, you can add other levels to the treemap hierarchy without losing the information about the individual cities. In the treemap below, the columns Country and Continent were added to the treemap hierarchy.

The rectangles are now nested. Each rectangle that represents a continent consists of rectangles representing countries within that continent, and each rectangle that represents a country consists of rectangles representing cities in that country. It is still possible to see which individual cities have the highest sum of sales, but it is now also easy to see that Africa is the continent with the highest total sum of sales and that Asia is the continent with the lowest. Because the rectangles are now nested, they are no longer in the same positions as before. However, each level of the hierarchy is still organized according to the squarified algorithm. For example, the size of the rectangle representing India is decided by the sum of the areas of the two rectangles representing Calcutta and Bangalore. The size of the rectangle representing Asia is in turn decided by the sum of the areas of the rectangles representing China and India.

Tuesday, December 24, 2019

Gini coefficient and Lorenz curve

The Gini index, or Gini coefficient, is a statistical measure of distribution developed by the Italian statistician Corrado Gini in 1912.

It is used as a gauge of economic inequality, measuring income distribution among a population.

The coefficient ranges from 0 (or 0%) to 1 (or 100%), with 0 representing perfect equality and 1 representing perfect inequality. Values above 1 are not possible in practice, because negative incomes are not taken into account (income can be 0 at its lowest, but not negative).

Thus, a country in which every resident has the same income would have an income Gini coefficient of 0. A country in which one resident earned all the income, while everyone else earned nothing, would have an income Gini coefficient of 1.
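The two extreme cases can be checked numerically. A minimal sketch, assuming incomes are given as a plain list of non-negative numbers with a positive total, that computes the coefficient as half the relative mean absolute difference (one standard formula among several). With a finite population of n people, the "one resident earns everything" case gives (n − 1)/n rather than exactly 1; the value approaches 1 as the population grows. The function name and income figures are hypothetical.

```python
def gini(incomes):
    """Gini coefficient as half the relative mean absolute difference.

    Sums |x_i - x_j| over all ordered pairs (i, j) and divides by 2 * n^2 * mean.
    Assumes non-negative incomes with a positive total. O(n^2), fine for small lists.
    """
    n = len(incomes)
    mean = sum(incomes) / n
    total_diff = sum(abs(a - b) for a in incomes for b in incomes)
    return total_diff / (2 * n * n * mean)


everyone_equal = [30_000] * 100          # every resident earns the same
one_earns_all = [0] * 99 + [3_000_000]   # one resident earns everything
```

Here `gini(everyone_equal)` is 0, and `gini(one_earns_all)` is 99/100 for this population of 100.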

As we now know, the Gini coefficient is an important tool for analyzing income or wealth distribution within a country or region, but it should not be mistaken for an absolute measurement of income or wealth.

A high-income country and a low-income one can have the same Gini coefficient, as long as incomes are distributed similarly within each country:
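One way to see this: multiplying every income by the same factor leaves the Gini coefficient unchanged, because the coefficient depends only on income shares. The sketch below uses the equivalent rank-based formula on incomes sorted in ascending order, G = 2·Σ i·x(i) / (n·Σ x) − (n + 1)/n, and two made-up income lists, the second being the first multiplied by 50. The function name and numbers are hypothetical.

```python
def gini_sorted(incomes):
    """Gini coefficient via the rank-weighted formula on ascending incomes.

    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with ranks i = 1..n.
    Assumes non-negative incomes with a positive total.
    """
    xs = sorted(incomes)
    n = len(xs)
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n


low_income = [1, 2, 3, 4, 10]                 # made-up incomes
high_income = [50 * x for x in low_income]    # same distribution, 50 times richer
```

Both lists give the same coefficient (0.4, up to floating-point rounding), even though the average income differs by a factor of 50.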

Use of Gini index in data modelling

The Gini coefficient, or Gini index, measures the inequality among the values of a variable: the higher the value of the index, the more dispersed the data. Alternatively, the Gini coefficient can be calculated as half of the relative mean absolute difference.
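Written out for values x_1, …, x_n with mean x̄, the "half of the relative mean absolute difference" definition reads as follows. The worked example uses the made-up values (1, 2, 3, 4, 10) purely for illustration.

```latex
\mathrm{MD} = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\lvert x_i - x_j\rvert,
\qquad
\mathrm{RMD} = \frac{\mathrm{MD}}{\bar{x}},
\qquad
G = \frac{\mathrm{RMD}}{2} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n}\lvert x_i - x_j\rvert}{2n^2\bar{x}}

% Worked example, x = (1, 2, 3, 4, 10): n = 5, \bar{x} = 4.
% The 10 unordered pairs differ by a total of 40, so the ordered-pair sum is 80:
G = \frac{80}{2 \cdot 5^2 \cdot 4} = \frac{80}{200} = 0.4
```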

Graphical Representation of the Gini Index (Lorenz curve)

The Gini coefficient is usually defined mathematically based on the Lorenz curve, which plots the proportion of the total income of the population (y-axis) that is cumulatively earned by the bottom x% of the population.

The 45-degree line (y = x) thus represents perfect equality of incomes: the bottom x% of the population earns exactly x% of the total income.
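The Lorenz curve and the coefficient are linked through areas: if A is the area between the 45-degree line and the Lorenz curve and B is the area under the Lorenz curve, then G = A / (A + B), and since A + B = 1/2, G = 1 − 2B. A minimal sketch, assuming a plain list of non-negative incomes with a positive total, that builds the curve’s points and computes B with the trapezoid rule (exact here, since the curve between data points is drawn as straight segments). Function names and income figures are hypothetical.

```python
def lorenz_points(incomes):
    """Lorenz curve points (population share, income share), starting at (0, 0)."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    points, running = [(0.0, 0.0)], 0
    for k, x in enumerate(xs, start=1):
        running += x
        points.append((k / n, running / total))
    return points


def gini_from_lorenz(points):
    """G = 1 - 2B, where B is the area under the Lorenz curve (trapezoid rule)."""
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return 1 - 2 * area


curve = lorenz_points([1, 2, 3, 4, 10])  # made-up incomes
```

For these incomes the bottom 20% earn 5% of the total and the bottom 80% earn 50%, the area under the curve is 0.3, and G = 1 − 2 × 0.3 = 0.4.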