Lambaréné, April 15, 2024

Data vis as a discovery and communication tool

Data visualisation is a useful tool at each stage of this process, e.g.,

  • Quality control
  • Hypothesis generation
  • Hypothesis validation
  • Communication

In an ideal scenario the lead researcher combines domain expertise and data analysis (visualisation) expertise, to avoid internal communication loops.

Functions of data visualisation

Data exploration

  • Discover patterns, stories in your data.
  • Should be interactive, or at least flexible
  • Should be comprehensive
    • include every relevant aspect
    • include only potentially relevant aspects
  • Should be focused on its objective and its effective execution
  • Can be unpolished

Data presentation

  • Communicate information
  • Different contexts
    • Publication: what’s the message; what’s the available space?
    • Presentation: how much time?
  • Different audiences (use familiar idioms)
  • Should be focused on the message and its effective communication

Virtue of visual sense

\(\rightarrow\) Visual system provides a high-bandwidth channel to our brain

\(\rightarrow\) Visual system allows for perceived simultaneousness and parallelism in our sensing

\(\rightarrow\) Visual sense, combined with the brain, is a formidable pattern recognition system

Anscombe’s quartet

x1   y1    x2   y2    x3   y3    x4   y4    p
10   8.0   10   9.1   10   7.5   8    6.6   d
8    7.0   8    8.1   8    6.8   8    5.8   d
13   7.6   13   8.7   13   12.7  8    7.7   d
9    8.8   9    8.8   9    7.1   8    8.8   d
11   8.3   11   9.3   11   7.8   8    8.5   d
14   10.0  14   8.1   14   8.8   8    7.0   d
6    7.2   6    6.1   6    6.1   8    5.2   d
4    4.3   4    3.1   4    5.4   19   12.5  d
12   10.8  12   9.1   12   8.2   8    5.6   d
7    4.8   7    7.3   7    6.4   8    7.9   d
5    5.7   5    4.7   5    5.7   8    6.8   d
9    7.5   9    7.5   9    7.5   9    7.5   mean
11   4.1   11   4.1   11   4.1   11   4.1   var
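
The summary rows of the table can be checked numerically. The following is a minimal Python sketch, assuming the rounded values printed in the table (so the results match the summary rows only approximately). The names `quartet` and `summarize` are illustrative and not part of the original material.

```python
from statistics import mean, variance

# Rounded values copied from the table; the keys are hypothetical labels.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "set1": (x123, [8.0, 7.0, 7.6, 8.8, 8.3, 10.0, 7.2, 4.3, 10.8, 4.8, 5.7]),
    "set2": (x123, [9.1, 8.1, 8.7, 8.8, 9.3, 8.1, 6.1, 3.1, 9.1, 7.3, 4.7]),
    "set3": (x123, [7.5, 6.8, 12.7, 7.1, 7.8, 8.8, 6.1, 5.4, 8.2, 6.4, 5.7]),
    "set4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
             [6.6, 5.8, 7.7, 8.8, 8.5, 7.0, 5.2, 12.5, 5.6, 7.9, 6.8]),
}


def summarize(x, y):
    """Return (mean of x, sample variance of x, mean of y, sample variance of y)."""
    return mean(x), variance(x), mean(y), variance(y)


for name, (x, y) in quartet.items():
    mx, vx, my, vy = summarize(x, y)
    print(f"{name}: mean(x)={mx:.1f} var(x)={vx:.1f} mean(y)={my:.1f} var(y)={vy:.1f}")
```

All four sets produce nearly the same summary statistics, so only a plot reveals how differently the points are arranged.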

There are limits

Limitations

Computational capacity

  • How much data can my computing resources store, aggregate, and draw?

Display capacity

  • What is the spatial resolution of your medium?
  • How much space is there for the drawing (e.g., Poster, Journal, Slide in presentation)?
  • Is the medium interactive (e.g., e-Poster, website) or static (e.g., paper, pdf)?

Perceptual and cognitive capacity

  • How much time does the consumer have to digest the information?
  • How many components can the reader digest?

Time limitations: Idiom literacy

Deconstruction of graphics

Why?

  • What is the task (e.g., exploration, presentation)?
  • Is there a message to communicate, and if so, what is it?

What?

  • What data is shown?
  • How is the data pre-processed, filtered, abstracted?

How?

  • Which idiom is used?
  • What attributes and which interactions are selected?
  • How are selected attributes encoded?

Plot constituents

  • Data
  • Marks
  • Channels
  • Annotation

Deconstruction of graphics: Data

Dataset types

  • Tables
  • Networks
  • Fields
  • Geometry
  • Sets

Tabular data

species    island     bill_l  bill_d  flipper_l  mass  sex     year
Adelie     Torgersen  33.5    19.0    190        3600  female  2008
Chinstrap  Dream      50.6    19.4    193        3800  male    2007
Adelie     Dream      36.8    18.5    193        3500  female  2009
Gentoo     Biscoe     51.1    16.3    220        6000  male    2008
Gentoo     Biscoe     46.5    13.5    210        4550  female  2007
Gentoo     Biscoe     50.5    15.9    222        5550  male    2008
Adelie     Dream      35.7    18.0    202        3550  female  2008
Adelie     Torgersen  35.1    19.4    193        4200  male    2008
Gentoo     Biscoe     47.2    13.7    214        4925  female  2009
Adelie     Biscoe     41.6    18.0    192        3950  male    2008
Adelie     Dream      39.6    18.8    190        4600  male    2007
Gentoo     Biscoe     53.4    15.8    219        5500  male    2009
Gentoo     Biscoe     43.2    14.5    208        4450  female  2008
Adelie     Biscoe     38.1    17.0    181        3175  female  2009
Gentoo     Biscoe     48.5    15.0    219        4850  female  2009

Deconstruction of graphics: Data

Variable types

  • Categorical:
    a factor
  • Ordinal:
    a factor with a natural order
  • Quantitative:
    an arithmetic value

Order types

  • sequential (e.g., body height)
  • diverging (e.g., temperature variation)
  • cyclic (e.g. time of day)

Deconstruction of graphics: Data

Data abstraction

  • Sums
  • Ratios
  • Means
  • Standard deviations, errors, variances
  • Correlations
  • Test statistics
  • Dimensionality reduction (e.g., PCA)
  • Grouping, clustering \(\Rightarrow\) centroids

Deconstruction of graphics: Data

Data abstraction: Principal Component Analysis (PCA)

  • linear dimensionality reduction technique
  • reduced coordinate system captures the largest variation in the data
  • purpose
    • exploratory data analysis
    • visualization
    • data preprocessing
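
To make the idea concrete, here is a minimal pure-Python sketch that approximates the first principal component by power iteration. It assumes three quantitative columns copied from the penguin table in this deck and standardizes them first. All function names are hypothetical, and in practice a numerical library would normally be used instead.

```python
import math

# Three quantitative columns copied from the penguin table in this deck.
flipper_l = [190, 193, 193, 220, 210, 222, 202, 193, 214, 192, 190, 219, 208, 181, 219]
mass = [3600, 3800, 3500, 6000, 4550, 5550, 3550, 4200, 4925, 3950, 4600, 5500, 4450, 3175, 4850]
bill_d = [19.0, 19.4, 18.5, 16.3, 13.5, 15.9, 18.0, 19.4, 13.7, 18.0, 18.8, 15.8, 14.5, 17.0, 15.0]


def standardize(column):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def covariance_matrix(columns):
    """Sample covariance matrix of already centered columns."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in columns]
            for ci in columns]


def first_component(cov, iterations=500):
    """Approximate the dominant eigenvector and eigenvalue by power iteration."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        vec = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    projected = [sum(c * v for c, v in zip(row, vec)) for row in cov]
    eigenvalue = sum(v * p for v, p in zip(vec, projected))
    return vec, eigenvalue


columns = [standardize(c) for c in (flipper_l, mass, bill_d)]
cov = covariance_matrix(columns)
pc1, variance_pc1 = first_component(cov)

# Share of the total variance captured by the first component.
explained = variance_pc1 / sum(cov[i][i] for i in range(len(cov)))

# Coordinates of each penguin along the first component.
scores = [sum(w * col[i] for w, col in zip(pc1, columns))
          for i in range(len(flipper_l))]
print([round(w, 2) for w in pc1], round(explained, 2))
```

The `scores` list is the reduced, one-dimensional coordinate that could be plotted in place of the three original columns.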

Deconstruction of graphics: Marks and Channels

Marks

Marks are basic geometric elements that depict (data) items or links.

For tabular data, the geometric primitives used are points, lines, and areas.

Channels

Channels control the appearance of marks, thereby encoding their attributes.

Channels can be distinguished into identity channels (e.g., shape) and magnitude channels (e.g., size).

Deconstruction of graphics: Annotation

Legends

  • A legend is a tool to help explain a graph.
  • It translates between the channel (e.g., color, shape) and the respective encoded value.

Labels

  • Labels can replace legends to some extent.
  • Link between marks and explanatory text.
  • Can also be associated with additional channels (e.g., font size/color).
  • Very useful to highlight and pop out items.

Effectiveness of channels

[Figure: channels ranked by effectiveness, shown separately for magnitude channels and identity channels.]

Munzner, Tamara. Visualization analysis and design. CRC press, 2014

Effectiveness of channels

  • Expressiveness principle
    • Ordered attributes should be shown with magnitude channels.
    • Categorical attributes should be shown with identity channels.
  • Channels can be combined on one mark, but not arbitrarily.
  • Effectiveness of a channel is defined by
    • Accuracy
    • Discriminability and Separability
    • Popout and Grouping characteristics

Munzner, Tamara. Visualization analysis and design. CRC press, 2014
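
The expressiveness principle can be written as a small check. This is a sketch under stated assumptions: the channel sets below only contain channels named in this deck, the classification of position as both identity and magnitude follows the position slide, and the function name and the `encoding` layout are hypothetical.

```python
# Channel classification as used in this deck (position counts as both).
IDENTITY_CHANNELS = {"position", "hue", "shape"}
MAGNITUDE_CHANNELS = {"position", "length", "angle", "size", "luminance", "saturation"}
ORDERED_TYPES = {"ordinal", "quantitative"}


def expressiveness_violations(encoding):
    """Return the attributes whose channel does not match their variable type.

    `encoding` maps an attribute name to a (variable_type, channel) pair.
    """
    violations = []
    for attribute, (variable_type, channel) in encoding.items():
        allowed = MAGNITUDE_CHANNELS if variable_type in ORDERED_TYPES else IDENTITY_CHANNELS
        if channel not in allowed:
            violations.append(attribute)
    return violations


good = {"species": ("categorical", "hue"), "mass": ("quantitative", "position")}
bad = {"species": ("categorical", "size"), "mass": ("quantitative", "shape")}
print(expressiveness_violations(good))  # no violations expected
print(expressiveness_violations(bad))   # both attributes are flagged
```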

Perception of channels

Munzner, Tamara. Visualization analysis and design. CRC press, 2014

Properties of Vision

  • Good at
    • Relative judgments
    • Time and space
    • Identification
  • Bad at
    • Veridical (truthful) perception
    • Absolute judgments

Accuracy of channels

Accuracy of the human perceptual judgement of a stimulus.

  • Can be answered by psychophysics (Stevens’ psychophysical power law, prev. slide)
  • Can be answered by questionnaires (Cleveland, McGill, 1984, 10.1080/01621459.1984.10478080)
  • The latter approach led to the following ranking of channels according to their accuracy (descending order):
1. Aligned position
2. Unaligned position
3. Length
4. Angle
5. Circular areas
6. Rectangular areas
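
Stevens' power law relates perceived sensation S to physical intensity I as S = k · I^n. The sketch below illustrates why length is judged more accurately than area. The exponents are approximate values commonly quoted from Stevens' work and are assumptions here, not figures taken from this deck. The function name is hypothetical.

```python
# Approximate exponents commonly quoted for Stevens' power law (assumed values).
EXPONENTS = {"length": 1.0, "area": 0.7, "brightness": 0.5}


def perceived_ratio(channel, physical_ratio):
    """Perceived ratio between two stimuli whose physical ratio is given.

    The constant k cancels out when comparing two stimuli on the same channel.
    """
    return physical_ratio ** EXPONENTS[channel]


for channel in EXPONENTS:
    # A stimulus that is physically twice as large is perceived as less than
    # twice as large whenever the exponent is below 1.
    print(channel, round(perceived_ratio(channel, 2.0), 2))
```

With an exponent of 1, length is perceived proportionally, whereas a doubled area or brightness is perceived as clearly less than doubled.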

Discriminability of channels

How many bins, across the spectrum of a channel, are distinguishable?

When attributes are encoded with a certain channel, are the differences between the individual items perceptible to humans as intended?

For some channels a very limited number of “bins” can be distinguished.

Binning the channel might improve the distinguishability of perception, but reduces the resolution.

The range of values for an attribute should match the number of bins which can be distinguished.
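
Binning can be sketched in a few lines. The example below assumes equal-width bins, which is only one of several possible strategies, and uses the body mass column from the penguin table. The function name `bin_values` is hypothetical.

```python
def bin_values(values, n_bins):
    """Assign each value to one of n_bins equal-width bins (indices 0..n_bins-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0] * len(values)
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Body mass from the penguin table, reduced to three bins so that it could be
# shown with a channel that only supports about three distinguishable levels.
mass = [3600, 3800, 3500, 6000, 4550, 5550, 3550, 4200, 4925, 3950,
        4600, 5500, 4450, 3175, 4850]
print(bin_values(mass, 3))
```

The price of the improved distinguishability is visible directly: fifteen different masses collapse into three levels.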

Separability between Channels

How uncoupled are two channels?

Channels can be separable or integrated.

Some interactions are obvious
\(\rightarrow\) if two attributes are encoded by vertical and horizontal position, planar proximity is no longer available as a channel for a third attribute.
\(\rightarrow\) two attributes via size and shape on the same mark

Some interactions are less obvious
\(\rightarrow\) color hue interacts with size, since the size of a mark changes how its color is perceived.

Popout of a channel

How quickly can a highlighted item catch your eye?

Popout is also referred to as preattentive processing or tunable detection

For channels with a favorable popout property, this works independently of the number of distractor objects.

Popout depends not only on the channel but also on the particular encoding of the channel.

Grouping and faceting

  • Grouping can be achieved by (from least to most effective)
    • Identity channels (e.g., hue, shape) are hardest to perceive
    • Link marks (i.e., lines, area of containment) add clutter but are easy to perceive
    • Spatial separation is the strongest cue, but hampers comparability
  • Appropriate technique strongly depends on particular task

Channel: Position

  • Identity and magnitude channel
  • Can be used in 1D and in 2D (vertical and horizontal position)
  • Closely related to length (distance between two positions)
  • Little interference with other channels (not necessarily true the other way around)
  • Most effective channel as magnitude and identity channel, hence the go-to channel for every plot
  • Discriminability can suffer from alignment

[Figure: the upper and lower panels each contain two pairs with identical values.]

Channel: Color

Color perception is a complex (neuro-)physiological process.

  • Perception of hue depends on adjacent areas

About 8% of men and roughly 0.5% of women have a color vision deficiency

  • Most frequent manifestation is a red-green deficiency, affecting mostly men

Color can be described, amongst others, by the HSL system

  • Hue: the dominant wavelength of the light
  • Saturation: how much grey is mixed into the color
  • Luminance: how light or dark the color is (how much white or black is mixed in)
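
The decomposition of a color into these three components can be tried out with Python's standard `colorsys` module. This is a minimal sketch. Note that `colorsys` returns the components in the order hue, lightness, saturation (HLS) and on a 0 to 1 scale. The helper name `describe` is hypothetical.

```python
import colorsys


def describe(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, lightness)."""
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    return round(h * 360), round(s, 2), round(l, 2)


print(describe(255, 0, 0))      # pure red: fully saturated, medium lightness
print(describe(0, 0, 255))      # pure blue: different hue, same saturation and lightness
print(describe(128, 128, 128))  # grey: saturation is zero
```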

Channel: Color (magnitude)

Luminance

A magnitude channel, suitable for ordered data.

Perception is hampered by contrast effects.

Fewer than five bins distinguishable.

Saturation

A magnitude channel, suitable for ordered data.

Low accuracy between noncontiguous regions.

Only around three bins distinguishable.

Strong interaction with the size channel.

http://dx.doi.org/10.1016/j.apj.2016.02.001


Channel: Color (magnitude)

Saturation and Luminance are rather inefficient magnitude channels, ranking below the position, angle, and size channels.

Still useful due to versatile application cases.

Remedy: carefully designed color maps such as Viridis, which

  • preserve perceptual uniformity
  • use more than one hue
  • account for people with color vision deficiency
  • account for black-and-white printing

Do not use a rainbow color map for sequential data.
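
One of these properties, suitability for black-and-white printing, can be checked roughly in code: the greyscale value should change monotonically along the color map. The sketch below makes several assumptions. The five hex codes are approximate samples of Viridis, the rainbow list is a crude stand-in for a rainbow map, and the grey value uses the Rec. 709 luma weights applied directly to the 8-bit RGB values, which is only an approximation of perceived lightness.

```python
def hex_to_rgb(code):
    """Convert '#rrggbb' to a tuple of three integers in 0..255."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def approx_grey(code):
    """Approximate greyscale value using Rec. 709 luma weights (an assumption)."""
    r, g, b = hex_to_rgb(code)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def is_monotonic(colors):
    """True if the approximate grey value strictly increases along the map."""
    greys = [approx_grey(c) for c in colors]
    return all(a < b for a, b in zip(greys, greys[1:]))


# Approximate samples; both lists are illustrative assumptions.
viridis_like = ["#440154", "#3b528b", "#21918c", "#5ec962", "#fde725"]
rainbow_like = ["#ff0000", "#ffff00", "#00ff00", "#00ffff", "#0000ff"]

print(is_monotonic(viridis_like))
print(is_monotonic(rainbow_like))
```

The Viridis-like samples get steadily lighter, while the rainbow samples jump up and down in lightness, which suggests an order that is not in the data.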

Channel: Color (identity)

Channel: more on Color

Diverging color maps

  • Diverging color maps to depict deviation, e.g. Temperature [\(^{\circ}\)C], log2FC, etc.
  • Ternary color maps

Transparency

  • Not independent of the other color channels. Strong interaction with luminance and saturation, less so with hue
  • Limited number of discriminable bins (~2)
  • It can be useful in combination with other channels to improve popout

Channel: Shape

  • Identity channel
  • Applicable to point and line marks, not to area marks
  • Strong interference with the size channel, and also with the hue channel
  • In practice limited to a dozen bins

Channel: Angle

  • Magnitude channel
  • Useful to communicate periodicity (>2\(\pi\)) and whole ensemble (2\(\pi\))
  • Perception of angle is not uniform across the spectrum
    • we can distinguish between 45\(^{\circ}\) and 46\(^{\circ}\) but not between 37\(^{\circ}\) and 39\(^{\circ}\)
    • poor discriminability (~6 bins)
  • Tilt is a closely related channel to angle

Channel: Size

  • Magnitude channel for ordered data.
  • Interactions with most other channels
    • if marks are too small, shape and color cannot be perceived
    • perception of hue and saturation depends on area, hence size
  • Size can theoretically encode two values, width and height, but this is rarely used since it is difficult to decipher
  • For lines, size is also called linewidth
  • Distinguish between scaling the area or the radius of a point.
    • Area seems more sensible, but is less accurately perceived.
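
The difference between the two scaling choices is easy to see numerically. In this minimal sketch the function names and the unit parameters are hypothetical. It computes the point radius for a data value under each choice.

```python
import math


def radius_for_area_scaling(value, unit_area=1.0):
    """Radius such that the circle's area is proportional to the value."""
    return math.sqrt(value * unit_area / math.pi)


def radius_for_radius_scaling(value, unit_radius=1.0):
    """Radius directly proportional to the value."""
    return value * unit_radius


for value in (1, 4):
    r_area = radius_for_area_scaling(value)
    r_radius = radius_for_radius_scaling(value)
    print(value, round(r_area, 3), round(math.pi * r_radius ** 2, 3))
```

A fourfold value doubles the radius under area scaling, whereas radius scaling makes the drawn area sixteen times larger, which exaggerates differences between values.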

Résumé and transfer to practice

Personal recommendation on how to approach a plot design task

  1. Know your data: how can it be aggregated, manipulated
  2. Know your task: which message/objective do you want to convey/achieve
  3. Know your idioms: the more idioms you know, the easier it is to choose a suitable one
  4. Be rational: every design decision should be well motivated
  5. Be critical: every design decision should be well motivated
  6. Be functional: aesthetic appeal does not harm, but should be subordinate
  7. Know your task: once all the bells and whistles are attached, check whether the plot still serves your task; don’t be shy to dump it

Further reading

  • Munzner, Tamara. Visualization analysis and design. CRC press, 2014.