Title: | Implementation of the Q-Q Boxplot |
---|---|
Description: | A system to implement the Q-Q boxplot. It is implemented as an extension to 'ggplot2'. The Q-Q boxplot is an amalgam of the boxplot and the Q-Q plot and allows the user to rapidly examine summary statistics and tail behavior for multiple distributions in the same pane. As an extension of the 'ggplot2' implementation of the boxplot, possible modifications to the boxplot extend to the Q-Q boxplot. |
Authors: | Jordan Rodu [aut, cre] |
Maintainer: | Jordan Rodu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.0 |
Built: | 2025-02-28 04:47:59 UTC |
Source: | https://github.com/jrodu/qqboxplot |
A dataset that contains simulated data to reproduce a figure in our manuscript
comparison_dataset
A vector
simulations
A dataset that contains log expression data for randomly selected genes for two patients, one with autism and one control.
expression_data
A data frame with 1200 rows and 3 variables:
gene identifier (not meaningful)
autism or control
the logged gene expression count
...
https://www.ebi.ac.uk/gxa/experiments/E-GEOD-30573/Results
A modification of the boxplot with information about the tails
geom_qqboxplot(
  mapping = NULL,
  data = NULL,
  stat = "qqboxplot",
  position = "dodge2",
  ...,
  outlier.colour = NULL,
  outlier.color = NULL,
  outlier.fill = NULL,
  outlier.shape = 19,
  outlier.size = 1.5,
  outlier.stroke = 0.5,
  outlier.alpha = NULL,
  notch = FALSE,
  notchwidth = 0.5,
  varwidth = FALSE,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
mapping |
Set of aesthetic mappings created by aes(). |
data |
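The data to be displayed in this layer. There are three options: If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot(). A data.frame, or other object, will override the plot data. A function will be called with a single argument, the plot data. |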
stat |
specifies the stat function to use |
position |
Position adjustment, either as a string, or the result of a call to a position adjustment function. |
... |
Other arguments passed on to layer(). |
outlier.colour, outlier.color, outlier.fill, outlier.shape, outlier.size, outlier.stroke, outlier.alpha |
Default aesthetics for outliers. Set to NULL to inherit from the aesthetics used for the box. In the unlikely event you specify both US and UK spellings of colour, the US spelling will take precedence. Sometimes it can be useful to hide the outliers, for example when overlaying the raw data points on top of the boxplot. Hiding the outliers can be achieved by setting outlier.shape = NA. |
notch |
If FALSE (default) make a standard box plot. If TRUE, make a notched box plot. |
notchwidth |
For a notched box plot, width of the notch relative to the body (defaults to notchwidth = 0.5). |
varwidth |
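If FALSE (default) make a standard box plot. If TRUE, boxes are drawn with widths proportional to the square roots of the number of observations in the groups. |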
na.rm |
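If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed. |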
show.legend |
logical. Should this layer be included in the legends? |
inherit.aes |
If FALSE, overrides the default aesthetics, rather than combining with them. |
Returns an object of class GeomQqboxplot (inherits from Geom, ggproto) that renders the data for the Q-Q boxplot.
The Q-Q boxplot inherits its summary statistics from the boxplot. See geom_boxplot() for details. The Q-Q boxplot differs from the boxplot by using more informative whiskers than the regular boxplot.
The vertical position of the whiskers can be interpreted as it is in the boxplot, and the maximal vertical value is chosen as it is in the regular boxplot. The horizontal position of the whiskers indicates the deviation of the data set of interest from some reference data set (specified as either a theoretical distribution or an actual data set). Taking the central vertical axis of the boxplot as zero, deviations to the right indicate that those values are larger than the corresponding data points in the reference data set, where two data points correspond if their quantiles match. Deviations to the left indicate that the values are smaller than their corresponding data points. Consider a data set with fatter tails than the normal distribution. When the reference distribution is the normal distribution, the whiskers below the box will be left of the central axis (the left tail values are smaller than they ought to be) and the whiskers above the box will be right of the central axis (the right tail values are larger than they ought to be).
In order to compare the data set of interest to the reference data set, they must be on the same scale. The Q-Q boxplot uses Tukey's g-h distribution to determine the appropriate scaling factor.
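The reading of the whiskers described above can be sketched in base R. This is a simplified illustration, not the package's algorithm: as a stand-in for the Tukey g-h scaling, it assumes both the sample and the normal reference are centered on the median and divided by the IQR. The variable names (x_std, ref_std, deviation) are hypothetical.

```r
# Simplified sketch of the deviation-whisker idea; NOT the qqboxplot algorithm.
# Assumption: sample and reference are put on a common scale by centering on
# the median and dividing by the IQR (the package uses Tukey's g-h distribution).
set.seed(1)
x <- rt(5000, df = 3)                      # heavy-tailed sample

probs <- c(0.01, 0.025, 0.975, 0.99)       # tail quantiles to inspect

# Standardized sample quantiles
x_std <- (quantile(x, probs, type = 7) - median(x)) / IQR(x)

# Standardized normal quantiles: median 0, IQR = qnorm(0.75) - qnorm(0.25)
ref_std <- qnorm(probs) / (qnorm(0.75) - qnorm(0.25))

# Negative values: sample quantile lies below the reference (whisker bends left).
# Positive values: sample quantile lies above the reference (whisker bends right).
deviation <- unname(x_std - ref_std)
print(round(deviation, 2))
```

For a fat-tailed sample such as this one, the lower-tail deviations should come out negative and the upper-tail deviations positive, matching the left and right bends described in the text.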
Much of the code here is a modification of the geom_boxplot() code.
library(qqboxplot)

p <- ggplot2::ggplot(
  simulated_data,
  ggplot2::aes(
    factor(group, levels = c("normal, mean=2", "t distribution, df=32", "t distribution, df=16", "t distribution, df=8", "t distribution, df=4")),
    y = y
  )
)
p + geom_qqboxplot()
p + geom_qqboxplot(reference_dist = "norm")
p + geom_qqboxplot(compdata = comparison_dataset)
# geom_qqboxplot inherits all arguments from geom_boxplot, e.g.:
p + geom_qqboxplot(notch = TRUE)
p + geom_qqboxplot(varwidth = TRUE)
# guides(color = FALSE) is deprecated in recent ggplot2; use "none" instead
p + geom_qqboxplot(ggplot2::aes(color = group)) + ggplot2::guides(color = "none")
A dataset that contains participation rates (%) for ages 15-24, separated by gender, and measured in the years 2008, 2012, and 2017
indicators
A data frame with 612 rows and 7 variables:
name of country
unique country identifier (string)
Specifies male/female
unique identifier for series
year for data
participation rate in percents
the log of the participation rate
...
https://www.worldbank.org/en/home
A dataset that contains populations of neurons from CA1 and LM and their firing rates for three situations: base firing rate, dot motion, and drifting gradient. Each row represents a neuron
population_brain_data
A data frame with 13731 rows and 3 variables:
acronym for population location
situation under which firing rate was recorded
the firing rate
...
https://allensdk.readthedocs.io/en/latest/visual_coding_neuropixels.html
A dataset that contains simulated data to reproduce the simulated data figures used in our manuscript
simulated_data
A data frame with 4500 rows and 2 variables:
a value simulated from a distribution
a string specifying the distribution from which the y value is drawn
...
simulations
A dataset that contains the number of spikes for neurons across several possible orientations of a grating
spike_data
A data frame with 12800 rows and 5 variables:
1 to 8, specifies the orientation of the grating
number of spikes for a single trial of 1.28 seconds for a particular orientation
region of the brain where the neuron is located
...
Compute values for the Q-Q Boxplot
stat_qqboxplot(
  mapping = NULL,
  data = NULL,
  geom = "qqboxplot",
  position = "dodge2",
  ...,
  coef = 1.5,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE,
  reference_dist = "norm",
  confidence_level = 0.95,
  numboots = 500,
  qtype = 7,
  compdata = NULL
)
mapping |
Set of aesthetic mappings created by aes(). |
data |
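The data to be displayed in this layer. There are three options: If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot(). A data.frame, or other object, will override the plot data. A function will be called with a single argument, the plot data. |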
geom |
specifies the geom function to use |
position |
Position adjustment, either as a string, or the result of a call to a position adjustment function. |
... |
Other arguments passed on to layer(). |
coef |
Length of the whiskers as multiple of IQR. Defaults to 1.5. |
na.rm |
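If FALSE, the default, missing values are removed with a warning. If TRUE, missing values are silently removed. |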
show.legend |
logical. Should this layer be included in the legends? |
inherit.aes |
If FALSE, overrides the default aesthetics, rather than combining with them. |
reference_dist |
Specifies theoretical reference distribution. |
confidence_level |
Sets confidence level for deviation whisker confidence bands |
numboots |
Specifies the number of bootstrap draws for the bootstrapped CIs; needed only if compdata is not NULL. |
qtype |
an integer between 1 and 9 indicating which one of the quantile algorithms to use. |
compdata |
specifies a data set to use as the reference distribution. If compdata is not NULL, the argument reference_dist will be ignored. |
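To make the roles of numboots, confidence_level, and qtype more concrete when compdata is supplied, here is a generic percentile-bootstrap sketch in base R. It is an assumption about how such a band could be formed, not the package's internal code; the single quantile p and the names boot_diff and ci are hypothetical.

```r
# Generic percentile bootstrap for the difference between a sample quantile
# and the matching quantile of a reference data set.
# Assumption: this illustrates the general idea only; qqboxplot's actual
# resampling scheme and scaling may differ.
set.seed(42)
y <- rt(300, df = 4)            # data set of interest
compdata <- rnorm(300)          # reference data set
numboots <- 500                 # number of bootstrap draws
confidence_level <- 0.95
qtype <- 7                      # quantile algorithm, as in stats::quantile()
p <- 0.95                       # one upper-tail quantile, for illustration

boot_diff <- replicate(numboots, {
  yb <- sample(y, replace = TRUE)          # resample the data
  cb <- sample(compdata, replace = TRUE)   # resample the reference
  unname(quantile(yb, p, type = qtype) - quantile(cb, p, type = qtype))
})

# Percentile interval at the requested confidence level
alpha <- 1 - confidence_level
ci <- unname(quantile(boot_diff, c(alpha / 2, 1 - alpha / 2)))
print(ci)
```

Raising numboots tightens the Monte Carlo error of the interval endpoints at the cost of run time, which is why the argument only matters when a comparison data set is in play.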
Returns an object of class StatQqboxplot (inherits from Stat, ggproto) that helps to render the data for geom_qqboxplot().
stat_qqboxplot() provides the following variables, some of which depend on the orientation:
width of boxplot
lower whisker = smallest observation greater than or equal to lower hinge - 1.5 * IQR
lower hinge, 25% quantile
lower edge of notch = median - 1.58 * IQR / sqrt(n)
median, 50% quantile
upper edge of notch = median + 1.58 * IQR / sqrt(n)
upper hinge, 75% quantile
upper whisker = largest observation less than or equal to upper hinge + 1.5 * IQR
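The definitions listed above can be reproduced with a few lines of base R. The helper below, box_stats(), is hypothetical and not part of qqboxplot; it only illustrates the formulas, with coef and qtype mirroring the arguments of the same names.

```r
# Hypothetical helper (not part of qqboxplot) that computes the box summary
# values from the definitions above.
box_stats <- function(y, coef = 1.5, qtype = 7) {
  q <- unname(quantile(y, c(0.25, 0.5, 0.75), type = qtype))
  iqr <- q[3] - q[1]
  n <- length(y)
  list(
    lower_whisker = min(y[y >= q[1] - coef * iqr]),  # smallest obs within fence
    lower_hinge   = q[1],                            # 25% quantile
    notch_lower   = q[2] - 1.58 * iqr / sqrt(n),
    median        = q[2],                            # 50% quantile
    notch_upper   = q[2] + 1.58 * iqr / sqrt(n),
    upper_hinge   = q[3],                            # 75% quantile
    upper_whisker = max(y[y <= q[3] + coef * iqr])   # largest obs within fence
  )
}

# Small example: the value 30 lies beyond the upper fence and is an outlier.
s <- box_stats(c(1:9, 30))
str(s)
```

In this example the hinges are 3.25 and 7.75, so the upper fence is 7.75 + 1.5 * 4.5 = 14.5 and the upper whisker stops at 9.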