--- title: "Basic usage for the qqboxplot package" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Basic usage for the qqboxplot package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4.8, fig.align = "center" ) ``` This vignette introduces some basic usage of the R package qqboxplot. The figures below are reproductions of the figures found in "The q-q boxplot" (citation coming soon). We first start by reproducing figures that use the q-q boxplot. The other figures used for comparison in the paper follow after that. ## qqboxplot First load the 'qqboxplot' package and packages from the 'tidyverse'. ```{r setup, message=FALSE} library(dplyr) library(ggplot2) library(qqboxplot) ``` The following figure compares simulated t-distributions (and one simulated normal distribution) against a theoretical normal distribution. simulated_data contains to columns, "y" and "group". "group" specifies the distribution the data ("y") comes from. Note in this figure that reference_dist = "norm" is chosen to specify that the normal distribution should be the reference distribution. ```{r} simulated_data %>% ggplot(aes(factor(group, levels=c("normal, mean=2", "t distribution, df=32", "t distribution, df=16", "t distribution, df=8", "t distribution, df=4")), y=y)) + geom_qqboxplot(notch=TRUE, varwidth = TRUE, reference_dist="norm") + xlab("reference: normal distribution") + ylab(NULL) + guides(color=FALSE) + theme(axis.text.x = element_text(angle = 23, size = 15), axis.title.y = element_text(size=15), axis.title.x = element_text(size=15), panel.border = element_blank(), panel.background = element_rect(fill="white"), panel.grid = element_line(colour = "grey70")) ``` simulated data was created by running the following code: ```{r, eval=FALSE} tibble(y=c(rnorm(1000, mean=2), rt(1000, 16), rt(500, 4), rt(1000, 8), rt(1000, 32)), group=c(rep("normal, mean=2", 1000), rep("t distribution, df=16", 1000), rep("t distribution, df=4", 500), rep("t distribution, df=8", 1000), rep("t distribution, df=32", 1000))) ``` The following figure shows the same data as the previous figure, but compared against a simulated normal distribution, with mean=5 and variance=1. Note that the reference dataset `comparison_dataset` is a separate vector and is not contained in the dataset `simulated_data`. ```{r} simulated_data %>% ggplot(aes(factor(group, levels=c("normal, mean=2", "t distribution, df=32", "t distribution, df=16", "t distribution, df=8", "t distribution, df=4")), y=y)) + geom_qqboxplot(notch=TRUE, varwidth = TRUE, compdata=comparison_dataset) + xlab("reference: simulated normal dataset") + ylab(NULL) + theme(axis.text.x = element_text(angle = 23, size = 15), axis.title.y = element_text(size=15), axis.title.x = element_text(size=15), panel.border = element_blank(), panel.background = element_rect(fill="white"), panel.grid = element_line(colour = "grey70")) ``` The vector `comparison_dataset` was simulated as follows ```{r, eval=FALSE} rnorm(1000, 5) ```