---
title: "Basic usage for the qqboxplot package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Basic usage for the qqboxplot package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", 
  fig.width = 7, 
  fig.height = 4.8, 
  fig.align = "center"
)
```


This vignette introduces the basic usage of the R package `qqboxplot`.  The figures below reproduce figures from "The q-q boxplot" (citation coming soon).  We start with the figures that use the q-q boxplot; the other figures used for comparison in the paper follow after that.

## qqboxplot

First, load the `qqboxplot` package along with `dplyr` and `ggplot2` from the tidyverse.
 
```{r setup, message=FALSE}
library(dplyr)
library(ggplot2)
library(qqboxplot)
```


The following figure compares simulated t-distributions (and one simulated normal distribution) against a theoretical normal distribution. `simulated_data` contains two columns, `y` and `group`.  
`group` specifies the distribution that the data (`y`) come from.  Note that this figure sets `reference_dist = "norm"` to make the normal distribution the reference distribution.

```{r}
simulated_data %>%
  # Order the groups from the normal sample to the heaviest-tailed t sample
  ggplot(aes(factor(group, levels = c("normal, mean=2", "t distribution, df=32",
                                      "t distribution, df=16", "t distribution, df=8",
                                      "t distribution, df=4")), y = y)) +
  # Compare each group against a theoretical normal distribution
  geom_qqboxplot(notch = TRUE, varwidth = TRUE, reference_dist = "norm") +
  xlab("reference: normal distribution") +
  ylab(NULL) +
  # "none" replaces the deprecated FALSE for suppressing a legend
  guides(color = "none") +
  theme(axis.text.x = element_text(angle = 23, size = 15),
        axis.title.y = element_text(size = 15),
        axis.title.x = element_text(size = 15),
        panel.border = element_blank(),
        panel.background = element_rect(fill = "white"),
        panel.grid = element_line(colour = "grey70"))
```

The dataset `simulated_data` was created by running the following code:

```{r, eval=FALSE}
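# One normal group and four t-distributed groups; the df = 4 group
# has 500 observations, the others 1000 each
simulated_data <- tibble(
  y = c(rnorm(1000, mean = 2), rt(1000, df = 16), rt(500, df = 4),
        rt(1000, df = 8), rt(1000, df = 32)),
  group = c(rep("normal, mean=2", 1000),
            rep("t distribution, df=16", 1000),
            rep("t distribution, df=4", 500),
            rep("t distribution, df=8", 1000),
            rep("t distribution, df=32", 1000))
)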
```
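
Because the draws are random, rerunning the simulation gives different values each time. The sketch below is a repeatable variant that also tabulates the group sizes. It is an illustration only: the seed is arbitrary and is not the one used to build the packaged `simulated_data`, the names `group_sizes` and `sim_check` are hypothetical, and base R is used so the chunk runs without any packages loaded.

```{r, eval=FALSE}
set.seed(1)  # arbitrary seed (an assumption), chosen only for repeatability

# Group labels and sample sizes, matching the simulation code above
group_sizes <- c(
  "normal, mean=2"        = 1000,
  "t distribution, df=16" = 1000,
  "t distribution, df=4"  = 500,
  "t distribution, df=8"  = 1000,
  "t distribution, df=32" = 1000
)

# Draw the values in the same order as the labels
sim_check <- data.frame(
  y = c(rnorm(1000, mean = 2), rt(1000, df = 16), rt(500, df = 4),
        rt(1000, df = 8), rt(1000, df = 32)),
  group = rep(names(group_sizes), times = group_sizes)
)

# Observations per group; the df = 4 group is half the size of the others
table(sim_check$group)
```

The unequal group sizes are the reason the figures are drawn with `varwidth = TRUE`.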

The following figure shows the same data as the previous figure, but compared against a simulated normal dataset with mean 5 and variance 1, supplied through the `compdata` argument.  Note that the reference dataset `comparison_dataset` is a separate vector and is not contained in the dataset `simulated_data`.

```{r}
simulated_data %>%
  ggplot(aes(factor(group, levels=c("normal, mean=2", "t distribution, df=32", "t distribution, df=16", "t distribution, df=8", "t distribution, df=4")), y=y)) +
  geom_qqboxplot(notch=TRUE, varwidth = TRUE, compdata=comparison_dataset) +
  xlab("reference: simulated normal dataset") +
  ylab(NULL) +
  theme(axis.text.x = element_text(angle = 23, size = 15), axis.title.y = element_text(size=15),
        axis.title.x = element_text(size=15),
        panel.border = element_blank(), panel.background = element_rect(fill="white"),
        panel.grid = element_line(colour = "grey70"))
```

The vector `comparison_dataset` was simulated as follows:

```{r, eval=FALSE}
# 1000 draws from a normal distribution with mean 5 and sd 1
comparison_dataset <- rnorm(1000, mean = 5)
```
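
The two figures differ only in where the reference quantiles come from: a theoretical distribution or a comparison dataset. The sketch below shows that distinction with base R's `qnorm()` and `quantile()`. It is a conceptual illustration under stated assumptions and does not reproduce the computation that `geom_qqboxplot()` performs internally; the seed, the chosen probabilities, and the names `comparison`, `sample_t`, and `ref_quantiles` are all hypothetical.

```{r, eval=FALSE}
set.seed(1)  # arbitrary seed (an assumption), chosen only for repeatability

comparison <- rnorm(1000, mean = 5)  # stand-in for comparison_dataset
sample_t   <- rt(500, df = 4)        # stand-in for one group of simulated_data

# A few probabilities at which to compare quantiles (illustrative choice)
probs <- c(0.05, 0.25, 0.50, 0.75, 0.95)

ref_quantiles <- data.frame(
  prob                 = probs,
  sample               = quantile(sample_t, probs, names = FALSE),
  theoretical_normal   = qnorm(probs),                               # theoretical reference
  empirical_comparison = quantile(comparison, probs, names = FALSE)  # dataset reference
)

ref_quantiles
```

The theoretical column needs no data, whereas the empirical column depends on the particular draws in the comparison vector.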