How To Put P Value On Graph

Articles - ggpubr: Publication Ready Plots

Add P-values and Significance Levels to ggplots

In this article, we'll describe how to easily i) compare means of two or multiple groups; and ii) automatically add p-values and significance levels to a ggplot (such as box plots, dot plots, bar plots and line plots).

Add p-values to ggplots

Contents

  • Prerequisites
    • Install and load required R packages
    • Demo data sets
  • Methods for comparing means
  • R functions to add p-values
    • compare_means()
    • stat_compare_means()
  • Compare two independent groups
  • Compare two paired samples
  • Compare more than two groups
  • Multiple group variables
  • Other plot types

Prerequisites

Install and load required R packages

Required R package: ggpubr (version >= 0.1.3), for ggplot2-based publication ready plots.

  • Install from CRAN as follows:
                      install.packages("ggpubr")
  • Or, install the latest development version from GitHub as follows:
                      if(!require(devtools)) install.packages("devtools")
                      devtools::install_github("kassambara/ggpubr")
  • Load ggpubr:
                      library(ggpubr)

Demo data sets

Data: ToothGrowth data set.

                      data("ToothGrowth")
                      head(ToothGrowth)

                      ##    len supp dose
                      ## 1  4.2   VC  0.5
                      ## 2 11.5   VC  0.5
                      ## 3  7.3   VC  0.5
                      ## 4  5.8   VC  0.5
                      ## 5  6.4   VC  0.5
                      ## 6 10.0   VC  0.5

R functions to add p-values

Here we present two new R functions in the ggpubr package:

  • compare_means(): easy-to-use solution to perform one and multiple mean comparisons.
  • stat_compare_means(): easy-to-use solution to automatically add p-values and significance levels to a ggplot.

compare_means()

As we'll show in the next sections, it has multiple useful options compared to the standard R functions.

The simplified format is as follows:

                      compare_means(formula, data, method = "wilcox.test", paired = FALSE,
                        group.by = NULL, ref.group = NULL, ...)

  • formula: a formula of the form x ~ group, where x is a numeric variable and group is a factor with one or multiple levels. For example, formula = TP53 ~ cancer_group. It's also possible to perform the test for multiple response variables at the same time. For example, formula = c(TP53, PTEN) ~ cancer_group.
  • data: a data.frame containing the variables in the formula.

  • method: the type of test. Default is "wilcox.test". Allowed values include:
    • "t.test" (parametric) and "wilcox.test" (non-parametric). Perform a comparison between two groups of samples. If the grouping variable contains more than two levels, then a pairwise comparison is performed.
    • "anova" (parametric) and "kruskal.test" (non-parametric). Perform a one-way ANOVA test comparing multiple groups.
  • paired: a logical indicating whether you want a paired test. Used only in t.test and in wilcox.test.

  • group.by: variables used to group the data set before applying the test. When specified, the mean comparisons will be performed in each subset of the data formed by the different levels of the group.by variables.

  • ref.group: a character string specifying the reference group. If specified, for a given grouping variable, each of the group levels will be compared to the reference group (i.e. control group). ref.group can also be ".all.". In this case, each of the grouping variable levels is compared to all (i.e. base-mean).

stat_compare_means()

This function extends ggplot2 for adding mean comparison p-values to a ggplot, such as box plots, dot plots, bar plots and line plots.

The simplified format is as follows:

                      stat_compare_means(mapping = NULL, comparisons = NULL, hide.ns = FALSE,
                                         label = NULL, label.x = NULL, label.y = NULL, ...)

  • mapping: Set of aesthetic mappings created by aes().

  • comparisons: A list of length-2 vectors. The entries in each vector are either the names of two values on the x-axis or the two integers that correspond to the index of the groups of interest, to be compared.

  • hide.ns: logical value. If TRUE, hide the ns symbol when displaying significance levels.

  • label: character string specifying the label type. Allowed values include "p.signif" (shows the significance levels) and "p.format" (shows the formatted p-value).

  • label.x, label.y: numeric values. Coordinates (in data units) to be used for absolute positioning of the label. If too short, they will be recycled.

  • ...: other arguments passed to the function compare_means() such as method, paired, ref.group.

Compare two independent groups

Perform the test:

                    compare_means(len ~ supp, data = ToothGrowth)                  
                    ## # A tibble: 1 x 8
                    ##     .y. group1 group2      p  p.adj p.format p.signif   method
                    ## 1   len     OJ     VC 0.0645 0.0645    0.064       ns Wilcoxon

By default method = "wilcox.test" (non-parametric test). You can also specify method = "t.test" for a parametric t-test.

Returned value is a data frame with the following columns:

  • .y.: the y variable used in the test.
  • p: the p-value
  • p.adj: the adjusted p-value. Default value for p.adjust.method = "holm"
  • p.format: the formatted p-value
  • p.signif: the significance level.
  • method: the statistical test used to compare groups.
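
As a quick sanity check, the p column can be compared with what base R reports for the same data. This is only a sketch: it assumes that compare_means() relies on the standard wilcox.test() and t.test() functions for these two methods, and the object names p_wilcox and p_ttest are illustrative.

```r
# Base R cross-check of the two-group comparison (ggpubr is not needed here).
data("ToothGrowth")

# Non-parametric test. exact = FALSE requests the normal approximation,
# which avoids the warning R emits when the data contain ties.
p_wilcox <- wilcox.test(len ~ supp, data = ToothGrowth, exact = FALSE)$p.value

# Parametric alternative (Welch two-sample t-test, the base R default).
p_ttest <- t.test(len ~ supp, data = ToothGrowth)$p.value

# Print both p-values side by side.
round(c(wilcox = p_wilcox, t.test = p_ttest), 4)
```

The Wilcoxon value should be close to the 0.0645 shown in the table above.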

Create a box plot with p-values:

                    p <- ggboxplot(ToothGrowth, x = "supp", y = "len",
                              color = "supp", palette = "jco",
                              add = "jitter")
                    # Add p-value
                    p + stat_compare_means()
                    # Change method
                    p + stat_compare_means(method = "t.test")

Note that the p-value label position can be adjusted using the arguments: label.x, label.y, hjust and vjust.

The default p-value label displayed is obtained by concatenating the method and the p columns of the data frame returned by the function compare_means(). You can specify other combinations using the aes() function.

For example,

  • aes(label = ..p.format..) or aes(label = paste0("p =", ..p.format..)): display only the formatted p-value (without the method name)
  • aes(label = ..p.signif..): display only the significance level.
  • aes(label = paste0(..method.., "\n", "p =", ..p.format..)): use a line break ("\n") between the method name and the p-value.

As an illustration, type this:

                    p + stat_compare_means(aes(label = ..p.signif..),
                                           label.x = 1.5, label.y = 40)

If you prefer, it's also possible to specify the argument label as a character vector:

                    p + stat_compare_means(label = "p.signif", label.x = 1.5, label.y = 40)
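
To make the "p.signif" label less of a black box, here is a minimal sketch of how p-values could be mapped to significance symbols. The cut points are an assumption (ns: p > 0.05, *: p <= 0.05, **: p <= 0.01, ***: p <= 0.001, ****: p <= 0.0001): they agree with every output shown in this article, but check the documentation of your installed ggpubr version. The helper p_to_signif() is hypothetical and not part of ggpubr.

```r
# Hypothetical helper (not a ggpubr function): convert p-values to symbols.
# Assumed cut points: 0.0001, 0.001, 0.01 and 0.05.
p_to_signif <- function(p) {
  as.character(cut(
    p,
    breaks = c(0, 1e-04, 0.001, 0.01, 0.05, Inf),
    labels = c("****", "***", "**", "*", "ns"),
    include.lowest = TRUE   # so that p = 0 is labelled too
  ))
}

# p-values taken from the tables in this article
p_to_signif(c(0.0645, 0.00431, 1.77e-04, 7.02e-06))
# → "ns" "**" "***" "****"
```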

Compare two paired samples

Perform the test:

                    compare_means(len ~ supp, data = ToothGrowth, paired = TRUE)

                    ## # A tibble: 1 x 8
                    ##     .y. group1 group2       p   p.adj p.format p.signif   method
                    ## 1   len     OJ     VC 0.00431 0.00431   0.0043       ** Wilcoxon

Visualize paired data using the ggpaired() function:

                    ggpaired(ToothGrowth, x = "supp", y = "len",
                             color = "supp", line.color = "gray", line.size = 0.4,
                             palette = "jco")+
                      stat_compare_means(paired = TRUE)

Compare more than two groups

  • Global test:
                    # Global test
                    compare_means(len ~ dose,  data = ToothGrowth, method = "anova")

                    ## # A tibble: 1 x 6
                    ##     .y.        p    p.adj p.format p.signif method
                    ## 1   len 9.53e-16 9.53e-16  9.5e-16     ****  Anova

Plot with global p-value:

                    # Default method = "kruskal.test" for multiple groups
                    ggboxplot(ToothGrowth, x = "dose", y = "len",
                              color = "dose", palette = "jco")+
                      stat_compare_means()
                    # Change method to anova
                    ggboxplot(ToothGrowth, x = "dose", y = "len",
                              color = "dose", palette = "jco")+
                      stat_compare_means(method = "anova")
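
For reference, the two global tests can also be run directly in base R. The sketch below assumes that the "anova" and "kruskal.test" methods correspond to aov() and kruskal.test() with dose treated as a grouping factor; the object names are illustrative.

```r
# Base R versions of the global tests (ggpubr is not needed here).
data("ToothGrowth")

# dose is stored as a number, so convert it to a factor first
tg <- transform(ToothGrowth, dose = factor(dose))

# Non-parametric global test (Kruskal-Wallis)
p_kruskal <- kruskal.test(len ~ dose, data = tg)$p.value

# Parametric global test (one-way ANOVA); the p-value sits in the
# "Pr(>F)" column of the ANOVA table
p_anova <- summary(aov(len ~ dose, data = tg))[[1]][["Pr(>F)"]][1]

# Print both p-values
signif(c(kruskal = p_kruskal, anova = p_anova), 3)
```

The ANOVA p-value should be close to the 9.53e-16 reported by compare_means() above.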

  • Pairwise comparisons. If the grouping variable contains more than two levels, then pairwise tests will be performed automatically. The default method is "wilcox.test". You can change this to "t.test".
                    # Perform pairwise comparisons
                    compare_means(len ~ dose,  data = ToothGrowth)

                    ## # A tibble: 3 x 8
                    ##     .y. group1 group2        p    p.adj p.format p.signif   method
                    ## 1   len    0.5      1 7.02e-06 1.40e-05  7.0e-06     **** Wilcoxon
                    ## 2   len    0.5      2 8.41e-08 2.52e-07  8.4e-08     **** Wilcoxon
                    ## 3   len      1      2 1.77e-04 1.77e-04  0.00018      *** Wilcoxon

                    # Visualize: Specify the comparisons you want
                    my_comparisons <- list( c("0.5", "1"), c("1", "2"), c("0.5", "2") )
                    ggboxplot(ToothGrowth, x = "dose", y = "len",
                              color = "dose", palette = "jco")+
                      stat_compare_means(comparisons = my_comparisons)+ # Add pairwise comparisons p-value
                      stat_compare_means(label.y = 50)     # Add global p-value
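
The p.adj column of the pairwise table can be reproduced with base R's p.adjust(), which helps to see what the default "holm" adjustment does: the smallest p-value is multiplied by 3, the next by 2, and the largest by 1. The raw p-values below are copied from the table above; the vector names are illustrative.

```r
# Raw p-values copied from the pairwise comparison table
p_raw <- c("0.5 vs 1" = 7.02e-06, "0.5 vs 2" = 8.41e-08, "1 vs 2" = 1.77e-04)

# Holm adjustment, the default p.adjust.method mentioned earlier
p_holm <- p.adjust(p_raw, method = "holm")

signif(p_holm, 3)
# → 0.5 vs 1: 1.40e-05, 0.5 vs 2: 2.52e-07, 1 vs 2: 1.77e-04
```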

If you want to specify the precise y location of bars, use the argument label.y:

                    ggboxplot(ToothGrowth, x = "dose", y = "len",
                              color = "dose", palette = "jco")+
                      stat_compare_means(comparisons = my_comparisons, label.y = c(29, 35, 40))+
                      stat_compare_means(label.y = 45)

(Adding bars that connect the compared groups has been facilitated by the ggsignif R package.)

  • Multiple pairwise tests against a reference group:
                    # Pairwise comparison against reference
                    compare_means(len ~ dose,  data = ToothGrowth, ref.group = "0.5",
                                  method = "t.test")

                    ## # A tibble: 2 x 8
                    ##     .y. group1 group2        p    p.adj p.format p.signif method
                    ## 1   len    0.5      1 6.70e-09 6.70e-09  6.7e-09     **** T-test
                    ## 2   len    0.5      2 1.47e-16 2.94e-16  < 2e-16     **** T-test

                    # Visualize
                    ggboxplot(ToothGrowth, x = "dose", y = "len",
                              color = "dose", palette = "jco")+
                      stat_compare_means(method = "anova", label.y = 40)+      # Add global p-value
                      stat_compare_means(label = "p.signif", method = "t.test",
                                         ref.group = "0.5")                    # Pairwise comparison against reference

  • Multiple pairwise tests against all (base-mean):
                    # Comparison of each group against base-mean
                    compare_means(len ~ dose,  data = ToothGrowth, ref.group = ".all.",
                                  method = "t.test")

                    ## # A tibble: 3 x 8
                    ##     .y. group1 group2        p    p.adj p.format p.signif method
                    ## 1   len  .all.    0.5 1.24e-06 3.73e-06  1.2e-06     **** T-test
                    ## 2   len  .all.      1 5.67e-01 5.67e-01     0.57       ns T-test
                    ## 3   len  .all.      2 1.37e-05 2.74e-05  1.4e-05     **** T-test

                    # Visualize
                    ggboxplot(ToothGrowth, x = "dose", y = "len",
                              color = "dose", palette = "jco")+
                      stat_compare_means(method = "anova", label.y = 40)+      # Add global p-value
                      stat_compare_means(label = "p.signif", method = "t.test",
                                         ref.group = ".all.")                  # Pairwise comparison against all
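
One way to picture the comparison against "all" is as a t-test of each group against the pooled sample. The sketch below rests on that assumption (that ".all." stands for all observations taken together); it is an illustration in base R, not a description of ggpubr's internals, and its numbers may differ slightly from the table above.

```r
# Illustration only: compare each dose group with the pooled sample.
data("ToothGrowth")

# Assumption: ".all." means every observation, whatever its dose.
p_vs_all <- sapply(
  split(ToothGrowth$len, ToothGrowth$dose),
  function(x) t.test(x, ToothGrowth$len)$p.value
)

# One p-value per dose level ("0.5", "1", "2")
signif(p_vs_all, 3)
```

Under this reading, the lowest and highest doses differ clearly from the overall mean while the middle dose does not, which matches the pattern of significance symbols in the table.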

A typical situation, where pairwise comparisons against "all" can be useful, is illustrated here using the myeloma data set available on GitHub.

We'll plot the expression profile of the DEPDC1 gene according to the patients' molecular groups. We want to know if there is any difference between groups and, if so, where the difference is.

To answer this question, you can perform a pairwise comparison between all the 7 groups. This will lead to a lot of comparisons between all possible combinations. If you have many groups, as here, the results might be difficult to interpret.

Another easy solution is to compare each of the seven groups against "all" (i.e. base-mean). When the test is significant, you can conclude that DEPDC1 is significantly overexpressed or downexpressed in a given group compared to all.

                    # Load myeloma data from GitHub
                    myeloma <- read.delim("https://raw.githubusercontent.com/kassambara/data/master/myeloma.txt")
                    # Perform the test
                    compare_means(DEPDC1 ~ molecular_group,  data = myeloma,
                                  ref.group = ".all.", method = "t.test")

                    ## # A tibble: 7 x 8
                    ##      .y. group1           group2        p   p.adj p.format p.signif method
                    ## 1 DEPDC1  .all.       Cyclin D-1 0.149690 0.44907  0.14969       ns T-test
                    ## 2 DEPDC1  .all.       Cyclin D-2 0.523143 1.00000  0.52314       ns T-test
                    ## 3 DEPDC1  .all.     Hyperdiploid 0.000282 0.00169  0.00028      *** T-test
                    ## 4 DEPDC1  .all. Low bone disease 0.005084 0.02542  0.00508       ** T-test
                    ## 5 DEPDC1  .all.              MAF 0.086107 0.34443  0.08611       ns T-test
                    ## 6 DEPDC1  .all.            MMSET 0.576291 1.00000  0.57629       ns T-test
                    ## # ... with 1 more rows

                    # Visualize the expression profile
                    ggboxplot(myeloma, x = "molecular_group", y = "DEPDC1", color = "molecular_group",
                              add = "jitter", legend = "none") +
                      rotate_x_text(angle = 45)+
                      geom_hline(yintercept = mean(myeloma$DEPDC1), linetype = 2)+ # Add horizontal line at base mean
                      stat_compare_means(method = "anova", label.y = 1600)+        # Add global anova p-value
                      stat_compare_means(label = "p.signif", method = "t.test",
                                         ref.group = ".all.")                      # Pairwise comparison against all

From the plot above, we can conclude that DEPDC1 is significantly overexpressed in the Proliferation group and significantly downexpressed in the Hyperdiploid and Low bone disease groups compared to all.

Note that if you want to hide the ns symbol, specify the argument hide.ns = TRUE.

                    # Visualize the expression profile
                    ggboxplot(myeloma, x = "molecular_group", y = "DEPDC1", color = "molecular_group",
                              add = "jitter", legend = "none") +
                      rotate_x_text(angle = 45)+
                      geom_hline(yintercept = mean(myeloma$DEPDC1), linetype = 2)+ # Add horizontal line at base mean
                      stat_compare_means(method = "anova", label.y = 1600)+        # Add global anova p-value
                      stat_compare_means(label = "p.signif", method = "t.test",
                                         ref.group = ".all.", hide.ns = TRUE)      # Pairwise comparison against all

Multiple grouping variables

  • Two independent sample comparisons after grouping the data by another variable:

Perform the test:

                    compare_means(len ~ supp, data = ToothGrowth,
                                  group.by = "dose")

                    ## # A tibble: 3 x 9
                    ##    dose   .y. group1 group2       p  p.adj p.format p.signif   method
                    ## 1   0.5   len     OJ     VC 0.02319 0.0464    0.023        * Wilcoxon
                    ## 2   1.0   len     OJ     VC 0.00403 0.0121    0.004       ** Wilcoxon
                    ## 3   2.0   len     OJ     VC 1.00000 1.0000    1.000       ns Wilcoxon

In the case above, for each level of the variable "dose", we compare the means of the variable "len" in the different groups formed by the grouping variable "supp".
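
The same idea can be sketched in base R by splitting the data by dose and running the test within each subset. This assumes that group.by amounts to one test per subset followed by a Holm adjustment across the subsets, which is consistent with the p and p.adj columns above; the object names are illustrative.

```r
# Base R sketch of a grouped comparison: one Wilcoxon test per dose level.
data("ToothGrowth")

p_by_dose <- sapply(split(ToothGrowth, ToothGrowth$dose), function(d) {
  # exact = FALSE uses the normal approximation and avoids the ties warning
  wilcox.test(len ~ supp, data = d, exact = FALSE)$p.value
})

# Adjust across the three dose levels
p_adj <- p.adjust(p_by_dose, method = "holm")

# Rows: raw and adjusted p-values; columns: dose levels
round(rbind(p = p_by_dose, p.adj = p_adj), 4)
```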

Visualize (1/2). Create multi-panel box plots facetted by group (here, "dose"):

                    # Box plot facetted by "dose"
                    p <- ggboxplot(ToothGrowth, x = "supp", y = "len",
                              color = "supp", palette = "jco",
                              add = "jitter",
                              facet.by = "dose", short.panel.labs = FALSE)
                    # Use only p.format as label. Remove method name.
                    p + stat_compare_means(label = "p.format")

                    # Or use significance symbol as label
                    p + stat_compare_means(label =  "p.signif", label.x = 1.5)

To hide the 'ns' symbol, use the argument hide.ns = TRUE.

Visualize (2/2). Create one single panel with all box plots. Plot y = "len" by x = "dose" and color by "supp":

                    p <- ggboxplot(ToothGrowth, x = "dose", y = "len",
                              color = "supp", palette = "jco",
                              add = "jitter")
                    p + stat_compare_means(aes(group = supp))

                    # Show only p-value
                    p + stat_compare_means(aes(group = supp), label = "p.format")

                    # Use significance symbol as label
                    p + stat_compare_means(aes(group = supp), label = "p.signif")

  • Paired sample comparisons after grouping the data by another variable:

Perform the test:

                    compare_means(len ~ supp, data = ToothGrowth,
                                  group.by = "dose", paired = TRUE)

                    ## # A tibble: 3 x 9
                    ##    dose   .y. group1 group2      p  p.adj p.format p.signif   method
                    ## 1   0.5   len     OJ     VC 0.0330 0.0659    0.033        * Wilcoxon
                    ## 2   1.0   len     OJ     VC 0.0191 0.0572    0.019        * Wilcoxon
                    ## 3   2.0   len     OJ     VC 1.0000 1.0000    1.000       ns Wilcoxon

Visualize. Create multi-panel box plots facetted by group (here, "dose"):

                    # Box plot facetted by "dose"
                    p <- ggpaired(ToothGrowth, x = "supp", y = "len",
                              color = "supp", palette = "jco",
                              line.color = "gray", line.size = 0.4,
                              facet.by = "dose", short.panel.labs = FALSE)
                    # Use only p.format as label. Remove method name.
                    p + stat_compare_means(label = "p.format", paired = TRUE)

Other plot types

  • Bar and line plots (one grouping variable):
                    # Bar plot of mean +/-se
                    ggbarplot(ToothGrowth, x = "dose", y = "len", add = "mean_se")+
                      stat_compare_means() +                                         # Global p-value
                      stat_compare_means(ref.group = "0.5", label = "p.signif",
                                         label.y = c(22, 29))                   # compare to ref.group
                    # Line plot of mean +/-se
                    ggline(ToothGrowth, x = "dose", y = "len", add = "mean_se")+
                      stat_compare_means() +                                         # Global p-value
                      stat_compare_means(ref.group = "0.5", label = "p.signif",
                                         label.y = c(22, 29))

  • Bar and line plots (two grouping variables):
                    ggbarplot(ToothGrowth, x = "dose", y = "len", add = "mean_se",
                              color = "supp", palette = "jco",
                              position = position_dodge(0.8))+
                      stat_compare_means(aes(group = supp), label = "p.signif", label.y = 29)
                    ggline(ToothGrowth, x = "dose", y = "len", add = "mean_se",
                              color = "supp", palette = "jco")+
                      stat_compare_means(aes(group = supp), label = "p.signif",
                                         label.y = c(16, 25, 29))


Source: http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/76-add-p-values-and-significance-levels-to-ggplots/

Posted by: orourkereack1945.blogspot.com
