Introduction

fullsibQTL is an R package to perform QTL scans and analysis using genetic maps generated from F1 outcross populations using onemap (which fullsibQTL also depends on). This tutorial demonstrates some of the functionality of fullsibQTL by making use of an F1 apple marker dataset from Gardner et al., 2014. The unfiltered VCF file can be downloaded from DataDryad: http://doi.org/10.5061/dryad.55t54. Here, a filtered and converted version of this VCF file is used at the input file to onemap.

This population was derived from a cross between apple varieties with contrasting skin color:

Review: constructing the genetic map using onemap

First, we will generate a genetic map for all 17 chromosomes in apple using onemap. Using the method introduced previously, the first step is to form linkage groups:

library(onemap) #version 2.2.0
library(ggplot2)

apple <- read_onemap(dir = "./input_files", inputfile = "apple_onemap_DP4.txt")
apple.bins <- find_bins(apple, exact = FALSE)
apple.binned <- create_data_bins(apple, apple.bins)
plot_by_segreg_type(apple)
apple.segreg_test <- test_segregation(apple.binned)
plot(apple.segreg_test)
apple.no_dist <- select_segreg(apple.segreg_test, distorted = FALSE, numbers = TRUE) 
apple.twopts <- rf_2pts(apple.binned, LOD = 4, max.rf = 0.4)
apple.mrks.no_dist <- make_seq(apple.twopts, c(apple.no_dist))
apple.LGs <- group(apple.mrks.no_dist, LOD = 11, max.rf = 0.4 )

We can compare these LG against the apple reference using the tabulation method used previously:

chroms <- sapply(strsplit(apple.LGs$marnames,split="-",fixed=T),"[",1)
chr_vs_LG <- data.frame(Marker = apple.LGs$marnames, LG = apple.LGs$groups, Chr = chroms, SeqNum = apple.LGs$seq.num)
selected_LGs <- which(chr_vs_LG$LG %in% c(1:6,8:10,12:14,16:17,19,20,23))
chr_vs_LG <- chr_vs_LG[selected_LGs,]
tab <- table(chr_vs_LG$Chr,chr_vs_LG$LG)

By using additional markers in the input file (by decreasing the 90% quantile depth filter), we now have markers on all chromosomes included in our linkage groups, including chromosome 9 (linkage group 20), which is where we expect a single QTL:

tab[,order(apply(tab,2,which.max))]
##     
##       17   1  14  19  23  10   2   6  13  16  12   4   3   8   5   9  20
##   1   75   1   1   1   1   1   8   3   3   2   0   1   2   1   9   0   1
##   10   0 101   0   0   0   1   2   1   2   0   1   2  11   1   5   2   0
##   11   2   1 166   0   0   1   1   0   0   5   3   0   3   1   3   3   0
##   12   2   2   1 107   0   3   2   0   0   0   0   0   0   1   1   4   0
##   13   1   1   1   5 132   5   3  14   0   0   4   0   4   0   0   2  13
##   14   4   5   0   3   0  94   1   0   0   0   3   0   4   4   1   1   1
##   15   3   3   0   0   0   1 149   0   1   4   0   4   9   4   4   2   3
##   16   2   3   0   0   0   0   0  86   4   0   0   2   0   0   0   0   0
##   17   0   0   0   0   0   1   4   1 108   0   0   0   2   0   2   0   5
##   2    3   5   2   0   0   1   4   2   1 126   0   1   1   0   3   3   2
##   3    3   2   2   1   1   0   0   0   2   0 136   3   5   0   0   1   0
##   4    1   1   0   4   2   0   0   3   0   0   1  88   2   0   5   0   1
##   5    0   4   0   0   1   0   2   7   2   4   1   6 111   0   0   0   2
##   6    3   0   0   0   0   0   0   1   0   3   0   1   4  48   0   1   0
##   7    2   0   1   2   2   0   0   1   0   1   0   3   2   0  79   1   0
##   8    2   3   0   6   0   1   6   0   1   1   2   0  14   0   1 102   0
##   9    0   2   0   2   1   0   2   2   1   0   5   5   9   0   1   4 111

Now we can construct ordered maps for each of these LGs, and save them as individual .rda files:

set_map_fun(type="kosambi")
for(i in c(1:6,8:10,12:14,16:17,19,20,23)){
  apple.LG <- make_seq(apple.LGs, i)
  apple.LG.ord <- order_seq(apple.LG, n.init=5, touchdown=TRUE)
  apple.LG.final <- make_seq(apple.LG.ord,"force")
  ord.filename <- paste("./orders/LG_", i, ".rda", sep = "")
  save(apple.LG.ord, apple.LG.final, apple.LG_mds, file = ord.filename)
  
  rf.filename <- paste("./heatmaps/rf_heatmap_LG_", i, ".pdf", sep = "")
  pdf(rf.filename, width = 30, height = 30)
  print(rf_graph_table(apple.LG.final, mrk.axis = "numbers"))
  dev.off()
}

Now I will read these back into memory, and write out the final map:

for(i in c(1:6,8:10,12:14,16:17,19,20,23)){
  ord.filename <- paste("./orders/LG_", i, ".rda", sep = "")
  load(ord.filename)
  LG_name <- paste("LG", i, "_final", sep = "")
  assign(LG_name, apple.LG.final)
}

write_map(list( LG1_final, LG2_final, LG3_final, LG4_final,
                LG5_final, LG6_final, LG8_final, LG9_final,
                LG10_final, LG12_final, LG13_final, LG14_final,
                LG16_final, LG17_final, LG19_final, LG20_final,
                LG23_final), "map.txt")

p <- draw_map2(LG1_final, LG2_final, LG3_final, LG4_final, LG5_final, 
               LG6_final, LG8_final, LG9_final, LG10_final, LG12_final,
               LG13_final, LG14_final, LG16_final, LG17_final, LG19_final, LG20_final,
               LG23_final, main = "F1 apple map",  group.names = c("LG1", "LG2", "LG3", "LG4", "LG5", 
                                                                   "LG6", "LG8", "LG9", "LG10", "LG12",
                                                                   "LG13", "LG14", "LG16", "LG17", "LG19", "LG20",
                                                                   "LG23"))

The map.txt file contains columns for linkage group, marker name, and cM position:

head(read.table("map.txt"))
##   V1              V2        V3
## 1  1     10-280310-5  0.000000
## 2  1     10-280258-4  7.132403
## 3  1   10-2063680-12 15.537571
## 4  1   10-1887928-11 20.809478
## 5  1   10-1887205-10 24.296178
## 6  1 10-26811389-157 33.077278

And the draw_map2 function produces an .eps file with the complete map represented graphically:

Clearly there is some inflation currently. This is likely both due to local errors in order, as well as a lack of multipoint regression to shrink map distances. The former could be addressed by performing the marker curation steps shown in the earlier onemap tutorial, but due to the uninformative nature of the marker used here, this will likely remain a problem. For now, we will proceed with QTL analysis using fullsibQTL.

The first step in using fullsibQTL is to read in the original input file to one onemap (using read_onemap), and then create the basic data object used by fullsibQTL using the function create_fullsib. Here I have added in the trait data to the onemap input file, which is a binary score indicating whether the apple had red skin, or green/yellow skin.

Performing single QTL analysis using fullsibQTL

library(fullsibQTL) # version 0.0.9010
ROOT <- "~/Box/Courses/Genetic_Mapping_2021/Lectures/Week7/apple/input_files"
fs_data <- read_onemap(dir = ROOT, inputfile = "apple_onemap_DP4_pheno.txt")
## Information of linkage groups, step and mapping function
fsib <- create_fullsib( fs_data,
                        map.list = list( LG1_final, LG2_final, LG3_final, LG4_final,
                                         LG5_final, LG6_final, LG8_final, LG9_final,
                                         LG10_final, LG12_final, LG13_final, LG14_final,
                                         LG16_final, LG17_final, LG19_final, LG20_final,
                                         LG23_final),
                        step = 1, map.function = "kosambi")

Interval mapping can be used to scan for a single QTL using the command im_scan. This performs a scan similar to scanone in qtl, using the EM algorithm to get ML estimates of the mixture model parameters.

im <- im_scan( fullsib = fsib, lg = "all", pheno.col = 1, LOD = TRUE )

LOD profiles can then be plotted for the entire genome, or specific LGs:

plot_fullsibQTL(fullsib = fsib, im, qtlmapping = "Color") + 
  scale_y_continuous(limits = c(0,20)) +
  geom_hline(yintercept = -log10(0.05/1819)) #set the limit to be between 0 and 20

plot_fullsibQTL(fullsib = fsib, fullsib.scan = im, lgs = 16, qtlmapping = "Color") +
  scale_y_continuous(limits = c(0,20)) +
  scale_x_continuous(limits = c(0,305)) + 
  geom_hline(yintercept = -log10(0.05/1819))

This is very similar to the results published by Gardner et al., using single marker regression in r/qtl:

summary returns the most significant markers on each LG:

summary(im)
12-22887360-610  15 141.47384   1.5212291     0
9-29305493-3533  16 313.79783  29.0389503     0
13-11388275-836  17 381.88718   3.3290360     6

We can then look more closely at specific QTL using im_char.

(qtl1 <- im_char(fsib, pheno.col = 1, lg = 16, pos = "9-29305493-3533"))
9-29305493-3533
LG              1.600000e+01
pos             3.137978e+02
-log10(pval)    2.810093e+01
LOD_Ha          2.903897e+01
mu              3.902439e-01
alpha_p        -4.205854e-03
LOD_H1          7.046611e-03
alpha_q        -4.453872e-01
LOD_H2          2.897378e+01
delta_pq        4.205855e-03
LOD_H3          7.046612e-03

These parameters refer to the model described by Gazaffi et al., 2014. If \(P\) is parent 1 and \(Q\) is parent 2 (determined previously in the encoding of the input genotype file), we can represent the two QTL alleles in each parent as \(P^1\) and \(P^2\), and \(Q^1\) and \(Q^2\). Defining the markers flanking this QTL as \(P_{m}\) and \(P_{m+1}\), and \(Q_{m}\) and \(Q_{m+1}\), the cross generating the F1 population can be represented as:

\[P^1_{m}P^1P^1_{m+1} / P^2_{m}P^2P^2_{m+1} \times Q^1_{m}Q^1Q^1_{m+1} / Q^2_{m}Q^2Q^2_{m+1}\]

and the model can then be fit for each QTL: \[ y_j = \mu + \alpha_{P} + \alpha_{Q} + \delta_{PQ} + \epsilon_j \] where \(\alpha_{P}\) and \(\alpha_{Q}\) are the additive effects of the QTL, and \(\delta_{PQ}\) is the dominance effect.

In this case we see that parent 1 is homozygous for QTL that do not affect the trait (alpha_p and LOD_H1 both = ~0). Parent 2 appears to have at least one allele for the QTL which affect the trait (alpha_q = -0.45 and LOD_H2 = 29). In the constrasts for the additive effects, the \(1\) allele is considered as having a positive effect, and the \(2\) allele as a negative effect. The negative sign on alpha_q thus implies that the \(1\) allele in this case decreases the value of the trait.

We can examine the phase of the \(1\) and \(2\) alleles in the parental haplotypes using the function draw_phase:

p <- draw_phase(fsib, qtl1)
p
Printing QTL and its linkage phase between markers across the LG:
  
  Markers                  Position           Parent 1       Parent 2
9-32642558-3548            305.16           a  |  | a        a  |  | b  
9-32642623-3549            305.16           a  |  | b        a  |  | b  
9-28219296-3527            309.48           b  |  | a        a  |  | a  
QTL                        313.80           P0 |  | P0       Q2 |  | Q1 
9-30419493-3537            316.67           a  |  | a        b  |  | a  
9-29597790-3534            316.67           b  |  | a        a  |  | b  
9-31183391-3540            319.76           a  |  | b        a  |  | a  

P1 and Q1 have positive effect (increase phenotypic value)
P2 and Q2 have negative effect (reduce phenotypic value) 
P0 and Q0 have neutral effect (non signif.)

And we can plot these graphically using plot_fullsib_phases:

plot_fullsib_phases(p)