fullsibQTL
is an R package to perform QTL scans and analysis using genetic maps generated from F1 outcross populations using onemap
(which fullsibQTL
also depends on). This tutorial demonstrates some of the functionality of fullsibQTL by making use of an F1 apple marker dataset from Gardner et al., 2014. The unfiltered VCF file can be downloaded from DataDryad: http://doi.org/10.5061/dryad.55t54. Here, a filtered and converted version of this VCF file is used at the input file to onemap
.
This population was derived from a cross between apple varieties with contrasting skin color:
onemap
First, we will generate a genetic map for all 17 chromosomes in apple using onemap
. Using the method introduced previously, the first step is to form linkage groups:
library(onemap) #version 2.2.0
library(ggplot2)
apple <- read_onemap(dir = "./input_files", inputfile = "apple_onemap_DP4.txt")
apple.bins <- find_bins(apple, exact = FALSE)
apple.binned <- create_data_bins(apple, apple.bins)
plot_by_segreg_type(apple)
apple.segreg_test <- test_segregation(apple.binned)
plot(apple.segreg_test)
apple.no_dist <- select_segreg(apple.segreg_test, distorted = FALSE, numbers = TRUE)
apple.twopts <- rf_2pts(apple.binned, LOD = 4, max.rf = 0.4)
apple.mrks.no_dist <- make_seq(apple.twopts, c(apple.no_dist))
apple.LGs <- group(apple.mrks.no_dist, LOD = 11, max.rf = 0.4 )
We can compare these LG against the apple reference using the tabulation method used previously:
chroms <- sapply(strsplit(apple.LGs$marnames,split="-",fixed=T),"[",1)
chr_vs_LG <- data.frame(Marker = apple.LGs$marnames, LG = apple.LGs$groups, Chr = chroms, SeqNum = apple.LGs$seq.num)
selected_LGs <- which(chr_vs_LG$LG %in% c(1:6,8:10,12:14,16:17,19,20,23))
chr_vs_LG <- chr_vs_LG[selected_LGs,]
tab <- table(chr_vs_LG$Chr,chr_vs_LG$LG)
By using additional markers in the input file (by decreasing the 90% quantile depth filter), we now have markers on all chromosomes included in our linkage groups, including chromosome 9 (linkage group 20), which is where we expect a single QTL:
tab[,order(apply(tab,2,which.max))]
##
## 17 1 14 19 23 10 2 6 13 16 12 4 3 8 5 9 20
## 1 75 1 1 1 1 1 8 3 3 2 0 1 2 1 9 0 1
## 10 0 101 0 0 0 1 2 1 2 0 1 2 11 1 5 2 0
## 11 2 1 166 0 0 1 1 0 0 5 3 0 3 1 3 3 0
## 12 2 2 1 107 0 3 2 0 0 0 0 0 0 1 1 4 0
## 13 1 1 1 5 132 5 3 14 0 0 4 0 4 0 0 2 13
## 14 4 5 0 3 0 94 1 0 0 0 3 0 4 4 1 1 1
## 15 3 3 0 0 0 1 149 0 1 4 0 4 9 4 4 2 3
## 16 2 3 0 0 0 0 0 86 4 0 0 2 0 0 0 0 0
## 17 0 0 0 0 0 1 4 1 108 0 0 0 2 0 2 0 5
## 2 3 5 2 0 0 1 4 2 1 126 0 1 1 0 3 3 2
## 3 3 2 2 1 1 0 0 0 2 0 136 3 5 0 0 1 0
## 4 1 1 0 4 2 0 0 3 0 0 1 88 2 0 5 0 1
## 5 0 4 0 0 1 0 2 7 2 4 1 6 111 0 0 0 2
## 6 3 0 0 0 0 0 0 1 0 3 0 1 4 48 0 1 0
## 7 2 0 1 2 2 0 0 1 0 1 0 3 2 0 79 1 0
## 8 2 3 0 6 0 1 6 0 1 1 2 0 14 0 1 102 0
## 9 0 2 0 2 1 0 2 2 1 0 5 5 9 0 1 4 111
Now we can construct ordered maps for each of these LGs, and save them as individual .rda files:
set_map_fun(type="kosambi")
for(i in c(1:6,8:10,12:14,16:17,19,20,23)){
apple.LG <- make_seq(apple.LGs, i)
apple.LG.ord <- order_seq(apple.LG, n.init=5, touchdown=TRUE)
apple.LG.final <- make_seq(apple.LG.ord,"force")
ord.filename <- paste("./orders/LG_", i, ".rda", sep = "")
save(apple.LG.ord, apple.LG.final, apple.LG_mds, file = ord.filename)
rf.filename <- paste("./heatmaps/rf_heatmap_LG_", i, ".pdf", sep = "")
pdf(rf.filename, width = 30, height = 30)
print(rf_graph_table(apple.LG.final, mrk.axis = "numbers"))
dev.off()
}
Now I will read these back into memory, and write out the final map:
for(i in c(1:6,8:10,12:14,16:17,19,20,23)){
ord.filename <- paste("./orders/LG_", i, ".rda", sep = "")
load(ord.filename)
LG_name <- paste("LG", i, "_final", sep = "")
assign(LG_name, apple.LG.final)
}
write_map(list( LG1_final, LG2_final, LG3_final, LG4_final,
LG5_final, LG6_final, LG8_final, LG9_final,
LG10_final, LG12_final, LG13_final, LG14_final,
LG16_final, LG17_final, LG19_final, LG20_final,
LG23_final), "map.txt")
p <- draw_map2(LG1_final, LG2_final, LG3_final, LG4_final, LG5_final,
LG6_final, LG8_final, LG9_final, LG10_final, LG12_final,
LG13_final, LG14_final, LG16_final, LG17_final, LG19_final, LG20_final,
LG23_final, main = "F1 apple map", group.names = c("LG1", "LG2", "LG3", "LG4", "LG5",
"LG6", "LG8", "LG9", "LG10", "LG12",
"LG13", "LG14", "LG16", "LG17", "LG19", "LG20",
"LG23"))
The map.txt
file contains columns for linkage group, marker name, and cM position:
head(read.table("map.txt"))
## V1 V2 V3
## 1 1 10-280310-5 0.000000
## 2 1 10-280258-4 7.132403
## 3 1 10-2063680-12 15.537571
## 4 1 10-1887928-11 20.809478
## 5 1 10-1887205-10 24.296178
## 6 1 10-26811389-157 33.077278
And the draw_map2 function produces an .eps
file with the complete map represented graphically:
Clearly there is some inflation currently. This is likely both due to local errors in order, as well as a lack of multipoint regression to shrink map distances. The former could be addressed by performing the marker curation steps shown in the earlier onemap
tutorial, but due to the uninformative nature of the marker used here, this will likely remain a problem. For now, we will proceed with QTL analysis using fullsibQTL
.
The first step in using fullsibQTL
is to read in the original input file to one onemap
(using read_onemap
), and then create the basic data object used by fullsibQTL
using the function create_fullsib
. Here I have added in the trait data to the onemap
input file, which is a binary score indicating whether the apple had red skin, or green/yellow skin.
fullsibQTL
library(fullsibQTL) # version 0.0.9010
ROOT <- "~/Box/Courses/Genetic_Mapping_2021/Lectures/Week7/apple/input_files"
fs_data <- read_onemap(dir = ROOT, inputfile = "apple_onemap_DP4_pheno.txt")
## Information of linkage groups, step and mapping function
fsib <- create_fullsib( fs_data,
map.list = list( LG1_final, LG2_final, LG3_final, LG4_final,
LG5_final, LG6_final, LG8_final, LG9_final,
LG10_final, LG12_final, LG13_final, LG14_final,
LG16_final, LG17_final, LG19_final, LG20_final,
LG23_final),
step = 1, map.function = "kosambi")
Interval mapping can be used to scan for a single QTL using the command im_scan
. This performs a scan similar to scanone
in qtl
, using the EM algorithm to get ML estimates of the mixture model parameters.
im <- im_scan( fullsib = fsib, lg = "all", pheno.col = 1, LOD = TRUE )
LOD profiles can then be plotted for the entire genome, or specific LGs:
plot_fullsibQTL(fullsib = fsib, im, qtlmapping = "Color") +
scale_y_continuous(limits = c(0,20)) +
geom_hline(yintercept = -log10(0.05/1819)) #set the limit to be between 0 and 20
plot_fullsibQTL(fullsib = fsib, fullsib.scan = im, lgs = 16, qtlmapping = "Color") +
scale_y_continuous(limits = c(0,20)) +
scale_x_continuous(limits = c(0,305)) +
geom_hline(yintercept = -log10(0.05/1819))
This is very similar to the results published by Gardner et al., using single marker regression in r/qtl
:
summary
returns the most significant markers on each LG:
summary(im)
12-22887360-610 15 141.47384 1.5212291 0
9-29305493-3533 16 313.79783 29.0389503 0
13-11388275-836 17 381.88718 3.3290360 6
We can then look more closely at specific QTL using im_char
.
(qtl1 <- im_char(fsib, pheno.col = 1, lg = 16, pos = "9-29305493-3533"))
9-29305493-3533
LG 1.600000e+01
pos 3.137978e+02
-log10(pval) 2.810093e+01
LOD_Ha 2.903897e+01
mu 3.902439e-01
alpha_p -4.205854e-03
LOD_H1 7.046611e-03
alpha_q -4.453872e-01
LOD_H2 2.897378e+01
delta_pq 4.205855e-03
LOD_H3 7.046612e-03
These parameters refer to the model described by Gazaffi et al., 2014. If \(P\) is parent 1 and \(Q\) is parent 2 (determined previously in the encoding of the input genotype file), we can represent the two QTL alleles in each parent as \(P^1\) and \(P^2\), and \(Q^1\) and \(Q^2\). Defining the markers flanking this QTL as \(P_{m}\) and \(P_{m+1}\), and \(Q_{m}\) and \(Q_{m+1}\), the cross generating the F1 population can be represented as:
\[P^1_{m}P^1P^1_{m+1} / P^2_{m}P^2P^2_{m+1} \times Q^1_{m}Q^1Q^1_{m+1} / Q^2_{m}Q^2Q^2_{m+1}\]
and the model can then be fit for each QTL: \[ y_j = \mu + \alpha_{P} + \alpha_{Q} + \delta_{PQ} + \epsilon_j \] where \(\alpha_{P}\) and \(\alpha_{Q}\) are the additive effects of the QTL, and \(\delta_{PQ}\) is the dominance effect.
In this case we see that parent 1 is homozygous for QTL that do not affect the trait (alpha_p
and LOD_H1
both = ~0). Parent 2 appears to have at least one allele for the QTL which affect the trait (alpha_q
= -0.45 and LOD_H2
= 29). In the constrasts for the additive effects, the \(1\) allele is considered as having a positive effect, and the \(2\) allele as a negative effect. The negative sign on alpha_q
thus implies that the \(1\) allele in this case decreases the value of the trait.
We can examine the phase of the \(1\) and \(2\) alleles in the parental haplotypes using the function draw_phase
:
p <- draw_phase(fsib, qtl1)
p
Printing QTL and its linkage phase between markers across the LG:
Markers Position Parent 1 Parent 2
9-32642558-3548 305.16 a | | a a | | b
9-32642623-3549 305.16 a | | b a | | b
9-28219296-3527 309.48 b | | a a | | a
QTL 313.80 P0 | | P0 Q2 | | Q1
9-30419493-3537 316.67 a | | a b | | a
9-29597790-3534 316.67 b | | a a | | b
9-31183391-3540 319.76 a | | b a | | a
P1 and Q1 have positive effect (increase phenotypic value)
P2 and Q2 have negative effect (reduce phenotypic value)
P0 and Q0 have neutral effect (non signif.)
And we can plot these graphically using plot_fullsib_phases
:
plot_fullsib_phases(p)