library(kableExtra)
library(onemap) #version 2.2.0: https://github.com/Cristianetaniguti/onemap/
load("appleLG1.rda")
load("appleLG1_D10.rda")
# These .rda files + embedded graphics are located on Canvas in Week7/F1 populations/Input files
Recall the nomenclature used by OneMap for segregation types at a single locus:
Segregation.Type | OneMap.ID |
---|---|
BC_P1 | D1.10 |
BC_P2 | D2.15 |
F_2 | B3.7 |
First, consider two unlinked markers: 10-5663069-8 is a B3.7-type on LG1 and 11-43941-88 is a B3.7 type on LG6
print(apple.twopts, c("10-5743289-11", "11-43941-88"))
## Results of the 2-point analysis for markers: 10-5743289-11 and 11-43941-88
## Criteria: LOD = 4 , Maximum recombination fraction = 0.4
##
## rf LOD
## CC 0.5532463 0.186731672
## CR 0.4463522 0.002495994
## RC 0.4463522 0.002495994
## RR 0.4467537 0.186731672
As we would expect, recombination frequencies are close to 0.5, and LOD scores close to 0, for all four linkage phases.
Second, let’s look at linked markers, starting with two B3.7 type markers on LG1.
print(apple.twopts, c("10-5743289-11", "10-5663069-8"))
## Results of the 2-point analysis for markers: 10-5743289-11 and 10-5663069-8
## Criteria: LOD = 4 , Maximum recombination fraction = 0.4
##
## rf LOD
## CC 0.98258905 31.65528
## CR 0.49999997 0.00001
## RC 0.49999997 0.00001
## RR 0.01741095 31.65528
Here we can see the that the C/R and R/C phases are indistinguishable to each other, and that the rf MLE under CC is equal to 1 minus the estimate under RR.
Third, consider two linked D1.10 type markers.
print(apple.twopts, c("10-88517-0", "10-5663071-9"))
## Results of the 2-point analysis for markers: 10-88517-0 and 10-5663071-9
## Criteria: LOD = 4 , Maximum recombination fraction = 0.4
##
## rf LOD
## CC 0.6627901 2.016122
## CR 0.6627901 2.016122
## RC 0.3372099 2.016122
## RR 0.3372099 2.016122
Here we can see that there is MLE estimates of rf are equivalent under CC and CR, as well as RC and RR. This is because we can only distinguish phase in one of the two parents.
Fourth, consider a “mixed” set of markers: a D1.10 and a B3.7 segregation type. Again, phase can only be distinguished in one of the two parents.
print(apple.twopts, c("10-88517-0", "10-5663069-8"))
## Results of the 2-point analysis for markers: 10-88517-0 and 10-5663069-8
## Criteria: LOD = 4 , Maximum recombination fraction = 0.4
##
## rf LOD
## CC 0.92307411 7.146904
## CR 0.92307411 7.146904
## RC 0.07692589 7.146904
## RR 0.07692589 7.146904
Finally, let’s consider a pair of non-informative markers: a D1.10-type and a D2.15-type.
print(apple.twopts, c("10-88517-0", "10-7884747-17"))
## Results of the 2-point analysis for markers: 10-88517-0 and 10-7884747-17
## Criteria: LOD = 4 , Maximum recombination fraction = 0.4
##
## rf LOD
## CC 0.25 0
## CR 0.25 0
## RC 0.25 0
## RR 0.25 0
Note that recombination frequency is set to 0.25 by default in OneMap.
Now that we have assembled LGs, set can set a mapping function (Haldane or Kosambi), extract a specific LG (again making use of the make_seq
function), and ordering markers using order_seq
:
set_map_fun(type="kosambi")
apple.LG1 <- make_seq(apple.LGs, 1)
apple.LG1.ord <- order_seq(apple.LG1, n.init=5, touchdown=TRUE)
The function compare
will test all possible orders. The number of orders scales according to the factorial of the number of markers (\(n!/2\)), and for each order there will be many non-redunant linkage phases to consider. Consequently, order_seq
selects an informative subset of markers (defined by n.init
) based on segregation type, and performs an exhaustive search using these markers in order to build a scaffold.
Markers are then added to the map one by one, and there most likely position and phases is selected with (by default) a LOD threshold of 3 required to place them on the map. Touchdown repeats this process with THRESH = 1-THRESH
(i.e., a LOD of 2, by default). Note that here, LOD refers to the likelihood of the HMM, and the LOD used for placing a single marker is determined by a Markov chain with this marker, and the two immediately flanking markers.
A final sequence
object can be obtained which either includes ("force"
) or excludes ("safe"
) that could not be uniquely positioned in the map:
(apple.LG1.final <- make_seq(apple.LG1.ord,"force"))
##
## Printing map:
##
## Markers Position Parent 1 Parent 2
##
## 1 10-88517-0 0.00 a | | b a | | a
## 351 14-2370241-358 55.57 a | | a a | | b
## 322 13-20671999-328 55.58 b | | a a | | a
## 365 14-10616501-372 72.32 b | | a a | | a
## 70 10-26811389-70 83.70 a | | a b | | a
## 67 10-26811350-67 87.36 a | | a b | | a
## 366 14-10616518-373 87.36 b | | a a | | a
## 11 10-5663079-10 96.16 b | | a a | | a
## 10 10-5663071-9 97.14 b | | a a | | a
## 9 10-5663069-8 98.61 b | | a a | | b
## 12 10-5743289-11 100.45 a | | b b | | a
## 71 10-26811418-71 103.48 a | | b a | | a
## 1103 8-3455494-1144 103.84 a | | a a | | b
## 20 10-10252902-19 104.67 a | | a a | | b
## 203 12-7982331-205 109.35 a | | a b | | a
## 490 15-43151013-500 110.92 a | | a b | | a
## 966 5-22168488-999 117.12 a | | a b | | a
## 21 10-11529576-20 117.90 a | | a b | | a
## 1249 9-21183743-1293 119.46 a | | a b | | a
## 22 10-12349025-21 121.07 a | | b a | | b
## 23 10-12349031-22 121.65 b | | a b | | a
## 31 10-14200217-30 123.20 a | | a b | | a
## 1250 9-21183765-1294 123.50 b | | a a | | a
## 32 10-14200282-31 131.55 b | | a b | | a
## 37 10-15043612-37 138.52 a | | a b | | a
## 30 10-14200176-29 138.52 b | | a a | | a
## 35 10-15043541-36 140.25 b | | a a | | a
## 40 10-16839342-40 143.76 a | | b a | | b
## 44 10-17303607-44 145.81 a | | a a | | b
## 43 10-17044817-43 146.65 b | | a b | | a
## 42 10-17036719-42 149.52 a | | b a | | b
## 45 10-18121674-45 154.70 b | | a b | | a
## 46 10-18121717-46 154.70 a | | b a | | a
## 63 10-22808308-63 162.23 a | | a a | | b
## 64 10-22950759-64 162.23 a | | b a | | b
## 82 10-31233921-82 165.09 b | | a a | | a
## 65 10-25857556-65 168.67 a | | b a | | a
## 919 5-7477564-949 169.85 a | | b a | | a
## 66 10-26298217-66 175.56 a | | b a | | a
## 72 10-26964301-72 177.40 a | | a a | | b
## 73 10-27482152-73 183.94 b | | a b | | a
## 77 10-29087139-77 189.80 a | | b b | | a
## 522 16-4626286-536 194.43 b | | a a | | a
## 86 10-32751205-86 202.87 a | | b a | | a
## 87 10-32751256-87 203.66 b | | a a | | a
## 75 10-28363838-75 203.66 a | | a a | | b
## 74 10-28308191-74 208.29 a | | a a | | b
## 744 2-27876297-763 216.40 a | | a a | | b
## 746 2-27876354-765 217.55 a | | a a | | b
## 76 10-28627003-76 221.00 a | | a a | | b
## 520 16-4626256-534 224.45 a | | a a | | b
## 521 16-4626258-535 236.16 a | | a a | | b
## 523 16-4626305-537 268.88 a | | a b | | a
##
## 53 markers log-likelihood: -1220.427
Finally, alternative orders can be investigated by performing an exhaustive search with a small sliding window of size defined by ws
:
ripple_seq(apple.LG1.ord, ws = 4, LOD = LOD_sug)
We can evaluate the quality of the order visually using a heatmap of either recombination frequencies (the default), or LOD scores (by setting graph.LOD = T
):
rf_graph_table(apple.LG1.final, mrk.axis = "numbers")
Note that while the HMM provides estimates of recombination frequencies for all adjacent pairs of markers, there is signficant white space in this plot, reflecting non-informative pairs of markers.
We can now draw the final map:
draw_map2(apple.LG1.final, main = "LG1", group.names = c("LG1"))
The first and last markers on the map appear to be inflating the map distance. These markers also showed relatively low recombination fractions in the heatmap above, and may therefore be placed erroneously (recall that LG1 contained markers from many chromosomes based on reference genome assignments). These two markers can be dropped, and the map reconstructed without changing the order of any remaining markers:
apple.LG1_drop_seq <- drop_marker(apple.LG1.final, c(1, 523))
apple.LG1_remap <- map(apple.LG1_drop_seq)
draw_map2(apple.LG1_remap, main = "LG1", group.names = c("LG1"))
The resulting map appears to have a more biologically-reasonable total length:
We can also generate a data.frame of progeny haplotypes for use in QTL analysis, and plot them to visualize recombination:
apple.progeny_haplot <- progeny_haplotypes(apple.LG1.final, most_likely = FALSE, ind = c(5:8), group_names = "LG1")
plot(apple.progeny_haplot, position = "split")
Now let’s remake the map of LG1 using just D1.10 makers
apple.D10 <- read_onemap(dir = "./input_files", inputfile = "apple_onemap_D10.txt")
apple.D10.bins <- find_bins(apple.D10, exact = FALSE)
apple.D10.binned <- create_data_bins(apple.D10, apple.D10.bins)
apple.D10.segreg_test <- test_segregation(apple.D10.binned)
apple.D10.no_dist <- select_segreg(apple.D10.segreg_test, distorted = FALSE, numbers = TRUE)
apple.D10.twopts <- rf_2pts(apple.D10.binned, LOD = 4, max.rf = 0.4)
apple.D10.mrks.no_dist <- make_seq(apple.D10.twopts, c(apple.D10.no_dist))
apple.D10.LGs <- group(apple.D10.mrks.no_dist, LOD = 4, max.rf = 0.3)
set_map_fun(type="kosambi")
apple.D10.LG1 <- make_seq(apple.D10.LGs, 1)
apple.D10.LG1.ord <- order_seq(apple.D10.LG1, n.init=5, touchdown=TRUE)
(apple.D10.LG1.final <- make_seq(apple.D10.LG1.ord,"force"))
The heat map now has no white squares, since recombination frequency can be estimated between every pair of markers.
rf_graph_table(apple.D10.LG1.final, mrk.axis = "numbers")
This procedure, however, is similar to the two-way pseudo-testcross strategy, and thus only allows for observation of recombination in one parent, as can again be seen in the in progeny haplotypes:
apple.D10.progeny_haplot <- progeny_haplotypes(apple.D10.LG1.final, most_likely = FALSE, ind = c(13,14,15,17), group_names = "LG1")
plot(apple.D10.progeny_haplot, position = "split")