library(kableExtra)
library(onemap) #version 2.2.0: https://github.com/Cristianetaniguti/onemap/
load("appleLG1.rda")
load("appleLG1_D10.rda")
# These .rda files + embedded graphics are located on Canvas in Week7/F1 populations/Input files

Recall the nomenclature used by OneMap for segregation types at a single locus:

Segregation.Type OneMap.ID
BC_P1 D1.10
BC_P2 D2.15
F_2 B3.7

First, consider two unlinked markers: 10-5663069-8 is a B3.7-type on LG1 and 11-43941-88 is a B3.7 type on LG6

print(apple.twopts, c("10-5743289-11", "11-43941-88"))
##   Results of the 2-point analysis for markers: 10-5743289-11 and 11-43941-88 
##   Criteria: LOD =  4 , Maximum recombination fraction =  0.4 
## 
##           rf         LOD
## CC 0.5532463 0.186731672
## CR 0.4463522 0.002495994
## RC 0.4463522 0.002495994
## RR 0.4467537 0.186731672

As we would expect, recombination frequencies are close to 0.5, and LOD scores close to 0, for all four linkage phases.

Second, let’s look at linked markers, starting with two B3.7 type markers on LG1.

print(apple.twopts, c("10-5743289-11", "10-5663069-8"))
##   Results of the 2-point analysis for markers: 10-5743289-11 and 10-5663069-8 
##   Criteria: LOD =  4 , Maximum recombination fraction =  0.4 
## 
##            rf      LOD
## CC 0.98258905 31.65528
## CR 0.49999997  0.00001
## RC 0.49999997  0.00001
## RR 0.01741095 31.65528

Here we can see the that the C/R and R/C phases are indistinguishable to each other, and that the rf MLE under CC is equal to 1 minus the estimate under RR.

Third, consider two linked D1.10 type markers.

print(apple.twopts, c("10-88517-0", "10-5663071-9"))
##   Results of the 2-point analysis for markers: 10-88517-0 and 10-5663071-9 
##   Criteria: LOD =  4 , Maximum recombination fraction =  0.4 
## 
##           rf      LOD
## CC 0.6627901 2.016122
## CR 0.6627901 2.016122
## RC 0.3372099 2.016122
## RR 0.3372099 2.016122

Here we can see that there is MLE estimates of rf are equivalent under CC and CR, as well as RC and RR. This is because we can only distinguish phase in one of the two parents.

Fourth, consider a “mixed” set of markers: a D1.10 and a B3.7 segregation type. Again, phase can only be distinguished in one of the two parents.

print(apple.twopts, c("10-88517-0", "10-5663069-8"))
##   Results of the 2-point analysis for markers: 10-88517-0 and 10-5663069-8 
##   Criteria: LOD =  4 , Maximum recombination fraction =  0.4 
## 
##            rf      LOD
## CC 0.92307411 7.146904
## CR 0.92307411 7.146904
## RC 0.07692589 7.146904
## RR 0.07692589 7.146904

Finally, let’s consider a pair of non-informative markers: a D1.10-type and a D2.15-type.

print(apple.twopts, c("10-88517-0", "10-7884747-17"))
##   Results of the 2-point analysis for markers: 10-88517-0 and 10-7884747-17 
##   Criteria: LOD =  4 , Maximum recombination fraction =  0.4 
## 
##      rf LOD
## CC 0.25   0
## CR 0.25   0
## RC 0.25   0
## RR 0.25   0

Note that recombination frequency is set to 0.25 by default in OneMap.

Now that we have assembled LGs, set can set a mapping function (Haldane or Kosambi), extract a specific LG (again making use of the make_seq function), and ordering markers using order_seq:

set_map_fun(type="kosambi")
apple.LG1 <- make_seq(apple.LGs, 1)
apple.LG1.ord <- order_seq(apple.LG1, n.init=5, touchdown=TRUE)

The function compare will test all possible orders. The number of orders scales according to the factorial of the number of markers (\(n!/2\)), and for each order there will be many non-redunant linkage phases to consider. Consequently, order_seq selects an informative subset of markers (defined by n.init) based on segregation type, and performs an exhaustive search using these markers in order to build a scaffold.

Markers are then added to the map one by one, and there most likely position and phases is selected with (by default) a LOD threshold of 3 required to place them on the map. Touchdown repeats this process with THRESH = 1-THRESH (i.e., a LOD of 2, by default). Note that here, LOD refers to the likelihood of the HMM, and the LOD used for placing a single marker is determined by a Markov chain with this marker, and the two immediately flanking markers.

A final sequence object can be obtained which either includes ("force") or excludes ("safe") that could not be uniquely positioned in the map:

(apple.LG1.final <- make_seq(apple.LG1.ord,"force"))
## 
## Printing map:
## 
## Markers                       Position           Parent 1       Parent 2
## 
##    1 10-88517-0                   0.00           a |  | b       a |  | a 
##  351 14-2370241-358              55.57           a |  | a       a |  | b 
##  322 13-20671999-328             55.58           b |  | a       a |  | a 
##  365 14-10616501-372             72.32           b |  | a       a |  | a 
##   70 10-26811389-70              83.70           a |  | a       b |  | a 
##   67 10-26811350-67              87.36           a |  | a       b |  | a 
##  366 14-10616518-373             87.36           b |  | a       a |  | a 
##   11 10-5663079-10               96.16           b |  | a       a |  | a 
##   10 10-5663071-9                97.14           b |  | a       a |  | a 
##    9 10-5663069-8                98.61           b |  | a       a |  | b 
##   12 10-5743289-11              100.45           a |  | b       b |  | a 
##   71 10-26811418-71             103.48           a |  | b       a |  | a 
## 1103 8-3455494-1144             103.84           a |  | a       a |  | b 
##   20 10-10252902-19             104.67           a |  | a       a |  | b 
##  203 12-7982331-205             109.35           a |  | a       b |  | a 
##  490 15-43151013-500            110.92           a |  | a       b |  | a 
##  966 5-22168488-999             117.12           a |  | a       b |  | a 
##   21 10-11529576-20             117.90           a |  | a       b |  | a 
## 1249 9-21183743-1293            119.46           a |  | a       b |  | a 
##   22 10-12349025-21             121.07           a |  | b       a |  | b 
##   23 10-12349031-22             121.65           b |  | a       b |  | a 
##   31 10-14200217-30             123.20           a |  | a       b |  | a 
## 1250 9-21183765-1294            123.50           b |  | a       a |  | a 
##   32 10-14200282-31             131.55           b |  | a       b |  | a 
##   37 10-15043612-37             138.52           a |  | a       b |  | a 
##   30 10-14200176-29             138.52           b |  | a       a |  | a 
##   35 10-15043541-36             140.25           b |  | a       a |  | a 
##   40 10-16839342-40             143.76           a |  | b       a |  | b 
##   44 10-17303607-44             145.81           a |  | a       a |  | b 
##   43 10-17044817-43             146.65           b |  | a       b |  | a 
##   42 10-17036719-42             149.52           a |  | b       a |  | b 
##   45 10-18121674-45             154.70           b |  | a       b |  | a 
##   46 10-18121717-46             154.70           a |  | b       a |  | a 
##   63 10-22808308-63             162.23           a |  | a       a |  | b 
##   64 10-22950759-64             162.23           a |  | b       a |  | b 
##   82 10-31233921-82             165.09           b |  | a       a |  | a 
##   65 10-25857556-65             168.67           a |  | b       a |  | a 
##  919 5-7477564-949              169.85           a |  | b       a |  | a 
##   66 10-26298217-66             175.56           a |  | b       a |  | a 
##   72 10-26964301-72             177.40           a |  | a       a |  | b 
##   73 10-27482152-73             183.94           b |  | a       b |  | a 
##   77 10-29087139-77             189.80           a |  | b       b |  | a 
##  522 16-4626286-536             194.43           b |  | a       a |  | a 
##   86 10-32751205-86             202.87           a |  | b       a |  | a 
##   87 10-32751256-87             203.66           b |  | a       a |  | a 
##   75 10-28363838-75             203.66           a |  | a       a |  | b 
##   74 10-28308191-74             208.29           a |  | a       a |  | b 
##  744 2-27876297-763             216.40           a |  | a       a |  | b 
##  746 2-27876354-765             217.55           a |  | a       a |  | b 
##   76 10-28627003-76             221.00           a |  | a       a |  | b 
##  520 16-4626256-534             224.45           a |  | a       a |  | b 
##  521 16-4626258-535             236.16           a |  | a       a |  | b 
##  523 16-4626305-537             268.88           a |  | a       b |  | a 
## 
## 53 markers            log-likelihood: -1220.427

Finally, alternative orders can be investigated by performing an exhaustive search with a small sliding window of size defined by ws:

ripple_seq(apple.LG1.ord, ws = 4, LOD = LOD_sug)

We can evaluate the quality of the order visually using a heatmap of either recombination frequencies (the default), or LOD scores (by setting graph.LOD = T):

rf_graph_table(apple.LG1.final, mrk.axis = "numbers")

Note that while the HMM provides estimates of recombination frequencies for all adjacent pairs of markers, there is signficant white space in this plot, reflecting non-informative pairs of markers.

We can now draw the final map:

draw_map2(apple.LG1.final, main = "LG1", group.names = c("LG1"))

The first and last markers on the map appear to be inflating the map distance. These markers also showed relatively low recombination fractions in the heatmap above, and may therefore be placed erroneously (recall that LG1 contained markers from many chromosomes based on reference genome assignments). These two markers can be dropped, and the map reconstructed without changing the order of any remaining markers:

apple.LG1_drop_seq <- drop_marker(apple.LG1.final, c(1, 523))
apple.LG1_remap <- map(apple.LG1_drop_seq)
draw_map2(apple.LG1_remap, main = "LG1", group.names = c("LG1"))

The resulting map appears to have a more biologically-reasonable total length:

We can also generate a data.frame of progeny haplotypes for use in QTL analysis, and plot them to visualize recombination:

apple.progeny_haplot <- progeny_haplotypes(apple.LG1.final, most_likely = FALSE, ind = c(5:8), group_names = "LG1")
plot(apple.progeny_haplot, position = "split")

Now let’s remake the map of LG1 using just D1.10 makers

apple.D10 <- read_onemap(dir = "./input_files", inputfile = "apple_onemap_D10.txt")
apple.D10.bins <- find_bins(apple.D10, exact = FALSE)
apple.D10.binned <- create_data_bins(apple.D10, apple.D10.bins)
apple.D10.segreg_test <- test_segregation(apple.D10.binned)
apple.D10.no_dist <- select_segreg(apple.D10.segreg_test, distorted = FALSE, numbers = TRUE)
apple.D10.twopts <- rf_2pts(apple.D10.binned, LOD = 4, max.rf = 0.4)
apple.D10.mrks.no_dist <- make_seq(apple.D10.twopts, c(apple.D10.no_dist))
apple.D10.LGs <- group(apple.D10.mrks.no_dist, LOD = 4, max.rf = 0.3)
set_map_fun(type="kosambi")
apple.D10.LG1 <- make_seq(apple.D10.LGs, 1)
apple.D10.LG1.ord <- order_seq(apple.D10.LG1, n.init=5, touchdown=TRUE)
(apple.D10.LG1.final <- make_seq(apple.D10.LG1.ord,"force"))

The heat map now has no white squares, since recombination frequency can be estimated between every pair of markers.

rf_graph_table(apple.D10.LG1.final, mrk.axis = "numbers")

This procedure, however, is similar to the two-way pseudo-testcross strategy, and thus only allows for observation of recombination in one parent, as can again be seen in the in progeny haplotypes:

apple.D10.progeny_haplot <- progeny_haplotypes(apple.D10.LG1.final, most_likely = FALSE, ind = c(13,14,15,17), group_names = "LG1")
plot(apple.D10.progeny_haplot, position = "split")