Introduction

Linkage maps are essential tools to several genetic endeavors such as quantitative trait loci (QTL) analysis, evolutionary studies, assessment of colinearity in chromosome-wise scale between genomes, and study of meiotic processes. The principle behind linkage map construction is detecting recombinant events between genomic positions and summarizing them into pairwise recombination fraction estimates. In diploids, the assessment of such a phenomenon is relatively straightforward. After the homolog duplication, sister chromatids pair up exchanging segments. The presence of informative markers, e. g. single nucleotide polymorphisms (SNPs) enable straightforward computation of the recombination fraction between pairs of genomic positions by comparing the chromosome constitution of parents and offspring. If the order of markers is unknown, it can be obtained using the pairwise recombination fractions in conjunction with optimization algorithms.

In polyploids (species with more than two sets of homolog chromosomes, or homologs), the construction of such maps is quite challenging. While in diploids a biallelic marker can be fully informative, in polyploids they just allow us to account for proportions of biallelic dosages. In order to recover the multiallelic information present in polyploid species we need to account for recombination frequencies, estimate phase configurations and reconstruct the haplotypes of both parents and individuals in the population. As the ploidy level increases, the joint computation of genomic regions becomes intensive, and other approaches such as dimension reduction need to be applied.

MAPpoly is an R package fully capable of building genetic linkage maps for biparental populations in polyploid species with ploidy level up to 12x. This package is part of the Genomic Tools for Sweetpotato Improvement (GT4SP), funded by Bill and Melinda Gates Foundation. All of the procedures used here are detailed in Mollinari and Garcia (2019) and (???) 2020. The results obtained with MAPpoly can be readily used for QTL mapping with the QTLpoly package, which implements the procedures proposed by Pereira et al. (2020).

Some advantages of MAPpoly are:

Can handle multiple dataset types
Can handle thousands of markers
Does not depend on single dose markers (SDM) to build the map
Incorporates genomic information
Explores multipoint information (through Hidden Markov Chain Model)
Can handle high ploidies: up to 8x when using HMM, and up to 12x when using the two-point approach
Optimized to run even without advanced computational resources
Can reconstruct haplotypes for parents and all individuals in the population
Recovers the multiallelic nature of polyploid genomes
Detects occurrence, location and frequency of multivalent pairing during meiosis
Robust enough to build genetic linkage maps with multiallelic markers (when available)

`MAPpoly` installation

MAPpoly is not yet available on CRAN, but you can install it from Github. Please notice that you need the latest R version (3.6 or higher) to install MAPpoly. Within R, you need to install the devtools package:

install.packages('devtools')

Then you can install and load MAPpoly with:

devtools::install_github('mmollina/mappoly')
require(mappoly)

Loading datasets

In its current version, MAPpoly can handle three different types of datasets:

CSV files
MAPpoly files
- Dosage based
- Probability based
VCF files

Please notice that both CSV and MAPpoly datasets are sensible to common errors, such as additional spaces, commas and wrong encoding (non-UTF-8). If you have any trouble, please double check your files before submitting an issue. Detailed steps of all supported files are given on the sections below. Also, a considerable proportion of redundant markers is expected on big datasets. As markers that have the same information does not provide any advantage during the mapping process, the redundant ones may be removed from the dataset in order to reduce computational effort. All reading functions share the elim.redundant argument that automatically identifies and removes redundant markers during the analysis. The redundant markers are always recovered and exported when the final map is ready.

Reading CSV files

The preparation of a CSV file for MAPpoly is quite straightforward. It can be done in Microsoft Excel or any other spreadsheet software of your preference. In this file, each line comprehends a marker and each column comprehends information about the marker. In its current version, MAPpoly can handle .csv files with allelic dosage data.

The first line of the CSV file should contain headers for all columns. The first five columns should contain the following information: marker name, dosage for both parents (one column for each), a sequence number (e.g. a chromosome number, if available) and a sequence position (e.g. the marker position within the chromosome, if available). In addition to these five headers, you should include the name of all individuals in the population. From the second line onwards, all columns should contain its values, including allelic dosages for all individuals. Missing or absent values should be represented by NA.

NOTE: If genomic information is not available, the ‘sequence’ and ‘sequence position’ columns should be filled with NA’s.

Example:

$Figure 1: Example of CSV data set$

Figure 1: Example of CSV data set

Important note: avoid spaces in .csv files. As mentioned above, please double check your datasets for extra spaces, commas, dots and encoding. Your CSV file should be encoded using UTF-8.

You can read CSV files with the read_geno_csv function:

ft="https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/tetra_solcap.csv"
tempfl <- tempfile()
download.file(ft, destfile = tempfl)
dat.dose.csv <- read_geno_csv(file.in  = tempfl, ploidy = 4)
#> Reading the following data:
#>     Ploidy level: 4
#>     No. individuals:  160
#>     No. markers:  4017
#>     No. informative markers:  4017 (100%)
#>     This dataset contains sequence information.
#>     ...
#>     Done with reading.
#>     Filtering non-conforming markers.
#>     ...
#>     Done with filtering.

dat.dose.csv

In addition to the CSV file path, you should indicate the ploidy level using the ploidy argument. This function automatically excludes monomorphic markers, keeping only informative ones. It also performs chi-square tests for all markers, considering the expected segregation patterns under Mendelian inheritance, random chromosome pairing and no double reduction. You can optionally use the filter.non.conforming logical argument (default = TRUE), which excludes non-expected genotypes under these assumptions.

Reading MAPpoly files

Besides CSV and VCF files, MAPpoly can also handle two dataset types that follow the same format: a genotype-based file (with allelic dosages) (1) and a probability-based file (2). Both are pure text files with the same header, but with different genotype table formats.

For both files the header should contain: ploidy level, number of individuals (nind), number of markers (nmrk), marker names (mrknames), individual names (indnames), allele dosages for parent 1 (dosageP), allele dosages for parent 2 (dosageQ), sequence/chromosome information (seq), position of each marker (seqpos), number of phenotypic traits (nphen) and the phenotypic data (pheno) if available. The header should be organized according to this example:

ploidy 4
nind 3
nmrk 5
mrknames M1 M2 M3 M4 M5
indnames Ind1 Ind2 Ind3
dosageP 0 2 0 0 3
dosageQ 1 2 1 1 3
seq 1 1 2 2 3
seqpos 100 200 50 150 80
nphen 0
pheno-----------------------
geno------------------------

For more information about MAPpoly file format, please see ?read_geno and ?read_geno_prob documentation from MAPpoly package.

Using `read_geno`

The header should be followed by a table containing the genotypes (allele dosages) for each marker (rows) and for each individual (columns), as follows:

	Individual 1	Individual 2	Individual 3
Marker 1	1	0	0
Marker 2	3	0	2
Marker 3	1	0	0
Marker 4	1	0	0
Marker 5	3	4	4

The final file should look like the example below:

ploidy 4
nind 3
nmrk 5
mrknames M1 M2 M3 M4 M5
indnames Ind1 Ind2 Ind3
dosageP 0 2 0 0 3
dosageQ 1 2 1 1 3
seq 1 1 2 2 3
seqpos 100 200 50 150 80
nphen 0
pheno-----------------------
geno------------------------
1 0 0
3 0 2
1 0 0
1 0 0
3 4 4

Then, use the read_geno function to read your file:

fl = "https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/SolCAP_dosage"
tempfl <- tempfile()
download.file(fl, destfile = tempfl)
dat.dose.mpl <- read_geno(file.in  = tempfl, elim.redundant = TRUE)
#> Reading the following data:
#>     Ploidy level: 4
#>     No. individuals:  160
#>     No. markers:  4017
#>     No. informative markers:  4017 (100%)
#>     This dataset contains sequence information.
#>     ...
#> 
#>     Done with reading.
#>     Filtering non-conforming markers.
#>     ...
#>     Performing chi-square test.
#>     ...
#>     Done.

dat.dose.mpl

This function automatically excludes monomorphic markers, keeping only informative ones. It also performs chi-square tests for all markers, considering the expected segregation patterns under Mendelian inheritance, random chromosome pairing and no double reduction. You can optionally use the filter.non.conforming logical argument (default = TRUE), which excludes non-expected genotypes under these assumptions.

Using `read_geno_prob`

Following the same header described before should be a table containing the probability distribution for each combination of marker $\times$ individual. Each line of this table represents the combination of one marker with one individual, and its respective probabilities of having each possible allele dosage. The first two columns represent the marker and the individual, respectively, and the remaining elements represent the probability associated with each one of the possible dosages, as follows:

Marker	Individual	$p(d=0)$	$p(d=1)$	$p(d=2)$	$p(d=3)$	$p(d=4)$
M1	Ind1	0.5	0.5	0.0	0.0	0.0
M2	Ind1	0.0	1.0	0.0	0.0	0.0
M3	Ind1	0.3	0.7	0.0	0.0	0.0
M4	Ind1	0.5	0.5	0.0	0.0	0.0
M5	Ind1	0.0	0.0	0.0	0.9	0.1
M1	Ind2	1.0	0.0	0.0	0.0	0.0
M2	Ind2	0.2	0.5	0.3	0.0	0.0
M3	Ind2	0.9	0.1	0.0	0.0	0.0
M4	Ind2	0.9	0.1	0.0	0.0	0.0
M5	Ind2	0.0	0.0	0.0	0.2	0.8
M1	Ind3	0.2	0.8	0.0	0.0	0.0
M2	Ind3	0.4	0.6	0.0	0.0	0.0
M3	Ind3	1.0	0.0	0.0	0.0	0.0
M4	Ind3	0.0	0.1	0.9	0.0	0.0
M5	Ind3	0.1	0.9	0.0	0.0	0.0

Please notice that each marker $\times$ individual combination have $m+1$ associated probabilities, being $m$ the ploidy level and $m+1$ the number of possible allele dosages. Also, each line sum must be 1. The final file (header + table) should look like this:

ploidy 4
nind 3
nmrk 5
mrknames M1 M2 M3 M4 M5
indnames Ind1 Ind2 Ind3
dosageP 0 2 0 0 3
dosageQ 1 2 1 1 3
seq 1 1 2 2 3
seqpos 100 200 50 150 80
nphen 0
pheno-----------------------
geno------------------------
M1 Ind1 0.5 0.5 0.0 0.0 0.0
M2 Ind1 0.0 1.0 0.0 0.0 0.0
M3 Ind1 0.3 0.7 0.0 0.0 0.0
M4 Ind1 0.5 0.5 0.0 0.0 0.0
M5 Ind1 0.0 0.0 0.0 0.9 0.1
M1 Ind2 1.0 0.0 0.0 0.0 0.0
M2 Ind2 0.2 0.5 0.3 0.0 0.0
M3 Ind2 0.9 0.1 0.0 0.0 0.0
M4 Ind2 0.9 0.1 0.0 0.0 0.0
M5 Ind2 0.0 0.0 0.0 0.2 0.8
M1 Ind3 0.2 0.8 0.0 0.0 0.0
M2 Ind3 0.4 0.6 0.0 0.0 0.0
M3 Ind3 1.0 0.0 0.0 0.0 0.0
M4 Ind3 0.0 0.1 0.9 0.0 0.0
M5 Ind3 0.1 0.9 0.0 0.0 0.0

Once the file is ready, use the function read_geno_prob to read it:

ft="https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/SolCAP"
tempfl <- tempfile()
download.file(ft, destfile = tempfl)
dat.prob.mpl <- read_geno_prob(file.in  = tempfl, prob.thres = 0.95, elim.redundant = TRUE)
#> Reading the following data:
#>     Ploidy level: 4
#>     No. individuals:  160
#>     No. markers:  4017
#>     No. informative markers:  4017 (100%)
#>     This dataset contains sequence information.
#>     ...
#>     Done with reading.
#>     Filtering non-conforming markers.
#>     ...
#>     Performing chi-square test.
#>     ...
#>     Done.

dat.prob.mpl

Important note: as this type of file contains the probability distribution of all possible dosages, it may take longer to read.

This function automatically excludes monomorphic markers, keeping only informative ones. You can define the minimum probability value necessary to call a dosage using the prob.thres argument. If the higher probability for a marker $\times$ individual passes this threshold, then its associated dosage is used. However, if none of the probabilities reach this threshold, then its dosage is considered missing (NA). This function also performs chi-square tests for all markers, considering the expected segregation patterns under Mendelian inheritance, random chromosome pairing and no double reduction. You can optionally use the filter.non.conforming logical argument (default = TRUE), which excludes non-expected genotypes under these assumptions.

Reading VCF files

VCF files are less sensible to errors, because they are usually produced by automated SNP calling pipelines and less susceptible to user edition. MAPpoly can also handle VCF files of version 4.0 and higher produced by the most common softwares, such as TASSEL, GATK, Stacks and many others. As few of these softwares can handle poliploidy and estimate genotypes correctly, you may use other softwares that are dedicated to estimate the allele dosages. Briefly, these softwares model the ratio between allele read counts (or intensity) for each marker $\times$ individual combination, and determines which is the most probable allele dosage given the observed ratio and other a priori information. Examples of these softwares are SuperMASSA, fitPoly, ClusterCall, updog, PolyRAD and many others. After allele dosage estimation, your VCF file should contain GT values like 1/1/1/0 (for an autotetraploid) rather than 1/0. Since MAPpoly uses allelic dosages (or their probabilities) to build genetic maps, we strongly recommend that you use one of these softwares to estimate allele dosages before building the map. Both PolyRAD and updog have direct integration with MAPpoly, as described in the next section.

To demonstrate the read_vcf function, lets download an autohexaploid sweetpotato VCF file from the MAPpoly’s vignettes repository on Github and read it:

download.file("https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/BT/sweetpotato_chr1.vcf.gz", destfile = 'chr1.vcf.gz')
dat.dose.vcf = read_vcf(file = 'chr1.vcf.gz', parent.1 = "PARENT1", parent.2 = "PARENT2")
#> Reading data...
#> Scanning file to determine attributes.
#> File attributes:
#>   meta lines: 8
#>   header_line: 9
#>   variant count: 7301
#>   column count: 326
#> 
Meta line 8 read in.
#> All meta lines processed.
#> gt matrix initialized.
#> Character matrix gt created.
#>   Character matrix gt rows: 7301
#>   Character matrix gt cols: 326
#>   skip: 0
#>   nrows: 7301
#>   row_num: 0
#> 
Processed variant 1000
Processed variant 2000
Processed variant 3000
Processed variant 4000
Processed variant 5000
Processed variant 6000
Processed variant 7000
Processed variant: 7301
#> All variants processed
#> Processing genotypes...Done!
#> Selected ploidy: 6 
#> Done!
#> Read the following data:
#>     Ploidy level: 6
#>     No. individuals:  315
#>     No. markers:  5514
#>     No. informative markers:  5514 (100%)
#>     This dataset contains sequence information.
#>     ...
#>     Done with reading.
#>     Filtering non-conforming markers.
#>     ...
#>     Performing chi-square test.
#>     ...
#>     Done.

dat.dose.vcf

Besides the path to your VCF file, you should indicate parent.1 and parent.2 names. Please notice that their names must be exactly the same strings that appear in your VCF file. The ploidy level will be automatically detected, but you may indicate it using the optional ploidy argument to let the function check for possible errors. For species with variable ploidy levels (i.e. sugarcane), please indicate the desired ploidy level using the ploidy argument; if absent, MAPpoly will use the smallest ploidy level detected.

This function also has options to filter out undesired markers or data points, such as those with low depth or high proportion of missing data. You can define the following filter arguments: set min.av.depth to any integer level in order to remove markers that show average depth below this value (default = 0); set min.gt.depth to any integer level in order to remove data points that present depth below this value (default = 0); set max.missing to any value between 0 and 1 in order to remove markers that present missing data proportion above this value (default = 1).

read_vcf performs chi-square tests for all markers, considering the expected segregation patterns under Mendelian inheritance, random chromosome pairing and no double reduction. You can optionally use the filter.non.conforming logical argument (default = TRUE), which excludes non-expected genotypes under these assumptions. The p-value threshold used by the segregation test can be defined by the thresh.line argument.

Please notice that the returning object from read_vcf has some additional information: reference and alternative alleles (bases) for each marker; and average depth of each marker. You can inspect all marker depths using the following code as an example:

library(ggplot2)
dosage_combs = cbind(dat.dose.vcf$dosage.p, dat.dose.vcf$dosage.q)
dc_simplex = apply(dosage_combs,1,function(x) if(all(c(0,1) %in% x) | all(c(dat.dose.vcf$m-1, dat.dose.vcf$m) %in% x)) return(TRUE) else return(FALSE))
dc_dsimplex = apply(dosage_combs,1,function(x) if(all(x == c(1,1)) | all(x == c(dat.dose.vcf$m-1, dat.dose.vcf$m-1))) return(TRUE) else return(FALSE))

dc_simplex[which(dc_simplex == TRUE)] = "simplex"
dc_simplex[which(dc_dsimplex == TRUE)] = 'double simplex'
dc_simplex[which(dc_simplex == 'FALSE')] = 'multiplex'

data_depths = data.frame('Marker depths' = dat.dose.vcf$all.mrk.depth,
                         'Depth classes' = findInterval(dat.dose.vcf$all.mrk.depth, c(200,300,400,500,600,50000)),
                         'Dosage combinations' = dc_simplex, check.names = F)

ggplot(data_depths, aes(fill=`Dosage combinations`, x=`Depth classes`, y=`Marker depths`)) +
  geom_bar(position = 'stack', stat = 'identity') +
  scale_x_continuous(breaks=0:5, labels=c(">200","200-300","300-400","400-500","500-600", ">600"))

Importing data from third party packages

As mentioned above, polyploid sequencing data must be used to estimate allelic dosages prior to linkage map building. The R package PolyRAD has its own function to export genotypes to the MAPpoly’s genotype probability distribution format. One may use the commands above to import from PolyRAD:

# load example dataset from polyRAD
library(polyRAD)
data(exampleRAD_mapping)
exampleRAD_mapping = SetDonorParent(exampleRAD_mapping, "parent1")
exampleRAD_mapping = SetRecurrentParent(exampleRAD_mapping, "parent2")
exampleRAD_mapping = PipelineMapping2Parents(exampleRAD_mapping)
#> Making initial parameter estimates...
#> Generating sampling permutations for allele depth.
#> Updating priors using linkage...
#> Done.

# export to MAPpoly
outfile2 = tempfile()
Export_MAPpoly(exampleRAD_mapping, file = outfile2)

# Read in MAPpoly
mydata_polyrad = read_geno_prob(outfile2)
#> Reading the following data:
#>     Ploidy level: 2
#>     No. individuals:  100
#>     No. markers:  4
#>     No. informative markers:  4 (100%)
#>     This dataset contains sequence information.
#>     ...
#>     Done with reading.
#>     Filtering non-conforming markers.
#>     ...
#>     Performing chi-square test.
#>     ...
#>     Done.

mydata_polyrad

You can also use the MAPpoly’s function import_from_updog to import any dataset generated by updog’s function multidog, following the example below:

# Load example dataset from updog
library(updog)
data(uitdewilligen)
mout = multidog(refmat = t(uitdewilligen$refmat), 
                sizemat = t(uitdewilligen$sizemat), 
                ploidy = uitdewilligen$ploidy, 
                model = "f1",
                p1_id = colnames(t(uitdewilligen$sizemat))[1],
                p2_id = colnames(t(uitdewilligen$sizemat))[2],
                nc = 4)
#>     |                                   *.#,%    
#>    |||                                 *******/  
#>  |||||||    (**..#**.                  */   **/  
#> |||||||||    */****************************/*%   
#>    |||    &****..,*.************************/    
#>    |||     (....,,,*,...****%********/(******    
#>    |||                ,,****%////,,,,./.****/    
#>    |||                  /**//         .*///....  
#>    |||                  .*/*/%#         .,/   ., 
#>    |||               , **/   #%         .*    .. 
#>    |||                               ,,,*        
#> 
#> Working on it...done!
mydata_updog = import_from_updog(mout)

mydata_updog

Please notice that updog removes both sequence and sequence position information that may be present in the VCF file. We highly recommend that you use this information during the linkage map building, when available.

Combining multiple datasets

It is not rare to have two or more datasets regarding the same population or individuals from different sources of molecular data, such as SNP chips, GBS and/or microsatellites. MAPpoly can combine datasets when individual names are the same, using the function merge_datasets. To demonstrate its functionality, lets download two VCF files (autohexaploid sweetpotato) from the MAPpoly vignettes repository on Github, and read them using read_vcf function:

# Downloading VCF files regarding chromosome 1 and 2
download.file("https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/BT/sweetpotato_chr1.vcf.gz", destfile = 'chr1.vcf.gz')
download.file("https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/BT/sweetpotato_chr2.vcf.gz", destfile = 'chr2.vcf.gz')
data1 = read_vcf(file = 'chr1.vcf.gz', parent.1 = "PARENT1", parent.2 = "PARENT2")
#> Reading data...
#> Scanning file to determine attributes.
#> File attributes:
#>   meta lines: 8
#>   header_line: 9
#>   variant count: 7301
#>   column count: 326
#> 
Meta line 8 read in.
#> All meta lines processed.
#> gt matrix initialized.
#> Character matrix gt created.
#>   Character matrix gt rows: 7301
#>   Character matrix gt cols: 326
#>   skip: 0
#>   nrows: 7301
#>   row_num: 0
#> 
Processed variant 1000
Processed variant 2000
Processed variant 3000
Processed variant 4000
Processed variant 5000
Processed variant 6000
Processed variant 7000
Processed variant: 7301
#> All variants processed
#> Processing genotypes...Done!
#> Selected ploidy: 6 
#> Done!
#> Read the following data:
#>     Ploidy level: 6
#>     No. individuals:  315
#>     No. markers:  5514
#>     No. informative markers:  5514 (100%)
#>     This dataset contains sequence information.
#>     ...
#>     Done with reading.
#>     Filtering non-conforming markers.
#>     ...
#>     Performing chi-square test.
#>     ...
#>     Done.
data2 = read_vcf(file = 'chr2.vcf.gz', parent.1 = "PARENT1", parent.2 = "PARENT2")
#> Reading data...
#> Scanning file to determine attributes.
#> File attributes:
#>   meta lines: 8
#>   header_line: 9
#>   variant count: 5550
#>   column count: 326
#> 
Meta line 8 read in.
#> All meta lines processed.
#> gt matrix initialized.
#> Character matrix gt created.
#>   Character matrix gt rows: 5550
#>   Character matrix gt cols: 326
#>   skip: 0
#>   nrows: 5550
#>   row_num: 0
#> 
Processed variant 1000
Processed variant 2000
Processed variant 3000
Processed variant 4000
Processed variant 5000
Processed variant: 5550
#> All variants processed
#> Processing genotypes...Done!
#> Selected ploidy: 6 
#> Done!
#> Read the following data:
#>     Ploidy level: 6
#>     No. individuals:  315
#>     No. markers:  4348
#>     No. informative markers:  4348 (100%)
#>     This dataset contains sequence information.
#>     ...
#>     Done with reading.
#>     Filtering non-conforming markers.
#>     ...
#>     Performing chi-square test.
#>     ...
#>     Done.

As we can see, both have different markers for the same population:

# See datasets
print(data1)
#> This is an object of class 'mappoly.data'
#>     Ploidy level:                            6 
#>     No. individuals:                         315 
#>     No. markers:                             4799 
#>     Missing data:                            6.58%
#>     Redundant markers:                       12.97%
#> 
#>     This dataset contains sequence information.
#>     ----------
#>     No. of markers per dosage combination in both parents:
#>     P1 P2 freq
#>      0  1 1409
#>      0  2  214
#>      0  3   40
#>      0  4   10
#>      0  5    2
#>      1  0  508
#>      1  1  382
#>      1  2  255
#>      1  3  138
#>      1  4   45
#>      1  5    7
#>      2  0  189
#>      2  1  277
#>      2  2  302
#>      2  3  199
#>      2  4  102
#>      2  5    4
#>      3  0   56
#>      3  1  139
#>      3  2  227
#>      3  3  125
#>      3  4   20
#>      4  0    7
#>      4  1   60
#>      4  2   67
#>      4  3    8
#>      5  0    1
#>      5  1    5
#>      5  3    1
print(data2)
#> This is an object of class 'mappoly.data'
#>     Ploidy level:                            6 
#>     No. individuals:                         315 
#>     No. markers:                             3786 
#>     Missing data:                            7.28%
#>     Redundant markers:                       12.93%
#> 
#>     This dataset contains sequence information.
#>     ----------
#>     No. of markers per dosage combination in both parents:
#>     P1 P2 freq
#>      0  1 1015
#>      0  2  170
#>      0  3   85
#>      0  4   30
#>      0  5    3
#>      1  0  461
#>      1  1  289
#>      1  2  184
#>      1  3  153
#>      1  4   57
#>      1  5   10
#>      1  6    1
#>      2  0  190
#>      2  1  221
#>      2  2  174
#>      2  3  131
#>      2  4   95
#>      2  5    5
#>      3  0   47
#>      3  1  130
#>      3  2  138
#>      3  3   60
#>      3  4    5
#>      4  0    9
#>      4  1   52
#>      4  2   63
#>      4  3    2
#>      4  4    1
#>      5  1    5

Now lets merge them and see the output:

# Merge datasets
merged_data = merge_datasets(data1, data2)

print(merged_data)
#> This is an object of class 'mappoly.data'
#>     Ploidy level:                            6 
#>     No. individuals:                         315 
#>     No. markers:                             8585 
#>     Missing data:                            6.89%
#>     Redundant markers:                       12.95%
#> 
#>     This dataset contains sequence information.
#>     ----------
#>     No. of markers per dosage combination in both parents:
#>     P1 P2 freq
#>      0  1 2424
#>      0  2  384
#>      0  3  125
#>      0  4   40
#>      0  5    5
#>      1  0  969
#>      1  1  671
#>      1  2  439
#>      1  3  291
#>      1  4  102
#>      1  5   17
#>      1  6    1
#>      2  0  379
#>      2  1  498
#>      2  2  476
#>      2  3  330
#>      2  4  197
#>      2  5    9
#>      3  0  103
#>      3  1  269
#>      3  2  365
#>      3  3  185
#>      3  4   25
#>      4  0   16
#>      4  1  112
#>      4  2  130
#>      4  3   10
#>      4  4    1
#>      5  0    1
#>      5  1   10
#>      5  3    1

Notice that all markers of both datasets were merged successfully, which allows using just one (merged) dataset in the following steps of map construction.

Exploratory Analysis

Whole dataset

For didatic purposes, we will keep using the tetraploid potato array data (loaded using the examples above). We will construct a genetic map of the B2721 population, which is a cross between two tetraploid potato varieties: Atlantic and B1829-5. The population comprises 160 offsprings genotyped with the SolCAP Infinium 8303 potato array. The dataset also contains the genomic order of the SNPs from the Solanum tuberosum genome version 4.03. The genotype calling was performed with fitPoly R package using this pipeline. Another option would be to use ClusterCall and this pipeline.

Once the data is loaded, you can explore the dataset using the print function:

print(dat.dose.mpl, detailed = TRUE)
#> This is an object of class 'mappoly.data'
#>     Ploidy level:                            4 
#>     No. individuals:                         160 
#>     No. markers:                             3679 
#>     Missing data:                            3.18%
#>     Redundant markers:                       8.41%
#> 
#>     ----------
#>     No. markers per sequence:
#>        seq No.mrk
#>          1    391
#>         10    224
#>         11    252
#>         12    215
#>          2    269
#>          3    290
#>          4    387
#>          5    275
#>          6    394
#>          7    345
#>          8    263
#>          9    282
#>     ----------
#>     Markers with no sequence information: 92
#>     ----------
#>     No. of markers per dosage combination in both parents:
#>     P1 P2 freq
#>      0  1  332
#>      0  2  169
#>      0  3   24
#>      1  0  339
#>      1  1  307
#>      1  2  255
#>      1  3  123
#>      1  4   12
#>      2  0  166
#>      2  1  212
#>      2  2  336
#>      2  3  195
#>      2  4   49
#>      3  0   41
#>      3  1  110
#>      3  2  202
#>      3  3  202
#>      3  4  263
#>      4  1    9
#>      4  2   85
#>      4  3  248

This function outputs information about the dataset including the ploidy level, total number of individuals, total number of markers, number of informative markers, proportion of missing data and redundant markers. If detailed = TRUE, the function also outputs the number of markers in each sequence, if available, and the number of markers contained in all possible dosage combinations between both parents.

You can also explore the dataset visually using the plot function:

plot(dat.dose.mpl)

The output figure shows a bar plot on the left-hand side with the number of markers contained in each allele dosage combination between both parents. The right labels indicate allele dosages for Parent 1 and Parent 2, respectively. The upper-right plot contains the $\log_{10}(p-value)$ from $\chi^2$ tests for all markers, considering the expected segregation patterns under Mendelian inheritance. The lower-right plot contains a graphical representation of the allele dosages and missing data distribution for all markers and individuals. Finally, the bottom-right graphic shows the proportion of redundant markers in the dataset, when available.

Marker-specific

If you want to view a specific marker information, use the plot_mrk_info function. You should indicate your dataset object using the input.data argument, and the desired marker using the mrk argument. You can indicate the marker using its number or its name (string):

# For a dosage-based data analysis of marker 104
plot_mrk_info(input.data = dat.dose.mpl, mrk = 240)

# For a probability-based data analysis of the marker solcap_snp_c1_13686
plot_mrk_info(input.data = dat.prob.mpl, mrk = 'solcap_snp_c1_13686')

When applied to a dosage-based dataset, the function outputs a figure showing: marker name and position in the dataset, allele dosage in parents 1 and 2, proportion of missing data, p-value of the associated $\chi^2$ test for Mendelian segregation, sequence and position information (when available). The figure also contains a plot with the allele dosage and missing data distibution in the population.

When applied to a probability-baed dataset, the function outputs the probability threshold and a 3-dimensional plot containing the probability distribution for each allele dosage considering all individuals.

Filtering and Quality Control

In order to build a good genetic map, good quality data is desired to guarantee reliable estimates of recombination fractions, linkage phases and haplotypes. High proportions of messy data, such as unexpected segregation patterns and missing data, will reduce the reliability of these estimates. Furthermore, sequencing technologies are able to produce hundreds of thousands of markers, and a considerable proportion of redundant markers is expected. Markers that carry the same information are not informative and must be removed from the dataset, in order to reduce computational effort. MAPpoly handle some filtering functions that are described in the sections below.

Missing data filtering

MAPpoly is able to filter markers and/or individuals that exceeds a defined threshold for missing data. The function filter_missing does that and creates or updates your dataset according to its arguments’ values. The argument input.data should contain your dataset object, and you can choose to filter either by ‘marker’ or ‘individual’ using the type argument (string). You can also define the maximum proportion of missing data using the filter.thres argument (ranges from 0 to 1, i.e. a threshold of 0.2 will keep just markers or individuals with less than 20% of missing data). When TRUE (default), the inter argument plots markers or individuals vs. frequency of missing data.

# Filtering dataset by marker
dat.filt.mrk <- filter_missing(input.data = dat.dose.mpl, type = "marker", 
                               filter.thres = 0.2, inter = TRUE)

print(dat.filt.mrk)
#> This is an object of class 'mappoly.data'
#>     Ploidy level:                            4 
#>     No. individuals:                         160 
#>     No. markers:                             3640 
#>     Missing data:                            2.94%
#>     Redundant markers:                       8.5%
#> 
#>     This dataset contains sequence information.
#>     ----------
#>     No. of markers per dosage combination in both parents:
#>     P1 P2 freq
#>      0  1  332
#>      0  2  169
#>      0  3   24
#>      1  0  337
#>      1  1  299
#>      1  2  253
#>      1  3  121
#>      1  4   12
#>      2  0  165
#>      2  1  205
#>      2  2  332
#>      2  3  195
#>      2  4   48
#>      3  0   41
#>      3  1  107
#>      3  2  200
#>      3  3  198
#>      3  4  262
#>      4  1    9
#>      4  2   85
#>      4  3  246

# Filtering dataset by individual
dat.filt.ind <- filter_missing(input.data = dat.filt.mrk, type = "individual", 
                               filter.thres = 0.1, inter = TRUE)

print(dat.filt.ind)
#> This is an object of class 'mappoly.data'
#>     Ploidy level:                            4 
#>     No. individuals:                         153 
#>     No. markers:                             3640 
#>     Missing data:                            2.4%
#>     Redundant markers:                       8.5%
#> 
#>     This dataset contains sequence information.
#>     ----------
#>     No. of markers per dosage combination in both parents:
#>     P1 P2 freq
#>      0  1  332
#>      0  2  169
#>      0  3   24
#>      1  0  337
#>      1  1  299
#>      1  2  253
#>      1  3  121
#>      1  4   12
#>      2  0  165
#>      2  1  205
#>      2  2  332
#>      2  3  195
#>      2  4   48
#>      3  0   41
#>      3  1  107
#>      3  2  200
#>      3  3  198
#>      3  4  262
#>      4  1    9
#>      4  2   85
#>      4  3  246

dat.dose.filt <- dat.filt.ind

In this dataset, just 43 markers presented a proportion of missing data above the defined threshold, while 16 individuals exceeded the defined threshold. Then we will use the final filtered dataset during the rest of the analysis. Please notice that the function read_vcf also provides parameters to filter out markers depending on their average depth and missing data, as well as removes data points that do not reach a minimum pre-defined depth value. Check the read_vcf section for more information.

Segregation test

Another very important point to consider is the expected marker segregation pattern under Mendelian inheritance. Markers with messy or distorted segregation can produce unreliable estimates and may be removed (at least for a while) from the dataset. A good test for that is the chi-square ($\chi^2$), which basically matches expected genotype frequencies against observed frequencies and calculates the associated p-value. In order to define the p-value threshold for the tests, we will use the Bonferroni correction:

\[\alpha_{thres} = \frac{\alpha}{\#markers}\]

We will also assume that only random chromosome bivalent pairing occurs and there is no double reduction.

pval.bonf <- 0.05/dat.dose.filt$n.mrk
mrks.chi.filt <- filter_segregation(dat.dose.filt, chisq.pval.thres =  pval.bonf, inter = TRUE)
seq.init <- make_seq_mappoly(mrks.chi.filt)

Please notice that filter_segregation does not produce a filtered dataset; it just tells you which markers follow the expected Mendelian segregation pattern. To select these markers from your dataset, you may use the make_seq_mappoly function.

plot(seq.init)

It is worth to mention that all redundant markers identified during data reading step are stored in the main dataset. The redundant markers are automatically removed and can be added back once the maps are finished using the function update_map (described in details later during this tutorial).

Two-point analysis

Once the markers are filtered and selected, we need to compute the pairwise recombination fraction between all of them (two-point analysis). First, let us load the genotype counts ($\zeta_{\mbox{T}_{k},\mbox{T}_{k^{\prime}}}(l_{P}, l_{Q})$) defined in equation 20 in Mollinari and Garcia, 2019. This object is fundamental to perform the dimension reduction of the transition space.

counts <- cache_counts_twopt(input.seq = seq.init, cached = TRUE)

counts

When cached = TRUE, the function loads all counts from a internal file instead of performing all calculations again.

Next, the function est_pairwise_rf is used to estimate all the pairwise recombination fractions between markers in the sequence provided. Since the output object is too big to be fully displayed on the screen, MAPpoly shows a summary. Please notice that parallel computation is available and, if you want to use of that, you need to define the available number of cores in your machine and also guarantee that you have sufficient RAM memory for that. Remember that it is always good to leave one core available for the system, and be aware that this step will take a while to compute if you have few cores available.

# Defining number of cores
n.cores = parallel::detectCores() - 1

#(~ 9.5 minutes using 14 cores)
all.rf.pairwise <- est_pairwise_rf(input.seq = seq.init, 
                                   count.cache = counts, 
                                   n.clusters = n.cores)
#> INFO: Using  15  CPUs for calculation.
#> INFO: Done with 6619341  pairs of markers 
#> INFO: Calculation took: 604.809 seconds

all.rf.pairwise
#>   This is an object of class 'poly.est.two.pts.pairwise'
#>   -----------------------------------------------------
#>   No. markers:                             3639 
#>   No. estimated recombination fractions:   5871116 (88.7%)
#>   -----------------------------------------------------

To assess the recombination fraction between a particular pair of markers, say markers 93 and 98, we use the following syntax:

all.rf.pairwise$pairwise$`93-98`
#>        LOD_ph         rf       LOD_rf
#> 2-2  0.000000 0.05809366 3.8104528865
#> 1-2 -3.811249 0.49995416 0.0007963287
plot(all.rf.pairwise, first.mrk = 93, second.mrk = 98)

In this case, 93-98 represents the position of the markers in the filtered data set. The name of the rows have the form x-y, where x and y indicate how many homologous chromosomes share the same allelic variant in parents $P1$ and $P2$, respectively (see Mollinari and Garcia, 2019 for notation). The first column indicates the LOD Score in relation to the most likely linkage phase configuration. The second column shows the estimated recombination fraction for each configuration, and the third indicates the LOD Score comparing the likelihood under no linkage ($r = 0.5$) with the estimated recombination fraction (evidence of linkage).

Assembling recombination fraction and LOD Score matrices

Recombination fraction and LOD Score matrices are fundamental in genetic mapping. Later in this tutorial, we will use these matrices as the basic information to order markers and also to perform some diagnostics. To convert the two-point object into recombination fraction and LOD Score matrices, we need to assume thresholds for the three columns observed in the previous output. The arguments thresh.LOD.ph and thresh.LOD.rf set LOD Scores thresholds for the second most likely linkage phase configuration and recombination fraction. Here we assume thresh.LOD.ph = 0 and thresh.LOD.rf = 0, thus no matter how likely is the second best option, all the computed values will be considered. The argument thresh.rf = 0.5 indicates that the maximum accepted recombination fraction is 0.5. To convert these values in a recombination fraction matrix, we use the function rf_list_to_matrix.

mat <- rf_list_to_matrix(input.twopt = all.rf.pairwise, n.clusters = n.cores)
#> INFO: Using  15  CPUs.
#> INFO: Done with 6619341  pairs of markers 
#> INFO: Operation took: 32.024 seconds

Please notice that for big datasets, you may use the multi-core support to perform the conversions, using the parameter n.clusters to define the number of CPU’s. It is also possible to filter again for the parameters mentioned above, such as thresh.LOD.ph, thresh.LOD.rf and thresh.rf.

We can also plot this matrix using the reference genome order. For doing so, we use the function get_genomic_order to get the genomic order of the input sequence and use the resulting order to index the recombination fraction matrix. If the reference order is consistent with the marker order in this specific population, we should observe a block-diagonal matrix and within each sub-matrix, a monotonic pattern.

id<-get_genomic_order(seq.init)
plot(mat, ord = rownames(id), index = FALSE)

Figure 1: Example of CSV data set

As expected, we can observe the block-diagonal and monotonic patterns. In the previous case, the thresholds allowed to plot almost all points in the recombination fraction matrix. The empty cells in the matrix indicate markers where it is impossible to detect recombinant events using two-point estimates (e.g., between $1 \times 0$ and $0 \times 1$ marker). Yet, if the thresholds become more stringent (higher LODs and lower rf), the matrix becomes more sparse.

Assembling linkage groups

The function group_mappoly assign markers to linkage groups using the recombination fraction matrix obtained above. The user can provide an expected number of groups or run the interactive version of the function using inter = TRUE. Since in this data set we expect 12 linkage groups (basic chromosome number in potato), we use expected.groups = 12. If the data set provides the chromosome where the markers are located, the function allows comparing the groups obtained using the pairwise recombination fraction and the chromosome information provided using the comp.mat = TRUE.

grs <- group_mappoly(input.mat = mat,
                     expected.groups = 12,
                     comp.mat = TRUE, 
                     inter = TRUE)

grs

Here, we have the 3639 markers distributed in 12 linkage groups. The rows indicate linkage groups obtained using linkage information and the columns are the chromosomes in the reference genome. Notice the diagonal indicating the concordance between the two sources of information. Now, we can plot the resulting marker cluster analysis.

plot(grs)

Once the linkage groups are properly assembled, we use the function make_seq_mappoly to make marker sequences from the group analysis. We will assemble a list with 12 positions, each one containing the corresponding linkage group sequence. Also, we will use only markers allocated in the diagonal of the previous comparison matrix. Thus only markers that were assigned to a particular linkage group using both sources of information will be considered. We will also assemble a smaller two-point object using the functions make_pairs_mappoly and rf_snp_filter to facilitate further parallelization procedures.

LGS<-vector("list", 12)
for(j in 1:12){
    temp1 <- make_seq_mappoly(grs, j)
    temp2 <- get_genomic_order(temp1) # assembling sequence considering the genomic order
    lg.id <- as.numeric(names(which.max(table(temp2[,1]))))
    nm <- rownames(temp2)[which(temp2[,1] == lg.id)]
    temp3 <- make_seq_mappoly(dat.dose.filt, nm)
    tpt <- make_pairs_mappoly(all.rf.pairwise, input.seq = temp3)
    lgtemp <- rf_snp_filter(input.twopt = tpt)
    LGS[[lg.id]] <- list(lg = lgtemp, 
        tpt = make_pairs_mappoly(all.rf.pairwise, input.seq = lgtemp))
}
#> The following markers were also removed, but presented one or more recombination fractions below 0.15 : solcap_snp_c2_10820 solcap_snp_c1_3557 solcap_snp_c2_10798 solcap_snp_c1_3544 solcap_snp_c2_10771 solcap_snp_c1_3497 solcap_snp_c2_10566 solcap_snp_c1_3476 solcap_snp_c2_10547 
#> The following markers were also removed, but presented one or more recombination fractions below 0.15 : solcap_snp_c2_43408 solcap_snp_c2_15042 solcap_snp_c2_15065 solcap_snp_c1_4860 solcap_snp_c1_4881 solcap_snp_c2_47211 solcap_snp_c1_10593 solcap_snp_c2_35702 solcap_snp_c1_11459 
#> The following markers were also removed, but presented one or more recombination fractions below 0.15 : solcap_snp_c2_9203 solcap_snp_c2_9137 solcap_snp_c2_9172 solcap_snp_c1_14614

Now, let us print the recombination fraction matrices for each linkage group.

Estimating the map for a given order

In this section, we will use the marker order provided by the Solanum tuberosum genome version 4.03. The MDS ordering approach will be addressed later in this tutorial. The estimation of the genetic map for a given order involves the computation of recombination fraction between adjacent markers and also finding the linkage phase configuration of those markers in both parents. The core function to perform these tasks in MAPpoly is est_rf_hmm_sequential. This function uses the pairwise recombination fraction as the first source of information to sequentially position allelic variants in specific homologs. For situations where pairwise analysis has limited power, the algorithm relies on the likelihood obtained through a hidden Markov model (HMM) Mollinari and Garcia, 2019. Once all markers are positioned, the final map is reconstructed using the HMM multipoint algorithm.

Example of linkage phase configuration estimation using sequential search space reduction and HMM evaluation.

Several arguments are available to control the inclusion and phasing of the markers in the chain. The argument start.set defines the number of initial markers to be used in a exhaustive search for the most probable configuration. thres.twopt receives the threshold to whether when the linkage phases compared via two-point analysis should be considered, and the HMM analysis should not be used to infer the linkage phase (A. K. A. $\eta$ in Mollinari and Garcia, 2019). thres.hmm receives the threshold for keeping competing maps computed using HMM (if the two-point analysis was not enough) in the next round of marker insertion. extend.tail indicates the number of markers that should be considered at the end of the chain to insert a new marker. tol and tol.final receive the desired accuracy to estimate the sub-maps during the sequential phasing procedure and the desired accuracy in the final map. phase.number.limit receives the limit number of linkage phase configurations to be tested using HMM. info.tail is a logical argument: if TRUE it uses the complete informative tail (last markers in the chain that allow all homologous to be distinguished in the parents) of the chain to calculate the likelihood of the linkage phases.

First, as an example, let us estimate the map for linkage group 3. The values used for all arguments were obtained using a balance of processing speed and accuracy of the algorithm. As an exercise, it is interesting to try different values and check out the results. For now, let us stick with the following values (this step can take some hours to finish, depending on chosen parameters):

lg3.map <- est_rf_hmm_sequential(input.seq = LGS[[3]]$lg,
                                start.set = 10,
                                thres.twopt = 10, 
                                thres.hmm = 10,
                                extend.tail = 200,
                                info.tail = TRUE, 
                                twopt = LGS[[3]]$tpt,
                                sub.map.size.diff.limit = 10, 
                                phase.number.limit = 20,
                                reestimate.single.ph.configuration = TRUE,
                                tol = 10e-3,
                                tol.final = 10e-4)
#> Number of markers: 257
#> ═══════════════════════════════════════════════════════════════════════════════════ Initial sequence ══
#> 10 markers...
#> ●    Trying sequence: 1 2 3 4 5 6 7 8 9 10 :
#>        4 phase(s): . . . .
#> ═════════════════════════════════════════════════════════════════════════ Done with initial sequence ══
#> 11 /11 :(4.3%)   741 : 2 ph     (1/2)       -- tail: 10  |||| |●●|       |||| ||●● 
#> 12 /12 :(4.7%)   742 : 1 ph     (1/1)       -- tail: 11  ●●●● |●●●                     
#> 13 /13 :(5.1%)   744 : 4 ph     (1/4)       -- tail: 12  |||| ●|||  ...  |||| |●|| 
#> 14 /14 :(5.4%)   745 : 3 ph     (1/3)       -- tail: 13  ●|●● ●●||  ...  ●|●● ●|●| 
#> 15 /15 :(5.8%)   746 : 1 ph     (1/1)       -- tail: 14  ●|●● ●●||                     
#> 16 /16 :(6.2%)   747 : 1 ph     (1/1)       -- tail: 15  ●|●● ●●||                     
#> 17 /17 :(6.6%)   748 : 1 ph     (1/1)       -- tail: 16  |●|| ||●●                     
#> 18 /18 :(7%)     749 : 1 ph     (1/1)       -- tail: 17  |●|| ||●●                     
#> 19 /19 :(7.4%)   750 : 1 ph     (1/1)       -- tail: 18  |●|| ||●●                     
#> 20 /20 :(7.8%)   751 : 1 ph     (1/1)       -- tail: 19  ●|●● ●●||                     
#> 21 /21 :(8.2%)   752 : 1 ph     (1/1)       -- tail: 20  ●|●● ●●||                     
#> 22 /22 :(8.6%)   753 : 1 ph     (1/1)       -- tail: 21  |||| ||●●                     
#> 23 /23 :(8.9%)   754 : 4 ph     (1/4)       -- tail: 22  ●●|| ||●●  ...  ●|●| ||●● 
#> 24 /24 :(9.3%)   755 : 1 ph     (1/1)       -- tail: 23  ●|●● |●●●                     
#> 25 /25 :(9.7%)   758 : 1 ph     (1/1)       -- tail: 24  |||| ●|||                     
#> 26 /26 :(10.1%)  759 : 1 ph     (1/1)       -- tail: 25  ●●●● ||●●                     
#> 27 /27 :(10.5%)  760 : 1 ph     (1/1)       -- tail: 26  |||| |||●                     
#> 28 /28 :(10.9%)  761 : 16 ph    (1/16)      -- tail: 27  ●●|| ●|||  ...  ●●|| |●|| 
#> 29 /29 :(11.3%)  762 : 4 ph     (1/4)       -- tail: 28  ●●●| ●●●●  ...  ●●|● ●●●● 
#> 30 /30 :(11.7%)  763 : 2 ph     (1/2)       -- tail: 29  ●||| ||●|       ||●| ||●| 
#> 31 /31 :(12.1%)  764 : 1 ph     (1/1)       -- tail: 30  |||| |●||                     
#> 32 /32 :(12.5%)  765 : 1 ph     (1/1)       -- tail: 31  ●●●● ●|●●                     
#> 33 /33 :(12.8%)  766 : 1 ph     (1/1)       -- tail: 32  ||●● |●|●                     
#> 34 /34 :(13.2%)  767 : 2 ph     (1/2)       -- tail: 33  ||●| |●||       |||● |●|| 
#> 35 /35 :(13.6%)  768 : 1 ph     (1/1)       -- tail: 34  |||| |||●                     
#> 36 /36 :(14%)    769 : 1 ph     (1/1)       -- tail: 35  ●●|| ●|●|                     
#> 37 /37 :(14.4%)  770 : 1 ph     (1/1)       -- tail: 36  |●|| ●|||                     
#> 38 /38 :(14.8%)  771 : 1 ph     (1/1)       -- tail: 37  |●|| ●|||                     
#> 39 /39 :(15.2%)  772 : 1 ph     (1/1)       -- tail: 38  ●●●| ●●●●                     
#> 40 /40 :(15.6%)  773 : 1 ph     (1/1)       -- tail: 39  |||● ||||                     
#> 41 /41 :(16%)    774 : 1 ph     (1/1)       -- tail: 40  ●|●● |●●●                     
#> 42 /42 :(16.3%)  776 : 1 ph     (1/1)       -- tail: 41  ●|●| |●●|                     
#> 43 /43 :(16.7%)  777 : 1 ph     (1/1)       -- tail: 42  ●|●| |●●●                     
#> 44 /44 :(17.1%)  778 : 1 ph     (1/1)       -- tail: 43  |●|● ●||●                     
#> 45 /45 :(17.5%)  780 : 1 ph     (1/1)       -- tail: 44  ●|●● |●●●                     
#> 46 /46 :(17.9%)  782 : 1 ph     (1/1)       -- tail: 45  |||| |||●                     
#> 47 /47 :(18.3%)  783 : 1 ph     (1/1)       -- tail: 46  ●●●| ●●●●                     
#> 48 /48 :(18.7%)  784 : 1 ph     (1/1)       -- tail: 47  |●|| ●●||                     
#> 49 /49 :(19.1%)  785 : 1 ph     (1/1)       -- tail: 48  |||| |||●                     
#> 50 /50 :(19.5%)  786 : 1 ph     (1/1)       -- tail: 49  |||● ||||                     
#> 51 /51 :(19.8%)  787 : 1 ph     (1/1)       -- tail: 50  ●●●| ●●●●                     
#> 52 /52 :(20.2%)  788 : 1 ph     (1/1)       -- tail: 51  ●●●● |●●●                     
#> 53 /53 :(20.6%)  789 : 1 ph     (1/1)       -- tail: 52  ●●●| ●●●●                     
#> 54 /54 :(21%)    791 : 1 ph     (1/1)       -- tail: 53  |||● ||||                     
#> 55 /55 :(21.4%)  792 : 1 ph     (1/1)       -- tail: 54  |||| ||●|                     
#> 56 /56 :(21.8%)  794 : 1 ph     (1/1)       -- tail: 55  ●●●● ●●|●                     
#> 57 /57 :(22.2%)  796 : 1 ph     (1/1)       -- tail: 56  |||● ●|||                     
#> 58 /58 :(22.6%)  797 : 4 ph     (1/4)       -- tail: 57  ●||| ●|||  ...  |●|| ●||| 
#> 59 /59 :(23%)    798 : 1 ph     (1/1)       -- tail: 58  ||●| ●|||                     
#> 60 /60 :(23.3%)  799 : 1 ph     (1/1)       -- tail: 59  ||●| ●|||                     
#> 61 /61 :(23.7%)  800 : 1 ph     (1/1)       -- tail: 60  ●●|● |●●●                     
#> 62 /62 :(24.1%)  801 : 1 ph     (1/1)       -- tail: 61  ●●|● |●●●                     
#> 63 /63 :(24.5%)  802 : 1 ph     (1/1)       -- tail: 62  ||●| ●|||                     
#> 64 /64 :(24.9%)  803 : 3 ph     (1/3)       -- tail: 63  ||●| ●●||  ...  ||●| ●|●| 
#> 65 /65 :(25.3%)  804 : 1 ph     (1/1)       -- tail: 64  ||●| ●||●                     
#> 66 /66 :(25.7%)  805 : 1 ph     (1/1)       -- tail: 65  ||●| ●||●                     
#> 67 /67 :(26.1%)  806 : 1 ph     (1/1)       -- tail: 66  ●●|● |●●|                     
#> 68 /68 :(26.5%)  807 : 1 ph     (1/1)       -- tail: 67  |||● ||||                     
#> 69 /69 :(26.8%)  808 : 1 ph     (1/1)       -- tail: 68  ●●●| ●●●●                     
#> 70 /70 :(27.2%)  809 : 1 ph     (1/1)       -- tail: 69  ●●|| |●●|                     
#> 71 /71 :(27.6%)  810 : 1 ph     (1/1)       -- tail: 70  ||●| ●||●                     
#> 72 /72 :(28%)    811 : 1 ph     (1/1)       -- tail: 71  ●||| ||●|                     
#> 73 /73 :(28.4%)  812 : 1 ph     (1/1)       -- tail: 72  ●●|| |●●|                     
#> 74 /74 :(28.8%)  813 : 1 ph     (1/1)       -- tail: 73  ●●|● |●●|                     
#> 75 /75 :(29.2%)  814 : 1 ph     (1/1)       -- tail: 74  ●●●| ●●●●                     
#> 76 /76 :(29.6%)  815 : 1 ph     (1/1)       -- tail: 75  ●●|● |●●●                     
#> 77 /77 :(30%)    816 : 1 ph     (1/1)       -- tail: 76  ||●● ●|●●                     
#> 78 /78 :(30.4%)  817 : 1 ph     (1/1)       -- tail: 77  ||●| ●|||                     
#> 79 /79 :(30.7%)  818 : 1 ph     (1/1)       -- tail: 78  ●●|● |●●●                     
#> 80 /80 :(31.1%)  819 : 1 ph     (1/1)       -- tail: 79  ||●| ●|||                     
#> 81 /81 :(31.5%)  820 : 1 ph     (1/1)       -- tail: 80  |||| ||●●                     
#> 82 /82 :(31.9%)  821 : 1 ph     (1/1)       -- tail: 81  ||●● ●|||                     
#> 83 /83 :(32.3%)  822 : 1 ph     (1/1)       -- tail: 82  ●|●● ●●●|                     
#> 84 /84 :(32.7%)  823 : 1 ph     (1/1)       -- tail: 83  |||● ||||                     
#> 85 /85 :(33.1%)  824 : 1 ph     (1/1)       -- tail: 84  ||●| ●|||                     
#> 86 /86 :(33.5%)  825 : 1 ph     (1/1)       -- tail: 85  |●|| |●|●                     
#> 87 /87 :(33.9%)  826 : 1 ph     (1/1)       -- tail: 86  ||●| ●|||                     
#> 88 /88 :(34.2%)  827 : 1 ph     (1/1)       -- tail: 87  |●|| ||●●                     
#> 89 /89 :(34.6%)  828 : 1 ph     (1/1)       -- tail: 88  ||●| ●|||                     
#> 90 /90 :(35%)    829 : 1 ph     (1/1)       -- tail: 89  |●|| |●●●                     
#> 91 /91 :(35.4%)  831 : 1 ph     (1/1)       -- tail: 90  ||●| ●|||                     
#> 92 /92 :(35.8%)  832 : 1 ph     (1/1)       -- tail: 91  ●|●● ●|||                     
#> 93 /93 :(36.2%)  833 : 1 ph     (1/1)       -- tail: 92  ||●| ●|||                     
#> 94 /94 :(36.6%)  834 : 1 ph     (1/1)       -- tail: 93  ||●| ●|||                     
#> 95 /95 :(37%)    835 : 1 ph     (1/1)       -- tail: 94  ||●| ●|||                     
#> 96 /96 :(37.4%)  836 : 1 ph     (1/1)       -- tail: 95  ●●●● |●●|                     
#> 97 /97 :(37.7%)  837 : 1 ph     (1/1)       -- tail: 96  ||●● ||||                     
#> 98 /98 :(38.1%)  838 : 1 ph     (1/1)       -- tail: 97  ●|●● ●●|●                     
#> 99 /99 :(38.5%)  839 : 1 ph     (1/1)       -- tail: 98  ●●●● ●|●●                     
#> 100/100:(38.9%)  840 : 1 ph     (1/1)       -- tail: 99  ●●|| ●|●●                     
#> 101/101:(39.3%)  841 : 1 ph     (1/1)       -- tail: 100 |●|| ●|●●                     
#> 102/102:(39.7%)  842 : 1 ph     (1/1)       -- tail: 101 ●●|● ●●●●                     
#> 103/103:(40.1%)  843 : 1 ph     (1/1)       -- tail: 102 |●|| ●|●●                     
#> 104/104:(40.5%)  844 : 1 ph     (1/1)       -- tail: 103 ●●●● ●●|●                     
#> 105/105:(40.9%)  845 : 1 ph     (1/1)       -- tail: 104 |●●● ●●|●                     
#> 106/106:(41.2%)  846 : 1 ph     (1/1)       -- tail: 105 ●●●● |●●|                     
#> 107/107:(41.6%)  848 : 1 ph     (1/1)       -- tail: 106 ●●●● |●●|                     
#> 108/108:(42%)    849 : 1 ph     (1/1)       -- tail: 107 ||●| ||||                     
#> 109/109:(42.4%)  850 : 1 ph     (1/1)       -- tail: 108 |●|| ●|●●                     
#> 110/110:(42.8%)  851 : 1 ph     (1/1)       -- tail: 109 ●●|● ●●●●                     
#> 111/111:(43.2%)  852 : 1 ph     (1/1)       -- tail: 110 |||● ||||                     
#> 112/112:(43.6%)  853 : 1 ph     (1/1)       -- tail: 111 ||●| ||||                     
#> 113/113:(44%)    854 : 1 ph     (1/1)       -- tail: 112 ||●● ||||                     
#> 114/114:(44.4%)  855 : 1 ph     (1/1)       -- tail: 113 ●●|● ●●●●                     
#> 115/115:(44.7%)  856 : 1 ph     (1/1)       -- tail: 114 ●●|● ●●●●                     
#> 116/116:(45.1%)  857 : 1 ph     (1/1)       -- tail: 115 ●●|● |●●|                     
#> 117/117:(45.5%)  858 : 1 ph     (1/1)       -- tail: 116 ●●●● ●|●●                     
#> 118/118:(45.9%)  859 : 1 ph     (1/1)       -- tail: 117 |●|| |●●|                     
#> 119/119:(46.3%)  860 : 1 ph     (1/1)       -- tail: 118 |||● ||||                     
#> 120/120:(46.7%)  861 : 1 ph     (1/1)       -- tail: 119 ||●| ●||●                     
#> 121/121:(47.1%)  862 : 2 ph     (1/2)       -- tail: 120 ●|●| ●●●●       |●●| ●●●● 
#> 122/122:(47.5%)  863 : 1 ph     (1/1)       -- tail: 121 |●●| ●●●●                     
#> 123/123:(47.9%)  864 : 1 ph     (1/1)       -- tail: 122 ●|●● ●●●●                     
#> 124/124:(48.2%)  865 : 1 ph     (1/1)       -- tail: 123 ||●| ●●●●                     
#> 125/125:(48.6%)  866 : 1 ph     (1/1)       -- tail: 124 |●●| ●●●●                     
#> 126/126:(49%)    867 : 1 ph     (1/1)       -- tail: 125 |||● ||||                     
#> 127/127:(49.4%)  868 : 1 ph     (1/1)       -- tail: 126 |●|| |●●|                     
#> 128/128:(49.8%)  869 : 1 ph     (1/1)       -- tail: 127 ●●|● |●●|                     
#> 129/129:(50.2%)  870 : 1 ph     (1/1)       -- tail: 128 |●|| |●||                     
#> 130/130:(50.6%)  871 : 1 ph     (1/1)       -- tail: 129 |●●| ●●●●                     
#> 131/131:(51%)    872 : 1 ph     (1/1)       -- tail: 130 ●||| ||||                     
#> 132/132:(51.4%)  873 : 1 ph     (1/1)       -- tail: 131 |||| ||●|                     
#> 133/133:(51.8%)  874 : 1 ph     (1/1)       -- tail: 132 ||●| ●●|●                     
#> 134/134:(52.1%)  875 : 1 ph     (1/1)       -- tail: 133 |●●| ●|●●                     
#> 135/135:(52.5%)  876 : 1 ph     (1/1)       -- tail: 134 ●●●● ●|●●                     
#> 136/136:(52.9%)  877 : 1 ph     (1/1)       -- tail: 135 ||●| ●||●                     
#> 137/137:(53.3%)  878 : 1 ph     (1/1)       -- tail: 136 ||●| ●||●                     
#> 138/138:(53.7%)  879 : 1 ph     (1/1)       -- tail: 137 ●|●● ●||●                     
#> 139/139:(54.1%)  880 : 1 ph     (1/1)       -- tail: 138 |●●| ●●●●                     
#> 140/140:(54.5%)  881 : 1 ph     (1/1)       -- tail: 139 ||●| ●|●●                     
#> 141/141:(54.9%)  882 : 1 ph     (1/1)       -- tail: 140 |●|| |●●|                     
#> 142/142:(55.3%)  883 : 1 ph     (1/1)       -- tail: 141 |●|| |●●|                     
#> 143/143:(55.6%)  884 : 1 ph     (1/1)       -- tail: 142 ●|●● ●||●                     
#> 144/144:(56%)    885 : 1 ph     (1/1)       -- tail: 143 |●|| |●●|                     
#> 145/145:(56.4%)  886 : 1 ph     (1/1)       -- tail: 144 |||| |●||                     
#> 146/146:(56.8%)  887 : 1 ph     (1/1)       -- tail: 145 |●|| |●●|                     
#> 147/147:(57.2%)  888 : 1 ph     (1/1)       -- tail: 146 ●●|● |●●|                     
#> 148/148:(57.6%)  889 : 1 ph     (1/1)       -- tail: 147 ||●| ●||●                     
#> 149/149:(58%)    890 : 1 ph     (1/1)       -- tail: 148 |●|| |●●|                     
#> 150/150:(58.4%)  891 : 1 ph     (1/1)       -- tail: 149 ●|●● ●||●                     
#> 151/151:(58.8%)  892 : 1 ph     (1/1)       -- tail: 150 ●|●● ●||●                     
#> 152/152:(59.1%)  893 : 1 ph     (1/1)       -- tail: 151 ●●●● ●|●●                     
#> 153/153:(59.5%)  894 : 1 ph     (1/1)       -- tail: 152 ●|●● ●●|●                     
#> 154/154:(59.9%)  895 : 1 ph     (1/1)       -- tail: 153 |||| ●|●|                     
#> 155/155:(60.3%)  896 : 1 ph     (1/1)       -- tail: 154 ●|●● ●|●●                     
#> 156/156:(60.7%)  897 : 2 ph     (1/2)       -- tail: 155 ●●|| |●●|       |●|● |●●| 
#> 157/157:(61.1%)  898 : 1 ph     (1/1)       -- tail: 156 ||●| ●||●                     
#> 158/158:(61.5%)  899 : 1 ph     (1/1)       -- tail: 157 ||●| ●||●                     
#> 159/159:(61.9%)  900 : 1 ph     (1/1)       -- tail: 158 |||| ||●|                     
#> 160/160:(62.3%)  901 : 1 ph     (1/1)       -- tail: 159 ●|●● ●||●                     
#> 161/161:(62.6%)  902 : 1 ph     (1/1)       -- tail: 160 ●●●| ●●●●                     
#> 162/162:(63%)    903 : 1 ph     (1/1)       -- tail: 161 ●●|| ||●|                     
#> 163/163:(63.4%)  904 : 1 ph     (1/1)       -- tail: 162 |●|| ||●|                     
#> 164/164:(63.8%)  905 : 1 ph     (1/1)       -- tail: 163 |●|| ||||                     
#> 165/165:(64.2%)  906 : 1 ph     (1/1)       -- tail: 164 ●●|| |●||                     
#> 166/166:(64.6%)  907 : 1 ph     (1/1)       -- tail: 165 |||● ||||                     
#> 167/167:(65%)    910 : 1 ph     (1/1)       -- tail: 166 ●●|| |●●|                     
#> 168/168:(65.4%)  911 : 1 ph     (1/1)       -- tail: 167 |||● ||||                     
#> 169/169:(65.8%)  912 : 1 ph     (1/1)       -- tail: 168 |||| |||●                     
#> 170/170:(66.1%)  913 : 1 ph     (1/1)       -- tail: 169 |||● ||||                     
#> 171/171:(66.5%)  914 : 3 ph     (1/3)       -- tail: 170 ●||● ●|●|  ...  ●||● ●||● 
#> 172/172:(66.9%)  915 : 1 ph     (1/1)       -- tail: 171 |||| |||●                     
#> 173/173:(67.3%)  916 : 1 ph     (1/1)       -- tail: 172 |||● |||●                     
#> 174/174:(67.7%)  917 : 1 ph     (1/1)       -- tail: 173 ●||● ||●●                     
#> 175/175:(68.1%)  918 : 1 ph     (1/1)       -- tail: 174 ●●●● ●●|●                     
#> 176/176:(68.5%)  919 : 1 ph     (1/1)       -- tail: 175 ●||● ||●●                     
#> 177/177:(68.9%)  920 : 1 ph     (1/1)       -- tail: 176 ●||| |||●                     
#> 178/178:(69.3%)  921 : 1 ph     (1/1)       -- tail: 177 ||●| ●|||                     
#> 179/179:(69.6%)  922 : 2 ph     (1/2)       -- tail: 178 ●●|| |●●|       |●|● |●●| 
#> 179: not included (map extension)
#> 179/180:(70%)    923 : 1 ph     (1/1)       -- tail: 178 |||● |●●●                     
#> 180/181:(70.4%)  924 : 2 ph     (1/2)       -- tail: 179 |●●| ●|●|       |●●| ●||● 
#> 181: not included (map extension)
#> 180/182:(70.8%)  925 : 1 ph     (1/1)       -- tail: 179 |||| |||●                     
#> 181/183:(71.2%)  926 : 1 ph     (1/1)       -- tail: 180 |||| |●||                     
#> 182/184:(71.6%)  927 : 1 ph     (1/1)       -- tail: 181 ||●| ●●||                     
#> 183/185:(72%)    928 : 1 ph     (1/1)       -- tail: 182 |||| |●||                     
#> 184/186:(72.4%)  929 : 1 ph     (1/1)       -- tail: 183 |||| |●●|                     
#> 185/187:(72.8%)  930 : 1 ph     (1/1)       -- tail: 184 ●||| ||||                     
#> 186/188:(73.2%)  931 : 1 ph     (1/1)       -- tail: 185 |||| |||●                     
#> 187/189:(73.5%)  932 : 1 ph     (1/1)       -- tail: 186 ●●●● ●●●|                     
#> 188/190:(73.9%)  933 : 1 ph     (1/1)       -- tail: 187 ●||● |●●●                     
#> 189/191:(74.3%)  934 : 1 ph     (1/1)       -- tail: 188 ●||● |●|●                     
#> 190/192:(74.7%)  935 : 1 ph     (1/1)       -- tail: 189 ●●●● ●●●|                     
#> 191/193:(75.1%)  937 : 1 ph     (1/1)       -- tail: 190 ●||| ||||                     
#> 192/194:(75.5%)  938 : 1 ph     (1/1)       -- tail: 191 ●●●● ●●|●                     
#> 193/195:(75.9%)  939 : 1 ph     (1/1)       -- tail: 192 ||●● ●|●|                     
#> 194/196:(76.3%)  940 : 1 ph     (1/1)       -- tail: 193 ●●|● |●|●                     
#> 195/197:(76.7%)  941 : 1 ph     (1/1)       -- tail: 194 |||| ||●|                     
#> 196/198:(77%)    942 : 1 ph     (1/1)       -- tail: 195 |||| ||●|                     
#> 197/199:(77.4%)  943 : 1 ph     (1/1)       -- tail: 196 ●●●● ●●|●                     
#> 198/200:(77.8%)  944 : 1 ph     (1/1)       -- tail: 197 |||| ||●|                     
#> 199/201:(78.2%)  946 : 1 ph     (1/1)       -- tail: 198 |||| ||●|                     
#> 200/202:(78.6%)  948 : 1 ph     (1/1)       -- tail: 199 ||●| ●|●|                     
#> 201/203:(79%)    949 : 1 ph     (1/1)       -- tail: 200 |●●● ●●|●                     
#> 202/204:(79.4%)  950 : 1 ph     (1/1)       -- tail: 200 ||●| ●|||                     
#> 203/205:(79.8%)  951 : 1 ph     (1/1)       -- tail: 200 ●●|● |●|●                     
#> 204/206:(80.2%)  952 : 2 ph     (1/2)       -- tail: 200 ●||● |●●|       ●||● |●|● 
#> 206: not included (map extension)
#> 204/207:(80.5%)  953 : 1 ph     (1/1)       -- tail: 200 ●●●● ●●|●                     
#> 205/208:(80.9%)  954 : 1 ph     (1/1)       -- tail: 200 ●●●● ●●|●                     
#> 206/209:(81.3%)  956 : 1 ph     (1/1)       -- tail: 200 ●●●| ●●●●                     
#> 207/210:(81.7%)  957 : 1 ph     (1/1)       -- tail: 200 ●●|● ●●●●                     
#> 208/211:(82.1%)  959 : 1 ph     (1/1)       -- tail: 200 ●||● |●||                     
#> 209/212:(82.5%)  960 : 1 ph     (1/1)       -- tail: 200 ●●●● ●●●|                     
#> 210/213:(82.9%)  961 : 1 ph     (1/1)       -- tail: 200 |●|| ||||                     
#> 211/214:(83.3%)  962 : 1 ph     (1/1)       -- tail: 200 |||● ||||                     
#> 212/215:(83.7%)  963 : 1 ph     (1/1)       -- tail: 200 |||| |||●                     
#> 213/216:(84%)    964 : 1 ph     (1/1)       -- tail: 200 |||| |||●                     
#> 214/217:(84.4%)  965 : 1 ph     (1/1)       -- tail: 200 |||● ||||                     
#> 215/218:(84.8%)  966 : 1 ph     (1/1)       -- tail: 200 ●||● ●●●|                     
#> 216/219:(85.2%)  967 : 1 ph     (1/1)       -- tail: 200 ●||● ●●●|                     
#> 217/220:(85.6%)  968 : 1 ph     (1/1)       -- tail: 200 ●||● ●●●|                     
#> 218/221:(86%)    969 : 1 ph     (1/1)       -- tail: 200 |●●| |||●                     
#> 219/222:(86.4%)  970 : 1 ph     (1/1)       -- tail: 200 |||● ||||                     
#> 220/223:(86.8%)  971 : 2 ph     (1/2)       -- tail: 200 ●●●| |||●       |●●● |||● 
#> 221/224:(87.2%)  972 : 1 ph     (1/1)       -- tail: 200 |●●| |||●                     
#> 222/225:(87.5%)  973 : 1 ph     (1/1)       -- tail: 200 ●●●● ||●●                     
#> 223/226:(87.9%)  974 : 1 ph     (1/1)       -- tail: 200 ●||● ●●●|                     
#> 224/227:(88.3%)  976 : 1 ph     (1/1)       -- tail: 200 |●●● ●●●●                     
#> 225/228:(88.7%)  977 : 1 ph     (1/1)       -- tail: 200 |●●| ||||                     
#> 226/229:(89.1%)  978 : 1 ph     (1/1)       -- tail: 200 |||| ||●|                     
#> 227/230:(89.5%)  979 : 1 ph     (1/1)       -- tail: 200 ●●●● ●●●|                     
#> 228/231:(89.9%)  980 : 1 ph     (1/1)       -- tail: 200 ●●|● ●●●●                     
#> 229/232:(90.3%)  981 : 1 ph     (1/1)       -- tail: 200 ||●● ●●●●                     
#> 230/233:(90.7%)  982 : 1 ph     (1/1)       -- tail: 200 |||| |||●                     
#> 231/234:(91.1%)  983 : 1 ph     (1/1)       -- tail: 200 ●●●● ●●●|                     
#> 232/235:(91.4%)  984 : 1 ph     (1/1)       -- tail: 200 |||| |||●                     
#> 233/236:(91.8%)  985 : 1 ph     (1/1)       -- tail: 200 ●●●● ●●●|                     
#> 234/237:(92.2%)  986 : 1 ph     (1/1)       -- tail: 200 ||●| ||||                     
#> 235/238:(92.6%)  987 : 1 ph     (1/1)       -- tail: 200 ●●|| ||||                     
#> 236/239:(93%)    988 : 4 ph     (1/4)       -- tail: 200 ●●|| ●|||  ...  ●●|| |●|| 
#> 237/240:(93.4%)  989 : 1 ph     (1/1)       -- tail: 200 |||● |||●                     
#> 238/241:(93.8%)  990 : 3 ph     (1/3)       -- tail: 200 ●||● ●●||  ...  ●||● ●|●| 
#> 241: not included (map extension)
#> 238/242:(94.2%)  991 : 1 ph     (1/1)       -- tail: 200 ●●|● ●●●●                     
#> 239/243:(94.6%)  992 : 1 ph     (1/1)       -- tail: 200 |||| |||●                     
#> 240/244:(94.9%)  993 : 1 ph     (1/1)       -- tail: 200 ●●|| ||||                     
#> 241/245:(95.3%)  994 : 1 ph     (1/1)       -- tail: 200 ●●|| ||●|                     
#> 242/246:(95.7%)  995 : 1 ph     (1/1)       -- tail: 200 ●●|| ●●||                     
#> 243/247:(96.1%)  996 : 1 ph     (1/1)       -- tail: 200 ●●●● ●●●|                     
#> 244/248:(96.5%)  997 : 1 ph     (1/1)       -- tail: 200 ||●● ||||                     
#> 245/249:(96.9%)  998 : 1 ph     (1/1)       -- tail: 200 ●●|| ●●|●                     
#> 246/250:(97.3%)  999 : 1 ph     (1/1)       -- tail: 200 ●●|| ●●|●                     
#> 247/251:(97.7%)  1000: 1 ph     (1/1)       -- tail: 200 ●●|| ||||                     
#> 248/252:(98.1%)  1001: 1 ph     (1/1)       -- tail: 200 ||●● ●|●●                     
#> 249/253:(98.4%)  1002: 2 ph     (1/2)       -- tail: 200 |||| ●||●       |||| |●●| 
#> 250/254:(98.8%)  1003: 1 ph     (1/1)       -- tail: 200 |||| ●||●                     
#> 251/255:(99.2%)  1005: 1 ph     (1/1)       -- tail: 200 ●●|| |●||                     
#> 252/256:(99.6%)  1006: 1 ph     (1/1)       -- tail: 200 ||●● ●|●●                     
#> 253/257:(100%)   1009: 1 ph     (1/1)       -- tail: 200 |||| ||●●
#> ═════════════════════════════════════════════════════════ Reestimating final recombination fractions ══
#> Markers in the initial sequence: 257
#> Mapped markers                  : 253 (98.4%)
#> ═══════════════════════════════════════════════════════════════════════════════════════════════════════

Now, use the functions print and plot to view the results:

print(lg3.map)
#> This is an object of class 'mappoly.map'
#>     Ploidy level:     4 
#>     No. individuals:  153 
#>     No. markers:  253 
#>     No. linkage phases:   1 
#> 
#>     ---------------------------------------------
#>     Number of linkage phase configurations:  1
#>     ---------------------------------------------
#>     Linkage phase configuration:  1
#>        map length:    162.38
#>        log-likelihood:    -2179.27
#>        LOD:       0
#>     ~~~~~~~~~~~~~~~~~~
plot(lg3.map)

Colored rectangles (red and blue) indicate the presence of the allelic variant in each one of the four homologous in both parents, $P_1$ and $P_2$.

Reestimating the map considering genotyping errors

Though current technologies enabled the genotyping of thousands of SNPs, they are quite prone to genotyping errors, especially in polyploid species. One way to address this problem is to associate a probability distribution to each one of the markers and allow the HMM to update their probability. This procedure can be applied using either the probability distribution provided by the genotype calling software (loaded in MAPpoly using the function read_geno_prob) or assuming a global genotype error. For a detailed explanation of this procedure, please see Mollinari and Garcia, 2019. Briefly, the use of a prior information will update the genotype of the markers based on a global chromosome structure. In this tutorial, since we are analyzing the dosage data with no probability distribution associated, the second approch will be used.

lg3.map.error <- est_full_hmm_with_global_error(input.map = lg3.map, error = 0.05)
#> 
#>  ----------------------------------------------
#>  INFO: running HMM using full transition space:
#>        this operation may take a while.
#> -----------------------------------------------

plot(lg3.map.error)

Notice that a global genotyping error of 5% was used, and the resulting map was smaller than the previous one. Also, some markers were “attracted” and some markers were “repealed” as a result of the smaller confidence used for each marker genotype.

Reinserting redundant markers

As mentioned before, redundant markers are automatically removed during the analysis to reduce computational calculations. Besides that, one may want to see the genetic linkage maps considering all markers in the full dataset, including the redundant ones. The addition of redundant markers to a map can be done by just calling the function update_map. Here we will show the addition in the previous map:

lg3.map.updated = update_map(lg3.map)
lg3.map
#> This is an object of class 'mappoly.map'
#>     Ploidy level:     4 
#>     No. individuals:  153 
#>     No. markers:  253 
#>     No. linkage phases:   1 
#> 
#>     ---------------------------------------------
#>     Number of linkage phase configurations:  1
#>     ---------------------------------------------
#>     Linkage phase configuration:  1
#>        map length:    162.38
#>        log-likelihood:    -2179.27
#>        LOD:       0
#>     ~~~~~~~~~~~~~~~~~~
lg3.map.updated
#> This is an object of class 'mappoly.map'
#>     Ploidy level:     4 
#>     No. individuals:  153 
#>     No. markers:  280 
#>     No. linkage phases:   1 
#> 
#>     ---------------------------------------------
#>     Number of linkage phase configurations:  1
#>     ---------------------------------------------
#>     Linkage phase configuration:  1
#>        map length:    162.38
#>        log-likelihood:    -2179.27
#>        LOD:       0
#>     ~~~~~~~~~~~~~~~~~~

As can be seen, both maps are identical except for the number of markers.

Ordering markers using MDS and reestimating the map

So far the map was reestimated using the genomic order. In real situations, unless a genomic information is provided, the markers need to be ordered using an optimization technique. Here, we use the MDS (multidimensional scaling) algorithm, proposed in the context of genetic mapping by Preedy and Hackett (2016). The MDS algorithm requires a recombination fraction matrix, which will be transformed in distances using a mapping function (in this case the Haldane’s mapping function). First, let us gather the pairwise recombination fractions for all three linkage groups:

mt <- lapply(LGS, function(x) rf_list_to_matrix(x$tpt))

Now, for each matrix contained in the object mt, we use the MDS algorithm:

mds.ord <- lapply(mt, mds_mappoly)

Usually at this point, the user can make use of diagnostic plots to remove markers that are disturbing the ordering procedure. Here we didn’t use that procedure, but we encourage the user to check the example in ?mds_mappoly. Now, let us compare the estimated and the genomic orders (feel free to run the last commented line and see interactive plots):

LGS.mds<-vector("list", 12)
for(j in 1:12){
  lgtemp <- make_seq_mappoly(mds.ord[[j]])
  LGS.mds[[j]] <- list(lg = lgtemp, 
      tpt = make_pairs_mappoly(all.rf.pairwise, input.seq = lgtemp))
}

geno.vs.mds <- NULL
for(i in 1:length(LGS.mds)){
  geno.vs.mds<-rbind(geno.vs.mds,
                     data.frame(mrk.names = LGS.mds[[i]]$lg$seq.mrk.names,
                                mds.pos = seq_along(LGS.mds[[i]]$lg$seq.mrk.names),
                                genomic.pos = order(LGS.mds[[i]]$lg$sequence.pos),
                                LG = paste0("LG_", i)))
}

require(ggplot2)
p<-ggplot(geno.vs.mds, aes(genomic.pos, mds.pos)) +
  geom_point(alpha = 1/5, aes(colour = LG)) +
  facet_wrap(~LG) +  xlab("Genome Order") + ylab("Map Order")
p

#plotly::ggplotly(p)

Although several local inconsistencies occurred, the global diagonal pattern indicates a consistent order for all linkage groups using both approaches. Now, let us build the genetic map of linkage group 3 using the MDS order (remember that this can take a while to finish):

lg3.map.mds <- est_rf_hmm_sequential(input.seq = LGS.mds[[3]]$lg,
                                start.set = 10,
                                thres.twopt = 10, 
                                thres.hmm = 10,
                                extend.tail = 200,
                                info.tail = TRUE, 
                                twopt = LGS.mds[[3]]$tpt,
                                sub.map.size.diff.limit = 10, 
                                phase.number.limit = 20,
                                reestimate.single.ph.configuration = TRUE,
                                tol = 10e-3,
                                tol.final = 10e-4)
#> Number of markers: 257
#> ═══════════════════════════════════════════════════════════════════════════════════ Initial sequence ══
#> 10 markers...
#> ●    Trying sequence: 1 2 3 4 5 6 7 8 9 10 :
#>        16 phase(s): . . . . . . . . . . . . . . . . 
#> ●    Trying sequence: 2 3 4 5 6 7 8 9 10 11 :
#>        16 phase(s): . . . . . . . . . . . . . . . . 
#> ●    Trying sequence: 3 4 5 6 7 8 9 10 11 12 :
#>        16 phase(s): . . . . . . . . . . . . . . . . 
#> ●    Trying sequence: 4 5 6 7 8 9 10 11 12 13 :
#>        24 phase(s): . . . . . . . . . . . . . . . . . . . . . . . . 
#> ●    Trying sequence: 5 6 7 8 9 10 11 12 13 14 :
#>        6 phase(s): . . . . . . 
#> ●    Trying sequence: 6 7 8 9 10 11 12 13 14 15 :
#>        6 phase(s): . . . . . .
#> ═════════════════════════════════════════════════════════════════════════ Done with initial sequence ══
#> 11 /16 :(6.2%)   989 : 12 ph    (1/12)      -- tail: 10  ●||| ●|||  ...  ●||| |●|| 
#> 12 /17 :(6.6%)   998 : 1 ph     (1/1)       -- tail: 11  ||●● |●●●                     
#> 13 /18 :(7%)     999 : 1 ph     (1/1)       -- tail: 12  ||●● |●●●                     
#> 14 /19 :(7.4%)   976 : 2 ph     (1/2)       -- tail: 13  ●●●| ●●●●       ●●|● ●●●● 
#> 15 /20 :(7.8%)   988 : 3 ph     (1/3)       -- tail: 14  ||●● |●||  ...  ||●● ||●| 
#> 16 /21 :(8.2%)   970 : 1 ph     (1/1)       -- tail: 15  |●|| ||||                     
#> 17 /22 :(8.6%)   980 : 1 ph     (1/1)       -- tail: 16  |●●● ●●●●                     
#> 18 /23 :(8.9%)   977 : 2 ph     (1/2)       -- tail: 17  ●|●| ||||       ●||● |||| 
#> 19 /24 :(9.3%)   995 : 1 ph     (1/1)       -- tail: 18  ||●● ||●●                     
#> 20 /25 :(9.7%)   978 : 1 ph     (1/1)       -- tail: 19  |||| ●|||                     
#> 21 /26 :(10.1%)  965 : 1 ph     (1/1)       -- tail: 20  |●|| ||||                     
#> 22 /27 :(10.5%)  996 : 1 ph     (1/1)       -- tail: 21  ●●●● ●|●●                     
#> 23 /28 :(10.9%)  971 : 1 ph     (1/1)       -- tail: 22  ●●●| |●||                     
#> 24 /29 :(11.3%)  992 : 1 ph     (1/1)       -- tail: 23  |||| |●||                     
#> 25 /30 :(11.7%)  984 : 1 ph     (1/1)       -- tail: 24  |||| |●||                     
#> 26 /31 :(12.1%)  982 : 1 ph     (1/1)       -- tail: 25  |||| |●||                     
#> 27 /32 :(12.5%)  985 : 1 ph     (1/1)       -- tail: 26  ●●●● ●|●●                     
#> 28 /33 :(12.8%)  983 : 1 ph     (1/1)       -- tail: 27  ●●●● ●|●●                     
#> 29 /34 :(13.2%)  966 : 1 ph     (1/1)       -- tail: 28  |●|● ●|●●                     
#> 30 /35 :(13.6%)  974 : 1 ph     (1/1)       -- tail: 29  |●|● ●|●●                     
#> 31 /36 :(14%)    968 : 1 ph     (1/1)       -- tail: 30  |●|● ●|●●                     
#> 32 /37 :(14.4%)  969 : 1 ph     (1/1)       -- tail: 31  ●|●| |●||                     
#> 33 /38 :(14.8%)  967 : 1 ph     (1/1)       -- tail: 32  |●|● ●|●●                     
#> 34 /39 :(15.2%)  979 : 1 ph     (1/1)       -- tail: 33  ●●●● ●|●●                     
#> 35 /40 :(15.6%)  972 : 1 ph     (1/1)       -- tail: 34  ●|●| |●||                     
#> 36 /41 :(16%)    962 : 1 ph     (1/1)       -- tail: 35  |●|| ||||                     
#> 37 /42 :(16.3%)  973 : 1 ph     (1/1)       -- tail: 36  ●●●● ●●||                     
#> 38 /43 :(16.7%)  957 : 1 ph     (1/1)       -- tail: 37  |●●● ●●●●                     
#> 39 /44 :(17.1%)  964 : 1 ph     (1/1)       -- tail: 38  |||| |●||                     
#> 40 /45 :(17.5%)  956 : 1 ph     (1/1)       -- tail: 39  ●|●● ●●●●                     
#> 41 /46 :(17.9%)  990 : 3 ph     (1/3)       -- tail: 40  |●|● ●|●|  ...  |●|● ●||● 
#> 46: not included (map extension)
#> 41 /47 :(18.3%)  960 : 1 ph     (1/1)       -- tail: 40  ●●●● ●|●●                     
#> 42 /48 :(18.7%)  963 : 1 ph     (1/1)       -- tail: 41  |||| |●||                     
#> 43 /49 :(19.1%)  940 : 6 ph     (1/6)       -- tail: 42  |●●● ●●||  ...  |●●● ●|●| 
#> 44 /50 :(19.5%)  951 : 1 ph     (1/1)       -- tail: 43  |●●● |●|●                     
#> 45 /51 :(19.8%)  948 : 1 ph     (1/1)       -- tail: 44  ●||| ●|●|                     
#> 46 /52 :(20.2%)  950 : 2 ph     (1/2)       -- tail: 45  ●||| ●|||       ●||| ||●| 
#> 47 /53 :(20.6%)  939 : 1 ph     (1/1)       -- tail: 46  ●●|| ●|●|                     
#> 48 /54 :(21%)    959 : 1 ph     (1/1)       -- tail: 47  |●|● |||●                     
#> 49 /55 :(21.4%)  937 : 1 ph     (1/1)       -- tail: 48  |||● ||||                     
#> 50 /56 :(21.8%)  934 : 1 ph     (1/1)       -- tail: 49  |●|● |●|●                     
#> 51 /57 :(22.2%)  921 : 1 ph     (1/1)       -- tail: 50  ●||| ||●|                     
#> 52 /58 :(22.6%)  927 : 1 ph     (1/1)       -- tail: 51  ●||| ||●●                     
#> 53 /59 :(23%)    949 : 4 ph     (1/4)       -- tail: 52  ●●●| ●●●|  ...  ●●●| ●●|● 
#> 54 /60 :(23.3%)  954 : 1 ph     (1/1)       -- tail: 53  ●●●● |●●●                     
#> 55 /61 :(23.7%)  933 : 1 ph     (1/1)       -- tail: 54  |●|● ●●|●                     
#> 56 /62 :(24.1%)  942 : 1 ph     (1/1)       -- tail: 55  |||| ●|||                     
#> 57 /63 :(24.5%)  944 : 1 ph     (1/1)       -- tail: 56  |||| ●|||                     
#> 58 /64 :(24.9%)  943 : 1 ph     (1/1)       -- tail: 57  ●●●● |●●●                     
#> 59 /65 :(25.3%)  946 : 1 ph     (1/1)       -- tail: 58  |||| ●|||                     
#> 60 /66 :(25.7%)  953 : 1 ph     (1/1)       -- tail: 59  ●●●● |●●●                     
#> 61 /67 :(26.1%)  930 : 1 ph     (1/1)       -- tail: 60  |||● ||||                     
#> 62 /68 :(26.5%)  941 : 1 ph     (1/1)       -- tail: 61  |||| ●|||                     
#> 63 /69 :(26.8%)  926 : 1 ph     (1/1)       -- tail: 62  |||| |||●                     
#> 64 /70 :(27.2%)  928 : 1 ph     (1/1)       -- tail: 63  |||| |||●                     
#> 65 /71 :(27.6%)  935 : 1 ph     (1/1)       -- tail: 64  ●●●● ●|●●                     
#> 66 /72 :(28%)    923 : 1 ph     (1/1)       -- tail: 65  |●|| ●●|●                     
#> 67 /73 :(28.4%)  938 : 1 ph     (1/1)       -- tail: 66  ●●●● |●●●                     
#> 68 /74 :(28.8%)  952 : 2 ph     (1/2)       -- tail: 67  |●|● ●||●       |●|● |●|● 
#> 74: not included (map extension)
#> 68 /75 :(29.2%)  932 : 1 ph     (1/1)       -- tail: 67  ●●●● ●|●●                     
#> 69 /76 :(29.6%)  931 : 1 ph     (1/1)       -- tail: 68  |||| |●||                     
#> 70 /77 :(30%)    919 : 1 ph     (1/1)       -- tail: 69  |●|● ●●||                     
#> 71 /78 :(30.4%)  920 : 1 ph     (1/1)       -- tail: 70  |||● |●||                     
#> 72 /79 :(30.7%)  917 : 1 ph     (1/1)       -- tail: 71  |●|● ●●||                     
#> 73 /80 :(31.1%)  925 : 1 ph     (1/1)       -- tail: 72  |||| |●||                     
#> 74 /81 :(31.5%)  914 : 1 ph     (1/1)       -- tail: 73  |●|● ●●||                     
#> 75 /82 :(31.9%)  929 : 1 ph     (1/1)       -- tail: 74  |||| ●||●                     
#> 76 /83 :(32.3%)  916 : 1 ph     (1/1)       -- tail: 75  |●|| |●||                     
#> 77 /84 :(32.7%)  913 : 1 ph     (1/1)       -- tail: 76  |●|| ||||                     
#> 78 /85 :(33.1%)  924 : 2 ph     (1/2)       -- tail: 77  ●|●| ●|●|       ●|●| |●●| 
#> 85: not included (map extension)
#> 78 /86 :(33.5%)  903 : 1 ph     (1/1)       -- tail: 77  ||●● ●|||                     
#> 79 /87 :(33.9%)  910 : 1 ph     (1/1)       -- tail: 78  ||●● ●||●                     
#> 80 /88 :(34.2%)  911 : 1 ph     (1/1)       -- tail: 79  |●|| ||||                     
#> 81 /89 :(34.6%)  918 : 1 ph     (1/1)       -- tail: 80  ●●●● |●●●                     
#> 82 /90 :(35%)    873 : 1 ph     (1/1)       -- tail: 81  |||| ●|||                     
#> 83 /91 :(35.4%)  900 : 1 ph     (1/1)       -- tail: 82  |||| ●|||                     
#> 84 /92 :(35.8%)  912 : 1 ph     (1/1)       -- tail: 83  |||| |●||                     
#> 85 /93 :(36.2%)  905 : 1 ph     (1/1)       -- tail: 84  ||●| ||||                     
#> 86 /94 :(36.6%)  922 : 1 ph     (1/1)       -- tail: 85  |●|● ●●||                     
#> 94: not included (map extension)
#> 86 /95 :(37%)    895 : 1 ph     (1/1)       -- tail: 85  |||| ●|●|                     
#> 87 /96 :(37.4%)  904 : 1 ph     (1/1)       -- tail: 86  ||●| ●|||                     
#> 88 /97 :(37.7%)  915 : 1 ph     (1/1)       -- tail: 87  |||| |●||                     
#> 89 /98 :(38.1%)  899 : 1 ph     (1/1)       -- tail: 88  ●||| |●●|                     
#> 90 /99 :(38.5%)  907 : 1 ph     (1/1)       -- tail: 89  |●|| ||||                     
#> 91 /100:(38.9%)  893 : 1 ph     (1/1)       -- tail: 90  ●●●● ●●●|                     
#> 92 /101:(39.3%)  886 : 1 ph     (1/1)       -- tail: 91  |||| |||●                     
#> 93 /102:(39.7%)  876 : 1 ph     (1/1)       -- tail: 92  ●●●● ●●●|                     
#> 94 /103:(40.1%)  902 : 1 ph     (1/1)       -- tail: 93  ●|●● ●●●●                     
#> 95 /104:(40.5%)  906 : 1 ph     (1/1)       -- tail: 94  ||●● |||●                     
#> 96 /105:(40.9%)  898 : 1 ph     (1/1)       -- tail: 95  ●||| |●●|                     
#> 97 /106:(41.2%)  901 : 1 ph     (1/1)       -- tail: 96  ●●|● |●●|                     
#> 98 /107:(41.6%)  897 : 1 ph     (1/1)       -- tail: 97  |●●| ●||●                     
#> 99 /108:(42%)    894 : 1 ph     (1/1)       -- tail: 98  ●●|● |●●●                     
#> 100/109:(42.4%)  896 : 1 ph     (1/1)       -- tail: 99  ●●|● ●●●|                     
#> 101/110:(42.8%)  874 : 1 ph     (1/1)       -- tail: 100 ●||| |●●●                     
#> 102/111:(43.2%)  875 : 1 ph     (1/1)       -- tail: 101 ●|●| ●●●|                     
#> 103/112:(43.6%)  888 : 1 ph     (1/1)       -- tail: 102 |●●● ●||●                     
#> 104/113:(44%)    889 : 1 ph     (1/1)       -- tail: 103 ●||| |●●|                     
#> 105/114:(44.4%)  892 : 1 ph     (1/1)       -- tail: 104 ●●|● |●●|                     
#> 106/115:(44.7%)  884 : 1 ph     (1/1)       -- tail: 105 ●●|● |●●|                     
#> 107/116:(45.1%)  882 : 1 ph     (1/1)       -- tail: 106 ||●| ●||●                     
#> 108/117:(45.5%)  883 : 1 ph     (1/1)       -- tail: 107 ||●| ●||●                     
#> 109/118:(45.9%)  881 : 1 ph     (1/1)       -- tail: 108 ●||| ●●●|                     
#> 110/119:(46.3%)  891 : 1 ph     (1/1)       -- tail: 109 ●●|● |●●|                     
#> 111/120:(46.7%)  887 : 1 ph     (1/1)       -- tail: 110 ||●| ●||●                     
#> 112/121:(47.1%)  885 : 1 ph     (1/1)       -- tail: 111 ||●| ●||●                     
#> 113/122:(47.5%)  890 : 1 ph     (1/1)       -- tail: 112 ||●| ●||●                     
#> 114/123:(47.9%)  877 : 1 ph     (1/1)       -- tail: 113 ●||| |●●|                     
#> 115/124:(48.2%)  878 : 1 ph     (1/1)       -- tail: 114 ●||| |●●|                     
#> 116/125:(48.6%)  879 : 1 ph     (1/1)       -- tail: 115 ●●|● |●●|                     
#> 117/126:(49%)    880 : 1 ph     (1/1)       -- tail: 116 ●|●| ●●●●                     
#> 118/127:(49.4%)  865 : 1 ph     (1/1)       -- tail: 117 ●||| ●●●●                     
#> 119/128:(49.8%)  867 : 1 ph     (1/1)       -- tail: 118 |●|| ||||                     
#> 120/129:(50.2%)  858 : 1 ph     (1/1)       -- tail: 119 ●●●● ●●●|                     
#> 121/130:(50.6%)  869 : 1 ph     (1/1)       -- tail: 120 |●●● ●||●                     
#> 122/131:(51%)    871 : 1 ph     (1/1)       -- tail: 121 ●|●| ●●●●                     
#> 123/132:(51.4%)  864 : 1 ph     (1/1)       -- tail: 122 ●●|● ●●●●                     
#> 124/133:(51.8%)  863 : 1 ph     (1/1)       -- tail: 123 ●|●| ●●●●                     
#> 125/134:(52.1%)  866 : 1 ph     (1/1)       -- tail: 124 ●|●| ●●●●                     
#> 126/135:(52.5%)  868 : 1 ph     (1/1)       -- tail: 125 ||●| ●||●                     
#> 127/136:(52.9%)  870 : 1 ph     (1/1)       -- tail: 126 ||●| |||●                     
#> 128/137:(53.3%)  861 : 1 ph     (1/1)       -- tail: 127 ●||| |●●|                     
#> 129/138:(53.7%)  862 : 1 ph     (1/1)       -- tail: 128 ●|●| ●●●●                     
#> 130/139:(54.1%)  860 : 1 ph     (1/1)       -- tail: 129 |●|| ||||                     
#> 131/140:(54.5%)  857 : 1 ph     (1/1)       -- tail: 130 |●●● ●||●                     
#> 132/141:(54.9%)  855 : 1 ph     (1/1)       -- tail: 131 |●●● ●●●●                     
#> 133/142:(55.3%)  853 : 1 ph     (1/1)       -- tail: 132 ●||| ||||                     
#> 134/143:(55.6%)  856 : 1 ph     (1/1)       -- tail: 133 |●●● ●●●●                     
#> 135/144:(56%)    859 : 1 ph     (1/1)       -- tail: 134 ||●| ●||●                     
#> 136/145:(56.4%)  844 : 1 ph     (1/1)       -- tail: 135 ●●●● |●●●                     
#> 137/146:(56.8%)  872 : 1 ph     (1/1)       -- tail: 136 |||● ||||                     
#> 138/147:(57.2%)  845 : 1 ph     (1/1)       -- tail: 137 ●●●| |●●●                     
#> 139/148:(57.6%)  850 : 1 ph     (1/1)       -- tail: 138 ||●| ●●●|                     
#> 140/149:(58%)    852 : 1 ph     (1/1)       -- tail: 139 |●|| ||||                     
#> 141/150:(58.4%)  839 : 1 ph     (1/1)       -- tail: 140 ●●●● ●●●|                     
#> 142/151:(58.8%)  843 : 1 ph     (1/1)       -- tail: 141 ||●| ●●●|                     
#> 143/152:(59.1%)  838 : 1 ph     (1/1)       -- tail: 142 ●●|● |●●●                     
#> 144/153:(59.5%)  841 : 1 ph     (1/1)       -- tail: 143 ||●| ●●●|                     
#> 145/154:(59.9%)  851 : 1 ph     (1/1)       -- tail: 144 |●●● ●●●●                     
#> 146/155:(60.3%)  854 : 1 ph     (1/1)       -- tail: 145 ●●|| ||||                     
#> 147/156:(60.7%)  846 : 1 ph     (1/1)       -- tail: 146 ●●●● ●||●                     
#> 148/157:(61.1%)  842 : 1 ph     (1/1)       -- tail: 147 |●●● ●●●●                     
#> 149/158:(61.5%)  849 : 1 ph     (1/1)       -- tail: 148 ●||| ||||                     
#> 150/159:(61.9%)  848 : 1 ph     (1/1)       -- tail: 149 ●●●● ●||●                     
#> 151/160:(62.3%)  827 : 2 ph     (1/2)       -- tail: 150 ||●| ●●||       ||●| ●|●| 
#> 152/161:(62.6%)  829 : 1 ph     (1/1)       -- tail: 151 ||●| ●●|●                     
#> 153/162:(63%)    840 : 1 ph     (1/1)       -- tail: 152 ||●● ●●●|                     
#> 154/163:(63.4%)  835 : 1 ph     (1/1)       -- tail: 153 ●||| ||●|                     
#> 155/164:(63.8%)  825 : 1 ph     (1/1)       -- tail: 154 ||●| |●|●                     
#> 156/165:(64.2%)  836 : 1 ph     (1/1)       -- tail: 155 ●●●● ●||●                     
#> 157/166:(64.6%)  832 : 1 ph     (1/1)       -- tail: 156 ●●|● ||●|                     
#> 158/167:(65%)    828 : 1 ph     (1/1)       -- tail: 157 ●||| ||●|                     
#> 159/168:(65.4%)  833 : 1 ph     (1/1)       -- tail: 158 ●||| ||●|                     
#> 160/169:(65.8%)  826 : 1 ph     (1/1)       -- tail: 159 ●||| ||●|                     
#> 161/170:(66.1%)  831 : 1 ph     (1/1)       -- tail: 160 ●||| ||●|                     
#> 162/171:(66.5%)  834 : 1 ph     (1/1)       -- tail: 161 ●||| ||●|                     
#> 163/172:(66.9%)  837 : 1 ph     (1/1)       -- tail: 162 ●●|| ||||                     
#> 164/173:(67.3%)  824 : 1 ph     (1/1)       -- tail: 163 ●||| ||●|                     
#> 165/174:(67.7%)  822 : 1 ph     (1/1)       -- tail: 164 ●●|● ●|●●                     
#> 166/175:(68.1%)  820 : 1 ph     (1/1)       -- tail: 165 |||| ●●||                     
#> 167/176:(68.5%)  819 : 1 ph     (1/1)       -- tail: 166 ●||| ||●|                     
#> 168/177:(68.9%)  817 : 1 ph     (1/1)       -- tail: 167 ●||| ||●|                     
#> 169/178:(69.3%)  818 : 1 ph     (1/1)       -- tail: 168 |●●● ●●|●                     
#> 170/179:(69.6%)  815 : 1 ph     (1/1)       -- tail: 169 |●●● ●●|●                     
#> 171/180:(70%)    823 : 1 ph     (1/1)       -- tail: 170 |●|| ||||                     
#> 172/181:(70.4%)  813 : 1 ph     (1/1)       -- tail: 171 |●●● ●||●                     
#> 173/182:(70.8%)  821 : 1 ph     (1/1)       -- tail: 172 ●●|| ||●|                     
#> 174/183:(71.2%)  811 : 1 ph     (1/1)       -- tail: 173 |||● ●|||                     
#> 175/184:(71.6%)  814 : 1 ph     (1/1)       -- tail: 174 ●|●● ●●●●                     
#> 176/185:(72%)    788 : 1 ph     (1/1)       -- tail: 175 ●●●● ●●|●                     
#> 177/186:(72.4%)  810 : 1 ph     (1/1)       -- tail: 176 ●||| |●●|                     
#> 178/187:(72.8%)  816 : 1 ph     (1/1)       -- tail: 177 ●●|| ●●●|                     
#> 179/188:(73.2%)  804 : 1 ph     (1/1)       -- tail: 178 ●||| |●●|                     
#> 180/189:(73.5%)  805 : 1 ph     (1/1)       -- tail: 179 ●||| |●●|                     
#> 181/190:(73.9%)  806 : 1 ph     (1/1)       -- tail: 180 |●●● ●||●                     
#> 182/191:(74.3%)  802 : 1 ph     (1/1)       -- tail: 181 ●||| ||●|                     
#> 183/192:(74.7%)  803 : 1 ph     (1/1)       -- tail: 182 ●||| |●●|                     
#> 184/193:(75.1%)  812 : 1 ph     (1/1)       -- tail: 183 ||●● ●||●                     
#> 185/194:(75.5%)  807 : 1 ph     (1/1)       -- tail: 184 |●|| ||||                     
#> 186/195:(75.9%)  808 : 1 ph     (1/1)       -- tail: 185 ●|●● ●●●●                     
#> 187/196:(76.3%)  801 : 1 ph     (1/1)       -- tail: 186 |●●● ●●|●                     
#> 188/197:(76.7%)  809 : 1 ph     (1/1)       -- tail: 187 ||●● ●||●                     
#> 189/198:(77%)    800 : 1 ph     (1/1)       -- tail: 188 |●●● ●●|●                     
#> 190/199:(77.4%)  799 : 1 ph     (1/1)       -- tail: 189 ●||| ||●|                     
#> 191/200:(77.8%)  798 : 1 ph     (1/1)       -- tail: 190 ●||| ||●|                     
#> 192/201:(78.2%)  796 : 1 ph     (1/1)       -- tail: 191 |●|| ||●|                     
#> 193/202:(78.6%)  797 : 1 ph     (1/1)       -- tail: 192 ●||| ||●|                     
#> 194/203:(79%)    780 : 1 ph     (1/1)       -- tail: 193 ●●|● ●●|●                     
#> 195/204:(79.4%)  784 : 1 ph     (1/1)       -- tail: 194 ||●| ||●●                     
#> 196/205:(79.8%)  792 : 1 ph     (1/1)       -- tail: 195 |||| ●|||                     
#> 197/206:(80.2%)  794 : 1 ph     (1/1)       -- tail: 196 ●●●● |●●●                     
#> 198/207:(80.5%)  766 : 1 ph     (1/1)       -- tail: 197 ●●|| |●|●                     
#> 199/208:(80.9%)  769 : 1 ph     (1/1)       -- tail: 198 ||●● ●|●|                     
#> 200/209:(81.3%)  771 : 1 ph     (1/1)       -- tail: 199 ||●| ||●|                     
#> 201/210:(81.7%)  774 : 1 ph     (1/1)       -- tail: 200 ●●|● ●●|●                     
#> 202/211:(82.1%)  770 : 1 ph     (1/1)       -- tail: 200 ||●| ||●|                     
#> 203/212:(82.5%)  787 : 1 ph     (1/1)       -- tail: 200 ●|●● ●●●●                     
#> 204/213:(82.9%)  786 : 1 ph     (1/1)       -- tail: 200 |●|| ||||                     
#> 205/214:(83.3%)  789 : 1 ph     (1/1)       -- tail: 200 ●|●● ●●●●                     
#> 206/215:(83.7%)  791 : 1 ph     (1/1)       -- tail: 200 |●|| ||||                     
#> 207/216:(84%)    783 : 1 ph     (1/1)       -- tail: 200 ●|●● ●●●●                     
#> 208/217:(84.4%)  785 : 1 ph     (1/1)       -- tail: 200 |||| |●||                     
#> 209/218:(84.8%)  772 : 1 ph     (1/1)       -- tail: 200 ●|●● ●●●●                     
#> 210/219:(85.2%)  773 : 1 ph     (1/1)       -- tail: 200 |●|| ||||                     
#> 211/220:(85.6%)  782 : 1 ph     (1/1)       -- tail: 200 |||| |●||                     
#> 212/221:(86%)    762 : 1 ph     (1/1)       -- tail: 200 ●|●● ●●●●                     
#> 213/222:(86.4%)  763 : 1 ph     (1/1)       -- tail: 200 |||● ●|||                     
#> 214/223:(86.8%)  767 : 1 ph     (1/1)       -- tail: 200 ●||| |||●                     
#> 215/224:(87.2%)  778 : 5 ph     (1/5)       -- tail: 200 ●●|| |●●|  ...  ●|●| |●●| 
#> 216/225:(87.5%)  777 : 1 ph     (1/1)       -- tail: 200 ●||● ●●|●                     
#> 217/226:(87.9%)  776 : 1 ph     (1/1)       -- tail: 200 ●||● ●||●                     
#> 218/227:(88.3%)  765 : 1 ph     (1/1)       -- tail: 200 ●●●● ●●●|                     
#> 219/228:(88.7%)  764 : 1 ph     (1/1)       -- tail: 200 |||| |||●                     
#> 220/229:(89.1%)  761 : 1 ph     (1/1)       -- tail: 200 ●||● ●|||                     
#> 221/230:(89.5%)  768 : 1 ph     (1/1)       -- tail: 200 |||| |●||                     
#> 222/231:(89.9%)  755 : 1 ph     (1/1)       -- tail: 200 ●●|● ●●|●                     
#> 223/232:(90.3%)  758 : 1 ph     (1/1)       -- tail: 200 |||| ||●|                     
#> 224/233:(90.7%)  759 : 1 ph     (1/1)       -- tail: 200 ●●●● ●●||                     
#> 225/234:(91.1%)  754 : 3 ph     (1/3)       -- tail: 200 ●●|| ●●||  ...  ●||● ●●|| 
#> 226/235:(91.4%)  749 : 4 ph     (1/4)       -- tail: 200 ●||| ●●||  ...  |●|| ●●|| 
#> 227/236:(91.8%)  747 : 1 ph     (1/1)       -- tail: 200 ●●|● ||●●                     
#> 228/237:(92.2%)  750 : 1 ph     (1/1)       -- tail: 200 ||●| ●●||                     
#> 229/238:(92.6%)  751 : 1 ph     (1/1)       -- tail: 200 ●●|● ||●●                     
#> 230/239:(93%)    746 : 1 ph     (1/1)       -- tail: 200 ●●|● ||●●                     
#> 231/240:(93.4%)  745 : 1 ph     (1/1)       -- tail: 200 ●●|● ||●●                     
#> 232/241:(93.8%)  752 : 1 ph     (1/1)       -- tail: 200 ●●|● ||●●                     
#> 233/242:(94.2%)  748 : 1 ph     (1/1)       -- tail: 200 ||●| ●●||                     
#> 234/243:(94.6%)  760 : 1 ph     (1/1)       -- tail: 200 |||| |●||                     
#> 235/244:(94.9%)  753 : 1 ph     (1/1)       -- tail: 200 |||| ●●||                     
#> 236/245:(95.3%)  742 : 1 ph     (1/1)       -- tail: 200 ●●●● ●●|●                     
#> 237/246:(95.7%)  744 : 1 ph     (1/1)       -- tail: 200 |||| |●||                     
#> 238/247:(96.1%)  733 : 1 ph     (1/1)       -- tail: 200 |||| ||●|                     
#> 239/248:(96.5%)  739 : 1 ph     (1/1)       -- tail: 200 ●●●● ●●|●                     
#> 240/249:(96.9%)  732 : 4 ph     (1/4)       -- tail: 200 ●||| ||●|  ...  |●|| ||●| 
#> 241/250:(97.3%)  730 : 1 ph     (1/1)       -- tail: 200 ●●●● ●●|●                     
#> 242/251:(97.7%)  735 : 1 ph     (1/1)       -- tail: 200 |||● ||●|                     
#> 243/252:(98.1%)  738 : 1 ph     (1/1)       -- tail: 200 |||● ||●|                     
#> 244/253:(98.4%)  737 : 1 ph     (1/1)       -- tail: 200 ||●| ●●|●                     
#> 245/254:(98.8%)  725 : 1 ph     (1/1)       -- tail: 200 |||| ||●|                     
#> 246/255:(99.2%)  736 : 1 ph     (1/1)       -- tail: 200 ||●| ||||                     
#> 247/256:(99.6%)  741 : 1 ph     (1/1)       -- tail: 200 |||| ●||●                     
#> 248/257:(100%)   729 : 1 ph     (1/1)       -- tail: 200 |||| |||●
#> ═════════════════════════════════════════════════════════ Reestimating final recombination fractions ══
#> Markers in the initial sequence: 257
#> Mapped markers                  : 248 (96.5%)
#> ═══════════════════════════════════════════════════════════════════════════════════════════════════════

And plot the map:

plot(lg3.map.mds)

It is also possible to compare the maps using both genomic-based and MDS-based orders with the function plot_map_list:

plot_map_list(list(lg3.map, lg3.map.mds), col = c("red", "blue"), title = "")

The genomic-based map included the same number of markers but is smaller than the MDS-based map, which indicates a better result. To formally compare the maps, one needs to select the markers that are present in both maps. Interestingly enough, both procedures included the same markers in the final map; however, we provide the code to perform the comparison even if the maps share only a subset of markers:

mrks.in.gen<-intersect(lg3.map$maps[[1]]$seq.num, lg3.map.mds$maps[[1]]$seq.num)
mrks.in.mds<-intersect(lg3.map.mds$maps[[1]]$seq.num, lg3.map$maps[[1]]$seq.num)
if(cor(mrks.in.gen, mrks.in.mds) < 0){
  mrks.in.mds <- rev(mrks.in.mds)
  lg3.map.mds <- rev_map(lg3.map.mds)
}
map.comp.3.gen<-get_submap(input.map = lg3.map, match(mrks.in.gen, lg3.map$maps[[1]]$seq.num), verbose = FALSE)
map.comp.3.mds<-get_submap(input.map = lg3.map.mds, match(mrks.in.mds, lg3.map.mds$maps[[1]]$seq.num), verbose = FALSE)
prob.3.gen<-extract_map(lg3.map)
prob.3.mds<-extract_map(lg3.map.mds)
names(prob.3.gen)<-map.comp.3.gen$maps[[1]]$seq.num
names(prob.3.mds)<-map.comp.3.mds$maps[[1]]$seq.num

matplot(t(data.frame(prob.3.gen,prob.3.mds[names(prob.3.gen)])), 
        type="b", pch="_", col=1, lty=1, lwd = .5, xlab= "", 
        ylab="Marker position (cM)", axes = F)
axis(2)
mtext(text = round(map.comp.3.gen$maps[[1]]$loglike,1), side = 1, adj = 0)
mtext(text = round(map.comp.3.mds$maps[[1]]$loglike,1), side = 1, adj = 1)
mtext(text = "Genomic", side = 3, adj = 0)
mtext(text = "MDS", side = 3, adj = 1)

Please notice that these maps have the same local inversions shown in the dot plots presented earlier. In this case, the log-likelihood of the genomic order is higher than the one obtained using the MDS order. For this linkage group, the genomic-based map was chosen as the best one.

Parallel map construction

Using one core by LG

Now, the mapping procedure will be applied to all linkage groups using parallelization. Although users must be encouraged to compare both MDS and genomic orders following the previous example, the genomic order will be considered as an example (remember that this step will take a long time to run):

## Performing parallel computation
my.phase.func<-function(X){
  x<-est_rf_hmm_sequential(input.seq = X$lg,
                                start.set = 10,
                                thres.twopt = 10, 
                                thres.hmm = 10,
                                extend.tail = 200,
                                info.tail = TRUE, 
                                twopt = X$tpt,
                                sub.map.size.diff.limit = 8, 
                                phase.number.limit = 10,
                                reestimate.single.ph.configuration = TRUE,
                                tol = 10e-3,
                                tol.final = 10e-4)
  return(x)
}
system.time({
  cl <- parallel::makeCluster(n.cores)
  parallel::clusterEvalQ(cl, require(mappoly))
  parallel::clusterExport(cl, "dat.dose.filt")
  MAPs <- parallel::parLapply(cl,LGS,my.phase.func)
  parallel::stopCluster(cl)
})
#>     user   system  elapsed 
#>   25.376    2.768 2960.494

A traditional linkage map plot can be generated including all linkage groups, using the function plot_map_list:

plot_map_list(MAPs, col = "ggstyle")

Following the reconstruction of LG 3 shown before, let us consider a global genotyping error of 5% to reestimate the final maps:

my.error.func<-function(X){
  x<-est_full_hmm_with_global_error(input.map = X, 
                                    error = 0.05, 
                                    tol = 10e-4, 
                                    verbose = FALSE)
  return(x)
}
system.time({
  cl <- parallel::makeCluster(n.cores)
  parallel::clusterEvalQ(cl, require(mappoly))
  parallel::clusterExport(cl, "dat.dose.filt")
  MAP.err <- parallel::parLapply(cl,MAPs,my.error.func)
  parallel::stopCluster(cl)
})
#>    user  system elapsed 
#>   0.154   0.927 251.524

Comparing both results:

all.MAPs <- NULL
for(i in 1:12) 
  all.MAPs<-c(all.MAPs, MAPs[i], MAP.err[i])
plot_map_list(map.list = all.MAPs, col = rep(c("#E69F00", "#56B4E9"), 12))

Then, the map that included modeling of genotype errors was chosen as the best one.

plot_map_list(MAP.err)

Map summary

After building two or more maps, one may want to compare summary statistics of those maps regarding the same chromosome, or even across chromosomes. A brief comparison can be done using the function summary_maps, which generates a table containing these statistics based on a list of mappoly.map objects:

knitr::kable(summary_maps(MAPs))
#> 
#> Your dataset contains removed (redundant) markers. Once finished the map, remember to add the redundant ones with the function 'update_map'.

LG	Genomic sequence	Map size (cM)	Markers/cM	Simplex	Double-simplex	Multiplex	Total	Max gap
1	1	217.16	1.49	94	18	211	323	4.56
2	2	165.23	1.35	76	46	101	223	7.63
3	3	161.61	1.55	94	41	116	251	6.58
4	4	166.37	2.01	124	30	181	335	2.66
5	5	152.55	1.49	70	19	139	228	4.06
6	6	163.44	2.07	92	46	200	338	4.37
7	7	172.33	1.74	97	51	151	299	6.94
8	8	126.43	1.61	58	27	119	204	6.79
9	9	151.72	1.6	80	44	119	243	11.08
10	10	155.39	1.03	69	17	74	160	9.3
11	11	135.92	1.55	75	29	107	211	4.65
12	12	150.08	1.07	57	25	78	160	18.51
Total	NA	1918.23	1.55	986	393	1596	2975	7.26

Genotype conditional probabilities

In order to use the genetic map in QTLpoly, one needs to obtain the conditional probability of all possible 36 genotypes along the 12 linkage groups for all individuals in the full-sib population. Let us use the function calc_genoprob_error, which similarly to est_full_hmm_with_global_error allows the inclusion of a global genotyping error:

genoprob.err <- vector("list", 12)
for(i in 1:12)
genoprob.err[[i]] <- calc_genoprob_error(input.map = MAP.err[[i]], error = 0.05)
#> Ploidy level:4
#> Number of individuals:153
#>  ..................................................
#>  ..................................................
#>  ..................................................
#>  ...
#> Ploidy level:4
#> Number of individuals:153
#>  ..................................................
#>  ..................................................
#>  ..................................................
#>  ...
#> Ploidy level:4
#> Number of individuals:153
#>  ..................................................
#>  ..................................................
#>  ..................................................
#>  ...
#> Ploidy level:4
#> Number of individuals:153
#>  ..................................................
#>  ..................................................
#>  ..................................................
#>  ...
#> Ploidy level:4
#> Number of individuals:153
#>  ..................................................
#>  ..................................................
#>  ..................................................
#>  ...
#> Ploidy level:4
#> Number of individuals:153
#>  ..................................................
#>  ..................................................
#>  ..................................................
#>  ...
#> Ploidy level:4
#> Number of individuals:153
#>  ..................................................
#>  ..................................................
#>  ..................................................
#>  ...
#> Ploidy level:4
#> Number of individuals:153
#>  ..................................................
#>  ..................................................
#>  ..................................................
#>  ...
#> Ploidy level:4
#> Number of individuals:153
#>  ..................................................
#>  ..................................................
#>  ..................................................
#>  ...
#> Ploidy level:4
#> Number of individuals:153
#>  ..................................................
#>  ..................................................
#>  ..................................................
#>  ...
#> Ploidy level:4
#> Number of individuals:153
#>  ..................................................
#>  ..................................................
#>  ..................................................
#>  ...
#> Ploidy level:4
#> Number of individuals:153
#>  ..................................................
#>  ..................................................
#>  ..................................................
#>  ...

Here, a global genotyping error of 5% was used. Each position of any object in the list genoprob.err contains two elements: an array of dimensions $36 \times number \; of \; markers \times number \; of \; individuals$ and the position of the markers in the maps in centimorgans. Let us display the results for all linkage groups in individual 1:

ind <- 1
op <- par(mfrow = c(3, 4), pty = "s", mar=c(1,1,1,1)) 
for(i in 1:12)
{
  d <- genoprob.err[[i]]$map
  image(t(genoprob.err[[i]]$probs[,,ind]),
        col=RColorBrewer::brewer.pal(n=9 , name = "YlOrRd"),
        axes=FALSE,
        xlab = "Markers",
        ylab = "",
        main = paste("LG", i))
  axis(side = 1, at = d/max(d),
       labels =rep("", length(d)), las=2)
}

par(op)

In this figure, the x-axis represents the genetic map and the y-axis represents the 36 possible genotypes in the full-sib population. The color scale varies from dark purple (high probabilities) to light yellow (low probabilities). With the conditional probabilities computed, it is possible to use the object genoprob.err alongside with phenotypic data as the input of the software QTLpoly, which is an under development software to map multiple QTLs in full-sib families of outcrossing autopolyploid species.

Obtaining individual haplotypes

Once ready, the genotypic conditional probabilities can be used to recover any individual haplotype given the map (details described in Mollinari et al. (2020)). To generate this information, one may use the function calc_homoprob to account for the probabilities of each homologous, in all map positions for each individual. For example, let us view the homologous probabilities for chromosome 1 and individual 10:

homoprobs = calc_homoprob(genoprob.err)
#> 
#> Linkage group  1 ...
#> Linkage group  2 ...
#> Linkage group  3 ...
#> Linkage group  4 ...
#> Linkage group  5 ...
#> Linkage group  6 ...
#> Linkage group  7 ...
#> Linkage group  8 ...
#> Linkage group  9 ...
#> Linkage group  10 ...
#> Linkage group  11 ...
#> Linkage group  12 ...

plot(homoprobs, lg = 1, ind = 10)

Using this graphic, it is possible to identify regions of crossing-over occurrence, represented by the inversion of probability magnitudes between homologous from the same parent. It is also possible to view all chromosomes at the same time for any individual by setting the parameter lg = "all". One may use this information to evaluate the quality of the map and repeat some processes with modifications, if necessary.

Evaluating the meiotic process

MAPpoly also handles a function to evaluate the meiotic process that guided gamete formation on the studied population. Given genotype conditional probabilities, one may want to account for homologous pairing probabilities and detect the occurrence of preferential pairing, which is possible through the function calc_prefpair_profiles:

prefpairs = calc_prefpair_profiles(genoprob.err)
#> 
#> Linkage group  1 ...
#> Linkage group  2 ...
#> Linkage group  3 ...
#> Linkage group  4 ...
#> Linkage group  5 ...
#> Linkage group  6 ...
#> Linkage group  7 ...
#> Linkage group  8 ...
#> Linkage group  9 ...
#> Linkage group  10 ...
#> Linkage group  11 ...
#> Linkage group  12 ...

The function returns an object of class mappoly.prefpair.profiles, which was saved as prefpairs. This object handles all information necessary to study pairing, such as the probability for each pairing configuration ($\psi$; see Mollinari and Garcia, 2019) inside each parent. For a more user-friendly visualization of the results, one may want to look at the plot output:

plot(prefpairs)
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'

#save.image("all.analysis.rda")

This graphic shows information about all pairing configurations and their probabilities, the proportion of bivalent/multivalent pairing, and also the p-value for preferential pairing test for all markers inside each parent.

References

Mollinari, Marcelo, and Antonio Augusto Franco Garcia. 2019. “Linkage Analysis and Haplotype Phasing in Experimental Autopolyploid Populations with High Ploidy Level Using Hidden Markov Models.” G3: Genes, Genomes, Genetics 9 (10): 3297–3314.

Mollinari, Marcelo, Bode A Olukolu, Guilherme da S Pereira, Awais Khan, Dorcus Gemenet, G Craig Yencho, and Zhao-Bang Zeng. 2020. “Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping.” G3: Genes, Genomes, Genetics 10 (1): 281–92.

Pereira, Guilherme da Silva, Dorcus C. Gemenet, Marcelo Mollinari, Bode Olukolu, Federico Diaz, Veronica Mosquera, Wolfgang Gruneberg, Awais Khan, Craig Yencho, and and Zhao-Bang Zeng. 2020. “Multiple Qtl Mapping in Autopolyploids: A Random-Effect Model Approach with Application in a Hexaploid Sweetpotato Full-Sib Population.” In Press, no. https://www.biorxiv.org/content/10.1101/622951v1.

Preedy, K. F., and C. A. Hackett. 2016. “A Rapid Marker Ordering Approach for High-Density Genetic Linkage Maps in Experimental Autotetraploid Populations Using Multidimensional Scaling.” Theor. Appl. Genet. 129 (11): 2117–32. https://doi.org/10.1007/s00122-016-2761-8.

Building a genetic linkage map of an autotetraploid population using MAPpoly

Marcelo Mollinari, Gabriel Gesteira, Guilherme Pereira, A Augusto Garcia, Zhao-Bang Zeng

2020-09-08

Introduction

`MAPpoly` installation

Loading datasets

Reading CSV files

Reading MAPpoly files

Using `read_geno`

Using `read_geno_prob`

Reading VCF files

Importing data from third party packages

Combining multiple datasets

Exploratory Analysis

Whole dataset

Marker-specific

Filtering and Quality Control

Missing data filtering

Segregation test

Two-point analysis

Assembling recombination fraction and LOD Score matrices

Assembling linkage groups

Estimating the map for a given order

Reestimating the map considering genotyping errors

Reinserting redundant markers

Ordering markers using MDS and reestimating the map

Parallel map construction

Using one core by LG

Map summary

Genotype conditional probabilities

Obtaining individual haplotypes

Evaluating the meiotic process

References

Building a genetic linkage map of an autotetraploid population using MAPpoly

Marcelo Mollinari, Gabriel Gesteira, Guilherme Pereira, A Augusto Garcia, Zhao-Bang Zeng

2020-09-08

Introduction

MAPpoly installation

Loading datasets

Reading CSV files

Reading MAPpoly files

Using read_geno

Using read_geno_prob

Reading VCF files

Importing data from third party packages

Combining multiple datasets

Exploratory Analysis

Whole dataset

Marker-specific

Filtering and Quality Control

Missing data filtering

Segregation test

Two-point analysis

Assembling recombination fraction and LOD Score matrices

Assembling linkage groups

Estimating the map for a given order

Reestimating the map considering genotyping errors

Reinserting redundant markers

Ordering markers using MDS and reestimating the map

Parallel map construction

Using one core by LG

Map summary

Genotype conditional probabilities

Obtaining individual haplotypes

Evaluating the meiotic process

References

`MAPpoly` installation

Using `read_geno`

Using `read_geno_prob`