Multiple Alignment Track Settings
 
Multiple Alignment on 90 human genome assemblies   (All Human Pangenome - HPRC tracks)

 
Multiz Alignment Configuration

Species selection:  + -

  T2T  + -

T2T-CHM13v2.0

  HAPMAP  + -

NA21309.mat
NA21309.pat

  Yoruba Nigeria  + -

NA18906.mat
NA18906.pat

  Esan Nigeria  + -

HG03516.pat
HG03516.mat

  Gambian  + -

HG02622.mat
HG02622.pat
HG02717.mat
HG02630.pat
HG02630.mat
HG02717.pat
HG02572.pat
HG02572.mat
HG02886.mat
HG02886.pat
HG03540.mat
HG03540.pat
HG02818.pat
HG02818.mat
HG02723.mat
HG02723.pat

  Mende Sierra Leone  + -

HG03579.mat
HG03579.pat
HG03453.mat
HG03453.pat
HG03486.pat
HG03486.mat
HG03098.pat
HG03098.mat

  Afr Carib Barbados  + -

HG02257.pat
HG02257.mat
HG02559.pat
HG02559.mat
HG02486.pat
HG02486.mat
HG01891.mat
HG01891.pat
HG02109.mat
HG02055.pat
HG02109.pat
HG02055.mat
HG02145.mat
HG02145.pat

  African SW USA  + -

NA20129.pat
NA20129.mat

  Puerto Rico  + -

HG01175.pat
HG01106.pat
HG01175.mat
HG00741.mat
HG00741.pat
HG01106.mat
HG01071.mat
HG00735.pat
HG01071.pat
HG00735.mat
HG01243.pat
HG01109.mat
HG01243.mat
HG01109.pat
HG00733.pat
HG00733.mat

  Peru Lima  + -

HG02148.pat
HG02148.mat
HG01952.mat
HG01952.pat
HG01928.mat
HG01928.pat
HG01978.pat
HG01978.mat

  Colombia Medellin  + -

HG01258.mat
HG01123.mat
HG01258.pat
HG01361.mat
HG01123.pat
HG01361.pat
HG01358.mat
HG01358.pat

  Han SoChina  + -

HG00438.mat
HG00673.mat
HG00621.pat
HG00673.pat
HG00438.pat
HG00621.mat

  Vietnam Kinh  + -

HG02080.pat
HG02080.mat

  Punjabi Pakistan  + -

HG03492.pat
HG03492.mat

Multiple alignment base-level:
Display bases identical to reference as dots
Display chains between alignments

Codon highlighting:
  Alternate colors every [N] bases
  Offset alternate colors by [N] bases

Assembly: Human Dec. 2013 (GRCh38/hg38)

Description

This track shows multiple alignments of 90 human genomes generated by the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments. This method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools.

Display Conventions and Configuration

In full and pack display modes, conservation scores are displayed as a wiggle track (histogram) in which the height reflects the size of the score. The conservation wiggles can be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options.

Pairwise alignments of each species to the human genome are displayed below the conservation histogram as a grayscale density plot (in pack mode) or as a wiggle (in full mode) that indicates alignment quality. In dense display mode, conservation is shown in grayscale using darker values to indicate higher levels of overall conservation as scored by phastCons.

Checkboxes on the track configuration page allow selection of the species to include in the pairwise display. Note that excluding species from the pairwise display does not alter the conservation score display.

To view detailed information about the alignments at a specific position, zoom the display in to 30,000 or fewer bases, then click on the alignment.

Gap Annotation

The Display chains between alignments configuration option enables display of gaps between alignment blocks in the pairwise alignments in a manner similar to the Chain track display. The following conventions are used:

  • Single line: No bases in the aligned species. Possibly due to a lineage-specific insertion between the aligned blocks in the human genome or a lineage-specific deletion between the aligned blocks in the aligning species.
  • Double line: Aligning species has one or more unalignable bases in the gap region. Possibly due to excessive evolutionary distance between species or independent indels in the region between the aligned blocks in both species.
  • Pale yellow coloring: Aligning species has Ns in the gap region. Reflects uncertainty in the relationship between the DNA of both species, due to lack of sequence in relevant portions of the aligning species.
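The three conventions above can be summarized as a small decision rule. The Python sketch below is only an illustration of that rule, not the browser's implementation: the function name `gap_style` and its input (the aligning genome's sequence lying between two aligned blocks) are hypothetical, and treating any single N as enough to trigger the pale yellow case is an assumption.

```python
def gap_style(unaligned_bases: str) -> str:
    """Classify the gap between two alignment blocks (illustrative only).

    unaligned_bases: bases of the aligning genome that lie between the two
    blocks; an empty string means the aligning genome has no bases there.
    """
    if not unaligned_bases:
        # No bases in the aligned genome between the blocks.
        return "single line"
    if "N" in unaligned_bases.upper():
        # Assumption: any N in the gap region selects the pale yellow style.
        return "pale yellow"
    # One or more unalignable bases, none of them N.
    return "double line"


for gap in ("", "ACGTTG", "ACNNNNGT"):
    print(repr(gap), "->", gap_style(gap))
```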

Genomic Breaks

Discontinuities in the genomic context (chromosome, scaffold or region) of the aligned DNA in the aligning species are shown as follows:

  • Vertical blue bar: Represents a discontinuity that persists indefinitely on either side, e.g. a large region of DNA on either side of the bar comes from a different chromosome in the aligned species due to a large scale rearrangement.
  • Green square brackets: Enclose shorter alignments consisting of DNA from one genomic context in the aligned species nested inside a larger chain of alignments from a different genomic context. The alignment within the brackets may represent a short misalignment, a lineage-specific insertion of a transposon in the human genome that aligns to a paralogous copy somewhere else in the aligned species, or other similar occurrence.

Base Level

When zoomed in to the base-level display, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the human sequence at those alignment positions relative to the longest non-human sequence. If there is sufficient space in the display, the size of the gap is shown. If the space is insufficient and the gap size is a multiple of 3, a "*" is displayed; other gap sizes are indicated by "+".
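The labeling rule for the Gaps line can be sketched in Python as follows. The function name `gap_label` and the `width_chars` parameter (the number of character cells available at that position) are hypothetical; the sketch assumes that "sufficient space" means the decimal gap length fits in the available cells.

```python
def gap_label(gap_size: int, width_chars: int) -> str:
    """Return the text shown on the Gaps line for one gap (illustrative only)."""
    if gap_size <= 0:
        return ""                # no gap at this position
    digits = str(gap_size)
    if len(digits) <= width_chars:
        return digits            # enough room: show the gap length itself
    if gap_size % 3 == 0:
        return "*"               # too narrow, length is a multiple of 3
    return "+"                   # too narrow, any other length


# With a single character cell available per position:
for size in (2, 12, 13):
    print(size, gap_label(size, width_chars=1))
```

Under these assumptions, a 12-base gap is drawn as "12" when two cells are available and as "*" when only one is.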

Methods

The MAF was obtained from the HPRC v1.0 minigraph-cactus HAL file (renamed to replace all "." characters in sample names with "#" using halRenameGenomes) using cactus v2.6.4 as follows.

cactus-hal2maf ./js ./hprc-v1.0-mc-grch38.hal hprc-v1.0-mc-grch38.maf.gz \
    --noAncestors --refGenome GRCh38 --filterGapCausingDupes \
    --chunkSize 100000 --batchCores 96 --batchCount 10 --noAncestors \
    --batchParallelTaf 32 --batchSystem slurm \
    --logFile hprc-v1.0-mc-grch38.maf.gz.log

zcat hprc-v1.0-mc-grch38.maf.gz | mafDuplicateFilter -m - -k \
    | bgzip > hprc-v1.0-mc-grch38-single-copy.maf.gz
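One way to sanity-check a "single-copy" MAF is to confirm that no genome contributes more than one row to any alignment block. The Python sketch below is a minimal illustration under stated assumptions, not part of the pipeline above: it assumes the common MAF convention that the source field of each "s" line has the form genome.sequence, and the helper names, contig names, and coordinates in the example are made up.

```python
from collections import Counter


def iter_blocks(maf_lines):
    """Yield the list of source names ('s' line, field 2) for each MAF block."""
    current = None
    for line in maf_lines:
        fields = line.split()
        if fields and fields[0] == "a":
            # An 'a' line starts a new alignment block.
            if current is not None:
                yield current
            current = []
        elif fields and fields[0] == "s" and current is not None:
            current.append(fields[1])
    if current is not None:
        yield current


def duplicated_genomes(maf_lines):
    """Return (block_index, [genomes]) for blocks where a genome appears twice or more."""
    result = []
    for index, sources in enumerate(iter_blocks(maf_lines)):
        # Assumption: the genome name is everything before the first "."
        counts = Counter(src.split(".", 1)[0] for src in sources)
        dupes = sorted(g for g, n in counts.items() if n > 1)
        if dupes:
            result.append((index, dupes))
    return result


# Made-up example data; contig names and coordinates are hypothetical.
example = """##maf version=1
a
s GRCh38.chr1 100 4 + 248956422 ACGT
s HG00438#1.contigA 10 4 + 5000 ACGT
s HG00438#1.contigB 70 4 + 6000 ACGA

a
s GRCh38.chr1 104 3 + 248956422 TTG
s HG00438#1.contigA 14 3 + 5000 TTG
""".splitlines()

print(duplicated_genomes(example))
```

For a real file, the same functions could be fed lines from `gzip.open(path, "rt")`; a filtered single-copy file would be expected to yield an empty list.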

Credits

Thank you to Glenn Hickey for providing the HAL file from the HPRC project.

References

Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ et al. A draft human pangenome reference. Nature. 2023 May;617(7960):312-324. DOI: 10.1038/s41586-023-05896-x; PMID: 37165242; PMC: PMC10172123

Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, Human Pangenome Reference Consortium, Marschall T, Li H, Paten B. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol. 2023 May 10. DOI: 10.1038/s41587-023-01793-w; PMID: 37165083; PMC: PMC10638906

Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020 Nov;587(7833):246-251. DOI: 10.1038/s41586-020-2871-y; PMID: 33177663; PMC: PMC7673649

Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D. Cactus: Algorithms for genome multiple sequence alignment. Genome Res. 2011 Sep;21(9):1512-28. DOI: 10.1101/gr.123356.111; PMID: 21665927; PMC: PMC3166836