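AssemblyRegion is a class in GATK (Genome Analysis Toolkit), a widely used toolset for variant discovery and genome analysis. It lives in the package org.broadinstitute.hellbender.engine and represents a region of the genome that is handed to the local assembly engine. In GATK's variant-calling workflow it defines and manages the regions in which variants are called.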
AssemblyRegion class overview
Main functions
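- Defining assembly regions: an AssemblyRegion represents a specific genomic region, usually one in which variants are to be detected. Such regions can be predefined (for example, regions known to have a high variation rate) or determined dynamically by the algorithm during variant calling.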
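- Supporting variant calling: during variant calling, an AssemblyRegion gives the algorithm access to the genomic data inside the region, so that detection and calling can be restricted to that region.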
- Integrating data: AssemblyRegion is typically used together with other data structures, such as VariantContext, ReferenceContext, and ReadsContext, to combine and process genomic data.
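To make the idea of dynamically determined regions concrete, here is a minimal sketch that groups consecutive sites whose activity score reaches a threshold into intervals. It is an illustration only and not GATK's ActivityProfile logic: the class ActiveRegionSketch, the helper type Span, the method findActiveSpans, and the threshold value are all hypothetical names and assumptions introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified illustration only; this is NOT GATK's ActivityProfile implementation. */
final class ActiveRegionSketch {

    /** Hypothetical helper: a 1-based, inclusive interval on a single contig. */
    static final class Span {
        final String contig;
        final int start;
        final int end;

        Span(String contig, int start, int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return contig + ":" + start + "-" + end;
        }
    }

    /**
     * Groups consecutive sites whose activity score is at least the threshold.
     * activity[i] is the score of the site at 1-based position firstPosition + i.
     */
    static List<Span> findActiveSpans(String contig, int firstPosition,
                                      double[] activity, double threshold) {
        List<Span> spans = new ArrayList<>();
        int runStart = -1; // index where the current active run began, or -1 if none
        for (int i = 0; i < activity.length; i++) {
            boolean active = activity[i] >= threshold;
            if (active && runStart < 0) {
                runStart = i; // a new active run starts here
            } else if (!active && runStart >= 0) {
                // the run ended at the previous site
                spans.add(new Span(contig, firstPosition + runStart, firstPosition + i - 1));
                runStart = -1;
            }
        }
        if (runStart >= 0) { // close a run that extends to the last site
            spans.add(new Span(contig, firstPosition + runStart,
                    firstPosition + activity.length - 1));
        }
        return spans;
    }

    public static void main(String[] args) {
        double[] activity = {0.0, 0.1, 0.9, 0.8, 0.0, 0.7};
        System.out.println(findActiveSpans("chr1", 100, activity, 0.5));
    }
}
```

In the real toolkit the resulting regions are additionally padded and bounded in size according to the tool's arguments; the sketch leaves that out on purpose.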
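Main properties and methods
The following are some commonly used properties and methods of the AssemblyRegion class (note that the exact names and behavior may differ between GATK versions):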
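Position and range:
- getContig(): returns the chromosome or contig on which the assembly region lies.
- getStart(): returns the start position of the assembly region (1-based).
- getEnd(): returns the end position of the assembly region (1-based).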
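Data access:
- getReads(): returns the reads associated with the assembly region.
- getReference(): returns the reference sequence for the assembly region. In the GATK 4 source this appears to be exposed as getAssemblyRegionReference(ReferenceSequenceFile), so check the version you are using.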
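Helper methods:
- isActive(): reports whether the assembly region is active, that is, whether it is considered during variant calling.
- addRead(): adds a read to the assembly region. In the GATK 4 source the method appears to be named add(GATKRead), so check the version you are using.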
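The accessors listed above can be pictured with a small stand-in class. This is a sketch under stated assumptions, not the GATK implementation: RegionSketch and its nested Read type are hypothetical, the method names follow the list in this article rather than a specific GATK release, and the overlap check in addRead is an assumed simplification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for illustration; not org.broadinstitute.hellbender.engine.AssemblyRegion. */
final class RegionSketch {

    /** Hypothetical minimal read: only its aligned interval, 1-based and inclusive. */
    static final class Read {
        final String contig;
        final int start;
        final int end;

        Read(String contig, int start, int end) {
            this.contig = contig;
            this.start = start;
            this.end = end;
        }
    }

    private final String contig;
    private final int start;   // 1-based, inclusive
    private final int end;     // 1-based, inclusive
    private final boolean active;
    private final List<Read> reads = new ArrayList<>();

    RegionSketch(String contig, int start, int end, boolean active) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("Invalid interval " + start + "-" + end);
        }
        this.contig = contig;
        this.start = start;
        this.end = end;
        this.active = active;
    }

    String getContig() { return contig; }
    int getStart() { return start; }
    int getEnd() { return end; }
    boolean isActive() { return active; }

    /** Read-only view, so callers cannot bypass the check in addRead. */
    List<Read> getReads() { return Collections.unmodifiableList(reads); }

    /** Assumed rule for this sketch: only reads overlapping the region are accepted. */
    void addRead(Read read) {
        boolean overlaps = read.contig.equals(contig)
                && read.start <= end && read.end >= start;
        if (!overlaps) {
            throw new IllegalArgumentException("Read does not overlap the region");
        }
        reads.add(read);
    }

    public static void main(String[] args) {
        RegionSketch region = new RegionSketch("chr1", 350, 450, true);
        region.addRead(new Read("chr1", 340, 360)); // overlaps the start of the region
        System.out.println(region.getContig() + ":" + region.getStart() + "-" + region.getEnd()
                + " active=" + region.isActive() + " reads=" + region.getReads().size());
    }
}
```

Because both coordinates are 1-based and inclusive, the region 350-450 used here covers 101 bases, not 100.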
Source code
package org.broadinstitute.hellbender.engine;

import htsjdk.samtools.SAMFileHeader;
import htsjdk.samtools.SAMSequenceDictionary;
import htsjdk.samtools.SAMSequenceRecord;
import htsjdk.samtools.reference.ReferenceSequenceFile;
import htsjdk.samtools.util.Locatable;
import org.broadinstitute.hellbender.exceptions.UserException;
import org.broadinstitute.hellbender.utils.IntervalUtils;
import org.broadinstitute.hellbender.utils.SimpleInterval;
import org.broadinstitute.hellbender.utils.Utils;
import org.broadinstitute.hellbender.utils.clipping.ReadClipper;
import org.broadinstitute.hellbender.utils.read.GATKRead;
import org.broadinstitute.hellbender.utils.read.ReadCoordinateComparator;
import org.broadinstitute.hellbender.utils.read.ReadUtils;

import java.util.*;
import java.util.stream.Collectors;

/**
 * Region of the genome that gets assembled by the local assembly engine.
 *
 * An AssemblyRegion is defined by two intervals -- a primary interval containing a territory for variant calling and a second,
 * padded, interval for assembly -- as well as the reads overlapping the padded interval. Although we do not call variants in the padded interval,
 * assembling over a larger territory improves calls in the primary territory.
 *
 * This concept is complicated somewhat by the fact that these intervals are mutable and the fact that the AssemblyRegion object lives on after
 * assembly during local realignment during PairHMM. Here is an example of the life cycle of an AssemblyRegion:
 *
 * Suppose that the HaplotypeCaller engine finds evidence for a het in a pileup at locus 400 -- that is, it produces
 * an {@code ActivityProfileState} with non-zero probability at site 400 and passes it to its {@code ActivityProfile}.
 * The {@code ActivityProfile} eventually produces an AssemblyRegion based on the {@code AssemblyRegionArgumentCollection} parameters.
 * Let's suppose that this initial region has primary span 350-450 and padded span 100 - 700.
 *
 * Next, the assembly engine assembles all reads that overlap the padded interval to find variant haplotypes and the variants
 * they contain. The AssemblyRegion is then trimmed down to a new primary interval bound by all assembled variants within the original primary interval
 * and a new padded interval. The amount of padding of the new padded interval around the var