Introduction to the GATK AssemblyRegion Class

Author: 猴君

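AssemblyRegion is a class in GATK (Genome Analysis Toolkit) for handling genomic assembly regions. GATK is a widely used toolkit for variant discovery and genome analysis. The AssemblyRegion class plays an important role in GATK's variant calling pipeline, where it is mainly used to define and manage the regions in which variants are called.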

Overview of the AssemblyRegion Class

Main Functions
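  • Defining assembly regions

    • AssemblyRegion represents a specific genomic region, typically one in which variants are to be detected. Such regions can be predefined (for example, regions with a high variation rate) or determined dynamically by an algorithm (for example, regions identified during variant calling).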
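  • Supporting variant calling

    • During variant calling, AssemblyRegion provides access to the genomic data inside the region, so that algorithms can detect and call variants within it.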
  • Integrating data

    • AssemblyRegion is typically used together with other data structures (such as VariantContext, ReferenceContext, and ReadsContext) to integrate and process genomic data.
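To make the idea of dynamically determined regions more concrete, here is a minimal sketch in Java. It assumes each locus has already been given an activity score and simply groups consecutive loci whose score reaches a threshold. The class ActiveRegionSketch, its Span type, the scores, and the threshold are all hypothetical and exist only for this illustration; GATK's own logic for deriving regions from its activity profile is more involved and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not GATK code: groups loci with high activity scores into spans. */
final class ActiveRegionSketch {

    /** A closed, 1-based interval on a single contig (illustrative stand-in for a genomic span). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + "-" + end;
        }
    }

    /**
     * Returns the maximal runs of consecutive loci whose score is >= threshold.
     * scores[i] is the score of the 1-based position firstPosition + i.
     */
    static List<Span> findActiveSpans(double[] scores, int firstPosition, double threshold) {
        List<Span> spans = new ArrayList<>();
        int runStart = -1; // index where the current active run began, or -1 if no run is open
        for (int i = 0; i < scores.length; i++) {
            boolean active = scores[i] >= threshold;
            if (active && runStart < 0) {
                runStart = i; // open a new run
            } else if (!active && runStart >= 0) {
                spans.add(new Span(firstPosition + runStart, firstPosition + i - 1));
                runStart = -1; // close the run
            }
        }
        if (runStart >= 0) { // a run that reaches the end of the array
            spans.add(new Span(firstPosition + runStart, firstPosition + scores.length - 1));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Made-up scores for positions 100..107.
        double[] scores = {0.0, 0.01, 0.6, 0.9, 0.3, 0.0, 0.0, 0.8};
        System.out.println(findActiveSpans(scores, 100, 0.2)); // prints [102-104, 107-107]
    }
}
```

Each resulting span plays the role of a candidate region that a caller would go on to examine more closely.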
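Main Attributes and Methods

The following are some commonly used attributes and methods of the AssemblyRegion class (note that the exact implementation and method names may differ between GATK versions):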

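  1. Position and range

    • getContig(): returns the chromosome or contig on which the assembly region lies.
    • getStart(): returns the start position of the assembly region (1-based).
    • getEnd(): returns the end position of the assembly region (1-based, inclusive).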
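  2. Data access

    • getReads(): returns the reads associated with the assembly region.
    • getReference(): returns the reference sequence for the assembly region. In recent GATK 4 sources this appears to be exposed as getAssemblyRegionReference(ReferenceSequenceFile) rather than a no-argument getter, so check the version you are using.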
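  3. Helper methods

    • isActive(): reports whether the assembly region is active, that is, whether it is considered for variant calling.
    • addRead(): adds a read to the assembly region. In recent GATK 4 sources this method appears to be named add(GATKRead), so check the version you are using.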
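The accessors listed above can be pictured with a small stand-in class. The sketch below is not the GATK class: ToyRegion and ToyRead are hypothetical names, the method names simply mirror the list above, and rejecting non-overlapping reads in addRead() is a choice made for this illustration. It only shows how a region with 1-based inclusive coordinates, an active flag, and a collection of reads fits together.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified stand-in for illustration only; NOT org.broadinstitute.hellbender.engine.AssemblyRegion. */
final class ToyRegion {

    /** Hypothetical minimal read: a name plus 1-based inclusive alignment coordinates. */
    static final class ToyRead {
        final String name;
        final int start;
        final int end;

        ToyRead(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    private final String contig;
    private final int start;      // 1-based, inclusive
    private final int end;        // 1-based, inclusive
    private final boolean active;
    private final List<ToyRead> reads = new ArrayList<>();

    ToyRegion(String contig, int start, int end, boolean active) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("invalid span " + start + "-" + end);
        }
        this.contig = contig;
        this.start = start;
        this.end = end;
        this.active = active;
    }

    String getContig() { return contig; }
    int getStart() { return start; }
    int getEnd() { return end; }
    boolean isActive() { return active; }

    /** Number of bases covered; both ends are inclusive, hence the +1. */
    int size() { return end - start + 1; }

    /** Read-only view of the reads added so far. */
    List<ToyRead> getReads() { return Collections.unmodifiableList(reads); }

    /** Adds a read; this sketch rejects reads that do not overlap the region. */
    void addRead(ToyRead read) {
        boolean overlaps = read.start <= end && read.end >= start;
        if (!overlaps) {
            throw new IllegalArgumentException("read " + read.name + " does not overlap the region");
        }
        reads.add(read);
    }

    public static void main(String[] args) {
        ToyRegion region = new ToyRegion("chr1", 350, 450, true);
        region.addRead(new ToyRead("read1", 340, 360)); // overlaps positions 350-360
        System.out.println(region.getContig() + ":" + region.getStart() + "-" + region.getEnd()
                + " active=" + region.isActive() + " reads=" + region.getReads().size());
        // prints chr1:350-450 active=true reads=1
    }
}
```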

Source Code

package org.broadinstitute.hellbender.engine;

import htsjdk.samtools.SAMFileHeader;
import htsjdk.samtools.SAMSequenceDictionary;
import htsjdk.samtools.SAMSequenceRecord;
import htsjdk.samtools.reference.ReferenceSequenceFile;
import htsjdk.samtools.util.Locatable;
import org.broadinstitute.hellbender.exceptions.UserException;
import org.broadinstitute.hellbender.utils.IntervalUtils;
import org.broadinstitute.hellbender.utils.SimpleInterval;
import org.broadinstitute.hellbender.utils.Utils;
import org.broadinstitute.hellbender.utils.clipping.ReadClipper;
import org.broadinstitute.hellbender.utils.read.GATKRead;
import org.broadinstitute.hellbender.utils.read.ReadCoordinateComparator;
import org.broadinstitute.hellbender.utils.read.ReadUtils;

import java.util.*;
import java.util.stream.Collectors;

/**
 * Region of the genome that gets assembled by the local assembly engine.
 *
 * An AssemblyRegion is defined by two intervals -- a primary interval containing a territory for variant calling and a second,
 * padded, interval for assembly -- as well as the reads overlapping the padded interval.  Although we do not call variants in the padded interval,
 * assembling over a larger territory improves calls in the primary territory.
 *
 * This concept is complicated somewhat by the fact that these intervals are mutable and the fact that the AssemblyRegion object lives on after
 * assembly during local realignment during PairHMM.  Here is an example of the life cycle of an AssemblyRegion:
 *
 * Suppose that the HaplotypeCaller engine finds evidence for a het in a pileup at locus 400 -- that is, it produces
 * an {@code ActivityProfileState} with non-zero probability at site 400 and passes it to its {@code ActivityProfile}.
 * The {@code ActivityProfile} eventually produces an AssemblyRegion based on the {@code AssemblyRegionArgumentCollection} parameters.
 * Let's suppose that this initial region has primary span 350-450 and padded span 100 - 700.
 *
 * Next, the assembly engine assembles all reads that overlap the padded interval to find variant haplotypes and the variants
 * they contain.  The AssemblyRegion is then trimmed down to a new primary interval bound by all assembled variants within the original primary interval
 * and a new padded interval.  The amount of padding of the new padded interval around the var
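
The interval arithmetic in the life-cycle example of the class comment (primary span 350-450, padded span 100-700, then trimming to the assembled variants) can be sketched as follows. This is a minimal illustration, not GATK's implementation: RegionSpanSketch, Span, pad, and trimToVariants are hypothetical names, the padding of 250 is simply the value that reproduces the numbers in the comment, and the contig length, variant positions, and post-trim padding of 50 are made up, since the excerpt is cut off before it states how the new padding is chosen.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of the padding and trimming arithmetic; not GATK code. */
final class RegionSpanSketch {

    /** A closed interval with 1-based, inclusive coordinates. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            if (start < 1 || end < start) {
                throw new IllegalArgumentException("invalid span " + start + "-" + end);
            }
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + "-" + end;
        }
    }

    /** Expands a span by `padding` bases on each side, clipped to [1, contigLength]. */
    static Span pad(Span span, int padding, int contigLength) {
        int start = Math.max(1, span.start - padding);
        int end = Math.min(contigLength, span.end + padding);
        return new Span(start, end);
    }

    /**
     * Shrinks the primary span to the tightest span covering the variant positions
     * that fall inside it; empty if no variant lies within the primary span.
     */
    static Optional<Span> trimToVariants(Span primary, List<Integer> variantPositions) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int pos : variantPositions) {
            if (pos >= primary.start && pos <= primary.end) {
                min = Math.min(min, pos);
                max = Math.max(max, pos);
            }
        }
        return min == Integer.MAX_VALUE ? Optional.empty() : Optional.of(new Span(min, max));
    }

    public static void main(String[] args) {
        int contigLength = 1000;                               // made-up contig length
        Span primary = new Span(350, 450);                     // primary span from the example
        System.out.println(pad(primary, 250, contigLength));   // prints 100-700

        List<Integer> variants = Arrays.asList(400, 420, 900); // made-up variant positions
        Span trimmed = trimToVariants(primary, variants).get();
        System.out.println(trimmed);                           // prints 400-420
        System.out.println(pad(trimmed, 50, contigLength));    // prints 350-470
    }
}
```

The variant at 900 lies outside the original primary span and is ignored, which matches the comment's statement that the new primary interval is bounded by variants within the original primary interval.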
