Package vcf

Class VcfRec

java.lang.Object
vcf.VcfRec
All Implemented Interfaces:
IntArray, DuplicatesGTRec, GTRec, MarkerContainer

public final class VcfRec extends Object implements GTRec

Class VcfRec represents a VCF record. If one allele in a diploid genotype is missing, then both alleles are set to missing.

Instances of class VcfRec are immutable.

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final String
    GL_FORMAT
    The VCF FORMAT code for log-scaled genotype likelihood data: "GL".
    static final String
    PL_FORMAT
    The VCF FORMAT code for phred-scaled genotype likelihood data: "PL".
  • Method Summary

    Modifier and Type
    Method
    Description
    int
    allele1(int sample)
    Returns the first allele for the specified sample or -1 if the allele is missing.
    int
    allele2(int sample)
    Returns the second allele for the specified sample or -1 if the allele is missing.
    int[]
    alleles()
    Returns an array of length this.size() whose j-th element is equal to this.get(j).
    String
    filter()
    Returns the FILTER field.
    String
    format()
    Returns the FORMAT field.
    String[]
    formatData(String formatCode)
    Returns an array of length this.size() containing the specified FORMAT subfield data for each sample.
    int
    formatIndex(String formatCode)
    Returns the index of the specified FORMAT subfield if the specified subfield is defined for this VCF record, and returns -1 otherwise.
    String
    formatSubfield(int subfieldIndex)
    Returns the specified FORMAT subfield.
    static VcfRec
    fromGL(VcfHeader vcfHeader, String vcfRecord, float maxLR)
    Constructs and returns a new VcfRec instance from a VCF record and its GL or PL format subfield data.
    static VcfRec
    fromGT(VcfHeader vcfHeader, String vcfRecord)
    Constructs and returns a new VcfRec instance from a VCF record and its GT format subfield data.
    static VcfRec
    fromGTGL(VcfHeader vcfHeader, String vcfRecord, float maxLR)
    Constructs and returns a new VcfRec instance from a VCF record and its GT, GL, and PL format subfield data.
    int
    get(int hap)
    Returns the specified allele for the specified haplotype or -1 if the allele is missing.
    float
    gl(int sample, int allele1, int allele2)
    Returns the probability of the observed data for the specified sample if the specified pair of ordered alleles is the true ordered genotype.
    static int
    gtIndex(int a1, int a2)
    Returns the VCF genotype index for the specified pair of alleles.
    boolean
    hasFormat(String formatCode)
    Returns true if the specified FORMAT subfield is present, and returns false otherwise.
    String
    info()
    Returns the INFO field.
    boolean
    isPhased()
    Returns true if every genotype for each sample is a phased, non-missing genotype, and returns false otherwise.
    boolean
    isPhased(int sample)
    Returns true if the genotype for the specified sample has non-missing alleles and is either haploid or diploid with a phased allele separator, and returns false otherwise.
    Marker
    marker()
    Returns the marker.
    int
    nFormatSubfields()
    Returns the number of FORMAT subfields.
    String
    qual()
    Returns the QUAL field.
    String
    sampleData(int sample)
    Returns the data for the specified sample.
    String
    sampleData(int sample, int subfieldIndex)
    Returns the specified data for the specified sample.
    String
    sampleData(int sample, String formatCode)
    Returns the specified data for the specified sample.
    Samples
    samples()
    Returns the list of samples.
    int
    size()
    Returns the number of haplotypes.
    String
    toString()
    Returns the VCF record.
    VcfHeader
    vcfHeader()
    Returns the VCF meta-information lines and the VCF header line.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Field Details

    • GL_FORMAT

      public static final String GL_FORMAT
      The VCF FORMAT code for log-scaled genotype likelihood data: "GL".
    • PL_FORMAT

      public static final String PL_FORMAT
      The VCF FORMAT code for phred-scaled genotype likelihood data: "PL".
  • Method Details

    • gtIndex

      public static int gtIndex(int a1, int a2)
      Returns the VCF genotype index for the specified pair of alleles.
      Parameters:
      a1 - the first allele
      a2 - the second allele
      Returns:
      the VCF genotype index for the specified pair of alleles
      Throws:
      IllegalArgumentException - if a1 < 0 || a2 < 0
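      The description above does not spell out the index formula. The following is a minimal sketch, assuming that gtIndex follows the genotype ordering defined in the VCF specification for GL and PL data (0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...) and that the order of the two arguments does not matter. The class name GtIndexSketch is hypothetical and is not part of the vcf package.

```java
/** Sketch of the VCF genotype-index formula for an unordered diploid genotype. */
final class GtIndexSketch {

    /**
     * Returns the position of genotype a1/a2 in the VCF ordering
     * 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ...
     * ASSUMPTION: argument order is ignored; the source does not state this.
     */
    static int gtIndex(int a1, int a2) {
        if (a1 < 0 || a2 < 0) {
            throw new IllegalArgumentException("negative allele: " + a1 + ", " + a2);
        }
        int lo = Math.min(a1, a2);
        int hi = Math.max(a1, a2);
        // hi*(hi+1)/2 genotypes have a larger allele that is smaller than hi
        return hi * (hi + 1) / 2 + lo;
    }

    public static void main(String[] args) {
        // Enumerate genotypes in VCF order for a marker with three alleles
        for (int a2 = 0; a2 <= 2; ++a2) {
            for (int a1 = 0; a1 <= a2; ++a1) {
                System.out.println(a1 + "/" + a2 + " -> " + gtIndex(a1, a2));
            }
        }
    }
}
```

      Under this ordering a marker with n alleles has n(n+1)/2 unordered genotypes, which is the expected number of values in a diploid GL or PL subfield.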
    • fromGT

      public static VcfRec fromGT(VcfHeader vcfHeader, String vcfRecord)
      Constructs and returns a new VcfRec instance from a VCF record and its GT format subfield data.
      Parameters:
      vcfHeader - meta-information lines and header line for the specified VCF record
      vcfRecord - a VCF record with a GT format field corresponding to the specified vcfHeader object
      Returns:
      a new VcfRec instance
      Throws:
      IllegalArgumentException - if the VCF record does not have a GT format field
      IllegalArgumentException - if a VCF record format error is detected
      IllegalArgumentException - if there are not vcfHeader.nHeaderFields() tab-delimited fields in the specified VCF record
      NullPointerException - if vcfHeader == null || vcfRecord == null
    • fromGL

      public static VcfRec fromGL(VcfHeader vcfHeader, String vcfRecord, float maxLR)
      Constructs and returns a new VcfRec instance from a VCF record and its GL or PL format subfield data. If both GL and PL format subfields are present, the GL format subfield is used. If the maximum normalized genotype likelihood is 1.0 for a sample, then any other genotype likelihood for the sample that is less than lrThreshold is set to 0.
      Parameters:
      vcfHeader - meta-information lines and header line for the specified VCF record
      vcfRecord - a VCF record with a GL or PL format field corresponding to the specified vcfHeader object
      maxLR - the maximum likelihood ratio
      Returns:
      a new VcfRec instance
      Throws:
      IllegalArgumentException - if the VCF record has neither a GL nor a PL format field
      IllegalArgumentException - if a VCF record format error is detected
      IllegalArgumentException - if there are not vcfHeader.nHeaderFields() tab-delimited fields in the specified VCF record
      NullPointerException - if vcfHeader == null || vcfRecord == null
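      To illustrate the likelihood handling described for fromGL, here is a minimal sketch of converting GL (log10-scaled) or PL (phred-scaled) values into likelihoods scaled so that the largest is 1.0, then zeroing values below a threshold. The class GenotypeLikelihoodSketch and its methods are hypothetical. How lrThreshold is derived from maxLR is not stated above, so the sketch takes the threshold as a plain parameter.

```java
/** Sketch: scale GL or PL data so the largest likelihood is 1.0, then apply a threshold. */
final class GenotypeLikelihoodSketch {

    /**
     * Converts log10-scaled GL values to likelihoods whose maximum is 1.0.
     * ASSUMPTION: values below {@code threshold} are set to 0; the relation
     * between this threshold and maxLR is not given in the documentation.
     */
    static float[] fromGL(double[] gl, float threshold) {
        double max = Double.NEGATIVE_INFINITY;
        for (double x : gl) {
            max = Math.max(max, x);
        }
        float[] like = new float[gl.length];
        for (int j = 0; j < gl.length; ++j) {
            // Subtracting the maximum in log space divides by the largest likelihood
            float f = (float) Math.pow(10.0, gl[j] - max);
            like[j] = (f < threshold) ? 0f : f;
        }
        return like;
    }

    /** Converts phred-scaled PL values using GL = -PL/10, then delegates to fromGL. */
    static float[] fromPL(int[] pl, float threshold) {
        double[] gl = new double[pl.length];
        for (int j = 0; j < pl.length; ++j) {
            gl[j] = -pl[j] / 10.0;
        }
        return fromGL(gl, threshold);
    }

    public static void main(String[] args) {
        float[] like = fromPL(new int[] {0, 10, 30}, 0.01f);
        System.out.println(java.util.Arrays.toString(like));
    }
}
```

      Working in log space before exponentiating avoids underflow when all GL values are strongly negative.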
    • fromGTGL

      public static VcfRec fromGTGL(VcfHeader vcfHeader, String vcfRecord, float maxLR)
      Constructs and returns a new VcfRec instance from a VCF record and its GT, GL, and PL format subfield data. If the GT format subfield is present and non-missing, the GT format subfield is used to determine genotype likelihoods. Otherwise the GL or PL format subfield is used to determine genotype likelihoods. If both the GL and PL format subfields are present, only the GL format subfield will be used. If the maximum normalized genotype likelihood is 1.0 for a sample, then any other genotype likelihood for the sample that is less than lrThreshold is set to 0.
      Parameters:
      vcfHeader - meta-information lines and header line for the specified VCF record
      vcfRecord - a VCF record with a GT, GL, or PL format field corresponding to the specified vcfHeader object
      maxLR - the maximum likelihood ratio
      Returns:
      a new VcfRec instance
      Throws:
      IllegalArgumentException - if the VCF record does not have a GT, GL, or PL format field
      IllegalArgumentException - if a VCF record format error is detected
      IllegalArgumentException - if there are not vcfHeader.nHeaderFields() tab-delimited fields in the specified VCF record
      NullPointerException - if vcfHeader == null || vcfRecord == null
    • qual

      public String qual()
      Returns the QUAL field.
      Returns:
      the QUAL field
    • filter

      public String filter()
      Returns the FILTER field.
      Returns:
      the FILTER field
    • info

      public String info()
      Returns the INFO field.
      Returns:
      the INFO field
    • format

      public String format()
      Returns the FORMAT field. Returns the empty string ("") if the FORMAT field is missing.
      Returns:
      the FORMAT field
    • nFormatSubfields

      public int nFormatSubfields()
      Returns the number of FORMAT subfields.
      Returns:
      the number of FORMAT subfields
    • formatSubfield

      public String formatSubfield(int subfieldIndex)
      Returns the specified FORMAT subfield.
      Parameters:
      subfieldIndex - a FORMAT subfield index
      Returns:
      the specified FORMAT subfield
      Throws:
      IndexOutOfBoundsException - if subfieldIndex < 0 || subfieldIndex >= this.nFormatSubfields()
    • hasFormat

      public boolean hasFormat(String formatCode)
      Returns true if the specified FORMAT subfield is present, and returns false otherwise.
      Parameters:
      formatCode - a FORMAT subfield code
      Returns:
      true if the specified FORMAT subfield is present
    • formatIndex

      public int formatIndex(String formatCode)
      Returns the index of the specified FORMAT subfield if the specified subfield is defined for this VCF record, and returns -1 otherwise.
      Parameters:
      formatCode - a FORMAT subfield code
      Returns:
      the index of the specified FORMAT subfield if the specified subfield is defined for this VCF record, and -1 otherwise
    • sampleData

      public String sampleData(int sample)
      Returns the data for the specified sample.
      Parameters:
      sample - a sample index
      Returns:
      the data for the specified sample
      Throws:
      IndexOutOfBoundsException - if sample < 0 || sample >= this.size()
    • sampleData

      public String sampleData(int sample, String formatCode)
      Returns the specified data for the specified sample.
      Parameters:
      sample - a sample index
      formatCode - a FORMAT subfield code
      Returns:
      the specified data for the specified sample
      Throws:
      IllegalArgumentException - if this.hasFormat(formatCode) == false
      IndexOutOfBoundsException - if sample < 0 || sample >= this.size()
    • sampleData

      public String sampleData(int sample, int subfieldIndex)
      Returns the specified data for the specified sample.
      Parameters:
      sample - a sample index
      subfieldIndex - a FORMAT subfield index
      Returns:
      the specified data for the specified sample
      Throws:
      IndexOutOfBoundsException - if subfieldIndex < 0 || subfieldIndex >= this.nFormatSubfields()
      IndexOutOfBoundsException - if sample < 0 || sample >= this.size()
    • formatData

      public String[] formatData(String formatCode)
      Returns an array of length this.size() containing the specified FORMAT subfield data for each sample. The k-th element of the array is the specified FORMAT subfield data for the k-th sample.
      Parameters:
      formatCode - a FORMAT subfield code
      Returns:
      an array of length this.size() containing the specified FORMAT subfield data for each sample
      Throws:
      IllegalArgumentException - if this.hasFormat(formatCode) == false
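      The FORMAT accessors (formatIndex, sampleData, formatData) can be pictured as lookups into a colon-delimited FORMAT column and the matching colon-delimited sample columns. The following is a minimal sketch that works on a tab-split VCF record rather than on a VcfRec. The class FormatFieldSketch is hypothetical, and reporting a dropped trailing subfield as "." is an assumption, not documented VcfRec behavior.

```java
import java.util.Arrays;

/** Sketch: FORMAT subfield lookups on the tab-delimited fields of one VCF record. */
final class FormatFieldSketch {

    private static final int FORMAT_COLUMN = 8;        // CHROM..INFO occupy columns 0-7
    private static final int FIRST_SAMPLE_COLUMN = 9;  // sample data follows FORMAT

    /** Returns the index of formatCode in the FORMAT column, or -1 if absent. */
    static int formatIndex(String[] fields, String formatCode) {
        String[] codes = fields[FORMAT_COLUMN].split(":");
        return Arrays.asList(codes).indexOf(formatCode);
    }

    /** Returns one FORMAT subfield value for one sample. */
    static String sampleData(String[] fields, int sample, String formatCode) {
        int index = formatIndex(fields, formatCode);
        if (index < 0) {
            throw new IllegalArgumentException("missing FORMAT subfield: " + formatCode);
        }
        String[] values = fields[FIRST_SAMPLE_COLUMN + sample].split(":");
        // ASSUMPTION: a dropped trailing subfield is reported as missing (".")
        return index < values.length ? values[index] : ".";
    }

    /** Returns the value of one FORMAT subfield for every sample, in sample order. */
    static String[] formatData(String[] fields, String formatCode) {
        int nSamples = fields.length - FIRST_SAMPLE_COLUMN;
        String[] data = new String[nSamples];
        for (int s = 0; s < nSamples; ++s) {
            data[s] = sampleData(fields, s, formatCode);
        }
        return data;
    }

    public static void main(String[] args) {
        String rec = "1\t100\trs1\tA\tG\t50\tPASS\t.\tGT:DP:GL"
                + "\t0|1:12:-1.0,-0.1,-2.0\t1/1:7:-3.0,-1.5,-0.01";
        String[] fields = rec.split("\t");
        System.out.println(formatIndex(fields, "GL"));
        System.out.println(Arrays.toString(formatData(fields, "GT")));
    }
}
```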
    • samples

      public Samples samples()
      Description copied from interface: GTRec
      Returns the list of samples.
      Specified by:
      samples in interface GTRec
      Returns:
      the list of samples
    • vcfHeader

      public VcfHeader vcfHeader()
      Returns the VCF meta-information lines and the VCF header line.
      Returns:
      the VCF meta-information lines and the VCF header line
    • marker

      public Marker marker()
      Description copied from interface: MarkerContainer
      Returns the marker.
      Specified by:
      marker in interface MarkerContainer
      Returns:
      the marker
    • allele1

      public int allele1(int sample)
      Description copied from interface: DuplicatesGTRec
      Returns the first allele for the specified sample or -1 if the allele is missing. The two alleles for a sample are arbitrarily ordered if this.isPhased(sample) == false.
      Specified by:
      allele1 in interface DuplicatesGTRec
      Parameters:
      sample - a sample index
      Returns:
      the first allele for the specified sample
    • allele2

      public int allele2(int sample)
      Description copied from interface: DuplicatesGTRec
      Returns the second allele for the specified sample or -1 if the allele is missing. The two alleles for a sample are arbitrarily ordered if this.isPhased(sample) == false.
      Specified by:
      allele2 in interface DuplicatesGTRec
      Parameters:
      sample - a sample index
      Returns:
      the second allele for the specified sample
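      The class description states that if one allele in a diploid genotype is missing, both alleles are set to missing. The following is a minimal sketch of that rule applied to a GT string. The class DiploidAllelesSketch is hypothetical, handles diploid genotypes only, and assumes "." marks a missing allele as in the VCF specification.

```java
/** Sketch: parse a diploid GT string into two alleles, with -1 for missing. */
final class DiploidAllelesSketch {

    /** Returns {allele1, allele2}; both are -1 if either allele is missing. */
    static int[] alleles(String gt) {
        String[] parts = gt.split("[/|]", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException("not a diploid genotype: " + gt);
        }
        int a1 = parse(parts[0]);
        int a2 = parse(parts[1]);
        // One missing allele makes the whole genotype missing
        if (a1 == -1 || a2 == -1) {
            return new int[] {-1, -1};
        }
        return new int[] {a1, a2};
    }

    private static int parse(String allele) {
        if (allele.equals(".")) {
            return -1;
        }
        int a = Integer.parseInt(allele);
        if (a < 0) {
            throw new IllegalArgumentException("negative allele: " + allele);
        }
        return a;
    }

    public static void main(String[] args) {
        for (String gt : new String[] {"0|1", "1/0", "0/.", "./."}) {
            int[] a = alleles(gt);
            System.out.println(gt + " -> " + a[0] + ", " + a[1]);
        }
    }
}
```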
    • get

      public int get(int hap)
      Description copied from interface: DuplicatesGTRec
      Returns the specified allele for the specified haplotype or -1 if the allele is missing. The two alleles for a sample at a marker are arbitrarily ordered if this.isPhased(hap/2) == false.
      Specified by:
      get in interface DuplicatesGTRec
      Specified by:
      get in interface IntArray
      Parameters:
      hap - a haplotype index
      Returns:
      the specified allele for the specified haplotype
    • alleles

      public int[] alleles()
      Description copied from interface: DuplicatesGTRec
      Returns an array of length this.size() whose j-th element is equal to this.get(j).
      Specified by:
      alleles in interface DuplicatesGTRec
      Returns:
      an array of length this.size() whose j-th element is equal to this.get(j)
    • isPhased

      public boolean isPhased(int sample)
      Description copied from interface: DuplicatesGTRec
      Returns true if the genotype for the specified sample has non-missing alleles and is either haploid or diploid with a phased allele separator, and returns false otherwise.
      Specified by:
      isPhased in interface DuplicatesGTRec
      Parameters:
      sample - a sample index
      Returns:
      true if the genotype for the specified sample is a phased, non-missing genotype
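      The phasing rule above can be sketched directly on a GT string: the genotype counts as phased if it has no missing allele and is either haploid (no separator) or diploid with the "|" separator. The class PhasedGenotypeSketch is hypothetical and assumes a well-formed GT value in which "." marks a missing allele.

```java
/** Sketch: decide whether a single GT value is a phased, non-missing genotype. */
final class PhasedGenotypeSketch {

    static boolean isPhased(String gt) {
        // An unphased separator anywhere makes the genotype unphased
        if (gt.isEmpty() || gt.indexOf('/') >= 0) {
            return false;
        }
        String[] alleles = gt.split("\\|", -1);
        // Only haploid (1 allele) and diploid (2 alleles) genotypes are considered
        if (alleles.length > 2) {
            return false;
        }
        for (String allele : alleles) {
            if (allele.isEmpty() || allele.equals(".")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String gt : new String[] {"0|1", "1", "0/1", ".|.", "0|."}) {
            System.out.println(gt + " -> " + isPhased(gt));
        }
    }
}
```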
    • isPhased

      public boolean isPhased()
      Description copied from interface: DuplicatesGTRec
      Returns true if every genotype for each sample is a phased, non-missing genotype, and returns false otherwise.
      Specified by:
      isPhased in interface DuplicatesGTRec
      Returns:
      true if the genotype for each sample is a phased, non-missing genotype
    • gl

      public float gl(int sample, int allele1, int allele2)
      Returns the probability of the observed data for the specified sample if the specified pair of ordered alleles is the true ordered genotype. Returns 1.0f if the corresponding genotype determined by the isPhased(), allele1(), and allele2() methods is consistent with the specified ordered genotype, and returns 0.0f otherwise.
      Parameters:
      sample - the sample index
      allele1 - the first allele index
      allele2 - the second allele index
      Returns:
      the probability of the observed data for the specified sample if the specified pair of ordered alleles is the true ordered genotype
      Throws:
      IndexOutOfBoundsException - if sample < 0 || sample >= this.size()
      IndexOutOfBoundsException - if allele1 < 0 || allele1 >= this.marker().nAlleles()
      IndexOutOfBoundsException - if allele2 < 0 || allele2 >= this.marker().nAlleles()
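      One way to read "consistent with the specified ordered genotype" is sketched below: a phased genotype matches only the same allele order, while an unphased genotype matches either order. The class GenotypeMatchSketch is hypothetical. Treating a missing genotype as consistent with every ordered genotype is an assumption that the description above does not confirm.

```java
/** Sketch: 0/1 likelihood of an ordered genotype given an observed genotype. */
final class GenotypeMatchSketch {

    /**
     * @param phased whether the observed genotype is phased
     * @param obs1 first observed allele, or -1 if missing
     * @param obs2 second observed allele, or -1 if missing
     * @param allele1 first allele of the ordered genotype being tested
     * @param allele2 second allele of the ordered genotype being tested
     */
    static float gl(boolean phased, int obs1, int obs2, int allele1, int allele2) {
        // ASSUMPTION: a missing genotype is consistent with every ordered genotype
        if (obs1 < 0 || obs2 < 0) {
            return 1.0f;
        }
        boolean sameOrder = (obs1 == allele1 && obs2 == allele2);
        boolean swapped = (obs1 == allele2 && obs2 == allele1);
        // The swapped order only counts when the observed genotype is unphased
        return (sameOrder || (!phased && swapped)) ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        System.out.println(gl(true, 0, 1, 1, 0));
        System.out.println(gl(false, 0, 1, 1, 0));
    }
}
```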
    • size

      public int size()
      Description copied from interface: DuplicatesGTRec
      Returns the number of haplotypes.
      Specified by:
      size in interface DuplicatesGTRec
      Specified by:
      size in interface IntArray
      Returns:
      the number of haplotypes
    • toString

      public String toString()
      Returns the VCF record.
      Overrides:
      toString in class Object
      Returns:
      the VCF record