Interface
svviz2 requires three types of files as input:
- one or more sorted, indexed BAM (or CRAM) files containing read data to be assessed
- a reference genome, provided as a fasta file that has been indexed using bwa
- a VCF file containing variants of interest. See below for more notes about this file.
Note about CRAM files: svviz2 uses pysam under the hood to access read data from BAM files. In theory pysam can also read CRAM files, but I have found its CRAM support to be somewhat less reliable. If you’re working with CRAM files, you’re probably best off making sure you have the latest version of pysam installed.
Representing Structural Variants in VCF files
Please note that the input VCF file is parsed using htslib and so the formatting must conform quite closely to the spec.
The following types of events are currently recognized by svviz2. Please submit an issue if you would like to request support for additional event types or formats.
Sequence-defined
An arbitrary event can be specified by including the exact reference and alternate sequence in the ref and alt fields of the event. These two fields must therefore contain only the letters A, C, G and T. For example, a deletion of 6 nucleotides that is replaced by an insertion of 8 nucleotides would be specified as “CAGGTCA” (ref) and “CTACGAAGT” (alt), where the pos coordinate of the event points to the position of the C (the base immediately upstream of the deletion and insertion). Sequences of any length can be placed in these fields, so an insertion of 8kb (for example, a LINE element) could include the full 8,000 base sequence of the inserted LINE in the alt field of the variant.
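As a minimal sketch of the example above, the following Python snippet assembles a tab-separated VCF data line for a sequence-defined event and checks that both alleles use only A, C, G and T. The helper name `sequence_defined_record`, the chromosome name and the position are hypothetical and chosen only for illustration; they are not part of svviz2.

```python
def sequence_defined_record(chrom, pos, ref, alt, variant_id="."):
    """Build one tab-separated VCF data line for a sequence-defined event.

    chrom, pos and variant_id are illustrative placeholders. pos is the
    1-based coordinate of the first base of ref (the shared leading base).
    """
    for name, seq in (("ref", ref), ("alt", alt)):
        # This sketch is strict and only accepts uppercase A, C, G and T.
        if not seq or set(seq) - set("ACGT"):
            raise ValueError(f"{name} must contain only A, C, G and T: {seq!r}")
    # Column order: CHROM POS ID REF ALT QUAL FILTER INFO
    return "\t".join([chrom, str(pos), variant_id, ref, alt, ".", ".", "."])


# The example from the text: a shared leading C, then 6 deleted bases
# replaced by 8 inserted bases.
ref, alt = "CAGGTCA", "CTACGAAGT"
line = sequence_defined_record("chr1", 1000, ref, alt)
print(line)
print("deleted:", len(ref) - 1, "inserted:", len(alt) - 1)
```

Subtracting one from each allele length accounts for the shared leading base that pos points to.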
Deletion
Deletions can be specified either by including the exact reference and alternate sequence in the ref and alt columns of the VCF file, or by specifying SVTYPE=DEL;END=<end coordinate> in the INFO field of the variant record. Please note that the end coordinate is the last genomic position that is deleted (i.e., inclusive).
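To make the inclusive end coordinate concrete, here is a rough Python sketch that turns the first and last deleted positions into a symbolic deletion record. It assumes the VCF spec convention that POS is the base immediately before the deleted bases and that the alt field is `<DEL>`; whether svviz2 needs exactly this layout is an assumption on my part. The helper name `deletion_record` and the `N` placeholder for the reference base are hypothetical.

```python
def deletion_record(chrom, first_deleted, last_deleted, padding_base="N"):
    """Build a symbolic deletion line from 1-based, inclusive coordinates.

    Assumptions (not confirmed by the svviz2 docs): POS is the base just
    before the deletion and ALT is the symbolic allele <DEL>.
    """
    if last_deleted < first_deleted:
        raise ValueError("last_deleted must not be smaller than first_deleted")
    pos = first_deleted - 1  # base preceding the deleted bases
    info = f"SVTYPE=DEL;END={last_deleted}"  # END is the last deleted base
    return "\t".join(
        [chrom, str(pos), ".", padding_base, "<DEL>", ".", ".", info]
    )


# Deleting positions 1001 through 1006 (6 bases) gives POS=1000, END=1006.
record = deletion_record("chr1", 1001, 1006)
print(record)
deleted_length = 1006 - 1001 + 1  # inclusive interval, so add 1
print("deleted bases:", deleted_length)
```

Under these assumptions the number of deleted bases equals END minus POS, because POS sits one base before the deletion and END sits on its last base.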
Insertion
Insertions may only be specified by including the exact reference and alternate sequence in the ref and alt fields. Only sequence-resolved insertions are supported!
Breakend
Any complex type of event such as a translocation can be represented as a pair of “breakends” which together specify the position and orientation of the two halves of the event. Please see the VCF spec for a detailed description of BND-type complex structural variants.
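For orientation, the VCF spec encodes each breakend in the alt field using one of four forms (`t[p[`, `t]p]`, `]p]t`, `[p[t`), where `t` is the local sequence and `p` is the mate position. The sketch below pulls those pieces apart with a regular expression. It is only an illustration of the spec's notation, not how svviz2 parses breakends internally, and the function name `parse_breakend_alt` is hypothetical.

```python
import re

# Local sequence t on exactly one side, mate position chrom:pos between a
# matching pair of square brackets.
BND_ALT = re.compile(
    r"^(?P<lead>[ACGTNacgtn]*)"
    r"(?P<open>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)(?P<close>[\[\]])"
    r"(?P<trail>[ACGTNacgtn]*)$"
)


def parse_breakend_alt(alt):
    """Split a BND alt string into mate position, bracket direction and t."""
    match = BND_ALT.match(alt)
    if match is None or match["open"] != match["close"]:
        raise ValueError(f"not a recognized breakend alt: {alt!r}")
    lead, trail = match["lead"], match["trail"]
    if bool(lead) == bool(trail):
        raise ValueError("expected sequence on exactly one side of the mate")
    return {
        "mate_chrom": match["chrom"],
        "mate_pos": int(match["pos"]),
        # "[" means the mate piece extends to the right of p, "]" to the left.
        "mate_extends": "right" if match["open"] == "[" else "left",
        # t written first means the mate piece is joined after t.
        "joined": "after" if lead else "before",
        "sequence": lead or trail,
    }


print(parse_breakend_alt("G]17:198982]"))
print(parse_breakend_alt("]13:123456]T"))
```

A translocation is then described by two such records, one per side of the junction, each pointing at the other's position.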
Inversion
Similar to deletions, inversions can be specified by including SVTYPE=INV;END=<end coordinate> in the INFO field of a variant. Remember that end coordinates are inclusive.
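To illustrate what an inclusive end coordinate means for an inversion, here is a small Python sketch that reverse-complements a 1-based, inclusive interval of a toy sequence. The function name `invert_segment` and the toy sequence are hypothetical, and treating `start` as the first inverted base is an assumption made for this illustration rather than a statement about how svviz2 interprets pos.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert_segment(sequence, start, end):
    """Reverse-complement sequence[start..end], 1-based and inclusive."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("start and end must satisfy 1 <= start <= end <= length")
    segment = sequence[start - 1:end]  # end is inclusive, so no "- 1" here
    inverted = segment.translate(COMPLEMENT)[::-1]
    return sequence[:start - 1] + inverted + sequence[end:]


# Inverting positions 3 through 5 ("AAC") of a toy reference.
print(invert_segment("TTAACCGG", 3, 5))  # TTGTTCGG
```

Because the end is inclusive, the base at position 5 is part of the inverted segment and the base at position 6 is the first one left untouched.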
Full command interface
A brief summary of all of svviz2’s arguments and options can be obtained by running svviz2 without any arguments at the command line:
ssw library not found
usage: svviz2 [options] --ref REF --variants VARIANTS BAM [BAM2 ...]
svviz2 version 2.0a3
optional arguments:
-h, --help show this help message and exit
Required arguments:
bam sorted, indexed bam file containing reads of interest to plot; can be
specified multiple times to load multiple samples
--ref REF, -r REF reference fasta file (a .faidx index file will be created if it doesn't
exist so you need write permissions for this directory)
--variants VARIANTS, -V VARIANTS
the variants to analyze, in vcf or bcf format (vcf files may be
compressed with gzip)
Optional arguments:
--outdir OUTDIR, -o OUTDIR
output directory for visualizations, summaries, etc (default: current
working directory)
--format FORMAT format for output visualizations; must be one of pdf, png or svg
(default: pdf, or svg if no suitable converter is found)
--savereads output the read realignments against the appropriate alt or ref allele
(default: false)
--min-mapq MIN_MAPQ only reads with mapq>=MIN_MAPQ will be analyzed; when analyzing
paired-end data, at least one read end must be near the breakpoints
with this mapq (default:0)
--align-distance ALIGN_DISTANCE
sequence upstream and downstream of breakpoints to include when
performing re-alignment (default: infer from data)
--batch-size BATCH_SIZE
Number of reads to analyze at once; larger batch-size values may run
more quickly but will require more memory (default=10000)
--downsample DOWNSAMPLE
Ensure the total number of reads per event per sample does not exceed
this number by downsampling (default: infinity)
--aligner ALIGNER The aligner to use for realigning reads; either ssw (smith-waterman) or
bwa (default=bwa)
--only-realign-locally
Only when using bwa as the aligner backend, when this option is enabled,
reads will only be aligned locally around the breakpoints and not also
against the full reference genome (default: False)
--fast More aggressively skip reads that are unlikely to overlap
the breakpoints (default: false)
--first-variant FIRST_VARIANT
Skip all variants before this variant; counting starts with first
variant in input VCF as 0 (default: 0)
--last-variant LAST_VARIANT
Skip all variants after this variant; counting starts with first
variant in input VCF as 0 (default: end of vcf)
--render-only
--no-render
--dotplots-only
--no-dotplots
--report-only
--no-report
--only-plot-context ONLY_PLOT_CONTEXT
Only show this many nucleotides before the first breakpoint, and the
last breakpoint in each region (default: show as much context as needed
to show all reads fully)
--also-plot-context ALSO_PLOT_CONTEXT
Generates two plots per event, one using the default settings, and one
generatedby zooming in on the breakpoints as per the
--only-plot-context option
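If you drive svviz2 from a script, a wrapper along the following lines can assemble the command from the options listed above. This is only a sketch: the function name `build_svviz2_command` and all file names are hypothetical, and only a handful of the options from the help text are covered.

```python
import shlex


def build_svviz2_command(ref, variants, bams, outdir=None, fmt=None,
                         min_mapq=None, first_variant=None, last_variant=None):
    """Assemble an svviz2 argument list using options from the help text."""
    if not bams:
        raise ValueError("at least one BAM file is required")
    cmd = ["svviz2", "--ref", ref, "--variants", variants]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    if fmt is not None:
        if fmt not in ("pdf", "png", "svg"):
            raise ValueError("format must be one of pdf, png or svg")
        cmd += ["--format", fmt]
    if min_mapq is not None:
        cmd += ["--min-mapq", str(min_mapq)]
    # Variant indices are counted from 0, starting at the first VCF record.
    if first_variant is not None:
        cmd += ["--first-variant", str(first_variant)]
    if last_variant is not None:
        cmd += ["--last-variant", str(last_variant)]
    # BAM files are positional and can be given more than once.
    return cmd + list(bams)


cmd = build_svviz2_command(
    "ref.fa", "variants.vcf", ["sample1.bam", "sample2.bam"],
    outdir="svviz2_out", fmt="svg", first_variant=0, last_variant=9,
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems with paths that contain spaces. In this example, variants before index 0 and after index 9 would be skipped, per the help text.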