cfdna bam-to-bam

Apply filtering and corrections to the fragments in a BAM file and write to a new coordinate-sorted BAM file.

To use our corrections and filters in custom downstream analyses, you can apply them directly to a BAM file. The command filters which reads/fragments are written and adds correction weights as AUX tags on the reads. The new BAM file is coordinate-sorted.

The output BAM keeps the input BAM header and chromosome order.

NOTE: This is not needed for running other cfDNAlab tools. Those tools will not automatically use the correction tags.

GC bias correction

The GC bias correction weight that would normally be multiplied with the fragment's count value (1.0) is written as the AUX tag "GC" in the read(s).

Coverage-based genomic smoothing (--coverage-scaling-factors)

The coverage-based weight that would normally be multiplied with the fragment's count value (1.0 or the corrected value) is written as the AUX tag "cw" in the read(s).

Fragment count-based genomic smoothing (--count-scaling-factors)

The fragment-count-based weight that would normally be multiplied with the fragment's count value (1.0 or the corrected value) is written as the AUX tag "nw" in the read(s).
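Because each weight is described as a factor multiplied onto the fragment's count value, a downstream analysis could combine the tags into a single per-fragment weight. The following is a minimal sketch, not cfDNAlab code: it assumes the AUX tags of a read have already been collected into a plain dict by your BAM library of choice, and that a missing tag means the corresponding correction was not applied (treated here as a neutral 1.0). The function name `fragment_weight` is hypothetical.

```python
from math import prod

# AUX tags written by bam-to-bam that hold multiplicative weights.
WEIGHT_TAGS = ("GC", "cw", "nw")


def fragment_weight(tags: dict) -> float:
    """Multiply the correction weights found in a read's AUX tags.

    Assumption: a tag that is absent was not requested when the BAM was
    written, so it contributes a neutral factor of 1.0.
    """
    return prod(float(tags.get(tag, 1.0)) for tag in WEIGHT_TAGS)


# Example: a fragment with all three weights, plus the "fl" length tag,
# which is not a weight and is therefore ignored here.
example_tags = {"GC": 1.5, "cw": 0.5, "nw": 2.0, "fl": 167}
combined = fragment_weight(example_tags)
```

Summing `fragment_weight` over the fragments in a window would then give a corrected count, under the assumptions stated above.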

Fragment length

The fragment length is written to the AUX tag "fl".

Definition:

Paired-end: end(reverse) - start(forward).

Unpaired where each read is a fragment: end(read) - start(read).
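The two definitions can be sketched as follows. This is only an illustration, assuming 0-based, half-open coordinates (start inclusive, end exclusive); the function names are hypothetical and not part of cfDNAlab.

```python
def paired_fragment_length(forward_start: int, reverse_end: int) -> int:
    """Paired-end: end(reverse) - start(forward)."""
    return reverse_end - forward_start


def unpaired_fragment_length(read_start: int, read_end: int) -> int:
    """Unpaired, one read per fragment: end(read) - start(read)."""
    return read_end - read_start


# A forward read starting at 1000 whose reverse mate ends at 1167
# gives a fragment length of 1167 - 1000 = 167.
length = paired_fragment_length(forward_start=1000, reverse_end=1167)
```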

Always-on exclusion criteria

The following criteria always exclude a read:

The read is secondary, supplementary, or a duplicate. The read failed the quality check.

Paired-end input only: The read or its mate is unmapped. The read is mapped to a different reference sequence (tid) than its mate. The paired reads are not inwardly directed (we require: start(forward) <= start(reverse)).
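The criteria above can be expressed with the standard SAM flag bits. The sketch below is an illustration under assumptions, not the tool's implementation: the flag values come from the SAM specification, while the function names and the way positions are passed in are hypothetical.

```python
# SAM flag bits as defined in the SAM specification.
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

ALWAYS_EXCLUDED = SECONDARY | SUPPLEMENTARY | DUPLICATE | QC_FAIL


def read_is_excluded(flag: int) -> bool:
    """Criteria that apply to every read, paired or not."""
    return bool(flag & ALWAYS_EXCLUDED)


def pair_is_excluded(flag: int, tid: int, mate_tid: int,
                     forward_start: int, reverse_start: int) -> bool:
    """Additional criteria for paired-end input."""
    if read_is_excluded(flag):
        return True
    if flag & (UNMAPPED | MATE_UNMAPPED):
        return True  # read or mate is unmapped
    if tid != mate_tid:
        return True  # read and mate map to different reference sequences
    # Inward direction requires start(forward) <= start(reverse).
    return forward_start > reverse_start
```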

Usage

cfdna bam-to-bam [OPTIONS] --in-bam <IN_BAM> --out-bam <OUT_BAM>

Options

-h, --help

Print help (see a summary with '-h')

Core

-i, --in-bam <IN_BAM>

Indexed, coordinate-sorted BAM input file [path]

-o, --out-bam <OUT_BAM>

Path to write the coordinate-sorted BAM file to [path]

--reads-are-fragments

The input has one read per fragment and the read spans the full aligned fragment (e.g. Nanopore) [flag]

Each aligned read is treated as a fragment spanning its aligned reference interval [pos, reference_end). Some commands allow expanding this to include soft clipped bases.

Cannot be combined with --require-proper-pair (when available).

Windows

--by-bed <BY_BED>

BED file with intervals (windows); only fragments overlapping them are kept [path]

Reads that are part of a fragment that overlaps a window are considered for the new BAM file.

Chromosome Selection (select max. one arg.)

--chromosomes <CHROMOSOMES>...

Names of chromosomes to process (comma-separated or repeated). E.g. 'chr1,chr2,chr3'.

When no chromosomes are specified, the default is chr1..chr22.

Specify "all" as the only value to use all chromosomes from the command's configured contig source.

--chromosomes-file <CHROMOSOMES_FILE>

File with chromosome names to process (one per line)
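The selection rules for --chromosomes can be summarized in a short sketch. This is a hedged illustration of the documented behavior, not the tool's parser: `resolve_chromosomes` and its `available` parameter (standing in for the command's configured contig source) are hypothetical.

```python
DEFAULT_CHROMOSOMES = [f"chr{i}" for i in range(1, 23)]  # chr1..chr22


def resolve_chromosomes(values: list, available: list) -> list:
    """Resolve raw --chromosomes values into a list of names.

    values: one entry per occurrence of the option; each entry may be
        comma-separated, e.g. ["chr1,chr2", "chr3"].
    available: all contig names from the contig source (assumed input).
    """
    names = [name for value in values for name in value.split(",") if name]
    if not names:
        return list(DEFAULT_CHROMOSOMES)
    if names == ["all"]:
        return list(available)
    return names
```

For example, `resolve_chromosomes(["chr1,chr2", "chr3"], available)` yields the three named chromosomes, while an empty list falls back to the autosomes.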

Normalization

--coverage-scaling-factors <COVERAGE_SCALING_FACTORS>

Optional path to coverage-based scaling factors [path]

.tsv file as produced by cfdna coverage-weights.

--count-scaling-factors <COUNT_SCALING_FACTORS>

Optional path to count-based scaling factors [path]

.tsv file as produced by cfdna fragment-count-weights.

Filtering

--min-fragment-length <MIN_FRAGMENT_LENGTH>

Minimum fragment length to include [integer]

[default: 30]

--max-fragment-length <MAX_FRAGMENT_LENGTH>

Maximum fragment length to include [integer]

[default: 1000]

--min-mapq <MIN_MAPQ>

Minimum mapping quality to include [integer]

Defaults to 0 to allow making filtering decisions downstream.

[default: 0]

--require-proper-pair

Only count properly paired reads [flag]

This is NOT recommended by default, as it trims the tails of the length distribution.

Note that we only keep inward-directed fragments within the specified length range, so there's no real need for proper-pair filtering.

-b, --blacklist <BLACKLIST>...

Optional BED file(s) with blacklisted regions [path]

--blacklist-min-size <BLACKLIST_MIN_SIZE>

Minimum size of blacklist intervals to load (bp) [integer]

[default: 1]

--blacklist-strategy <BLACKLIST_STRATEGY>

Which positions of a fragment must overlap blacklisted regions for the fragment to be excluded [string]

Possible values: "any", "all", "midpoint", or "proportion=<threshold>"

Example of proportion: --blacklist-strategy proportion=0.2 (no space around =)

[default: any]
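One plausible reading of the four strategies is sketched below. The details are assumptions rather than documented behavior: coordinates are taken as half-open, blacklist intervals as non-overlapping, the midpoint as start + length // 2, and the proportion test as "at least the threshold". The function names are hypothetical.

```python
def blacklisted_bases(start: int, end: int, blacklist: list) -> int:
    """Count fragment bases covered by (non-overlapping) blacklist intervals."""
    return sum(max(0, min(end, b_end) - max(start, b_start))
               for b_start, b_end in blacklist)


def is_excluded(start: int, end: int, blacklist: list,
                strategy: str = "any") -> bool:
    """Decide whether the fragment [start, end) is excluded. Assumes end > start."""
    length = end - start
    covered = blacklisted_bases(start, end, blacklist)
    if strategy == "any":
        return covered > 0
    if strategy == "all":
        return covered == length
    if strategy == "midpoint":
        midpoint = start + length // 2
        return any(b_start <= midpoint < b_end for b_start, b_end in blacklist)
    if strategy.startswith("proportion="):
        threshold = float(strategy.split("=", 1)[1])
        return covered / length >= threshold
    raise ValueError(f"unknown blacklist strategy: {strategy}")


# A 100 bp fragment whose first 30 bp fall in a blacklisted region.
blacklist = [(100, 150)]
excluded_any = is_excluded(120, 220, blacklist, "any")
excluded_prop = is_excluded(120, 220, blacklist, "proportion=0.2")
```

In this example the fragment is excluded under "any" and under proportion=0.2 (30% of its bases are blacklisted), but not under "all" or "midpoint".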

GC Correction

--gc-file <GC_FILE>

Optional path to GC correction file made from the same BAM file with cfdna gc-bias [path]

The file is usually called gc_bias_correction.npz.

NOTE: Requires specifying the reference genome 2bit file as well.

--neutralize-invalid-gc

Keep fragments with unusable GC weights and weight them as 1.0 [flag]

By default, fragments are skipped when the GC correction cannot be computed or resolves to an unusable value. Set this flag to keep them instead and count them with neutral weight 1.0.

-r, --ref-2bit <REF_2BIT>

Optional 2bit reference genome file [path]

NOTE: Required for GC correction, otherwise ignored.

E.g., "hg38.2bit" from UCSC ( https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.2bit ).
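When scripting the command, the dependency between --gc-file and --ref-2bit can be checked before launching it. The helper below is a hypothetical wrapper, not part of cfDNAlab; it only assembles an argument list from options documented on this page, and the file names in the example are placeholders.

```python
def build_bam_to_bam_command(in_bam: str, out_bam: str,
                             gc_file: str = None,
                             ref_2bit: str = None) -> list:
    """Assemble a `cfdna bam-to-bam` argument list without running it."""
    if gc_file is not None and ref_2bit is None:
        # GC correction requires the 2bit reference genome as well.
        raise ValueError("--gc-file requires --ref-2bit")
    command = ["cfdna", "bam-to-bam", "--in-bam", in_bam, "--out-bam", out_bam]
    if gc_file is not None:
        command += ["--gc-file", gc_file, "--ref-2bit", ref_2bit]
    return command


# Placeholder paths for illustration only.
command = build_bam_to_bam_command(
    "sample.bam", "sample.corrected.bam",
    gc_file="gc_bias_correction.npz", ref_2bit="hg38.2bit",
)
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the Python standard library.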

Logging

--log <LOG>

Logging destination [stdout|quiet|file|file=<path>]

stdout keeps the normal run narrative on standard output.

quiet suppresses the normal run narrative and progress bars, while warnings and errors still go to stderr.

file writes the normal run narrative to an auto-generated log file under the command output directory.

file=<path> writes the normal run narrative to the exact path you provide.

[default: stdout]
