cfdna fcoverage

Count positional fragment coverage across the genome.

In paired-end mode, only fragments with both reads present are considered. By default, the entire fragment span is counted, except for deletions and skipped regions that are not covered by the other read.

Fragment span definition

Paired-end: [forward.pos, reverse.reference_end), the reference span from the first aligned position on the forward read to the last aligned position on the reverse read.

Unpaired where each read is a fragment: [read.pos, read.reference_end), the reference span from the first to the last aligned position on the read.
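
The paired-end span rule can be illustrated with a small sketch. It is not the tool's implementation. It assumes each read is described by its aligned reference blocks as half-open (start, end) pairs, and that a deletion or skipped region in one read is dropped unless the other read has aligned bases there. The function names are hypothetical.

```python
def aligned_positions(blocks):
    """Reference positions covered by aligned blocks, given as half-open (start, end) pairs."""
    return {pos for start, end in blocks for pos in range(start, end)}


def internal_gaps(blocks):
    """Positions between consecutive aligned blocks of one read (deletions, skipped regions)."""
    gaps = set()
    for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
        gaps.update(range(prev_end, next_start))
    return gaps


def counted_positions(forward_blocks, reverse_blocks):
    """Positions of a paired-end fragment that receive coverage under the assumed rule."""
    span_start = forward_blocks[0][0]  # forward.pos
    span_end = reverse_blocks[-1][1]   # reverse.reference_end
    span = set(range(span_start, span_end))
    # Assumption: drop a read's internal gaps unless the other read is aligned there.
    dropped = (internal_gaps(forward_blocks) - aligned_positions(reverse_blocks)) | (
        internal_gaps(reverse_blocks) - aligned_positions(forward_blocks)
    )
    return sorted(span - dropped)


forward = [(100, 110), (115, 120)]  # 5 bp deletion at [110, 115)
reverse = [(130, 140)]
positions = counted_positions(forward, reverse)
print(len(positions))  # 35: the 40 bp span minus the 5 uncovered deletion positions
```

Note that the inter-mate gap [120, 130) is counted here, matching the default behaviour described above.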

Windowing

When specifying windows (--by-bed, --by-grouped-bed, or --by-size), one of the following outputs is possible:

- Get the average (default) or total coverage per window. Average is NaN when no positions are eligible, for example when the whole window is blacklisted.
- Get summary statistics per window or grouped row. Derived statistics with no eligible positions are NaN.
- Get the positional coverage for the included windows only (--by-bed only). Excludes all positions that do not overlap a window from the output. Choose between:
  1. Indexed: Adds the original window index as an output column and keeps duplicate positions.
  2. Unique: Overlapping windows are merged to avoid duplicate positions.

Without windowing, positional coverage is output for the selected chromosomes.
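
As a rough illustration of the per-window average and total, the sketch below reduces per-position coverage over one window. It assumes the average is taken over eligible (non-blacklisted) positions only, which is consistent with the NaN behaviour described above. The function name and the choice to return 0 for an empty total are assumptions.

```python
import math


def window_stat(coverage, eligible, start, end, stat="average"):
    """Reduce per-position coverage over the half-open window [start, end).

    coverage: list of per-position values.
    eligible: list of booleans, False for masked (e.g. blacklisted) positions.
    """
    values = [coverage[pos] for pos in range(start, end) if eligible[pos]]
    total = sum(values)
    if stat == "total":
        return total  # assumption: an empty window sums to 0
    # Average over eligible positions; NaN when none are eligible.
    return total / len(values) if values else math.nan


coverage = [2, 2, 4, 0, 6]
eligible = [True, True, False, True, True]
print(window_stat(coverage, eligible, 0, 5))           # 2.5
print(window_stat(coverage, eligible, 0, 5, "total"))  # 10
print(window_stat(coverage, eligible, 2, 3))           # nan
```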

Positional output and tiles

Positional outputs are written tile by tile to keep memory use low. This means coverage segments can be split at genomic tile boundaries even when the coverage value stays the same. The covered positions and coverage values stay the same, but the bedGraph rows may be shorter than they would be in a single-pass whole-chromosome run.

Reduced outputs like per-window average and total are merged across tiles, so tile boundaries should not affect their final values.
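
If downstream tools need maximal runs, rows that were split at a tile boundary can be rejoined after decompression. The sketch below is one possible post-processing step, not part of the tool. It assumes the rows are already parsed into (chrom, start, end, value) tuples and sorted by chromosome and start.

```python
def merge_adjacent_rows(rows):
    """Merge touching bedGraph rows on the same chromosome with equal values."""
    merged = []
    for chrom, start, end, value in rows:
        if merged:
            prev_chrom, prev_start, prev_end, prev_value = merged[-1]
            if prev_chrom == chrom and prev_end == start and prev_value == value:
                # Extend the previous row instead of starting a new one.
                merged[-1] = (chrom, prev_start, end, value)
                continue
        merged.append((chrom, start, end, value))
    return merged


rows = [
    ("chr1", 9_999_990, 10_000_000, 3.0),   # ends at a tile boundary
    ("chr1", 10_000_000, 10_000_020, 3.0),  # same value continues in the next tile
    ("chr1", 10_000_020, 10_000_030, 2.0),
]
print(merge_adjacent_rows(rows))
```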

Blacklisting

Blacklisted positions are excluded from positional and aggregate coverage outputs.

GC correction

Reduce the global GC bias (a common technical bias) in the coverage by weighting the contribution of each fragment. Two options:

--gc-file: Weight the contribution of each fragment by its length and GC content using a precomputed correction matrix from cfdna gc-bias. The GC correction matrix should be calculated from the same BAM file, as the bias is sample-specific.

--gc-tag: Weight the contribution of each fragment by a weight saved as an aux tag in the BAM reads. Allows using external GC packages like GCParagon and GCfix (both use the tag "GC").

Temporary files

Temporary files are written to a <output-dir>/tmp.<output-prefix>.<random> directory to reduce memory use. When no output prefix is given, the directory becomes <output-dir>/tmp.<random>. This directory is deleted at the end of the run. If the run is interrupted, the directory may be left behind.

Always-on exclusion criteria

The following criteria always exclude a read:

- The read is secondary, supplementary, or duplicate.
- The read failed quality check.

Paired-end input only:

- The read or mate read is unmapped.
- The read is mapped to a different tid than the mate.
- The paired reads are not inwardly directed (we require: start(forward) <= start(reverse)).
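
The criteria above can be expressed against the standard SAM flag bits. The sketch below is an approximation, not the tool's code: the Read class and function name are hypothetical, and treating mates on the same strand as "not inwardly directed" is an assumption on top of the stated start(forward) <= start(reverse) requirement.

```python
from dataclasses import dataclass

# Standard SAM flag bits.
UNMAPPED, MATE_UNMAPPED = 0x4, 0x8
REVERSE, MATE_REVERSE = 0x10, 0x20
SECONDARY, QC_FAIL, DUPLICATE, SUPPLEMENTARY = 0x100, 0x200, 0x400, 0x800


@dataclass
class Read:
    """Hypothetical minimal read record for this sketch."""
    flag: int
    tid: int
    pos: int
    mate_tid: int
    mate_pos: int


def always_excluded(read, paired_end=True):
    """Return True when the read matches one of the always-on exclusion criteria."""
    if read.flag & (SECONDARY | SUPPLEMENTARY | DUPLICATE | QC_FAIL):
        return True
    if not paired_end:
        return False
    if read.flag & (UNMAPPED | MATE_UNMAPPED):
        return True
    if read.tid != read.mate_tid:
        return True
    is_reverse = bool(read.flag & REVERSE)
    mate_is_reverse = bool(read.flag & MATE_REVERSE)
    if is_reverse == mate_is_reverse:
        return True  # assumption: same-strand mates cannot be inward-facing
    forward_start, reverse_start = (
        (read.mate_pos, read.pos) if is_reverse else (read.pos, read.mate_pos)
    )
    return forward_start > reverse_start  # require start(forward) <= start(reverse)


print(always_excluded(Read(flag=99, tid=0, pos=100, mate_tid=0, mate_pos=200)))  # False
```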

Usage

cfdna fcoverage [OPTIONS] --bam <BAM> --output-dir <OUTPUT_DIR>
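
When calling the command from a Python pipeline, the argument list might be assembled as in the sketch below. Only flags documented on this page are used. The file names, the window size, and the thread count are placeholders, not recommendations.

```python
import shlex

# Placeholder paths and values; replace with your own.
command = [
    "cfdna", "fcoverage",
    "--bam", "sample.bam",
    "--output-dir", "results",
    "--output-prefix", "sample",
    "--by-size", "100000",
    "--per-window", "average",
    "--blacklist", "blacklist.bed",
    "--n-threads", "4",
]

# Print the shell-quoted command line for inspection.
print(shlex.join(command))

# To execute it, something like subprocess.run(command, check=True) could be used.
```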

Options

-h, --help

Print help (see a summary with '-h')

Core

-i, --bam <BAM>

Indexed, coordinate-sorted BAM input file [path]

Can be either paired-end or unpaired (set --reads-are-fragments). Unpaired assumes the reads span their fragments exactly (so read size is fragment size).
-o, --output-dir <OUTPUT_DIR>

Output directory for results [path]
-t, --n-threads <N_THREADS>

Number of threads to use (increases RAM usage) [integer]

Defaults to the number of available CPU cores (-1).

[default: auto]
--reads-are-fragments

The input has one read per fragment and the read spans the full aligned fragment (e.g. Nanopore) [flag]

Each aligned read is treated as a fragment spanning its aligned reference interval [pos, reference_end). Some commands allow expanding this to include soft clipped bases.

Cannot be combined with --require-proper-pair (when available).
--normalize-by-length-mode [<NORMALIZE_BY_LENGTH_MODE>]

Divide the contribution of each fragment by the number of countable bases [string/flag]

By default, we count each fragment as 1.0 in each covered position (before correction/scaling). That weights longer fragments higher than shorter fragments in the overall mass as they are counted in more positions. If we want each fragment to contribute the same mass, we can divide the per-position 1.0 weight by the number of countable positions.

Interpretation: Per-base fragment support after normalizing each fragment to a total weight of 1.0 before correction/scaling. For --per-window total this approximates fragment counts (in sufficiently large windows). For --per-window average this approximates fragment-count density, i.e. fragment counts divided by window length.

Modes:
- unit-mass: Specifying --normalize-by-length or --normalize-by-length=unit-mass uses the described "all fragments contribute a mass of 1" mode. TIP: In this mode, we suggest setting --decimals 3.
- restore-mean: Specifying --normalize-by-length=restore-mean restores the global mean by multiplying the final output by the observed mean normalization length (the countable bases) after counting in unit-mass space.
This setting is reflected in the output filenames: length_normalized for unit-mass, length_normalized.restored_mean for restore-mean.

Blacklisted positions still count toward the normalization denominator to avoid large values around blacklisted regions (edge effects).

[default: off]
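
The two modes can be sketched on toy data. This is a simplified illustration, not the tool's implementation: it assumes the number of countable bases equals the fragment span length (no deletions, no --ignore-gap), and the function name is hypothetical.

```python
from collections import defaultdict


def positional_coverage(fragments, mode="off"):
    """Per-position coverage for half-open (start, end) fragment spans.

    mode: "off" (each fragment adds 1.0 per position), "unit-mass", or "restore-mean".
    """
    lengths = [end - start for start, end in fragments]
    coverage = defaultdict(float)
    for (start, end), length in zip(fragments, lengths):
        weight = 1.0 if mode == "off" else 1.0 / length
        for pos in range(start, end):
            coverage[pos] += weight
    if mode == "restore-mean" and lengths:
        # Multiply by the observed mean normalization length to restore the global mean.
        mean_length = sum(lengths) / len(lengths)
        for pos in coverage:
            coverage[pos] *= mean_length
    return dict(coverage)


fragments = [(0, 10), (5, 25)]  # lengths 10 and 20
raw_total = sum(positional_coverage(fragments).values())
unit_total = sum(positional_coverage(fragments, "unit-mass").values())
restored_total = sum(positional_coverage(fragments, "restore-mean").values())
print(raw_total, round(unit_total, 6), round(restored_total, 6))
```

In unit-mass mode the total is approximately the number of fragments (2), while restore-mean brings the total back to the raw level (30) with each fragment still contributing equal mass.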
-x, --output-prefix <OUTPUT_PREFIX>

Optional prefix for output files (e.g., a sample name) [string]

Leave empty to write filenames without a leading prefix.

For example, set a prefix so that multiple calls to this software can write to the same output directory.

Examples produce files like: <prefix>.fcoverage.per_position.bedgraph.zst, <prefix>.fcoverage.per_position_per_window.tsv.zst, <prefix>.fcoverage.average.tsv.zst, <prefix>.fcoverage.total.tsv.zst, or <prefix>.fcoverage.summary_stats.tsv.zst.
--decimals <DECIMALS>

Decimals to round coverage to when writing [integer]

NOTE: When floating point precision is not needed because all coverages are integers, values are written without decimals.

[default: 2]
--keep-zero-runs

Output zero-coverage runs in positional coverage outputs [flag]

By default, only covered positions are written to the output.
--tile-size <TILE_SIZE>

Size of tiles to parallelize over [integer]

Chromosomes are processed in tiles of this size to reduce memory usage.

[default: 10000000]
--per-window <PER_WINDOW>

What to return per window [string]

Possible values:
- "average": Get the average coverage per window (default). Returns NaN when the window has no eligible positions after masking.
- "total": Get the total coverage per window.
- "summary-stats": Get raw and derived coverage summary statistics per window. Average, variance, standard deviation, CV, and covered fraction are NaN when the window has no eligible positions after masking.
For --by-bed only:
- "unique-positions": Get the positional coverage for the included windows only. Overlapping windows are merged to avoid duplicate positions. Excludes all positions that do not overlap a window from the output.
- "indexed-positions": Get the positional coverage for the included windows only. Adds the original window index as an output column and keeps duplicate positions. Excludes all positions that do not overlap a window from the output. NOTE: The output is first sorted by chromosome, tile index, and window start. Then the coverage segments are sorted by start- and end coordinates. Window indices may thus not be contiguous. Depending on your needs, sort downstream.
For --by-grouped-bed only, three further "-on-unique-bases" options, where we merge overlapping or touching windows within each group before calculating the statistics:
- "average-on-unique-bases": Get the average coverage across merged within-group windows.
- "total-on-unique-bases": Get the total coverage across merged within-group windows.
- "summary-stats-on-unique-bases": Get grouped summary statistics across merged within-group windows.
Without windowing, only positional coverage output is supported.

[default: average]
--ignore-gap

Ignore inter-mate gap [flag]

Disable counting of the gap between reads (i.e., [forward.end, reverse.start)) when the two reads do not overlap.

Cannot be used with --reads-are-fragments.
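
The effect of --ignore-gap on the counted intervals can be sketched as follows. The sketch ignores deletions and skipped regions and takes each mate as a single half-open (start, end) reference interval. The function name is hypothetical.

```python
def counted_intervals(forward, reverse, ignore_gap=False):
    """Reference intervals that receive coverage for one read pair."""
    fwd_start, fwd_end = forward
    rev_start, rev_end = reverse
    if ignore_gap and fwd_end < rev_start:
        # Mates do not overlap: skip the gap [forward.end, reverse.start).
        return [(fwd_start, fwd_end), (rev_start, rev_end)]
    return [(fwd_start, rev_end)]


print(counted_intervals((100, 150), (180, 230)))                   # [(100, 230)]
print(counted_intervals((100, 150), (180, 230), ignore_gap=True))  # [(100, 150), (180, 230)]
print(counted_intervals((100, 150), (140, 200), ignore_gap=True))  # [(100, 200)]
```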

Windows (select max. one arg.)

--by-size <BY_SIZE>

Window definition: a fixed window size [integer]

When no windowing is specified, the default is one global window.
--by-bed <BY_BED>

Window definition: a BED file of windows [path]
--by-grouped-bed <BY_GROUPED_BED>

Window definition: a BED file of grouped windows [path]

Requires a fourth BED column with the group name.

Windows with the same group name are aggregated together in the final output. The exact per-group output shape depends on the command.

Chromosome Selection (select max. one arg.)

--chromosomes <CHROMOSOMES>...

Names of chromosomes to process (comma-separated or repeated). E.g. 'chr1,chr2,chr3'.

When no chromosomes are specified, it defaults to chr1..chr22.

Specify "all" as the only string to use all chromosomes from the command's configured contig source.
--chromosomes-file <CHROMOSOMES_FILE>

File with chromosome names to process (one per line)

Normalization

--scaling-factors <SCALING_FACTORS>

Optional path to non-negative scaling factors for normalizing/smoothing the genome [path]

.tsv file as produced by cfdna fragment-count-weights or cfdna coverage-weights containing a scaling factor to multiply by per scaling-bin.

Files may start with comment metadata lines from cfdna coverage-weights/fragment-count-weights, such as # gc_mode=corrected_tag.

The part of a fragment that overlaps a scaling bin is counted with that bin's scaling factor.

File Requirements

The TSV file must have a header. Column names are matched case-insensitively.

Required columns: chromosome, start, end, scaling_factor.

Coordinates are 0-based, half-open [start, end).

Scaling factors must be finite and non-negative.

Bins are filtered to the provided chromosomes.

For every chromosome in chromosomes, bins must:
- start at the 0-coordinate
- be perfectly contiguous (no gaps, no overlaps)
- end exactly at that chromosome's length
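
A file can be checked against these requirements before a run. The sketch below is an independent pre-check, not the tool's own validation. The function name is hypothetical, chromosome lengths are supplied by the caller, and all lines starting with "#" are treated as comment metadata.

```python
import csv
import math

REQUIRED = ("chromosome", "start", "end", "scaling_factor")


def validate_scaling_bins(tsv_text, chrom_lengths):
    """Return bins per chromosome, or raise ValueError when a requirement is violated."""
    lines = [ln for ln in tsv_text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = [name.strip().lower() for name in next(reader)]  # case-insensitive match
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    idx = {col: header.index(col) for col in REQUIRED}

    bins = {chrom: [] for chrom in chrom_lengths}
    for row in reader:
        chrom = row[idx["chromosome"]]
        if chrom not in bins:
            continue  # bins are filtered to the provided chromosomes
        start, end = int(row[idx["start"]]), int(row[idx["end"]])
        factor = float(row[idx["scaling_factor"]])
        if not math.isfinite(factor) or factor < 0:
            raise ValueError(f"{chrom}:{start}-{end}: invalid scaling factor {factor}")
        bins[chrom].append((start, end, factor))

    for chrom, length in chrom_lengths.items():
        expected_start = 0  # bins must start at the 0-coordinate
        for start, end, _ in sorted(bins[chrom]):
            if start != expected_start or end <= start:
                raise ValueError(f"{chrom}: gap, overlap, or empty bin at {start}")
            expected_start = end
        if expected_start != length:
            raise ValueError(f"{chrom}: bins end at {expected_start}, expected {length}")
    return bins


example = (
    "# gc_mode=corrected_tag\n"
    "Chromosome\tStart\tEnd\tScaling_Factor\n"
    "chr1\t0\t100\t1.2\n"
    "chr1\t100\t250\t0.8\n"
    "chr2\t0\t50\t1.0\n"
)
print(validate_scaling_bins(example, {"chr1": 250}))
```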

Filtering

--min-fragment-length <MIN_FRAGMENT_LENGTH>

Minimum fragment length to include [integer]

[default: 30]
--max-fragment-length <MAX_FRAGMENT_LENGTH>

Maximum fragment length to include [integer]

[default: 1000]
--min-mapq <MIN_MAPQ>

Minimum mapping quality to include [integer]

[default: 30]
--require-proper-pair

Only count properly paired reads [flag]

Not recommended, as we already select only inward-directed read pairs within fragment length bounds.
-b, --blacklist <BLACKLIST>...

Optional BED file(s) with blacklisted regions [path]

GC Correction (select max. one source)

--gc-file <GC_FILE>

Optional path to GC correction file made from the same BAM file with cfdna gc-bias [path]

The file is usually called gc_bias_correction.npz.

NOTE: Requires specifying the reference genome 2bit file as well.
--gc-tag <GC_TAG>

Optional aux tag to get GC weight from when using external GC correction packages [string]

The tag name must be exactly two ASCII characters matching the SAM/BAM AUX tag format: first character is a letter, second character is a letter or digit.

Packages like GCParagon and GCfix allow saving GC weights directly to the reads in a BAM file. They often assign a "GC" aux tag.

The average per-read weight is used to count the fragment. When any of the reads has a zero weight, the fragment gets a zero weight. If only one mate has a usable tag, that single usable weight is reused for the fragment.
--neutralize-invalid-gc

Keep fragments with unusable GC weights and weight them as 1.0 [flag]

By default, fragments are skipped when the GC correction is missing, cannot be computed, or resolves to an unusable value. Set this flag to keep them instead and count them with neutral weight 1.0.
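
The --gc-tag weighting rules, together with --neutralize-invalid-gc, can be summarized in a short sketch. What counts as a "usable" weight is not specified on this page, so the sketch assumes it means present, finite, and non-negative. The function names are hypothetical, and None stands for "fragment skipped".

```python
import math


def usable(weight):
    """Assumption: a usable weight is present, finite, and non-negative."""
    return weight is not None and math.isfinite(weight) and weight >= 0


def fragment_gc_weight(forward_weight, reverse_weight, neutralize_invalid=False):
    """Combine per-read GC weights into one fragment weight, or None to skip."""
    weights = [w for w in (forward_weight, reverse_weight) if usable(w)]
    if not weights:
        # No usable weight: skip by default, or keep with neutral weight 1.0.
        return 1.0 if neutralize_invalid else None
    if any(w == 0 for w in weights):
        return 0.0  # any zero-weight read zeroes the fragment
    # Average of the usable weights; a single usable weight is reused as-is.
    return sum(weights) / len(weights)


print(fragment_gc_weight(1.5, None))        # 1.5
print(fragment_gc_weight(1.2, 0.0))         # 0.0
print(fragment_gc_weight(None, None))       # None
print(fragment_gc_weight(None, None, True)) # 1.0
```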

GC Correction

-r, --ref-2bit <REF_2BIT>

Optional 2bit reference genome file [path]

NOTE: Required for GC correction, otherwise ignored.

E.g., "hg38.2bit" from UCSC ( https://hgdownload.cse.ucsc.edu/goldenpath/hg38/bigZips/hg38.2bit ).

Logging

--log <LOG>

Logging destination [stdout|quiet|file|file=<path>]

stdout keeps the normal run narrative on standard output.

quiet suppresses the normal run narrative and progress bars, while warnings and errors still go to stderr.

file writes the normal run narrative to an auto-generated log file under the command output directory.

file=<path> writes the normal run narrative to the exact path you provide.

[default: stdout]
