Correct GC-Bias
Fragmentomics features are vulnerable to biases from various sample-handling and sequencing processes, such as PCR amplification. cfDNAlab commands allow correcting the commonly observed GC-bias.
This requires only a few steps.
Step 1. Build reference GC bias once per assembly
Calculate the expected GC bias in the reference genome assembly (for example hg38). This output can be reused for all samples aligned to that assembly.
cfdna ref-gc-bias --help
# Run once per assembly
cfdna ref-gc-bias \
--ref-2bit <path>/hg38.2bit \
--output-dir <ref_gc_directory> \
--output-prefix hg38 \
--n-threads 12 \
--blacklist <path>/hg38-blacklist.v2.bed \
--blacklist <path>/<another_blacklist>.bed
Step 2. Build sample-specific GC correction
cfdna gc-bias --help
cfdna gc-bias \
--bam <sample>.bam \
--output-dir <sample_directory>/gc_bias \
--n-threads 12 \
--ref-2bit <path>/hg38.2bit \
--ref-gc-file <ref_gc_directory>/hg38.ref_gc_package.npz \
--blacklist <path>/hg38-blacklist.v2.bed \
--blacklist <path>/<another_blacklist>.bed
Use the same blacklist inputs as in step 1.
Step 3. Apply correction in feature extraction commands
cfdna fcoverage \
--bam <sample>.bam \
... \
--gc-file <sample_directory>/gc_bias/gc_bias_correction.npz \
--ref-2bit <path>/hg38.2bit
The same pattern works for lengths and midpoints.
Alternative: read GC weights from BAM aux tags
If you prefer a different or custom GC-bias tool, the feature extraction commands also accept reading a GC weight from a BAM aux tag.
cfdna fcoverage \
--bam <sample>.bam \
... \
--gc-tag 'GC'