Extract Fragment End Motifs

Multiple studies have used fragment end motifs and breakpoint motifs to study cfDNA fragmentation biology [REFS]. The frequencies of these motifs capture sequence preferences around the positions where fragments start and end.
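To make the idea of a motif frequency concrete, here is a minimal Python sketch that counts end motifs for a toy reference and a list of fragment coordinates. It is an illustration only, not the tool's implementation. The helper names (`revcomp`, `end_motifs`, `motif_frequencies`) are hypothetical. The reading of `k_inside` and `k_outside` as the number of bases taken inside and outside each fragment boundary is an assumption based on the flag names, and so is the 5'-to-3' orientation used for the right-hand end.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def end_motifs(reference, start, end, k_inside=2, k_outside=2):
    """Motifs for both ends of the fragment reference[start:end] (0-based, half-open).

    ASSUMPTION: each motif is written as "<outside>_<inside>", read 5'->3' on the
    strand whose 5' end coincides with that fragment end. The real tool may use
    a different convention.
    """
    # Skip fragments whose flanks would run off the reference or that are too short.
    if start - k_outside < 0 or end + k_outside > len(reference):
        return []
    if end - start < k_inside:
        return []
    # Left end: forward strand, outside bases precede the boundary.
    left = reference[start - k_outside:start] + "_" + reference[start:start + k_inside]
    # Right end: reverse strand, so reverse-complement both parts.
    right = revcomp(reference[end:end + k_outside]) + "_" + revcomp(reference[end - k_inside:end])
    return [left, right]


def motif_frequencies(reference, fragments, k_inside=2, k_outside=2):
    """Fraction of fragment ends carrying each motif."""
    counts = Counter()
    for start, end in fragments:
        counts.update(end_motifs(reference, start, end, k_inside, k_outside))
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()} if total else {}


# Toy data, invented for illustration.
toy_reference = "TTACGGATCCAGTT"
toy_fragments = [(2, 10), (3, 9)]
toy_frequencies = motif_frequencies(toy_reference, toy_fragments)
```

In this toy case each of the four fragment ends yields a distinct motif, so every motif has frequency 0.25. With real data, deviations from a uniform distribution are what reflect sequence preferences at fragment ends.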

Base command

cfdna ends --help

cfdna ends \
--bam <sample>.bam \
--output-dir <sample_directory>/ends \
--output-prefix <sample_id> \
--n-threads 12 \
--blacklist <path>/hg38-blacklist.v2.bed \
--blacklist <path>/<another_blacklist>.bed \
--by-size 1000000 \
--k-inside 2 \
--k-outside 2

GC-bias correction example

cfdna ends \
--bam <sample>.bam \
--output-dir <sample_directory>/ends \
--output-prefix <sample_id> \
--n-threads 12 \
--blacklist <path>/hg38-blacklist.v2.bed \
--blacklist <path>/<another_blacklist>.bed \
--by-size 1000000 \
--k-inside 2 \
--k-outside 2 \
--gc-file <sample_directory>/gc_bias/gc_bias_correction.npz \
--ref-2bit <path>/hg38.2bit

Genomic smoothing example

cfdna ends \
--bam <sample>.bam \
--output-dir <sample_directory>/ends \
--output-prefix <sample_id> \
--n-threads 12 \
--blacklist <path>/hg38-blacklist.v2.bed \
--blacklist <path>/<another_blacklist>.bed \
--by-size 1000000 \
--k-inside 2 \
--k-outside 2 \
--scaling-factors <sample_directory>/coverage_weights/<sample_id>.scaling_factors.tsv

GC-bias correction + genomic smoothing

cfdna ends \
--bam <sample>.bam \
--output-dir <sample_directory>/ends \
--output-prefix <sample_id> \
--n-threads 12 \
--blacklist <path>/hg38-blacklist.v2.bed \
--blacklist <path>/<another_blacklist>.bed \
--by-size 1000000 \
--k-inside 2 \
--k-outside 2 \
--gc-file <sample_directory>/gc_bias/gc_bias_correction.npz \
--ref-2bit <path>/hg38.2bit \
--scaling-factors <sample_directory>/coverage_weights/<sample_id>.scaling_factors.tsv
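The four invocations above differ only in which optional flags are appended. When scripting runs over many samples, a small wrapper can assemble the argument list. The Python sketch below uses only the flags shown in the examples above. The function name `build_ends_command` and its parameter names are hypothetical, and the defaults simply mirror the example values rather than any documented defaults of the tool.

```python
import shlex


def build_ends_command(
    bam,
    output_dir,
    output_prefix,
    blacklists=(),
    n_threads=12,
    by_size=1000000,
    k_inside=2,
    k_outside=2,
    gc_file=None,
    ref_2bit=None,
    scaling_factors=None,
):
    """Assemble a `cfdna ends` argument list from the flags used in the examples."""
    cmd = [
        "cfdna", "ends",
        "--bam", bam,
        "--output-dir", output_dir,
        "--output-prefix", output_prefix,
        "--n-threads", str(n_threads),
    ]
    # --blacklist is repeated once per BED file, as in the examples.
    for bed in blacklists:
        cmd += ["--blacklist", bed]
    cmd += [
        "--by-size", str(by_size),
        "--k-inside", str(k_inside),
        "--k-outside", str(k_outside),
    ]
    # Optional GC-bias correction inputs (shown together in the examples).
    if gc_file is not None:
        cmd += ["--gc-file", gc_file]
    if ref_2bit is not None:
        cmd += ["--ref-2bit", ref_2bit]
    # Optional genomic smoothing input.
    if scaling_factors is not None:
        cmd += ["--scaling-factors", scaling_factors]
    return cmd


# Example with placeholder paths, invented for illustration.
example_cmd = build_ends_command(
    "sample.bam",
    "sample_dir/ends",
    "sample_1",
    blacklists=["hg38-blacklist.v2.bed"],
    scaling_factors="sample_dir/coverage_weights/sample_1.scaling_factors.tsv",
)
# Shell-quoted form, ready to log or paste into a terminal.
example_cmd_text = shlex.join(example_cmd)
```

The list form can be passed directly to `subprocess.run(example_cmd, check=True)`, which avoids shell quoting issues with unusual paths.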

Handling clipped ends

The default behavior of cfdna ends is conservative around soft clipping. With --clip-strategy skip, a motif is discarded when the fragment end it would be taken from is soft-clipped.

To use the aligned fragment boundaries instead of discarding these motifs, switch to --clip-strategy aligned.

cfdna ends \
... \
--clip-strategy aligned
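The difference between skip and aligned can be pictured with a toy model that inspects a read's CIGAR string. The Python sketch below is an assumption-laden illustration of the behavior described above, not the tool's actual logic. The function names `soft_clip_lengths` and `keep_end` are hypothetical, and only the two strategies named so far are modeled.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (left, right) soft-clipped base counts in reference orientation."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside soft clips; they carry no read bases, so drop them.
    ops = [(n, op) for n, op in ops if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def keep_end(cigar, side, strategy="skip"):
    """Decide whether the motif at one read end would be kept.

    side is "left" or "right" (reference orientation).
    ASSUMPTION: "skip" drops the motif if that end is soft-clipped, while
    "aligned" always keeps it and uses the aligned boundary.
    """
    left, right = soft_clip_lengths(cigar)
    clipped = (left if side == "left" else right) > 0
    if strategy == "skip":
        return not clipped
    if strategy == "aligned":
        return True
    raise ValueError(f"strategy not modeled in this sketch: {strategy}")
```

For a read with CIGAR `5S95M`, this model drops the left-end motif under skip and keeps it under aligned, while the unclipped right end is kept under both.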

The raw-aligned-boundary and raw-shifted-boundary modes are stronger analysis choices because they let raw read bases, including soft-clipped sequence, contribute to the motif. Use them only when you specifically want that.