Python API

Generated from public symbols and docstrings in py-cfdnalab/src/cfdnalab.

Midpoint Profiles

Load midpoint profile Zarr stores and extract count arrays or data frames by group, fragment length bin, and midpoint position.

Symbol	Type	Summary
`read_midpoints`	function	Open a cfDNAlab midpoint profile Zarr store.
`MidpointProfiles`	class	Helper for loading and slicing midpoint profile Zarr output.

`read_midpoints`

read_midpoints(path: pathlib.Path | str) -> MidpointProfiles

Open a cfDNAlab midpoint profile Zarr store.

Parameters

path: Path to a .midpoint_profiles.zarr directory.

Returns

MidpointProfiles: Loaded midpoint profile helper.

`MidpointProfiles`

Helper for loading and slicing midpoint profile Zarr output.

Midpoint profiles store counts as (group, length_bin, position). The class exposes metadata as pandas data frames and count slices as NumPy arrays.

Public Methods

Method	Summary
`group_idx`	Find the midpoint group index for a group name.
`length_bin_idx`	Find the length-bin index whose interval contains a fragment length.
`group_metadata`	Return midpoint group labels and eligible interval counts.
`counts_array`	Return midpoint counts as a dense NumPy array.
`length_bins`	Get the fragment length bins available in this midpoint-profile output.
`positions`	Get the midpoint position bins available in this output.
`data_frame`	Create a pandas DataFrame of midpoint profile counts.

`MidpointProfiles.group_idx`

MidpointProfiles.group_idx(group_name: str) -> int

Find the midpoint group index for a group name.

Parameters

group_name: Group name to resolve.

Returns

int: Group index.

`MidpointProfiles.length_bin_idx`

MidpointProfiles.length_bin_idx(length: int) -> int

Find the length-bin index whose interval contains a fragment length.

Parameters

length: Fragment length in bp.

Returns

int: Length-bin index.

`MidpointProfiles.group_metadata`

MidpointProfiles.group_metadata() -> pd.DataFrame

Return midpoint group labels and eligible interval counts.

Returns

pandas.DataFrame: Columns are group_idx, group_name, and eligible_intervals.

`MidpointProfiles.counts_array`

MidpointProfiles.counts_array(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> np.ndarray

Return midpoint counts as a dense NumPy array.

The result keeps the midpoint count dimensions in the same order as the file: group, length bin, then position. Scalar selectors keep their axis as length one, so the shape is always (selected groups, selected length bins, positions).

Parameters

groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
with_lengths: Fragment length or lengths in bp. Counts are returned for the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Counts are returned for whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.

Returns

numpy.ndarray: Count array with shape (group, length_bin, position).
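
The three length selectors can be pictured with a small pure-Python sketch. This is an illustration of the documented rules, not the cfdnalab implementation: the bin edges, the helper names `bin_containing` and `select_length_bins`, and the `ValueError` type are assumptions made for the example. The `KeyError` mirrors what `LengthCounts.length_bin_idx` documents.

```python
# Illustrative sketch only; not the cfdnalab implementation.
# Each bin is a half-open interval (length_start_bp, length_end_bp).
BINS = [(30, 50), (50, 100), (100, 150), (150, 220)]  # hypothetical bins


def bin_containing(length, bins=BINS):
    """Index of the bin whose [start, end) interval contains `length`."""
    for idx, (start, end) in enumerate(bins):
        if start <= length < end:
            return idx
    raise KeyError(f"no length bin contains {length}")


def select_length_bins(with_lengths=None, with_length_range=None,
                       length_bin_idxs=None, bins=BINS):
    """Resolve one of the three mutually exclusive selectors to bin indices."""
    given = [s is not None
             for s in (with_lengths, with_length_range, length_bin_idxs)]
    if sum(given) > 1:
        raise ValueError("use only one length selector")  # assumed error type
    if with_lengths is not None:
        lengths = [with_lengths] if isinstance(with_lengths, int) else list(with_lengths)
        idxs = [bin_containing(length, bins) for length in lengths]
        if len(set(idxs)) != len(idxs):
            raise ValueError("multiple lengths must select distinct length bins")
        return idxs
    if with_length_range is not None:
        lo, hi = with_length_range
        # A bin overlaps [lo, hi) when it starts before hi and ends after lo.
        return [i for i, (start, end) in enumerate(bins) if start < hi and end > lo]
    if length_bin_idxs is not None:
        return [length_bin_idxs] if isinstance(length_bin_idxs, int) else list(length_bin_idxs)
    return list(range(len(bins)))  # no selector: all length bins


print(select_length_bins(with_lengths=[40, 167]))       # [0, 3]
print(select_length_bins(with_length_range=(90, 150)))  # [1, 2]
```

Note that the range (90, 150) returns the whole 50 to 100 bp bin even though only part of it lies inside the range, which is what "whole length bins that overlap this range" means.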

`MidpointProfiles.length_bins`

MidpointProfiles.length_bins() -> pd.DataFrame

Get the fragment length bins available in this midpoint-profile output.

Length bins are half-open intervals. A bin with length_start_bp=30 and length_end_bp=50 contains fragment lengths 30 <= length < 50.

Returns

pandas.DataFrame: Columns are length_bin, length_start_bp, and length_end_bp.

`MidpointProfiles.positions`

MidpointProfiles.positions() -> pd.DataFrame

Get the midpoint position bins available in this output.

Returns

pandas.DataFrame: Columns are position, position_bin_start_bp, and position_bin_end_bp.

`MidpointProfiles.data_frame`

MidpointProfiles.data_frame(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> pd.DataFrame

Create a pandas DataFrame of midpoint profile counts.

Use this for tabular analysis of the midpoint count array. The result expands the selected group and length-bin axes across all midpoint position bins, with group, length-bin, and position metadata on each row.

Parameters

groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
with_lengths: Fragment length or lengths in bp. The returned rows use the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Returned rows use whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.

Returns

pandas.DataFrame: One row per selected group, length bin, and midpoint position bin.
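
The expansion from the (group, length_bin, position) array to one row per combination can be sketched as follows. Nested lists stand in for the count array, and the group names, bins, positions, and output column names are made up for illustration; the real column set is whatever `data_frame` returns.

```python
# Illustrative sketch only; labels, counts, and column names are made up.
groups = ["promoters", "enhancers"]      # hypothetical group names
length_bins = [(100, 150), (150, 220)]   # half-open bp intervals
n_positions = 3                          # hypothetical number of position bins

# counts[group][length_bin][position], matching the on-disk axis order
counts = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

rows = [
    {
        "group_idx": g,
        "group_name": groups[g],
        "length_bin": b,
        "length_start_bp": length_bins[b][0],
        "length_end_bp": length_bins[b][1],
        "position": p,
        "count": counts[g][b][p],
    }
    for g in range(len(groups))
    for b in range(len(length_bins))
    for p in range(n_positions)
]

print(len(rows))  # 12 = 2 groups * 2 length bins * 3 positions
print(rows[4])    # group 0, length bin 1, position 1, count 5
```

The row count grows as the product of the three axes, so narrowing the group and length-bin selectors before building the table keeps it manageable.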

End-Motif Counts

Load dense or sparse end-motif count Zarr stores and extract motif count tables, dense arrays, or sparse matrices.

Symbol	Type	Summary
`read_end_motifs`	function	Open a cfDNAlab end-motif count Zarr store.
`EndMotifCounts`	class	Common API for global, windowed, and grouped end-motif outputs.
`GlobalEndMotifCounts`	class	End-motif counts for global output.
`WindowedEndMotifCounts`	class	End-motif counts for fixed-size or BED-window output.
`GroupedEndMotifCounts`	class	End-motif counts for grouped BED output.

`read_end_motifs`

read_end_motifs(path: pathlib.Path | str) -> GlobalEndMotifCounts | WindowedEndMotifCounts | GroupedEndMotifCounts

Open a cfDNAlab end-motif count Zarr store.

Parameters

path: Path to an .end_motifs.zarr directory.

Returns

EndMotifCounts: GlobalEndMotifCounts, WindowedEndMotifCounts, or GroupedEndMotifCounts, depending on the output mode of the store.

`EndMotifCounts`

Common API for global, windowed, and grouped end-motif outputs.

Public Methods

Method	Summary
`storage_mode`	Return how end-motif counts are stored on disk.
`row_mode`	Return what each end-motif count row represents.
`motifs_metadata`	Return motif-axis labels and motif indices available in this output.
`motif_idx`	Find the motif-axis index for a motif label.
`has_motif`	Return whether a motif label exists in this output.
`dense_counts_zarr_array`	Return the lazy Zarr counts array for dense output.

`EndMotifCounts.storage_mode`

EndMotifCounts.storage_mode() -> str

Return how end-motif counts are stored on disk.

Returns

str: Either "dense" or "sparse_coo".

`EndMotifCounts.row_mode`

EndMotifCounts.row_mode() -> str

Return what each end-motif count row represents.

Returns

str: One of "global", "size", "bed", or "grouped_bed".

`EndMotifCounts.motifs_metadata`

EndMotifCounts.motifs_metadata() -> pd.DataFrame

Return motif-axis labels and motif indices available in this output.

For grouped motifs-file output, the motif labels are the group names used during counting.

Returns

pandas.DataFrame: Columns are motif_index and motif.

`EndMotifCounts.motif_idx`

EndMotifCounts.motif_idx(motif: str) -> int

Find the motif-axis index for a motif label.

Parameters

motif: Motif label to resolve.

Returns

int: Motif index.

`EndMotifCounts.has_motif`

EndMotifCounts.has_motif(motif: str) -> bool

Return whether a motif label exists in this output.

Sparse output only stores observed motifs, so has_motif returns False for an unobserved motif even if it is part of the theoretical motif universe.

Parameters

motif: Motif label to check.

Returns

bool: Whether the motif can be resolved in this output.
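
The difference between the theoretical motif universe and the stored motif axis of a sparse output can be sketched like this. The motif length, the counts, and the sorted axis order are assumptions chosen for the example, and the `has_motif` function here is a stand-in, not the library method.

```python
from itertools import product

# Illustrative sketch only; not the cfdnalab implementation.
k = 3  # hypothetical motif length
universe = {"".join(p) for p in product("ACGT", repeat=k)}  # 4**3 = 64 motifs

# A sparse store only keeps a motif-axis entry for motifs seen during counting.
observed_counts = {"CCA": 120, "CCT": 95, "TGA": 3}  # made-up counts
motif_axis = sorted(observed_counts)                  # assumed axis order
motif_to_idx = {motif: i for i, motif in enumerate(motif_axis)}


def has_motif(motif):
    """True only when the motif has an entry on the stored motif axis."""
    return motif in motif_to_idx


print("AAA" in universe, has_motif("AAA"))    # True False
print(has_motif("CCA"), motif_to_idx["CCA"])  # True 0
```

Checking `has_motif` before calling `motif_idx` avoids relying on how `motif_idx` reports an unknown label, which this page does not specify.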

`EndMotifCounts.dense_counts_zarr_array`

EndMotifCounts.dense_counts_zarr_array() -> zarr.Array

Return the lazy Zarr counts array for dense output.

This returns the on-disk Zarr array handle without loading the full dense matrix into memory. Sparse output has no dense counts array.

Returns

zarr.Array: Dense count array with shape (output row, motif).

`GlobalEndMotifCounts`

End-motif counts for global output.

Public Methods

Method	Summary
`data_frame`	Create a pandas DataFrame for global end-motif counts.
`dense_counts_array`	Return global end-motif counts as a dense NumPy array.
`sparse_counts_matrix`	Return global end-motif counts as a SciPy sparse matrix.

`GlobalEndMotifCounts.data_frame`

GlobalEndMotifCounts.data_frame(*, densify: bool = False, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None) -> pd.DataFrame

Create a pandas DataFrame for global end-motif counts.

Sparse outputs return stored non-zero motif counts unless densify=True. Densifying adds explicit zero-count rows for selected observed motifs. Dense outputs always include zero counts.

Parameters

densify: If True, sparse outputs add explicit zero-count rows for selected observed motifs. Dense outputs ignore this option.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.

Returns

pandas.DataFrame: Global row metadata, motif metadata, and count.

`GlobalEndMotifCounts.dense_counts_array`

GlobalEndMotifCounts.dense_counts_array(*, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, allow_densify: bool = False) -> np.ndarray

Return global end-motif counts as a dense NumPy array.

Sparse stores are only densified when allow_densify=True. Scalar motif selectors keep their axis as length one, so the shape is always (1, selected motifs).

Parameters

motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
allow_densify: If True, allow sparse stores to be converted to dense counts.

Returns

numpy.ndarray: Dense count array with shape (global row, motif).
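
The opt-in nature of densification can be sketched as a guard in front of the conversion. This is a hedged illustration: the function name `densify_global`, the (row, motif_idx, count) entry layout, and the `ValueError` raised without `allow_densify=True` are assumptions, since this page does not state which error the library raises.

```python
# Illustrative sketch only; the error type and entry layout are assumptions.
def densify_global(coo_entries, n_motifs, allow_densify=False):
    """Build a (1, n_motifs) nested list from (row, motif_idx, count) entries."""
    if not allow_densify:
        raise ValueError("sparse store: pass allow_densify=True to densify")
    dense = [[0] * n_motifs]  # a single global row
    for row, motif_idx, count in coo_entries:
        dense[row][motif_idx] = count
    return dense


entries = [(0, 0, 120), (0, 2, 3)]  # made-up non-zero counts
print(densify_global(entries, 4, allow_densify=True))  # [[120, 0, 3, 0]]
```

The explicit flag guards against accidentally materialising a large dense matrix from a sparse store; for global output the result has only one row, but the same flag appears on the windowed and grouped variants.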

`GlobalEndMotifCounts.sparse_counts_matrix`

GlobalEndMotifCounts.sparse_counts_matrix(*, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None) -> sparse.coo_matrix

Return global end-motif counts as a SciPy sparse matrix.

Scalar motif selectors keep their axis as length one, so the shape is always (1, selected motifs).

Parameters

motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.

Returns

scipy.sparse.coo_matrix: Sparse count matrix with shape (global row, motif).

`WindowedEndMotifCounts`

End-motif counts for fixed-size or BED-window output.

Public Methods

Method	Summary
`data_frame`	Create a pandas DataFrame of end-motif counts for genomic windows.
`window_metadata`	Return genomic window metadata for this end-motif output.
`dense_counts_array`	Return windowed end-motif counts as a dense NumPy array.
`sparse_counts_matrix`	Return windowed end-motif counts as a SciPy sparse matrix.

`WindowedEndMotifCounts.data_frame`

WindowedEndMotifCounts.data_frame(*, window_idxs: int | Sequence[int] | None = None, densify: bool = False, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, max_blacklisted_fraction: float = 1.0) -> pd.DataFrame

Create a pandas DataFrame of end-motif counts for genomic windows.

Use window_idxs to keep only selected windows and motifs or motif_idxs to keep only selected motifs. Sparse outputs return stored non-zero rows unless densify=True. Densifying adds explicit zero-count rows for selected observed motifs. Dense outputs always include zero counts.

Parameters

window_idxs: None for all windows, one window index, or a sequence of window indices.
densify: If True, sparse outputs add explicit zero-count rows for selected observed motifs. Dense outputs ignore this option.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
max_blacklisted_fraction: Maximum blacklisted_fraction (between 0 and 1) a window may have and still be kept. Windows above this threshold are dropped before counts are returned. The default 1.0 keeps all selected windows.

Returns

pandas.DataFrame: Window metadata, motif metadata, and count.
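
The effect of densify on a sparse output can be sketched with plain tuples. The stored entries, window indices, and the `to_rows` helper are made up for illustration and are not part of the library.

```python
# Illustrative sketch only; values are made up.
# Stored sparse entries: (window_idx, motif, count), non-zero counts only.
stored = [(0, "CCA", 5), (0, "TGA", 1), (1, "CCA", 2)]
windows = [0, 1]
observed_motifs = ["CCA", "TGA"]  # motif axis of the sparse store


def to_rows(densify=False):
    """Return long-format rows, optionally with explicit zero counts."""
    if not densify:
        return [{"window_idx": w, "motif": m, "count": c} for w, m, c in stored]
    lookup = {(w, m): c for w, m, c in stored}
    # Densifying emits every (window, observed motif) pair, filling zeros.
    return [
        {"window_idx": w, "motif": m, "count": lookup.get((w, m), 0)}
        for w in windows
        for m in observed_motifs
    ]


print(len(to_rows()))              # 3 stored rows
print(len(to_rows(densify=True)))  # 4 rows, including window 1 / TGA with count 0
```

Zero rows are only added for motifs on the stored motif axis, so a motif that was never observed anywhere still does not appear after densifying.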

`WindowedEndMotifCounts.window_metadata`

WindowedEndMotifCounts.window_metadata() -> pd.DataFrame

Return genomic window metadata for this end-motif output.

Public genomic window metadata uses window_idx, chrom, start, and end columns.

Returns

pandas.DataFrame: Columns are window_idx, chrom, start, end, and blacklisted_fraction.

`WindowedEndMotifCounts.dense_counts_array`

WindowedEndMotifCounts.dense_counts_array(*, window_idxs: int | Sequence[int] | None = None, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, allow_densify: bool = False) -> np.ndarray

Return windowed end-motif counts as a dense NumPy array.

Sparse stores are only densified when allow_densify=True. Scalar selectors keep their axes as length one, so the shape is always (selected windows, selected motifs).

Parameters

window_idxs: None for all windows, one window index, or a sequence of window indices.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
allow_densify: If True, allow sparse stores to be converted to dense counts.

Returns

numpy.ndarray: Dense count array with shape (window, motif).

`WindowedEndMotifCounts.sparse_counts_matrix`

WindowedEndMotifCounts.sparse_counts_matrix(*, window_idxs: int | Sequence[int] | None = None, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None) -> sparse.coo_matrix

Return windowed end-motif counts as a SciPy sparse matrix.

Scalar selectors keep their axes as length one, so the shape is always (selected windows, selected motifs).

Parameters

window_idxs: None for all windows, one window index, or a sequence of window indices.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.

Returns

scipy.sparse.coo_matrix: Sparse count matrix with shape (window, motif).
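
A COO matrix is a set of (row, column, value) triplets plus a shape. The sketch below shows one way selecting windows and motifs could yield a matrix of shape (selected windows, selected motifs): kept entries are renumbered from zero. Plain lists stand in for the SciPy arrays, and both the `select_coo` helper and the renumbering are assumptions inferred from the documented shape.

```python
# Illustrative sketch only; plain lists stand in for COO row/col/data arrays.
def select_coo(rows, cols, data, window_idxs, motif_idxs):
    """Keep entries in the selected windows and motifs, renumbered from zero."""
    row_map = {w: i for i, w in enumerate(window_idxs)}
    col_map = {m: j for j, m in enumerate(motif_idxs)}
    kept = [
        (row_map[r], col_map[c], v)
        for r, c, v in zip(rows, cols, data)
        if r in row_map and c in col_map
    ]
    shape = (len(window_idxs), len(motif_idxs))
    return kept, shape


rows, cols, data = [0, 0, 2, 3], [1, 4, 1, 0], [5, 1, 2, 7]  # made-up entries
kept, shape = select_coo(rows, cols, data, window_idxs=[2], motif_idxs=[1, 4])
print(kept, shape)  # [(0, 0, 2)] (1, 2)
```

Window 2 has no stored entry for motif 4, so that cell is simply absent from the triplets while still counting toward the (1, 2) shape.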

`GroupedEndMotifCounts`

End-motif counts for grouped BED output.

Public Methods

Method	Summary
`data_frame`	Create a pandas DataFrame of end-motif counts for grouped BED rows.
`group_metadata`	Return grouped BED metadata for this end-motif output.
`group_idx`	Find the end-motif row index for a group name.
`dense_counts_array`	Return grouped end-motif counts as a dense NumPy array.
`sparse_counts_matrix`	Return grouped end-motif counts as a SciPy sparse matrix.

`GroupedEndMotifCounts.data_frame`

GroupedEndMotifCounts.data_frame(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, densify: bool = False, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, max_blacklisted_fraction: float = 1.0) -> pd.DataFrame

Create a pandas DataFrame of end-motif counts for grouped BED rows.

Use groups or group_idxs to keep only selected groups and motifs or motif_idxs to keep only selected motifs. Sparse outputs return stored non-zero rows unless densify=True. Densifying adds explicit zero-count rows for selected observed motifs. Dense outputs always include zero counts.

Parameters

groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
densify: If True, sparse outputs add explicit zero-count rows for selected observed motifs. Dense outputs ignore this option.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
max_blacklisted_fraction: Maximum blacklisted_fraction (between 0 and 1) a group may have and still be kept. Groups above this threshold are dropped before counts are returned. The default 1.0 keeps all selected groups.

Returns

pandas.DataFrame: Group metadata, motif metadata, and count.
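
The max_blacklisted_fraction filter can be sketched as a threshold on the metadata rows. The metadata values and the `keep_rows` helper are made up, and treating the threshold as inclusive is an inference from the default 1.0 keeping every row.

```python
# Illustrative sketch only; metadata values are made up.
group_metadata = [
    {"group_idx": 0, "group_name": "promoters", "blacklisted_fraction": 0.00},
    {"group_idx": 1, "group_name": "enhancers", "blacklisted_fraction": 0.12},
    {"group_idx": 2, "group_name": "repeats", "blacklisted_fraction": 0.65},
]


def keep_rows(metadata, max_blacklisted_fraction=1.0):
    """Keep rows whose blacklisted_fraction does not exceed the threshold."""
    if not 0.0 <= max_blacklisted_fraction <= 1.0:
        raise ValueError("max_blacklisted_fraction must be in 0..1")
    return [
        row for row in metadata
        if row["blacklisted_fraction"] <= max_blacklisted_fraction  # assumed inclusive
    ]


print([row["group_name"] for row in keep_rows(group_metadata, 0.2)])
# ['promoters', 'enhancers']
print(len(keep_rows(group_metadata)))  # 3: the default keeps everything
```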

`GroupedEndMotifCounts.group_metadata`

GroupedEndMotifCounts.group_metadata() -> pd.DataFrame

Return grouped BED metadata for this end-motif output.

Returns

pandas.DataFrame: Columns are group_idx, group_name, eligible_windows, and blacklisted_fraction.

`GroupedEndMotifCounts.group_idx`

GroupedEndMotifCounts.group_idx(group_name: str) -> int

Find the end-motif row index for a group name.

Parameters

group_name: Group name to resolve.

Returns

int: Group index.

`GroupedEndMotifCounts.dense_counts_array`

GroupedEndMotifCounts.dense_counts_array(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, allow_densify: bool = False) -> np.ndarray

Return grouped end-motif counts as a dense NumPy array.

Sparse stores are only densified when allow_densify=True. Scalar selectors keep their axes as length one, so the shape is always (selected groups, selected motifs).

Parameters

groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
allow_densify: If True, allow sparse stores to be converted to dense counts.

Returns

numpy.ndarray: Dense count array with shape (group, motif).

`GroupedEndMotifCounts.sparse_counts_matrix`

GroupedEndMotifCounts.sparse_counts_matrix(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None) -> sparse.coo_matrix

Return grouped end-motif counts as a SciPy sparse matrix.

Scalar selectors keep their axes as length one, so the shape is always (selected groups, selected motifs).

Parameters

groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.

Returns

scipy.sparse.coo_matrix: Sparse count matrix with shape (group, motif).

Length Counts

Load fragment length-count TSV outputs and return counts, fractions, or densities as arrays, matrices, vectors, or data frames.

Symbol	Type	Summary
`read_lengths`	function	Read a cfDNAlab length-count TSV and return the matching loader class.
`LengthCounts`	class	Common API for global, windowed, and grouped length-count outputs.
`GlobalLengthCounts`	class	Length counts for global output.
`WindowedLengthCounts`	class	Length counts for fixed-size or BED-window output.
`GroupedLengthCounts`	class	Length counts for grouped BED output.

`read_lengths`

read_lengths(path: pathlib.Path | str) -> GlobalLengthCounts | WindowedLengthCounts | GroupedLengthCounts

Read a cfDNAlab length-count TSV and return the matching loader class.

Parameters

path: Path to a .length_counts.tsv or .length_counts.tsv.zst file.

Returns

LengthCounts: GlobalLengthCounts, WindowedLengthCounts, or GroupedLengthCounts, depending on the TSV metadata columns.

`LengthCounts`

Common API for global, windowed, and grouped length-count outputs.

Public Methods

Method	Summary
`length_bins`	Return fragment length bin definitions used by the count columns.
`length_bin_idx`	Find the length-bin index whose interval contains a fragment length.
`counts_array`	Return raw length counts as a dense NumPy array.

`LengthCounts.length_bins`

LengthCounts.length_bins() -> pd.DataFrame

Return fragment length bin definitions used by the count columns.

Length bins are half-open intervals. A bin with length_start_bp=30 and length_end_bp=50 contains fragment lengths 30 <= length < 50.

Returns

pandas.DataFrame: Columns are length_bin, length_start_bp, length_end_bp, length_midpoint_bp, and length_width_bp.

`LengthCounts.length_bin_idx`

LengthCounts.length_bin_idx(length: int) -> int

Find the length-bin index whose interval contains a fragment length.

Parameters

length: Fragment length in bp.

Returns

int: Length-bin index.

Raises

KeyError: If no length bin contains length.

`LengthCounts.counts_array`

LengthCounts.counts_array(*, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> np.ndarray

Return raw length counts as a dense NumPy array.

Use with_lengths, with_length_range, or length_bin_idxs to select length bins. Range selection uses whole bins overlapping the half-open [start, end) bp range.

Parameters

with_lengths: Fragment length or lengths in bp. Counts are returned for the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Counts are returned for whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.

Returns

numpy.ndarray: Count array with shape (output row, length_bin). Output rows are windows for windowed output, groups for grouped output, and the single global summary row for global output.

`GlobalLengthCounts`

Length counts for global output.

Public Methods

Method	Summary
`data_frame`	Create a pandas DataFrame for the global fragment length distribution.

`GlobalLengthCounts.data_frame`

GlobalLengthCounts.data_frame(*, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None, value: str = 'count', denominator: str = 'all_bins', keep_wide: bool = False) -> pd.DataFrame

Create a pandas DataFrame for the global fragment length distribution.

Long output has one row per length bin with bin metadata. Wide output has one row with one value column per length bin.

Parameters

with_lengths: Fragment length or lengths in bp. Returned values use the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Returned values use whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
value: One of "count", "fraction", or "density". Fractions are within the global row. Densities are fractions divided by the length-bin width.
denominator: For "fraction" and "density", "all_bins" divides by the row total over all length bins, while "selected_bins" divides by the total over the returned length bins. Ignored for "count".
keep_wide: If False, return one row per length bin. If True, return one row with one value column per length bin.

Returns

pandas.DataFrame: Global length-count values with length-bin metadata for long output or value-prefixed columns for wide output.
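
The relationship between count, fraction, and density, and the effect of the denominator option, can be worked through with made-up numbers. The bins, counts, and `length_values` helper are assumptions for illustration only.

```python
# Illustrative sketch only; bins and counts are made up.
bins = [(30, 50), (50, 100), (100, 150), (150, 220)]  # half-open bp intervals
counts = [10, 40, 100, 50]                            # one global row


def length_values(selected, value="count", denominator="all_bins"):
    """Return counts, fractions, or densities for the selected bin indices."""
    picked = [counts[i] for i in selected]
    if value == "count":
        return picked  # the denominator is ignored for raw counts
    total = sum(counts) if denominator == "all_bins" else sum(picked)
    fractions = [c / total for c in picked]
    if value == "fraction":
        return fractions
    # Density: fraction divided by the bin width in bp.
    widths = [bins[i][1] - bins[i][0] for i in selected]
    return [f / w for f, w in zip(fractions, widths)]


print(length_values([2, 3], "fraction"))                   # [0.5, 0.25]
print(length_values([2, 3], "fraction", "selected_bins"))  # about [0.667, 0.333]
print(length_values([2, 3], "density"))                    # about [0.01, 0.00357]
```

With "all_bins" the fractions of a partial selection sum to less than one (0.75 here), while "selected_bins" renormalises them to sum to one. Dividing by width makes bins of unequal size comparable: the 70 bp bin has half the fraction of the 50 bp bin but roughly a third of its density.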

`WindowedLengthCounts`

Length counts for fixed-size or BED-window output.

Public Methods

Method	Summary
`window_metadata`	Return genomic window metadata for this length-count output.
`counts_array`	Return raw length counts as a dense NumPy array.
`data_frame`	Create a pandas DataFrame of fragment length distributions for windows.

`WindowedLengthCounts.window_metadata`

WindowedLengthCounts.window_metadata() -> pd.DataFrame

Return genomic window metadata for this length-count output.

Returns

pandas.DataFrame: Columns are window_idx, chrom, start, end, and optionally blacklisted_fraction.

`WindowedLengthCounts.counts_array`

WindowedLengthCounts.counts_array(*, window_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> np.ndarray

Return raw length counts as a dense NumPy array.

Scalar selectors keep their axis as length one, so the shape is always (selected windows, length_bin).

Parameters

window_idxs: None for all windows, one window index, or a sequence of window indices.
with_lengths: Fragment length or lengths in bp. Counts are returned for the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Counts are returned for whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.

Returns

numpy.ndarray: Count array with shape (window, length_bin).
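
The rule that scalar selectors keep their axis can be sketched by normalising every selector to a list before indexing. Nested lists stand in for the NumPy array here, and the values and helper names are made up; this is not the library's code.

```python
# Illustrative sketch only; nested lists stand in for a NumPy array.
counts = [  # shape (3 windows, 4 length bins), made-up values
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]


def as_index_list(selector, size):
    """None selects everything; a scalar becomes a one-element list."""
    if selector is None:
        return list(range(size))
    if isinstance(selector, int):
        return [selector]
    return list(selector)


def counts_array(window_idxs=None, length_bin_idxs=None):
    rows = as_index_list(window_idxs, len(counts))
    cols = as_index_list(length_bin_idxs, len(counts[0]))
    return [[counts[r][c] for c in cols] for r in rows]


print(counts_array(window_idxs=1, length_bin_idxs=2))  # [[7]], shape (1, 1)
print(counts_array(window_idxs=[0, 2]))  # [[1, 2, 3, 4], [9, 10, 11, 12]]
```

A result that is always two-dimensional means downstream code can index `result[i][j]` without special-casing single-window or single-bin selections.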

`WindowedLengthCounts.data_frame`

WindowedLengthCounts.data_frame(*, window_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None, value: str = 'count', denominator: str = 'all_bins', keep_wide: bool = False, max_blacklisted_fraction: float = 1.0) -> pd.DataFrame

Create a pandas DataFrame of fragment length distributions for windows.

Use window_idxs to keep only selected genomic windows. Long output has one row per selected window and length bin. Wide output has one row per selected window with one value column per length bin.

Parameters

window_idxs: None for all windows, one window index, or a sequence of window indices.
with_lengths: Fragment length or lengths in bp. Returned values use the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Returned values use whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
value: One of "count", "fraction", or "density". Fractions are within each selected window. Densities are fractions divided by the length-bin width.
denominator: For "fraction" and "density", "all_bins" divides by each row's total over all length bins, while "selected_bins" divides by the total over the returned length bins. Ignored for "count".
keep_wide: If False, return one row per selected window and length bin. If True, return one row per selected window with one value column per length bin.
max_blacklisted_fraction: Maximum blacklisted_fraction (between 0 and 1) a window may have and still be kept. The default 1.0 keeps all selected windows.

Returns

pandas.DataFrame: Window metadata and length-count values.
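
The long and wide layouts selected by keep_wide can be compared side by side. The counts are made up, and the wide column names such as `count_100_150` are hypothetical: this page only says wide columns are value-prefixed, not how they are spelled.

```python
# Illustrative sketch only; the wide column naming is hypothetical.
bins = [(100, 150), (150, 220)]            # half-open bp intervals
window_counts = {0: [12, 30], 1: [7, 19]}  # window_idx -> count per length bin

# Long layout: one row per selected window and length bin.
long_rows = [
    {"window_idx": w, "length_start_bp": s, "length_end_bp": e, "count": c}
    for w, row in window_counts.items()
    for (s, e), c in zip(bins, row)
]

# Wide layout: one row per selected window, one value column per length bin.
wide_rows = [
    {"window_idx": w, **{f"count_{s}_{e}": c for (s, e), c in zip(bins, row)}}
    for w, row in window_counts.items()
]

print(len(long_rows))  # 4
print(wide_rows[0])    # {'window_idx': 0, 'count_100_150': 12, 'count_150_220': 30}
```

The long layout suits grouped plotting and aggregation, while the wide layout gives one feature vector per window.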

`GroupedLengthCounts`

Length counts for grouped BED output.

Public Methods

Method	Summary
`group_metadata`	Return grouped BED metadata for this length-count output.
`group_idx`	Find the count-row index for a group name.
`counts_array`	Return raw length counts as a dense NumPy array.
`data_frame`	Create a pandas DataFrame of fragment length distributions for groups.

`GroupedLengthCounts.group_metadata`

GroupedLengthCounts.group_metadata() -> pd.DataFrame

Return grouped BED metadata for this length-count output.

Returns

pandas.DataFrame: Columns are group_idx, group_name, eligible_windows, and optionally blacklisted_fraction.

`GroupedLengthCounts.group_idx`

GroupedLengthCounts.group_idx(group_name: str) -> int

Find the count-row index for a group name.

Parameters

group_name: Group name to resolve.

Returns

int: Group index.

`GroupedLengthCounts.counts_array`

GroupedLengthCounts.counts_array(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> np.ndarray

Return raw length counts as a dense NumPy array.

Scalar selectors keep their axis as length one, so the shape is always (selected groups, length_bin).

Parameters

groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
with_lengths: Fragment length or lengths in bp. Counts are returned for the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Counts are returned for whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.

Returns

numpy.ndarray: Count array with shape (group, length_bin).
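
How the groups and group_idxs selectors could resolve to row indices is sketched below. The group names, the `resolve_groups` helper, and the `ValueError` for passing both selectors are assumptions for illustration; the page only states that the two must not be combined.

```python
# Illustrative sketch only; group names and the error type are made up.
group_names = ["promoters", "enhancers", "ctcf_sites"]
name_to_idx = {name: i for i, name in enumerate(group_names)}


def resolve_groups(groups=None, group_idxs=None):
    """Turn either selector into a list of row indices."""
    if groups is not None and group_idxs is not None:
        raise ValueError("use either groups or group_idxs, not both")
    if groups is not None:
        # A str is itself a sequence, so test for it before iterating.
        names = [groups] if isinstance(groups, str) else list(groups)
        return [name_to_idx[name] for name in names]
    if group_idxs is not None:
        return [group_idxs] if isinstance(group_idxs, int) else list(group_idxs)
    return list(range(len(group_names)))  # None selects all groups


print(resolve_groups(groups="enhancers"))  # [1]
print(resolve_groups(group_idxs=[2, 0]))   # [2, 0]
print(resolve_groups())                    # [0, 1, 2]
```

The same name-to-index mapping is what `group_idx` exposes for a single name, so `group_idxs=counts.group_idx("enhancers")` and `groups="enhancers"` would be expected to select the same row.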

`GroupedLengthCounts.data_frame`

GroupedLengthCounts.data_frame(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None, value: str = 'count', denominator: str = 'all_bins', keep_wide: bool = False, max_blacklisted_fraction: float = 1.0) -> pd.DataFrame

Create a pandas DataFrame of fragment length distributions for groups.

Use groups or group_idxs to keep only selected grouped BED rows. Long output has one row per selected group and length bin. Wide output has one row per selected group with one value column per length bin.

Parameters

groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
with_lengths: Fragment length or lengths in bp. Returned values use the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Returned values use whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
value: One of "count", "fraction", or "density". Fractions are within each selected group. Densities are fractions divided by the length-bin width.
denominator: For "fraction" and "density", "all_bins" divides by each row's total over all length bins, while "selected_bins" divides by the total over the returned length bins. Ignored for "count".
keep_wide: If False, return one row per selected group and length bin. If True, return one row per selected group with one value column per length bin.
max_blacklisted_fraction: Maximum blacklisted_fraction (between 0 and 1) a group may have and still be kept. The default 1.0 keeps all selected groups.

Returns

pandas.DataFrame: Group metadata and length-count values.
