Python API

Generated from public symbols and docstrings in py-cfdnalab/src/cfdnalab.

Midpoint Profiles

Load midpoint profile Zarr stores and extract count arrays or data frames by group, fragment length bin, and midpoint position.

Symbol            Type      Summary
read_midpoints    function  Open a cfDNAlab midpoint profile Zarr store.
MidpointProfiles  class     Helper for loading and slicing midpoint profile Zarr output.

read_midpoints

read_midpoints(path: pathlib.Path | str) -> MidpointProfiles

Open a cfDNAlab midpoint profile Zarr store.

Parameters

  • path: Path to a .midpoint_profiles.zarr directory.

Returns

  • MidpointProfiles: Loaded midpoint profile helper.

MidpointProfiles

Helper for loading and slicing midpoint profile Zarr output.

Midpoint profiles store counts in a three-dimensional array with axes (group, length_bin, position). The class exposes metadata as pandas data frames and count slices as NumPy arrays.

Public Methods

Method          Summary
group_idx       Find the midpoint group index for a group name.
length_bin_idx  Find the length-bin index whose interval contains a fragment length.
group_metadata  Return midpoint group labels and eligible interval counts.
counts_array    Return midpoint counts as a dense NumPy array.
length_bins     Get the fragment length bins available in this midpoint-profile output.
positions       Get the midpoint position bins available in this output.
data_frame      Create a pandas DataFrame of midpoint profile counts.

MidpointProfiles.group_idx

MidpointProfiles.group_idx(group_name: str) -> int

Find the midpoint group index for a group name.

Parameters

  • group_name: Group name to resolve.

Returns

  • int: Group index.

MidpointProfiles.length_bin_idx

MidpointProfiles.length_bin_idx(length: int) -> int

Find the length-bin index whose interval contains a fragment length.

Parameters

  • length: Fragment length in bp.

Returns

  • int: Length-bin index.

MidpointProfiles.group_metadata

MidpointProfiles.group_metadata() -> pd.DataFrame

Return midpoint group labels and eligible interval counts.

Returns

  • pandas.DataFrame: Columns are group_idx, group_name, and eligible_intervals.

MidpointProfiles.counts_array

MidpointProfiles.counts_array(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> np.ndarray

Return midpoint counts as a dense NumPy array.

The result keeps the midpoint count dimensions in the same order as the file: group, length bin, then position. Scalar selectors keep their axis as length one, so the shape is always (selected groups, selected length bins, positions).

Parameters

  • groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
  • group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
  • with_lengths: Fragment length or lengths in bp. Counts are returned for the length bins containing these lengths. Multiple lengths must select distinct length bins.
  • with_length_range: Two bp bounds defining a half-open range [start, end). Counts are returned for whole length bins that overlap this range.
  • length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.

Returns

  • numpy.ndarray: Count array with shape (group, length_bin, position).
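As a rough illustration of how with_length_range could map onto whole length bins, the standalone sketch below selects every half-open bin that overlaps a half-open [start, end) range. It does not call cfDNAlab; the helper name bins_overlapping and the bin edges are made up for the example.

```python
def bins_overlapping(bins, start, end):
    """Return indices of half-open bins [lo, hi) that overlap [start, end)."""
    if start >= end:
        raise ValueError("expected start < end")
    # Two half-open intervals overlap when each starts before the other ends.
    return [i for i, (lo, hi) in enumerate(bins) if lo < end and start < hi]


# Hypothetical length bins in bp, each a half-open interval [lo, hi).
example_bins = [(30, 50), (50, 100), (100, 150), (150, 220)]

# [90, 150) touches [50, 100) and [100, 150) but stops short of [150, 220).
selected = bins_overlapping(example_bins, 90, 150)
print(selected)  # [1, 2]
```

Note that the bin [50, 100) is returned whole even though the range only covers its upper end, which matches the "whole length bins that overlap this range" wording above.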

MidpointProfiles.length_bins

MidpointProfiles.length_bins() -> pd.DataFrame

Get the fragment length bins available in this midpoint-profile output.

Length bins are half-open intervals. A bin with length_start_bp=30 and length_end_bp=50 contains fragment lengths 30 <= length < 50.

Returns

  • pandas.DataFrame: Columns are length_bin, length_start_bp, and length_end_bp.

MidpointProfiles.positions

MidpointProfiles.positions() -> pd.DataFrame

Get the midpoint position bins available in this output.

Returns

  • pandas.DataFrame: Columns are position, position_bin_start_bp, and position_bin_end_bp.

MidpointProfiles.data_frame

MidpointProfiles.data_frame(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> pd.DataFrame

Create a pandas DataFrame of midpoint profile counts.

Use this for tabular analysis of the midpoint count array. The result expands the selected group and length-bin axes across all midpoint position bins, with group, length-bin, and position metadata on each row.

Parameters

  • groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
  • group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
  • with_lengths: Fragment length or lengths in bp. The returned rows use the length bins containing these lengths. Multiple lengths must select distinct length bins.
  • with_length_range: Two bp bounds defining a half-open range [start, end). Returned rows use whole length bins that overlap this range.
  • length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.

Returns

  • pandas.DataFrame: One row per selected group, length bin, and midpoint position bin.
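The expansion from a (group, length_bin, position) array to one row per cell can be pictured with the following standalone sketch. It uses plain Python lists instead of cfDNAlab objects, and the column names, group names, and counts are illustrative assumptions rather than the library's actual output.

```python
from itertools import product


def expand_counts(counts, group_names, length_bins, positions):
    """Flatten counts[group][length_bin][position] into one dict per cell."""
    rows = []
    axes = (range(len(group_names)), range(len(length_bins)), range(len(positions)))
    for g, b, p in product(*axes):
        rows.append({
            "group_name": group_names[g],
            "length_start_bp": length_bins[b][0],
            "length_end_bp": length_bins[b][1],
            "position": positions[p],
            "count": counts[g][b][p],
        })
    return rows


# Hypothetical store: 2 groups, 2 length bins, 3 position bins.
counts = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
rows = expand_counts(counts, ["ctcf", "tss"], [(30, 50), (50, 100)], [0, 1, 2])
print(len(rows))         # 12, i.e. 2 * 2 * 3
print(rows[4]["count"])  # 5, the cell at group 0, length bin 1, position 1
```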

End-Motif Counts

Load dense or sparse end-motif count Zarr stores and extract motif count tables, dense arrays, or sparse matrices.

Symbol                  Type      Summary
read_end_motifs         function  Open a cfDNAlab end-motif count Zarr store.
EndMotifCounts          class     Common API for global, windowed, and grouped end-motif outputs.
GlobalEndMotifCounts    class     End-motif counts for global output.
WindowedEndMotifCounts  class     End-motif counts for fixed-size or BED-window output.
GroupedEndMotifCounts   class     End-motif counts for grouped BED output.

read_end_motifs

read_end_motifs(path: pathlib.Path | str) -> GlobalEndMotifCounts | WindowedEndMotifCounts | GroupedEndMotifCounts

Open a cfDNAlab end-motif count Zarr store.

Parameters

  • path: Path to an .end_motifs.zarr directory.

Returns

  • EndMotifCounts: GlobalEndMotifCounts, WindowedEndMotifCounts, or GroupedEndMotifCounts, depending on the mode of the store.

EndMotifCounts

Common API for global, windowed, and grouped end-motif outputs.

Public Methods

Method                   Summary
storage_mode             Return how end-motif counts are stored on disk.
row_mode                 Return what each end-motif count row represents.
motifs_metadata          Return motif-axis labels and motif indices available in this output.
motif_idx                Find the motif-axis index for a motif label.
has_motif                Return whether a motif label exists in this output.
dense_counts_zarr_array  Return the lazy Zarr counts array for dense output.

EndMotifCounts.storage_mode

EndMotifCounts.storage_mode() -> str

Return how end-motif counts are stored on disk.

Returns

  • str: Either "dense" or "sparse_coo".

EndMotifCounts.row_mode

EndMotifCounts.row_mode() -> str

Return what each end-motif count row represents.

Returns

  • str: One of "global", "size", "bed", or "grouped_bed".

EndMotifCounts.motifs_metadata

EndMotifCounts.motifs_metadata() -> pd.DataFrame

Return motif-axis labels and motif indices available in this output.

For grouped motifs-file output, the motif labels are the group names used during counting.

Returns

  • pandas.DataFrame: Columns are motif_index and motif.

EndMotifCounts.motif_idx

EndMotifCounts.motif_idx(motif: str) -> int

Find the motif-axis index for a motif label.

Parameters

  • motif: Motif label to resolve.

Returns

  • int: Motif index.

EndMotifCounts.has_motif

EndMotifCounts.has_motif(motif: str) -> bool

Return whether a motif label exists in this output.

Sparse output stores only observed motifs, so has_motif returns False for an unobserved motif even if it belongs to the theoretical motif universe.

Parameters

  • motif: Motif label to check.

Returns

  • bool: Whether the motif can be resolved in this output.
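The difference between the stored motif axis and the theoretical motif universe can be seen in a small standalone sketch. It does not use cfDNAlab; the 2-mer universe, the observed motifs, and the helper stored_has_motif are assumptions chosen for illustration.

```python
from itertools import product

# Theoretical universe of 2-mer motifs over ACGT: 4**2 = 16 labels.
universe = ["".join(kmer) for kmer in product("ACGT", repeat=2)]

# Hypothetical motif axis of a sparse store: only motifs that were observed.
observed_axis = ["CC", "CA", "TG"]
motif_to_idx = {motif: idx for idx, motif in enumerate(observed_axis)}


def stored_has_motif(motif):
    """Stand-in for the documented check: True only for labels on the stored axis."""
    return motif in motif_to_idx


print(stored_has_motif("CC"))  # True
print(stored_has_motif("GG"))  # False, although "GG" is a valid 2-mer
print("GG" in universe)        # True
```

A False result from a sparse store therefore means "not stored", which is not the same as "not a valid motif".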

EndMotifCounts.dense_counts_zarr_array

EndMotifCounts.dense_counts_zarr_array() -> zarr.Array

Return the lazy Zarr counts array for dense output.

This returns the on-disk Zarr array handle without loading the full dense matrix into memory. Sparse output has no dense counts array.

Returns

  • zarr.Array: Dense count array with shape (output row, motif).

GlobalEndMotifCounts

End-motif counts for global output.

Public Methods

Method                Summary
data_frame            Create a pandas DataFrame for global end-motif counts.
dense_counts_array    Return global end-motif counts as a dense NumPy array.
sparse_counts_matrix  Return global end-motif counts as a SciPy sparse matrix.

GlobalEndMotifCounts.data_frame

GlobalEndMotifCounts.data_frame(*, densify: bool = False, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None) -> pd.DataFrame

Create a pandas DataFrame for global end-motif counts.

Sparse outputs return stored non-zero motif counts unless densify=True. Densifying adds explicit zero-count rows for selected observed motifs. Dense outputs always include zero counts.

Parameters

  • densify: If True, sparse outputs add explicit zero-count rows for selected observed motifs. Dense outputs ignore this option.
  • motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
  • motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.

Returns

  • pandas.DataFrame: Global row metadata, motif metadata, and count.

GlobalEndMotifCounts.dense_counts_array

GlobalEndMotifCounts.dense_counts_array(*, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, allow_densify: bool = False) -> np.ndarray

Return global end-motif counts as a dense NumPy array.

Sparse stores are only densified when allow_densify=True. Scalar motif selectors keep their axis as length one, so the shape is always (1, selected motifs).

Parameters

  • motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
  • motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
  • allow_densify: If True, allow sparse stores to be converted to dense counts.

Returns

  • numpy.ndarray: Dense count array with shape (global row, motif).
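To make the densification step concrete, here is a minimal pure-Python sketch of expanding COO-style (row, column, value) triplets into a dense (1, motifs) table. It is an illustration only: the helper coo_to_dense and the counts are hypothetical, and the library itself returns NumPy arrays rather than nested lists.

```python
def coo_to_dense(rows, cols, values, shape):
    """Expand COO triplets into a dense row-major nested list of the given shape."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, values):
        dense[r][c] += v  # repeated coordinates are summed
    return dense


# Hypothetical global store: one output row, five motifs, three non-zero counts.
dense = coo_to_dense(rows=[0, 0, 0], cols=[0, 2, 4], values=[12, 7, 3], shape=(1, 5))
print(dense)  # [[12, 0, 7, 0, 3]]
```

The dense form allocates a cell for every motif on the axis, which is presumably why densifying a sparse store has to be requested explicitly.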

GlobalEndMotifCounts.sparse_counts_matrix

GlobalEndMotifCounts.sparse_counts_matrix(*, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None) -> sparse.coo_matrix

Return global end-motif counts as a SciPy sparse matrix.

Scalar motif selectors keep their axis as length one, so the shape is always (1, selected motifs).

Parameters

  • motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
  • motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.

Returns

  • scipy.sparse.coo_matrix: Sparse count matrix with shape (global row, motif).

WindowedEndMotifCounts

End-motif counts for fixed-size or BED-window output.

Public Methods

Method                Summary
data_frame            Create a pandas DataFrame of end-motif counts for genomic windows.
window_metadata       Return genomic window metadata for this end-motif output.
dense_counts_array    Return windowed end-motif counts as a dense NumPy array.
sparse_counts_matrix  Return windowed end-motif counts as a SciPy sparse matrix.

WindowedEndMotifCounts.data_frame

WindowedEndMotifCounts.data_frame(*, window_idxs: int | Sequence[int] | None = None, densify: bool = False, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, max_blacklisted_fraction: float = 1.0) -> pd.DataFrame

Create a pandas DataFrame of end-motif counts for genomic windows.

Use window_idxs to keep only selected windows and motifs or motif_idxs to keep only selected motifs. Sparse outputs return stored non-zero rows unless densify=True. Densifying adds explicit zero-count rows for selected observed motifs. Dense outputs always include zero counts.

Parameters

  • window_idxs: None for all windows, one window index, or a sequence of window indices.
  • densify: If True, sparse outputs add explicit zero-count rows for selected observed motifs. Dense outputs ignore this option.
  • motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
  • motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
  • max_blacklisted_fraction: Maximum blacklisted_fraction (0 to 1) a window may have to be retained. Windows above this value are dropped before counts are returned. The default 1.0 keeps all selected windows.

Returns

  • pandas.DataFrame: Window metadata, motif metadata, and count.
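The effect of densify on sparse output can be sketched without the library. The example below treats the sparse store as a mapping from (window, motif) to a non-zero count and shows which long-format rows appear with and without densification. The helper long_rows and all values are assumptions made for illustration.

```python
def long_rows(stored, window_idxs, motifs, densify=False):
    """Build (window_idx, motif, count) rows from a {(window, motif): count} mapping."""
    rows = []
    for w in window_idxs:
        for m in motifs:
            count = stored.get((w, m), 0)
            # Without densify only stored non-zero cells become rows.
            if count or densify:
                rows.append((w, m, count))
    return rows


# Hypothetical sparse store: only non-zero cells are kept.
stored = {(0, "CC"): 5, (0, "CA"): 2, (1, "CC"): 1}
observed_motifs = ["CC", "CA"]

sparse_rows = long_rows(stored, [0, 1], observed_motifs)
dense_rows = long_rows(stored, [0, 1], observed_motifs, densify=True)
print(sparse_rows)  # [(0, 'CC', 5), (0, 'CA', 2), (1, 'CC', 1)]
print(dense_rows)   # the same rows plus (1, 'CA', 0)
```

Zero rows are only added for motifs on the observed axis, so a motif that never occurred in any window still does not appear.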

WindowedEndMotifCounts.window_metadata

WindowedEndMotifCounts.window_metadata() -> pd.DataFrame

Return genomic window metadata for this end-motif output.

The public window metadata identifies each genomic window by its window_idx, chrom, start, and end columns.

Returns

  • pandas.DataFrame: Columns are window_idx, chrom, start, end, and blacklisted_fraction.

WindowedEndMotifCounts.dense_counts_array

WindowedEndMotifCounts.dense_counts_array(*, window_idxs: int | Sequence[int] | None = None, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, allow_densify: bool = False) -> np.ndarray

Return windowed end-motif counts as a dense NumPy array.

Sparse stores are only densified when allow_densify=True. Scalar selectors keep their axes as length one, so the shape is always (selected windows, selected motifs).

Parameters

  • window_idxs: None for all windows, one window index, or a sequence of window indices.
  • motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
  • motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
  • allow_densify: If True, allow sparse stores to be converted to dense counts.

Returns

  • numpy.ndarray: Dense count array with shape (window, motif).

WindowedEndMotifCounts.sparse_counts_matrix

WindowedEndMotifCounts.sparse_counts_matrix(*, window_idxs: int | Sequence[int] | None = None, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None) -> sparse.coo_matrix

Return windowed end-motif counts as a SciPy sparse matrix.

Scalar selectors keep their axes as length one, so the shape is always (selected windows, selected motifs).

Parameters

  • window_idxs: None for all windows, one window index, or a sequence of window indices.
  • motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
  • motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.

Returns

  • scipy.sparse.coo_matrix: Sparse count matrix with shape (window, motif).

GroupedEndMotifCounts

End-motif counts for grouped BED output.

Public Methods

Method                Summary
data_frame            Create a pandas DataFrame of end-motif counts for grouped BED rows.
group_metadata        Return grouped BED metadata for this end-motif output.
group_idx             Find the end-motif row index for a group name.
dense_counts_array    Return grouped end-motif counts as a dense NumPy array.
sparse_counts_matrix  Return grouped end-motif counts as a SciPy sparse matrix.

GroupedEndMotifCounts.data_frame

GroupedEndMotifCounts.data_frame(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, densify: bool = False, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, max_blacklisted_fraction: float = 1.0) -> pd.DataFrame

Create a pandas DataFrame of end-motif counts for grouped BED rows.

Use groups or group_idxs to keep only selected groups and motifs or motif_idxs to keep only selected motifs. Sparse outputs return stored non-zero rows unless densify=True. Densifying adds explicit zero-count rows for selected observed motifs. Dense outputs always include zero counts.

Parameters

  • groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
  • group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
  • densify: If True, sparse outputs add explicit zero-count rows for selected observed motifs. Dense outputs ignore this option.
  • motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
  • motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
  • max_blacklisted_fraction: Maximum blacklisted_fraction (0 to 1) a group may have to be retained. Groups above this value are dropped before counts are returned. The default 1.0 keeps all selected groups.

Returns

  • pandas.DataFrame: Group metadata, motif metadata, and count.
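A minimal sketch of the max_blacklisted_fraction filter, written against plain dictionaries rather than the library's data frames. The metadata values and the helper keep_rows are hypothetical, and treating the threshold as inclusive is an assumption based on the default 1.0 keeping every row.

```python
def keep_rows(metadata, max_blacklisted_fraction=1.0):
    """Keep rows whose blacklisted_fraction does not exceed the threshold."""
    if not 0.0 <= max_blacklisted_fraction <= 1.0:
        raise ValueError("max_blacklisted_fraction must lie in 0..1")
    # Assumption: the comparison is inclusive, so 1.0 keeps everything.
    return [row for row in metadata if row["blacklisted_fraction"] <= max_blacklisted_fraction]


# Hypothetical group metadata.
groups = [
    {"group_idx": 0, "group_name": "enhancers", "blacklisted_fraction": 0.0},
    {"group_idx": 1, "group_name": "promoters", "blacklisted_fraction": 0.25},
    {"group_idx": 2, "group_name": "repeats", "blacklisted_fraction": 0.9},
]

kept = keep_rows(groups, max_blacklisted_fraction=0.5)
print([row["group_name"] for row in kept])  # ['enhancers', 'promoters']
```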

GroupedEndMotifCounts.group_metadata

GroupedEndMotifCounts.group_metadata() -> pd.DataFrame

Return grouped BED metadata for this end-motif output.

Returns

  • pandas.DataFrame: Columns are group_idx, group_name, eligible_windows, and blacklisted_fraction.

GroupedEndMotifCounts.group_idx

GroupedEndMotifCounts.group_idx(group_name: str) -> int

Find the end-motif row index for a group name.

Parameters

  • group_name: Group name to resolve.

Returns

  • int: Group index.
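The relationship between group names, group indices, and the groups / group_idxs selectors used throughout this page can be sketched as follows. This is a standalone illustration, not the library's code: resolve_group_idxs is a hypothetical helper, and the exception types are assumptions.

```python
def resolve_group_idxs(group_names, groups=None, group_idxs=None):
    """Turn a groups / group_idxs selector pair into a list of row indices."""
    if groups is not None and group_idxs is not None:
        raise ValueError("use either groups or group_idxs, not both")
    if groups is None and group_idxs is None:
        return list(range(len(group_names)))  # None selects every group
    if groups is not None:
        # A single name is wrapped in a list so the row axis is always kept.
        names = [groups] if isinstance(groups, str) else list(groups)
        lookup = {name: idx for idx, name in enumerate(group_names)}
        return [lookup[name] for name in names]  # unknown names raise KeyError here
    return [group_idxs] if isinstance(group_idxs, int) else list(group_idxs)


names = ["enhancers", "promoters", "repeats"]
print(resolve_group_idxs(names, groups="promoters"))  # [1]
print(resolve_group_idxs(names, group_idxs=[2, 0]))   # [2, 0]
print(resolve_group_idxs(names))                      # [0, 1, 2]
```

Because a scalar selector still yields a one-element list, any array indexed with the result keeps its row axis, in line with the "scalar selectors keep their axes as length one" rule.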

GroupedEndMotifCounts.dense_counts_array

GroupedEndMotifCounts.dense_counts_array(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, allow_densify: bool = False) -> np.ndarray

Return grouped end-motif counts as a dense NumPy array.

Sparse stores are only densified when allow_densify=True. Scalar selectors keep their axes as length one, so the shape is always (selected groups, selected motifs).

Parameters

  • groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
  • group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
  • motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
  • motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
  • allow_densify: If True, allow sparse stores to be converted to dense counts.

Returns

  • numpy.ndarray: Dense count array with shape (group, motif).

GroupedEndMotifCounts.sparse_counts_matrix

GroupedEndMotifCounts.sparse_counts_matrix(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None) -> sparse.coo_matrix

Return grouped end-motif counts as a SciPy sparse matrix.

Scalar selectors keep their axes as length one, so the shape is always (selected groups, selected motifs).

Parameters

  • groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
  • group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
  • motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
  • motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.

Returns

  • scipy.sparse.coo_matrix: Sparse count matrix with shape (group, motif).

Length Counts

Load fragment length-count TSV outputs and return counts, fractions, or densities as arrays, matrices, vectors, or data frames.

Symbol                Type      Summary
read_lengths          function  Read a cfDNAlab length-count TSV and return the matching loader class.
LengthCounts          class     Common API for global, windowed, and grouped length-count outputs.
GlobalLengthCounts    class     Length counts for global output.
WindowedLengthCounts  class     Length counts for fixed-size or BED-window output.
GroupedLengthCounts   class     Length counts for grouped BED output.

read_lengths

read_lengths(path: pathlib.Path | str) -> GlobalLengthCounts | WindowedLengthCounts | GroupedLengthCounts

Read a cfDNAlab length-count TSV and return the matching loader class.

Parameters

  • path: Path to a .length_counts.tsv or .length_counts.tsv.zst file.

Returns

  • LengthCounts: GlobalLengthCounts, WindowedLengthCounts, or GroupedLengthCounts, depending on the TSV metadata columns.

LengthCounts

Common API for global, windowed, and grouped length-count outputs.

Public Methods

Method          Summary
length_bins     Return fragment length bin definitions used by the count columns.
length_bin_idx  Find the length-bin index whose interval contains a fragment length.
counts_array    Return raw length counts as a dense NumPy array.

LengthCounts.length_bins

LengthCounts.length_bins() -> pd.DataFrame

Return fragment length bin definitions used by the count columns.

Length bins are half-open intervals. A bin with length_start_bp=30 and length_end_bp=50 contains fragment lengths 30 <= length < 50.

Returns

  • pandas.DataFrame: Columns are length_bin, length_start_bp, length_end_bp, length_midpoint_bp, and length_width_bp.

LengthCounts.length_bin_idx

LengthCounts.length_bin_idx(length: int) -> int

Find the length-bin index whose interval contains a fragment length.

Parameters

  • length: Fragment length in bp.

Returns

  • int: Length-bin index.

Raises

  • KeyError: If no length bin contains length.
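A lookup of this kind over half-open bins might be sketched as below. The example assumes contiguous bins described by a sorted list of edges, which is a simplification, and the helper name find_length_bin and the edges are hypothetical. It raises KeyError for lengths outside every bin, mirroring the documented behavior.

```python
from bisect import bisect_right


def find_length_bin(bin_edges, length):
    """Return i such that bin_edges[i] <= length < bin_edges[i + 1]."""
    i = bisect_right(bin_edges, length) - 1
    if i < 0 or i >= len(bin_edges) - 1:
        raise KeyError(f"no length bin contains {length}")
    return i


# Hypothetical contiguous bins: [30, 50), [50, 100), [100, 150).
edges = [30, 50, 100, 150]
print(find_length_bin(edges, 30))  # 0, the start of a bin is included
print(find_length_bin(edges, 49))  # 0
print(find_length_bin(edges, 50))  # 1, the end of a bin belongs to the next bin
```

With these edges, a length of 150 raises KeyError because the last bin is half-open and excludes its end.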

LengthCounts.counts_array

LengthCounts.counts_array(*, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> np.ndarray

Return raw length counts as a dense NumPy array.

Use with_lengths, with_length_range, or length_bin_idxs to select length bins. Range selection uses whole bins overlapping the half-open [start, end) bp range.

Parameters

  • with_lengths: Fragment length or lengths in bp. Counts are returned for the length bins containing these lengths. Multiple lengths must select distinct length bins.
  • with_length_range: Two bp bounds defining a half-open range [start, end). Counts are returned for whole length bins that overlap this range.
  • length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.

Returns

  • numpy.ndarray: Count array with shape (output row, length_bin). Output rows are windows for windowed output, groups for grouped output, and the single global summary row for global output.

GlobalLengthCounts

Length counts for global output.

Public Methods

Method      Summary
data_frame  Create a pandas DataFrame for the global fragment length distribution.

GlobalLengthCounts.data_frame

GlobalLengthCounts.data_frame(*, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None, value: str = 'count', denominator: str = 'all_bins', keep_wide: bool = False) -> pd.DataFrame

Create a pandas DataFrame for the global fragment length distribution.

Long output has one row per length bin with bin metadata. Wide output has one row with one value column per length bin.

Parameters

  • with_lengths: Fragment length or lengths in bp. Returned values use the length bins containing these lengths. Multiple lengths must select distinct length bins.
  • with_length_range: Two bp bounds defining a half-open range [start, end). Returned values use whole length bins that overlap this range.
  • length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
  • value: One of "count", "fraction", or "density". Fractions are within the global row. Densities are fractions divided by the length-bin width.
  • denominator: For "fraction" and "density", "all_bins" divides by the row total over all length bins, while "selected_bins" divides by the total over the returned length bins. Ignored for "count".
  • keep_wide: If False, return one row per length bin. If True, return one row with one value column per length bin.

Returns

  • pandas.DataFrame: Global length-count values with length-bin metadata for long output or value-prefixed columns for wide output.
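The interaction between value and denominator is easiest to see with numbers. The sketch below recomputes counts, fractions, and densities for a single row in plain Python, following the definitions given in the parameter list. The helper length_values, the bins, and the counts are made up for the example and are not part of cfDNAlab.

```python
def length_values(counts, bins, selected, value="count", denominator="all_bins"):
    """Compute count, fraction, or density for the selected length bins of one row."""
    if value == "count":
        return [counts[i] for i in selected]
    if denominator == "all_bins":
        total = sum(counts)
    elif denominator == "selected_bins":
        total = sum(counts[i] for i in selected)
    else:
        raise ValueError(f"unknown denominator: {denominator}")
    fractions = [counts[i] / total for i in selected]
    if value == "fraction":
        return fractions
    if value == "density":
        # Density is the fraction divided by the bin width in bp.
        return [f / (bins[i][1] - bins[i][0]) for f, i in zip(fractions, selected)]
    raise ValueError(f"unknown value: {value}")


# Hypothetical global row with three bins of width 20, 50, and 50 bp.
bins = [(30, 50), (50, 100), (100, 150)]
counts = [10, 30, 60]

print(length_values(counts, bins, [0, 1], "fraction"))                   # [0.1, 0.3]
print(length_values(counts, bins, [0, 1], "fraction", "selected_bins"))  # [0.25, 0.75]
print(length_values(counts, bins, [0, 1], "density"))                    # about 0.005 and 0.006 per bp
```

With "all_bins" the selected fractions need not sum to one, because the unselected third bin still contributes to the denominator. With "selected_bins" they always do.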

WindowedLengthCounts

Length counts for fixed-size or BED-window output.

Public Methods

Method           Summary
window_metadata  Return genomic window metadata for this length-count output.
counts_array     Return raw length counts as a dense NumPy array.
data_frame       Create a pandas DataFrame of fragment length distributions for windows.

WindowedLengthCounts.window_metadata

WindowedLengthCounts.window_metadata() -> pd.DataFrame

Return genomic window metadata for this length-count output.

Returns

  • pandas.DataFrame: Columns are window_idx, chrom, start, end, and optionally blacklisted_fraction.

WindowedLengthCounts.counts_array

WindowedLengthCounts.counts_array(*, window_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> np.ndarray

Return raw length counts as a dense NumPy array.

Scalar selectors keep their axis as length one, so the shape is always (selected windows, selected length bins).

Parameters

  • window_idxs: None for all windows, one window index, or a sequence of window indices.
  • with_lengths: Fragment length or lengths in bp. Counts are returned for the length bins containing these lengths. Multiple lengths must select distinct length bins.
  • with_length_range: Two bp bounds defining a half-open range [start, end). Counts are returned for whole length bins that overlap this range.
  • length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.

Returns

  • numpy.ndarray: Count array with shape (window, length_bin).

WindowedLengthCounts.data_frame

WindowedLengthCounts.data_frame(*, window_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None, value: str = 'count', denominator: str = 'all_bins', keep_wide: bool = False, max_blacklisted_fraction: float = 1.0) -> pd.DataFrame

Create a pandas DataFrame of fragment length distributions for windows.

Use window_idxs to keep only selected genomic windows. Long output has one row per selected window and length bin. Wide output has one row per selected window with one value column per length bin.

Parameters

  • window_idxs: None for all windows, one window index, or a sequence of window indices.
  • with_lengths: Fragment length or lengths in bp. Returned values use the length bins containing these lengths. Multiple lengths must select distinct length bins.
  • with_length_range: Two bp bounds defining a half-open range [start, end). Returned values use whole length bins that overlap this range.
  • length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
  • value: One of "count", "fraction", or "density". Fractions are within each selected window. Densities are fractions divided by the length-bin width.
  • denominator: For "fraction" and "density", "all_bins" divides by each row's total over all length bins, while "selected_bins" divides by the total over the returned length bins. Ignored for "count".
  • keep_wide: If False, return one row per selected window and length bin. If True, return one row per selected window with one value column per length bin.
  • max_blacklisted_fraction: Maximum blacklisted_fraction in 0..1 to keep. The default 1.0 keeps all selected windows.

Returns

  • pandas.DataFrame: Window metadata and length-count values.
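The difference between the long and wide layouts can be illustrated with a small pivot written in plain Python. The tuples, the helper to_wide, and in particular the wide column naming scheme (value, start, and end joined by underscores) are assumptions for the example. The library's actual column names may differ.

```python
def to_wide(long_rows, value="count"):
    """Pivot (window_idx, start_bp, end_bp, value) rows into one dict per window."""
    wide = {}
    for window_idx, start, end, val in long_rows:
        row = wide.setdefault(window_idx, {"window_idx": window_idx})
        row[f"{value}_{start}_{end}"] = val  # hypothetical column naming
    return [wide[idx] for idx in sorted(wide)]


# Hypothetical long output: one row per window and length bin.
long_rows = [
    (0, 30, 50, 4), (0, 50, 100, 9),
    (1, 30, 50, 1), (1, 50, 100, 6),
]
wide_rows = to_wide(long_rows)
print(len(long_rows), len(wide_rows))  # 4 2
print(wide_rows[0])  # {'window_idx': 0, 'count_30_50': 4, 'count_50_100': 9}
```

Long output tends to suit grouped summaries and plotting, while wide output gives one feature vector per window.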

GroupedLengthCounts

Length counts for grouped BED output.

Public Methods

Method          Summary
group_metadata  Return grouped BED metadata for this length-count output.
group_idx       Find the count-row index for a group name.
counts_array    Return raw length counts as a dense NumPy array.
data_frame      Create a pandas DataFrame of fragment length distributions for groups.

GroupedLengthCounts.group_metadata

GroupedLengthCounts.group_metadata() -> pd.DataFrame

Return grouped BED metadata for this length-count output.

Returns

  • pandas.DataFrame: Columns are group_idx, group_name, eligible_windows, and optionally blacklisted_fraction.

GroupedLengthCounts.group_idx

GroupedLengthCounts.group_idx(group_name: str) -> int

Find the count-row index for a group name.

Parameters

  • group_name: Group name to resolve.

Returns

  • int: Group index.

GroupedLengthCounts.counts_array

GroupedLengthCounts.counts_array(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> np.ndarray

Return raw length counts as a dense NumPy array.

Scalar selectors keep their axis as length one, so the shape is always (selected groups, selected length bins).

Parameters

  • groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
  • group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
  • with_lengths: Fragment length or lengths in bp. Counts are returned for the length bins containing these lengths. Multiple lengths must select distinct length bins.
  • with_length_range: Two bp bounds defining a half-open range [start, end). Counts are returned for whole length bins that overlap this range.
  • length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.

Returns

  • numpy.ndarray: Count array with shape (group, length_bin).

GroupedLengthCounts.data_frame

GroupedLengthCounts.data_frame(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None, value: str = 'count', denominator: str = 'all_bins', keep_wide: bool = False, max_blacklisted_fraction: float = 1.0) -> pd.DataFrame

Create a pandas DataFrame of fragment length distributions for groups.

Use groups or group_idxs to keep only selected grouped BED rows. Long output has one row per selected group and length bin. Wide output has one row per selected group with one value column per length bin.

Parameters

  • groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
  • group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
  • with_lengths: Fragment length or lengths in bp. Returned values use the length bins containing these lengths. Multiple lengths must select distinct length bins.
  • with_length_range: Two bp bounds defining a half-open range [start, end). Returned values use whole length bins that overlap this range.
  • length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
  • value: One of "count", "fraction", or "density". Fractions are within each selected group. Densities are fractions divided by the length-bin width.
  • denominator: For "fraction" and "density", "all_bins" divides by each row's total over all length bins, while "selected_bins" divides by the total over the returned length bins. Ignored for "count".
  • keep_wide: If False, return one row per selected group and length bin. If True, return one row per selected group with one value column per length bin.
  • max_blacklisted_fraction: Maximum blacklisted_fraction in 0..1 to keep. The default 1.0 keeps all selected groups.

Returns

  • pandas.DataFrame: Group metadata and length-count values.