Python API
Generated from public symbols and docstrings in py-cfdnalab/src/cfdnalab.
Midpoint Profiles
Load midpoint profile Zarr stores and extract count arrays or data frames by group, fragment length bin, and midpoint position.
| Symbol | Type | Summary |
|---|---|---|
| read_midpoints | function | Open a cfDNAlab midpoint profile Zarr store. |
| MidpointProfiles | class | Helper for loading and slicing midpoint profile Zarr output. |
read_midpoints
read_midpoints(path: pathlib.Path | str) -> MidpointProfiles
Open a cfDNAlab midpoint profile Zarr store.
Parameters
path: Path to a .midpoint_profiles.zarr directory.
Returns
MidpointProfiles: Loaded midpoint profile helper.
MidpointProfiles
Helper for loading and slicing midpoint profile Zarr output.
Midpoint profiles store counts as (group, length_bin, position). The class
exposes metadata as pandas data frames and count slices as NumPy arrays.
Public Methods
| Method | Summary |
|---|---|
| group_idx | Find the midpoint group index for a group name. |
| length_bin_idx | Find the length-bin index whose interval contains a fragment length. |
| group_metadata | Return midpoint group labels and eligible interval counts. |
| counts_array | Return midpoint counts as a dense NumPy array. |
| length_bins | Get the fragment length bins available in this midpoint-profile output. |
| positions | Get the midpoint position bins available in this output. |
| data_frame | Create a pandas DataFrame of midpoint profile counts. |
MidpointProfiles.group_idx
MidpointProfiles.group_idx(group_name: str) -> int
Find the midpoint group index for a group name.
Parameters
group_name: Group name to resolve.
Returns
int: Group index.
MidpointProfiles.length_bin_idx
MidpointProfiles.length_bin_idx(length: int) -> int
Find the length-bin index whose interval contains a fragment length.
Parameters
length: Fragment length in bp.
Returns
int: Length-bin index.
MidpointProfiles.group_metadata
MidpointProfiles.group_metadata() -> pd.DataFrame
Return midpoint group labels and eligible interval counts.
Returns
pandas.DataFrame: Columns are group_idx, group_name, and eligible_intervals.
MidpointProfiles.counts_array
MidpointProfiles.counts_array(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> np.ndarray
Return midpoint counts as a dense NumPy array.
The result keeps the midpoint count dimensions in the same order as
the file: group, length bin, then position. Scalar selectors keep their
axis as length one, so the shape is always
(selected groups, selected length bins, positions).
Parameters
groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
with_lengths: Fragment length or lengths in bp. Counts are returned for the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Counts are returned for whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
Returns
numpy.ndarray: Count array with shape (group, length_bin, position).
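The with_length_range rule (whole length bins that overlap a half-open [start, end) range) can be illustrated with a small pure-Python sketch. This is not the library's implementation; the helper name bins_overlapping_range and the example bins are hypothetical, chosen only to show which bins such a range would pick.

```python
def bins_overlapping_range(bins, start, end):
    """Return indices of half-open bins [b_start, b_end) overlapping [start, end)."""
    if end <= start:
        raise ValueError("expected start < end")
    return [
        idx
        for idx, (b_start, b_end) in enumerate(bins)
        # Two half-open intervals overlap when each starts before the other ends.
        if b_start < end and b_end > start
    ]


# Hypothetical length bins, each a half-open (start_bp, end_bp) interval.
bins = [(30, 50), (50, 100), (100, 150), (150, 220)]

# The range [90, 150) touches the second and third bins, so both are
# selected as whole bins even though 90 falls inside the second one.
print(bins_overlapping_range(bins, 90, 150))  # [1, 2]
```

Because the end bound is exclusive, a range ending at 150 does not pull in the bin that starts at 150.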
MidpointProfiles.length_bins
MidpointProfiles.length_bins() -> pd.DataFrame
Get the fragment length bins available in this midpoint-profile output.
Length bins are half-open intervals. A bin with length_start_bp=30
and length_end_bp=50 contains fragment lengths 30 <= length < 50.
Returns
pandas.DataFrame: Columns are length_bin, length_start_bp, and length_end_bp.
MidpointProfiles.positions
MidpointProfiles.positions() -> pd.DataFrame
Get the midpoint position bins available in this output.
Returns
pandas.DataFrame: Columns are position, position_bin_start_bp, and position_bin_end_bp.
MidpointProfiles.data_frame
MidpointProfiles.data_frame(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> pd.DataFrame
Create a pandas DataFrame of midpoint profile counts.
Use this for tabular analysis of the midpoint count array. The result expands the selected group and length-bin axes across all midpoint position bins, with group, length-bin, and position metadata on each row.
Parameters
groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
with_lengths: Fragment length or lengths in bp. The returned rows use the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Returned rows use whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
Returns
pandas.DataFrame: One row per selected group, length bin, and midpoint position bin.
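To make the "one row per selected group, length bin, and midpoint position bin" layout concrete, the following sketch expands a small (group, length_bin, position) array into long rows using nested lists. It is an illustration only: the function name, the example values, and the count column name are assumptions, and the real method returns a pandas DataFrame with the library's own metadata columns.

```python
from itertools import product


def counts_to_long_rows(counts, group_names, length_bins, positions):
    """Expand counts[group][length_bin][position] into one dict per cell."""
    rows = []
    axes = (range(len(group_names)), range(len(length_bins)), range(len(positions)))
    for g, b, p in product(*axes):
        rows.append(
            {
                "group_name": group_names[g],
                "length_start_bp": length_bins[b][0],
                "length_end_bp": length_bins[b][1],
                "position": positions[p],
                "count": counts[g][b][p],  # assumed column name
            }
        )
    return rows


# Hypothetical store with 2 groups, 1 length bin, and 3 position bins.
counts = [[[4, 9, 5]], [[1, 0, 2]]]
rows = counts_to_long_rows(counts, ["ctcf", "tss"], [(120, 180)], [-1, 0, 1])
print(len(rows))  # 6
```

The row count is the product of the three selected axis lengths, which is worth keeping in mind before requesting all groups and all length bins at once.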
End-Motif Counts
Load dense or sparse end-motif count Zarr stores and extract motif count tables, dense arrays, or sparse matrices.
| Symbol | Type | Summary |
|---|---|---|
| read_end_motifs | function | Open a cfDNAlab end-motif count Zarr store. |
| EndMotifCounts | class | Common API for global, windowed, and grouped end-motif outputs. |
| GlobalEndMotifCounts | class | End-motif counts for global output. |
| WindowedEndMotifCounts | class | End-motif counts for fixed-size or BED-window output. |
| GroupedEndMotifCounts | class | End-motif counts for grouped BED output. |
read_end_motifs
read_end_motifs(path: pathlib.Path | str) -> GlobalEndMotifCounts | WindowedEndMotifCounts | GroupedEndMotifCounts
Open a cfDNAlab end-motif count Zarr store.
Parameters
path: Path to an .end_motifs.zarr directory.
Returns
EndMotifCounts: Mode-specific end-motif count helper.
EndMotifCounts
Common API for global, windowed, and grouped end-motif outputs.
Public Methods
| Method | Summary |
|---|---|
| storage_mode | Return how end-motif counts are stored on disk. |
| row_mode | Return what each end-motif count row represents. |
| motifs_metadata | Return motif-axis labels and motif indices available in this output. |
| motif_idx | Find the motif-axis index for a motif label. |
| has_motif | Return whether a motif label exists in this output. |
| dense_counts_zarr_array | Return the lazy Zarr counts array for dense output. |
EndMotifCounts.storage_mode
EndMotifCounts.storage_mode() -> str
Return how end-motif counts are stored on disk.
Returns
str: Either "dense" or "sparse_coo".
EndMotifCounts.row_mode
EndMotifCounts.row_mode() -> str
Return what each end-motif count row represents.
Returns
str: One of "global", "size", "bed", or "grouped_bed".
EndMotifCounts.motifs_metadata
EndMotifCounts.motifs_metadata() -> pd.DataFrame
Return motif-axis labels and motif indices available in this output.
For grouped motifs-file output, the motif labels are the group names
used during counting.
Returns
pandas.DataFrame: Columns are motif_index and motif.
EndMotifCounts.motif_idx
EndMotifCounts.motif_idx(motif: str) -> int
Find the motif-axis index for a motif label.
Parameters
motif: Motif label to resolve.
Returns
int: Motif index.
EndMotifCounts.has_motif
EndMotifCounts.has_motif(motif: str) -> bool
Return whether a motif label exists in this output.
Sparse output only stores observed motifs, so an unobserved motif will
return False even if it is part of the theoretical motif universe.
Parameters
motif: Motif label to check.
Returns
bool: Whether the motif can be resolved in this output.
EndMotifCounts.dense_counts_zarr_array
EndMotifCounts.dense_counts_zarr_array() -> zarr.Array
Return the lazy Zarr counts array for dense output.
This returns the on-disk Zarr array handle without loading the full
dense matrix into memory. Sparse output has no dense counts array.
Returns
zarr.Array: Dense count array with shape (output row, motif).
GlobalEndMotifCounts
End-motif counts for global output.
Public Methods
| Method | Summary |
|---|---|
| data_frame | Create a pandas DataFrame for global end-motif counts. |
| dense_counts_array | Return global end-motif counts as a dense NumPy array. |
| sparse_counts_matrix | Return global end-motif counts as a SciPy sparse matrix. |
GlobalEndMotifCounts.data_frame
GlobalEndMotifCounts.data_frame(*, densify: bool = False, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None) -> pd.DataFrame
Create a pandas DataFrame for global end-motif counts.
Sparse outputs return stored non-zero motif counts unless
densify=True. Densifying adds explicit zero-count rows for selected
observed motifs. Dense outputs always include zero counts.
Parameters
densify: If True, sparse outputs add explicit zero-count rows for selected observed motifs. Dense outputs ignore this option.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
Returns
pandas.DataFrame: Global row metadata, motif metadata, and count.
GlobalEndMotifCounts.dense_counts_array
GlobalEndMotifCounts.dense_counts_array(*, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, allow_densify: bool = False) -> np.ndarray
Return global end-motif counts as a dense NumPy array.
Sparse stores are only densified when allow_densify=True. Scalar
motif selectors keep their axis as length one, so the shape is always
(1, selected motifs).
Parameters
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
allow_densify: If True, allow sparse stores to be converted to dense counts.
Returns
numpy.ndarray: Dense count array with shape (global row, motif).
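The opt-in nature of allow_densify can be sketched with plain COO triplets. The reference above does not say what happens when a sparse store is requested as dense without the flag, so this sketch assumes an error is raised; the function name, the ValueError, and the example values are all hypothetical and not taken from cfDNAlab.

```python
def coo_to_dense(rows, cols, values, shape, *, allow_densify=False):
    """Build a dense nested list from COO triplets, only when explicitly allowed."""
    if not allow_densify:
        # Assumed behaviour: refuse rather than silently allocate a large array.
        raise ValueError("sparse counts need allow_densify=True to become dense")
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, values):
        dense[r][c] += v
    return dense


# One global row and four motif columns, two of which were observed.
dense = coo_to_dense([0, 0], [0, 3], [12, 5], (1, 4), allow_densify=True)
print(dense)  # [[12, 0, 0, 5]]
```

Keeping the flag off by default guards against accidentally materialising a very wide motif axis in memory.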
GlobalEndMotifCounts.sparse_counts_matrix
GlobalEndMotifCounts.sparse_counts_matrix(*, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None) -> sparse.coo_matrix
Return global end-motif counts as a SciPy sparse matrix.
Scalar motif selectors keep their axis as length one, so the shape is
always (1, selected motifs).
Parameters
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
Returns
scipy.sparse.coo_matrix: Sparse count matrix with shape (global row, motif).
WindowedEndMotifCounts
End-motif counts for fixed-size or BED-window output.
Public Methods
| Method | Summary |
|---|---|
| data_frame | Create a pandas DataFrame of end-motif counts for genomic windows. |
| window_metadata | Return genomic window metadata for this end-motif output. |
| dense_counts_array | Return windowed end-motif counts as a dense NumPy array. |
| sparse_counts_matrix | Return windowed end-motif counts as a SciPy sparse matrix. |
WindowedEndMotifCounts.data_frame
WindowedEndMotifCounts.data_frame(*, window_idxs: int | Sequence[int] | None = None, densify: bool = False, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, max_blacklisted_fraction: float = 1.0) -> pd.DataFrame
Create a pandas DataFrame of end-motif counts for genomic windows.
Use window_idxs to keep only selected windows and motifs or
motif_idxs to keep only selected motifs. Sparse outputs return stored
non-zero rows unless densify=True. Densifying adds explicit
zero-count rows for selected observed motifs. Dense outputs always
include zero counts.
Parameters
window_idxs: None for all windows, one window index, or a sequence of window indices.
densify: If True, sparse outputs add explicit zero-count rows for selected observed motifs. Dense outputs ignore this option.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
max_blacklisted_fraction: Maximum row blacklisted_fraction in 0..1 to retain before counts are returned. The default 1.0 keeps all selected windows.
Returns
pandas.DataFrame: Window metadata, motif metadata, and count.
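The effect of densify=True on sparse output can be sketched as follows: every selected window gets a row for every observed motif, with zero filled in where nothing was stored. This is a minimal pure-Python illustration under that reading of the description; the function name, the tuple layout, and the example motifs are hypothetical.

```python
def densify(sparse_rows, window_idxs):
    """Add explicit zero-count rows for observed motifs missing from a window.

    sparse_rows holds (window_idx, motif, count) tuples with count > 0.
    """
    observed = sorted({motif for _, motif, _ in sparse_rows})
    stored = {(w, m): c for w, m, c in sparse_rows}
    return [(w, m, stored.get((w, m), 0)) for w in window_idxs for m in observed]


# Window 1 has no stored rows, and window 2 lacks "CCAG".
sparse_rows = [(0, "CCCA", 7), (0, "CCAG", 3), (2, "CCCA", 1)]
dense_rows = densify(sparse_rows, window_idxs=[0, 1, 2])
print(len(dense_rows))  # 6
```

Motifs that were never observed anywhere (for example "AAAA" here) still do not appear, which matches the "selected observed motifs" wording.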
WindowedEndMotifCounts.window_metadata
WindowedEndMotifCounts.window_metadata() -> pd.DataFrame
Return genomic window metadata for this end-motif output.
Public genomic window metadata uses window_idx, chrom, start,
and end columns.
Returns
pandas.DataFrame: Columns are window_idx, chrom, start, end, and blacklisted_fraction.
WindowedEndMotifCounts.dense_counts_array
WindowedEndMotifCounts.dense_counts_array(*, window_idxs: int | Sequence[int] | None = None, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, allow_densify: bool = False) -> np.ndarray
Return windowed end-motif counts as a dense NumPy array.
Sparse stores are only densified when allow_densify=True. Scalar
selectors keep their axes as length one, so the shape is always
(selected windows, selected motifs).
Parameters
window_idxs: None for all windows, one window index, or a sequence of window indices.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
allow_densify: If True, allow sparse stores to be converted to dense counts.
Returns
numpy.ndarray: Dense count array with shape (window, motif).
WindowedEndMotifCounts.sparse_counts_matrix
WindowedEndMotifCounts.sparse_counts_matrix(*, window_idxs: int | Sequence[int] | None = None, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None) -> sparse.coo_matrix
Return windowed end-motif counts as a SciPy sparse matrix.
Scalar selectors keep their axes as length one, so the shape is always
(selected windows, selected motifs).
Parameters
window_idxs: None for all windows, one window index, or a sequence of window indices.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
Returns
scipy.sparse.coo_matrix: Sparse count matrix with shape (window, motif).
GroupedEndMotifCounts
End-motif counts for grouped BED output.
Public Methods
| Method | Summary |
|---|---|
| data_frame | Create a pandas DataFrame of end-motif counts for grouped BED rows. |
| group_metadata | Return grouped BED metadata for this end-motif output. |
| group_idx | Find the end-motif row index for a group name. |
| dense_counts_array | Return grouped end-motif counts as a dense NumPy array. |
| sparse_counts_matrix | Return grouped end-motif counts as a SciPy sparse matrix. |
GroupedEndMotifCounts.data_frame
GroupedEndMotifCounts.data_frame(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, densify: bool = False, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, max_blacklisted_fraction: float = 1.0) -> pd.DataFrame
Create a pandas DataFrame of end-motif counts for grouped BED rows.
Use groups or group_idxs to keep only selected groups and motifs
or motif_idxs to keep only selected motifs. Sparse outputs return
stored non-zero rows unless densify=True. Densifying adds explicit
zero-count rows for selected observed motifs. Dense outputs always
include zero counts.
Parameters
groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
densify: If True, sparse outputs add explicit zero-count rows for selected observed motifs. Dense outputs ignore this option.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
max_blacklisted_fraction: Maximum row blacklisted_fraction in 0..1 to retain before counts are returned. The default 1.0 keeps all selected groups.
Returns
pandas.DataFrame: Group metadata, motif metadata, and count.
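The max_blacklisted_fraction filter can be pictured as a row filter applied to the group metadata before counts are returned. The sketch below assumes the threshold is inclusive, which is consistent with the default 1.0 keeping every row but is not stated explicitly; the function name and the example groups are hypothetical.

```python
def filter_by_blacklist(rows, max_blacklisted_fraction=1.0):
    """Keep rows whose blacklisted_fraction does not exceed the threshold."""
    if not 0.0 <= max_blacklisted_fraction <= 1.0:
        raise ValueError("max_blacklisted_fraction must be within 0..1")
    return [
        row
        for row in rows
        # Assumed inclusive comparison, so the default 1.0 retains all rows.
        if row["blacklisted_fraction"] <= max_blacklisted_fraction
    ]


groups = [
    {"group_idx": 0, "group_name": "promoters", "blacklisted_fraction": 0.02},
    {"group_idx": 1, "group_name": "centromeric", "blacklisted_fraction": 0.85},
    {"group_idx": 2, "group_name": "enhancers", "blacklisted_fraction": 0.10},
]

kept = filter_by_blacklist(groups, max_blacklisted_fraction=0.1)
print([row["group_name"] for row in kept])  # ['promoters', 'enhancers']
```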
GroupedEndMotifCounts.group_metadata
GroupedEndMotifCounts.group_metadata() -> pd.DataFrame
Return grouped BED metadata for this end-motif output.
Returns
pandas.DataFrame: Columns are group_idx, group_name, eligible_windows, and blacklisted_fraction.
GroupedEndMotifCounts.group_idx
GroupedEndMotifCounts.group_idx(group_name: str) -> int
Find the end-motif row index for a group name.
Parameters
group_name: Group name to resolve.
Returns
int: Group index.
GroupedEndMotifCounts.dense_counts_array
GroupedEndMotifCounts.dense_counts_array(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None, allow_densify: bool = False) -> np.ndarray
Return grouped end-motif counts as a dense NumPy array.
Sparse stores are only densified when allow_densify=True. Scalar
selectors keep their axes as length one, so the shape is always
(selected groups, selected motifs).
Parameters
groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
allow_densify: If True, allow sparse stores to be converted to dense counts.
Returns
numpy.ndarray: Dense count array with shape (group, motif).
GroupedEndMotifCounts.sparse_counts_matrix
GroupedEndMotifCounts.sparse_counts_matrix(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, motifs: str | Sequence[str] | None = None, motif_idxs: int | Sequence[int] | None = None) -> sparse.coo_matrix
Return grouped end-motif counts as a SciPy sparse matrix.
Scalar selectors keep their axes as length one, so the shape is always
(selected groups, selected motifs).
Parameters
groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
motifs: Motif label or labels. Use either motifs or motif_idxs, not both.
motif_idxs: Motif index or indices. Use either motifs or motif_idxs, not both.
Returns
scipy.sparse.coo_matrix: Sparse count matrix with shape (group, motif).
Length Counts
Load fragment length-count TSV outputs and return counts, fractions, or densities as arrays, matrices, vectors, or data frames.
| Symbol | Type | Summary |
|---|---|---|
| read_lengths | function | Read a cfDNAlab length-count TSV and return the matching loader class. |
| LengthCounts | class | Common API for global, windowed, and grouped length-count outputs. |
| GlobalLengthCounts | class | Length counts for global output. |
| WindowedLengthCounts | class | Length counts for fixed-size or BED-window output. |
| GroupedLengthCounts | class | Length counts for grouped BED output. |
read_lengths
read_lengths(path: pathlib.Path | str) -> GlobalLengthCounts | WindowedLengthCounts | GroupedLengthCounts
Read a cfDNAlab length-count TSV and return the matching loader class.
Parameters
path: Path to a .length_counts.tsv or .length_counts.tsv.zst file.
Returns
LengthCounts: GlobalLengthCounts, WindowedLengthCounts, or GroupedLengthCounts, depending on the TSV metadata columns.
LengthCounts
Common API for global, windowed, and grouped length-count outputs.
Public Methods
| Method | Summary |
|---|---|
| length_bins | Return fragment length bin definitions used by the count columns. |
| length_bin_idx | Find the length-bin index whose interval contains a fragment length. |
| counts_array | Return raw length counts as a dense NumPy array. |
LengthCounts.length_bins
LengthCounts.length_bins() -> pd.DataFrame
Return fragment length bin definitions used by the count columns.
Length bins are half-open intervals. A bin with length_start_bp=30
and length_end_bp=50 contains fragment lengths 30 <= length < 50.
Returns
pandas.DataFrame: Columns are length_bin, length_start_bp, length_end_bp, length_midpoint_bp, and length_width_bp.
LengthCounts.length_bin_idx
LengthCounts.length_bin_idx(length: int) -> int
Find the length-bin index whose interval contains a fragment length.
Parameters
length: Fragment length in bp.
Returns
int: Length-bin index.
Raises
KeyError: If no length bin contains length.
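The lookup contract (half-open bins, KeyError when no bin matches) can be sketched in a few lines of plain Python. This is an illustration of the documented behaviour rather than the library's code; the bins tuple list is a hypothetical stand-in for the length_start_bp and length_end_bp columns.

```python
def length_bin_idx(bins, length):
    """Return the index of the half-open bin [start_bp, end_bp) containing length."""
    for idx, (start_bp, end_bp) in enumerate(bins):
        if start_bp <= length < end_bp:
            return idx
    raise KeyError(f"no length bin contains length {length}")


# Hypothetical bins covering 30 <= length < 100.
bins = [(30, 50), (50, 100)]

print(length_bin_idx(bins, 49))  # 0
print(length_bin_idx(bins, 50))  # 1, because the end bound of a bin is exclusive
```

A length equal to the last bin's end (100 here) falls outside every bin and raises KeyError.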
LengthCounts.counts_array
LengthCounts.counts_array(*, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> np.ndarray
Return raw length counts as a dense NumPy array.
Use with_lengths, with_length_range, or length_bin_idxs to select
length bins. Range selection uses whole bins overlapping the half-open
[start, end) bp range.
Parameters
with_lengths: Fragment length or lengths in bp. Counts are returned for the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end).
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
Returns
numpy.ndarray: Count array with shape (output row, length_bin). Output rows are windows for windowed output, groups for grouped output, and the single global summary row for global output.
GlobalLengthCounts
Length counts for global output.
Public Methods
| Method | Summary |
|---|---|
| data_frame | Create a pandas DataFrame for the global fragment length distribution. |
GlobalLengthCounts.data_frame
GlobalLengthCounts.data_frame(*, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None, value: str = 'count', denominator: str = 'all_bins', keep_wide: bool = False) -> pd.DataFrame
Create a pandas DataFrame for the global fragment length distribution.
Long output has one row per length bin with bin metadata. Wide output has one row with one value column per length bin.
Parameters
with_lengths: Fragment length or lengths in bp. Returned values use the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Returned values use whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
value: One of "count", "fraction", or "density". Fractions are within the global row. Densities are fractions divided by the length-bin width.
denominator: For "fraction" and "density", "all_bins" divides by the row total over all length bins, while "selected_bins" divides by the total over the returned length bins. Ignored for "count".
keep_wide: If False, return one row per length bin. If True, return one row with one value column per length bin.
Returns
pandas.DataFrame: Global length-count values with length-bin metadata for long output or value-prefixed columns for wide output.
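The relationship between value and denominator can be worked through on a single row of counts. The sketch below follows the definitions given above (fraction = count / total, density = fraction / bin width) in plain Python. The function name and example numbers are hypothetical, and it assumes the chosen total is non-zero.

```python
def length_values(counts, bins, selected, value="count", denominator="all_bins"):
    """Compute counts, fractions, or densities for the selected length bins."""
    if value not in ("count", "fraction", "density"):
        raise ValueError(f"unknown value: {value!r}")
    if denominator not in ("all_bins", "selected_bins"):
        raise ValueError(f"unknown denominator: {denominator!r}")
    picked = [counts[i] for i in selected]
    if value == "count":
        return picked  # the denominator is ignored for raw counts
    total = sum(counts) if denominator == "all_bins" else sum(picked)
    fractions = [c / total for c in picked]  # assumes total > 0
    if value == "fraction":
        return fractions
    widths = [bins[i][1] - bins[i][0] for i in selected]
    return [f / w for f, w in zip(fractions, widths)]


# One global row with three half-open length bins; select the last two.
bins = [(30, 50), (50, 100), (100, 150)]
counts = [10, 30, 60]

print(length_values(counts, bins, [1, 2], value="fraction"))  # [0.3, 0.6]
selected_fractions = length_values(
    counts, bins, [1, 2], value="fraction", denominator="selected_bins"
)
densities = length_values(counts, bins, [1, 2], value="density")
```

With "all_bins" the selected fractions sum to 0.9 because the unselected first bin still contributes to the total; with "selected_bins" they sum to 1.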
WindowedLengthCounts
Length counts for fixed-size or BED-window output.
Public Methods
| Method | Summary |
|---|---|
| window_metadata | Return genomic window metadata for this length-count output. |
| counts_array | Return raw length counts as a dense NumPy array. |
| data_frame | Create a pandas DataFrame of fragment length distributions for windows. |
WindowedLengthCounts.window_metadata
WindowedLengthCounts.window_metadata() -> pd.DataFrame
Return genomic window metadata for this length-count output.
Returns
pandas.DataFrame: Columns are window_idx, chrom, start, end, and optionally blacklisted_fraction.
WindowedLengthCounts.counts_array
WindowedLengthCounts.counts_array(*, window_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> np.ndarray
Return raw length counts as a dense NumPy array.
Scalar selectors keep their axis as length one, so the shape is always
(selected windows, length_bin).
Parameters
window_idxs: None for all windows, one window index, or a sequence of window indices.
with_lengths: Fragment length or lengths in bp. Counts are returned for the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end).
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
Returns
numpy.ndarray: Count array with shape (window, length_bin).
WindowedLengthCounts.data_frame
WindowedLengthCounts.data_frame(*, window_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None, value: str = 'count', denominator: str = 'all_bins', keep_wide: bool = False, max_blacklisted_fraction: float = 1.0) -> pd.DataFrame
Create a pandas DataFrame of fragment length distributions for windows.
Use window_idxs to keep only selected genomic windows. Long output has
one row per selected window and length bin. Wide output has one row per
selected window with one value column per length bin.
Parameters
window_idxs: None for all windows, a window index, or a sequence of window indices.
with_lengths: Fragment length or lengths in bp. Returned values use the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Returned values use whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
value: One of "count", "fraction", or "density". Fractions are within each selected window. Densities are fractions divided by the length-bin width.
denominator: For "fraction" and "density", "all_bins" divides by each row's total over all length bins, while "selected_bins" divides by the total over the returned length bins. Ignored for "count".
keep_wide: If False, return one row per selected window and length bin. If True, return one row per selected window with one value column per length bin.
max_blacklisted_fraction: Maximum blacklisted_fraction in 0..1 to keep. The default 1.0 keeps all selected windows.
Returns
pandas.DataFrame: Window metadata and length-count values.
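The difference between long and wide output (keep_wide=False versus keep_wide=True) can be shown by pivoting a few long rows by hand. This is only a sketch of the two layouts: the wide column names used here (count_30_50 and so on) are invented for illustration, and the library's actual value-prefixed column names may differ.

```python
def long_to_wide(long_rows, prefix="count"):
    """Pivot one-row-per-(window, length bin) records into one row per window."""
    wide = {}
    for row in long_rows:
        record = wide.setdefault(row["window_idx"], {"window_idx": row["window_idx"]})
        # Hypothetical column naming scheme: <prefix>_<start>_<end>.
        column = f"{prefix}_{row['length_start_bp']}_{row['length_end_bp']}"
        record[column] = row["value"]
    return list(wide.values())


# Long layout: 2 windows x 2 length bins = 4 rows.
long_rows = [
    {"window_idx": 0, "length_start_bp": 30, "length_end_bp": 50, "value": 4},
    {"window_idx": 0, "length_start_bp": 50, "length_end_bp": 100, "value": 11},
    {"window_idx": 1, "length_start_bp": 30, "length_end_bp": 50, "value": 2},
    {"window_idx": 1, "length_start_bp": 50, "length_end_bp": 100, "value": 9},
]

wide_rows = long_to_wide(long_rows)
print(len(wide_rows))  # 2
```

Long output tends to suit group-by style analysis and plotting, while wide output is closer to a feature matrix with one row per window.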
GroupedLengthCounts
Length counts for grouped BED output.
Public Methods
| Method | Summary |
|---|---|
| group_metadata | Return grouped BED metadata for this length-count output. |
| group_idx | Find the count-row index for a group name. |
| counts_array | Return raw length counts as a dense NumPy array. |
| data_frame | Create a pandas DataFrame of fragment length distributions for groups. |
GroupedLengthCounts.group_metadata
GroupedLengthCounts.group_metadata() -> pd.DataFrame
Return grouped BED metadata for this length-count output.
Returns
pandas.DataFrame: Columns are group_idx, group_name, eligible_windows, and optionally blacklisted_fraction.
GroupedLengthCounts.group_idx
GroupedLengthCounts.group_idx(group_name: str) -> int
Find the count-row index for a group name.
Parameters
group_name: Group name to resolve.
Returns
int: Group index.
GroupedLengthCounts.counts_array
GroupedLengthCounts.counts_array(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None) -> np.ndarray
Return raw length counts as a dense NumPy array.
Scalar selectors keep their axis as length one, so the shape is always
(selected groups, length_bin).
Parameters
groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
with_lengths: Fragment length or lengths in bp. Counts are returned for the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end).
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
Returns
numpy.ndarray: Count array with shape (group, length_bin).
GroupedLengthCounts.data_frame
GroupedLengthCounts.data_frame(*, groups: str | Sequence[str] | None = None, group_idxs: int | Sequence[int] | None = None, with_lengths: int | Sequence[int] | None = None, with_length_range: Sequence[int] | None = None, length_bin_idxs: int | Sequence[int] | None = None, value: str = 'count', denominator: str = 'all_bins', keep_wide: bool = False, max_blacklisted_fraction: float = 1.0) -> pd.DataFrame
Create a pandas DataFrame of fragment length distributions for groups.
Use groups or group_idxs to keep only selected grouped BED rows.
Long output has one row per selected group and length bin. Wide output
has one row per selected group with one value column per length bin.
Parameters
groups: None for all groups, one group name, or a sequence of group names. Use either groups or group_idxs, not both.
group_idxs: None for all groups, one group index, or a sequence of group indices. Use either groups or group_idxs, not both.
with_lengths: Fragment length or lengths in bp. Returned values use the length bins containing these lengths. Multiple lengths must select distinct length bins.
with_length_range: Two bp bounds defining a half-open range [start, end). Returned values use whole length bins that overlap this range.
length_bin_idxs: None for all length bins, one length-bin index, or a sequence of length-bin indices. Use only one of with_lengths, with_length_range, or length_bin_idxs.
value: One of "count", "fraction", or "density". Fractions are within each selected group. Densities are fractions divided by the length-bin width.
denominator: For "fraction" and "density", "all_bins" divides by each row's total over all length bins, while "selected_bins" divides by the total over the returned length bins. Ignored for "count".
keep_wide: If False, return one row per selected group and length bin. If True, return one row per selected group with one value column per length bin.
max_blacklisted_fraction: Maximum blacklisted_fraction in 0..1 to keep. The default 1.0 keeps all selected groups.
Returns
pandas.DataFrame: Group metadata and length-count values.
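A typical call combines one group selector, one length selector, and the value options, all passed by keyword. The sketch below wraps such a call in a helper and exercises it against a stand-in object so that it runs without a real TSV. RecordingLengthCounts, short_fragment_fractions, the group names, and the 100..151 bp range are hypothetical; only the keyword names come from the signature above.

```python
class RecordingLengthCounts:
    """Stand-in that records keyword arguments instead of reading a TSV."""

    def __init__(self):
        self.calls = []

    def data_frame(self, **kwargs):
        self.calls.append(kwargs)
        return kwargs


def short_fragment_fractions(length_counts, group_names, max_blacklisted_fraction=0.1):
    """Request per-group fractions for bins overlapping [100, 151) bp."""
    return length_counts.data_frame(
        groups=group_names,  # use groups or group_idxs, not both
        with_length_range=(100, 151),  # only one length selector at a time
        value="fraction",
        denominator="all_bins",
        max_blacklisted_fraction=max_blacklisted_fraction,
    )


stub = RecordingLengthCounts()
request = short_fragment_fractions(stub, ["promoters", "enhancers"])
print(request["with_length_range"])  # (100, 151)
```

With a real loader, the same helper would presumably be passed the object returned by read_lengths for a grouped output, in which case the result would be a pandas DataFrame rather than the recorded keyword dictionary.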