Single-Cell Whole Genome Analysis in Python.

API

Import scgenome as:

import scgenome

Preprocessing: pp

Data loading and pre-processing functionality.

Data loading

pp.read_dlp_hmmcopy(alignment_results_dir, ...)

Read hmmcopy results from the DLP pipeline.

pp.convert_dlp_hmmcopy(metrics_data, cn_data)

Convert hmmcopy pandas DataFrames to AnnData.

pp.convert_dlp_signals(hscn, metrics_data)

Convert signals pandas DataFrames to AnnData.

pp.read_bam_bin_counts(bins, bams[, excluded])

Count reads in bins from BAM files.

pp.read_snv_genotyping(filename)

Read SNV genotyping results into an AnnData object.
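As a rough illustration of the binning step behind pp.read_bam_bin_counts, the sketch below assigns read positions to fixed, non-overlapping bins using binary search. It is not scgenome's implementation. It makes three assumptions: reads have already been reduced to (chromosome, position) pairs instead of being parsed from BAM files, bins are 0-based, half-open (chrom, start, end) tuples sorted by start within each chromosome, and the helper name count_reads_in_bins is hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def count_reads_in_bins(bins, read_positions):
    """Hypothetical helper: count reads falling in each bin.

    bins: list of (chrom, start, end), 0-based half-open, non-overlapping,
          assumed sorted by start within each chromosome.
    read_positions: iterable of (chrom, pos), e.g. read start positions
          (assumed already extracted from BAM files).
    Returns a list of counts aligned with `bins`.
    """
    # Per-chromosome list of bin starts (for binary search) and bin indices.
    starts = defaultdict(list)
    index = defaultdict(list)
    for i, (chrom, start, _end) in enumerate(bins):
        starts[chrom].append(start)
        index[chrom].append(i)

    counts = [0] * len(bins)
    for chrom, pos in read_positions:
        # Find the last bin whose start is <= pos on this chromosome.
        j = bisect_right(starts[chrom], pos) - 1
        if j < 0:
            continue  # before the first bin, or chromosome not binned
        i = index[chrom][j]
        if pos < bins[i][2]:  # inside the half-open interval [start, end)
            counts[i] += 1
    return counts


bins = [("1", 0, 100), ("1", 100, 200), ("2", 0, 100)]
reads = [("1", 5), ("1", 99), ("1", 100), ("2", 50), ("2", 150), ("X", 10)]
print(count_reads_in_bins(bins, reads))  # [2, 1, 1]
```

Reads beyond the last bin, or on chromosomes without bins, are silently skipped in this sketch.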

Filtering

pp.filter_cells(adata[, filters, inplace])

Filter poor quality cells based on the filters provided.

pp.calculate_filter_metrics(adata[, ...])

Calculate additional filtering metrics to be used by other filtering methods.
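pp.filter_cells removes cells that fail the filters provided. A minimal sketch of that idea treats each filter as a predicate over one per-cell metric and keeps only the cells that pass every predicate. The helper name, metric names and thresholds below are hypothetical illustrations, not scgenome's actual filters.

```python
def filter_cells_by_metrics(cell_metrics, filters):
    """Hypothetical helper: return IDs of cells passing every filter.

    cell_metrics: dict mapping cell_id -> dict of metric name -> value.
    filters: dict mapping metric name -> predicate on that metric's value.
    Cells missing a filtered metric are dropped.
    """
    kept = []
    for cell_id, metrics in cell_metrics.items():
        if all(name in metrics and passes(metrics[name])
               for name, passes in filters.items()):
            kept.append(cell_id)
    return kept


# Hypothetical per-cell metrics and thresholds, for illustration only.
metrics = {
    "cell_a": {"quality": 0.9, "total_reads": 1_200_000},
    "cell_b": {"quality": 0.4, "total_reads": 900_000},
    "cell_c": {"quality": 0.95, "total_reads": 50_000},
}
filters = {
    "quality": lambda q: q >= 0.75,
    "total_reads": lambda n: n >= 250_000,
}
print(filter_cells_by_metrics(metrics, filters))  # ['cell_a']
```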

Tools: tl

Any transformation of the data matrix that is not preprocessing. In contrast to a preprocessing function, a tool usually adds an easily interpretable annotation to the data matrix, which can then be visualized with a corresponding plotting function.

Clustering

tl.cluster_cells(adata[, layer_name, ...])

Cluster cells by copy number.

tl.aggregate_clusters_hmmcopy(adata)

Aggregate hmmcopy copy number by cluster to create a cluster CN matrix.

tl.aggregate_clusters(adata[, agg_X, ...])

Aggregate copy number by cluster to create a cluster CN matrix.

tl.sort_cells(adata[, layer_name, cell_ids, ...])

Sort cells by hierarchical clustering on copy number values.
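Aggregating copy number by cluster collapses a cell-by-bin matrix into a cluster-by-bin matrix. The sketch below uses plain dicts instead of AnnData and takes the per-bin median as the aggregation. Both choices are assumptions for illustration: tl.aggregate_clusters exposes its own aggregation options, such as agg_X, which are not reproduced here. aggregate_by_cluster is a hypothetical name.

```python
from collections import defaultdict
from statistics import median


def aggregate_by_cluster(cn, cluster_of, agg=median):
    """Hypothetical helper: cell-by-bin -> cluster-by-bin copy number.

    cn: dict mapping cell_id -> list of per-bin copy number values
        (all lists the same length).
    cluster_of: dict mapping cell_id -> cluster label.
    agg: reduces one bin's values within a cluster to a single value
         (median by default, an assumption of this sketch).
    """
    members = defaultdict(list)
    for cell_id, profile in cn.items():
        members[cluster_of[cell_id]].append(profile)
    # zip(*profiles) walks the bins, yielding one column of values per bin.
    return {
        cluster: [agg(column) for column in zip(*profiles)]
        for cluster, profiles in members.items()
    }


cn = {"c1": [2, 2, 3], "c2": [2, 3, 3], "c3": [2, 2, 4], "c4": [1, 1, 2]}
cluster_of = {"c1": "A", "c2": "A", "c3": "A", "c4": "B"}
print(aggregate_by_cluster(cn, cluster_of))  # {'A': [2, 2, 3], 'B': [1, 1, 2]}
```

The median resists a single outlier cell within a cluster. Passing `agg=max` or a mean function swaps in a different summary.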

Embeddings

tl.compute_umap(adata[, layer_name, ...])

Compute a UMAP embedding of cells from copy number values.

tl.pca_loadings(adata[, layer, ...])

Compute a PCA loadings matrix.

Generating binned data

tl.create_bins(binsize)

Create a regular binning of the genome.

tl.count_gc(bins, genome_fasta[, ...])

Count GC content in each bin.

tl.mean_from_bigwig(bins, bigwig_file, ...)

Compute the mean bigWig signal in each bin.
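Together, tl.create_bins and tl.count_gc produce a regular genome binning annotated with GC content. The sketch below shows the idea on a toy in-memory genome instead of a reference FASTA. It rests on these assumptions:

- Bins are 0-based and half-open; scgenome's own coordinate convention may differ.
- The final bin of each chromosome is truncated at the chromosome end.
- GC fraction is computed over A/C/G/T bases only, ignoring N.

The function names and data are hypothetical.

```python
def create_bins(chrom_lengths, binsize):
    """Hypothetical helper: tile each chromosome with fixed-size bins.

    Bins are (chrom, start, end), 0-based half-open; the last bin of a
    chromosome is truncated at the chromosome end.
    """
    bins = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, binsize):
            bins.append((chrom, start, min(start + binsize, length)))
    return bins


def gc_fraction(seq):
    """GC fraction among A/C/G/T bases; None if the bin has none (e.g. all N)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (seq.count("G") + seq.count("C")) / acgt


# Toy genome standing in for a reference FASTA (hypothetical data).
genome = {"1": "GGCCATATNN", "2": "ACGT"}
bins = create_bins({chrom: len(seq) for chrom, seq in genome.items()}, binsize=4)
for chrom, start, end in bins:
    print(chrom, start, end, gc_fraction(genome[chrom][start:end]))
# 1 0 4 1.0
# 1 4 8 0.0
# 1 8 10 None
# 2 0 4 0.5
```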

Gene regions

tl.read_ensemble_genes_gtf(gtf_filename)

Read an Ensembl GTF file and extract gene start and end positions.

tl.aggregate_genes(adata, genes[, ...])

Aggregate copy number by gene to create a gene CN matrix.
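One way to derive a gene-level copy number, in the spirit of tl.aggregate_genes, is to average the copy number of every bin that overlaps each gene. The unweighted mean over overlapping bins is an assumption of this sketch; the actual function may weight or select bins differently. Gene names, coordinates and the helper name are hypothetical.

```python
from statistics import mean


def gene_copy_number(bins, cn, genes):
    """Hypothetical helper: per-gene copy number as the mean over overlapping bins.

    bins: list of (chrom, start, end), 0-based half-open, aligned with `cn`.
    cn: list of per-bin copy number values for one cell or cluster.
    genes: dict mapping gene name -> (chrom, start, end).
    Genes overlapping no bin get None.
    """
    result = {}
    for gene, (g_chrom, g_start, g_end) in genes.items():
        # Two half-open intervals overlap when each starts before the other ends.
        values = [
            value
            for (chrom, start, end), value in zip(bins, cn)
            if chrom == g_chrom and start < g_end and g_start < end
        ]
        result[gene] = mean(values) if values else None
    return result


bins = [("1", 0, 100), ("1", 100, 200), ("1", 200, 300)]
cn = [2, 4, 3]
genes = {  # hypothetical genes
    "GENE_A": ("1", 50, 150),
    "GENE_B": ("1", 210, 250),
    "GENE_C": ("2", 0, 10),
}
print(gene_copy_number(bins, cn, genes))
```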

Phylogenetics

tl.prune_leaves(tree, f)

tl.align_cn_tree(tree, adata)

AnnData Manipulation

tl.ad_concat_cells(adatas)

Concatenate a list of AnnData objects along obs (cells).

Plotting: pl

The plotting module scgenome.pl largely parallels the tl.* functions and a few of the pp.* functions. For most tools and some preprocessing functions, you’ll find a plotting function with the same name.

Note

TODO: more plotting functions matching tools

Copy number profiles and heatmaps

pl.plot_cn_profile(adata, obs_id[, ...])

Plot scatter points of copy number across the genome or a chromosome.

pl.plot_cell_cn_matrix(adata[, layer_name, ...])

Plot a copy number matrix.

pl.plot_cell_cn_matrix_fig(adata[, ...])

Plot a copy number matrix.

pl.plot_gc_reads(adata, obs_id, **kwargs)

Plot scatter points of GC content by read count.

Phylogenetics

pl.plot_tree_cn(tree, adata[, ...])

Plot a tree aligned to a heatmap of the CN values matrix.