Input/Output

We provide two simple functions: one to download data from Google buckets and one to read anndata objects from csv files. We stress that the recommended format for reading and writing anndata objects is .h5ad.

download_from_bucket(bucket_name: str, source_path: str, destination_path: str)

Helper function to download files from Google buckets.

Parameters:

bucket_name – the name of the Google bucket. For example: “ld-data-bucket”
source_path – path to the file in the bucket. For example: “tissue-mosaic/slideseq_testis_anndata_h5ad.tar.gz”
destination_path – path in the local filesystem where the file is saved. For example: “my_dir/my_file_h5ad.tar.gz”
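For orientation, a helper with this signature could be sketched with the google-cloud-storage client as shown here. This is an assumption about how such a download might be done, not the library's actual implementation; the name download_from_bucket_sketch is hypothetical, and the sketch assumes that application-default credentials are configured.

```python
import os


def download_from_bucket_sketch(bucket_name: str, source_path: str, destination_path: str) -> None:
    """Hypothetical stand-in: copy one object from a Google bucket to a local path."""
    # Imported lazily so this module loads even when the optional
    # google-cloud-storage package is not installed.
    from google.cloud import storage

    # Assumption: application-default credentials are available.
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(source_path)

    # Create the local parent directory if the destination has one.
    parent = os.path.dirname(destination_path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    blob.download_to_filename(destination_path)
```

Called with the example values above, this would save the archive to “my_dir/my_file_h5ad.tar.gz”.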

anndata_from_expression_csv(filename: str, key: str, transpose: bool, top_n_rows: int = None)

Reads a csv file with the expression data (i.e. the count matrix) and returns an anndata object. To be used when your collaborators give you a .csv file instead of a .h5ad file.

If transpose == False: The csv is expected to have a header: ‘barcode’, ‘gene_name_1’, …, ‘gene_name_N’. Each row is expected to look something like: ACCDAT, 2, 0, …., 1

If transpose == True: The csv is expected to have a header: ‘gene’, ‘barcode_name_1’, …, ‘barcode_name_N’. Each row is expected to look something like: Arhgap18, 2, 0, …., 1

Parameters:

filename – the path to the csv file to read
key – the name of the first column, which holds the row labels. It defaults to ‘barcode’ if transpose == False and ‘gene’ if transpose == True.
transpose – bool. Set to True if the csv is gene_by_cell (each row is a gene) and to False if it is cell_by_gene (each row is a barcode).
top_n_rows – int, the number of the top rows to read. Set to a small value (like 20) for debugging.

Note

The output will always be cell_by_gene (i.e. cells=obs, genes=var) regardless of the value of transpose.

Returns: adata – An anndata object with (i) anndata.X, the counts in scipy Compressed Sparse Row format; (ii) anndata.obs, the observation names (often the cellular barcodes); (iii) anndata.var, the variable names (often the gene names).
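To make the two csv layouts concrete, the sketch below parses both with the Python standard library only and shows that they yield the same cell_by_gene matrix. It is an illustration of the orientation logic, not the library's implementation: the function name read_expression_csv_sketch is hypothetical, it returns plain lists rather than an anndata object, and the barcodes and the gene Actb are made-up sample values.

```python
import csv
import io


def read_expression_csv_sketch(text, transpose, top_n_rows=None):
    """Return (barcodes, genes, counts) where counts[i][j] is gene j in cell i."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = next(reader)
    col_names = header[1:]  # header[0] is the key column ('barcode' or 'gene')

    row_names, rows = [], []
    for i, record in enumerate(reader):
        if top_n_rows is not None and i >= top_n_rows:
            break  # read only the first top_n_rows data rows
        row_names.append(record[0])
        rows.append([int(value) for value in record[1:]])

    if transpose:
        # Rows are genes: flip the matrix so that rows become cells.
        counts = [list(column) for column in zip(*rows)]
        return col_names, row_names, counts
    return row_names, col_names, rows


# Made-up sample data in the two supported layouts.
cell_by_gene_csv = "barcode,Arhgap18,Actb\nAAAC,2,0\nTTTG,1,3\n"
gene_by_cell_csv = "gene,AAAC,TTTG\nArhgap18,2,1\nActb,0,3\n"

from_cells = read_expression_csv_sketch(cell_by_gene_csv, transpose=False)
from_genes = read_expression_csv_sketch(gene_by_cell_csv, transpose=True)
```

Both calls return the barcodes first, then the genes, then the counts with one row per cell, which mirrors the guarantee that the output is cell_by_gene for either value of transpose.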