Input/Output

We provide two simple functions to download data from Google buckets and to read anndata objects from csv files. We stress that the recommended format for reading/writing anndata objects is the .h5ad format.

download_from_bucket(bucket_name: str, source_path: str, destination_path: str)

Helper function to download files from Google buckets.

Parameters:
  • bucket_name – the name of the Google bucket. For example: “ld-data-bucket”

  • source_path – path to the file in the bucket. For example: “tissue-mosaic/slideseq_testis_anndata_h5ad.tar.gz”

  • destination_path – path in the local filesystem where the file is saved. For example: “my_dir/my_file_h5ad.tar.gz”
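
As an illustration of what the three parameters mean, here is a minimal sketch of how such a helper could be written with the google-cloud-storage client. It is not the library's source: the name download_blob_sketch and the optional client argument are hypothetical, and the sketch assumes google-cloud-storage is installed and default credentials are configured.

```python
import os


def download_blob_sketch(bucket_name, source_path, destination_path, client=None):
    """Hypothetical stand-in for download_from_bucket (not the library's code)."""
    if client is None:
        # Assumption: google-cloud-storage is installed and credentials are set up.
        from google.cloud import storage
        client = storage.Client()

    # Create the local target directory if destination_path has one.
    parent = os.path.dirname(destination_path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    # bucket_name selects the bucket, source_path the object inside it.
    blob = client.bucket(bucket_name).blob(source_path)
    blob.download_to_filename(destination_path)
    return destination_path


# Example call with the values used in the parameter descriptions
# (commented out because it needs network access and credentials):
# download_blob_sketch(
#     "ld-data-bucket",
#     "tissue-mosaic/slideseq_testis_anndata_h5ad.tar.gz",
#     "my_dir/my_file_h5ad.tar.gz",
# )
```

The optional client argument is only there so the sketch can be exercised with a stub object instead of a real connection.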

anndata_from_expression_csv(filename: str, key: str, transpose: bool, top_n_rows: int = None)

Reads a csv file with the expression data (i.e. the count matrix) and returns an anndata object. To be used when your collaborators give you a .csv file instead of a .h5ad file.

If transpose == False: the csv is expected to have a header: ‘barcode’, ‘gene_name_1’, …, ‘gene_name_N’. Each row is expected to look something like: ACCDAT, 2, 0, …, 1

If transpose == True: the csv is expected to have a header: ‘gene’, ‘barcode_name_1’, …, ‘barcode_name_N’. Each row is expected to look something like: Arhgap18, 2, 0, …, 1

Parameters:
  • filename – the path to the csv file to read

  • key – the name of the column that holds the row labels. It defaults to ‘barcode’ if transpose == False and ‘gene’ if transpose == True.

  • transpose – bool, whether the csv is gene_by_cell (True) or cell_by_gene (False)

  • top_n_rows – int, the number of the top rows to read. Set to a small value (like 20) for debugging.

Note

The output will always be cell_by_gene (i.e. cells=obs, genes=var) regardless of the value of transpose.

Returns:

adata – An anndata object with (i) anndata.X, the counts in scipy Compressed Sparse Row format; (ii) anndata.obs, the observation names (often the cellular barcodes); (iii) anndata.var, the variable names (often the gene names).
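
To make the two csv layouts and the orientation rule concrete, the following standard-library sketch parses both layouts and shows that they yield the same cell_by_gene result. It is an illustration under stated assumptions, not the library's implementation: read_expression_csv_sketch and to_anndata_sketch are hypothetical names, and the barcodes, gene names and counts are made up. The second helper assumes anndata, pandas and scipy are installed and is not called by the demo.

```python
import csv
import os
import tempfile


def read_expression_csv_sketch(filename, key, transpose, top_n_rows=None):
    """Return (obs_names, var_names, counts), always oriented cell_by_gene."""
    with open(filename, newline="") as handle:
        reader = csv.DictReader(handle)
        # Every header entry except `key` labels a column of counts.
        col_names = [name for name in reader.fieldnames if name != key]
        row_names, rows = [], []
        for i, record in enumerate(reader):
            if top_n_rows is not None and i >= top_n_rows:
                break
            row_names.append(record[key])
            rows.append([int(record[name]) for name in col_names])
    if transpose:
        # File rows are genes: flip the matrix so that cells become rows.
        rows = [list(column) for column in zip(*rows)]
        return col_names, row_names, rows
    return row_names, col_names, rows


def to_anndata_sketch(obs_names, var_names, counts):
    """Wrap the parsed pieces in an AnnData object (needs anndata, pandas, scipy)."""
    import anndata
    import pandas as pd
    from scipy.sparse import csr_matrix

    return anndata.AnnData(
        X=csr_matrix(counts, dtype="float32"),
        obs=pd.DataFrame(index=obs_names),
        var=pd.DataFrame(index=var_names),
    )


# Made-up data: the same two cells and three genes in both layouts.
cell_by_gene = "barcode,GeneA,GeneB,GeneC\nAAAC,2,0,1\nTTTG,0,3,0\n"
gene_by_cell = "gene,AAAC,TTTG\nGeneA,2,0\nGeneB,0,3\nGeneC,1,0\n"

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "cell_by_gene.csv")
    path_b = os.path.join(tmp, "gene_by_cell.csv")
    with open(path_a, "w") as f:
        f.write(cell_by_gene)
    with open(path_b, "w") as f:
        f.write(gene_by_cell)
    result_a = read_expression_csv_sketch(path_a, key="barcode", transpose=False)
    result_b = read_expression_csv_sketch(path_b, key="gene", transpose=True)

# Both layouts give the same observations, variables and counts.
print(result_a == result_b)
```

In this sketch top_n_rows limits the rows read from the file, so with transpose=True it truncates genes rather than cells.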