cas package

Subpackages

Submodules

cas.abc_cas_converter module

cas.abc_cas_converter.validate_dataframe_columns(df, required_columns)[source]

Validates a DataFrame for the required columns.

Parameters:
  • df (pandas.DataFrame) – DataFrame to validate.

  • required_columns (list) – List of required column names.

cas.abc_cas_converter.generate_catset_dataframe(cas)[source]

Generate a DataFrame representing the Cluster Annotation Term Set (cat_set) from the given Cell Annotation Schema (CAS) dictionary.

Parameters:

cas (Dict[str, Any]) – The Cell Annotation Schema (CAS) dictionary.

Returns:

DataFrame representing the Cluster Annotation Term Set (cat_set).

Return type:

pd.DataFrame

cas.abc_cas_converter.generate_cat_dataframe(cas)[source]

Generate a DataFrame representing the Cluster Annotation Term (cat) from the given Cell Annotation Schema (CAS) dictionary.

Parameters:

cas (Dict[str, Any]) – The Cell Annotation Schema (CAS) dictionary.

Returns:

DataFrame representing the Cluster Annotation Term (cat).

Return type:

pd.DataFrame

cas.abc_cas_converter.calculate_order_mapping(order_values)[source]

Calculate a mapping dictionary based on the order values.

Parameters:

order_values (pandas.Series) – Series containing the order values.

Returns:

Mapping dictionary where keys are order values and values are rank values.

Return type:

Dict[str, str]

cas.abc_cas_converter.abc2cas(cat_set_file_path, cat_file_path, output_file_path)[source]

Converts given ABC files to a Cell Annotation Schema (CAS) JSON and writes it to a file with output_file_path name.

Parameters:
  • cat_set_file_path (str) – Path to the Cluster Annotation Term Set file.

  • cat_file_path (str) – Path to the Cluster Annotation Term file.

  • output_file_path (str) – Output CAS file name (default: output.json).

cas.abc_cas_converter.add_annotations(cas, cat)[source]

Adds annotations to the Cell Annotation Schema (CAS) based on the data from the Cluster Annotation Term DataFrame.

Parameters:
  • cas (Dict[str, Any]) – Dictionary representing the Cell Annotation Schema.

  • cat (pd.DataFrame) – DataFrame containing Cluster Annotation Term data.

cas.abc_cas_converter.add_labelsets(cas, cat_set)[source]

Adds labelsets to the Cell Annotation Schema (CAS) based on the data from the Cluster Annotation Term Set DataFrame.

Parameters:
  • cas (Dict[str, Any]) – Cell Annotation Schema dictionary.

  • cat_set (pandas.DataFrame) – DataFrame containing Cluster Annotation Term Set data.

cas.abc_cas_converter.init_metadata()[source]

Initializes metadata for Cell Annotation Schema (CAS).

Returns:

Metadata dictionary containing default values for various fields.

Return type:

Dict[str, Any]

cas.abc_cas_converter.cas2abc(cas_file_path, cat_set_file_path, cat_file_path)[source]

Converts given Cell Annotation Schema (CAS) to ABC files: cluster_annotation_term and cluster_annotation_term_set, and writes them to files with cat_file_path and cat_set_file_path.

Parameters:
  • cas_file_path (str) – Path to the Cell Annotation Schema (CAS) file

  • cat_set_file_path (str) – Path to the Cluster Annotation Term Set file.

  • cat_file_path (str) – Path to the Cluster Annotation Term file.

cas.add_author_annotations module

cas.add_author_annotations.add_author_annotations(cas, df, join_column, columns=None)[source]

Updates the provided CAS dictionary with author annotation fields from a DataFrame. Annotations are based on the values matched between the DataFrame’s specified join columns and the CAS records. All columns or specified columns are added as annotations.

Parameters:
  • cas (Dict[str, Any]) – The CAS dictionary loaded from a JSON file.

  • df (DataFrame) – DataFrame containing the data to join with the CAS.

  • join_column (Union[str, List[str]]) – The column(s) in the DataFrame used for matching CAS records; can be a single column or a list of columns.

  • columns (Optional[List[str]]) – Optional list of columns whose data will be added as annotations. If None, all DataFrame columns are used.

Returns:

The CAS dictionary updated with new annotation fields.

Return type:

Dict[str, Any]

cas.add_author_annotations.add_author_annotations_from_file(cas_json, csv_path, join_column, columns=None, output_file_path='output.json')[source]

Reads data from a CSV file and a CAS JSON file, then updates the CAS JSON with author annotation fields, and outputs the updated CAS JSON to a specified file. It uses specified columns or all columns if none are specified.

Parameters:
  • cas_json (str) – Path to the CAS JSON file.

  • csv_path (str) – Path to the CSV file.

  • join_column (Union[str, List[str]]) – Column name or names used for matching CAS records.

  • columns (Optional[List[str]]) – Optional columns to be added as annotations; if None, all columns are used.

  • output_file_path (str) – Output CAS file name.

Returns:

The updated CAS JSON dictionary.

Return type:

Dict[str, Any]

cas.add_author_annotations.validate_columns(df, join_column, columns)[source]

Validates the existence of key columns and optionally other specified columns in a DataFrame.

Parameters:
  • df (DataFrame) – DataFrame to check.

  • join_column (Union[str, List[str]]) – Column or columns that must exist in the DataFrame.

  • columns (Optional[List[str]]) – Additional column names that must exist if specified; checks only join_column if None.

Raises:

ValueError – If any specified columns, including the join column(s), are not found in the DataFrame.
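
As a rough illustration of the contract described above, the sketch below applies the same check to a plain list of column names instead of a DataFrame. The helper name check_columns and the error message are hypothetical and are not taken from the cas package; with pandas, the available names would come from df.columns.

```python
from typing import List, Optional, Union


def check_columns(available: List[str],
                  join_column: Union[str, List[str]],
                  columns: Optional[List[str]] = None) -> None:
    """Hypothetical stand-in: raise ValueError when a required column is missing."""
    # Normalize join_column so a single name and a list are handled alike.
    required = [join_column] if isinstance(join_column, str) else list(join_column)
    # Additional columns are only checked when they are explicitly requested.
    if columns is not None:
        required += columns
    missing = [name for name in required if name not in available]
    if missing:
        raise ValueError(f"Columns not found: {missing}")


# Passes silently because every requested column is present.
check_columns(["cluster", "subclass", "marker"], "cluster", ["marker"])
```

Collecting all missing names before raising lets the caller fix every problem in one pass, which is a design choice of this sketch rather than documented behavior.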

cas.add_author_annotations.validate_values(df, join_column, cas)[source]

Validates that all keys generated from specified DataFrame columns exist within the CAS annotations. If there are any keys in the DataFrame that are not found in the CAS annotations, it raises an exception indicating these are extra, undesired keys.

Parameters:
  • df (pd.DataFrame) – DataFrame containing the data to validate.

  • join_column (Union[str, List[str]]) – Column or list of columns in the DataFrame used to form the keys. If a list of two strings is provided, these columns are concatenated to form the keys.

  • cas (Dict[str, Any]) – Dictionary containing CAS annotations, each annotation expected to contain the keys.

Raises:

ValueError – If there are extra keys in the DataFrame that do not exist in the CAS annotations.

cas.add_author_annotations.dataframe_to_dict(df, join_column, labelset, filter_value, columns)[source]

Converts specified columns of a DataFrame into a dictionary. If columns are not specified, the entire DataFrame for rows matching the condition is converted.

Parameters:
  • df (DataFrame) – DataFrame to extract data from.

  • join_column (Union[str, List[str]]) – The column or columns used to filter the DataFrame based on cell_label.

  • labelset (str) – Labelset of a CAS annotation.

  • filter_value (str) – The value in join_column used to filter rows.

  • columns (Optional[List[str]]) – List of column names to convert to dictionary. If None, all columns are used.

Returns:

Dictionary with column names as keys and single values or lists of values as values.

Return type:

dict
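
The return shape ("single values or lists of values") can be pictured with a small sketch over a list of row dictionaries. This is only an assumption about how the collapse might work: the function name rows_to_annotation_dict is hypothetical, a single join column is assumed, and the labelset argument is left out.

```python
from typing import Any, Dict, List, Optional


def rows_to_annotation_dict(rows: List[Dict[str, Any]], join_column: str,
                            filter_value: str,
                            columns: Optional[List[str]] = None) -> Dict[str, Any]:
    """Hypothetical sketch: collapse the rows matching filter_value into one dict."""
    matched = [row for row in rows if row.get(join_column) == filter_value]
    if not matched:
        return {}
    # With no explicit selection, every column of the matching rows is used.
    selected = columns if columns is not None else list(matched[0].keys())
    result: Dict[str, Any] = {}
    for name in selected:
        values = [row[name] for row in matched]
        # One matching row yields a single value; several rows yield a list.
        result[name] = values[0] if len(values) == 1 else values
    return result


table = [
    {"cluster": "A", "marker": "g1"},
    {"cluster": "A", "marker": "g2"},
    {"cluster": "B", "marker": "g3"},
]
print(rows_to_annotation_dict(table, "cluster", "A", ["marker"]))
```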

cas.anndata_conversion module

cas.anndata_conversion.merge(cas_file_path, anndata_path, validate, output_file_name)[source]

Tests if CAS json and AnnData are compatible and merges CAS into AnnData if possible.

This function performs the following checks:
  1. Verifies that all cell barcodes (cell IDs) in CAS exist in AnnData and vice versa.

  2. Identifies matching labelset names between CAS and AnnData.

  3. Validates that cell sets associated with each annotation match between CAS and AnnData.

  4. Checks if the cell labels are identical; if not, provides options to update or terminate.

Parameters:
  • cas_file_path (str) – The path to the CAS json file.

  • anndata_path (Optional[str]) – The path to the AnnData file.

  • validate (bool) – Boolean to determine if validation checks will be performed before writing to the output AnnData file.

  • output_file_name (str) – Output AnnData file name.
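
The first check in the list above (cell IDs must agree in both directions) amounts to comparing two sets. The sketch below shows that idea on plain Python data. It assumes each CAS annotation keeps its barcodes under a "cell_ids" key; that key and the helper name compare_cell_ids are illustrative assumptions, not the package's implementation.

```python
from typing import Dict, List, Set, Tuple


def compare_cell_ids(cas: Dict, anndata_cell_ids: List[str]) -> Tuple[Set[str], Set[str]]:
    """Return (IDs only in CAS, IDs only in AnnData); two empty sets mean they agree."""
    cas_ids: Set[str] = set()
    for annotation in cas.get("annotations", []):
        # "cell_ids" is assumed to hold the barcodes of one annotation.
        cas_ids.update(annotation.get("cell_ids", []))
    adata_ids = set(anndata_cell_ids)
    return cas_ids - adata_ids, adata_ids - cas_ids


example_cas = {
    "annotations": [
        {"cell_label": "L2/3 IT", "cell_ids": ["c1", "c2"]},
        {"cell_label": "Pvalb", "cell_ids": ["c3"]},
    ]
}
only_in_cas, only_in_anndata = compare_cell_ids(example_cas, ["c1", "c2", "c4"])
print(only_in_cas, only_in_anndata)
```

Returning both differences, rather than a single boolean, makes it possible to report which side is missing which cells.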

cas.anndata_conversion.merge_cas_object(input_json, anndata_file_path, validate, output_file_path, download_dir=None)[source]

Tests if CAS json and AnnData are compatible and merges CAS into AnnData if possible.

This function performs the following checks:
  1. Verifies that all cell barcodes (cell IDs) in CAS exist in AnnData and vice versa.

  2. Identifies matching labelset names between CAS and AnnData.

  3. Validates that cell sets associated with each annotation match between CAS and AnnData.

  4. Checks if the cell labels are identical; if not, provides options to update or terminate.

Parameters:
  • input_json (dict) – The CAS json object.

  • anndata_file_path (Optional[str]) – The path to the AnnData file.

  • validate (bool) – Boolean to determine if validation checks will be performed before writing to the output AnnData file.

  • output_file_path (str) – Output AnnData file name.

  • download_dir (Optional[str]) – The directory to download AnnData files.

cas.anndata_conversion.test_compatibility(anndata_obs, input_json, validate)[source]

Tests if CAS and AnnData can be merged.

Parameters:
  • anndata_obs – The AnnData obs object.

  • input_json – The CAS data json object.

  • validate – Boolean to determine if validation checks will be performed before writing to the output AnnData file.

cas.anndata_conversion.check_labelsets(cas_json, input_obs, matching_obs_keys, validate)[source]
cas.anndata_conversion.get_matching_obs_keys(obs_keys, cas_labelsets)[source]
cas.anndata_conversion.handle_matching_labelset(ann, cell_label, input_obs, validate)[source]
cas.anndata_conversion.handle_non_matching_labelset(ann, input_obs, validate, derived_cell_ids)[source]
cas.anndata_conversion.validate_cell_ids(anndata_cell_ids, annotations, validate)[source]

cas.anndata_splitter module

cas.anndata_splitter.split_anndata_to_file(anndata_file_path, cas_json_paths, multiple_outputs, compression_method='gzip')[source]

Splits an AnnData file into multiple files based on provided CAS JSON files and writes them to disk.

Parameters:
  • anndata_file_path (Optional[str]) – Path to the AnnData file.

  • cas_json_paths (List[str]) – List of CAS JSON file paths.

  • multiple_outputs (bool) – If True, outputs multiple files, one for each CAS JSON file; otherwise, outputs a single file.

  • compression_method (Optional[Literal['gzip', 'lzf']]) – Compression method utilized in anndata write function. Default is “gzip”.

cas.anndata_splitter.split_anndata(adata, cas, multiple_outputs)[source]

Splits an AnnData object into multiple or single AnnData objects based on the provided CAS data.

Parameters:
  • adata (AnnData) – AnnData object.

  • cas (Dict[str, Dict[str, Any]]) – Dictionary representing the CAS data with its file name as keys.

  • multiple_outputs (bool) – Determines if the output should be multiple AnnData objects or a single one.

Return type:

List[AnnData]

Returns:

A list of AnnData objects if multiple_outputs is True, otherwise a single AnnData object.

Raises:

ValueError – If any required terms do not exist in the CAS data under ‘parent_cell_set_name’.

cas.anndata_to_cas module

cas.anndata_to_cas.anndata2cas(anndata_file_path, labelsets, output_file_path, include_hierarchy, accession_columns=None)[source]

Convert an AnnData file to Cell Annotation Schema (CAS) JSON.

Parameters:
  • anndata_file_path (str) – Path to the AnnData file.

  • labelsets (List[str]) – List of labelsets, which are names of observation (obs) fields used to record author cell type names. The labelsets should be provided in order, starting from rank 0 to higher ranks.

  • output_file_path (str) – Output CAS file name.

  • include_hierarchy (bool) – Flag indicating whether to include hierarchy in the output.

  • accession_columns (List[str], optional) – List of columns in the AnnData obs that contain accession information. If provided, these columns will be used to populate the ‘cell_set_accession’ field in the CAS annotations. Otherwise, accession IDs will be automatically generated using a hash of the cells in each cell set. Defaults to None.
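
The fallback described for accession_columns (an ID derived from a hash of the cells in each cell set) can be sketched as follows. The documentation does not state which hash function or ID length is used, so SHA-256, the 10-character truncation, and the helper name hash_cell_set are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable


def hash_cell_set(cell_ids: Iterable[str], length: int = 10) -> str:
    """Hypothetical helper: derive a stable ID from the members of a cell set."""
    # Sorting makes the result independent of the order in which cells are listed.
    joined = "\n".join(sorted(cell_ids))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:length]


# The same members in a different order give the same ID.
same = hash_cell_set(["c2", "c1"]) == hash_cell_set(["c1", "c2"])
print(same)
```

A content-derived ID of this kind stays the same across runs as long as the cell set itself does not change.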

cas.anndata_to_cas.generate_cas_metadata(uns)[source]

Generates CAS metadata based on the provided ‘uns’ dictionary.

Parameters:

uns (Dict[str, Any]) – The ‘uns’ dictionary containing metadata.

Returns:

The generated CAS metadata dictionary.

Return type:

Dict[str, Any]

cas.anndata_to_cas.add_annotations_to_cas(cas, labelset_dict, parent_cell_look_up)[source]

Generates CAS annotations based on the provided AnnData object and updates the CAS dictionary with new annotations. This function can optionally use a precomputed parent cell lookup dictionary to enrich the annotations with hierarchical information.

Parameters:
  • cas (Dict[str, Any]) – The CAS dictionary to be updated with annotations. Expected to have a key ‘annotations’ where new annotations will be appended.

  • labelset_dict (Dict[str, Any]) – A dictionary defining labelsets and their members. This is used to match cell labels with their respective metadata and annotations.

  • parent_cell_look_up (Dict[str, Any]) – A precomputed dictionary containing hierarchical metadata about cell labels.

Returns:

The function directly updates the cas dictionary with new annotations. The parent_cell_look_up is used for enrichment and must be generated beforehand if hierarchical information is to be included.

Return type:

None

cas.cas_splitter module

cas.cas_splitter.split_cas_to_file(cas_json_path, split_terms, multiple_outputs)[source]

Splits a CAS JSON file into files based on provided terms, and writes them to disk.

Parameters:
  • cas_json_path (str) – Path to the CAS JSON file.

  • split_terms (Union[List[str], str]) – Terms used to determine how to split the CAS file; can be a string or a list of strings.

  • multiple_outputs (bool) – If True, outputs multiple files, one for each split term; otherwise, outputs a single file.

cas.cas_splitter.split_cas(cas, split_terms, multiple_outputs)[source]

Splits a CAS dictionary into multiple dictionaries or a single dictionary based on split terms.

Parameters:
  • cas (Dict[str, Any]) – Dictionary representing the CAS data.

  • split_terms (Union[List[str], str]) – Terms used to filter and split the CAS data; can be a string or a list of strings.

  • multiple_outputs (bool) – Determines if the output should be multiple dictionaries or a single dictionary.

Return type:

Union[List[Dict[str, Any]], Dict[str, Any]]

Returns:

A list of dictionaries if multiple_outputs is True, otherwise a single dictionary.

Raises:

ValueError – If any split_terms do not exist in the CAS data under ‘parent_cell_set_name’.

cas.cas_splitter.filter_and_copy_cas_entries(cas, label_to_copy_list)[source]

Copies entries from the CAS based on a list of labels to copy.

Parameters:
  • cas (Dict[str, Any]) – Dictionary representing the original CAS data.

  • label_to_copy_list (List[str]) – List of labels indicating which entries to copy.

Return type:

Dict[str, Any]

Returns:

A dictionary with filtered CAS entries.

cas.cas_splitter.get_split_terms(parent_dict, split_terms)[source]

Resolves split terms into a comprehensive list of terms based on a parent-child relationship dictionary.

Parameters:
  • parent_dict (Dict[str, List[str]]) – Dictionary mapping parent terms to lists of child terms.

  • split_terms (Union[List[str], str]) – Initial terms to resolve, can be a string or a list of strings.

Return type:

List[str]

Returns:

A list of all terms, resolved from the parent_dict.
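
One plausible reading of "resolves split terms into a comprehensive list" is a traversal that collects each term together with all of its descendants. The sketch below implements that reading. The name resolve_terms, the breadth-first order, and the choice to include the starting terms themselves are assumptions, not confirmed behavior of get_split_terms.

```python
from typing import Dict, List, Union


def resolve_terms(parent_dict: Dict[str, List[str]],
                  split_terms: Union[List[str], str]) -> List[str]:
    """Hypothetical sketch: collect each term and all of its descendants."""
    pending = [split_terms] if isinstance(split_terms, str) else list(split_terms)
    resolved: List[str] = []
    while pending:
        term = pending.pop(0)
        if term in resolved:
            continue  # guard against repeated terms and cycles
        resolved.append(term)
        # Queue the children so that they are expanded in turn.
        pending.extend(parent_dict.get(term, []))
    return resolved


hierarchy = {
    "Neuron": ["GABAergic", "Glutamatergic"],
    "GABAergic": ["Pvalb", "Sst"],
}
print(resolve_terms(hierarchy, "GABAergic"))  # ['GABAergic', 'Pvalb', 'Sst']
```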

cas.cas_to_rdf module

cas.cas_to_rdf.export_to_rdf(cas_schema, data, ontology_namespace, ontology_iri, output_path=None, validate=True, include_cells=True)[source]

Generates and returns an RDF graph from the provided data and CAS schema, with an option to write the RDF graph to a file.

Parameters:
  • cas_schema (Optional[Union[str, dict]]) – Name of the CAS release (such as base, cap, bican), path to the CAS schema file, or CAS schema JSON object. If not provided, reads the base CAS schema from the CAS module.

  • data (Union[str, dict]) – The data JSON file path or JSON object dictionary.

  • ontology_namespace (str) – Ontology namespace (e.g., MTG).

  • ontology_iri (str) – Ontology IRI (e.g., https://purl.brain-bican.org/ontology/AIT_MTG/).

  • labelsets (Optional[List[str]]) – Labelsets used in the taxonomy, such as [“Cluster”, “Subclass”, “Class”].

  • output_path (Optional[str]) – Path to the output RDF file, if specified.

  • validate (bool) – Determines if data-schema validation checks will be performed. True by default.

  • include_cells (bool) – Determines if cell data will be included in the RDF output. True by default.

Return type:

Graph

Returns:

An RDFlib graph object.

cas.cxg_utils module

cxg_utils.py

This module provides utility functions for working with AnnData datasets in the context of the CellxGene Census library.

cas.cxg_utils.download_dataset_with_id(dataset_id, file_path=None)[source]

Download an AnnData dataset with the specified ID.

Parameters:
  • dataset_id (str) – The ID of the dataset to download.

  • file_path (Optional[str], optional) – The file path to save the downloaded AnnData. If not provided, the dataset will be saved in the current working directory with the dataset_id as the file name. Supports both absolute and relative paths.

Returns:

The path to the downloaded AnnData dataset

Return type:

str

cas.file_utils module

cas.file_utils.read_json_file(file_path)[source]

Reads and parses a JSON file into a Python dictionary.

Parameters:

file_path (str) – The path to the JSON file.

Returns:

The JSON data as a Python dictionary.

Return type:

dict

Returns None if the file does not exist or if there is an issue parsing the JSON content.

Example

    from cas.file_utils import read_json_file

    json_data = read_json_file('path/to/your/file.json')
    if json_data is not None:
        # Use the parsed JSON data as a dictionary
        print(json_data)
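
The "returns None on failure" behavior can be pictured with a standard-library equivalent. The sketch below is not the package's code: load_json_or_none is a hypothetical stand-in that treats a missing file and malformed JSON the same way, as the description above suggests.

```python
import json
import os
import tempfile
from typing import Optional


def load_json_or_none(file_path: str) -> Optional[dict]:
    """Hypothetical stand-in: parse a JSON file, returning None on failure."""
    try:
        with open(file_path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, json.JSONDecodeError):
        # A missing file or malformed JSON is reported as None.
        return None


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.json")
    with open(good, "w", encoding="utf-8") as handle:
        json.dump({"author_name": "example"}, handle)
    loaded = load_json_or_none(good)
    missing = load_json_or_none(os.path.join(tmp, "absent.json"))
```

Because None doubles as the error signal, callers need the explicit "is not None" check shown in the example above before using the result.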

cas.file_utils.read_cas_json_file(file_path)[source]

Reads and parses a JSON file into a CAS object.

Parameters:

file_path (str) – The path to the JSON file.

Returns:

The JSON data as a CAS object.

Return type:

dict

cas.file_utils.read_cas_from_anndata(anndata_path)[source]

Reads the CAS json from the anndata uns and parses it into a CAS object.

Parameters:

anndata_path (str) – The path to the Anndata file.

Return type:

CellTypeAnnotation

Returns:

CellTypeAnnotation object.

cas.file_utils.write_json_file(cas, out_file, print_undefined=False)[source]

Writes cell type annotation object to a json file.

Parameters:
  • cas (CellTypeAnnotation) – Cell type annotation object to serialize.

  • out_file (str) – Output file path.

  • print_undefined (bool) – Prints null values to the output json if true. Omits undefined values from the json output if false.

cas.file_utils.write_dict_to_json_file(output_file_path, dictionary)[source]

Parameters:
  • output_file_path (str)

  • dictionary (dict)

cas.file_utils.read_anndata_file(file_path)[source]

Load anndata object from a file.

Parameters:

file_path (str) – The path to the file containing the anndata object.

Return type:

Optional[AnnData]

Returns:

The loaded anndata object if successful, else None.

cas.file_utils.read_table_to_dict(table_path, id_column=0, generated_ids=False)[source]

Reads table file content into a dict. Key is the first column value and the value is dict representation of the row values (each header is a key and column value is the value).

Parameters:
  • table_path – Path of the table file.

  • id_column – Id column becomes the key of the dict. This column should be unique. Default value is first column.

  • generated_ids – If ‘True’, uses row number as the key of the dict. Initial key is 0.

Returns:

Function provides two return values: first, the headers of the table; second, the table content dict. Key of the content is the first column value and the values are dict of row values.

cas.file_utils.read_tsv_to_dict(tsv_path, id_column=0, generated_ids=False)[source]

Reads tsv file content into a dict. Key is the first column value and the value is dict representation of the row values (each header is a key and column value is the value).

Parameters:
  • tsv_path – Path of the TSV file.

  • id_column – Id column becomes the key of the dict. This column should be unique. Default value is first column.

  • generated_ids – If ‘True’, uses row number as the key of the dict. Initial key is 0.

Returns:

Function provides two return values: first, the headers of the table; second, the TSV content dict. Key of the content is the first column value and the values are dict of row values.

cas.file_utils.read_csv_to_dict(csv_path, id_column=0, id_column_name='', delimiter=',', id_to_lower=False, generated_ids=False)[source]

Reads CSV file content into a dict. Key is the first column value and the value is dict representation of the row values (each header is a key and column value is the value).

Parameters:
  • csv_path – Path of the CSV file.

  • id_column – Id column becomes the keys of the dict. This column should be unique. Default is the first column.

  • id_column_name – Alternative to the numeric id_column, id_column_name specifies id_column by its header string.

  • delimiter – Value delimiter. Default is comma.

  • id_to_lower – Applies string lowercase operation to the key.

  • generated_ids – If ‘True’, uses row number as the key of the dict. Initial key is 1.

Returns:

Function provides two return values: first, the headers of the table; second, the CSV content dict. Key of the content is the first column value and the values are dict of row values.
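
The two return values described above (the headers, plus a dict of rows keyed by the ID column) can be sketched with the standard csv module. This is a simplified illustration, not the package's implementation: rows_to_dict is a hypothetical name, it reads from a string rather than a path, and it leaves out id_column_name, id_to_lower and generated_ids.

```python
import csv
import io
from typing import Dict, List, Tuple


def rows_to_dict(text: str, id_column: int = 0,
                 delimiter: str = ",") -> Tuple[List[str], Dict[str, Dict[str, str]]]:
    """Hypothetical sketch: return (headers, {row key: {header: value}})."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    headers = next(reader)
    records: Dict[str, Dict[str, str]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # The value in the ID column becomes the key for the whole row.
        records[row[id_column]] = dict(zip(headers, row))
    return headers, records


headers, records = rows_to_dict("cell_label,labelset\nPvalb,subclass\nSst,subclass\n")
print(headers)
print(records["Pvalb"])
```

Passing delimiter="\t" to the same sketch handles tab-separated input, which mirrors how the TSV and CSV readers documented here share a return shape.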

cas.file_utils.read_json_config(file_path)[source]

Reads the configuration object from the given path.

Parameters:

file_path (str) – Path to the json file.

Returns:

Configuration object (list of data column config items).

Return type:

dict

cas.file_utils.read_yaml_config(file_path)[source]

Reads the configuration object from the given path.

Parameters:

file_path (str) – Path to the yaml file.

Returns:

Configuration object (list of data column config items).

Return type:

dict

cas.file_utils.read_config(file_path)[source]

Reads the configuration object from the given path.

Parameters:

file_path (str) – Path to the configuration file.

Returns:

Configuration object (list of data column config items).

Return type:

dict

cas.file_utils.update_obs(obs, data)[source]

Updates the obs with data dict.

Parameters:
  • obs (CapAnnDataDF) – Dataset representing the obs field in the AnnData file.

  • data (dict) – Dictionary containing flattened data.

cas.file_utils.update_uns(uns, data)[source]

Updates the uns with data dict.

Parameters:
  • uns (CapAnnDataDF) – The HDF5 group to write data to.

  • data (dict) – Dictionary containing the data to be written.

cas.file_utils.get_cas_schema_names()[source]

Returns the list of available CAS schema names.

Returns:

The available CAS schema names.

Return type:

dict

cas.file_utils.get_cas_schema(schema_name='base')[source]

Reads the schema file from the CAS module and returns it as a dictionary.

Parameters:

schema_name (Optional[str]) – The name of the schema to be returned. Default is ‘base’.

Returns:

The schema as a dictionary.

Return type:

dict

cas.flatten_data_to_anndata module

cas.flatten_data_to_anndata.is_list_of_strings(var)[source]

Check if a value is a list of strings.

Parameters:

var (list or any) – The value to be checked.

Returns:

True if the value is a list containing only string elements, False otherwise.

Return type:

bool
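
A check of this kind is typically a one-liner. The sketch below is a hypothetical equivalent, not the package's code; note that it treats an empty list as a list of strings, which may or may not match the real function.

```python
from typing import Any


def looks_like_list_of_strings(var: Any) -> bool:
    """Hypothetical equivalent of the documented check."""
    # all() over an empty list is True, so [] counts as a list of strings here.
    return isinstance(var, list) and all(isinstance(item, str) for item in var)


print(looks_like_list_of_strings(["Pvalb", "Sst"]))  # True
print(looks_like_list_of_strings(["Pvalb", 3]))      # False
print(looks_like_list_of_strings("Pvalb"))           # False
```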

cas.flatten_data_to_anndata.export2cap(cas_file_path, anndata_file_path, output_file_path, fill_na)[source]

Processes and integrates information from a CAS JSON file and an AnnData file, creating a new AnnData object that incorporates metadata. The resulting AnnData object is then saved to a new file.

Note

At least one of cas_file_path or anndata_file_path must be provided. If cas_file_path is not supplied, the CAS JSON will be loaded from the AnnData file’s ‘uns’ section. Conversely, if anndata_file_path is not provided, the AnnData file will be downloaded using the matrix file id from the CAS JSON.

Parameters:
  • cas_file_path (Optional[str]) – Optional path to the CAS JSON file. If not provided, the CAS JSON will be extracted from the AnnData file’s ‘uns’ section.

  • anndata_file_path (Optional[str]) – Optional path to the AnnData file. If not provided, the AnnData file will be downloaded using the matrix file id from the CAS JSON.

  • output_file_path (str) – Output AnnData file name.

  • fill_na (bool) – Boolean flag indicating whether to fill missing values in the ‘obs’ field with pd.NA. If True, missing values will be replaced with pd.NA; if False, they will remain as empty strings.

cas.flatten_data_to_anndata.export_cas_object2cap(input_json, anndata_file_path, output_file_path, fill_na)[source]

Processes and integrates information from a CAS JSON and an AnnData (Annotated Data) file, creating a new AnnData object that incorporates metadata. If a CAS JSON object is not provided via the input parameter, it is extracted from the AnnData file’s ‘uns’ section. Conversely, if the AnnData file is not provided, it will be downloaded using the matrix file id from the CAS JSON.

Note

At least one of input_json or anndata_file_path must be provided. If neither is provided, the operation cannot proceed.

Parameters:
  • input_json (Optional[dict]) – Optional CAS JSON object. If not provided, the CAS JSON will be extracted from the AnnData file’s ‘uns’ section.

  • anndata_file_path (Optional[str]) – Optional path to the AnnData file. If not provided, the AnnData file will be downloaded using the matrix file id from the CAS JSON.

  • output_file_path (str) – Output AnnData file name.

  • fill_na (bool) – Boolean flag indicating whether to fill missing values in the ‘obs’ field with pd.NA. If True, missing values will be replaced with pd.NA; if False, they will remain as empty strings.

cas.flatten_data_to_anndata.process_annotations(annotations, obs_index, parent_cell_ids, fill_na)[source]

Processes annotations and generates flattened data for obs dataset.

Parameters:
  • annotations (list) – List of annotations.

  • obs_index (np.ndarray) – Array representing the index of the obs dataset.

  • parent_cell_ids (dict) – Dictionary containing parent cell ids.

  • fill_na (bool)

Returns:

Dictionary containing flattened data.

Return type:

dict

cas.flatten_data_to_anndata.generate_uns_json(input_json)[source]

Generates a dictionary representing the uns (unstructured) field in an AnnData object from a given JSON input.

This function processes information from a JSON input and generates a dictionary that represents the uns (unstructured) field in an AnnData object. The resulting dictionary can be used to populate the uns field in the AnnData object.

Parameters:

input_json (dict) – A dictionary representing the input CAS JSON data containing annotations.

Returns:

A dictionary representing the uns (unstructured) field in an AnnData object, ready to be used as input for writing to an AnnData file.

Return type:

dict

cas.flatten_data_to_anndata.unflatten(json_file_path, anndata_file_path, output_file_path, output_json_path)[source]

Unflatten an Anndata file and save it. Also creates a CAS json file as output.

Parameters:
  • json_file_path (Optional[str]) – The path to the CAS json file.

  • anndata_file_path (str) – The path to the AnnData file.

  • output_file_path (str) – Output AnnData file name.

  • output_json_path (str) – Output CAS JSON file name.

cas.flatten_data_to_anndata.unflatten_obs(obs_df, uns_df, cas_json, cellhash_lookup)[source]

Reverse the flattening process to update the “annotations” section in a CAS object.

Parameters:
  • obs_df (DataFrame) – DataFrame containing the flattened obs columns from an AnnData object.

  • uns_df (Dict[str, Any]) – Dictionary containing the flattened uns section from an AnnData object.

  • cas_json (Optional[Dict[str, Any]]) – Optional CAS JSON object.

  • cellhash_lookup (Dict[str, Any]) – Cell hash lookup dictionary.

Return type:

Dict[str, Any]

Returns:

Updated CAS JSON with revised annotations.

cas.flatten_data_to_anndata.create_cell_label_lookup(df_dict)[source]

Create a lookup dictionary for cell labels with corresponding observations.

Parameters:

df_dict (Dict[str, DataFrame]) – A dictionary of DataFrames keyed by label sets.

Return type:

dict

Returns:

A nested dictionary where keys are cell labels and values are the observations from the obs field in the AnnData object.

cas.flatten_data_to_anndata.update_cas_annotation(cas_dict, cas_json, cellhash_lookup)[source]

Update the annotations in the CAS JSON using the provided lookup dictionary.

This function checks the CAS JSON annotations against a lookup dictionary. It updates annotations where cell labels or cell hashes match and discards mismatches. It also adds new annotations from the lookup dictionary that are not in the CAS JSON.

Parameters:
  • cas_dict (Dict[str, Dict[str, Any]]) – A lookup dictionary where keys are cell labels or hashes, and values are dictionaries with annotation data.

  • cas_json (Dict[str, Any]) – The CAS JSON object containing existing annotations.

  • cellhash_lookup (Dict[str, Any]) – The lookup for cell hashes.

Return type:

List[Dict[str, Any]]

Returns:

The updated annotations based on the lookup dictionary.

cas.flatten_data_to_anndata.generate_cas_json(uns_data, cas_dict, schema_name=None)[source]

Generates a CAS JSON object from provided annotation metadata and schema.

This function constructs a CAS JSON object by retrieving a specified schema (or the default “cap” schema if none is provided) and using properties and metadata from the uns_data and cas_dict arguments. It populates the top-level properties, annotations, and label sets for the CAS JSON structure.

Parameters:
  • uns_data (Dict[str, Any]) – A dictionary containing unstructured annotation metadata, which provides values for the CAS JSON’s top-level properties and cell annotation metadata.

  • cas_dict (Dict[str, Dict[str, Any]]) – A dictionary of CAS annotation sets, each representing an annotation structure used to populate the CAS JSON annotations.

  • schema_name (Optional[str]) – An optional schema name to retrieve the schema for building the CAS JSON. If not provided, defaults to “cap”.

Return type:

Dict[str, Any]

Returns:

A dictionary representing the CAS JSON structure populated with data from uns_data and cas_dict, following the specified schema format.

cas.flatten_data_to_tables module

cas.flatten_data_to_tables.serialize_to_tables(cta, file_name_prefix, out_folder, project_config)[source]

Writes cell type annotation object to a series of tsv files. Tables to generate:

  • Annotation table (main)

  • Labelset table

  • Metadata

  • Annotation transfer

Parameters:
  • cta – cell type annotation object to serialize.

  • file_name_prefix – Name prefix for table names

  • out_folder – output folder path.

  • project_config – project configuration with extra metadata

cas.flatten_data_to_tables.generate_annotation_transfer_table(cta, out_folder)[source]

Generates annotation transfer table.

Parameters:
  • cta – cell type annotation object to serialize.

  • out_folder – output folder path.

cas.flatten_data_to_tables.generate_metadata_table(cta, project_config, out_folder)[source]

Generates the metadata table.

Parameters:
  • cta – cell type annotation object to serialize.

  • project_config – metadata coming from project config

  • out_folder – output folder path.

cas.flatten_data_to_tables.generate_labelset_table(cta, out_folder)[source]

Generates labelset table.

Parameters:
  • cta – cell type annotation object to serialize.

  • out_folder – output folder path.

cas.flatten_data_to_tables.generate_annotation_table(accession_prefix, cta, out_folder)[source]

Generates annotation table.

Parameters:
  • cta – cell type annotation object to serialize.

  • out_folder – output folder path.

  • accession_prefix – accession id prefix

cas.flatten_data_to_tables.generate_reviews_table(cta, out_folder)[source]

Generates annotation reviews table.

Parameters:
  • cta – cell type annotation object to serialize.

  • out_folder – output folder path.

cas.flatten_data_to_tables.list_to_string(my_list)[source]

Converts a list to its string representation. Nanobot has problems with single quotation marks, so they are removed as well.

Parameters:

my_list (list) – list to serialize

Returns:

string representation of the list
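The behaviour described for list_to_string can be sketched as follows. This is a minimal illustration that assumes the function amounts to taking str() of the list and stripping single quotes; list_to_string_sketch is a hypothetical stand-in, not the library's implementation.

```python
def list_to_string_sketch(my_list: list) -> str:
    """Hypothetical stand-in: str() of the list with single quotes removed."""
    # str(["a", "b"]) yields "['a', 'b']"; dropping the quotes gives "[a, b]".
    return str(my_list).replace("'", "")


print(list_to_string_sketch(["CL:0000540", "CL:0000127"]))
```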

cas.flatten_data_to_tables.assign_parent_accession_ids(accession_manager, std_parent_records, std_parent_records_dict, labelsets)[source]

Assigns accession IDs to parent clusters and updates the references to them from the child clusters.

Parameters:
  • accession_manager – accession ID generator

  • std_parent_records – list of all parents to assign accession IDs to

  • std_parent_records_dict – dictionary mapping each parent cluster to its child clusters

  • labelsets – list of labelsets

cas.flatten_data_to_tables.assign_parent_cell_set_names(id_index)[source]

Assigns parent cell set names to the child cell sets.

Parameters:

id_index (dict) – dictionary of cell set accessions and their corresponding records

cas.flatten_data_to_tables.normalize_column_name(column_name)[source]

Normalizes a column name for URL compatibility. To be usable with to_url(), every column name must match the pattern ‘^[\w_ ]+$’.

Parameters:

column_name (str) – current column name

Return type:

str

Returns:

normalized column_name
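As a rough illustration of the URL-compatibility requirement, the sketch below checks names against the stated pattern and normalizes them. Replacing each disallowed character with an underscore is an assumption made for this example; normalize_column_name_sketch is a hypothetical helper and the library may use a different strategy.

```python
import re

# Pattern quoted in the description above: word characters, underscores and spaces only.
URL_SAFE_NAME = re.compile(r"^[\w_ ]+$")


def normalize_column_name_sketch(column_name: str) -> str:
    """Hypothetical stand-in: replace every disallowed character with '_'."""
    return re.sub(r"[^\w ]", "_", column_name)


normalized = normalize_column_name_sketch("marker-gene.evidence")
print(normalized, bool(URL_SAFE_NAME.match(normalized)))
```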

cas.model module

class cas.model.EncoderMixin[source]

Bases: DataClassJsonMixin

dataclass_json_config: Optional[dict] = {'exclude': <function EncoderMixin.<lambda>>, 'undefined': Undefined.EXCLUDE}
class cas.model.AutomatedAnnotation(algorithm_name, algorithm_version, algorithm_repo_url, reference_location)[source]

Bases: EncoderMixin

Parameters:
  • algorithm_name (str)

  • algorithm_version (str)

  • algorithm_repo_url (str)

  • reference_location (str | None)

algorithm_name: str

The name of the algorithm used. It MUST be a string of the algorithm’s name.

algorithm_version: str

The version of the algorithm used (if applicable). It MUST be a string of the algorithm’s version, which is typically in the format ‘[MAJOR].[MINOR]’, but other versioning systems are permitted (based on the algorithm’s versioning).

algorithm_repo_url: str

This field denotes the URL of the version control repository associated with the algorithm used (if applicable). It MUST be a string of a valid URL.

reference_location: Optional[str]

This field denotes a valid URL of the annotated dataset that was the source of annotated reference data. This MUST be a string of a valid URL. The concept of a ‘reference’ specifically refers to ‘annotation transfer’ algorithms, whereby a ‘reference’ dataset is used to transfer cell annotations to the ‘query’ dataset.

class cas.model.Labelset(name, description=None, annotation_method=None, automated_annotation=None, rank=None)[source]

Bases: EncoderMixin

Parameters:
  • name (str)

  • description (str | None)

  • annotation_method (str | None)

  • automated_annotation (AutomatedAnnotation | None)

  • rank (int | None)

name: str

name of annotation key

description: Optional[str] = None

Some text describing what types of cell annotation this annotation key is used to record

annotation_method: Optional[str] = None

The method used for creating the cell annotations. This MUST be one of the following strings: ‘algorithmic’, ‘manual’, or ‘both’.

automated_annotation: Optional[AutomatedAnnotation] = None

A set of fields for recording the details of the automated annotation algorithm used. (Common ‘automated annotation methods’ would include PopV, Azimuth, CellTypist, scArches, etc.)

rank: Optional[int] = None

A number indicating relative granularity, with 0 being the most specific. Use this where a single dataset has multiple keys that are used consistently to record annotations at different levels of granularity.
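To illustrate how rank orders labelsets, the sketch below sorts a few labelset records from most to least specific. Plain dictionaries stand in for Labelset objects, and the labelset names are invented for the example.

```python
# Plain dicts stand in for Labelset objects; the names are illustrative only.
labelsets = [
    {"name": "class", "rank": 2},
    {"name": "cluster", "rank": 0},
    {"name": "subclass", "rank": 1},
    {"name": "notes"},  # rank is optional, so unranked labelsets are skipped here
]

# Rank 0 is the most specific level, so an ascending sort runs from fine to coarse.
ranked = sorted(
    (ls for ls in labelsets if ls.get("rank") is not None),
    key=lambda ls: ls["rank"],
)
ranked_names = [ls["name"] for ls in ranked]
print(ranked_names)
```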

class cas.model.AnnotationTransfer(transferred_cell_label=None, source_taxonomy=None, source_node_accession=None, algorithm_name=None, comment=None)[source]

Bases: EncoderMixin

Parameters:
  • transferred_cell_label (str | None)

  • source_taxonomy (str | None)

  • source_node_accession (str | None)

  • algorithm_name (str | None)

  • comment (str | None)

transferred_cell_label: Optional[str] = None

Transferred cell label

source_taxonomy: Optional[str] = None

PURL of source taxonomy.

source_node_accession: Optional[str] = None

accession of node that label was transferred from

algorithm_name: Optional[str] = None

The name of the algorithm used.

comment: Optional[str] = None

Free text comment on annotation transfer

class cas.model.Review(datestamp=None, reviewer=None, review=None, explanation=None)[source]

Bases: EncoderMixin

Annotation review.

Parameters:
  • datestamp (datetime | None)

  • reviewer (str | None)

  • review (str | None)

  • explanation (str | None)

datestamp: Optional[datetime] = None

Time and date review was last edited.

reviewer: Optional[str] = None

Review Author.

review: Optional[str] = None

Reviewer’s verdict on the annotation. Must be ‘Agree’ or ‘Disagree’.

explanation: Optional[str] = None

Free-text review of annotation. This is required if the verdict is disagree and should include reasons for disagreement.
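The constraints stated for review and explanation can be checked along these lines. This is a sketch only: ReviewSketch is a hypothetical stand-in carrying two of the fields, and check_review is not part of the package.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReviewSketch:
    """Hypothetical stand-in with only the two fields being checked."""
    review: Optional[str] = None
    explanation: Optional[str] = None


def check_review(r: ReviewSketch) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues."""
    problems = []
    if r.review not in ("Agree", "Disagree"):
        problems.append("review must be 'Agree' or 'Disagree'")
    if r.review == "Disagree" and not r.explanation:
        problems.append("explanation is required when the verdict is 'Disagree'")
    return problems


print(check_review(ReviewSketch(review="Disagree")))
```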

class cas.model.Annotation(labelset, cell_label, cell_set_accession=None, cell_fullname=None, cell_ontology_term_id=None, cell_ontology_term=None, cell_ids=None, rationale=None, rationale_dois=None, marker_gene_evidence=None, synonyms=None, parent_cell_set_name=None, parent_cell_set_accession=None, author_annotation_fields=None, neurotransmitter_accession=None, neurotransmitter_rationale=None, neurotransmitter_marker_gene_evidence=None, transferred_annotations=None, reviews=None)[source]

Bases: EncoderMixin

A collection of fields recording a cell type/class/state annotation on some set of cells, supporting evidence and provenance. As this is intended as a general schema, compulsory fields are kept to a minimum. However, tools using this schema are encouraged to specify a larger set of compulsory fields for publication. Note: This schema deliberately allows for additional fields in order to support ad hoc user fields, new formal schema extensions and project/tool specific metadata.

Parameters:
  • labelset (str)

  • cell_label (str)

  • cell_set_accession (str | None)

  • cell_fullname (str | None)

  • cell_ontology_term_id (str | None)

  • cell_ontology_term (str | None)

  • cell_ids (List[str] | None)

  • rationale (str | None)

  • rationale_dois (List[str] | None)

  • marker_gene_evidence (List[str] | None)

  • synonyms (List[str] | None)

  • parent_cell_set_name (str | None)

  • parent_cell_set_accession (str | None)

  • author_annotation_fields (dict | None)

  • neurotransmitter_accession (str | None)

  • neurotransmitter_rationale (str | None)

  • neurotransmitter_marker_gene_evidence (List[str] | None)

  • transferred_annotations (List[AnnotationTransfer] | None)

  • reviews (List[Review] | None)

labelset: str

The unique name of the set of cell annotations. Each cell within the AnnData/Seurat file MUST be associated with a ‘cell_label’ value in order for this to be a valid ‘cellannotation_setname’.

cell_label: str

This denotes any free-text term which the author uses to label cells.

cell_set_accession: Optional[str] = None

An identifier that can be used to consistently refer to the set of cells being annotated, even if the cell_label changes.

cell_fullname: Optional[str] = None

This MUST be the full-length name for the biological entity listed in cell_label by the author. (If the value in cell_label is the full-length term, this field will contain the same value.) NOTE: any reserved word used in the field ‘cell_label’ MUST match the value of this field.

cell_ontology_term_id: Optional[str] = None

This MUST be a term from either the Cell Ontology or from some ontology that extends it by classifying cell types under terms from the Cell Ontology e.g. the Provisional Cell Ontology.

cell_ontology_term: Optional[str] = None

This MUST be the human-readable name assigned to the value of ‘cell_ontology_term_id’.

cell_ids: Optional[List[str]] = None

List of cell barcode sequences/UUIDs used to uniquely identify the cells

rationale: Optional[str] = None

The free-text rationale which users provide as justification/evidence for their cell annotations. Researchers are encouraged to use this field to cite relevant publications in-line using standard academic citations of the form (Zheng et al., 2020). This human-readable free-text MUST be encoded as a single string. All references cited SHOULD be listed using DOIs under rationale_dois. There MUST be a 2000-character limit.

rationale_dois: Optional[List[str]] = None

A list of valid publication DOIs cited by the author to support or provide justification/evidence/context for ‘cell_label’.

marker_gene_evidence: Optional[List[str]] = None

List of gene names explicitly used as evidence for this cell annotation.

synonyms: Optional[List[str]] = None

This field denotes any free-text term of a biological entity which the author associates as synonymous with the biological entity listed in the field ‘cell_label’.

parent_cell_set_name: Optional[str] = None
parent_cell_set_accession: Optional[str] = None

A list of accessions of cell sets that subsume this cell set. This can be used to compose hierarchies of annotated cell sets, built from a fixed set of clusters.

author_annotation_fields: Optional[dict] = None

A dictionary of author-defined key-value pairs annotating the cell set. The names and aims of these fields MUST NOT clash with official annotation fields.

neurotransmitter_accession: Optional[str] = None

Accessions of cell neurotransmitter associated with this cell set.

neurotransmitter_rationale: Optional[str] = None

The free-text rationale which users provide as justification/evidence for supporting the neurotransmitter association.

neurotransmitter_marker_gene_evidence: Optional[List[str]] = None

List of gene names used as evidence for neurotransmitter association. Each gene MUST be included in the matrix of the AnnData/Seurat file.

transferred_annotations: Optional[List[AnnotationTransfer]] = None
reviews: Optional[List[Review]] = None
add_user_annotation(user_annotation_set, user_annotation_label)[source]

Adds a user-defined annotation that is not supported by the standard schema.

Parameters:
  • user_annotation_set – name of the user annotation set

  • user_annotation_label – label of the user annotation set
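The parent_cell_set_accession field described above lets annotated cell sets be composed into a hierarchy. The sketch below walks that chain upward from one cell set. It assumes an index of records keyed by cell_set_accession; ancestor_accessions and the accession values are hypothetical and used only for illustration.

```python
from typing import Dict, List


def ancestor_accessions(accession: str, id_index: Dict[str, dict]) -> List[str]:
    """Follow parent_cell_set_accession links upward and return the ancestors in order."""
    chain: List[str] = []
    seen = {accession}  # guards against accidental cycles in the input
    current = id_index[accession].get("parent_cell_set_accession")
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = id_index.get(current, {}).get("parent_cell_set_accession")
    return chain


# Illustrative records keyed by cell_set_accession.
id_index = {
    "EX_1": {"cell_label": "cluster 1", "parent_cell_set_accession": "EX_3"},
    "EX_3": {"cell_label": "subclass A", "parent_cell_set_accession": "EX_5"},
    "EX_5": {"cell_label": "class X"},
}
print(ancestor_accessions("EX_1", id_index))
```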

class cas.model.CellTypeAnnotation(author_name, annotations, title, description=None, matrix_file_id=None, labelsets=None, author_contact=None, orcid=None, cellannotation_schema_version=None, cellannotation_timestamp=None, cellannotation_version=None, cellannotation_url=None, author_list=None)[source]

Bases: EncoderMixin

Parameters:
  • author_name (str)

  • annotations (List[Annotation])

  • title (str)

  • description (str | None)

  • matrix_file_id (str | None)

  • labelsets (List[Labelset] | None)

  • author_contact (str | None)

  • orcid (str | None)

  • cellannotation_schema_version (str | None)

  • cellannotation_timestamp (str | None)

  • cellannotation_version (str | None)

  • cellannotation_url (str | None)

  • author_list (List[str] | None)

author_name: str

This MUST be a string in the format [FIRST NAME] [LAST NAME]

annotations: List[Annotation]

A collection of fields recording a cell type/class/state annotation on some set of cells, supporting evidence and provenance. As this is intended as a general schema, compulsory fields are kept to a minimum. However, tools using this schema are encouraged to specify a larger set of compulsory fields for publication.

title: str

The title of the dataset. This MUST be less than or equal to 200 characters. e.g. ‘Human retina cell atlas - retinal ganglion cells’.

description: Optional[str] = None

The description of the dataset. e.g. ‘A total of 15 retinal ganglion cell clusters were identified from over 99K retinal ganglion cell nuclei in the current atlas. Utilizing previous characterized markers from macaque, 5 clusters can be annotated.’

matrix_file_id: Optional[str] = None

A resolvable ID for a cell by gene matrix file in the form namespace:accession, e.g. CellXGene_dataset:8e10f1c4-8e98-41e5-b65f-8cd89a887122. Please see https://github.com/cellannotation/cell-annotation-schema/registry/registry.json for supported namespaces.

labelsets: Optional[List[Labelset]] = None

A list of labelsets that are used in the annotations.

author_contact: Optional[str] = None

This MUST be a valid email address of the author

orcid: Optional[str] = None

This MUST be a valid ORCID for the author

cellannotation_schema_version: Optional[str] = None

The schema version of the cell annotation open standard. The current version MUST follow 0.1.0. This versioning MUST follow the format ‘[MAJOR].[MINOR].[PATCH]’ as defined by Semantic Versioning 2.0.0, https://semver.org/

cellannotation_timestamp: Optional[str] = None

The timestamp of all cell annotations published (per dataset). This MUST be a string in the format ‘%yyyy-%mm-%dd %hh:%mm:%ss’.

cellannotation_version: Optional[str] = None

The version for all cell annotations published (per dataset). This MUST be a string. The recommended versioning format is ‘[MAJOR].[MINOR].[PATCH]’ as defined by Semantic Versioning 2.0.0, https://semver.org/

cellannotation_url: Optional[str] = None

A persistent URL of all cell annotations published (per dataset).

author_list: Optional[List[str]] = None

This field stores a list of users who are included in the project as collaborators, regardless of their specific role. An example list: John Smith|Cody Miller|Sarah Jones.
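Several of the top-level fields above carry format constraints (title length, timestamp format, version format). The sketch below shows one way such checks could be written. It assumes that ‘%yyyy-%mm-%dd %hh:%mm:%ss’ corresponds to the strptime format "%Y-%m-%d %H:%M:%S" and uses a simplified MAJOR.MINOR.PATCH pattern without pre-release or build suffixes; check_metadata is a hypothetical helper, not part of the package.

```python
import re
from datetime import datetime
from typing import List

# Simplified MAJOR.MINOR.PATCH pattern (assumption: no pre-release/build suffixes).
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def check_metadata(meta: dict) -> List[str]:
    """Return a list of problems found in the top-level metadata fields."""
    problems = []
    if len(meta.get("title", "")) > 200:
        problems.append("title exceeds 200 characters")
    timestamp = meta.get("cellannotation_timestamp")
    if timestamp is not None:
        try:
            # Assumed strptime equivalent of the documented timestamp format.
            datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            problems.append("cellannotation_timestamp has an unexpected format")
    version = meta.get("cellannotation_schema_version")
    if version is not None and not SEMVER.match(version):
        problems.append("cellannotation_schema_version is not MAJOR.MINOR.PATCH")
    return problems


print(check_metadata({"title": "Human retina cell atlas", "cellannotation_schema_version": "0.1"}))
```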

add_annotation_object(obj)[source]

Adds the given object to the annotation objects list.

Parameters:

obj – Annotation object to add

set_exclude_none_values(value)[source]
get_all_annotations(show_cell_ids=False, labels=None)[source]

Lists all annotations.

Parameters:
  • show_cell_ids (bool) – whether the result includes a ‘cell_ids’ column. Defaults to False.

  • labels (Optional[list]) – list of key(labelset), value(cell_label) pairs to filter annotations

Return type:

DataFrame

Returns:

Annotations data frame

as_dictionary()[source]
remove_none_values(d)[source]

Recursively removes all key-value pairs from the dictionary where the value is None.

Parameters:

d – The dictionary to clean.

Returns:

A new dictionary with None values removed.
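A recursive clean-up of this kind can be sketched as below. This is an illustrative version, not the package's code; it assumes that nested dictionaries inside lists are cleaned as well while list items themselves are kept.

```python
from typing import Any


def remove_none_values_sketch(d: Any) -> Any:
    """Return a copy of d with every None-valued key removed, at any nesting depth."""
    if isinstance(d, dict):
        return {k: remove_none_values_sketch(v) for k, v in d.items() if v is not None}
    if isinstance(d, list):
        # Assumption: list items are kept, but dictionaries inside them are cleaned.
        return [remove_none_values_sketch(item) for item in d]
    return d


record = {
    "cell_label": "L5 ET",
    "rationale": None,
    "reviews": [{"reviewer": "Jane Doe", "explanation": None}],
}
print(remove_none_values_sketch(record))
```

Because the function builds new containers, the input record is left unchanged.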

cas.populate_cell_ids module

cas.populate_cell_ids.populate_cell_ids(cas_json_path, anndata_path, labelsets=None, validate=False)[source]

Add/update CellIDs in a CAS JSON file using matching data from an AnnData file.

This function reads a CAS JSON file and an AnnData file, validates their consistency, and updates CellIDs in CAS based on matching labelsets from the AnnData obs DataFrame. The modified CAS JSON is then saved back to the original file.

Parameters:
  • cas_json_path (str) – Path to the CAS JSON file.

  • anndata_path (str) – Path to the AnnData file.

  • labelsets (list, optional) – A list of labelsets to update with CellIDs from AnnData. If None, the labelset with rank ‘0’ is used by default.

  • validate (bool, optional) – If True, runs validation checks to ensure labelset consistency. The program will exit with an error if validation fails. Defaults to False.

Raises:

Exception – If the AnnData file cannot be read.

Returns:

None

cas.populate_cell_ids.update_cas_with_cell_ids(cas_json, anndata_obs, labelsets=None)[source]

Update a CAS dictionary by adding or modifying CellIDs using matching AnnData observations.

This function takes a CAS dictionary and an AnnData obs DataFrame and updates the CAS with cell IDs extracted from the specified labelsets in the AnnData.

Parameters:
  • cas_json (dict) – The CAS dictionary to update with cell IDs from AnnData.

  • anndata_obs (CapAnnDataDF) – The obs DataFrame extracted from an AnnData object.

  • labelsets (list, optional) – A list of labelsets to update with IDs from AnnData. If None, the labelset with rank ‘0’ is used.

Returns:

The updated CAS dictionary with CellIDs populated.

Return type:

dict
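The update described above can be pictured with plain Python structures. In this sketch, a dictionary mapping cell ID to {labelset: cell_label} stands in for the AnnData obs DataFrame, and attach_cell_ids is a hypothetical helper; the real function operates on a CapAnnDataDF and may differ in detail.

```python
from collections import defaultdict
from typing import Dict, List


def attach_cell_ids(cas: dict, obs_rows: Dict[str, Dict[str, str]], labelsets: List[str]) -> dict:
    """Populate 'cell_ids' on each annotation whose (labelset, cell_label) appears in obs_rows."""
    cells_by_label = defaultdict(list)
    for cell_id, row in obs_rows.items():
        for labelset in labelsets:
            if labelset in row:
                cells_by_label[(labelset, row[labelset])].append(cell_id)
    for annotation in cas["annotations"]:
        key = (annotation["labelset"], annotation["cell_label"])
        if key in cells_by_label:
            annotation["cell_ids"] = cells_by_label[key]
    return cas


# Stand-in for obs: one entry per cell, keyed by cell ID (illustrative values).
obs_rows = {
    "AAAC-1": {"cluster": "c1"},
    "AAAG-1": {"cluster": "c2"},
    "AAAT-1": {"cluster": "c1"},
}
cas = {"annotations": [
    {"labelset": "cluster", "cell_label": "c1"},
    {"labelset": "cluster", "cell_label": "c2"},
]}
attach_cell_ids(cas, obs_rows, ["cluster"])
print(cas["annotations"][0]["cell_ids"])
```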

cas.populate_cell_ids.add_cell_ids(cas, ad_obs, labelsets=None, validate=False)[source]

Add/update CellIDs to CAS from matching AnnData file.

Parameters:
  • cas (dict) – CAS JSON object

  • ad_obs (Union[DataFrame, CapAnnDataDF]) – Obs DataFrame extracted from an AnnData object.

  • labelsets (Optional[list]) – List of labelsets to update with IDs from AnnData. If the value is null, the rank ‘0’ labelset is used. The labelsets should be provided in order, starting from rank 0.

  • validate (bool, optional) – If True, runs validation checks to ensure labelset consistency. The program will exit with an error if validation fails. Defaults to False.

cas.populate_cell_ids.get_obs_cluster_identifier_column(obs_keys, labelsets=None, rank_zero_labelset=None)[source]

AnnData files may use different column names to uniquely identify clusters. Gets the cluster identifier column name for the current file.

Parameters:
  • obs_keys (List[str]) – AnnData observation keys.

  • labelsets (list | None) – List of labelsets to update with IDs from AnnData. The labelsets should be provided in order, starting from rank 0 (leaf nodes).

  • rank_zero_labelset (str | None) – rank 0 labelset name

Returns:

cluster identifier column name

cas.reports module

cas.reports.get_all_annotations(cas, show_cell_ids=False, labels=None)[source]

Lists all annotations.

Parameters:
  • cas (dict) – Cell Annotation Schema json object.

  • show_cell_ids (bool) – whether the result includes a ‘cell_ids’ column. Defaults to False.

  • labels (Optional[list]) – list of key(labelset), value(cell_label) pairs to filter annotations

Return type:

DataFrame

Returns:

Annotations data frame
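The filtering controlled by labels and show_cell_ids can be sketched with plain lists of dictionaries. The example assumes that labels is a list of (labelset, cell_label) tuples and returns a list of rows rather than a DataFrame; filter_annotations is a hypothetical helper, not the package's function.

```python
from typing import List, Optional, Tuple


def filter_annotations(
    annotations: List[dict],
    show_cell_ids: bool = False,
    labels: Optional[List[Tuple[str, str]]] = None,
) -> List[dict]:
    """Keep annotations matching any (labelset, cell_label) pair; drop cell_ids unless requested."""
    rows = []
    for annotation in annotations:
        if labels and (annotation["labelset"], annotation["cell_label"]) not in labels:
            continue
        row = dict(annotation)  # copy so the input is not modified
        if not show_cell_ids:
            row.pop("cell_ids", None)
        rows.append(row)
    return rows


annotations = [
    {"labelset": "cluster", "cell_label": "c1", "cell_ids": ["AAAC-1"]},
    {"labelset": "class", "cell_label": "neuron", "cell_ids": ["AAAC-1", "AAAG-1"]},
]
print(filter_annotations(annotations, labels=[("class", "neuron")]))
```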

cas.reports.filter_by_label(row, labels)[source]

Filters and shows row if it

cas.spreadsheet_to_cas module

cas.spreadsheet_to_cas.resolve_ref(schema, ref)[source]
cas.spreadsheet_to_cas.read_spreadsheet(file_path, sheet_name, schema)[source]

Read the specific sheet from the Excel file into a pandas DataFrame.

Parameters:
  • file_path (str) – Path to the Excel file.

  • sheet_name (str, optional) – Target sheet name. If not provided, reads the first sheet.

  • schema (dict) – Cell annotation schema

Returns:

Tuple containing metadata (dict), column names (list), and raw data (pd.DataFrame).

Return type:

tuple

cas.spreadsheet_to_cas.custom_lowercase_transform(s)[source]

Transforms the given string to lowercase except for words that are acronyms or specific cell type names which are three characters or fewer.

Parameters:

s (str) – The input string.

Returns:

The transformed string.

Return type:

str
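One possible reading of this transformation is sketched below. It assumes that any word of three characters or fewer is left untouched and every longer word is lowercased; the actual function may identify acronyms and cell type names differently, and lowercase_except_short_words is a hypothetical name.

```python
def lowercase_except_short_words(s: str) -> str:
    """Lowercase every word longer than three characters; keep shorter words as written."""
    return " ".join(word if len(word) <= 3 else word.lower() for word in s.split(" "))


print(lowercase_except_short_words("CD4 Positive T Cell"))
```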

cas.spreadsheet_to_cas.spreadsheet2cas(spreadsheet_file_path, sheet_name, anndata_file_path, labelset_list, schema_name, output_file_path)[source]

Convert a spreadsheet to Cell Annotation Schema (CAS) JSON.

Parameters:
  • spreadsheet_file_path (str) – Path to the spreadsheet file.

  • sheet_name (Optional[str]) – Target sheet name in the spreadsheet. Can be a string or None.

  • anndata_file_path (Optional[str]) – The path to the AnnData file.

  • labelset_list (Optional[List[str]]) – List of names of observation (obs) fields used to record author cell type names, which determine the rank of labelsets in a spreadsheet.

  • schema_name (Optional[str]) – Name of the CAS schema, can be one of ‘base’, ‘bican’ or ‘cap’.

  • output_file_path (str) – Output CAS file name.

cas.spreadsheet_to_cas.add_annotations_to_cas(cas, raw_data_result, columns, schema, parent_cell_look_up)[source]

Adds processed annotations from raw data to the CAS structure and tracks labelsets. Assumes certain external definitions for column names and transformation functions.

Parameters:
  • cas (dict) – The CAS structure to update with annotations.

  • raw_data_result (DataFrame) – Raw annotation data.

  • columns (list) – Column names of raw data to process.

  • schema (dict) – Cell annotation schema.

  • parent_cell_look_up (Dict[str, Any]) – A precomputed dictionary containing hierarchical metadata about cell labels.

Returns:

Tracks labelsets encountered, initialized to None.

Return type:

OrderedDict

Note

Requires custom_lowercase_transform, get_cell_ids, and column constants to be defined.

cas.spreadsheet_to_cas.initialize_cas_structure(matrix_file_id, meta_data_result)[source]

Initializes the Cell Annotation Schema (CAS) structure with basic information and placeholders for annotations and labelsets. Fields initialized with None values are omitted in the final output.

Parameters:
  • matrix_file_id (str) – The ID of the matrix file, used within the CAS for identification.

  • meta_data_result (dict) – Metadata containing at least the ‘matrix_file_id’ for the CAS URL.

Returns:

The initial CAS structure with the matrix file ID, annotation URL, and placeholders for future data. Excludes fields that remain None.

Return type:

dict

cas.spreadsheet_to_cas.load_or_fetch_anndata(anndata_file_path, meta_data_result)[source]

Loads or fetches an AnnData file, based on a local path or a matrix file ID from metadata.

Parameters:
  • anndata_file_path (str) – Path to an AnnData file, or None to fetch using metadata.

  • meta_data_result (dict) – Metadata with ‘matrix_file_id’ for fetching the dataset.

Returns:

(AnnData object, matrix file ID), ready for use.

Return type:

tuple

Raises:

ValueError – If ‘matrix_file_id’ is missing from metadata.

cas.validate module

cas.validate.validate(schema_name, data_path)[source]

Validates all instances in data_path against the given schema. Assumes all *.json files in the folder should validate against the schema. Logs all validation errors and throws an exception if any of the files is invalid.

Parameters:
  • schema_name (str) – One of ‘base’, ‘bican’ or ‘cap’. Identifies the CAS schema to validate data against.

  • data_path (str) – Path to the data file (or folder) to validate

Module contents