cas.ingest package

Submodules

cas.ingest.config_validator module

cas.ingest.config_validator.validate(json_object)

Validates the given JSON configuration object against the cell type annotation schema.

Parameters:

json_object (object) – configuration object.

Returns:

True if the object is valid, False otherwise.

Return type:

bool
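The boolean contract of validate (report validity as True/False rather than raising) can be illustrated with a small stand-in. This sketch is hypothetical: it does not load the real cell type annotation schema. It only checks a made-up required key ("fields") and that key's assumed list shape. A fuller implementation would more likely delegate to a JSON Schema validator (for example, the third-party jsonschema package's validate(), which raises ValidationError on failure) and turn that exception into False.

```python
# Hypothetical sketch: REQUIRED_KEYS and the "fields" shape are assumptions,
# not the real cell type annotation schema.
REQUIRED_KEYS = {"fields"}


def validate(json_object: object) -> bool:
    """Return True if json_object passes the simplified checks, else False."""
    # A configuration must be a JSON object, i.e. a dict once parsed.
    if not isinstance(json_object, dict):
        return False
    # Every required top-level key must be present.
    if not REQUIRED_KEYS.issubset(json_object):
        return False
    # Assumed shape: "fields" holds a list of field definitions.
    return isinstance(json_object["fields"], list)
```

For example, validate({"fields": []}) is True, while validate([1, 2]) and validate({}) are False.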

cas.ingest.config_validator.validate_file(file_path)

Reads the configuration object from the given path and validates it.

Parameters:

file_path (str) – path to the JSON file.

Returns:

True if the object is valid, False otherwise.

Return type:

bool

cas.ingest.config_validator.validate_json_str(json_str)

Validates the given JSON string.

Parameters:

json_str (str) – string representation of a JSON object.

Returns:

True if the object is valid, False otherwise.

Return type:

bool
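validate_file and validate_json_str differ from validate only in where the JSON comes from. The minimal sketch below shows that delegation. It assumes, without confirmation from this page, that unreadable files and malformed JSON are reported as False rather than raised; the inner validate is a hypothetical stand-in for the schema check.

```python
import json


def validate(json_object: object) -> bool:
    # Hypothetical stand-in for the schema-based check.
    return isinstance(json_object, dict)


def validate_json_str(json_str: str) -> bool:
    """Parse a JSON string and validate the resulting object."""
    try:
        obj = json.loads(json_str)
    except json.JSONDecodeError:
        # Assumption: malformed JSON counts as invalid instead of raising.
        return False
    return validate(obj)


def validate_file(file_path: str) -> bool:
    """Read a JSON file and validate its contents."""
    try:
        with open(file_path, encoding="utf-8") as f:
            obj = json.load(f)
    except (OSError, json.JSONDecodeError):
        # Assumption: unreadable or malformed files count as invalid.
        return False
    return validate(obj)
```

Keeping the parsing in thin wrappers means all schema logic lives in one function, so the three entry points cannot drift apart.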

cas.ingest.ingest_user_table module

cas.ingest.ingest_user_table.ingest_data(data_file, config_file, out_file, format='json', print_undefined=False, generate_accession_ids=False)

Ingests the given data into the standard cell annotation schema data structure using the given configuration.

Parameters:
  • data_file (str) – Unformatted user data in TSV/CSV format.

  • config_file (str) – Configuration file path.

  • out_file (str) – Output file path.

  • format (str) – Data export format. Supported formats are ‘json’ and ‘tsv’.

  • print_undefined (bool) – If True, prints null values to the output JSON; if False, omits undefined values from the JSON output. False by default. Only effective in JSON serialization.

  • generate_accession_ids (bool) – Determines whether to incrementally generate accession_ids for all annotations that don’t have an ID.

Returns:

Output data as dict.

Return type:

dict
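The effect of print_undefined on JSON output can be pictured as a recursive filter applied before serialization. The helpers drop_undefined and serialize are hypothetical (they are not part of this package), and the sketch assumes "undefined" means a None value.

```python
import json
from typing import Any


def drop_undefined(value: Any) -> Any:
    """Recursively remove None-valued keys from dicts (including dicts inside lists)."""
    if isinstance(value, dict):
        return {k: drop_undefined(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [drop_undefined(v) for v in value]
    return value


def serialize(data: dict, print_undefined: bool = False) -> str:
    """Serialize to JSON, keeping nulls only when print_undefined is True."""
    payload = data if print_undefined else drop_undefined(data)
    return json.dumps(payload, sort_keys=True)
```

With the default, serialize({"a": 1, "b": None}) yields '{"a": 1}'; with print_undefined=True it yields '{"a": 1, "b": null}'. The flag has no counterpart for TSV output, which matches the note that it only affects JSON serialization.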

cas.ingest.ingest_user_table.ingest_user_data(data_file, config_file, generate_accession_ids=False)

Ingests the given user data into the standard cell annotation schema data structure using the given configuration.

Parameters:
  • data_file (str) – Unformatted user data in TSV/CSV format.

  • config_file (str) – Configuration file path.

  • generate_accession_ids (bool) – Determines whether to incrementally generate accession_ids for all annotations that don’t have an ID.

Return type:

CellTypeAnnotation
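data_file may be either TSV or CSV. One plausible way to read both is the standard library's csv.DictReader, which yields one dict per row keyed by column header. The sketch assumes the delimiter is picked from the file extension; this page does not say how the package decides. delimiter_for, parse_rows and read_user_table are hypothetical names.

```python
import csv
import os
from typing import Dict, Iterable, List


def delimiter_for(path: str) -> str:
    """Pick a delimiter from the extension (assumption: .tsv is tab, else comma)."""
    return "\t" if os.path.splitext(path)[1].lower() == ".tsv" else ","


def parse_rows(lines: Iterable[str], delimiter: str) -> List[Dict[str, str]]:
    """Turn a header line plus data lines into one dict per row."""
    return list(csv.DictReader(lines, delimiter=delimiter))


def read_user_table(data_file: str) -> List[Dict[str, str]]:
    """Read a TSV or CSV user table from disk."""
    with open(data_file, newline="", encoding="utf-8") as f:
        return parse_rows(f, delimiter_for(data_file))
```

Keying rows by header lets later steps look up configured columns by name and treat every remaining column as an extra user annotation.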

cas.ingest.ingest_user_table.generate_ids_for_annotations(cas, config, labelset_ranks)

Generates unique IDs for the annotations in the given CellTypeAnnotation object.

Parameters:
  • cas (CellTypeAnnotation) – CellTypeAnnotation object.

  • config (dict) – Ingestion configuration dictionary.

  • labelset_ranks (dict) – Ranks of the labelsets.

Returns:

CellTypeAnnotation object with generated IDs.

Return type:

CellTypeAnnotation
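The ID generation step can be sketched as follows: walk the annotations in order of their labelset's rank and give each one that lacks an ID the next free "<labelset>_<n>" value. This is a hypothetical illustration. It operates on plain dicts rather than a CellTypeAnnotation object, and the "id" and "labelset" keys, the ID format, and the rank-0-first order are all assumptions.

```python
from itertools import count
from typing import Dict, List


def generate_ids(annotations: List[dict],
                 labelset_ranks: Dict[str, int]) -> List[dict]:
    """Assign "<labelset>_<n>" IDs to annotations that have no "id" yet."""
    taken = {a["id"] for a in annotations if a.get("id")}
    counters = {name: count(1) for name in labelset_ranks}
    # Assumption: rank 0 (the most specific labelset) is processed first.
    for ao in sorted(annotations, key=lambda a: labelset_ranks[a["labelset"]]):
        if ao.get("id"):
            continue  # keep IDs the user already supplied
        labelset = ao["labelset"]
        new_id = f"{labelset}_{next(counters[labelset])}"
        while new_id in taken:  # skip numbers already in use
            new_id = f"{labelset}_{next(counters[labelset])}"
        taken.add(new_id)
        ao["id"] = new_id
    return annotations
```

Existing IDs are never overwritten, so rerunning the step on partially annotated data only fills the gaps.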

cas.ingest.ingest_user_table.init_accession_managers(cas, config)

Initializes an IncrementalAccessionManager for each labelset in the config.

Parameters:
  • cas (CellTypeAnnotation) – CellTypeAnnotation object.

  • config (dict) – Ingestion configuration dictionary.

Returns:

Dictionary of IncrementalAccessionManager objects.

Return type:

dict
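A per-labelset manager can resume numbering after the highest accession already in use, so newly generated IDs never collide with existing ones. AccessionCounter and init_counters below are hypothetical stand-ins, not the package's IncrementalAccessionManager. The "<labelset>_" prefix and the numeric suffix format are assumptions.

```python
import re
from typing import Dict, Iterable


class AccessionCounter:
    """Hypothetical stand-in for IncrementalAccessionManager."""

    def __init__(self, prefix: str, existing_ids: Iterable[str] = ()):
        self.prefix = prefix
        # Resume after the largest number already used with this prefix.
        pattern = re.compile(re.escape(prefix) + r"(\d+)$")
        numbers = []
        for accession in existing_ids:
            match = pattern.match(accession)
            if match:
                numbers.append(int(match.group(1)))
        self.last = max(numbers, default=0)

    def generate(self) -> str:
        """Return the next accession for this prefix."""
        self.last += 1
        return f"{self.prefix}{self.last}"


def init_counters(labelsets: Iterable[str],
                  existing_ids: Iterable[str]) -> Dict[str, AccessionCounter]:
    """Create one counter per labelset name, keyed by that name."""
    existing_ids = list(existing_ids)  # reused by every counter
    return {name: AccessionCounter(f"{name}_", existing_ids) for name in labelsets}
```

Returning a dict keyed by labelset name mirrors the documented return value: callers look up the manager for an annotation's labelset and ask it for the next ID.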

cas.ingest.ingest_user_table.register_parent(field, labelset_ranks, parent_ao, parents)

Registers the parent annotation object in the parents list.

Parameters:
  • field – Config field.

  • labelset_ranks – Labelset ranks dictionary.

  • parent_ao – Parent to add.

  • parents – Sparse parents list.
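"Sparse parents list" suggests a list indexed by labelset rank, in which levels that have no parent stay empty. The sketch below is a hypothetical reading of that idea: store_parent is not the package function, and both the use of field["name"] as the labelset name and None as the empty-slot marker are assumptions.

```python
from typing import Dict, List, Optional


def store_parent(field: dict, labelset_ranks: Dict[str, int],
                 parent_ao: dict, parents: List[Optional[dict]]) -> None:
    """Store parent_ao at the index given by its labelset's rank."""
    # Assumption: field["name"] names the labelset this column belongs to.
    rank = labelset_ranks[field["name"]]
    # Grow the sparse list with None placeholders until the slot exists.
    while len(parents) <= rank:
        parents.append(None)
    parents[rank] = parent_ao
```

Indexing by rank keeps parents ordered from finest to coarsest regardless of the order in which config fields are processed.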

cas.ingest.ingest_user_table.get_annotation(ao_names, field, record)

Creates an annotation object if it does not already exist in the ao_names dictionary at the same labelset.

Parameters:
  • ao_names – Existing annotation objects.

  • field – Config field.

  • record – Data record.

Returns:

Annotation object.
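The "create only if it does not exist at the same labelset" rule is a get-or-create lookup keyed by (labelset, label). Under this key, the same label under two labelsets yields two distinct objects. get_or_create is a hypothetical name, and the field keys "name" and "column_name" as well as the "label" key of the created object are assumptions.

```python
from typing import Dict, Tuple


def get_or_create(ao_names: Dict[Tuple[str, str], dict], field: dict,
                  record: Dict[str, str]) -> dict:
    """Return the annotation for this record's value, creating it on first sight."""
    labelset = field["name"]              # assumed: labelset name
    label = record[field["column_name"]]  # assumed: source column of the label
    key = (labelset, label)
    if key not in ao_names:
        ao_names[key] = {"labelset": labelset, "label": label}
    return ao_names[key]
```

Because repeated rows return the same object, user data with one row per cell collapses into one annotation per distinct label.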

cas.ingest.ingest_user_table.add_user_annotations(ao, headers, record, utilized_columns)

Adds user annotations that are not supported by the standard schema.

Parameters:
  • ao – Current annotation object.

  • headers – All column names of the user data.

  • record – A record in the user data.

  • utilized_columns – List of processed columns.
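Columns that the configuration did not map (those absent from utilized_columns) can be carried along as free-form user annotations. In the sketch below, add_extra_columns and the "user_annotations" key are hypothetical. Skipping empty values is an assumption.

```python
from typing import Dict, List


def add_extra_columns(ao: dict, headers: List[str], record: Dict[str, str],
                      utilized_columns: List[str]) -> None:
    """Copy non-empty values of unmapped columns into ao["user_annotations"]."""
    extras = {h: record[h] for h in headers
              if h not in utilized_columns and record.get(h)}
    if extras:
        ao.setdefault("user_annotations", {}).update(extras)
```

Only columns with a value are added, so an annotation without extra data stays unchanged.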

cas.ingest.ingest_user_table.add_parent_node_names(ao, ao_names, cas, parents)

Creates parent nodes if necessary and builds a cluster hierarchy by assigning parent_node_names.

Parameters:
  • ao – Current annotation object.

  • ao_names – List of all created annotation objects.

  • cas – Main object.

  • parents – List of the current annotation object’s parents.
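The hierarchy-building step can be pictured as chaining each node to the next non-empty entry in a finest-to-coarsest parents list. This sketch covers only the linking, not the creation of missing parent nodes. link_hierarchy and the "name"/"parent_node_name" keys are hypothetical, and the list order is an assumption.

```python
from typing import List, Optional


def link_hierarchy(ao: dict, parents: List[Optional[dict]]) -> None:
    """Chain ao to its nearest parent, and each parent to the next coarser one."""
    # Drop empty slots of the sparse list; remaining order is finest to coarsest.
    chain = [ao] + [p for p in parents if p is not None]
    for child, parent in zip(chain, chain[1:]):
        child["parent_node_name"] = parent["name"]
```

Skipping empty slots lets a cluster attach directly to a coarser level when an intermediate labelset has no value for that row.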

cas.ingest.ingest_user_table.populate_labelsets(cas, config_fields)

Populates the labelsets list based on the fields of the config.

Parameters:
  • cas – Main object.

  • config_fields – Config file fields.

Returns:

Ranks of the labelsets.
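Populating labelsets and computing their ranks can be done in one pass over the config fields. In the sketch below, build_labelsets is a hypothetical name and a plain dict stands in for the main object. It treats every field as a labelset and assumes an explicit "rank" key, when present, overrides the field's position in the list.

```python
from typing import Dict, List


def build_labelsets(cas: dict, config_fields: List[dict]) -> Dict[str, int]:
    """Append one labelset per config field and return {labelset name: rank}."""
    ranks: Dict[str, int] = {}
    labelsets = cas.setdefault("labelsets", [])
    for position, field in enumerate(config_fields):
        # Assumption: an explicit "rank" wins; otherwise list order is used.
        rank = field.get("rank", position)
        labelsets.append({"name": field["name"], "rank": rank})
        ranks[field["name"]] = rank
    return ranks
```

The returned mapping is the kind of labelset_ranks dictionary that the ID-generation and parent-registration steps consume.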

Module contents