MetaNetX SDK Documentation¶
Parse and process information from MetaNetX for MIRIAM compatibility using the Identifiers.org namespaces.
Install¶
It’s as simple as:
pip install metanetx-sdk
Usage¶
The authoritative source on how to use the various commands is always accessible via the commands’ help.
mnx-sdk -h
Normally you would start by loading the files from the MetaNetX FTP server
mnx-sdk pull ./data
and then transforming each data table.
mnx-sdk etl chem-prop ./data/chem_prop.tsv.gz ./data/transformed_chem_prop.tsv.gz
You can also use the functions from the metanetx_sdk.api module directly.
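As a minimal sketch, the same pull-then-transform workflow might look as follows through the Python API. It uses only names listed in the API reference on this page. The helper name refresh_chem_prop and the ./data directory are illustrative, and the assumption that pull stores the table as chem_prop.tsv.gz simply mirrors the command line example above. The SDK imports sit inside the function, so defining it needs neither the package nor network access; calling it needs both.

```python
from pathlib import Path


def refresh_chem_prop(data_dir: Path = Path("./data")) -> Path:
    """Pull the MetaNetX tables and transform chem_prop (illustrative helper)."""
    # Imported lazily; every name below is taken from the API reference.
    from metanetx_sdk import api
    from metanetx_sdk.extract import extract_chemical_prefix_mapping
    from metanetx_sdk.model.table_configuration_model import TableConfigurationModel
    from metanetx_sdk.transform.chemical import transform_chemical_properties

    data_dir.mkdir(parents=True, exist_ok=True)
    # Counterpart of `mnx-sdk pull ./data`; by default all known files are checked.
    api.pull(data_dir)

    # Assumption: the pulled file name matches the command line example.
    source = data_dir / "chem_prop.tsv.gz"
    target = data_dir / "transformed_chem_prop.tsv.gz"
    api.etl_table(
        source,
        target,
        TableConfigurationModel.load().chem_prop,
        extract_chemical_prefix_mapping(),
        transform_chemical_properties,
    )
    return target
```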
Copyright¶
Copyright © 2019-2020, Moritz E. Beber.
Free software distributed under the Apache Software License 2.0.
Contents¶
API Reference¶
metanetx_sdk¶
metanetx_sdk package¶
Subpackages¶
Provide a command line interface for working with MetaNetX data.
Provide MetaNetX table processing commands.
Provide a command line interface.
Provide data files.
Provide an FTP configuration data model.
class metanetx_sdk.model.ftp_configuration_model.FTPConfigurationModel
Bases: pydantic.main.BaseModel
Define the FTP configuration data model.
- base_directory: FTPPath = None
- property directory
  Return the compound working directory for the FTP server.
- files: List[str] = None
- host: str = None
- classmethod load(version: Optional[str] = None) → metanetx_sdk.model.ftp_configuration_model.FTPConfigurationModel
  Load the packaged FTP configuration.
- timezone: Timezone = None
- version: str = None
class metanetx_sdk.model.ftp_configuration_model.FTPPath
Bases: pathlib.PurePosixPath
Define an FTP path data type.
class metanetx_sdk.model.ftp_configuration_model.Timezone
Bases: datetime.tzinfo
Define a timezone custom data type.
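Because FTPPath derives from pathlib.PurePosixPath, server-side paths can be composed without touching the local filesystem. The sketch below shows one way a compound working directory could be assembled from a base directory and a version. The combination rule, the function name compound_directory, and the example values are all assumptions for illustration; they are not taken from the packaged configuration.

```python
from pathlib import PurePosixPath


def compound_directory(base_directory: PurePosixPath, version: str) -> PurePosixPath:
    """Join a base directory and a version (hypothetical rule for illustration)."""
    return base_directory / version


# Hypothetical values; real ones would come from FTPConfigurationModel.load().
directory = compound_directory(PurePosixPath("/pub/example"), "1.0")
print(directory)
```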
Provide a path information data model.
class metanetx_sdk.model.path_info_model.PathInfoModel
Bases: pydantic.main.BaseModel
Describe information found about FTP files.
- localize(local_tz: pytz.timezone) → None
  Convert the modify timestamp into a timezone aware one.
- modify: datetime = None
- size: int = None
- classmethod transform_modify(value: str) → datetime.datetime
  Transform the modify string to a datetime object.
- type: str = None
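The two steps described for PathInfoModel, parsing the modify string and then making it timezone aware, can be sketched with the standard library alone. This assumes the modify value uses the YYYYMMDDHHMMSS layout of FTP MLSD listings (RFC 3659); the function names are illustrative, and a fixed UTC+1 offset stands in for a pytz timezone. With pytz itself, the documented way to attach a zone to a naive timestamp is the timezone's localize method rather than replace.

```python
from datetime import datetime, timedelta, timezone


def parse_modify(value: str) -> datetime:
    """Parse a YYYYMMDDHHMMSS modify fact into a naive datetime."""
    # Fractional seconds are optional in RFC 3659; this sketch drops them.
    return datetime.strptime(value[:14], "%Y%m%d%H%M%S")


def attach_timezone(naive: datetime, tz: timezone) -> datetime:
    """Return a timezone-aware copy of a naive timestamp."""
    return naive.replace(tzinfo=tz)


# A fixed UTC+1 offset stands in for a zone such as Europe/Zurich.
stamp = attach_timezone(parse_modify("20200115093000"), timezone(timedelta(hours=1)))
print(stamp.isoformat())
```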
Provide table configuration data models.
class metanetx_sdk.model.table_configuration_model.SingleTableConfigurationModel
Bases: pydantic.main.BaseModel
Describe the configuration needed for a single table.
- columns: List[str] = None
- skip: int = None
class metanetx_sdk.model.table_configuration_model.TableConfigurationModel
Bases: pydantic.main.BaseModel
Describe all table configuration models.
- chem_prop: SingleTableConfigurationModel = None
- chem_xref: SingleTableConfigurationModel = None
- comp_prop: SingleTableConfigurationModel = None
- comp_xref: SingleTableConfigurationModel = None
- classmethod load(version: Optional[str] = None) → metanetx_sdk.model.table_configuration_model.TableConfigurationModel
  Load the configuration from the packaged file.
- reac_prop: SingleTableConfigurationModel = None
- reac_xref: SingleTableConfigurationModel = None
- version: str = None
Provide data models.
Provide chemical data transformation functions.
metanetx_sdk.transform.chemical.transform_chebi_prefix(table: pandas.core.frame.DataFrame)
Transform all ChEBI identifiers.
metanetx_sdk.transform.chemical.transform_chemical_cross_references(references: pandas.core.frame.DataFrame, prefix_mapping: Mapping) → pandas.core.frame.DataFrame
Transform the MetaNetX chemical cross-references.
metanetx_sdk.transform.chemical.transform_chemical_properties(chemicals: pandas.core.frame.DataFrame, prefix_mapping: Mapping) → pandas.core.frame.DataFrame
Transform the MetaNetX chemical properties.
metanetx_sdk.transform.chemical.transform_kegg_prefix(table: pandas.core.frame.DataFrame)
Transform all KEGG identifiers.
metanetx_sdk.transform.chemical.transform_metanetx_prefix(table: pandas.core.frame.DataFrame)
Transform all MetaNetX identifiers.
metanetx_sdk.transform.chemical.transform_swisslipid_prefix(table: pandas.core.frame.DataFrame)
Transform all SwissLipids identifiers.
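To illustrate what a prefix transformation for MIRIAM compatibility can involve, the sketch below rewrites a single cross-reference in plain Python. It is not the SDK's implementation, which operates on pandas DataFrames. The "prefix:accession" input layout, the two-entry mapping, and the function name split_reference are assumptions; the only external facts used are that the Identifiers.org namespaces chebi and kegg.compound exist and that ChEBI accessions there take the form CHEBI:<digits>.

```python
from typing import Dict, Tuple

# Hypothetical excerpt of a prefix mapping (source prefix -> Identifiers.org namespace).
PREFIX_MAPPING: Dict[str, str] = {"chebi": "chebi", "kegg": "kegg.compound"}


def split_reference(xref: str, mapping: Dict[str, str]) -> Tuple[str, str]:
    """Split 'prefix:accession' and translate the prefix via the mapping."""
    prefix, _, accession = xref.partition(":")
    namespace = mapping[prefix]
    if namespace == "chebi":
        # Identifiers.org expects ChEBI accessions in the form CHEBI:<digits>.
        accession = f"CHEBI:{accession}"
    return namespace, accession


print(split_reference("chebi:15377", PREFIX_MAPPING))
print(split_reference("kegg:C00001", PREFIX_MAPPING))
```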
Provide compartment data transformation functions.
metanetx_sdk.transform.compartment.transform_cell_cycle_ontology_prefix(table: pandas.core.frame.DataFrame)
Transform all CCO terms.
metanetx_sdk.transform.compartment.transform_compartment_cross_references(references: pandas.core.frame.DataFrame, prefix_mapping: Mapping) → pandas.core.frame.DataFrame
Transform the MetaNetX compartment cross-references.
metanetx_sdk.transform.compartment.transform_compartment_properties(compartments: pandas.core.frame.DataFrame, prefix_mapping: Mapping) → pandas.core.frame.DataFrame
Transform the MetaNetX compartment properties.
metanetx_sdk.transform.compartment.transform_go_prefix(table: pandas.core.frame.DataFrame)
Transform all GO terms.
metanetx_sdk.transform.compartment.transform_metanetx_prefix(table: pandas.core.frame.DataFrame)
Transform all MetaNetX identifiers.
Provide reaction data transformation functions.
metanetx_sdk.transform.reaction.transform_metanetx_prefix(table: pandas.core.frame.DataFrame)
Transform all MetaNetX identifiers.
metanetx_sdk.transform.reaction.transform_reaction_cross_references(references: pandas.core.frame.DataFrame, prefix_mapping: Mapping) → pandas.core.frame.DataFrame
Transform the MetaNetX reaction cross-references.
metanetx_sdk.transform.reaction.transform_reaction_properties(reactions: pandas.core.frame.DataFrame, prefix_mapping: Mapping) → pandas.core.frame.DataFrame
Transform the MetaNetX reaction properties.
Provide data transformation functions.
Submodules¶
metanetx_sdk.api module¶
Expose the application programmer interface.
metanetx_sdk.api.etl_table(filename: pathlib.Path, output: pathlib.Path, configuration: metanetx_sdk.model.table_configuration_model.SingleTableConfigurationModel, mapping: Mapping, transform: Callable) → None
Extract, transform, and load a MetaNetX table.
- Parameters
filename (pathlib.Path) – The table to extract and transform.
output (pathlib.Path) – Where to store the processed output.
configuration (metanetx_sdk.model.SingleTableConfigurationModel) – The configuration to use for extracting the specific file.
mapping (typing.Mapping) – A mapping between MetaNetX resources and Identifiers.org registries.
transform (typing.Callable) – The table-specific transformation function to apply.
metanetx_sdk.api.pull(directory: pathlib.Path, files: Optional[List[pathlib.Path]] = None, configuration: Optional[metanetx_sdk.model.ftp_configuration_model.FTPConfigurationModel] = None, last_checked: Optional[datetime.datetime] = None, compress: bool = True) → datetime.datetime
Pull in changes to one or more files from the MetaNetX FTP server.
- Parameters
directory (pathlib.Path) – The working directory where files are updated.
files (list of pathlib.Path, optional) – A list of one or more filenames as they are found on the FTP server (basename only). By default all known files are checked.
configuration (metanetx_sdk.model.FTPConfigurationModel, optional) – Configuration values encoded in an object. A default configuration is provided.
last_checked (datetime, optional) – The time when the files were last checked for updates. By default it is assumed that the files have never been checked before.
compress (bool, optional) – Whether or not to compress the downloaded files with gzip (default True).
- Returns
The current time (timezone of the FTP server) when files were checked for updates.
- Return type
datetime
metanetx_sdk.extract module¶
Provide extraction functions.
metanetx_sdk.extract.extract_chemical_prefix_mapping()
Return the packaged chemical prefix mapping.
metanetx_sdk.extract.extract_compartment_prefix_mapping()
Return the packaged compartment prefix mapping.
metanetx_sdk.extract.extract_reaction_prefix_mapping()
Return the packaged reaction prefix mapping.
metanetx_sdk.extract.extract_table(filename: pathlib.Path, columns: List[str], skip: int) → pandas.core.frame.DataFrame
Extract tabular MetaNetX data.
The tables dumped by MetaNetX have their column names in comments, and those names are not always appropriate for the given table.
- Parameters
filename (pathlib.Path) – The filesystem location of the table.
columns (list of str) – The column headers to use for this table.
skip (int) – The number of initial lines in the file to skip.
- Returns
- Return type
pandas.DataFrame
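The role of the columns and skip parameters can be shown with a small standard-library sketch: drop the leading comment lines, then label the remaining tab-separated fields with caller-supplied headers. This is a stand-in for illustration only; the SDK returns a pandas DataFrame, and the function name read_table and the sample data are made up.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_table(handle: TextIO, columns: List[str], skip: int) -> List[Dict[str, str]]:
    """Skip leading lines, then label tab-separated fields with the given columns."""
    for _ in range(skip):
        next(handle)
    reader = csv.reader(handle, delimiter="\t")
    return [dict(zip(columns, row)) for row in reader]


# Made-up dump: two comment lines followed by one data row.
raw = "### example dump\n#ID\tname\nID_1\texample compound\n"
rows = read_table(io.StringIO(raw), columns=["identifier", "name"], skip=2)
print(rows)
```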
metanetx_sdk.ftp module¶
Provide functions to interact with the MetaNetX FTP server.
async metanetx_sdk.ftp.update_file(host: str, ftp_directory: pathlib.PurePosixPath, path: pathlib.Path, filename: pathlib.Path, last_checked: datetime.datetime, local_timezone: pytz.timezone, compress: bool = True, timeout: Union[float, int, None] = 5) → None
Retrieve a file from an FTP server if it is newer than a local version.
- Parameters
host (str) – The FTP host, for example, ftp.vital-it.ch.
ftp_directory (pathlib.PurePosixPath) – The working directory on the host.
path (pathlib.Path) – Working directory where files are searched and stored.
filename (pathlib.Path) – The file to retrieve relative to the working directory on the server.
last_checked (datetime) – The date and time when this script was last run.
local_timezone (pytz.timezone) – The timezone that the FTP server is in, for example, Europe/Zurich.
compress (bool, optional) – Whether or not to gzip the files.
timeout (float, int, or None, optional) – The timeout in seconds for FTP operations (default 5 s). Can be disabled by setting None.
async metanetx_sdk.ftp.update_tables(host: str, ftp_directory: pathlib.PurePosixPath, output: pathlib.Path, files: List[pathlib.Path], last_checked: datetime.datetime, local_tz: pytz.timezone, compress: bool) → None
Load all given files if newer versions exist.
- Parameters
host (str) – The FTP host, for example, ftp.vital-it.ch.
ftp_directory (pathlib.PurePosixPath) – The working directory on the host.
output (pathlib.Path) – The output directory for the files. If a filename of any of the files exists in that directory, it is only overwritten if the one on the host is more recent.
files (list of pathlib.Path) – Pure filenames of files of interest to be loaded from the server.
last_checked (datetime.datetime) – When the local files were last checked.
local_tz (pytz.timezone) – A timezone that the FTP server is in, for example, Europe/Zurich.
compress (bool) – Whether or not to gzip compress downloaded files.
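The "only overwrite if the host copy is more recent" rule can be sketched without an FTP connection. In the example below, a placeholder coroutine replaces the real download, and the files are checked concurrently with asyncio.gather. Running the checks concurrently is an assumption suggested by the async signatures, not something this page confirms, and all names and timestamps are hypothetical. Both sides of the comparison are timezone aware, since Python refuses to order aware and naive datetimes.

```python
import asyncio
from datetime import datetime, timezone
from typing import Dict, List


def is_newer(remote_modify: datetime, last_checked: datetime) -> bool:
    """Return True when the server copy changed after the last check."""
    return remote_modify > last_checked


async def maybe_fetch(remote_modify: datetime, last_checked: datetime) -> bool:
    """Stand-in for a download coroutine; reports whether it would fetch."""
    await asyncio.sleep(0)  # placeholder for FTP I/O
    return is_newer(remote_modify, last_checked)


async def check_all(listing: Dict[str, datetime], last_checked: datetime) -> List[str]:
    """Check every file concurrently and return the names needing a download."""
    names = list(listing)
    results = await asyncio.gather(
        *(maybe_fetch(listing[name], last_checked) for name in names)
    )
    return [name for name, fetch in zip(names, results) if fetch]


# Hypothetical listing with timezone-aware timestamps.
last = datetime(2020, 1, 1, tzinfo=timezone.utc)
listing = {
    "chem_prop.tsv": datetime(2020, 2, 1, tzinfo=timezone.utc),
    "reac_prop.tsv": datetime(2019, 12, 1, tzinfo=timezone.utc),
}
stale = asyncio.run(check_all(listing, last))
print(stale)
```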
metanetx_sdk.helpers module¶
Define general helper functions.
Module contents¶
Create top level imports.