This module contains tools for describing objects being published to DLHub.
Base class for models describing objects published via DLHub
Covers information that goes in the datacite block of the metadata file and some of the DLHub block.
There are many kinds of MetadataModel classes, each describing a different kind of object. Each type is created using its create_model operation (e.g., KerasModel.create_model('model.h5')), but the arguments differ depending on the type of object. For example, TensorFlow models only require the directory created when saving the model for serving, whereas scikit-learn models require the pickle file, how the pickle was created (e.g., with joblib), and how many input features the model requires.
Once created, you will need to fill in additional details about the object to make it reusable. The MetadataModel classes attempt to learn as much about an object as possible automatically, but some information must be provided by a human. To start, you must define a title and a name for the object, and you are encouraged to provide an abstract describing the model and to list any associated papers or websites that describe it. You will find plenty of examples of how to describe models in the DLHub_containers repository. Some types of objects require data specific to their type (e.g., Python servables need a list of required packages). We encourage you to look at the examples for your specific type of object in the containers repository for inspiration, and to consult the Python documentation for each MetadataModel.
A MetadataModel object can be saved using the to_dict operation and read back into memory using the from_dict method. We recommend saving the dictionary to disk in JSON or YAML format, which allows manual edits to be made before submitting or resubmitting an object description.
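The save, edit, and reload cycle can be sketched with the standard library alone. `ToyMetadata` is a hypothetical stand-in, not a dlhub_sdk class: it only mimics the pattern of chainable setters plus `to_dict`/`from_dict`, and its dictionary keys are illustrative. JSON is used because it needs no third-party packages; YAML would additionally require a library such as PyYAML.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class ToyMetadata:
    """Hypothetical stand-in for a MetadataModel; not part of dlhub_sdk."""

    def __init__(self) -> None:
        # Two blocks, loosely modeled on the datacite/DLHub split; keys are illustrative
        self.data: Dict[str, Any] = {"datacite": {}, "dlhub": {}}

    def set_title(self, title: str) -> "ToyMetadata":
        self.data["datacite"]["title"] = title
        return self  # returning self allows chained calls

    def set_name(self, name: str) -> "ToyMetadata":
        self.data["dlhub"]["name"] = name
        return self

    def to_dict(self) -> Dict[str, Any]:
        # A JSON round trip yields an independent deep copy of plain types
        return json.loads(json.dumps(self.data))

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ToyMetadata":
        obj = cls()
        obj.data = json.loads(json.dumps(data))
        return obj


model = ToyMetadata().set_title("Example model").set_name("example_model")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metadata.json"
    path.write_text(json.dumps(model.to_dict(), indent=2))

    # Simulate a manual edit of the saved file before resubmitting
    saved = json.loads(path.read_text())
    saved["datacite"]["title"] = "Example model (revised)"
    path.write_text(json.dumps(saved, indent=2))

    reloaded = ToyMetadata.from_dict(json.loads(path.read_text()))
```

Keeping the on-disk copy as plain JSON is what makes hand edits between submissions practical.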
add_directory(directory: str, include: Union[str, Iterable[str]] = (), exclude: Union[str, Iterable[str]] = (), recursive: bool = False)
Add all the files in a directory
- include (string or [string]) – Only add files that match any of these patterns
- exclude (string or [string]) – Exclude all files that match any of these patterns
- directory (string) – Path to a directory
- recursive (bool) – Whether to also add the files in all subdirectories
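The include/exclude/recursive semantics above can be illustrated with a small, hypothetical `select_files` helper. It is not the dlhub_sdk implementation; it assumes glob-style patterns matched with `fnmatch` against each file's full path, which the documentation does not specify.

```python
import fnmatch
import os
import tempfile
from typing import Iterable, List, Union


def select_files(directory: str,
                 include: Union[str, Iterable[str]] = (),
                 exclude: Union[str, Iterable[str]] = (),
                 recursive: bool = False) -> List[str]:
    """Hypothetical helper that mirrors the documented add_directory filters."""
    # Accept a single pattern or a collection of patterns
    include = [include] if isinstance(include, str) else list(include)
    exclude = [exclude] if isinstance(exclude, str) else list(exclude)

    selected = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            # With include patterns, keep only files matching at least one of them
            if include and not any(fnmatch.fnmatch(path, p) for p in include):
                continue
            # Drop files matching any exclude pattern
            if any(fnmatch.fnmatch(path, p) for p in exclude):
                continue
            selected.append(path)
        if not recursive:
            break  # stop after the top-level directory
    return sorted(selected)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for name in ("a.py", "b.txt", os.path.join("sub", "c.py")):
        open(os.path.join(tmp, name), "w").close()

    # Report results relative to the temporary directory
    top_py = [os.path.relpath(p, tmp) for p in select_files(tmp, include="*.py")]
    all_py = [os.path.relpath(p, tmp)
              for p in select_files(tmp, include="*.py", recursive=True)]
    no_txt = [os.path.relpath(p, tmp)
              for p in select_files(tmp, exclude="*.txt", recursive=True)]
```

Here `top_py` holds only `a.py`, while the recursive calls also reach `sub/c.py`.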
Add a file to the list of files to be distributed with this object
- file (string) – Path to the file
- name (string) – Optional. Name of the file, if it is a file that serves a specific purpose (e.g., “pickle” if this is a pickle file of a scikit-learn model)
Add files that should be distributed with this artifact.
Parameters: files – Paths of files that should be published
Add a resource that is related to this object.
We use DataCite relation types to describe these relations. Common relation types for DLHub objects are:
- “IsDescribedBy”: For a paper that describes a dataset or model
- “IsDocumentedBy”: For the software documentation for a model
- “IsDerivedFrom”: For the database a training set was pulled from
- “Requires”: For any software libraries that are required for this module
- identifier – Identifier
- identifier_type – Identifier type
- relation_type – Relation between this identifier and the object you are describing
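A related identifier ends up as a DataCite-style record with an identifier, its type, and the relation. The helper `related_identifier_entry` below is hypothetical, the field names follow the DataCite schema's `relatedIdentifier`/`relatedIdentifierType`/`relationType` properties, and `10.1000/xyz123` is a placeholder DOI. The sketch deliberately accepts only the four common relation types listed above, although DataCite defines many more.

```python
from typing import Dict

# Relation types highlighted above (a small subset of DataCite's vocabulary)
COMMON_RELATIONS = {"IsDescribedBy", "IsDocumentedBy", "IsDerivedFrom", "Requires"}


def related_identifier_entry(identifier: str, identifier_type: str,
                             relation_type: str) -> Dict[str, str]:
    """Hypothetical helper: build one DataCite-style related-identifier record."""
    if relation_type not in COMMON_RELATIONS:
        # DataCite allows other types; this sketch only accepts the common ones
        raise ValueError(f"unexpected relation type: {relation_type!r}")
    return {
        "relatedIdentifier": identifier,
        "relatedIdentifierType": identifier_type,
        "relationType": relation_type,
    }


# Placeholder DOI for a paper that describes the model
paper = related_identifier_entry("10.1000/xyz123", "DOI", "IsDescribedBy")
```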
Add a required Python library.
The name of the library should be either the name on PyPI, or a URL for the git repository holding the code (e.g.,
- library (string) – Name of library
- version (string) – Required version. ‘latest’ to use the most recent version on PyPI (if available). ‘detect’ will attempt to find the version of the library installed on the computer running this software. Default is
Add several Python library requirements
Parameters: requirements (dict) – Keys are library names (str); values are the versions
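One way the ‘detect’ keyword could be resolved is with `importlib.metadata` (Python 3.8+). The `resolve_requirements` helper below is a hypothetical sketch, not how DLHub necessarily performs detection: explicit versions and ‘latest’ pass through unchanged, and a library that is not installed resolves to `None`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional


def resolve_requirements(requirements: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Hypothetical helper showing how 'detect' could be resolved locally."""
    resolved: Dict[str, Optional[str]] = {}
    for library, spec in requirements.items():
        if spec == "detect":
            try:
                # Version of the distribution installed in this environment
                resolved[library] = version(library)
            except PackageNotFoundError:
                resolved[library] = None  # not installed here
        else:
            # Explicit versions and 'latest' are passed through unchanged
            resolved[library] = spec
    return resolved


reqs = resolve_requirements({
    "scikit-learn": "1.3.0",
    "numpy": "latest",
    "not-a-real-package-xyz": "detect",
})
```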
create_model(**kwargs) → dlhub_sdk.models.BaseMetadataModel
Instantiate the metadata model.
Takes in arguments that allow metadata describing a dataset to be autogenerated. For example, these could include options describing how to read a dataset from a CSV file or which class method to invoke on a Python pickle object.
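The factory pattern behind create_model can be sketched with a hypothetical `ToyCSVDataset` class (not part of dlhub_sdk). Its `create_model` classmethod takes a reading option (`delimiter`) and autogenerates metadata, here just the column names from the CSV header, before returning a new instance.

```python
import csv
import os
import tempfile
from typing import List


class ToyCSVDataset:
    """Hypothetical metadata model for a CSV file; not a dlhub_sdk class."""

    def __init__(self, path: str, columns: List[str]) -> None:
        self.path = path
        self.columns = columns

    @classmethod
    def create_model(cls, path: str, delimiter: str = ",") -> "ToyCSVDataset":
        # Autogenerate metadata by reading only the header row
        with open(path, newline="") as fp:
            header = next(csv.reader(fp, delimiter=delimiter))
        return cls(path, [name.strip() for name in header])


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    with open(csv_path, "w", newline="") as fp:
        fp.write("x, y ,label\n1,2,a\n")
    dataset = ToyCSVDataset.create_model(csv_path)
```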
Write all the listed files to a ZIP object
Takes all of the files returned by list_files. First determines the largest common path of all files, and preserves directory structure by using this common path as the root directory. For example, if the files are “/home/a.pkl” and “/home/a/b.dat”, the common directory is “/home” and the files will be stored in the ZIP file as “a.pkl” and “a/b.dat”.
Parameters: path (string) – Path for the ZIP file
Returns: Base path for the ZIP file (useful for adjusting the paths of the files included in the metadata model)
Return type: (string)
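The common-root layout described above can be reproduced with `os.path.commonpath` and `zipfile`. `zip_with_common_root` is a hypothetical re-creation for illustration, not the library's code; as an assumption, when the common path is itself a file (a single input file), its parent directory is used as the root. The usage mirrors the “/home/a.pkl” and “/home/a/b.dat” example inside a temporary directory.

```python
import os
import tempfile
import zipfile
from typing import List


def zip_with_common_root(files: List[str], zip_path: str) -> str:
    """Hypothetical re-creation of the documented get_zip_file layout."""
    files = [os.path.abspath(f) for f in files]
    base = os.path.commonpath(files)
    if os.path.isfile(base):
        # A single file: use its parent directory as the root
        base = os.path.dirname(base)
    with zipfile.ZipFile(zip_path, "w") as zf:
        for f in files:
            # Store each file relative to the common root to keep the directory structure
            zf.write(f, arcname=os.path.relpath(f, base))
    return base


with tempfile.TemporaryDirectory() as tmp:
    home = os.path.join(tmp, "home")
    os.makedirs(os.path.join(home, "a"))
    paths = [os.path.join(home, "a.pkl"), os.path.join(home, "a", "b.dat")]
    for p in paths:
        open(p, "w").close()

    out = os.path.join(tmp, "out.zip")
    base = zip_with_common_root(paths, out)
    with zipfile.ZipFile(out) as zf:
        names = sorted(zf.namelist())
    base_is_home = base == os.path.abspath(home)
```

The returned base path is what a caller would use to rewrite absolute file paths in the metadata into ZIP-relative ones.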
Provide a list of files associated with this artifact.
Returns: ([string]) list of file paths
Get the name of the servable
Returns: (string) Name of the servable
Gathers information about the required environment from repo2docker configuration files.
See https://repo2docker.readthedocs.io/en/latest/config_files.html for more details
Parameters: directory (str) – Path to directory containing configuration files (default: current working directory)
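One of the repo2docker configuration files is a pip-style requirements.txt. The hypothetical `read_pinned_requirements` helper below sketches how simple entries could be turned into a library-to-version mapping like the one add_requirements accepts. It is not the dlhub_sdk parser: it only handles `name==version` pins and bare names, and mapping a bare name to ‘latest’ is an assumption of this sketch.

```python
import os
import re
import tempfile
from typing import Dict, Optional

# Bare distribution names such as "scikit-learn" (no version specifier)
_BARE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def read_pinned_requirements(directory: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: read simple entries from a repo2docker requirements.txt."""
    directory = directory or os.getcwd()
    path = os.path.join(directory, "requirements.txt")
    found: Dict[str, str] = {}
    if not os.path.isfile(path):
        return found
    with open(path) as fp:
        for raw in fp:
            line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
            if not line or line.startswith("-"):
                continue  # skip blank lines and pip options such as -r or -e
            if "==" in line:
                name, pinned = line.split("==", 1)
                found[name.strip()] = pinned.strip()
            elif _BARE_NAME.fullmatch(line):
                found[line] = "latest"  # assumption: unpinned means newest release
            # Other specifiers (>=, ~=, URLs, markers) are ignored in this sketch
    return found


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "requirements.txt"), "w") as fp:
        fp.write("numpy==1.26.4\n# a comment\nscikit-learn\npandas>=2.0\n")
    requirements = read_pinned_requirements(tmp)
```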
read_codemeta_file(directory: Optional[str] = None)
Read in metadata from a codemeta.json file
Parameters: directory (string) – Path to the directory containing the codemeta.json file (default: current working directory)
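A codemeta.json file uses schema.org-style terms such as `name`, `description`, `version`, and `author` (with `givenName`/`familyName`). The hypothetical `summarize_codemeta` helper below sketches how those fields might map onto the setters documented here; the mapping (for instance, `name` to title) is an assumption for illustration, not necessarily what read_codemeta_file does.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def summarize_codemeta(directory: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical mapping from codemeta.json fields to DLHub-style setters."""
    directory = directory or os.getcwd()
    with open(os.path.join(directory, "codemeta.json")) as fp:
        meta = json.load(fp)
    authors = [
        # "<Family Name>, <Given Name>", the format set_creators expects
        f"{a.get('familyName', '')}, {a.get('givenName', '')}"
        for a in meta.get("author", [])
    ]
    return {
        "title": meta.get("name"),
        "abstract": meta.get("description"),
        "version": meta.get("version"),
        "creators": authors,
    }


example = {
    "@type": "SoftwareSourceCode",
    "name": "Example model",
    "description": "Predicts a property from a structure.",
    "version": "0.1.0",
    "author": [{"@type": "Person", "givenName": "Ada", "familyName": "Lovelace"}],
}

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "codemeta.json"), "w") as fp:
        json.dump(example, fp)
    summary = summarize_codemeta(tmp)
```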
set_abstract(abstract: str) → dlhub_sdk.models.BaseMetadataModel
Define an abstract for this object. Use for a high-level summary
Parameters: abstract – Description of this artifact
set_creators(authors: List[str], affiliations: List[Sequence[str]] = ()) → dlhub_sdk.models.BaseMetadataModel
Add authors to this object
- authors – List of authors for the dataset. In format: “<Family Name>, <Given Name>”
- affiliations – List of affiliations for each author.
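Because affiliations is a list of sequences, each author gets their own (possibly multi-entry) list of affiliations. The hypothetical `pair_creators` helper below checks the “<Family Name>, <Given Name>” format and pairs authors with affiliations by position; the output keys are illustrative, not the exact structure dlhub_sdk stores.

```python
from typing import Dict, List, Sequence


def pair_creators(authors: List[str],
                  affiliations: Sequence[Sequence[str]] = ()) -> List[Dict[str, object]]:
    """Hypothetical helper: pair each author with their list of affiliations."""
    affiliations = list(affiliations)
    if affiliations and len(affiliations) != len(authors):
        raise ValueError("need one affiliation list per author")
    creators = []
    for i, author in enumerate(authors):
        family, _, given = author.partition(",")
        if not given:
            raise ValueError(f"expected '<Family Name>, <Given Name>': {author!r}")
        creators.append({
            "familyName": family.strip(),
            "givenName": given.strip(),
            "affiliations": list(affiliations[i]) if affiliations else [],
        })
    return creators


creators = pair_creators(
    ["Curie, Marie", "Lovelace, Ada"],
    [["Institute A"], ["Institute B", "Institute C"]],
)
```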
Set the DOI of this object, if available
This function is only for advanced usage. Most users of the toolbox will not know the DOI before submitting the object to DLHub.
Parameters: doi (string) – DOI of the object
Set the field of science that is associated with this object
Parameters: domains – Names of fields of science (e.g., “materials science”)
set_methods(methods: str) → dlhub_sdk.models.BaseMetadataModel
Define a methods section for this object. Use it to describe any specific details about how the dataset, model, etc. was generated.
Parameters: methods (str) – Detailed method descriptions
Set the name of the object.
Should be something short, descriptive, and memorable
Parameters: name (string) – Name of artifact
Define the publication year
This function is only for advanced usage. Normally, this will be assigned automatically
Parameters: year (string) – Publication year
set_title(title: str) → dlhub_sdk.models.BaseMetadataModel
Set the title for this object
Parameters: title – Desired title
Set the version of this resource
Parameters: version (string) – Version number
set_visibility(users: Optional[Iterable[str]] = None, groups: Optional[Iterable[str]] = None)
Set the list of people this object should be visible to.
By default, it will be visible to anyone ([“public”]).
- users – GlobusAuth UUIDs of allowed users
- groups – GlobusAuth UUIDs of allowed Globus groups
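The visibility rule (public by default, otherwise restricted to the given users and groups) can be sketched with a hypothetical `visibility_list` helper. The `urn:globus:auth:identity:` and `urn:globus:groups:id:` prefixes are Globus principal URN formats; whether DLHub stores exactly these strings is an assumption, and the UUIDs below are placeholders.

```python
from typing import Iterable, List, Optional


def visibility_list(users: Optional[Iterable[str]] = None,
                    groups: Optional[Iterable[str]] = None) -> List[str]:
    """Hypothetical helper: build a visible-to list from Globus Auth UUIDs."""
    users, groups = list(users or []), list(groups or [])
    if not users and not groups:
        return ["public"]  # documented default: visible to anyone
    # Globus principal URN prefixes; exact storage format in DLHub is assumed
    return ([f"urn:globus:auth:identity:{u}" for u in users]
            + [f"urn:globus:groups:id:{g}" for g in groups])


default = visibility_list()
restricted = visibility_list(
    users=["00000000-0000-0000-0000-000000000001"],
    groups=["00000000-0000-0000-0000-000000000002"],
)
```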
Basic metadata for a DLHub artifact
Includes information used by the DLHub web service to recognize an object, define how to build the computational environment, control access to it, et cetera