dlhub_sdk.models package


dlhub_sdk.models.datasets module

dlhub_sdk.models.pipeline module

Module contents

This module contains tools for describing objects being published to DLHub.

class dlhub_sdk.models.BaseMetadataModel

Bases: pydantic.main.BaseModel

Base class for models describing objects published via DLHub

Covers information that goes in the datacite block of the metadata file and some of the DLHub block.

There are many kinds of MetadataModel classes, each describing a different kind of object. Each type is created using the create_model operation (e.g., KerasModel.create_model('model.h5')), but the arguments differ depending on the type of object. For example, TensorFlow models only require the directory created when saving the model for serving, but scikit-learn models require the pickle file, how the pickle was created (e.g., with joblib), and how many input features it requires.

Once created, you will need to fill in additional details about the object to make it reusable. The MetadataModel classes attempt to learn as much about an object as possible automatically, but there is some information that must be provided by a human. To start, you must define a title and name for the object, and you are encouraged to provide an abstract describing the model and to list any associated papers/websites that describe the model. You will find plenty of examples of how to describe models in the DLHub_containers repository. Some types of objects require data specific to their type (e.g., Python servables need a list of required packages). We encourage you to find examples for your specific type of object in the containers repository for inspiration and to see the Python documentation for each Metadata Model.

The MetadataModel object can be saved using the to_dict operation and read back into memory using the from_dict method. We recommend you save your dictionary to disk in JSON or YAML format, which allows manual edits to be made before submitting or resubmitting an object description.
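The save-and-reload workflow described above can be sketched with the standard json module. This is a minimal sketch, not the SDK's own implementation: the `metadata` dictionary is a hypothetical stand-in for the output of to_dict, and its keys are illustrative only. The helper names `save_metadata` and `load_metadata` are likewise assumptions.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the dictionary returned by to_dict();
# the real keys depend on the object being described.
metadata = {
    "datacite": {"titles": [{"title": "Example model"}]},
    "dlhub": {"name": "example_model"},
}


def save_metadata(description: dict, path: Path) -> None:
    """Write a metadata dictionary to disk as indented, hand-editable JSON."""
    with open(path, "w") as fp:
        json.dump(description, fp, indent=2)


def load_metadata(path: Path) -> dict:
    """Read a metadata dictionary back from a JSON file."""
    with open(path) as fp:
        return json.load(fp)


with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "dlhub.json"
    save_metadata(metadata, json_path)
    # The file could be edited by hand at this point before reloading
    restored = load_metadata(json_path)
```

The reloaded dictionary is what would then be passed to from_dict to rebuild the model object.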

add_directory(directory: str, include: Union[str, Iterable[str]] = (), exclude: Union[str, Iterable[str]] = (), recursive: bool = False)

Add all the files in a directory

  • include (string or [string]) – Only add files that match any of these patterns
  • exclude (string or [string]) – Exclude all files that match any of these patterns
  • directory (string) – Path to a directory
  • recursive (bool) – Whether to also add files from subdirectories of the directory
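To illustrate how include and exclude patterns could interact, here is a minimal sketch. It assumes the patterns are shell-style globs matched against file names and that an empty include list means "accept everything"; neither assumption is confirmed by the text above. The function `select_files` is hypothetical and is not part of the SDK.

```python
import os
import tempfile
from fnmatch import fnmatch
from typing import Iterable, List, Union


def select_files(directory: str,
                 include: Union[str, Iterable[str]] = (),
                 exclude: Union[str, Iterable[str]] = (),
                 recursive: bool = False) -> List[str]:
    """Hypothetical helper: list files in a directory that pass the filters."""
    # Accept either a single pattern or a collection of patterns
    include = [include] if isinstance(include, str) else list(include)
    exclude = [exclude] if isinstance(exclude, str) else list(exclude)

    selected = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # Assumption: an empty include list accepts every file
            if include and not any(fnmatch(name, p) for p in include):
                continue
            if any(fnmatch(name, p) for p in exclude):
                continue
            selected.append(os.path.join(root, name))
        if not recursive:
            break  # os.walk visits the top directory first; stop there
    return sorted(selected)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.pkl", "b.txt", os.path.join("sub", "c.pkl")):
        with open(os.path.join(tmp, rel), "w") as fp:
            fp.write("")
    top_only = [os.path.relpath(p, tmp)
                for p in select_files(tmp, include="*.pkl")]
    everything = [os.path.relpath(p, tmp)
                  for p in select_files(tmp, exclude="*.txt", recursive=True)]
```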
add_file(file, name=None)

Add a file to the list of files to be distributed with this object

  • file (string) – Path to the file
  • name (string) – Optional. Name of the file, if it is a file that serves a specific purpose (e.g., “pickle” if this is a pickle file of a scikit-learn model)
add_files(files: Iterable[str])

Add files that should be distributed with this artifact.

Parameters:files – Paths of files that should be published

Add a resource that is related to this object.

We use the DataCite schema to describe the relations. Common relation types for DLHub objects are:
  • “IsDescribedBy”: For a paper that describes a dataset or model
  • “IsDocumentedBy”: For the software documentation for a model
  • “IsDerivedFrom”: For the database a training set was pulled from
  • “Requires”: For any software libraries that are required for this module
  • identifier – Identifier
  • identifier_type – Identifier type
  • relation_type – Relation between this identifier and the object you are describing
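As a rough illustration of what the three arguments describe, the sketch below builds a record using the property names from the DataCite metadata schema (relatedIdentifier, relatedIdentifierType, relationType). How DLHub stores this information internally is an assumption here; the helper `make_related_identifier` and the DOI shown are placeholders.

```python
def make_related_identifier(identifier: str,
                            identifier_type: str,
                            relation_type: str) -> dict:
    """Hypothetical helper: build a DataCite-style related identifier record."""
    return {
        "relatedIdentifier": identifier,
        "relatedIdentifierType": identifier_type,
        "relationType": relation_type,
    }


# Placeholder DOI for a paper that describes the model
paper = make_related_identifier("10.1000/example", "DOI", "IsDescribedBy")
```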
add_requirement(library, version=None)

Add a required Python library.

The name of the library should be either the name on PyPI, or a URL for the git repository holding the code (e.g., git+https://github.com/DLHub-Argonne/dlhub_sdk.git)

  • library (string) – Name of library
  • version (string) – Required version. ‘latest’ to use the most recent version on PyPI (if available). ‘detect’ will attempt to find the version of the library installed on the computer running this software. Default is None
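The ‘detect’ option can be approximated with the standard importlib.metadata module. This is a sketch of the idea only, assuming the library name matches an installed distribution name; the helper `resolve_version` is hypothetical, and the ‘latest’ case (which would require querying PyPI) is passed through unchanged.

```python
from importlib import metadata
from typing import Optional


def resolve_version(library: str, version: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: turn 'detect' into the locally installed version."""
    if version == "detect":
        try:
            # Look up the version of the distribution installed on this machine
            return metadata.version(library)
        except metadata.PackageNotFoundError:
            # Nothing installed under that name, so no version can be detected
            return None
    # Explicit versions, 'latest', and None are returned unchanged
    return version
```

For example, `resolve_version("numpy", "detect")` would return the version string of the locally installed numpy, or None if numpy is not installed.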

Add several Python library requirements

Parameters:requirements (dict) – Keys are library names (str), values are the required versions
classmethod create_model(**kwargs) → dlhub_sdk.models.BaseMetadataModel

Instantiate the metadata model.

Takes in arguments that allow metadata describing a dataset to be autogenerated. For example, these could include options describing how to read a dataset from a CSV file or which class method to invoke on a Python pickle object.


Write all the listed files to a ZIP object

Takes all of the files returned by list_files. First determines the largest common path of all files, then preserves the directory structure by using this common path as the root directory. For example, if the files are “/home/a.pkl” and “/home/a/b.dat”, the common directory is “/home” and the files will be stored in the ZIP as “a.pkl” and “a/b.dat”.

Parameters:path (string) – Path for the ZIP file
Returns:Base path for the ZIP file (useful for adjusting the paths of the files included in the metadata model)
Return type:(string)
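The common-path behavior described above can be sketched with os.path.commonpath and the zipfile module. This is an illustration under stated assumptions, not the SDK's implementation: the function `zip_with_common_root` is hypothetical, and its handling of a single-file list (using the file's parent directory as the root) is a design choice made for this example.

```python
import os
import tempfile
import zipfile
from typing import List


def zip_with_common_root(files: List[str], zip_path: str) -> str:
    """Hypothetical helper: zip files relative to their largest common path."""
    absolute = [os.path.abspath(f) for f in files]
    root = os.path.commonpath(absolute)
    # With a single file, the common path is the file itself; use its folder
    if os.path.isfile(root):
        root = os.path.dirname(root)
    with zipfile.ZipFile(zip_path, "w") as zf:
        for path in absolute:
            # Store each file under its path relative to the common root
            zf.write(path, arcname=os.path.relpath(path, root))
    return root


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the layout from the example: home/a.pkl and home/a/b.dat
    home = os.path.join(tmp, "home")
    os.makedirs(os.path.join(home, "a"))
    inputs = [os.path.join(home, "a.pkl"), os.path.join(home, "a", "b.dat")]
    for path in inputs:
        with open(path, "w") as fp:
            fp.write("data")
    base_path = zip_with_common_root(inputs, os.path.join(tmp, "out.zip"))
    with zipfile.ZipFile(os.path.join(tmp, "out.zip")) as zf:
        stored_names = sorted(zf.namelist())
```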

Provide a list of files associated with this artifact.

Returns:([string]) list of file paths

Get the name of the servable

Returns:(string) Name of the servable

Gathers information about required environment from repo2docker configuration files.

See https://repo2docker.readthedocs.io/en/latest/config_files.html for more details

Parameters:directory (str) – Path to directory containing configuration files (default: current working directory)
read_codemeta_file(directory: Optional[str] = None)

Read in metadata from a codemeta.json file

Parameters:directory (string) – Path to directory containing the codemeta.json file (default: current working directory)
set_abstract(abstract: str) → dlhub_sdk.models.BaseMetadataModel

Define an abstract for this object. Use for a high-level summary

Parameters:abstract – Description of this artifact
set_creators(authors: List[str], affiliations: List[Sequence[str]] = ()) → dlhub_sdk.models.BaseMetadataModel

Add authors to this object

  • authors – List of authors for the dataset. In format: “<Family Name>, <Given Name>”
  • affiliations – List of affiliations for each author.
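The “<Family Name>, <Given Name>” format lends itself to a simple split on the first comma. The sketch below shows one way the author strings and per-author affiliation lists might be paired. The helper `build_creators` and the output keys (givenName, familyName, affiliations) are assumptions for illustration, not a description of how the SDK stores creators.

```python
from typing import Dict, List, Sequence


def build_creators(authors: List[str],
                   affiliations: Sequence[Sequence[str]] = ()) -> List[Dict]:
    """Hypothetical helper: pair 'Family, Given' names with affiliations."""
    if affiliations and len(affiliations) != len(authors):
        raise ValueError("Provide one list of affiliations per author")

    creators = []
    for index, author in enumerate(authors):
        if "," not in author:
            raise ValueError(f"Expected '<Family Name>, <Given Name>': {author}")
        # Split on the first comma only, so given names may contain commas
        family, given = (part.strip() for part in author.split(",", 1))
        entry = {"givenName": given, "familyName": family}
        if affiliations:
            entry["affiliations"] = list(affiliations[index])
        creators.append(entry)
    return creators


# Illustrative names only
creators = build_creators(["Doe, Jane", "Roe, Richard"],
                          [["Example University"], ["Example Lab", "Example Institute"]])
```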

Set the DOI of this object, if available

This function is only for advanced usage. Most users of the toolbox will not know the DOI before sending the object in to DLHub.

Parameters:doi (string) – DOI of the object
set_domains(domains: List[str])

Set the field of science that is associated with this object

Parameters:domains – Names of fields of science (e.g., “materials science”)
set_methods(methods: str) → dlhub_sdk.models.BaseMetadataModel

Define a methods section for this object. Use to describe any specific details about how the dataset, model, etc. was generated.

Parameters:methods (str) – Detailed method descriptions

Set the name of the object.

Should be something short, descriptive, and memorable

Parameters:name (string) – Name of artifact

Define the publication year

This function is only for advanced usage. Normally, this will be assigned automatically

Parameters:year (string) – Publication year
set_title(title: str) → dlhub_sdk.models.BaseMetadataModel

Set the title for this object

Parameters:title – Desired title

Set the version of this resource

Parameters:version (string) – Version number
set_visibility(users: Optional[Iterable[str]] = None, groups: Optional[Iterable[str]] = None)

Set the list of people this object should be visible to.

By default, it will be visible to anyone ([“public”]).

  • users – GlobusAuth UUIDs of allowed users
  • groups – GlobusAuth UUIDs of allowed Globus groups
to_dict(simplify_paths: bool = False)

Render the object to a JSON-ready dictionary

Parameters:simplify_paths (bool) – Whether to simplify the paths of each file
Returns:(dict) A description of the dataset in a form suitable for download
class dlhub_sdk.models.DLHubMetadata

Bases: pydantic.main.BaseModel

Basic metadata for a DLHub artifact

Includes information used by the DLHub web service to recognize an object, define how to build the computational environment, control access to it, et cetera

class dlhub_sdk.models.DLHubType

Bases: enum.Enum

Type supported by DLHub

dataset = 'dataset'
pipeline = 'piepline'
servable = 'servable'