dlhub_sdk.models package

Subpackages

dlhub_sdk.models.servables package

Submodules

dlhub_sdk.models.datasets module

dlhub_sdk.models.pipeline module

Module contents

This module contains tools for describing objects being published to DLHub.

class dlhub_sdk.models.BaseMetadataModel

Bases: pydantic.main.BaseModel

Base class for models describing objects published via DLHub

Covers information that goes in the datacite block of the metadata file and some of the DLHub block.

There are many kinds of MetadataModel classes, each describing a different kind of object. Each type is created with the create_model operation (e.g., KerasModel.create_model('model.h5')), but the arguments differ by type. For example, TensorFlow models require only the directory created when saving the model for serving, whereas scikit-learn models require the pickle file, how the pickle was created (e.g., with joblib), and how many input features the model expects.

Once created, you will need to fill in additional details about the object to make it reusable. The MetadataModel classes attempt to learn as much about an object as possible automatically, but some information must be provided by a human. At a minimum, you must define a title and a name for the object, and you are encouraged to provide an abstract describing the model and to list any associated papers or websites that describe it. You will find plenty of examples of how to describe models in the DLHub_containers repository. Some types of objects require data specific to their type (e.g., Python servables need a list of required packages). We encourage you to look in the containers repository for examples of your specific type of object, and to consult the Python documentation for each MetadataModel.

The MetadataModel object can be saved using the to_dict operation and read back into memory using the from_dict method. We recommend saving the dictionary to disk in JSON or YAML format, which lets you edit it manually before submitting or resubmitting an object description.
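A minimal sketch of the save, edit, and reload cycle using only Python's `json` module. The `description` dictionary is a hypothetical stand-in for the output of `to_dict()`; its keys are illustrative, not the exact DLHub schema. With the real SDK you would pass the reloaded dictionary to `from_dict`.

```python
import json
import os
import tempfile


def save_description(description: dict, path: str) -> None:
    """Write a metadata dictionary to disk as human-editable JSON."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(description, fp, indent=2, sort_keys=True)


def load_description(path: str) -> dict:
    """Read a (possibly hand-edited) metadata dictionary back from disk."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


# Hypothetical stand-in for model.to_dict(); the key layout is illustrative only
description = {
    "datacite": {"titles": [{"title": "Example model"}]},
    "dlhub": {"name": "example_model"},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dlhub.json")
    save_description(description, path)
    restored = load_description(path)

print(restored == description)  # True
```

Indented, key-sorted JSON keeps the file diff-friendly, which helps when the description is edited by hand between submissions.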

add_directory(directory: str, include: Union[str, Iterable[str]] = (), exclude: Union[str, Iterable[str]] = (), recursive: bool = False)

Add all the files in a directory

Parameters:
directory (string) – Path to a directory
include (string or [string]) – Only add files that match any of these patterns
exclude (string or [string]) – Exclude all files that match any of these patterns
recursive (bool) – Whether to also add files in subdirectories
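The include/exclude semantics can be approximated with a small helper. This sketch assumes shell-style patterns matched with `fnmatch` against file names, and that an empty `include` means "include everything"; the SDK's actual matching rules may differ, and `select_files` is a hypothetical name.

```python
import fnmatch
import os
import tempfile
from typing import Iterable, List, Union


def _as_list(patterns: Union[str, Iterable[str]]) -> List[str]:
    # Accept one pattern or an iterable of patterns, like add_directory
    return [patterns] if isinstance(patterns, str) else list(patterns)


def select_files(directory: str,
                 include: Union[str, Iterable[str]] = (),
                 exclude: Union[str, Iterable[str]] = (),
                 recursive: bool = False) -> List[str]:
    """Hypothetical helper: list files under `directory`, filtered by glob patterns."""
    include, exclude = _as_list(include), _as_list(exclude)
    selected = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # An empty include list means "include everything"
            if include and not any(fnmatch.fnmatch(name, p) for p in include):
                continue
            if any(fnmatch.fnmatch(name, p) for p in exclude):
                continue
            selected.append(os.path.join(root, name))
        if not recursive:
            break  # stop after the top-level directory
    return sorted(selected)


with tempfile.TemporaryDirectory() as tmp:
    for rel in ("model.py", "notes.txt", os.path.join("sub", "helper.py")):
        path = os.path.join(tmp, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()  # create an empty file
    top_level = [os.path.relpath(p, tmp) for p in select_files(tmp, include="*.py")]
    everything = [os.path.relpath(p, tmp)
                  for p in select_files(tmp, exclude="*.txt", recursive=True)]

print(top_level)  # ['model.py']
```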

add_file(file, name=None)

Add a file to the list of files to be distributed with this object

Parameters:
file (string) – Path to the file
name (string) – Optional. Name of the file, if it is a file that serves a specific purpose (e.g., “pickle” if this is a pickle file of a scikit-learn model)

add_files(files: Iterable[str])

Add files that should be distributed with this artifact.

Parameters:	files – Paths of files that should be published

add_related_resource(identifier: str, identifier_type: Union[str, dlhub_sdk.models.datacite.DataciteRelatedIdentifierType], relation_type: Union[str, dlhub_sdk.models.datacite.DataciteRelationType]) → dlhub_sdk.models.BaseMetadataModel

Add a resource that is related to this object.

We use DataCite relation types to describe these relations. Common relation types for DLHub objects are:

“IsDescribedBy”: For a paper that describes a dataset or model
“IsDocumentedBy”: For the software documentation for a model
“IsDerivedFrom”: For the database a training set was pulled from
“Requires”: For any software libraries that are required for this module

Parameters:
identifier – Identifier of the related resource (e.g., a DOI or URL)
identifier_type – Type of the identifier (e.g., “DOI” or “URL”)
relation_type – Relation between this identifier and the object you are describing
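The sketch below shows the shape of one related-resource entry, assuming it follows the DataCite Metadata Schema's `relatedIdentifier`, `relatedIdentifierType`, and `relationType` properties. Whether the SDK stores it in exactly this form is an assumption, `related_identifier` is a hypothetical helper, and the DOI is a placeholder.

```python
from typing import Dict

# The four relation types listed above; DataCite defines many more
COMMON_RELATION_TYPES = {"IsDescribedBy", "IsDocumentedBy", "IsDerivedFrom", "Requires"}


def related_identifier(identifier: str, identifier_type: str,
                       relation_type: str) -> Dict[str, str]:
    """Hypothetical helper that builds a DataCite-style relatedIdentifier entry."""
    if relation_type not in COMMON_RELATION_TYPES:
        # Strict on purpose in this sketch, so misspellings are caught early
        raise ValueError(f"Unexpected relation type: {relation_type!r}")
    return {
        "relatedIdentifier": identifier,
        "relatedIdentifierType": identifier_type,
        "relationType": relation_type,
    }


# "10.1234/placeholder" is a placeholder, not a real DOI
entry = related_identifier("10.1234/placeholder", "DOI", "IsDescribedBy")
print(entry["relationType"])  # IsDescribedBy
```

Restricting the accepted values to a known set is a design choice for the sketch: a misspelled relation type such as "IsDerviedFrom" fails immediately instead of being published.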

add_requirement(library, version=None)

Add a required Python library.

The name of the library should be either the name on PyPI, or a URL for the git repository holding the code (e.g., git+https://github.com/DLHub-Argonne/dlhub_sdk.git)

Parameters:
library (string) – Name of the library
version (string) – Required version. Use ‘latest’ for the most recent version on PyPI (if available), or ‘detect’ to attempt to find the version of the library installed on the computer running this software. Default is `None`
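How the SDK implements ‘detect’ is not described here. As a sketch, assuming the standard-library `importlib.metadata` module (Python 3.8+), the installed version of a distribution can be looked up as follows; `resolve_version` is a hypothetical name.

```python
from importlib import metadata
from typing import Optional


def resolve_version(library: str, version: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: turn the `version` argument into a concrete value."""
    if version != "detect":
        return version  # None, "latest", or an explicit version pass through unchanged
    try:
        # Version of the distribution installed in the running Python environment
        return metadata.version(library)
    except metadata.PackageNotFoundError:
        return None  # not installed, so there is nothing to detect


print(resolve_version("numpy", "1.24.0"))  # 1.24.0
print(resolve_version("surely-not-an-installed-package-xyz", "detect"))  # None
```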

add_requirements(requirements)

Add several Python library requirements

Parameters:	requirements (dict) – Keys are library names (str); values are the required versions
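For illustration, a requirements dictionary might look like the one below (version numbers are illustrative). The hypothetical `to_pip_specifiers` helper renders it as pip-style lines, assuming that `None` and ‘latest’ both mean "unpinned"; this is not necessarily how the SDK stores or installs requirements.

```python
from typing import Dict, List, Optional


def to_pip_specifiers(requirements: Dict[str, Optional[str]]) -> List[str]:
    """Hypothetical helper: render a requirements dict as pip-style lines."""
    lines = []
    for library, version in requirements.items():
        if version is None or version == "latest" or library.startswith("git+"):
            lines.append(library)  # unpinned, "latest", or a git URL
        else:
            lines.append(f"{library}=={version}")  # exact pin
    return lines


requirements = {
    "scikit-learn": "1.3.0",
    "numpy": None,
    "git+https://github.com/DLHub-Argonne/dlhub_sdk.git": None,
}
print(to_pip_specifiers(requirements))
```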

classmethod create_model(**kwargs) → dlhub_sdk.models.BaseMetadataModel

Instantiate the metadata model.

Takes in arguments that allow metadata describing the object to be autogenerated. For example, these could include options describing how to read a dataset from a CSV file or which class method to invoke on a Python pickle object.

get_zip_file(path)

Write all the listed files to a ZIP file

Takes all of the files returned by list_files. First determines the largest common path of all files, and preserves directory structure by using this common path as the root directory. For example, if the files are “/home/a.pkl” and “/home/a/b.dat”, the common directory is “/home” and the files will be stored in the ZIP as “a.pkl” and “a/b.dat”.

Parameters:	path (string) – Path for the ZIP File
Returns:	Base path for the ZIP file (useful for adjusting the paths of the files included in the metadata model)
Return type:	(string)
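The common-root behavior described above can be sketched with the standard `zipfile` and `os.path.commonpath` functions. This is an illustration of the documented behavior, not the SDK's implementation; `write_zip` is a hypothetical name, and taking the common path of each file's parent directory (so a single file still gets a directory as its root) is an assumption.

```python
import os
import tempfile
import zipfile
from typing import Iterable


def write_zip(files: Iterable[str], path: str) -> str:
    """Zip `files` relative to their largest common directory; return that directory."""
    files = [os.path.abspath(f) for f in files]
    # The common directory of all files becomes the root inside the archive
    root = os.path.commonpath([os.path.dirname(f) for f in files])
    with zipfile.ZipFile(path, "w") as zf:
        for f in files:
            zf.write(f, arcname=os.path.relpath(f, root))
    return root


# Recreate the "/home/a.pkl" and "/home/a/b.dat" example inside a temp directory
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a"))
    members = [os.path.join(tmp, "a.pkl"), os.path.join(tmp, "a", "b.dat")]
    for m in members:
        with open(m, "wb") as fp:
            fp.write(b"data")
    archive = os.path.join(tmp, "bundle.zip")
    root = write_zip(members, archive)
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())

print(names)  # ['a.pkl', 'a/b.dat']
```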

list_files()

Provide a list of files associated with this artifact.

Returns:	([string]) list of file paths

name

Get the name of the servable

Returns:	(string) Name of the servable

parse_repo2docker_configuration(directory=None)

Gathers information about the required environment from repo2docker configuration files.

See https://repo2docker.readthedocs.io/en/latest/config_files.html for more details

Parameters:	directory (str) – Path to directory containing configuration files (default: current working directory)
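One of the repo2docker configuration files is `requirements.txt`. As a sketch of the kind of information that can be gathered from it, the hypothetical helper below collects exact `==` pins into a dictionary of the form accepted by `add_requirements`. It is deliberately simplified (other version specifiers are left inside the name), and it is not the SDK's parser.

```python
import os
import tempfile
from typing import Dict, Optional


def read_requirements_txt(directory: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Hypothetical helper: collect `name==version` pins from requirements.txt."""
    path = os.path.join(directory or os.getcwd(), "requirements.txt")
    requirements: Dict[str, Optional[str]] = {}
    with open(path, encoding="utf-8") as fp:
        for raw in fp:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and full-line comments
            line = line.split(" #", 1)[0].strip()  # drop inline comments
            # Only exact "==" pins are split; any other specifier stays in the name
            name, sep, version = line.partition("==")
            requirements[name.strip()] = version.strip() if sep else None
    return requirements


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "requirements.txt"), "w", encoding="utf-8") as fp:
        fp.write("# runtime dependencies\nnumpy==1.26.4\npandas  # unpinned\n")
    found = read_requirements_txt(tmp)

print(found)  # {'numpy': '1.26.4', 'pandas': None}
```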

read_codemeta_file(directory: Optional[str] = None)

Read in metadata from a codemeta.json file

Parameters:	directory (string) – Path to the directory containing the codemeta.json file (default: current working directory)
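A sketch of pulling a few descriptive fields out of a `codemeta.json` file. The field names `name`, `description`, `version`, `author`, `givenName`, and `familyName` come from the CodeMeta vocabulary; which DLHub fields the SDK maps them onto is not stated here, so the output keys of the hypothetical `summarize_codemeta` helper are illustrative. It also assumes every author is a Person with both name parts.

```python
import json
import os
import tempfile
from typing import Optional


def summarize_codemeta(directory: Optional[str] = None) -> dict:
    """Hypothetical helper: extract a few descriptive fields from codemeta.json."""
    path = os.path.join(directory or os.getcwd(), "codemeta.json")
    with open(path, encoding="utf-8") as fp:
        meta = json.load(fp)
    authors = meta.get("author", [])
    if isinstance(authors, dict):
        authors = [authors]  # a single author may be written as one object
    return {
        "name": meta.get("name"),
        "description": meta.get("description"),
        "version": meta.get("version"),
        # Assumes every author is a Person with givenName and familyName
        "authors": [f"{a['familyName']}, {a['givenName']}" for a in authors],
    }


example = {
    "name": "example-model",
    "description": "Illustrative description",
    "version": "0.1.0",
    "author": [{"@type": "Person", "givenName": "Ada", "familyName": "Lovelace"}],
}

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "codemeta.json"), "w", encoding="utf-8") as fp:
        json.dump(example, fp)
    summary = summarize_codemeta(tmp)

print(summary["authors"])  # ['Lovelace, Ada']
```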

set_abstract(abstract: str) → dlhub_sdk.models.BaseMetadataModel

Define an abstract for this object. Use for a high-level summary

Parameters:	abstract – Description of this artifact

set_creators(authors: List[str], affiliations: List[Sequence[str]] = ()) → dlhub_sdk.models.BaseMetadataModel

Add authors to this object

Parameters:
authors – List of authors, each in the format “<Family Name>, <Given Name>”
affiliations – List of affiliations for each author
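A sketch of checking the inputs that set_creators expects: each author as “<Family Name>, <Given Name>”, plus an optional list of affiliations per author. Pairing authors with empty affiliation lists when none are given is an assumption, and `pair_creators` is a hypothetical name, not part of the SDK. The affiliation is a placeholder.

```python
from typing import List, Sequence, Tuple


def pair_creators(authors: List[str],
                  affiliations: Sequence[Sequence[str]] = ()) -> List[Tuple[str, List[str]]]:
    """Hypothetical check of the inputs set_creators documents."""
    for author in authors:
        family, sep, given = author.partition(", ")
        if not (sep and family and given):
            raise ValueError(f"Expected '<Family Name>, <Given Name>', got {author!r}")
    if affiliations and len(affiliations) != len(authors):
        raise ValueError("Provide one list of affiliations per author")
    # Without affiliations, each author is paired with an empty list
    per_author = list(affiliations) or [[] for _ in authors]
    return [(author, list(aff)) for author, aff in zip(authors, per_author)]


creators = pair_creators(["Lovelace, Ada", "Babbage, Charles"],
                         [["University of Example"], []])
print(creators)  # [('Lovelace, Ada', ['University of Example']), ('Babbage, Charles', [])]
```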

set_doi(doi)

Set the DOI of this object, if available

This function is only for advanced usage. Most users of the toolbox will not know the DOI before submitting the object to DLHub.

Parameters:	doi (string) – DOI of the object

set_domains(domains: List[str])

Set the fields of science associated with this object

Parameters:	domains – Names of fields of science (e.g., “materials science”)

set_methods(methods: str) → dlhub_sdk.models.BaseMetadataModel

Define a methods section for this object. Use it to describe any specific details about how the dataset, model, etc., was generated.

Parameters:	methods (str) – Detailed method descriptions

set_name(name)

Set the name of the object.

Should be something short, descriptive, and memorable

Parameters:	name (string) – Name of artifact

set_publication_year(year)

Define the publication year

This function is only for advanced usage. Normally, this will be assigned automatically

Parameters:	year (string) – Publication year

set_title(title: str) → dlhub_sdk.models.BaseMetadataModel

Set the title for this object

Parameters:	title – Desired title

set_version(version)

Set the version of this resource

Parameters:	version (string) – Version number

set_visibility(users: Optional[Iterable[str]] = None, groups: Optional[Iterable[str]] = None)

Set the list of people this object should be visible to.

By default, it will be visible to anyone ([“public”]).

Parameters:
users – Globus Auth UUIDs of allowed users
groups – UUIDs of allowed Globus groups
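A sketch of the documented default: with no users or groups the object stays visible to ["public"]. Encoding users and groups with the Globus URN prefixes `urn:globus:auth:identity:` and `urn:globus:groups:id:` is an assumption about how DLHub stores them, and `visibility_list` is a hypothetical name. The UUID is a placeholder.

```python
from typing import Iterable, List, Optional


def visibility_list(users: Optional[Iterable[str]] = None,
                    groups: Optional[Iterable[str]] = None) -> List[str]:
    """Hypothetical sketch of set_visibility's documented behavior."""
    allowed = [f"urn:globus:auth:identity:{u}" for u in (users or [])]
    allowed += [f"urn:globus:groups:id:{g}" for g in (groups or [])]
    # No users or groups means the object stays visible to everyone
    return allowed or ["public"]


group_id = "00000000-0000-0000-0000-000000000001"  # placeholder UUID
print(visibility_list())  # ['public']
print(visibility_list(groups=[group_id]))
```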

to_dict(simplify_paths: bool = False)

Render the object to a JSON-ready dictionary

Parameters:	simplify_paths (bool) – Whether to simplify the paths of each file
Returns:	(dict) A description of the object in a form suitable for download

class dlhub_sdk.models.DLHubMetadata

Bases: pydantic.main.BaseModel

Basic metadata for a DLHub artifact

Includes information used by the DLHub web service to recognize an object, define how to build the computational environment, control access to it, et cetera

class dlhub_sdk.models.DLHubType

Bases: enum.Enum

Types of objects supported by DLHub

dataset = 'dataset'

pipeline = 'piepline'

servable = 'servable'