Data Commands

About Volume ID

Operations on data volumes take a volume ID that identifies the data volume to act on.

If you do not specify a data volume:

  • If there is only one, MissingLink uses it.
  • If there is more than one data volume, a list of those found is shown and you are prompted to choose one before the command is executed.
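The fallback rule above can be read as a small decision procedure. The following is only a sketch of that rule, not MissingLink's code: the function name `resolve_volume`, the `prompt` callback, and the error raised when no volume exists are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence


def resolve_volume(
    explicit_id: Optional[str],
    available_ids: Sequence[str],
    prompt: Callable[[Sequence[str]], str],
) -> str:
    """Pick the data volume a command should act on (hypothetical helper)."""
    if explicit_id is not None:
        # A volume ID given on the command line always wins.
        return explicit_id
    if len(available_ids) == 1:
        # Only one volume exists, so it is used without asking.
        return available_ids[0]
    if not available_ids:
        # Assumption: the documentation does not describe this case.
        raise ValueError("no data volumes found")
    # Several volumes exist: the list is shown and the user chooses one.
    return prompt(available_ids)
```

Here `prompt` stands in for the interactive list; in a script it could simply be `lambda ids: ids[0]`.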

Commands

The ml data command group provides facilities for handling data.

The following commands can be used together with ml data:

add

Adds data to the staging area of the data volume and puts the file in the storage.

Note

The command adds data to the index of the data volume and not to the metadata.

Flags

The following flags are available with the ml data add command:

  • --files, -f TEXT

    Name of file to add.

    Notes

    • If you provide a relative path and not --data-path, the relative path will be used.
    • If you provide --data-path, the file will always be relative to the data path, whether you provide an absolute path or a relative path.

    You can use multiple flags to specify several files, as follows:

    ml data add -f 1.jpg -f 2.jpg -f 3.jpg
    
  • --commit TEXT

    Indicates that after the add is complete, the new data points should be committed to a new version.

    You can add an optional message to the commit.

  • --enable-progressbar (default)/--no-progressbar

    Shows or hides the progress bar during the add process.
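When the file list comes from a script, the repeated -f flags can be assembled programmatically. The sketch below only builds the argument list and does not run anything. It uses just the flags documented above (-f, --commit, --no-progressbar); the helper name `build_add_command` is made up for this example.

```python
import shlex
from typing import List, Optional, Sequence


def build_add_command(
    files: Sequence[str],
    commit_message: Optional[str] = None,
    progressbar: bool = True,
) -> List[str]:
    """Build the argument list for an `ml data add` call (hypothetical helper)."""
    args = ["ml", "data", "add"]
    for path in files:
        # One -f flag per file, as in the documented example.
        args += ["-f", path]
    if commit_message is not None:
        # --commit takes the optional commit message as its value.
        args += ["--commit", commit_message]
    if not progressbar:
        # The progress bar is on by default, so only the opt-out is emitted.
        args.append("--no-progressbar")
    return args


if __name__ == "__main__":
    # shlex.join quotes arguments that contain spaces.
    print(shlex.join(build_add_command(["1.jpg", "2.jpg", "3.jpg"])))
```

The resulting list can be passed to `subprocess.run` as is, which avoids quoting problems with file names that contain spaces.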


clone

Clones data from the specified data volume to the root of the current project (the default location).

If you do not specify a data volume:

  • If there is only one found, MissingLink uses that.
  • If there is more than one data volume, a list of those found is shown and you are prompted to choose one before the command is executed.

Example

The following command:

ml data clone  --query "@version:<version-hash> class:dogs @sample:0.1" --dest-folder "\dest-folder/\$classes"

performs the following actions:

  • Saves all the files into \dest-folder and replaces $classes with the metadata "classes". For example, if a file has classes:dog in its metadata, it will be saved into "\dest-folder\dog". Any metadata can be used as a parameter and if it does not exist for a certain file, it will be replaced with an empty space.

  • Clones and downloads all the data that the query returns.

For more information about building a query string, see Query Syntax.

Flags

The following flags are available with the ml data clone command:

  • --query, -q

    Query string to filter the relevant data from the data volume. Performing a query on the data clones the data to the specified destination.

    Example

    ml data clone  --query "@version:<version-hash> class:dogs @sample:0.1" --dest-folder "\dest-folder/\$classes"
    
  • --delete

    Indicates that the clone action should delete all existing data found under the specified destination folder.

    Warning

    Exercise caution when using this action, as it can potentially delete things you did not mean to delete. There is no way to undo this action.

  • --enable-progressbar (default)/--no-progressbar

    Shows or hides the progress bar during the clone process.

  • --dest-folder, -d TEXT [required]

    Filepath to clone the filtered data to.

    Note

    You can combine this flag with the special MissingLink variables that follow.

    Example

    Assuming the data is tagged according to class:

    • 1.jpg [class:cat]
    • 2.jpg [class:dog]

    and you want to clone the data so that the files in the target are organized in folders named by class, so:

    • \dog\2.jpg
    • \cat\1.jpg

    you issue the following command:

    ml data clone --dest-folder "./\$class"
    
  • --dest-file, -df TEXT

    File to clone the filtered data to.

    Note

    You can combine this flag with the special MissingLink variables that follow.

    The default is [email protected]. If you do not specify this flag, the original file name, including its extension, is preserved in the target.

    Example

    Assuming the data is tagged according to class:

    • 1.jpg [class:cat]
    • 2.jpg [class:dog]

    and you want to clone the data so that the files in the target are named so: \dog.2.jpg \cat.1.jpg

    you issue the following command:

    ml data clone --dest-file "./\$class.\$name"
    

There are several special MissingLink variables that the ml data clone command can translate automatically. These variables can be used with the --dest-folder and --dest-file flags.

The variables are detailed below. An example follows.

  • [email protected]: Replaced by the phase folder that the file should be copied to. Train data points are cloned to the train folder, validation data points to the validation folder, and test data points to the test folder, each with its name and extension.

    Note

    You can also use [email protected] as a shortcut for [email protected].

  • [email protected]: Replaced by the directory or directories of the file inside the data volume.

  • [email protected]: Replaced by the hash of the file.
  • [email protected]_name: Replaced by the name of the file, without its extension.
  • [email protected] or [email protected]: Replaced by the extension of the file.
  • [email protected]: Replaced by the [email protected]_name + [email protected] of the file.
  • [email protected]_field: Replaced by the value of the metadata field. For example, if the metadata breed:poodle is assigned to a data point, $breed translates to poodle for that data point.

    Note

    Whenever you use the special variables denoted by the $ sign, the query string must be within single quotes. It is recommended to put the whole query within single quotes. If you need spaces in values that you supply to MissingLink within queries or destination paths, wrap those values in double quotes to avoid conflicts or errors.

    For example:

    ml data clone --dest-folder './[email protected]'

    There is no need to wrap the command in quotes if it is being used in a recipe file.
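To make the substitution concrete, here is a rough model of how metadata placeholders in a destination pattern expand. It is not MissingLink's implementation. It uses Python's `string.Template`, handles only plain `$field` placeholders (not the special variables listed above), and assumes that a field missing from the metadata expands to an empty string. The helper name `expand_destination` and the `name` key are illustrative.

```python
from string import Template
from typing import Mapping


class _BlankMissing(dict):
    """Dictionary that yields an empty string for unknown fields."""

    def __missing__(self, key: str) -> str:
        # Assumption: a metadata field that does not exist expands to nothing.
        return ""


def expand_destination(pattern: str, metadata: Mapping[str, object]) -> str:
    """Expand $field placeholders in a destination pattern (hypothetical helper)."""
    return Template(pattern).substitute(_BlankMissing(metadata))


if __name__ == "__main__":
    data_points = [
        {"name": "1.jpg", "class": "cat"},
        {"name": "2.jpg", "class": "dog"},
    ]
    for point in data_points:
        # Folder layout by class, then the file name appended by hand.
        print(expand_destination("./$class", point) + "/" + str(point["name"]))
```

With the two data points from the examples above, the folder pattern `./$class` yields one folder per class, and a file pattern such as `./$class.$name` yields names prefixed with the class.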


commit

Commits files that are in the staging area to a version of the specified data volume.

If you do not specify a data volume:

  • If there is only one found, MissingLink uses that.
  • If there is more than one data volume, a list of those found is shown and you are prompted to choose one before the command is executed.

Flags

The following flags are available with the ml data commit command:

  • --message, -m

    The message to attach to the commit.

    Example:

    ml data commit yourDataVolumeID --message "your commit message"
    
  • --isolation-token TEXT

    Token obtained from an isolated sync.

See also

data sync with the --commit flag.


create

Creates a data volume with the specified display name. The data volume will be attached to the specified organization.

Flags

The following flags are available with the ml data create command:

  • --display-name TEXT

    Name to show in the display. Required.

  • --description TEXT

    More detailed description of the data volume.

  • --org TEXT

    Organization to use.

  • --linked/embedded

    Specifies linked or embedded mode.

    • When the data volume is created in embedded mode (the default), MissingLink copies all the data during sync and manages the storage in the user-assigned storage bucket.
    • In linked mode, MissingLink does not duplicate the data but stores only links to the data during sync. In this mode, you are responsible for not deleting or modifying files after they have been synced to the data volume.
  • --bucket TEXT

    Name of a private bucket. Specify the bucket name using the following syntax:

    • For Google cloud: gs://YourBucketName
    • For Amazon S3: s3://YourBucketName
    • For Azure storage: az://{storage_account_name}.{container_name}
    • For local storage: file://path

    If you do not specify a bucket name:

    • If there is only one bucket found, MissingLink uses that.
    • If there is more than one bucket, a list of buckets found is shown and you are prompted to choose one before the command is executed.
  • --shared-storage-volume-id VOLUME ID

    ID of an existing volume whose storage the new data volume will use.
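The bucket syntax differs by provider, most notably for Azure, which combines the storage account and the container. The helper below is a small sketch that formats a value for --bucket according to the list above. The function name `bucket_uri` and the short provider keys are assumptions for this example; only the URI shapes come from the documentation.

```python
from typing import Optional


def bucket_uri(provider: str, name: str, container: Optional[str] = None) -> str:
    """Format a --bucket value for the given provider (hypothetical helper).

    provider: "gs" (Google cloud), "s3" (Amazon S3), "az" (Azure storage)
              or "file" (local storage).
    name:     bucket name, Azure storage account name, or local path.
    """
    if provider == "az":
        if not container:
            raise ValueError("Azure needs a storage account name and a container name")
        # az://{storage_account_name}.{container_name}
        return f"az://{name}.{container}"
    if provider in ("gs", "s3", "file"):
        return f"{provider}://{name}"
    raise ValueError(f"unsupported provider: {provider!r}")
```

For example, `bucket_uri("s3", "YourBucketName")` gives the string to pass after --bucket for an Amazon S3 bucket.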


list

Lists the data volumes across all organizations of which the user is a member.


metadata add

Attaches metadata to files that are already in the data volume, or adds stand-alone metadata.

If you do not specify a data volume:

  • If there is only one found, MissingLink uses that.
  • If there is more than one data volume, a list of those found is shown and you are prompted to choose one before the command is executed.

Flags

The following flags are available with the ml data metadata add command:

  • --files, -f TEXT

    Path to the files to which metadata will be tagged.

    Example

    ml data metadata add --files YourFolderWithfiles --property class dog \
       --property-float weight 40.2 --property-int age 10
    

    Note

    MissingLink will recursively attach the same metadata to every file that is found in the directory or subdirectories of the path provided.

  • --data, -d TEXT

    Metadata that should be tagged to the files that are being added.

    Note

    The metadata must be passed as a JSON structure.

    Example

    ml data metadata add yourDataVolumeID \
       --files pathToYourFiles --data '{"class": "dog"}'
    
  • --data-point, -dp TEXT

    Specific data point that the metadata should be tagged to.

    Example

    In this example, the JSON is tagged to the data points 1.jpg and 2.jpg.

    ml data metadata add --data-point 1.jpg  --data-point 2.jpg \
        --data '{"classes": {"breed": "labrador", "type": "dog"}}'
    
  • --data-file, -df FILENAME

    Filepath of a JSON file that describes to which data points to add metadata and the metadata that you wish to add.

    Example

    ml data metadata add --data-file PathtoDataFile
    

    where the DataFile looks like this:

    {
       "1.jpg": {"class": "dog"},
       "2.jpg": {"class": "cat"}
    }
    
  • --property, -p TEXT TEXT

    String metadata that should be tagged to the data supplied. The flag accepts two strings: the first is the property name and the second is the property string value.

    Example

    ml data metadata add yourDataVolumeID --files pathToYourFiles \
        --property propertyName propertyValue
    
  • --property-int, -pi TEXT INTEGER

    Integer metadata that should be tagged to the data supplied. The flag accepts two values: the first is the property name and the second is the property integer value.

    Example

    ml data metadata add yourDataVolumeID --data-point 1.jpg  --property class dog \
         --property-float weight 40.2 --property-int age 10
    
  • --property-float, -pf TEXT FLOAT

    Float metadata that should be tagged to the data supplied. The flag accepts two values: the first is the property name and the second is the property float value.

    Example

    ml data metadata add yourDataVolumeID --data-point 1.jpg  --property class dog \
         --property-float weight 40.2 --property-int age 10
    
  • --enable-progressbar (default)/--no-progressbar

    Shows or hides the progress bar during the add process.

  • --update (default)/--replace

    Updates or replaces data.

    These flags allow you to control the behavior in case of conflicts where metadata is added to the same data point in the staging version.

    • update: Indicates that in case of conflicts, the two versions of the metadata must be merged and old metadata must be overwritten with new metadata.

    Note

    The --update flag only applies to uncommitted data in the staging area of the version control, as data already committed into a version is immutable.

    • replace: Indicates that in case of conflicts, the original metadata attached should be removed before the supplied metadata is attached.

    Note

    The --replace flag only applies to uncommitted data in the staging area of the version control, as data already committed into a version is immutable.

  • --data-path: Path to the data.
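The difference between --update and --replace is easiest to see on a single data point that already has staged metadata. The sketch below models the two behaviors with plain dictionaries. It is an illustration, not MissingLink's code: the function name `apply_metadata` is made up, and the merge is assumed to be shallow (top-level keys only).

```python
from typing import Any, Dict


def apply_metadata(
    existing: Dict[str, Any],
    incoming: Dict[str, Any],
    mode: str = "update",
) -> Dict[str, Any]:
    """Combine staged metadata with newly supplied metadata (hypothetical helper)."""
    if mode == "update":
        # Merge both versions; new values overwrite old ones on conflict.
        merged = dict(existing)
        merged.update(incoming)  # assumption: shallow, top-level merge
        return merged
    if mode == "replace":
        # Drop the original metadata, keep only what was supplied.
        return dict(incoming)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    staged = {"class": "dog", "age": 3}
    print(apply_metadata(staged, {"age": 4}, mode="update"))
    print(apply_metadata(staged, {"age": 4}, mode="replace"))
```

In update mode the `class` field survives and `age` takes the new value; in replace mode only `age` remains.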


query

Retrieves the metadata of data points that meet the query criteria.

The metadata are aggregated into a single file, as a JSON structure.

Flags

The following flags are available with the ml data query command:

  • --query, -q TEXT

    The query to execute.

  • --batch-size INTEGER

    Number of data points in each batch of data that is retrieved.

  • --as-dict/--as-list

    Presents information as a dictionary or as a list.

  • --silent

    Suppresses printing of progress.


sync

Syncs data to the specified data volume.

If you do not specify a data volume:

  • If there is only one found, MissingLink uses that.
  • If there is more than one data volume, a list of those found is shown and you are prompted to choose one before the command is executed.

Notes

  • Ensure that you create the .metadata.json files in the same folder as your current dataset. The JSON file contains a flat dictionary of attributes that can have only basic type values (string, number, boolean).
  • MissingLink will recursively add every file that is found in the directory or subdirectories of the path provided.
  • The ml data sync command syncs only the changes that are not yet in the data volume. For example, if you sync a directory once and then change one file and sync again, only the changed file will be uploaded to the data volume.
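Because the metadata files must contain a flat dictionary of basic values, it can help to check them before syncing. The following sketch tests that rule for a JSON document. The function names are made up, and treating null as invalid is an assumption, since the note above lists only strings, numbers, and booleans.

```python
import json
from typing import Any


def is_flat_metadata(attributes: Any) -> bool:
    """Return True if attributes is a flat dict of string, number or boolean values."""
    if not isinstance(attributes, dict):
        return False
    # bool is a subclass of int in Python, so it is covered by the int check too.
    return all(
        isinstance(key, str) and isinstance(value, (str, int, float, bool))
        for key, value in attributes.items()
    )


def is_flat_metadata_json(text: str) -> bool:
    """Parse a JSON document and apply the flat-dictionary rule (hypothetical helper)."""
    try:
        return is_flat_metadata(json.loads(text))
    except json.JSONDecodeError:
        return False
```

A nested value such as `{"classes": {"type": "dog"}}` fails this check, while `{"class": "dog", "weight": 40.2, "age": 10}` passes.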

Example

ml data sync yourDataVolumeID --data-path pathToYourFiles --commit commitMessage \
   --enable-progressbar

Flags

The following flags are available with the ml data sync command:

  • --data-path

    The path to the data that should be added.

  • --commit

    Indicates that after the sync is complete, the new data points should be committed to a new version.

    ml data sync yourDataVolumeID --data-path yourDataPath --commit "your commit message"
    

    Note

    The commit takes all uncommitted changes into the same version and not only the changes in the sync command.

  • --enable-progressbar (default)/--no-progressbar

    Shows or hides the progress bar during the sync process.

  • --isolated

    Performs an isolated sync.

validate

Validates data.

This command works like sync but does not actually sync any files. It only goes over them and validates the metadata files.

Flags

The following flags are available with the ml data validate command:

  • --data-path TEXT

    Path to the data.

  • --enable-progressbar (default)/--no-progressbar

    Shows or hides the progress bar during the validation process.