Data Commands

About Volume ID

When performing operations with data volumes, you are required to specify the volume ID.

If you do not specify a data volume:

  • If there is only one, MissingLink uses it.
  • If there is more than one data volume, a list of those found is shown and you are prompted to choose one before the command is executed.
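
The selection rule above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not MissingLink's code: the function name resolve_volume, the choose callback that stands in for the interactive prompt, the integer IDs, and the error raised when no data volume exists are all assumptions.

```python
def resolve_volume(requested_id, available_ids, choose):
    """Return the data volume ID a command should run against.

    requested_id:  the ID given on the command line, or None if omitted.
    available_ids: data volumes visible to the user (hypothetical values).
    choose:        callback standing in for the interactive prompt.
    """
    if requested_id is not None:
        # An explicit volume ID always wins.
        return requested_id
    if not available_ids:
        # Assumption: the documentation does not describe this case.
        raise ValueError("no data volumes found")
    if len(available_ids) == 1:
        # Only one data volume exists, so it is used without prompting.
        return available_ids[0]
    # More than one data volume: the user picks from the list.
    return choose(available_ids)


if __name__ == "__main__":
    print(resolve_volume(None, [5], choose=None))
    print(resolve_volume(None, [1, 2], choose=lambda ids: ids[1]))
```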

Commands

The ml data command group provides facilities for handling data.

The following commands can be used together with ml data:

add

Adds data to the staging area of the data volume and puts the file in the storage.

Note

The command adds data to the index of the data volume and not to the metadata.

Flags

The following flags are available with the ml data add command:

  • --files, -f TEXT

    Name of file to add.

    Notes

    • If you provide a relative path and not --data-path, the relative path will be used.
    • If you provide --data-path, the file path is always resolved relative to the data path, whether you provide an absolute path or a relative path.

    You can use multiple flags to specify several files, as follows:

    ml data add -f 1.jpg -f 2.jpg -f 3.jpg

  • --commit TEXT

    Indicates that after the add is complete, the new data points should be committed to a new version.

    You can add an optional message to the commit.

  • --enable-progressbar (default)/--no-progressbar

    Shows or hides the progress bar during the add process.

clone

Clones data from the specified data volume.

If you do not specify a data volume:

  • If there is only one found, MissingLink uses that.
  • If there is more than one data volume, a list of those found is shown and you are prompted to choose one before the command is executed.

Example

The following command:

ml data clone  --query "@version:<version-hash> class:dogs @sample:0.1" --dest-folder "\dest-folder/$classes"

performs the following actions:

  • Saves all the files into \dest-folder and replaces $classes with the metadata "classes". For example, if a file has classes:dog in its metadata, it will be saved into "\dest-folder\dog". Any metadata can be used as a parameter and if it does not exist for a certain file, it will be replaced with an empty space.

  • Clones and downloads all the data that the query returns.

Flags

The following flags are available with the ml data clone command:

  • --dest-folder, -d TEXT [required]

    Filepath to clone the filtered data to.

    The command can be used with the system variables that follow.

System variables with special meaning for cloning

There are several special system variables that the ml data clone command can translate automatically. These keywords can be used in the --dest-folder and --dest-file flags.

They are detailed below. An example follows.

  • [email protected]: Replaced by the phase folder that the file should be copied to, that is, the train data points will be cloned to the train folder, validation data points to the validation folder and test data points will be cloned to the test folder, along with their respective names and extensions.

    Note

    You can also use [email protected], instead, to specify [email protected] as a shortcut.

  • [email protected]: Replaced by the hash value of the content of the file.

  • [email protected]: Replaced by the ID of the file.
  • [email protected]_name: Replaced by the name of the file, without its extension.
  • [email protected] or [email protected]: Replaced by the extension of the file.
  • [email protected]: Replaced by the [email protected]_name + [email protected] of the file.
  • [email protected]_field: Replaced by the value of the metadata field. For example, if the user has assigned the metadata breed:poodle to a data point, $breed translates to poodle for that data point.

    Example

    Assuming the data is tagged according to class:

    • 1.jpg [class:cat]
    • 2.jpg [class:dog]

    and you want to clone the data so that the files in the target are organized in folders named by class, so:

    • \dog\2.jpg
    • \cat\1.jpg

    you issue the following command:

    ml data clone --dest-folder "./$class"

  • --dest-file, -df TEXT

    File to clone the filtered data to.

    The command can be used with global system variables.

    The default is [email protected]. Without specifying this global variable, the original file name, including its extension is preserved in the target.

    Example

    Assuming the data is tagged according to class:

    • 1.jpg [class:cat]
    • 2.jpg [class:dog]

    and you want to clone the data so that the files in the target are named so: \dog.2.jpg \cat.1.jpg

    you issue the following command:

    ml data clone --dest-file "./$class.$name"

  • --query, -q

    Query string to filter the relevant data from the data volume. Performing a query on the data clones the data to the specified destination.

    Example

    ml data clone --query "@version:<version-hash> class:dogs @sample:0.1" --dest-folder "\dest-folder/$classes"

    For more information about building a query string, see Query Syntax.

  • --delete

    Indicates that the clone action should delete all existing data found under the specified destination folder.

    Warning

    Exercise caution when using this action, as this flag can potentially delete things you did not mean to delete and there is no way to revert this action.

  • --enable-progressbar (default)/--no-progressbar

    Shows or hides the progress bar during the clone process.
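
To make the metadata substitution in --dest-folder concrete, here is a minimal Python sketch of how a template such as "./$class" could expand for each data point. It models only the behavior described above and is not how the ml CLI is implemented. The function names are hypothetical, and the choice to expand a missing field to an empty string is an assumption based on the clone example.

```python
import posixpath
from collections import defaultdict
from string import Template


def expand_destination(template, metadata):
    """Expand $field placeholders using one data point's metadata."""
    # Assumption: a field missing from the metadata expands to "".
    values = defaultdict(str, {k: str(v) for k, v in metadata.items()})
    return Template(template).substitute(values)


def target_path(template, file_name, metadata):
    """Join the expanded destination folder with the original file name."""
    return posixpath.join(expand_destination(template, metadata), file_name)


if __name__ == "__main__":
    data_points = {
        "1.jpg": {"class": "cat"},
        "2.jpg": {"class": "dog"},
    }
    for name, meta in data_points.items():
        print(target_path("./$class", name, meta))
    # prints ./cat/1.jpg and then ./dog/2.jpg
```

The system variables listed above are resolved by the ml CLI itself and are not modeled in this sketch.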

commit

Commits files that are in the staging area to a version of the specified data volume.

If you do not specify a data volume:

  • If there is only one found, MissingLink uses that.
  • If there is more than one data volume, a list of those found is shown and you are prompted to choose one before the command is executed.

Flags

The following flags are available with the ml data commit command:

  • --message, -m

    The message to attach to the commit.

    ml data commit yourDataVolumeID --message "your commit message"

See also

data sync with the --commit flag.

create

Creates a data volume with the specified display name. The data volume will be attached to the specified organization.

Flags

The following flags are available with the ml data create command:

  • --display-name TEXT

    Name to show in the display. Required.

  • --description TEXT

    More detailed description of the data volume.

  • --org TEXT

    Organization to use.

  • --data-path TEXT

    Path to the data.

  • --linked/embedded

    Specifies linked or embedded mode.

    • When the data volume is created in embedded mode (the default), MissingLink copies all the data during sync and manages the storage in the user-assigned storage bucket.
    • In linked mode, MissingLink does not duplicate the data but stores only links to the data during sync. In this mode, the user is responsible for not deleting or modifying files after they have been synced to the data volume.
  • --bucket TEXT

    Name of a private bucket. Specify the bucket name using the following syntax:

    • For Google cloud: "gs://YourBucketName"
    • For Amazon S3: "s3://YourBucketName"
    • For Azure storage: "az://{storage_account_name}.{container_name}"
    • For local storage: "file://path"

    If you do not specify a bucket name:

    • If there is only one bucket found, MissingLink uses that.
    • If there is more than one bucket, a list of buckets found is shown and you are prompted to choose one before the command is executed.
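
The bucket syntax above can be checked before it is passed to --bucket. The following Python sketch is a hypothetical helper, not part of the ml CLI: it only splits the string according to the four forms listed above and raises an error for anything else.

```python
# Schemes taken from the --bucket syntax list above.
PROVIDERS = {
    "gs": "Google cloud",
    "s3": "Amazon S3",
    "az": "Azure storage",
    "file": "local storage",
}


def parse_bucket(spec):
    """Split a bucket string such as "gs://YourBucketName" into its parts."""
    scheme, separator, location = spec.partition("://")
    if not separator or scheme not in PROVIDERS or not location:
        raise ValueError("unsupported bucket syntax: %r" % spec)
    info = {"scheme": scheme, "provider": PROVIDERS[scheme], "location": location}
    if scheme == "az":
        # Azure form: az://{storage_account_name}.{container_name}
        account, dot, container = location.partition(".")
        if not dot or not account or not container:
            raise ValueError("expected az://{storage_account_name}.{container_name}")
        info["storage_account"] = account
        info["container"] = container
    return info


if __name__ == "__main__":
    print(parse_bucket("s3://YourBucketName"))
    print(parse_bucket("az://myaccount.mycontainer"))
```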

list

Lists the data volumes across all organizations of which the user is a member.

metadata

Attaches metadata to files that are already in the data volume, or adds stand-alone metadata.

If you do not specify a data volume:

  • If there is only one found, MissingLink uses that.
  • If there is more than one data volume, a list of those found is shown and you are prompted to choose one before the command is executed.

Flags

The following flags are available with the ml data metadata command:

  • --files, -f TEXT

    Path to the files to which metadata will be tagged.

    Example

    ml data metadata --files YourFolderWithfiles --property class dog \
    --propertyFloat weight 40.2 --propertyInteger age 10
    

    Note

    MissingLink will recursively attach the same metadata to every file that is found in the directory or subdirectories of the path provided.

  • --data, -d TEXT

    Metadata that should be tagged to the files that are being added.

    Note

    The metadata must be passed as a JSON structure.

    Example

    ml data metadata yourDataVolumeID \
        --files pathToYourFiles --data '{"class": "dog"}'
    
  • --dataPoint, -dp TEXT

    Specific data point that the metadata should be tagged to.

    Example

    In this example, the JSON is tagged to the data points 1.jpg and 2.jpg.

    ml data metadata --dataPoint 1.jpg  --dataPoint 2.jpg \
        --data '{"classes": {"breed": "labrador", "type": "dog"}}'
    
  • --dataFile, -df FILENAME

    Filepath of a JSON file that describes to which data points to add metadata and the metadata that you wish to add.

    Example

    ml data metadata --dataFile PathtoDataFile
    

    where the DataFile looks like this:

    {
       "1.jpg": {"class": "dog"},
       "2.jpg": {"class": "cat"}
    }
    
  • --property, -p TEXT

    String metadata that should be tagged to the data supplied. The flag accepts two strings: the first is the property name and the second is the property string value.

    Example

    ml data metadata yourDataVolumeID --files pathToYourFiles \
        --property propertyName propertyValue
    
  • --propertyInt, -p TEXT INTEGER

    Integer metadata that should be tagged to the data supplied. The flag accepts two values: the first is the property name and the second is the integer value of the property.

    Example

    ml data metadata yourDataVolumeID --dataPoint 1.jpg  --property class dog \
    --propertyFloat weight 40.2 --propertyInteger age 10
    
  • --propertyFloat, -pf TEXT FLOAT

    Float metadata that should be tagged to the data supplied. The flag accepts two values: the first is the property name and the second is the float value of the property.

    Example

    ml data metadata yourDataVolumeID --dataPoint 1.jpg  --property class dog \
    --propertyFloat weight 40.2 --propertyInteger age 10
    
  • --enable-progressbar (default)/--no-progressbar

    Shows or hides the progress bar during the metadata process.

  • --update (default)/--replace

    Updates or replaces data.

    These flags allow you to control the behavior in case of conflicts in metadata added to the same data point in the staging version.

    • update: Indicates that in case of conflicts, the two versions of the metadata are merged and old values are overwritten by new values.

    Note

    The --update flag only applies to uncommitted data in the staging area of the version control, as data already committed into a version is immutable.

    • replace: Indicates that in case of conflicts, the metadata originally attached is removed before the supplied metadata is attached.

    Note

    The --replace flag only applies to uncommitted data in the staging area of the version control, as data already committed into a version is immutable.
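
The difference between --update and --replace can be illustrated with plain Python dictionaries. This is a sketch of the semantics described above, not MissingLink's code. The function name is hypothetical, and the merge is assumed to be shallow (top-level keys only), which the documentation does not specify.

```python
def apply_metadata(existing, supplied, mode="update"):
    """Return the staged metadata of a data point after a conflicting add.

    existing: metadata already staged for the data point.
    supplied: metadata passed to the new ml data metadata call.
    mode:     "update" (the default) or "replace".
    """
    if mode == "update":
        # Merge both versions; supplied values overwrite old ones.
        merged = dict(existing)
        merged.update(supplied)
        return merged
    if mode == "replace":
        # Drop everything that was attached before, keep only the new data.
        return dict(supplied)
    raise ValueError("mode must be 'update' or 'replace'")


if __name__ == "__main__":
    staged = {"class": "dog", "age": 10}
    print(apply_metadata(staged, {"age": 11}, mode="update"))
    # {'class': 'dog', 'age': 11}
    print(apply_metadata(staged, {"age": 11}, mode="replace"))
    # {'age': 11}
```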

query

Retrieves the metadata of data points that meet the query criteria.

The metadata are aggregated into a single file, as a JSON structure.

Flags

The following flags are available with the ml data query command:

  • --query, -q TEXT

    The query to execute.

  • --batch-size INTEGER

    Number of data points in each batch of data that is retrieved.

  • --as-dict/--as-list

    Presents information as a dictionary or as a list.

  • --silent

    Suppresses printing of progress.

sync

Syncs data to the specified data volume.

If you do not specify a data volume:

  • If there is only one found, MissingLink uses that.
  • If there is more than one data volume, a list of those found is shown and you are prompted to choose one before the command is executed.

Notes

  • Ensure that you create the .metadata.json files in the same folder as your current dataset. The JSON file contains a flat dictionary of attributes that can have only basic type values (string, number, boolean).
  • MissingLink will recursively add every file that is found in the directory or subdirectories of the path provided.
  • The ml data sync command syncs only the changes that are not yet in the data volume. For example, if you sync a directory once and then change one file and sync again, only the changed file will be uploaded to the data volume.
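
Because the metadata JSON must be a flat dictionary of basic values, it can help to validate the attributes before writing the file. The Python sketch below is a hypothetical helper, not part of the ml CLI. It checks only the constraint stated in the first note (string, number, or boolean values, no nesting) and returns the JSON text. Where the file is written and how it is named should follow the note above.

```python
import json

# Basic types allowed as values in the flat metadata dictionary.
ALLOWED_TYPES = (str, int, float, bool)


def to_metadata_json(attributes):
    """Validate a flat attribute dictionary and serialize it as JSON."""
    for key, value in attributes.items():
        if not isinstance(key, str):
            raise TypeError("attribute names must be strings: %r" % (key,))
        if not isinstance(value, ALLOWED_TYPES):
            # Nested dictionaries, lists, and None are rejected.
            raise TypeError("attribute %r has a non-basic value: %r" % (key, value))
    return json.dumps(attributes, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_metadata_json({"class": "dog", "weight": 40.2, "age": 10}))
```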

Example

ml data sync yourDataVolumeID --dataPath pathToYourFiles --commit commitMessage \
--enable-progressbar

Flags

The following flags are available with the ml data sync command:

  • --dataPath

    The path to the data that should be added.

  • --commit

    Indicates that after the sync is complete, the new data points should be committed to a new version.

    ml data sync yourDataVolumeID --dataPath yourDataPath --commit "your commit message"
    

    Note

    The commit takes all uncommitted changes into the same version and not only the changes in the sync command.

  • --enable-progressbar (default)/--no-progressbar

    Shows or hides the progress bar during the sync process.

  • --resume

    Resumes the sync in case it failed before completing.

    ml data sync yourDataVolumeID --dataPath pathToYourData --resume yourResumeToken