Cloning Data Volumes

In this section, you'll learn how to clone data from your data volumes on MissingLink.

  1. Click Copy Clone Command on the query page.

  2. Clone the specific queried data.

    Note that whenever you use the special keywords denoted by the $ sign, the string that contains them must be within single quotes so that the shell does not expand them. It is recommended to put the whole query within single quotes. If a value in the query or in the destination path contains spaces, wrap that value in double quotes to avoid conflicts or errors.

    ml data clone  --query 'queryString' \
        --dest destinationPath
    

    The implicit command above translates to the following explicit command, in which the destination is split into a folder and a filename. Both strings may contain reserved keywords such as $phase and $name, as detailed below:

    ml data clone  --query 'queryString' \
        --destFolder 'destinationPath' --destFile 'destinationFileName'
    

    Note

    The clone command must be executed on the specific machine where you wish to access the cloned data. If you move to another machine, you must execute the command again on the other machine to gain access to the cloned data.
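
The quoting advice in step 2 can be checked in any POSIX shell without running ml at all. The sketch below uses a hypothetical helper, show_args, that prints the arguments a command would receive. It is not part of the MissingLink CLI, and the paths are placeholders.

```shell
# Hypothetical helper: prints each argument on its own line, in brackets,
# standing in for what a command such as ml would receive from the shell.
show_args() { printf '[%s]\n' "$@"; }

# A shell variable that happens to share its name with a reserved keyword.
phase=OOPS

# Single quotes: the keyword reaches the command untouched.
show_args '/data/$phase'

# Double quotes: the shell substitutes its own variable first.
show_args "/data/$phase"

# Double quotes inside single quotes are passed through literally,
# so a value containing a space stays inside one argument.
show_args '/data/"my folder"/$phase'
```

Only the first and third forms leave $phase intact for the clone command to interpret.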

Flags for cloning data from your data volume

Run the following command to view the flags available for the command:

ml data clone --help
  • query: The query string that selects the relevant data from the data volume.
  • destFolder: The folder path to clone the queried data to.
  • destFile: The filename to give each cloned file.
  • delete: Indicates that the clone action should first delete all existing data under the provided destination folder.

    Warning

    Exercise caution when using this flag: it can delete data that you did not intend to delete, and the deletion cannot be reverted.

  • processes (default 1): The number of processes that should be used to clone the files.

  • no_progressbar: Hides the progress bar during the clone process.
  • enable_progressbar (default): Shows the progress bar during the clone process.
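
Because the delete flag cannot be undone, it can help to look at the destination folder before cloning into it. The following is a minimal sketch of such a pre-flight check. The count_existing helper is hypothetical and not part of the MissingLink CLI, and /destinationPath is the placeholder path used on this page.

```shell
# Hypothetical pre-flight check; it only inspects the folder and never calls ml.
count_existing() {
    # Print the number of regular files under the folder, or 0 if it does not exist.
    if [ -d "$1" ]; then
        find "$1" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

dest='/destinationPath'   # placeholder destination folder
n=$(count_existing "$dest")
if [ "$n" -gt 0 ]; then
    echo "$dest holds $n file(s) that the delete flag would remove"
fi
```

The count is line-based, so filenames that contain newlines would be over-counted. For a quick sanity check this is usually acceptable.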

Reserved keywords with special meaning for cloning

There are several special keywords that the MissingLink CLI clone command can translate automatically. These keywords can be used in the --destFolder and --destFile flags.

They are detailed below. Some examples follow.

  • $phase: Replaced by the phase folder that the file should be copied to. Train data points are cloned to the train folder, validation data points to the validation folder, and test data points to the test folder, each keeping its name and extension.

    Note

    You can also use "[email protected]" instead to specify $phase as a shortcut.

  • $dir: Replaced by the directory path of the files.

  • $id: Replaced by the id of the file.
  • $basename: Replaced by the name of the file.
  • $ext: Replaced by the extension of the file.
  • $name: Replaced by the $basename + $ext of the file.
  • $metadata_field: Replaced by the value of that metadata field for each file, where metadata_field stands for the name of any metadata field. Example 4 below uses $type_of_animal in this way.
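
To make the keyword meanings concrete, here is a rough sketch that expands a destination template for a single file. It illustrates the descriptions above and is not the CLI's actual implementation. The expand_dest function and the sample file path are hypothetical. The sketch also assumes that $ext includes the leading dot and that $dir is the file's relative directory, neither of which is stated on this page.

```shell
# expand_dest TEMPLATE PHASE FILEPATH
# Hypothetical illustration of the reserved keywords; not how ml implements them.
expand_dest() {
    template=$1 phase=$2 path=$3
    dir=$(dirname "$path")      # $dir: directory part of the file path
    file=$(basename "$path")    # $name: full filename
    case $file in
        *.*) base=${file%.*}; ext=.${file##*.} ;;   # assumption: $ext keeps the dot
        *)   base=$file; ext= ;;
    esac
    # [\$] makes sed match a literal dollar sign before each keyword.
    printf '%s\n' "$template" | sed \
        -e "s|[\$]phase|$phase|g" \
        -e "s|[\$]dir|$dir|g" \
        -e "s|[\$]basename|$base|g" \
        -e "s|[\$]ext|$ext|g" \
        -e "s|[\$]name|$file|g"
}

expand_dest '/destinationPath/$phase/$dir/$name' train 'animals/dog1.jpg'
# prints /destinationPath/train/animals/dog1.jpg
```

The sed-based substitution breaks if a value contains |, & or a backslash, so treat it purely as a reading aid.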

Examples

For the purpose of the examples, the data set contains data points with a single attribute in the metadata named type_of_animal that has the values: Dog, Cat, and Fish.

1) Running the following command:

ml data clone  --query '@version:versionID 
    AND @sample:0.2 AND @split:0.5:0.25:0.25 @seed:1337' \
    --destFolder '/destinationPath/$phase/'

creates three folders under the destinationPath named train, test, and validation and copies the data points according to the @split ratio to each folder.

2) Running the following command:

ml data clone  --query '@version:versionID
    AND @sample:0.2 AND @split:0.5:0.25:0.25 @seed:1337' \
    --destFolder '/destinationPath/$phase/' --destFile '$name'

keeps the original filename for each data point copied.

3) Running the following command:

ml data clone  --query '@version:versionID
    AND @sample:0.2 AND @split:0.5:0.25:0.25 @seed:1337' \
    --destFolder '/destinationPath/$phase/$dir' --destFile '$name'

creates subfolders under the train, test, and validation folders that reproduce the original folder structure the data points had when they were synced.

4) Running the following command:

ml data clone  --query '@version:versionID
    AND @sample:0.2 AND @split:0.5:0.25:0.25 @seed:1337' \
    --destFolder '/destinationPath/$phase/$type_of_animal' --destFile '$name'

creates 'Dog', 'Cat' and 'Fish' subfolders under the train, test, and validation folders and copies the relevant data points for each subfolder according to the type_of_animal attribute.
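
The folder layout described in example 4 can be previewed locally without cloning anything. The sketch below only creates empty folders in a temporary directory. The mock_layout helper is hypothetical and does not call ml.

```shell
# Hypothetical helper: builds the empty folder tree that example 4 describes.
mock_layout() {
    for phase in train test validation; do
        for animal in Dog Cat Fish; do
            mkdir -p "$1/$phase/$animal"
        done
    done
}

root=$(mktemp -d)       # throwaway stand-in for /destinationPath
mock_layout "$root"
find "$root" -type d | sort   # lists the root, 3 phase folders, and 9 animal folders
```

After a real clone, each of the nine leaf folders would hold the data points whose phase and type_of_animal value match that folder.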