Cloning Data Volumes
In this section, you'll learn how to clone data from your data volumes on MissingLink.
Cloning Data Using MissingLink CLI
Click Copy Clone Command on the query page.
Clone the specific queried data.
Note that whenever you use the special keywords denoted by the $ sign, the query string must be within single quotes. It is recommended to put the whole query within single quotes. If a value in the query or in the destination path needs to contain spaces, wrap that value in double quotes to avoid conflicts or errors.
ml data clone --query 'queryString' \
    --dest destinationPath
The implicit command above translates to the following explicit command (reserved keywords can be used in the destinationPath string, as detailed below):
ml data clone --query 'queryString' \
    --destFolder 'destinationPath' --destFile 'destinationFileName'
The clone command must be executed on the specific machine where you wish to access the cloned data. If you move to another machine, you must execute the command again on the other machine to gain access to the cloned data.
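The single-quote rule exists because the shell itself expands `$` inside double quotes before the ml command ever receives the string. The following is a minimal POSIX shell sketch of that difference. It only prints strings and never calls the CLI; the `/data` path and the `phase` shell variable are placeholders chosen for illustration.

```shell
#!/bin/sh
# Shows why reserved keywords should be single-quoted.
# Nothing here calls the ml CLI; the strings are only printed.

phase="oops"            # hypothetical shell variable that shares the keyword's name

single='/data/$phase'   # single quotes: the text is passed through literally
double="/data/$phase"   # double quotes: the shell substitutes its own $phase first

printf 'single-quoted: %s\n' "$single"   # prints: single-quoted: /data/$phase
printf 'double-quoted: %s\n' "$double"   # prints: double-quoted: /data/oops
```

In the double-quoted case the keyword is gone before the clone command could translate it, which is why single quotes are the safer default.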
Flags for cloning data from your data volume
Run the following command to view the flags available for the clone command:
ml data clone --help
query: The query string that filters the relevant data from the data volume.
destFolder: The folder path to clone the queried data to.
destFile: The filename to give each cloned file.
delete: Makes the clone action delete all existing data under the provided destination folder.
Exercise caution with this flag: it can delete data that you did not mean to delete, and the deletion cannot be reverted.
processes (default 1): The number of processes used to clone the files.
no_progressbar: Hides the progress bar during the clone process.
enable_progressbar (default): Shows the progress bar during the clone process.
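Since the delete flag cannot be undone, scripts that use it may benefit from a guard on the destination folder. The sketch below is one possible guard in POSIX shell. `is_safe_clone_dest` and the `/data/clones` root are hypothetical names introduced for illustration and are not part of the MissingLink CLI.

```shell
#!/bin/sh
# Hypothetical guard: allow a destructive clone only inside a dedicated root.
# CLONE_ROOT and is_safe_clone_dest are illustrative, not part of the ml CLI.
CLONE_ROOT="/data/clones"

is_safe_clone_dest() {
    dest=$1
    case $dest in
        *..*) return 1 ;;               # reject parent-directory traversal
        "$CLONE_ROOT"/?*) return 0 ;;   # accept strict subpaths of the root
        *) return 1 ;;                  # reject everything else, including the root itself
    esac
}

for d in "/data/clones/run1" "/data/clones" "/" "/data/clones/../home"; do
    if is_safe_clone_dest "$d"; then
        echo "ok to use the delete flag with: $d"
    else
        echo "refusing: $d"
    fi
done
```

Only the first path in the loop is accepted. A script would run the clone with the delete flag only after the check succeeds.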
Reserved keywords with special meaning for cloning
The MissingLink CLI clone command automatically translates several special keywords.
These keywords can be used in the destination path (the destFolder and destFile values).
They are detailed below, followed by some examples.
$phase: Replaced by the folder for the file's phase. Train data points are cloned to the train folder, validation data points to the validation folder, and test data points to the test folder, each keeping its name and extension.
You can also use "[email protected]" as a shortcut for $phase.
$dir: Replaced by the directory path of the files.
$id: Replaced by the id of the file.
$basename: Replaced by the name of the file, without its extension.
$ext: Replaced by the extension of the file.
$name: Replaced by the $basename + $ext of the file.
$metadata_field: Replaced by the value of the named metadata field. An example and a note are provided below to show how this reserved keyword is interpreted.
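To make the substitutions concrete, the sketch below mimics how the keywords might expand for a single file. It is a simulation in POSIX shell, not the CLI's actual implementation. `expand_keywords` and the sample file path are hypothetical, and the sketch assumes that $ext keeps the leading dot, so that $basename + $ext gives back the full filename.

```shell
#!/bin/sh
# Simulates keyword expansion for one file; this is NOT the ml CLI's own code.
# expand_keywords is a hypothetical helper. Assumption: $ext includes the dot.
expand_keywords() {
    template=$1
    phase=$2
    path=$3
    dir=$(dirname "$path")      # stands in for $dir: directory part of the path
    file=$(basename "$path")    # stands in for $name: filename with extension
    case $file in
        *.*) base=${file%.*}; ext=.${file##*.} ;;   # split at the last dot
        *)   base=$file; ext= ;;                    # no extension present
    esac
    # [$] matches a literal dollar sign, so the keywords are replaced as text.
    printf '%s\n' "$template" | sed \
        -e "s|[$]phase|$phase|g" \
        -e "s|[$]dir|$dir|g" \
        -e "s|[$]basename|$base|g" \
        -e "s|[$]ext|$ext|g" \
        -e "s|[$]name|$file|g"
}

expand_keywords '/destinationPath/$phase/$dir/$name' train 'animals/dogs/rex.jpg'
# prints: /destinationPath/train/animals/dogs/rex.jpg
```

The helper does not escape characters such as `|` or `&` in file paths, which is acceptable for an illustration but not for production use.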
For the purpose of the examples, the data set contains data points with a single metadata attribute named type_of_animal, which has the values Dog, Cat, and Fish.
1) Running the following command:
ml data clone --query '@version:versionID AND @sample:0.2 AND @split:0.5:0.25:0.25 @seed:1337' \
    --destFolder '/destinationPath/$phase/'
creates three folders named train, test, and validation under destinationPath and copies the data points into them according to the @split ratio.
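As a rough sanity check on how many files to expect in each folder, the arithmetic behind @sample:0.2 and @split:0.5:0.25:0.25 can be sketched as follows. The total of 1000 data points is a made-up figure. The sketch also assumes that sampling is applied before splitting and that the first ratio is the train share; the other two ratios are equal, so their order does not matter here. The CLI's real sampling and rounding may differ.

```shell
#!/bin/sh
# Back-of-the-envelope counts; the CLI's actual sampling and rounding may differ.
total=1000        # hypothetical number of data points in the version
sample_pct=20     # @sample:0.2
sampled=$(( total * sample_pct / 100 ))

train=$(( sampled * 50 / 100 ))     # first @split ratio, 0.5 (assumed to be train)
second=$(( sampled * 25 / 100 ))    # second @split ratio, 0.25
third=$(( sampled * 25 / 100 ))     # third @split ratio, 0.25

echo "sampled=$sampled train=$train other_two=$second,$third"
# prints: sampled=200 train=100 other_two=50,50
```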
2) Running the following command:
ml data clone --query '@version:versionID AND @sample:0.2 AND @split:0.5:0.25:0.25 @seed:1337' \
    --destFolder '/destinationPath/$phase/' --destFile '$name'
gives each copied data point its original filename.
3) Running the following command:
ml data clone --query '@version:versionID AND @sample:0.2 AND @split:0.5:0.25:0.25 @seed:1337' \
    --destFolder '/destinationPath/$phase/$dir' --destFile '$name'
creates subfolders under the train, test, and validation folders that reproduce the original folder structure of the data points from the sync command.
4) Running the following command:
ml data clone --query '@version:versionID AND @sample:0.2 AND @split:0.5:0.25:0.25 @seed:1337' \
    --destFolder '/destinationPath/$phase/$type_of_animal' --destFile '$name'
creates 'Dog', 'Cat', and 'Fish' subfolders under the train, test, and validation folders and copies the relevant data points to each subfolder according to their type_of_animal value.
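The folder layout from example 4 can be previewed without cloning anything. The loop below only prints the paths that would be expected, assuming every phase receives at least one data point of each animal type. `list_expected_folders` is a hypothetical helper and does not call the ml CLI.

```shell
#!/bin/sh
# Prints the expected folder layout for example 4.
# No files are created or cloned; list_expected_folders is illustrative only.
list_expected_folders() {
    root=$1
    for phase in train test validation; do
        for animal in Dog Cat Fish; do
            printf '%s/%s/%s\n' "$root" "$phase" "$animal"
        done
    done
}

list_expected_folders /destinationPath
```

The output is nine lines, from /destinationPath/train/Dog through /destinationPath/validation/Fish. A phase that happens to receive no data points of a given type would presumably have no subfolder for it.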