
Data commands

Create data volume

ml data create --displayName yourDataVolumeDisplayName \
 --dataPath pathToYourData

Create a new data volume with the specified displayName. The data volume will be attached to the specified organization.

  • The description flag is optional; it adds a more detailed description to the data volume.
  • The dataPath flag is optional; using it replaces the need to run the data map command on the same machine.
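
For example, a minimal invocation might look like the following sketch (the display name, description text, and local path are placeholders for illustration only):

ml data create --displayName "StreetSigns" \
    --description "Raw street-sign images collected for the demo project" \
    --dataPath ./street_signs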

Sync data points

ml data sync yourDataVolumeID --dataPath pathToYourFiles --commit commitMessage \
--processes numOfProcesses --enable_progressbar

Notes

  • The sync command will sync only the changes that are not already in the data volume. For example, if you sync a directory once and then change one file and sync again, only the changed file will be uploaded to the data volume.
  • To upload metadata, you must have a file with the same name as the data point, with the .metadata.json extension. The JSON string should be in the format: "Key":"Value".
  • We will recursively sync every file that is found in the directory or subdirectories of the path provided.
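
As a sketch, assuming the sidecar file name is formed by appending .metadata.json to the data point's file name (the file name and keys below are illustrative only), a metadata file for dog.jpg would be named dog.jpg.metadata.json and could contain:

{
    "breed": "labrador",
    "source": "camera_3"
}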

Sync data points to the specified data volume.

  • commit flag - optional; automatically commits the data version after adding the new data points.
  • processes flag - specifies the number of processes that the CLI will create in order to add the data points. The default value is 1.
  • no_progressbar flag - Hides the progress bar during the add process.
  • enable_progressbar flag (Default) - Shows the progress bar during the add process.
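
As an illustrative sketch (the volume ID, path, commit message, and process count below are made-up values):

ml data sync 158 --dataPath ./raw_images \
    --commit "add July batch" --processes 4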

Add metadata

ml data metadata add yourDataVolumeID --files pathToYourFiles \
    --data '{"meta": "data"}' --update --processes numOfProcesses \
    --enable_progressbar

Attach metadata to data points in the specified data volume.

There are three ways to specify the data points that you want to add metadata to:

1) Use the --files flag and provide the path to files that are already in your data volume.

ml data metadata add yourDataVolumeID --files pathToYourFiles \
    --data '{"meta": "data"}'

Note

We will recursively attach metadata to every file that is found in the directory or subdirectories of the path provided.

2) Use the --dataPoint flag and provide the id of a specific data point that you want to add metadata to.

ml data metadata add yourDataVolumeID --dataPoint yourDataPointID \
    --data '{"meta": "data"}'

3) Use the --dataFile flag and provide a path to a JSON file that describes which data points to add metadata to and the metadata that you wish to add.

ml data metadata add yourDataVolumeID --dataFile pathToYourDataFile

Note

The data file is a JSON file with a specific structure of dataPointID: '{metadata}'

For example, "ann/00008756_Y772PM50.json": {"entity_ID": "00008756_Y772PM50", "state": "Illinois", "type": "license plate"}
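
A sketch of what a complete data file might contain, assuming the top level is a single JSON object mapping data point IDs to their metadata (the second entry and its values are made up for illustration):

{
    "ann/00008756_Y772PM50.json": {"entity_ID": "00008756_Y772PM50", "state": "Illinois", "type": "license plate"},
    "ann/00008757_B431KQ12.json": {"entity_ID": "00008757_B431KQ12", "state": "Ohio", "type": "license plate"}
}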

There are three ways to define which metadata will be added to the specified data points.

1) Use the --data flag with a JSON parameter, e.g. '{"key":"value"}'.

2) Use the --property flag to specify the property that you wish to add and its value.

ml data metadata add yourDataVolumeID --files pathToYourFiles \
    --property propertyName propertyValue

ml data metadata add yourDataVolumeID --files pathToYourFiles \
    --propertyInt propertyName propertyValue

ml data metadata add yourDataVolumeID --files pathToYourFiles \
    --propertyFloat propertyName propertyValue
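
For instance, a string property and a float property could be attached as follows (the property names and values are made up for illustration):

ml data metadata add yourDataVolumeID --files pathToYourFiles \
    --property state Illinois

ml data metadata add yourDataVolumeID --files pathToYourFiles \
    --propertyFloat confidence 0.87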

3) Use the --dataFile flag and provide a path to a JSON file that describes which data points to add metadata to and the metadata that you wish to add.

ml data metadata add yourDataVolumeID --dataFile pathToYourDataFile

Note

The data file is a JSON file with a specific structure of dataPointID: '{metadata}'

For example, "ann/00008756_Y772PM50.json": {"entity_ID": "00008756_Y772PM50", "state": "Illinois", "type": "license plate"}

The replace and update flags allow you to control the behavior in case of conflicts when metadata is added to the same data point in the staging version.

  • update (default) - merges the two versions of the metadata and overwrites old metadata with new metadata in case of conflicts.
  • replace - deletes the old metadata and keeps only the metadata specified in the command.
  • processes flag - specifies the number of processes that the CLI will create in order to add the data points. The default value is 1.
  • no_progressbar flag - Hides the progress bar during the add process.
  • enable_progressbar flag (Default) - Shows the progress bar during the add process.
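
As a sketch of overriding the default merge behavior, assuming replace is passed as a --replace flag (only --update appears in the examples above, so the exact spelling of --replace is an assumption):

ml data metadata add yourDataVolumeID --files pathToYourFiles \
    --data '{"meta": "data"}' --replace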

Clone data to a local machine

ml data clone yourDataVolumeID --query 'queryString' \
    --destFolder 'destinationPath' --destName 'destinationFileName'

You can use the following command to view the available flags and how to pass them:

ml data clone --help

  • query flag - The query string to filter the relevant data from the data volume.
  • destFolder flag - The filepath to clone the filtered data to.
  • destName flag - The filename structure.
  • delete flag - This is used to indicate that the clone action should delete all existing data that was under the provided destination folder.

Warning

Use this flag with care: it can delete data you did not mean to delete, and there is no way to revert this action.

  • no_split - Indicates that all cloned files should be placed directly under the provided destination folder, instead of being split into three folders for the three phases: train, validation, and test.
  • processes (default 1) - The number of processes that should be used to add the files.
  • no_progressbar flag - Hides the progress bar during the add process.
  • enable_progressbar flag (Default) - Shows the progress bar during the add process.
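
For example, a sketch that clones matching data into a single folder without the phase split (the query string and destination path are placeholders; the query syntax itself is not covered here):

ml data clone yourDataVolumeID --query 'queryString' \
    --destFolder ./cloned_data --no_split --processes 4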

Reserved keywords with special meaning for cloning

There are several special keywords that the ml data clone command can translate automatically.

They are detailed below with some examples.

  • $phase - replaced by the phase folder that the file should be copied to, i.e. train data points will be cloned to the train folder, validation data points to the validation folder, and test data points to the test folder, along with their respective names and extensions.

Note

Note that you can also use "[email protected]" instead to specify [email protected] as a shortcut

In a data set where you have tagged data points with a metadata property named type_of_animal with values cat, dog, and fish, executing the following command:

ml data clone yourDataVolumeID --query 'queryString' \
    --dest '/destinationPath/[email protected]/$type_of_animal'

Under the training phase folder /train, the validation phase folder /validation, and the test phase folder /test, you will get a cat, dog, and fish folder respectively, and in each folder the data points that were tagged with the corresponding metadata.
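
A sketch of the resulting directory layout (the cloned data point files themselves would appear inside the leaf folders):

/destinationPath/
    train/
        cat/
        dog/
        fish/
    validation/
        cat/
        dog/
        fish/
    test/
        cat/
        dog/
        fish/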