Syncing Data Points
This topic shows you how to sync data to a data volume on MissingLink.
A data point can consist of one or more files. For example, in the VOC2007 dataset, each data point can have up to four files: the raw image, an XML file with annotations, a segmentation JPG file, and a classification JPG file.
To query and filter the dataset using MissingLink, you'll need to add an additional file that shares the same name as the original file but with a .metadata.json extension. This file will contain the attributes on which you wish to query the dataset and filter the data points. We'll refer to this file as queryable metadata.
For example, if you have a file named "myfile.jpg", the queryable metadata file name will be "myfile.jpg.metadata.json".
- If your data point consists of more than one file, you will need to create a queryable metadata file for each one of the files.
- It is recommended to add an attribute named data_point_id to the queryable metadata file and give it the same value for all the files that constitute the data point. For more information, see the @datapoint_by operators in our Query Syntax.
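As an illustration of the naming rule and the data_point_id recommendation above, here is a minimal Python sketch that writes one queryable metadata file per file of a data point. The helper names (write_metadata, write_data_point_metadata) and the example attribute values are hypothetical and are not part of the MissingLink SDK; only the .metadata.json suffix and the data_point_id attribute come from this topic.

```python
import json
from pathlib import Path

# Suffix described in this topic: "myfile.jpg" -> "myfile.jpg.metadata.json"
METADATA_SUFFIX = ".metadata.json"


def write_metadata(data_file, attributes):
    """Write the queryable metadata file next to data_file and return its path.

    Hypothetical helper, not a MissingLink API.
    """
    data_file = Path(data_file)
    metadata_path = data_file.with_name(data_file.name + METADATA_SUFFIX)
    metadata_path.write_text(json.dumps(attributes, indent=2))
    return metadata_path


def write_data_point_metadata(files, data_point_id, shared_attributes=None):
    """Create one metadata file per file, all sharing the same data_point_id."""
    paths = []
    for data_file in files:
        attributes = dict(shared_attributes or {})
        attributes["data_point_id"] = data_point_id
        paths.append(write_metadata(data_file, attributes))
    return paths
```

For example, calling write_data_point_metadata(["000001.jpg", "000001.xml"], "000001", {"split": "train"}) would create 000001.jpg.metadata.json and 000001.xml.metadata.json, each holding the same data_point_id value.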
Syncing data to a data volume with the MissingLink CLI
The CLI command is:
ml data sync yourDataVolumeID --data-path pathToYourData
You can copy the command from the Wizard screen in the MissingLink web dashboard.
To display the command:
Select Wizard from the menu at the right end of the Project you wish to sync.
Copy the command that appears in the Wizard.
After syncing data to the data volume, you will be able to see the data in the dashboard, under the data volume, in the staging section.
There are more examples of sync commands.
- Don't forget to create the .metadata.json files in the same folder as your current dataset. The JSON file contains a flat dictionary of attributes that can have only basic type values (string, number, boolean).
- MissingLink will recursively add every file that is found in the directory or subdirectories of the path provided.
- The ml data sync command syncs only the changes that are not yet in the data volume. For example, if you sync a directory once and then change one file and sync again, only the changed file will be uploaded to the data volume.
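Before running a sync, it can help to check the directory against the notes above. The following Python sketch walks a directory recursively, lists data files that have no .metadata.json companion, and flags metadata attributes whose values are not a string, number, or boolean. It is a local pre-check under those stated assumptions, not a description of how MissingLink validates files; the function names are hypothetical.

```python
import json
from pathlib import Path

METADATA_SUFFIX = ".metadata.json"
# Basic value types allowed in the flat metadata dictionary.
BASIC_TYPES = (str, int, float, bool)


def invalid_attributes(metadata):
    """Return the keys whose values are not a string, number, or boolean."""
    if not isinstance(metadata, dict):
        raise ValueError("metadata must be a JSON object")
    return sorted(k for k, v in metadata.items() if not isinstance(v, BASIC_TYPES))


def check_dataset(root):
    """Walk root recursively and report problems (hypothetical pre-check).

    Returns (missing, invalid): data files with no metadata file, and a
    mapping of metadata file -> keys holding non-basic values.
    """
    root = Path(root)
    missing, invalid = [], {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        relative = path.relative_to(root).as_posix()
        if path.name.endswith(METADATA_SUFFIX):
            bad = invalid_attributes(json.loads(path.read_text()))
            if bad:
                invalid[relative] = bad
        elif not path.with_name(path.name + METADATA_SUFFIX).exists():
            missing.append(relative)
    return missing, invalid
```

Nested values such as lists or dictionaries are reported as invalid because the metadata file is expected to be a flat dictionary.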
For a full description of the command and the flags available, see the CLI reference.