
Syncing Data Points

This topic shows you how to sync data to a data volume on MissingLink.

A data point can consist of one or more files. For example, in the VOC2007 dataset, each data point can have up to four files: the raw JPEG image, an XML annotation file, and two PNG segmentation masks (one for object segmentation and one for class segmentation).

To query and filter the dataset in MissingLink, add a companion file for each data file. Its name is the full original file name (including the extension) with .metadata.json appended. This file contains the attributes you want to use to query the dataset and filter data points. We'll refer to this file as queryable metadata.

For example, if you have a file named "myfile.jpg", the queryable metadata file name will be "myfile.jpg.metadata.json".
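The naming rule can be sketched in a few lines of Python. This is a minimal sketch, assuming Python 3.6+ and that you generate metadata with your own script: metadata_path and write_metadata are hypothetical helpers, not part of MissingLink, and the attributes split and num_objects are example names only.

```python
import json
import tempfile
from pathlib import Path


def metadata_path(data_file: Path) -> Path:
    """Return the queryable metadata path: the full file name plus '.metadata.json'."""
    # The original extension is kept: "myfile.jpg" -> "myfile.jpg.metadata.json"
    return data_file.with_name(data_file.name + ".metadata.json")


def write_metadata(data_file: Path, attributes: dict) -> Path:
    """Write `attributes` as a JSON object next to `data_file`; return the new path."""
    target = metadata_path(data_file)
    target.write_text(json.dumps(attributes, indent=2))
    return target


if __name__ == "__main__":
    print(metadata_path(Path("myfile.jpg")))  # myfile.jpg.metadata.json
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "myfile.jpg"
        image.write_bytes(b"")  # stand-in for a real image file
        written = write_metadata(image, {"split": "train", "num_objects": 2})
        print(written.read_text())
```

Because the metadata file is written next to the data file, it lands in the same folder, which is where the sync expects to find it.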

Note

  1. If a data point consists of more than one file, create a queryable metadata file for each of its files.
  2. We recommend adding an attribute named data_point_id to each queryable metadata file, with the same value in all the files that make up the data point. For more information, see the @group_by and @datapoint_by operators in our Query Syntax.
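If all files belonging to one data point share a base name, as in VOC2007 (for example 000032.jpg, 000032.xml, and 000032.png in different folders), a short script can give every file's queryable metadata the same data_point_id. This is a sketch under that assumption about your file layout, not something MissingLink requires; group_by_stem, tag_data_points, and the split attribute are hypothetical.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_stem(root: Path) -> dict:
    """Group data files under `root` (recursively) by base name, e.g. '000032'."""
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        # Skip folders and any queryable metadata files written earlier.
        if path.is_file() and not path.name.endswith(".metadata.json"):
            groups[path.stem].append(path)
    return dict(groups)


def tag_data_points(root: Path, extra: dict) -> int:
    """Write <file>.metadata.json for every data file, sharing data_point_id per group.

    Returns the number of metadata files written.
    """
    written = 0
    for point_id, files in group_by_stem(root).items():
        for data_file in files:
            attributes = {"data_point_id": point_id, **extra}
            target = data_file.with_name(data_file.name + ".metadata.json")
            target.write_text(json.dumps(attributes))
            written += 1
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Hypothetical VOC-style layout: one data point spread over four folders.
        for rel in ("JPEGImages/000032.jpg", "Annotations/000032.xml",
                    "SegmentationClass/000032.png", "SegmentationObject/000032.png"):
            path = root / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(b"")
        print(tag_data_points(root, {"split": "train"}))  # 4
```

Grouping is computed before any metadata is written, so the new .metadata.json files never end up in a group themselves.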


To sync your data, run the following CLI command, replacing yourDataVolumeID and pathToYourData with your own values:

ml data sync yourDataVolumeID --data-path pathToYourData

You can copy the command from the Wizard screen in the MissingLink web dashboard.

To display the command:

  1. Select Wizard from the menu at the right end of the Project you wish to sync.


  2. Copy the command that appears in the Wizard.


After you sync data to the data volume, you can see it in the dashboard, in the staging section of that data volume.


There are more examples of sync commands here.

Note

  1. Don't forget to create the .metadata.json files in the same folder as the data files they describe. Each JSON file contains a flat dictionary of attributes whose values must be basic types (string, number, or boolean).
  2. MissingLink recursively adds every file found in the provided path and its subdirectories.
  3. The ml data sync command syncs only the changes that are not yet in the data volume. For example, if you sync a directory once and then change one file and sync again, only the changed file will be uploaded to the data volume.
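Before running ml data sync, you may want to check these rules locally. The sketch below is a hypothetical pre-sync check, not a MissingLink feature: it walks the folder recursively (mirroring how the sync picks up every file), reports data files that have no queryable metadata (only a problem if you want to query them), and flags metadata that is not a flat object of strings, numbers, or booleans. It treats JSON null as invalid because the note lists only those three types; adjust it if your setup differs.

```python
import json
import tempfile
from pathlib import Path

# Allowed value types per the note: string, number (int/float), boolean.
BASIC_TYPES = (str, int, float, bool)


def is_valid_metadata(obj) -> bool:
    """True if obj is a flat dict whose values are all strings, numbers, or booleans."""
    return isinstance(obj, dict) and all(
        isinstance(key, str) and isinstance(value, BASIC_TYPES)
        for key, value in obj.items()
    )


def check_folder(root: Path) -> list:
    """Walk `root` recursively and return a list of human-readable problems."""
    problems = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if path.name.endswith(".metadata.json"):
            try:
                data = json.loads(path.read_text())
            except json.JSONDecodeError:
                problems.append(f"{path}: not valid JSON")
                continue
            if not is_valid_metadata(data):
                problems.append(f"{path}: not a flat object of basic values")
        elif not path.with_name(path.name + ".metadata.json").exists():
            problems.append(f"{path}: missing queryable metadata")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.jpg").write_bytes(b"")
        (root / "a.jpg.metadata.json").write_text('{"tags": ["cat"]}')  # list value
        (root / "b.jpg").write_bytes(b"")  # no metadata file at all
        for problem in check_folder(root):
            print(problem)
```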

For a full description of the command and the flags available, see the CLI reference.