Creating a Data Volume

MissingLink uses managed containers of datasets, called data volumes, to better structure your training data.

There are two methods available for creating a data volume:

  • Through the MissingLink web dashboard
  • By using MissingLink CLI commands

To create a data volume through MissingLink's web dashboard, follow these steps:

  1. If this is your first project, click the Data Volumes icon in the left navigation bar, then click Add a Data Volume.

    Otherwise, click the Data Volumes icon, then click New Data Volume.

  2. If you have not yet installed the MissingLink CLI and authenticated, perform those steps now. Click Next to proceed.

  3. Provide a name and description for the data volume.

  4. In the Storage list, select the location for storing your data volume:

    • By default, the data volume is stored in the MissingLink Cloud.
    • To maintain data privacy, you can sync data to a private data volume, to which MissingLink has no access. To use a private bucket, select the relevant storage location in the Storage list, or click Add Storage to create a new, private storage location.

      For more details on creating a new storage location, see Configuring Storage.

    • If you select a local location for the data volume, you must select the data volume type: embedded or linked.

      • When the data volume is of type embedded (the default), MissingLink copies all the data during sync and manages the storage in the user-assigned storage bucket.

      • When the data volume is of type linked, MissingLink does not duplicate the data; during sync it stores only links to the data. In this mode, you are responsible for not deleting or modifying files after they have been synced to the data volume.

    • If the new data volume is going to share its storage with an existing data volume, select Shared storage with existing data volume and select the data volume from the list that appears.

      For more details on sharing storage, see Sharing Storage Across Data Volumes.

  5. Click Next. Follow the instructions for syncing your data with the new data volume.

To create a data volume with the MissingLink CLI, use the ml data create command in one of the following ways:

  • For a data volume in MissingLink's secured bucket:

    ml data create --display-name yourDataVolumeDisplayName
    
  • For a data volume in a private bucket:

    ml data create --display-name yourDataVolumeDisplayName \
    --bucket yourBucketName
    

    Note

    The bucket name that you provide to MissingLink should start with a prefix:

    • For Google Cloud Storage: "gs://"
    • For Amazon S3: "s3://"
    • For Azure storage: "az://{storage_account_name}.{container_name}"

    The default prefix is "gs://" if none is provided.

  • For a data volume in local storage:

    ml data create --display-name yourDataVolumeDisplayName \
    --bucket yourLocalBucket
    

    Note

    The local bucket name that you provide to MissingLink must start with the file:// prefix, followed by the path (file://path).
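
The prefix rules in the notes above can be sketched as a pair of small shell helpers. This is a minimal sketch, not part of the MissingLink CLI: the function names with_default_prefix and local_bucket are hypothetical, and treating a local bucket name as file:// followed by an absolute path is an assumption based on the note above. The last line only prints the resulting command so that you can inspect it before running it.

```shell
# Hypothetical helper: keep a bucket name that already carries one of the
# documented prefixes; otherwise add "gs://", the documented default.
with_default_prefix() {
  case "$1" in
    gs://*|s3://*|az://*|file://*) printf '%s\n' "$1" ;;
    *) printf 'gs://%s\n' "$1" ;;
  esac
}

# Hypothetical helper: build a local bucket name from a path.
# Assumes the path is absolute, for example /data/images.
local_bucket() {
  printf 'file://%s\n' "$1"
}

# Print (do not run) the command with the normalized bucket name.
echo ml data create --display-name yourDataVolumeDisplayName \
  --bucket "$(with_default_prefix yourBucketName)"
```

Spelling out the prefix, even when it matches the default, makes it clear from the command alone which storage provider the data volume uses.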

After you run the ml data create command, the new data volume appears in the web dashboard under the Data Management tab.

For a full description of the command and the flags available, see the MissingLink CLI reference.