Creating a Data Volume

This topic describes how to create a managed dataset (data volume) on MissingLink.ai.

You can set up a data volume with the following steps through MissingLink's web dashboard.

  1. If this is your first project, click the Data Volumes icon in the navigation bar at left, then click Add a Data Volume.

    If you are adding a data volume, click the Projects icon, then click New Data Volume.

  2. Provide a name and description for the data volume, then choose where to store it. At MissingLink.ai, we understand the importance of data privacy, so you can keep a data volume in your own private bucket, which we have no access to.

    • By default, the data volume is stored in the MissingLink secured bucket. To use this option, click Next and follow the wizard.
    • To use a private bucket, click the Set up your storage link and proceed to the configuration screen.

  3. Configure the private storage bucket:

  4. Click Add Storage.

  5. From the list, choose a cloud provider. Provide a name for the bucket. Click Add.

    The new bucket appears in the list of storage buckets.

  6. Validate that your machine has access to the cloud service:

    • For Google Cloud: If your private bucket resides on Google Cloud, make sure that the machine you run the commands from has access to Google Cloud Storage. Run the following command:

       gsutil ls
      

      Make sure that the output includes the name of the bucket where the data volume resides.

      Note

      Each machine that will access the data volume through the MissingLink CLI needs access to the Google Cloud Storage bucket.

    • For AWS S3: If your private bucket resides on AWS S3, make sure that the machine you run the commands from has access to S3. Run the following command:

       aws s3 ls
      

      Make sure that the output includes the name of the bucket where the data volume resides.

      Note

      Each machine that will access the data volume through the MissingLink CLI needs access to the S3 bucket.

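    • For Azure: If your private bucket resides on Azure, make sure that the machine you run the commands from has access to the storage container. Run the following command, replacing the placeholders with your storage account name and key (you can omit these arguments if your Azure storage credentials are already configured in your environment):

       az storage container list --account-name yourStorageAccountName --account-key yourAccountKey
      

      Make sure that the output includes the name of the container where the data volume resides.

      Note

      Each machine that will access the data volume through the MissingLink CLI needs access to the storage container.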
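Each of the checks above comes down to confirming that a name appears in a listing. If you want to script that check, for example on every machine that will use the CLI, the following is a minimal POSIX shell sketch. The bucket_visible helper is hypothetical (it is not part of MissingLink or of the cloud CLIs), and it assumes the default text output of gsutil ls (one gs://bucket-name/ per line) and aws s3 ls (bucket name in the last column), plus az output reduced to bare container names with --query "[].name" --output tsv.

```shell
# bucket_visible NAME
# Hypothetical helper: reads a provider's bucket listing on stdin and succeeds
# (exit status 0) if NAME appears in it; fails (exit status 1) otherwise.
bucket_visible() {
  local name="$1" line
  # "|| [ -n "$line" ]" also processes a final line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line%/}"               # gsutil ls prints "gs://bucket-name/"
    line="${line##*[[:space:]]}"   # aws s3 ls prints "DATE TIME bucket-name"; keep the last field
    line="${line#gs://}"           # drop the gs:// prefix so only the bare name is compared
    if [ "$line" = "$name" ]; then
      return 0
    fi
  done
  return 1
}

# Examples (each needs the respective CLI installed and authenticated):
#   gsutil ls | bucket_visible yourBucketName && echo "bucket reachable"
#   aws s3 ls | bucket_visible yourBucketName
#   az storage container list --account-name yourStorageAccountName \
#     --query "[].name" --output tsv | bucket_visible yourContainerName
```

Reading the listing from stdin keeps the helper independent of any one provider, so the same function serves all three commands.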

You can also create a data volume using the MissingLink CLI:

  • To create a data volume on MissingLink's secured bucket:

    ml data create --displayName yourDataVolumeDisplayName
    
  • To create a data volume on your private bucket:

    ml data create --displayName yourDataVolumeDisplayName \
    --bucket yourBucketName
    

    Note

    The bucket name that you provide to MissingLink should start with a prefix that identifies the cloud provider:

    • For Google Cloud: "gs://"
    • For Amazon S3: "s3://"
    • For Azure storage: "az://{storage_account_name}.{container_name}"

    If you omit the prefix, "gs://" is assumed.
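The prefix rule in the note above can be applied explicitly before calling the CLI. The following is a minimal POSIX shell sketch; normalize_bucket is a hypothetical helper, not part of the MissingLink CLI. It adds the documented default gs:// prefix when none is given and, as an extra assumption of this sketch, rejects Azure names that do not follow the az://{storage_account_name}.{container_name} form.

```shell
# normalize_bucket NAME
# Hypothetical helper: prints NAME with a provider prefix, defaulting to gs://
# as the note describes. Fails on empty names and on Azure names without
# the {storage_account_name}.{container_name} form (an assumption of this sketch).
normalize_bucket() {
  case "$1" in
    '')
      printf 'error: empty bucket name\n' >&2
      return 1 ;;
    gs://?*|s3://?*)
      printf '%s\n' "$1" ;;
    az://?*.?*)
      printf '%s\n' "$1" ;;
    az://*)
      printf 'error: Azure buckets use az://{storage_account_name}.{container_name}\n' >&2
      return 1 ;;
    *)
      printf 'gs://%s\n' "$1" ;;   # no prefix given: apply the gs:// default
  esac
}

# Example (not executed here):
#   bucket=$(normalize_bucket yourBucketName) || exit 1
#   ml data create --displayName yourDataVolumeDisplayName --bucket "$bucket"
```

Making the default explicit means the bucket you pass is exactly the bucket MissingLink will use, which is easier to review in scripts and logs.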

After you run either create command, the new data volume appears in the web dashboard under the Data Management tab.

Flags for creating a data volume

To view the flags available for the ml data create command, run:

ml data create --help

The available flags are:

  • display-name: Display name of the data volume
  • description (optional): Description of the data volume
  • org: Name of the organization associated with the data volume
  • bucket (optional): Name of your private bucket
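When scripting data volume creation, it can help to add the optional flags only when a value is supplied. The following minimal POSIX shell sketch does that; ml_data_create and the DRY_RUN variable are hypothetical. It uses the --displayName and --bucket spellings from the create examples above; the --description and --org spellings are assumptions based on the flag list, so confirm them against the output of ml data create --help before relying on them. With DRY_RUN=1 the command is only printed (arguments joined by spaces, without shell quoting).

```shell
# ml_data_create DISPLAY_NAME [DESCRIPTION] [ORG] [BUCKET]
# Hypothetical wrapper: builds an "ml data create" command, adding each optional
# flag only when its value is non-empty. With DRY_RUN=1 it prints the command
# instead of running it.
ml_data_create() {
  local name="$1" description="${2:-}" org="${3:-}" bucket="${4:-}"
  set -- ml data create --displayName "$name"
  # Assumption: --description and --org are the flag spellings; verify with --help.
  if [ -n "$description" ]; then set -- "$@" --description "$description"; fi
  if [ -n "$org" ]; then set -- "$@" --org "$org"; fi
  if [ -n "$bucket" ]; then set -- "$@" --bucket "$bucket"; fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # arguments joined by spaces, not shell-quoted
  else
    "$@"
  fi
}

# Example:
#   DRY_RUN=1 ml_data_create yourDataVolumeDisplayName "" yourOrg s3://yourBucketName
#   prints: ml data create --displayName yourDataVolumeDisplayName --org yourOrg --bucket s3://yourBucketName
```

Building the argument list with set -- keeps each value as a single argument when the command actually runs, even if it contains spaces.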