Sharing Storage Across Data Volumes

Data volumes can share the same storage. This feature is especially useful for developing multiple "views" of a set of data.

Let's assume that a team of data scientists is creating several models based on the same data points, which consist of images of urban settings. Each member of the team is tagging a specific aspect of the data: Joan is tagging traffic lights, Simon is tagging buildings, while Marc is tagging cars and buses. In this scenario, there is a single, shared copy of the data, while the metadata, indexes and versions that reference it are unique to the needs of the scientists creating the models.

Because the data is stored only once, sharing it in this way saves significant time, storage space and cost.

When you first create a data volume, you either assign it a dedicated storage area or let it share the storage that is already assigned to an existing data volume.

For more information, see Creating a Data Volume.

Examples

  • A user creates a regular data volume, DV1, and assigns it to bucket A on their cloud storage. Running the command ml data sync DV1_Id --data-path PathToFilesA copies all the dataset files from the directory FilesA to bucket A.
  • The user then creates a shared data volume, DV2, and links it to DV1. Running the command ml data sync DV2_Id --data-path PathToFilesA copies none of the dataset files from FilesA to bucket A, because DV2 uses the copy that DV1 already stored there.
  • Two users can now maintain two separate sets of metadata on the files in FilesA, one in DV1 and the other in DV2. A query run on items in either data volume returns the same files.
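
The behavior in the examples above can be illustrated with a small conceptual model. The following Python sketch is not the ml client or its API: the Storage and DataVolume classes, their methods, and the file names are all hypothetical, invented here only to show one physical copy of the files being referenced by two independent sets of metadata. In particular, the sketch assumes that a sync skips any file already present in storage; how the real tool decides what to copy is not described on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Hypothetical stand-in for a bucket: one physical copy per file."""
    files: dict = field(default_factory=dict)  # file name -> content

    def upload(self, name, content):
        """Store a file; return True only if it was not already stored."""
        if name in self.files:
            return False
        self.files[name] = content
        return True


@dataclass
class DataVolume:
    """Hypothetical data volume: a view over a Storage with private metadata."""
    name: str
    storage: Storage
    metadata: dict = field(default_factory=dict)  # file name -> tags

    def sync(self, local_files):
        """Index every local file; copy only those the storage lacks."""
        copied = 0
        for fname, content in local_files.items():
            if self.storage.upload(fname, content):
                copied += 1
            self.metadata.setdefault(fname, {})
        return copied

    def tag(self, fname, **tags):
        """Attach metadata to a file in this volume only."""
        self.metadata[fname].update(tags)

    def query(self, **criteria):
        """Return the files whose metadata matches all criteria."""
        return sorted(
            fname for fname, tags in self.metadata.items()
            if all(tags.get(k) == v for k, v in criteria.items())
        )


# Illustrative dataset standing in for the directory FilesA.
files_a = {"street1.jpg": b"image-1", "street2.jpg": b"image-2"}

bucket_a = Storage()
dv1 = DataVolume("DV1", bucket_a)     # regular volume assigned to bucket A
dv2 = DataVolume("DV2", dv1.storage)  # shared volume linked to DV1

copied_dv1 = dv1.sync(files_a)  # files are copied into bucket A
copied_dv2 = dv2.sync(files_a)  # nothing is copied; files already stored

# Each volume keeps its own metadata on the same underlying files.
dv1.tag("street1.jpg", traffic_lights=3)
dv2.tag("street1.jpg", buildings=5)
```

In this model, bucket_a holds two files after both syncs, dv1 and dv2 list the same files when queried without criteria, and a tag added in one volume never appears in the other.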