Sharing Storage Across Data Volumes
Data volumes can share the same storage. This feature is especially useful for developing multiple "views" of a set of data.
Let's assume that a team of data scientists is creating several models based on the same data points, which consist of images of urban settings. Each member of the team is tagging a specific aspect of the data: Joan is tagging traffic lights, Simon is tagging buildings, while Marc is tagging cars and buses. In this scenario, there is a single, shared copy of the data, while the metadata, indexes and versions that reference it are unique to the needs of the scientists creating the models.
Sharing data in this way saves significant time, storage space and cost.
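The arrangement can be pictured with a small Python sketch. It is only an illustrative model, not the product's API: the names `shared_files`, `views` and `tag`, as well as the file names and tag values, are hypothetical.

```python
# Illustrative model only: every name and value below is hypothetical.

# One shared copy of the data (file name -> image bytes).
shared_files = {
    "street_001.jpg": b"<image bytes>",
    "street_002.jpg": b"<image bytes>",
}

# Each scientist keeps a private metadata view that refers to the
# shared files by name instead of copying them.
views = {"Joan": {}, "Simon": {}, "Marc": {}}


def tag(owner, file_name, **metadata):
    """Attach metadata to a shared file inside one owner's view only."""
    if file_name not in shared_files:
        raise KeyError(f"unknown file: {file_name}")
    views[owner].setdefault(file_name, {}).update(metadata)


tag("Joan", "street_001.jpg", traffic_lights=2)
tag("Simon", "street_001.jpg", buildings=5)
tag("Marc", "street_002.jpg", cars=3, buses=1)
```

Joan's and Simon's tags on `street_001.jpg` live in separate views, so neither overwrites the other, and the image itself is stored only once.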
When you first create a data volume, you either assign it a dedicated storage area or let it share the storage that is already assigned to an existing data volume.
For more information, see Creating a Data Volume.
- A user creates a regular data volume DV1 and assigns it to bucket A on their cloud storage. By using the command ml data sync DV1_Id --data-path PathToFilesA, all the dataset files are copied to bucket A from the directory given in --data-path.
- Now, the user creates a shared data volume DV2 and links it to volume DV1. Using the command ml data sync DV2_Id --data-path PathToFilesA, none of the dataset files are copied from the directory given in --data-path, because they are already stored in bucket A.
- Now, two users can maintain two sets of metadata on the files in FilesA, one in DV1 and the other in DV2. Also, a query run on items on either data volume will return the same files.
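The three steps above can be mimicked with a toy Python model. This is a sketch under stated assumptions, not the real `ml` client: `Storage`, `DataVolume`, `sync` and `query` are hypothetical names, the file names are made up, and the model assumes a file is identified by its name alone.

```python
# Toy model of the example above; not the real `ml` client.
# Storage, DataVolume, sync and query are hypothetical names.


class Storage:
    """Stands in for a cloud bucket: file name -> contents."""

    def __init__(self, name):
        self.name = name
        self.files = {}


class DataVolume:
    """A volume owns its metadata but only references its storage."""

    def __init__(self, volume_id, storage):
        self.volume_id = volume_id
        self.storage = storage
        self.metadata = {}  # file name -> this volume's own metadata

    def sync(self, local_files):
        """Copy files missing from storage; return how many were copied."""
        copied = 0
        for name, contents in local_files.items():
            if name not in self.storage.files:
                self.storage.files[name] = contents
                copied += 1
            # Every synced file gets a metadata slot in this volume.
            self.metadata.setdefault(name, {})
        return copied

    def query(self):
        """Return the names of the files visible through this volume."""
        return sorted(self.storage.files)


files_a = {"img1.jpg": b"...", "img2.jpg": b"..."}

bucket_a = Storage("A")
dv1 = DataVolume("DV1", bucket_a)     # regular volume assigned to bucket A
dv2 = DataVolume("DV2", dv1.storage)  # shared volume linked to DV1

copied_by_dv1 = dv1.sync(files_a)  # both files are copied to bucket A
copied_by_dv2 = dv2.sync(files_a)  # nothing is copied, files already exist

# Each volume keeps its own metadata on the same file.
dv1.metadata["img1.jpg"]["label"] = "traffic light"
dv2.metadata["img1.jpg"]["label"] = "building"
```

In this model the second sync copies no files, both volumes return the same file list from `query()`, and the labels stored on `img1.jpg` stay independent because each volume holds its own `metadata` dictionary.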