Data Version Control
This topic describes how to use data version control in MissingLink to manage the versions of your data volumes.
When working with datasets, you might need to check out a specific earlier version of a data volume, especially when changes to a dataset did not produce the desired results.
Creating and committing versions of your data volumes makes this possible.
Each data volume version is immutable: to change the dataset, you commit a new version.
When you perform a regular sync to the data volume, that is, a sync without an explicit commit, you create a staging version of the data. The staging version appears in the dashboard under the data volume, marked Staging, and you can commit it later as a new version.
Unless you specify a version, MissingLink references the staging version by default.
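Staging can also be driven from a script. The following is a minimal sketch, assuming the ml CLI is installed and logged in; stage_paths is a hypothetical helper, not part of MissingLink, and it uses only the ml data sync form with --data-path that appears later in this topic. Because none of the calls passes --commit, each one is a regular sync, so the synced data should stay in the staging version until you commit or unstage it.

```shell
# Hypothetical helper: sync one or more local paths into a data volume
# WITHOUT committing, so the data lands in the volume's staging version.
# Assumes the MissingLink CLI (`ml`) is on PATH and authenticated.
stage_paths() {
  local volume_id="$1"
  shift
  local path
  for path in "$@"; do
    # Regular sync (no --commit): adds this path's data to staging.
    # Stop at the first failed sync and return its exit status.
    ml data sync "$volume_id" --data-path "$path" || return
  done
}
```

For example, stage_paths yourDataVolumeID ./train ./validation stages both directories; nothing becomes a committed version until you commit.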
Committing a staged version
You can commit a staged version either from the MissingLink dashboard or by running a command from the CLI.
Option 1 - using the MissingLink dashboard
Inside the Staging area, click Commit.
Provide a description for the commit and click Commit.
The new commit is added to the list of versions.
Option 2 - using the MissingLink CLI
Run the ml data commit command:
ml data commit yourDataVolumeID --message "your commit message"
A new commit is added to the web dashboard under the data volume.
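When committing from scripts, the CLI command can be wrapped in a small function. This is a sketch under stated assumptions, not part of MissingLink: commit_staging is a hypothetical name, and the check that rejects an empty message is our own addition, mirroring the description step in the dashboard.

```shell
# Hypothetical helper: commit the current staging version of a data volume
# as a new, immutable version. Assumes `ml` is on PATH and authenticated.
commit_staging() {
  local volume_id="$1" message="$2"
  # Our own guard (not a MissingLink rule): refuse to commit without a description.
  if [ -z "$message" ]; then
    echo "commit_staging: a commit message is required" >&2
    return 1
  fi
  ml data commit "$volume_id" --message "$message"
}
```

For example, commit_staging yourDataVolumeID "your commit message" runs the same command as above.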
You can also commit as part of a sync by running the ml data sync command with the --commit flag:
ml data sync yourDataVolumeID --data-path yourDataPath --commit "your commit message"
Note that the commit includes all uncommitted changes in the new version, not only the changes from that sync command.
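The note can be made concrete with a short sketch. It assumes the ml CLI is installed and logged in; sync_then_commit is a hypothetical helper and the paths are placeholders. The first call is a regular sync that only stages data, and only the second call passes --commit, yet according to the note the resulting version should contain the data from both syncs.

```shell
# Hypothetical helper illustrating that a commit captures ALL uncommitted
# changes, not only the data passed to the command that commits.
# Assumes `ml` is on PATH and authenticated.
sync_then_commit() {
  local volume_id="$1" first_path="$2" second_path="$3" message="$4"
  # Regular sync: goes to the staging version, nothing is committed yet.
  ml data sync "$volume_id" --data-path "$first_path" || return
  # Sync with --commit: commits the staging version, which now holds the
  # data from both syncs, as one new version.
  ml data sync "$volume_id" --data-path "$second_path" --commit "$message"
}
```

If you want the two datasets in separate versions instead, commit after the first sync as well.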
Unstaging a version using the MissingLink dashboard
You can unstage a version that has been staged, but not yet committed.
Inside the Staging area, click Unstage.
Confirm that you want to unstage your changes.
Note: When you unstage, all of the data you unstage is lost.