Introduction to Management of Deep Learning Datasets
MissingLink helps data engineers streamline and automate the entire deep learning cycle: data, code, experiments and resources. It eliminates the grunt work and significantly shortens the time it takes to train and deliver effective models.
When it comes to data management, MissingLink provides an accessible smart data lake that manages your datasets and their versions. It automates data exploration, versioning, and curation.
Data structuring and versioning
Use MissingLink's dataset containers, called "data volumes", to better structure your training data. Each data volume can hold an unlimited number of versions.
Immutable data versions
Once a data version is committed, it cannot be overwritten. Any further changes are automatically added to a new version.
Data protection
Your data will never leave your cloud. MissingLink can manage your deep learning datasets on your own private storage.
Data evaluation
A staging version lets you evaluate data quality before you commit a new data version.
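The relationship between a data volume, its staging version, and its immutable committed versions can be sketched in a few lines of Python. This is only an illustrative in-memory model, not MissingLink's API: the `DataVolume` class and its `add`, `staged`, `commit`, and `version` methods are hypothetical names chosen for this example.

```python
from types import MappingProxyType


class DataVolume:
    """Hypothetical in-memory model of a data volume (not the MissingLink API)."""

    def __init__(self, name):
        self.name = name
        self._versions = []  # committed snapshots, oldest first
        self._staging = {}   # mutable working area: path -> metadata

    def add(self, path, **metadata):
        """Stage a file and its metadata; committed versions are untouched."""
        self._staging[path] = dict(metadata)

    def staged(self):
        """Read-only view of the staging area, for review before committing."""
        return MappingProxyType(self._staging)

    def commit(self):
        """Freeze a copy of the staging area as a new version; return its number."""
        snapshot = MappingProxyType(
            {path: MappingProxyType(dict(meta)) for path, meta in self._staging.items()}
        )
        self._versions.append(snapshot)
        return len(self._versions)

    def version(self, number):
        """Return the read-only snapshot for a 1-based version number."""
        return self._versions[number - 1]


volume = DataVolume("images")
volume.add("cat.jpg", label="cat")
first = volume.commit()
volume.add("dog.jpg", label="dog")  # lands in staging, not in the first version
second = volume.commit()
```

In this sketch a commit copies the staging area into a read-only snapshot, so later changes can only ever appear in a newer version, and the staging area can be inspected through `staged()` before anything is committed.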
Data exploration
Search for the right training data from among your existing datasets.
MissingLink’s data exploration capabilities help you understand how your deep learning datasets are actually structured and give you better insight into your data before training. Advanced slicing of your data takes a single query.
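To make the idea of slicing a dataset with a query concrete, here is a minimal sketch in plain Python. It does not use MissingLink's query syntax, which is not shown here; the `select` helper and the metadata fields (`label`, `width`) are assumptions made up for this example.

```python
def select(records, **conditions):
    """Return the records whose metadata satisfies every condition.

    A condition value is either a plain value (tested for equality)
    or a callable predicate applied to the record's field.
    """
    def matches(record):
        for key, expected in conditions.items():
            if key not in record:
                return False
            value = record[key]
            ok = expected(value) if callable(expected) else value == expected
            if not ok:
                return False
        return True

    return [record for record in records if matches(record)]


# Hypothetical metadata for a small image dataset.
records = [
    {"path": "a.jpg", "label": "cat", "width": 640},
    {"path": "b.jpg", "label": "dog", "width": 1280},
    {"path": "c.jpg", "label": "cat", "width": 1920},
]

# One query: cat images at least 1000 pixels wide.
large_cats = select(records, label="cat", width=lambda w: w >= 1000)
```

The point of the sketch is that a slice is described declaratively by its conditions, so the same description can be stored and re-run later against the same data version.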
Easy experiment reproducibility and debriefing
Each experiment is saved with a reference to its data query, so reproducing an experiment takes a single click.
You can also compare the data queries of different experiments and analyze how different datasets affect experiment performance.
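One way to picture an experiment that carries a reference to its data is the sketch below. It is a simplified illustration under stated assumptions, not MissingLink's implementation: `DataReference`, `Experiment`, and `compare` are hypothetical names, and the query strings are placeholders rather than real query syntax.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataReference:
    """Hypothetical pointer to the exact data an experiment used."""
    volume: str
    version: int
    query: str  # placeholder text, not real MissingLink query syntax


@dataclass
class Experiment:
    name: str
    data: DataReference
    metrics: dict = field(default_factory=dict)


def compare(a, b):
    """Report whether two experiments used the same data and how shared metrics differ."""
    shared = sorted(set(a.metrics) & set(b.metrics))
    return {
        "same_data": a.data == b.data,
        "metric_deltas": {m: b.metrics[m] - a.metrics[m] for m in shared},
    }


baseline = Experiment(
    "baseline", DataReference("images", 1, "all images"), {"accuracy": 0.5}
)
filtered = Experiment(
    "filtered", DataReference("images", 2, "large cat images"), {"accuracy": 0.75}
)

report = compare(baseline, filtered)
```

Because the data reference is stored alongside the metrics, a difference in results can be traced back to a difference in the volume, version, or query that produced the training data.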