
Introduction to Management of Deep Learning Datasets

MissingLink helps data engineers streamline and automate the entire deep learning cycle: data, code, experiments and resources. It eliminates the grunt work and significantly shortens the time it takes to train and deliver effective models.

For data management, MissingLink provides an accessible data lake that manages your datasets and their versions. It automates data exploration, versioning, and curation.

Data structuring and versioning

Use MissingLink "data volumes", which are containers for datasets, to better structure your training data. Each data volume can hold an unlimited number of versions.

Immutable data versions

Once a data version is committed, it cannot be overwritten. Any later changes are automatically added to a new version.
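To make the idea concrete, here is a minimal in-memory sketch of a volume whose committed versions are read-only. It is not MissingLink's API: the `DataVolume` class, its `commit` and `get` methods, and the file hashes are all hypothetical names chosen for illustration.

```python
from types import MappingProxyType


class DataVolume:
    """Hypothetical data volume: every commit creates a new read-only version."""

    def __init__(self, name):
        self.name = name
        self._versions = []  # version N is stored at index N - 1

    def commit(self, files):
        """Snapshot `files` (path -> content hash) and return the new version number."""
        snapshot = MappingProxyType(dict(files))  # read-only view of a private copy
        self._versions.append(snapshot)
        return len(self._versions)

    def get(self, version):
        """Return the read-only file listing of a committed version."""
        return self._versions[version - 1]


volume = DataVolume("street-signs")
v1 = volume.commit({"img/001.png": "a1f3"})
# A change never edits v1; it is committed as a new version built on top of it.
v2 = volume.commit({**volume.get(v1), "img/002.png": "9c2e"})
```

In this sketch, trying to assign into `volume.get(v1)` raises a `TypeError`, so the only way to change the data is to commit another version.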

Data protection

Your data will never leave your cloud. MissingLink can manage your deep learning dataset on your own private storage.

Data evaluation

A staging version lets you evaluate data quality before committing a new data version.
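One way to picture the staging step is as a quality gate in front of the commit. The sketch below is an assumption about how such a gate could work, not MissingLink's implementation: the functions `evaluate_staging` and `commit_if_clean`, the record fields, and the `min_per_label` threshold are hypothetical.

```python
def evaluate_staging(staged, min_per_label=2):
    """Return a list of problems found in the staged records (empty means clean)."""
    problems = []
    counts = {}
    for record in staged:
        label = record.get("label")
        if not label:
            problems.append(f"{record['path']}: missing label")
            continue
        counts[label] = counts.get(label, 0) + 1
    for label, n in sorted(counts.items()):
        if n < min_per_label:
            problems.append(f"label '{label}': only {n} example(s)")
    return problems


def commit_if_clean(staged, committed_versions):
    """Append a frozen copy of the staged records only when all checks pass."""
    problems = evaluate_staging(staged)
    if problems:
        return None, problems
    committed_versions.append(tuple(dict(r) for r in staged))
    return len(committed_versions), []


versions = []
staged = [
    {"path": "img/001.png", "label": "stop"},
    {"path": "img/002.png", "label": "stop"},
    {"path": "img/003.png", "label": None},
]

# First attempt is rejected because one record has no label.
first_version, first_problems = commit_if_clean(staged, versions)

# Fix the record while it is still in staging, then commit.
staged[2]["label"] = "stop"
second_version, second_problems = commit_if_clean(staged, versions)
```

The staged records stay editable until the checks pass; only then is a frozen copy appended as a new version.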

Data exploration

Search for the right training data from among your existing datasets.

MissingLink's data exploration capabilities show you how your deep learning datasets are structured and give you better insight into your data before training. Advanced slicing of your data is one query away.
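As a rough illustration of exploring and slicing a dataset by its metadata, the sketch below uses plain Python over a handful of made-up records. It does not use MissingLink's query syntax; the `summarize` and `slice_records` helpers and the metadata fields are hypothetical.

```python
from collections import Counter

# Made-up metadata records, one per data point.
records = [
    {"path": "img/001.png", "label": "stop", "split": "train", "width": 640},
    {"path": "img/002.png", "label": "stop", "split": "val", "width": 640},
    {"path": "img/003.png", "label": "yield", "split": "train", "width": 1280},
    {"path": "img/004.png", "label": "stop", "split": "train", "width": 1280},
]


def summarize(records, key):
    """Count how many records carry each value of a metadata field."""
    return Counter(r[key] for r in records)


def slice_records(records, **conditions):
    """Keep the records whose fields equal every given condition."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in conditions.items())
    ]


label_counts = summarize(records, "label")  # exploration: how is the data distributed?
train_stops = slice_records(records, label="stop", split="train")  # a slice for training
```

A summary like `label_counts` exposes class imbalance before training, and a slice like `train_stops` selects exactly the subset an experiment should see.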

Easy experiment reproducibility and debriefing

Each experiment is saved with a reference to its data query, making experiment reproduction as easy as one click.

You can easily compare the data queries between experiments and analyze the impact of different datasets on experiment performance.
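The sketch below shows one way an experiment could record which data it used and how two such records could be compared. It is an assumption for illustration only: `DataReference`, `make_reference`, and `diff_references` are hypothetical names, not part of MissingLink.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataReference:
    """Hypothetical record of the data an experiment was trained on."""
    volume: str
    version: int
    query: tuple  # sorted (field, value) pairs


def make_reference(volume, version, **conditions):
    """Build a reference from a volume, a committed version, and query conditions."""
    return DataReference(volume, version, tuple(sorted(conditions.items())))


def _flatten(ref):
    """Turn a reference into a flat dict so two references can be compared by key."""
    flat = {"volume": ref.volume, "version": ref.version}
    flat.update({f"query.{field}": value for field, value in ref.query})
    return flat


def diff_references(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    fa, fb = _flatten(a), _flatten(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(set(fa) | set(fb))
        if fa.get(key) != fb.get(key)
    }


baseline = make_reference("street-signs", 3, label="stop")
candidate = make_reference("street-signs", 4, label="stop", split="train")
changes = diff_references(baseline, candidate)
```

Because the reference pins an immutable version together with the query, rerunning an experiment from the same reference selects the same data, and `changes` lists exactly what differed between two runs.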