Querying Data Volumes in the MissingLink Dashboard
You can build queries in the query editor inside the dashboard. The queries can then be the basis for cloning the data or for streaming it directly to the experiment you are working with.
The simplest query in the dashboard returns the full set of data points in any data volume. To run it, display the data volume that you want to query, then click Run, without specifying a query.
Each result is an item together with its associated metadata. Labels for the data are displayed as column headers. Click an item to see more detailed information about it.
A more typical query returns a subset of the data points. For example, assume you have a dataset based on ChestXray14. Running a query on a label returns only the subset of the data points that have that label.
You can find a specific subset of data by running more complex queries. For example, you can query for the images that meet the following criteria: all female patients, ages 18 to 55, and fewer than five follow-ups. Here's the query, shown in the dashboard:
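To make the three criteria concrete, here is a minimal Python sketch of the same selection logic applied to in-memory records. It is not the dashboard's query syntax and does not call any MissingLink API. The field names (`gender`, `age`, `follow_ups`), the sample records, and the choice to treat the age range as inclusive are all assumptions made for illustration.

```python
# Hypothetical metadata records; field names and values are illustrative only.
records = [
    {"id": "img_001", "gender": "F", "age": 34, "follow_ups": 2},
    {"id": "img_002", "gender": "M", "age": 41, "follow_ups": 1},
    {"id": "img_003", "gender": "F", "age": 61, "follow_ups": 0},
    {"id": "img_004", "gender": "F", "age": 18, "follow_ups": 5},
    {"id": "img_005", "gender": "F", "age": 55, "follow_ups": 4},
]


def matches(record: dict) -> bool:
    """Female patients, ages 18 to 55 (assumed inclusive), fewer than five follow-ups."""
    return (
        record["gender"] == "F"
        and 18 <= record["age"] <= 55
        and record["follow_ups"] < 5
    )


# Keep only the records that satisfy all three conditions at once.
selected = [r["id"] for r in records if matches(r)]
print(selected)
```

All three conditions are combined with a logical AND, so a record that fails any one of them is excluded.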
You can use operators to further slice the query results. For example, add @split:0.6:0.2:0.2 to the query to divide the data into train, test, and validation subsets. You can also change the sample size by adding an operator to the query.
For more information about the syntax for building queries, see Query Syntax.
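As a rough illustration of what a 0.6:0.2:0.2 split means, the following Python sketch shuffles a set of indices and cuts it into three consecutive parts. It only shows the proportions and is not how MissingLink implements @split. The function name `split_indices`, the seed parameter, and the assumption that the fractions follow the order train, test, validation are hypothetical.

```python
import random
from typing import List, Sequence


def split_indices(n: int, fractions: Sequence[float], seed: int = 0) -> List[List[int]]:
    """Shuffle range(n) and cut it into consecutive parts sized by `fractions`."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable

    parts = []
    start = 0
    cumulative = 0.0
    for position, fraction in enumerate(fractions):
        cumulative += fraction
        # The last part takes whatever remains, so rounding never drops a point.
        end = n if position == len(fractions) - 1 else round(cumulative * n)
        parts.append(indices[start:end])
        start = end
    return parts


# Assumed order: train, test, validation.
train, test, validation = split_indices(1000, (0.6, 0.2, 0.2))
print(len(train), len(test), len(validation))
```

Every index lands in exactly one subset, and a fixed seed gives the same assignment on every run.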
When you operate on a data volume in the staging section, your queries run only on the data currently in the staging version.
When you operate on a specific version, your queries run on all the data up to and including that particular commit. For example, if your dataset has two versions and you added 1000 data points in each, an empty query on the latest version returns 2000 data points.
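The cumulative behavior of versions can be sketched in a few lines of Python. This is a toy model of the arithmetic in the example above, not MissingLink's storage model. The version names `v1` and `v2` and the `visible_count` helper are hypothetical.

```python
# Hypothetical commit history: each version records only the points it added.
added_per_version = [
    ("v1", 1000),
    ("v2", 1000),
]


def visible_count(version: str) -> int:
    """Count the data points an empty query sees at `version`:
    everything added up to and including that version."""
    total = 0
    for name, added in added_per_version:
        total += added
        if name == version:
            return total
    raise KeyError(version)


print(visible_count("v1"))  # 1000
print(visible_count("v2"))  # 2000
```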
Cloning data volumes
MissingLink Data Volumes allow you to clone the data you have filtered using queries to a new location, based on your experiment’s needs. For more information about cloning data volumes, see Cloning Data Volumes.