Microsoft Azure Authorization Process
This page provides a deeper look at the process you set in motion when you authorize Resource Management to access your Azure account using the ml resources azure init command.
To fully understand the description that follows, first familiarize yourself with some key Azure terminology:
Azure Active Directory: Enables you to manage access to Azure services and resources securely. Using AD, you can create and manage Azure users and groups, and use permissions to allow and deny their access to Azure resources.
Template-based Deployment: Templates provide a common language for you to describe and provision all the infrastructure resources in your cloud environment. You define the template once, then to run a new deployment, you provide the specific parameters. All the resources specified in the template are created in an automated and secure manner. This template serves as the "single source of truth" for your cloud environment.
Key Vault: A managed service that makes it easy for you to create and control the encryption keys used to encrypt your data, and uses FIPS 140-2 validated hardware security modules to protect the security of your keys.
Managed Identity: A feature of Azure Active Directory (Azure AD). A managed identity gives an Azure resource, such as a virtual machine, an identity it can use to authenticate to cloud services, so you do not need to keep credentials in your code.
Service Principal: An Azure service principal is a security identity used by user-created apps, services, and automation tools to access specific Azure resources.
Virtual Network: Lets you provision a logically isolated section of the Azure Cloud where you can launch Azure resources in a virtual network that you define.
Resource Group: A logical collection of different cloud entities. A group might contain virtual machines, storage accounts, key vaults, virtual networks, and so on.
Storage Account: A basic entity for storing data in the Azure cloud. It can store queues, blobs, files, and tables, for instance.
Container Registry: A private registry for Docker images.
As part of the authorization process, MissingLink installs the following:
Two Azure Resource Groups:
The first (named MissingLinkAI-<organization_name>) contains the following:
- All the virtual machines that are managed by MissingLink.
- A storage account for base machine images. MissingLink uses custom images based on Ubuntu that have preinstalled GPU drivers and a Docker service. They are distributed as blobs and must be stored in the user’s account in order to be used in the creation of the managed image.
- Base managed images for machines. They are created from blobs stored in the storage account from the previous point.
- Managed Identity is assigned to all the virtual machines created by MissingLink and allows them to access Key Vault, the storage accounts that contain data volumes, and the container registry.
- Key Vault contains your SSH encryption key. As part of the ml resources azure init command, the Managed Identity used by the virtual machines is granted decrypt permission. This allows virtual machines inside your cloud to decrypt your organization’s encryption key securely without granting MissingLink access to any of those keys.
- A virtual network is dedicated to all the virtual machines that are managed by MissingLink. By definition, this virtual network is isolated from the rest of the user’s network.
- A Container Registry that you can use to store the private Docker images you plan to use with MissingLink.
The second Resource Group (named MissingLinkAI-<organization_name>-storage) contains a storage account for storing the default artifact data volume. Users can create additional data volumes in this storage account.
Two Service Principals:
- One is associated with the MissingLink service and allows MissingLink to create and remove virtual machines, as well as check their lifecycle.
- The other is associated with Managed Identity that is assigned to the virtual machines.
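The layout described above can be summarized in a short Python sketch. This is only an illustration, not MissingLink's own code: the ResourceGroup class, the planned_resource_groups helper, and the placeholder organization name "acme" are hypothetical. Only the two name patterns and the listed contents come from this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceGroup:
    """Hypothetical record describing one Resource Group and what it holds."""
    name: str
    contents: List[str] = field(default_factory=list)


def planned_resource_groups(organization_name: str) -> List[ResourceGroup]:
    """Hypothetical helper: list the two Resource Groups described on this page."""
    base = f"MissingLinkAI-{organization_name}"
    compute = ResourceGroup(
        name=base,
        contents=[
            "virtual machines managed by MissingLink",
            "storage account for base machine images",
            "base managed images",
            "Managed Identity",
            "Key Vault",
            "virtual network",
            "Container Registry",
        ],
    )
    storage = ResourceGroup(
        name=f"{base}-storage",
        contents=["storage account for the default artifact data volume"],
    )
    return [compute, storage]


if __name__ == "__main__":
    # "acme" is a placeholder organization name, not a real account.
    for group in planned_resource_groups("acme"):
        print(group.name)
        for item in group.contents:
            print(f"  - {item}")
```

Keeping the storage group separate in the sketch mirrors the design described here: the data lives in a group that the MissingLink service principal does not manage.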
Confidentiality and data ownership
All the entities detailed above are created from the user’s own computer: the user runs ml resources azure init on their own behalf, using the authorization tokens obtained by running az login. These tokens never leave the user’s machine. The only access MissingLink has to the user’s cloud is through the first of the service principals (as detailed above), which has very limited permissions: it can manage virtual machines for running jobs but cannot access any data.
By accessing the Azure Activity Log, you can see all the changes MissingLink makes in the cloud account. Furthermore, if you would like to revoke or modify MissingLink’s access to your account, you can modify the permissions on the first Resource Group (MissingLinkAI-<organization_name>) in Azure Active Directory, or you can delete the first Resource Group completely. In either case, all your data is fully preserved, since it is stored in the second Resource Group (MissingLinkAI-<organization_name>-storage), to which MissingLink has no access.
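As a rough sketch of how you might review or revoke access from the command line, the Python snippet below builds two Azure CLI commands and prints them without running anything. It assumes the Azure CLI is installed and that you have already run az login. The helper names, the placeholder organization name "acme", and the 7-day look-back window are illustrative assumptions, not part of MissingLink.

```python
import shlex
from typing import List


def first_resource_group(organization_name: str) -> str:
    """Name of the first Resource Group, following the pattern on this page."""
    return f"MissingLinkAI-{organization_name}"


def activity_log_command(organization_name: str, offset: str = "7d") -> List[str]:
    """Build an az command that lists recent Activity Log entries for the group.

    The 7-day default is an arbitrary choice for illustration.
    """
    return [
        "az", "monitor", "activity-log", "list",
        "--resource-group", first_resource_group(organization_name),
        "--offset", offset,
    ]


def delete_group_command(organization_name: str) -> List[str]:
    """Build an az command that deletes the first Resource Group entirely."""
    return ["az", "group", "delete", "--name", first_resource_group(organization_name)]


if __name__ == "__main__":
    # Print the commands for review; nothing is executed here.
    print(shlex.join(activity_log_command("acme")))
    print(shlex.join(delete_group_command("acme")))
```

Printing the commands instead of executing them is deliberate: deleting the first Resource Group removes the virtual machines, Key Vault, and Container Registry it contains, so it is worth reviewing the command before running it yourself.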