GCP Authorization Process
This page takes a closer look at the process you set in motion when you authorize Resource Management to access your Google Cloud Platform (GCP) account using the ml resources gcp init command.
To fully understand the description that follows, first familiarize yourself with some key GCP terminology:
IAM (Identity and Access Management): Enables you to manage access to GCP services and resources securely. Using IAM, you can create and manage GCP users and service accounts, and use roles and permissions to allow and deny their access to GCP resources.
Cloud Deployment Manager: Allows you to specify all the resources needed in a declarative, template-driven format.
KMS (Key Management Service): A managed service that makes it easy for you to create and control the encryption keys used to encrypt your data, and uses FIPS 140-2 validated hardware security modules to protect the security of your keys.
Roles: A collection of permissions. You cannot assign a permission to the user directly; instead you grant them a role. When you grant a role to a user, you grant them all the permissions that the role contains.
Service Account: A special Google account that belongs to your application or a virtual machine (VM), instead of to an individual end user. Your application uses the service account to call the Google API of a service, so that the users aren't directly involved.
GCS (Google Cloud Storage): Google Cloud’s unified object storage service.
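To make the relationship between roles and permissions concrete, here is a minimal toy model in Python. It is not the IAM API: the role names, the member name, and the helper functions `grant_role` and `is_allowed` are hypothetical, and the permission strings are only examples of how GCP permission names look.

```python
# Toy model of IAM roles and bindings; this does not call the real IAM API.
from typing import Dict, FrozenSet, Set

# A role is a named collection of permissions (example names, assumed).
ROLES: Dict[str, FrozenSet[str]] = {
    "vm_manager": frozenset({"compute.instances.create", "compute.instances.delete"}),
    "data_reader": frozenset({"storage.objects.get", "storage.objects.list"}),
}

# Members are granted roles, never individual permissions.
bindings: Dict[str, Set[str]] = {}


def grant_role(member: str, role: str) -> None:
    """Grant a whole role to a member; unknown roles are rejected."""
    if role not in ROLES:
        raise KeyError("unknown role: " + role)
    bindings.setdefault(member, set()).add(role)


def is_allowed(member: str, permission: str) -> bool:
    """A member holds a permission only through one of its granted roles."""
    return any(permission in ROLES[role] for role in bindings.get(member, set()))


# Hypothetical service account that receives only the VM management role.
grant_role("serviceAccount:example-sa", "vm_manager")
```

In this sketch, granting `vm_manager` gives the service account every permission in that role and nothing else, so a check for `storage.objects.get` fails for it.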
As part of the authorization process, MissingLink sets up the following:
- Required APIs: The following Google APIs are required to complete the setup and must be enabled on the project: IAM, Cloud KMS, and Deployment Manager.
- Two custom roles:
  - One role allows MissingLink to manage compute instances in your account, so that we can create and remove the VMs required to fulfil your resource management queue.
  - The other role is assigned to machines that MissingLink launches. It has read/write permissions to your GCS buckets and the ability to decrypt sensitive data that was encrypted using the MissingLink KMS key.
- Two service accounts:
  - The MissingLink Resource Manager service account, which is invited to your project and granted the resource manager role.
  - A dedicated service account with the instances role, created for Resource Management instances to use.
- KMS keyring and key: This key is used for encrypting your SSH key. As part of the ml resources gcp init command, the role used by the GCP instances is granted the decrypt permission. This allows instances inside your cloud to decrypt your sensitive data securely using your encryption key, without granting MissingLink access to any of those keys.
- GCS bucket: If you have not used data volumes before, MissingLink creates and configures a new GCS bucket for you. This bucket will be used for a data volume dedicated to artifact management. You can create additional data volumes in the bucket using the MissingLink dashboard or the MissingLink CLI commands.
- Deployment: The custom roles, instances service account, and GCS bucket are grouped together in a deployment called “ml-deployment”.
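As an illustration of how a deployment can group these resources, the following is a minimal sketch of a Deployment Manager template written in Python. It is not MissingLink's actual template: the resource names, role IDs, the `bucketName` property, and the permission lists are all assumptions chosen to mirror the description above.

```python
# Sketch of a Deployment Manager Python template. All names, the bucketName
# property, and the permission lists are assumptions for illustration only.


def custom_role(name, project, role_id, title, permissions):
    """Build a project-level custom role resource."""
    return {
        "name": name,
        "type": "gcp-types/iam-v1:projects.roles",
        "properties": {
            "parent": "projects/" + project,
            "roleId": role_id,
            "role": {
                "title": title,
                "stage": "GA",
                "includedPermissions": permissions,
            },
        },
    }


def GenerateConfig(context):
    """Entry point that Deployment Manager calls for a Python template."""
    project = context.env["project"]
    resources = [
        # Role for managing VMs (assumed permissions).
        custom_role("manager-role", project, "exampleResourceManager",
                    "Example resource manager",
                    ["compute.instances.create", "compute.instances.delete"]),
        # Role for launched machines: bucket read/write plus KMS decrypt
        # (assumed permissions).
        custom_role("instances-role", project, "exampleInstances",
                    "Example instances",
                    ["storage.objects.get", "storage.objects.create",
                     "storage.objects.list",
                     "cloudkms.cryptoKeyVersions.useToDecrypt"]),
        # Dedicated service account for the instances.
        {
            "name": "instances-account",
            "type": "iam.v1.serviceAccount",
            "properties": {
                "accountId": "example-instances",
                "displayName": "Example instances account",
            },
        },
        # Bucket backing the artifact data volume.
        {
            "name": "artifacts-bucket",
            "type": "storage.v1.bucket",
            "properties": {"name": context.properties["bucketName"]},
        },
    ]
    return {"resources": resources}
```

Because all four resources belong to one deployment, deleting that deployment removes them together.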
Confidentiality and data ownership
All of the entities described above are created from the computer on which the user runs ml resources gcp init, using the pre-configured authorization tokens obtained when the user authorized their GCP account. These tokens never leave the user’s machine. The only access MissingLink has to the user’s cloud is through the first of the service accounts described above, which has very limited permissions: it can manage virtual machines for running jobs but cannot access any data.
Deleting the MissingLink deployment (“ml-deployment”) will delete the resources it created, including the GCS bucket, along with all of its files. The KMS key cannot be removed, as deleting KMS keys is not supported by GCP.
To revoke MissingLink's access to the GCP project, do one of the following:
- If you want to keep the data in the GCS bucket created by MissingLink, manually remove the custom roles from the MissingLink Resource Management and instances service accounts. You can do this at any time.
- Otherwise, delete the “ml-deployment” deployment in Deployment Manager.
Running jobs on Google Compute Engine GPUs
Google Compute Engine provides graphics processing units (GPUs) that you can add to your virtual machine instances.
According to Google Cloud policy, GPU instances are terminated when they receive a host maintenance event, 60 minutes after the event is announced. To receive advance notice of host maintenance events, monitor the /computeMetadata/v1/instance/maintenance-event metadata value of the instance. For more information, see Handling host maintenance events in the Compute Engine documentation.
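The monitoring step can be sketched as a small polling loop in Python. This is a minimal sketch rather than a recommended implementation: the function names, the 30-second interval, and the `on_terminate` callback are hypothetical, and the metadata request only succeeds when run on a Compute Engine instance.

```python
import time
import urllib.request

# Metadata server endpoint for the value named above.
METADATA_URL = (
    "http://metadata.google.internal"
    "/computeMetadata/v1/instance/maintenance-event"
)


def fetch_maintenance_event(timeout=5.0):
    """Read the current maintenance-event value (works only on a GCE VM)."""
    request = urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def needs_checkpoint(event):
    """True when the instance is scheduled to be terminated for maintenance."""
    return event == "TERMINATE_ON_HOST_MAINTENANCE"


def watch(on_terminate, fetch=fetch_maintenance_event, interval=30.0,
          sleep=time.sleep, max_polls=None):
    """Poll the metadata value and call on_terminate once termination is announced.

    Returns True if on_terminate was called, False if max_polls ran out first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if needs_checkpoint(fetch()):
            on_terminate()
            return True
        polls += 1
        sleep(interval)
    return False
```

A job could pass a callback that saves a checkpoint, for example `watch(save_checkpoint)`, where `save_checkpoint` is your own function. The `fetch` and `sleep` parameters exist only so the loop can be exercised off-instance with stand-ins.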