Broker AWS accounts, GCP projects, and Microsoft Azure Subscriptions for their internal customers;
Implement enterprise security controls through policies, active detection, and automated remediation; and
Integrate cloud assets deeply into their enterprise identity management, authentication, authorization, and network security infrastructure.
Important differentiators between RHEDcloud and typical cloud migrations are:
RHEDcloud brokers the cloud for departments, teams, and research groups within a larger institutional or corporate entity with enterprise provisioning, integration, networking, and security requirements.
As such RHEDcloud cannot impose tool and framework constraints on the end users in the departments, teams, or research groups. For example, RHEDcloud cannot require that all users of cloud accounts use Terraform, Ansible, Heat, etc., because those tooling and administration decisions are made at the department, research group, or team level.
Also, in order to be successful, RHEDcloud must strive to preserve the cloud-native experience in the consoles, command-line tools, and APIs to the greatest extent possible while still implementing all required controls. If RHEDcloud did not preserve the cloud platform provider experience, institutional or corporate users would bypass the service and procure their own cloud accounts, projects, and subscriptions.
A typical cloud migration performed by a central IT team or department can be more prescriptive in mandating tools, procedures, and practices, but RHEDcloud was designed for the general use case where those choices are left to the end users and their teams.
The RHEDcloud Project works collaboratively with its members to assess risk for each cloud service and to specify controls in the form of policies, detective controls with automated remediation, and rules of behavior. As a result of the collaboration of its diverse institutional membership, RHEDcloud reduces the investment for all members, accelerates time-to-implement, and improves the quality of risk assessments and controls.
When Emory University and Emory Healthcare initially implemented AWS at Emory, which led to the RHEDcloud Project, they decided that cloud services could not be made available to users without a risk assessment and, where needed, countermeasures to address unacceptable risk. To understand what the RHEDcloud Project is and what RHEDcloud applications and services do, one must first understand this foundational concept. If your organization does not agree with this fundamental premise, then RHEDcloud may not make sense for it.
At the time of this writing Emory had performed 101 risk assessments for 101 AWS services and brought those processes to the RHEDcloud Security Risk Assessment Committee, so RHEDcloud member institutions can collaborate on these assessments. Presently, Emory has blocked 24 AWS services, because they are considered too risky given Emory’s security posture. Emory has temporarily blocked 114 AWS services pending review. Emory has made 101 AWS services available in its standard compliance class accounts. Of these services 38 had risks that required additional countermeasures before they could be made available to users and 63 did not. Emory’s HIPAA AWS accounts presently only have 69 AWS services available with 36 requiring countermeasures and 33 that do not.
The RHEDcloud Security Risk Assessment Committee recently performed 6 risk assessments for GCP services as the University of Wisconsin started implementing RHEDcloud for GCP. Going forward, the Security Risk Assessment Committee will assess cloud services that member organizations would like to make available to their users. Assessments for Microsoft Azure will likely follow once the proof-of-concept for Azure is completed.
Specify Controls to Mitigate Identified Risk
After risk identification, the risk assessment process continues by specifying controls to address the identified risks for each service. RHEDcloud Project participants prefer to implement declarative controls in the form of policies. The cloud platform providers each offer various types of policies such as Identity and Access Management (IAM) Policies, Service Control Policies (SCPs), Organization Policies, etc. that can be used to restrict access at various levels and also (to varying degrees) specify allowable properties when creating cloud resources.
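As an illustration of a declarative control, the following minimal sketch builds an AWS service control policy document that denies every action in a set of blocked services. The helper name build_deny_scp is hypothetical, and the service prefixes are placeholders chosen for illustration; they are not any member site's actual block list.

```python
import json


def build_deny_scp(blocked_service_prefixes):
    """Return an SCP document (as a dict) that denies all actions
    for each given IAM service prefix, for example "gamelift"."""
    actions = sorted(f"{prefix}:*" for prefix in blocked_service_prefixes)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyBlockedServices",
                "Effect": "Deny",
                "Action": actions,
                "Resource": "*",
            }
        ],
    }


# Placeholder prefixes for illustration only (assumption, not from the source).
policy = build_deny_scp({"gamelift", "robomaker"})
print(json.dumps(policy, indent=2))
```

A policy like this would be attached at the organization or organizational-unit level, so member accounts cannot use the denied services regardless of the IAM permissions granted inside the account.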
When declarative controls are inadequate to mitigate risk, the RHEDcloud Project specifies detective controls. The team enumerates logic to detect the risk and, in almost all cases, a default action to remediate the risk automatically. Automatic remediation is preferred when operating at the scale of hundreds of accounts, because alerts will often go unattended by customer administrators and even central administrators for long periods of time. Risk detection and remediation can be implemented with any framework that supports these operations, but the RHEDcloud Project has developed its own enterprise-scale service for this work, called the RHEDcloud Security Risk Detection Service. The service can detect and remediate risks as frequently as every 30 to 60 seconds, or it can be backed off for specific detectors that need to run less frequently, based on each site’s tolerance for risk.
Some risks influence the over-arching design of some aspects of the service. For example, RHEDcloud Project participants have (at least as of the time of this writing) determined that some identified risks can only or best be mitigated with the use of NextGen firewalls. For this reason all RHEDcloud Project network designs include NextGen firewalls, either by backhauling all traffic from cloud VPCs through an on-premises network that is protected by NextGen firewalls or by routing all traffic through a security VPC in the cloud that contains NextGen firewalls. Integrating cloud identity management (IDM) with enterprise IDM is another example: it eliminates the risk of separate cloud-based identities and the potential for these to get out of sync with enterprise security policies implemented in the enterprise IDM system. Yet another example is how the potential for abuse of long-lived API keys led to the strong recommendation to avoid them whenever possible and to the development of the Temporary Key Issuance (TKI) Service, which makes issuance and management of temporary keys easy for end users.
There always remain some risks that cannot be adequately mitigated with any declarative or detective controls. These risks are addressed by business policies informing users of expectations for their behavior. These business policies are very important to inform users about general and specific risks they should help mitigate, but declarative and detective controls are preferred whenever possible. The RHEDcloud Project has collected ample data, in the form of cloud platform logging and Security Risk Detection Service events and logs, indicating that users frequently attempt risky actions that are prohibited by policy or detected and remediated by the security overwatch. For this reason business policies cannot be the sole or even primary approach to mitigating cloud security risks.
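The detect-then-remediate pattern with per-detector intervals can be sketched as follows. This is a minimal illustration and not the RHEDcloud Security Risk Detection Service itself: the Detector class, the run_due_detectors function, and the in-memory "account" are all hypothetical names and data invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Detector:
    name: str
    interval_seconds: int                    # how often this detector should run
    detect: Callable[[dict], List[str]]      # returns IDs of risky resources
    remediate: Callable[[dict, str], None]   # default remediation for one resource


def run_due_detectors(detectors, account, last_run: Dict[str, float], now: float):
    """Run every detector whose interval has elapsed and remediate what it finds.

    Returns a list of (detector name, resource ID) pairs that were remediated.
    """
    remediated = []
    for d in detectors:
        if now - last_run.get(d.name, float("-inf")) < d.interval_seconds:
            continue  # backed off: this detector is not due yet
        for resource_id in d.detect(account):
            d.remediate(account, resource_id)
            remediated.append((d.name, resource_id))
        last_run[d.name] = now
    return remediated


# Hypothetical in-memory account: bucket name -> is it publicly readable?
account = {"buckets": {"logs": False, "scratch": True}}

public_bucket = Detector(
    name="public-bucket",
    interval_seconds=30,
    detect=lambda acct: [b for b, public in acct["buckets"].items() if public],
    remediate=lambda acct, b: acct["buckets"].__setitem__(b, False),
)

last_run: Dict[str, float] = {}
first = run_due_detectors([public_bucket], account, last_run, now=0.0)
second = run_due_detectors([public_bucket], account, last_run, now=10.0)  # not due yet
print(first, second)
```

Giving each detector its own interval is one way to let a site run high-risk checks every 30 to 60 seconds while backing off the checks it considers less urgent.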
Work with Cloud Platform Providers and Cloud Service Providers to Implement Controls
The RHEDcloud Project works with the three major cloud platform providers, who are also project participants, to implement these controls with declarative controls on the cloud platform whenever possible. For example, whenever a security architect or engineer at a site can’t identify a declarative control on the platform or some other platform feature to mitigate an identified risk, they can send their analysis to the AWS, Google, and Microsoft representatives on the RHEDcloud Security Risk Assessment Committee to determine if they have missed something. The service provider participants often also have expertise to help answer these questions. In many cases the cloud platform provider and service provider representatives will identify platform controls that work. In other cases, they confirm that detective controls and remediation are indeed necessary.
In cases where declarative controls do not adequately mitigate the risk, detective controls are needed. Typically implementing sites are the organizations focused on new risk assessments, and an implementing site will:
Define a new risk,
Specify detection criteria and a default remediation for the risk,
Implement the detection and remediation using its preferred security risk detection framework, and
Share its assessment, countermeasures, detection, and remediations with the other RHEDcloud participants on the Security Risk Assessment Committee for discussion and review.
This collaboration between implementing sites, cloud platform providers, and cloud service providers was one of the major reasons the RHEDcloud project was founded.
Develop Applications and Services to Provision, Secure, and Administer Cloud Resources
Beyond collaborating on risk assessments and specific countermeasures, the RHEDcloud participants also work together on software development and patterns to implement countermeasures. For example, in addition to developing its own Security Risk Detection Service, the RHEDcloud Project participants worked with AWS to research and expand patterns for security risk detection to include AWS Config rules, AWS Lambda, and AWS Systems Manager automation documents. AWS and the RHEDcloud Automation and Integration Team developed a prototype of an AWS Config-based detection and remediation architecture, integrated it with the RHEDcloud notification subsystem and console, and documented the latencies observed in risk detection and remediation.
When you implement RHEDcloud at your site, what do you do?
Provide Customers with a Unified View of Cloud Provisioning, Security, and Administration
Two of the primary goals of RHEDcloud are to preserve the experience users have using each cloud platform provider to the greatest extent possible and to provide a unified, easy-to-use interface in place of the native cloud features when it is not possible to preserve the cloud experience. These design principles were adopted based on feedback from the initial focus group participants at Emory University and Emory Healthcare and were reinforced in discussions with other RHEDcloud member organizations. Essentially, users want to use the cloud as the cloud platform providers intended and as it is documented. Users want easy, on-demand provisioning of cloud accounts, projects, and subscriptions, and they want to be able to follow published instructions from the cloud platform providers on how to use the cloud and deploy stacks of cloud infrastructure and applications that their colleagues at other institutions and companies have developed. Users want to choose their own cloud tools and utilities at the team level and not have those decisions made by central or corporate IT. Careless security and user experience design that interferes with these use cases impedes adoption and could motivate users to procure cloud infrastructure on their own without guardrails, putting the organization at increased risk. From the beginning RHEDcloud has been a balancing act between usability and security, seeking to give users as much of the power and flexibility of the cloud platform as possible while helping them secure the organization’s workloads and data in the cloud.
To achieve these goals the RHEDcloud Project studied the cloud platform providers' documentation closely and designed Virtual Private Cloud and Virtual Network architectures that are closely aligned with the cloud providers' documentation and patterns. RHEDcloud tests common console features, launch wizards, and API calls to ensure that security guardrails do not interfere with them or, when interference is unavoidable, documents configuration settings that give customers a verified path through the console or launch wizard. When the native features of the cloud platform cannot be secured and must be shut off, the RHEDcloud Project opted to provide a unified user experience, called the RHEDcloud Console, for all of these features. For example, the RHEDcloud design requirement for on-premises or cloud-based NextGen firewalls means that users are no longer able to expose cloud instances directly to the internet using cloud features like public or elastic IPs. To preserve the on-demand nature and ease of use of these cloud-native features, RHEDcloud automated institutional static NAT and firewall rule exception processes and presents them to the user in the RHEDcloud Console web application. Another example of this type of accommodation is temporary key issuance. The issuance of long-lived API keys was deemed too risky for common use, so users should request and use temporary keys. However, the process for authenticating institutional users with two-factor authentication and issuing them temporary keys proved very cumbersome for users. RHEDcloud developed the Temporary Key Issuance (TKI) Service and TKI command-line client to make requesting and using temporary keys as easy as possible for institutional users.
In addition to its own implementations of the RHEDcloud Console and RHEDcloud TKI Service, the RHEDcloud Project is interested in learning about and evaluating commercial projects that may implement these functions. The following products may implement aspects of these features:
[List any products here]
Implement Automated Provisioning and Deprovisioning of Cloud Accounts, Projects, and Subscriptions Integrated with Enterprise Security and Network Infrastructure
The RHEDcloud Project’s need to provide users with on-demand provisioning for AWS accounts, GCP projects, and Azure subscriptions led the project to develop central registry services for account, project, and subscription metadata as well as a modular provisioning service that implements the creation and setup of these accounts, projects, and subscriptions and their integration with enterprise infrastructure such as e-mail distribution lists, identity management services, enterprise single sign-on, site-to-site VPN connectivity to on-premises networks, transit gateway and route table association for pre-provisioned network connectivity, and more.
It is important not to confuse this account, project, and subscription provisioning or brokering with the provisioning of cloud resources to run specific workloads. RHEDcloud specifies cloud infrastructure like Virtual Private Clouds and Virtual Networks with CloudFormation templates for AWS, Deployment Manager templates for GCP, and Azure Resource Manager templates. These could all also be specified with Terraform configs or something similar, but RHEDcloud chose to use the cloud platform providers' native tools. Account, project, and subscription provisioning, by contrast, is the overarching flow of what has to happen at a specific institution or corporation to determine if someone is allowed to provision an account, project, or subscription, specify the appropriate compliance class, integrate it with all of the enterprise infrastructure mentioned above, associate the account with the proper enterprise identities, register it in the metadata registry so the enterprise knows who is responsible for the workloads and charges, and register the account with security overwatch and any other necessary services for tracking alerts, notification, log aggregation, etc. The steps to run the CloudFormation, Deployment Manager, or Azure Resource Manager templates are definitely part of this provisioning, but they are only a few of the 30-50 provisioning steps that most organizations find they need to adequately provision and integrate these cloud accounts, projects, and subscriptions into their enterprise infrastructure.
The RHEDcloud Project has found that while the precise set of 30-50 provisioning steps is site-specific, every site needs to do similar things, and many of the steps sites develop are re-usable by other sites that have the same endpoints. For example, almost every site needs to:
Determine if the requestor is authorized to provision based on the status of their enterprise identity or role-based security implemented in their identity management system.
Validate the financial account number provided in the provisioning request to determine if it is valid and if possible determine if the requestor is authorized to charge to it.
Create roles for the new account, project, or subscription in the enterprise identity management system and add the requested users to the role.
Create an e-mail distribution list for the new account, project, or subscription in the enterprise messaging system.
Create the new account, project, or subscription.
Run the appropriate cloud-provider-specific templates with the proper input parameters to set up the account, project, or subscription.
Stand up the site-to-site VPN connection from the enterprise side if the site-to-site VPN approach is used, or associate the new cloud network with the appropriate pre-provisioned cloud network connectivity.
Register the account, project, or subscription with security overwatch and turn it on.
Activate any cloud-based security services.
Notify the owners and users of the account, project, or subscription that it is ready and point them to login information and user documentation.
The RHEDcloud project has found that sites with the same networking approach or identity management system can in many cases completely re-use each other’s relevant provisioning steps.
Given the complexity of these integrations, automated deprovisioning is critically important to clean up all of these enterprise systems when accounts are no longer needed. Without automated deprovisioning, many teams would have to manually clean up many enterprise-side resources for these accounts in a long and error-prone process.
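One way to picture modular provisioning with matching deprovisioning is a list of steps that each know how to undo themselves. The sketch below is an assumption-laden illustration, not the RHEDcloud provisioning service: the Step class, the function names, and the three step names are hypothetical, and the choice to roll back completed steps when a later step fails is a design assumption of this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    provision: Callable[[dict], None]
    deprovision: Callable[[dict], None]


def provision_all(steps: List[Step], request: dict) -> List[str]:
    """Run steps in order; if one fails, undo the completed steps in reverse."""
    done: List[Step] = []
    try:
        for step in steps:
            step.provision(request)
            done.append(step)
    except Exception:
        for step in reversed(done):
            step.deprovision(request)
        raise
    return [s.name for s in done]


def deprovision_all(steps: List[Step], request: dict) -> List[str]:
    """Undo every step in the reverse of provisioning order."""
    for step in reversed(steps):
        step.deprovision(request)
    return [s.name for s in reversed(steps)]


log: List[str] = []


def make_step(name: str) -> Step:
    # Each hypothetical step only records what it would do.
    return Step(
        name=name,
        provision=lambda req: log.append(f"provision {name} for {req['account']}"),
        deprovision=lambda req: log.append(f"deprovision {name} for {req['account']}"),
    )


steps = [make_step(n) for n in ("idm-role", "distribution-list", "cloud-account")]
request = {"account": "research-lab-1"}  # placeholder request
completed = provision_all(steps, request)
removed = deprovision_all(steps, request)
print(completed, removed)
```

Because every step carries its own deprovision action, sites with the same identity management or networking endpoints could in principle share a step and get both directions of the integration at once.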
In addition to its own implementations of the following services:
RHEDcloud AWS Account Service
RHEDcloud GCP Project Service
RHEDcloud Azure Subscription Service
RHEDcloud Identity Management Service (with providers for multiple IDM systems)
RHEDcloud Lightweight Directory Service (with different providers for differing directory structures)
RHEDcloud Financial Account Number Service (with different providers for differing financial systems)
the RHEDcloud Project is interested in learning about and evaluating commercial projects that may implement these functions. The following products may implement aspects of these features:
[List any products here]
Implement Security Controls Specified by the RHEDcloud Collaboration to Mitigate Risk
As described in the section above entitled Work with Cloud Platform Providers and Cloud Service Providers to Implement Controls, the RHEDcloud Project prefers to implement declarative controls using IAM policies, service control policies, organization policies, etc. When the specified controls cannot be implemented with declarative policy controls, detective controls with automatic remediation are preferred. If automatic remediation is not possible, then detective controls with alerting are the next preferable solution. Finally, if for some reason the risk cannot be mitigated with declarative controls or detective countermeasures, then an implementing site must decide if it can tolerate the risk of making the service available with only business policies and terms in its rules of behavior for end users.
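The order of preference described here can be expressed as a small decision function. This is only a sketch of the stated preference order; the ControlTier names and the three boolean inputs are hypothetical simplifications of what a real risk assessment would weigh.

```python
from enum import IntEnum


class ControlTier(IntEnum):
    # Lower value means more preferred.
    DECLARATIVE_POLICY = 1
    DETECT_AND_REMEDIATE = 2
    DETECT_AND_ALERT = 3
    BUSINESS_POLICY_ONLY = 4


def preferred_control(can_enforce_by_policy: bool,
                      can_detect: bool,
                      can_auto_remediate: bool) -> ControlTier:
    """Pick the most preferred control tier that is feasible for a risk."""
    if can_enforce_by_policy:
        return ControlTier.DECLARATIVE_POLICY
    if can_detect and can_auto_remediate:
        return ControlTier.DETECT_AND_REMEDIATE
    if can_detect:
        return ControlTier.DETECT_AND_ALERT
    # The site must decide whether it can tolerate relying on rules of behavior alone.
    return ControlTier.BUSINESS_POLICY_ONLY


print(preferred_control(False, True, False).name)
```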
RHEDcloud sites work to use the declarative policy controls available on each of the cloud platform providers, but then when necessary develop detective controls and remediation. The RHEDcloud project has developed the following to address the specified security controls:
AWS CloudFormation templates and service control policies
GCP Deployment Manager templates and organization policies
Security Risk Detection Service (presently for AWS but could be extended to GCP and Azure if needed)
The RHEDcloud Project is also interested in learning about and evaluating commercial projects that may implement these functions. The following products may implement aspects of these features:
[List any products here]
How can a site implement RHEDcloud practices and controls with commercial products?
Cloud Account, Project, and Subscription Registry
When an organization brokers the cloud for its users, it must store AWS account, GCP project, and Azure subscription metadata to perform basic administration, tracking, and billing operations. While the cloud platform providers have begun to provide multi-account/project/subscription management and provisioning services such as AWS Control Tower, there is additional information that must be tracked for these cloud accounts, projects, and subscriptions, such as who their owner (organizationally responsible party) is, which organizational financial account number will settle charges in the internal financial accounting system, which compliance class was initially requested, who initially requested the account, etc. The requirement to collect, store, and manage this data over time leads to the need for a database and application to manage it.
The RHEDcloud Project developed the AWS Account Service, GCP Project Service, and Azure Subscription Service and an attendant database as a backend storage and management service. The RHEDcloud Console allows administrators and customers to search and manage this data as authorized by their roles in the organizational identity management system.
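The kind of record such a registry needs to hold can be sketched with a small data class and an in-memory store. The field names, the CloudAccountRecord and Registry classes, and the sample values are all hypothetical; they are meant only to make the metadata listed above concrete, not to describe the schema of any RHEDcloud service.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CloudAccountRecord:
    platform: str                   # "aws", "gcp", or "azure"
    resource_id: str                # account, project, or subscription ID
    owner: str                      # organizationally responsible party
    financial_account_number: str   # settles charges in the internal financial system
    compliance_class: str           # for example "standard" or "hipaa"
    requested_by: str               # who initially requested it


class Registry:
    """A toy in-memory registry keyed by (platform, resource ID)."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[str, str], CloudAccountRecord] = {}

    def register(self, record: CloudAccountRecord) -> None:
        key = (record.platform, record.resource_id)
        if key in self._records:
            raise ValueError(f"already registered: {key}")
        self._records[key] = record

    def owned_by(self, owner: str) -> List[CloudAccountRecord]:
        return [r for r in self._records.values() if r.owner == owner]


registry = Registry()
registry.register(CloudAccountRecord(
    "aws", "111111111111", "pi@example.edu", "FAN-0001", "standard", "student@example.edu"))
registry.register(CloudAccountRecord(
    "gcp", "genomics-lab", "pi@example.edu", "FAN-0002", "hipaa", "pi@example.edu"))
print([r.resource_id for r in registry.owned_by("pi@example.edu")])
```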
The RHEDcloud Project is interested in learning about and evaluating commercial projects that may implement these functions. The following products may implement aspects of these features:
[List any products here]
Cloud Account, Project, and Subscription Provisioning and Deprovisioning
When discussing provisioning and deprovisioning, it is important to clarify precisely what type of provisioning and deprovisioning one is talking about. When the RHEDcloud Project discusses provisioning, it is typically talking about the provisioning required to create a new account, project, or subscription and integrate it with enterprise security, network, and billing infrastructure. So the discussion is about the work to provide automated brokering of cloud accounts, projects, and subscriptions.
Provisioning the cloud side of a new account, project, or subscription is definitely part of what is done in this brokering provisioning, but it is a relatively small part, just a few steps. RHEDcloud has expressed most of its cloud structures, policies, and controls using the cloud platform providers' native tools and formats. For example, when a new AWS account is provisioned, CloudFormation templates implement account- and VPC-specific structures; when a new GCP project is provisioned, Deployment Manager templates are run to create project- and VPC-specific structures; and when new Azure subscriptions are created, Azure Resource Manager templates are used to create subscription- and Virtual Network-specific structures.
The remainder of the 30-50 steps of processes like provisioning and deprovisioning integrate the account, project, or subscription with enterprise infrastructure as described in the section above entitled Implement Automated Provisioning and Deprovisioning of Cloud Accounts, Projects, and Subscriptions Integrated with Enterprise Security and Network Infrastructure. The RHEDcloud Project implemented a modular provisioning web service to orchestrate this provisioning and deprovisioning, along with web service endpoints that sit in front of critical services like identity management, single sign-on, network provisioning, e-mail distribution list provisioning and validation, the account/project/subscription metadata repository, the financial accounting system, etc. While there are many frameworks that could help implement these provisioning integrations, to date there do not appear to be turnkey solutions, given the complex, enterprise-integration nature of the problem. For example, AWS Control Tower provides a mechanism for provisioning new accounts within an organization and policy structure with predetermined guardrails, but it does not specifically address or implement the enterprise integrations described above. AWS is starting to work with partners to implement solutions for AWS Control Tower that address some of these integrations for specific products. For example, at the time of this writing there are solutions for identity management integrations with AWS Control Tower for Okta, OneLogin, and Ping Identity. There are also solutions for cloud network provisioning with Aviatrix Cloud Network Platform and Cisco Cloud ACI. These solutions are only useful to sites that already have these IDM or network providers or to sites that are willing to implement them.
Presumably this support from leading enterprise infrastructure vendors will grow in the future, but the features and appropriateness of these solutions for any given site depend on the infrastructure each site has and how the site uses it, which brings this all back to a fundamental enterprise integration problem that will always take some work to solve at each site.
The RHEDcloud Project is very interested in any other solutions that directly and generally attempt to solve the provisioning and deprovisioning of accounts, projects, and subscriptions and their integrations with enterprise infrastructure.
[List any products here]
As described above in the section entitled Specify Controls to Mitigate Identified Risk, the RHEDcloud Project has determined that some risks can only be addressed adequately with NextGen firewalls. The firewall or security group features of the cloud platform providers are not adequate to mitigate risk and provide security teams with all of the tools they need. For this reason, the RHEDcloud Project has incorporated NextGen firewalls into its design patterns. To date we have done that with Palo Alto firewalls, because that is the firewall that Emory University and Emory Healthcare use on their on-premises network and for cloud-based firewalls as well. The RHEDcloud Project should continue to work with Palo Alto and other firewall vendors to incorporate patterns for using other NextGen firewalls with RHEDcloud.
[List specific PA firewalls presently supported]
[List other vendors firewalls for which we should provide RHEDcloud implementation patterns.]
Security Risk Detection and Remediation
The RHEDcloud Project implemented the RHEDcloud Security Risk Detection Service to scan AWS accounts for risks that could not be mitigated with declarative policy controls and perform an automatic remediation. The Security Risk Detection Service uses the AWS Account Service for account metadata to know which accounts to scan and which compliance class applies to each account. The RHEDcloud Security Risk Detection Service also publishes events about what it detects and remediates that can be consumed by the AWS Account Service and presented in the unified RHEDcloud Console.
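The flow described above (look up accounts and their compliance classes, decide what to scan, then publish events for other components to consume) can be sketched as follows. Everything here is hypothetical and in-memory: the account list stands in for metadata from an account service, the detector names and class mapping are invented, and EventBus is a toy stand-in for a real notification or messaging subsystem.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical account metadata, as a registry service might return it.
ACCOUNTS = [
    {"account_id": "111111111111", "compliance_class": "standard"},
    {"account_id": "222222222222", "compliance_class": "hipaa"},
]

# Hypothetical mapping of compliance class -> detector names to run.
DETECTORS_BY_CLASS: Dict[str, List[str]] = {
    "standard": ["public-bucket"],
    "hipaa": ["public-bucket", "unencrypted-volume"],
}


def plan_scans(accounts, detectors_by_class) -> List[Tuple[str, str]]:
    """Return (account ID, detector name) pairs for every applicable detector."""
    return [
        (acct["account_id"], name)
        for acct in accounts
        for name in detectors_by_class.get(acct["compliance_class"], [])
    ]


class EventBus:
    """Toy publish/subscribe channel for detection and remediation events."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: dict) -> None:
        for handler in self._subscribers:
            handler(event)


console_feed: List[dict] = []   # what a console might display per account
bus = EventBus()
bus.subscribe(console_feed.append)

plan = plan_scans(ACCOUNTS, DETECTORS_BY_CLASS)
# Pretend the first planned scan found and fixed something.
account_id, detector = plan[0]
bus.publish({"account_id": account_id, "detector": detector, "action": "remediated"})
print(plan, console_feed)
```

A commercial detection product would need equivalent hooks on both sides: a way to learn which accounts and compliance classes to scan, and a way to emit events that the console and notification subsystem can consume.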
There are other commercial products that implement security risk detection and remediation. For example, Palo Alto is a member of RHEDcloud and its Prisma Public Cloud can detect risk and correct improper configurations.
[Describe how it does that and how it could be integrated with the RHEDcloud account/project/subscription services to know what to scan. Also describe how it could be integrated with the RHEDcloud notification subsystem so users know what was detected and remediated.]
[List other vendor product here and how they could work within the context of the RHEDcloud ecosystem and integrate with other components.]
Customer and Administrator User Experience
One of the key factors in RHEDcloud’s success at implementing sites is preserving a positive, cloud-native user experience. Whenever possible, users use the cloud platform providers' consoles, tools, and APIs as they were intended to be used. When guardrails and restrictions don’t allow for that, RHEDcloud provides one web application, the RHEDcloud Console, and command-line tools like the TKI client that work like the cloud platform providers' tools but add the enterprise-specific features.
The focus groups we ran determined that a disjointed user experience, with one application to request account provisioning, another to request firewall rules, another process to request static NAT, and yet another to provision network connectivity, would be confusing for users and administrators of the service. In fact, the focus group design team determined that such a user experience would likely drive users away from the institutional service and lead them to use their own public cloud accounts without guardrails and oversight.
When adopting a commercial product to implement components of RHEDcloud, one needs to consider how to preserve the desired user experience by integrating components to provide a unified user and administrator experience. For example, if one implements a commercial product like Palo Alto Prisma for Public Cloud to implement detective controls and automatic remediation, how will users and administrators receive alerts about what was detected and what was remediated? Where will users and administrators go to see a list of these detections and remediations later? How will these events be propagated to the enterprise security information and event management (SIEM) system? The RHEDcloud services implement all of these integrations, and with a commercial product implementation one should either maintain all of these integrations or decide on new interfaces and tools that users and admins will use to access this data.