Security Concept#
Assumed SCL architecture differs from current architecture
The context of this document is the envisioned, final architecture. It differs from the current architecture, as some key technologies, such as SmartNIC-based network encryption, network storage, and lifecycle management, are not yet available.
About this document#
This document describes the security concept of the Separation Control Layer (SCL) management system. It does not require insight into the codebase, but overall familiarity with the SCL high-level architecture is expected. The scope of consideration encompasses the whole life cycle, including the software components, development, and execution environment.
The security concept is structured as follows: We first define a threat model. We then state the protection goals the SCL system should uphold. Afterward, we outline our security assumptions about the platform the SCL is deployed to. Finally, we list and elaborate on our defensive measures that should uphold the mentioned protection goals against any attacker within the stated threat model.
This document is no replacement for a general or more detailed architecture documentation. A basic understanding of the components (what components are there, what is their purpose, how do they connect) is required. This document is also not the authoritative documentation of particular mechanics (e.g., deletion process via finalizers).
It should further be noted that this document can only detail the security concept behind the SCL management system, even though a large number of further components contribute to the overall security of a cloud system or appliance. Due to the overall complexity as well as the potential variability of systems (there may be multi-node cloud, single-node cloud-appliance, single-node, and many further variants in the future), a divide-and-conquer approach is necessary for security analysis. Thus, outside of the SCL context, this document only details which components are expected to be independently secured as well as which security mechanisms are expected to be available from the platform where the SCL management system is deployed.
Threat model#
The threat model which serves as the basis for the SCL security discussion is centered around two aspects: the goals attackers may pursue and the positions from which they may attack.
Attacker goals#
We consider the following goals (or objectives) of possible attackers:
- G1: degrade service availability (-> compromise availability) of the SCL management system (G1a) or the tenant workloads or data (G1b)
- G2: unauthorized read access to tenant data (-> compromise confidentiality)
- G3: unauthorized changes of tenant data (-> compromise integrity)
- G4: perform unaccounted / untracked changes (-> compromise accountability) to SCL state (G4a, e.g., for using services without payment) or tenant data (G4b)
Attack surface#
Attackers can leverage one or even multiple positions for their attack. We differentiate attacks on the supply chain and development process from attacks on (deployed) production systems.
Supply chain attacks#
Supply chain attack vectors include all aspects that contribute in some way to the production environment, be it software (e.g., some VM image with the SCL components that will be deployed within some system) or hardware.
This includes:
- SCL code written by SCL developers,
- Rust dependencies: referenced or in any other way integrated 3rd party code,
- the tools and underlying systems used by developers to view, edit, review, manage code (including git and GitLab),
- build tools that transform the SCL code into the final distribution artifact (cargo, rustc, nix-build, ...) and the build environment, and
- the machines and communication links that are relevant for the whole process.
Production system attacks#
The (deployed) production environment consists of various systems and subsystems. We consider a "multi-node setup" and a "single-node" deployment variant of the SCL. For both deployment variants we differentiate the positions an attacker might use for an attack. This perspective is useful to enumerate possible actions and to reason about the boundaries that separate different positions.
Multi-node setup#
The following figure enumerates the possible attacker positions in a multi-node deployment of the SCL management system. The positions considered in this document are highlighted.
Single-node setup#
The following figure enumerates the attacker positions for a single-node deployment setup of the SCL. In this setup, no SmartNIC-based inter-node networking is present. Hence, inter-VM networking is realized in the Control VM using Linux kernel mechanisms. This setup is compatible with an evolutionary development approach where first an alternative virtualization stack is leveraged. The controllers of the SCL management system are executed in the context of the Control VM to provide access to the operating system interfaces used for provisioning resources toward the tenant VMs.
Protection goals#
The following protection goals have been defined and should be upheld by the SCL.
Confidentiality of:
- C1: any key accessible to SCL components that protects other security goals
- C2: tenant data (volumes, VM images, RAM, and their communication)
Integrity (including controlled access) of:
- I1: SCL management system code and configuration
- I2: SCL management system communication
- I3: tenant data
Accountability of:
- Ac1: SCL management system code and configuration changes
- Ac2: SCL management system state changes (-> "SclObjects", includes tenant assets)
Availability of:
- Av1: SCL services
- Av2: tenant data
Security assumptions#
As mentioned in the introduction, the focus of this document is the SCL management system. Thus, this section details which components are expected to be independently secured as well as which security mechanisms are expected to be available from the platform where the SCL management system is deployed. The following section on defensive measures then assumes that the components mentioned below are appropriately secured and the listed mechanisms are in-place and sufficiently strong.
Components and entities assumed to be secured independently#
- Hardware and software supply chain (except SCL code): Security of the supply chain is a dedicated, project-wide issue and has to be addressed in the context of each component (for its code and dependencies). This document thus only considers supply-chain security for the code comprising the SCL management system.
- Node hardware: The hardware onto which the components are deployed is assumed not to be compromised. Dedicated mechanisms such as firmware integrity measurements may be implemented by the platform but are out of scope for this document.
- Deployment mechanisms: All mechanisms used to deploy, update, and operate the software in the target environment are considered secured independently of this concept. This includes ensuring an appropriate chain of trust is present to validate the integrity and authenticity of the system images (including the VM images for cloud management services) executed on the nodes.
- SmartNICs including network and storage encryption stack: In multi-node cloud systems with SmartNICs, tenant network and storage encryption is performed transparently on these dedicated building blocks. The security of network and storage encryption, their deployment, and the associated key management mechanisms is the subject of dedicated documentation. It is assumed that the SmartNICs offer an API toward the SCL management system to allocate network interfaces and storage devices in tenant contexts, specified by the SCL as a non-cryptographic identifier, with all cryptographic operations performed transparently to both the SCL management system and the tenant workloads (a sketch of such an API is given after this list).
- Hypervisor: It is assumed that the microhypervisor provides strong workload separation including measures greatly reducing threats from side-channel attacks.
- Operating system(s) leveraged in control plane and management VMs: The operating system for the management and control plane infrastructure, including all packages deployed on the node, is assumed to be checked for integrity and authenticity and not to be compromised, using mechanisms out of this document's scope.
- VPN stack: It is assumed that a secure means of VPN access is provided for tenants and cloud operators to connect to the internal networks, in a way that separates tenants from each other and from operators. The SCL management system expects an API from this stack allowing for creating and managing tenant VPN endpoints.
- Identity management system: It is assumed that an identity management system (IDM) is used to authenticate tenants and grant them access to SCL resources. This document does not consider specific security mechanisms (public-key/password authentication, 2FA, biometrics, ...) that may be employed by the IDM to achieve this purpose. It is expected that the IDM issues a token with a cryptographically-strong signature that states the tenant contexts it grants access to and which type (read/write) of access is granted to which type of resource in these contexts. Apart from this, the IDM is considered as a "black box" with respect to its internals.
- System operators (SCL, Identity, ...): Finally, it is expected that the people managing the deployed platform itself are trustworthy.
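To make the SmartNIC assumption above more tangible, the following is a minimal Rust sketch of the kind of API surface the SCL management system expects from the SmartNIC stack. All names, types, and signatures are illustrative assumptions; the actual interface is defined in the dedicated SmartNIC documentation.

```rust
/// Hypothetical shape of the SmartNIC-facing API assumed above; the real
/// interface is specified elsewhere and may differ in names and types.
pub struct TenantContextId(pub u32); // non-cryptographic identifier chosen by the SCL

pub struct InterfaceHandle(pub String);
pub struct StorageHandle(pub String);

pub struct SmartNicError(pub String);

pub trait SmartNicApi {
    /// Allocate a network interface in the given tenant context; network
    /// encryption is handled transparently by the SmartNIC.
    fn allocate_network_interface(
        &self,
        ctx: TenantContextId,
    ) -> Result<InterfaceHandle, SmartNicError>;

    /// Allocate a transparently encrypted storage device in the given tenant context.
    fn allocate_storage_device(
        &self,
        ctx: TenantContextId,
        size_gib: u64,
    ) -> Result<StorageHandle, SmartNicError>;
}
```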
Security measures#
The following measures are assumed to be present within the platform underlying all deployment types (single-node and multi-node).
| # | Protection goals | Description of implemented measures |
|---|---|---|
| P1 | C1 | Secure provisioning, strong isolation in various aspects |
| P2 | C2, I3 | Strong hypervisor |
| P3 | I1 | Transparent development processes, reproducible builds / deployments, RO images, secure boot, W^X, ECC RAM, no interpreter |
| P4 | Ac1 | Accountable development (git commits, GPG signatures), automated release |
The following measures are assumed to be present in multi-node deployments.
| # | Protection goals | Description of implemented measures |
|---|---|---|
| P5 | C1 | Sealed SmartNICs, physical isolation of nodes |
| P6 | C2, I3 | Network isolation via VLANs |
| P7 | C2 | Transparent storage encryption via SmartNIC |
| P8 | I3 | Storage integrity checks (e.g., scrubbing) |
| P9 | I1, I3 | Compute nodes physically isolated from management nodes |
| P10 | Av1 | Monitoring of service availability |
| P11 | Av2 | Redundancy/replication in storage |
Defensive measures in the SCL#
The measures provided by the SCL components for achieving the above-defined protection goals are summarized in the following table. The focus is on the deployed multi-node setup.
The following defensive measures are present in all deployment scenarios of the SCL.
| # | Protection goals | Description of implemented measures |
|---|---|---|
| D1 | C1, I1 | Infrastructure Management API inside isolated environment with remote attestation |
| D2 | C2, I3 | Linux network namespaces (single-node only) |
| D3 | C2, I3 | SCL component and tenant workload separation by virtualization |
| D4 | I1 | Git, mandatory code review, reproducible builds, controlled dependency management |
| D5.1 | I2 | Inter-component authentication (shared secret) |
| D5.2 | I2 | Inter-component authentication (mTLS) |
| D6 | I3 | User authentication via OpenID Connect |
| D7 | Ac2 | Logging of requests |
| D8 | Av1, Av2 | Scalable and almost stateless architecture |
The individual measures are further explained in dedicated subsections below.
Infrastructure Management API inside isolated environment with remote attestation#
The Infrastructure Management API (IM-API) acts, together with the IDM, as a gatekeeper, which needs to be highly secured. The IM-API will run inside an environment providing both isolation and remote attestation. Isolation is required to protect the host integrity in case an attacker compromises the IM-API and gains remote code execution. Although in this case most tenant-side protection goals would already be broken, as a last-resort measure the management infrastructure of the stack should be protected from compromise to allow for a recovery of the deployment. Attestation is required to verify the integrity of the IM-API software at run-time, which can help to detect problems early on.
The IM-API proxy component has been successfully evaluated to run inside an SGX enclave provided using the SCONE Rust toolchain [1].
Dependency management#
If we introduce new dependencies, we prefer ones that have a small Trusted Computing Base (TCB), good project health, and sufficient quality. Introducing any new dependency must be justified. Existing dependencies may only be updated in dedicated commits. Due to resource constraints, in-depth reviews of all dependencies are currently not possible. However, once the software reaches maturity and has passed an initial audit (including dependencies), updates of dependencies may be subject to thorough reviews.
Linux network namespaces (single-node setup only)#
Linux network namespace isolation is used to separate network traffic from different SCs from each other. Each SC gets its own network namespace. A Linux bridge within the namespace can connect the VMs of an SC via Linux tap devices.
We use the VLAN tag of an SC to derive network namespace names. As with Linux network device names, the name is limited to 16 bytes. Special care is taken to ensure that network infrastructure is cleaned up during SC deletion, so a new SC cannot access resources of a deleted SC with the same VLAN tag. This is implemented with Finalizers, which work similarly to their Kubernetes counterpart, are initialized by the SCL API, and are strictly validated during the deletion process.
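As an illustration of the mechanism described above, the following minimal Rust sketch derives a namespace name from a VLAN tag and creates the namespace by shelling out to the iproute2 tools. The naming scheme (`scl-<tag>`) and the use of the `ip` utility are assumptions for illustration, not the actual implementation.

```rust
use std::process::Command;

/// Hypothetical naming scheme; the real derivation from the VLAN tag may differ.
fn netns_name(vlan_tag: u16) -> String {
    let name = format!("scl-{vlan_tag}");
    debug_assert!(name.len() <= 16); // stay within the 16-byte name limit
    name
}

/// Creates the per-SC namespace; requires CAP_NET_ADMIN and the iproute2 tools.
fn create_sc_netns(vlan_tag: u16) -> std::io::Result<()> {
    let name = netns_name(vlan_tag);
    let status = Command::new("ip").args(["netns", "add", &name]).status()?;
    if !status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("`ip netns add {name}` failed"),
        ));
    }
    Ok(())
}
```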
Virtualization#
All tenant workloads are strongly separated from each other in dedicated virtual machines.
Additionally, we consider isolating SCL processes other than the IM-API (see IM-API isolation details above) from each other using VMs:
- Single-node setup: SCL API and etcd run inside a VM. Controllers with any privilege requirements (e.g., creation of network devices) run on the host. Any other controllers should also run inside VMs.
- Multi-node setup: Everything runs inside VMs. It is still to be decided whether each process gets a dedicated one.
Authenticated SCL component communication#
All communication between individual SCL components is encrypted and authenticated using appropriate security mechanisms such as TLS. See the dedicated documentation on SCL cryptographic material for a detailed write-up.
User authentication via OpenID Connect#
An open, widely used standard (OpenID Connect) has been chosen to authenticate users of the SCL management system at the boundary of the IM-API. On the side of the IM-API, the OpenID Connect tokens are validated. Besides the tenant identity, these tokens contain information on which level of access should be granted to which resources. This information is determined by the IDM, which is assumed to be secured independently of the SCL management system (see previous chapter). However, by using OpenID Connect, an appropriate IDM system can be selected from a wide range of mature and broadly used implementations. Further details regarding the user authentication process can be found in the dedicated documentation on SCL cryptographic material.
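To illustrate the kind of access information the IM-API expects after token validation, the following sketch models the authorization-relevant claims as Rust types using serde. The field and claim names are assumptions for illustration only; the actual token layout is defined in the documentation on SCL cryptographic material.

```rust
use serde::Deserialize;

/// Illustrative model of the authorization-relevant claims inside a validated
/// OpenID Connect token; names are assumptions, not the actual claim layout.
#[derive(Debug, Deserialize)]
struct SclAccessClaims {
    /// Subject, i.e., the authenticated tenant identity.
    sub: String,
    /// Tenant (separation) contexts the token grants access to.
    tenant_contexts: Vec<String>,
    /// Per-resource-type access level within those contexts.
    grants: Vec<ResourceGrant>,
}

#[derive(Debug, Deserialize)]
struct ResourceGrant {
    /// e.g., "Volume" or "VirtualMachine"
    resource_type: String,
    access: AccessLevel,
}

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
enum AccessLevel {
    Read,
    Write,
}
```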
Logging of requests#
The architecture allows logging at multiple locations:
- User requests can be logged at the Infrastructure Management API,
- User and controller requests can be logged at the SCL API, and
- all etcd state changes can be subscribed to via watch operations.
It should be noted that the amount of information concerning the initiator of a request decreases at each stage the request passes through, while the amount of information concerning the actual technical implications (affected nodes, etc.) increases. Thus, the goal is to combine logging at the earliest possible point (in the IM-API) with logging at later points.
Information logging alone cannot guarantee accountability. But aggregated logs can be used to identify and defend against abnormal behavior to some degree. There exists an extension proposal at the bottom of this document that aims to include signature chains in most network requests to improve accountability.
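As a sketch of the third logging location, the following Rust snippet subscribes to all state changes below an etcd key prefix. It assumes the third-party etcd-client and tokio crates and an illustrative `/scl/` key prefix; the actual key layout and log sink are not specified here.

```rust
use etcd_client::{Client, WatchOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = Client::connect(["localhost:2379"], None).await?;
    // Watch every key below the (assumed) SclObject prefix.
    let (_watcher, mut stream) = client
        .watch("/scl/", Some(WatchOptions::new().with_prefix()))
        .await?;
    while let Some(resp) = stream.message().await? {
        // A real implementation would inspect and persist the individual events
        // in durable audit storage; here we only report how many changes arrived.
        println!("received {} state change event(s)", resp.events().len());
    }
    Ok(())
}
```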
Scalable and almost stateless architecture#
The SCL management system follows an architecture similar to k8s. An etcd cluster (using the Raft consensus protocol under the hood, so consistent and partition tolerant regarding CAP) holds the system state (desired state vs. last reported status). Controllers retrieve the relevant state from the SCL API and take actions to drive the desired state of their concern, while updating the SCL API accordingly.
From a high-level perspective, all components except etcd are stateless and can therefore be scaled out. Consequently, a) only the etcd data needs to be replicated and, b) scalability is first and foremost limited by the performance of the etcd cluster.
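The desired-state-versus-reported-status split described above can be pictured with the following minimal Rust sketch; the type names are illustrative, not the actual SclObject representation stored in etcd.

```rust
/// Illustrative sketch of the record kept in etcd for each SclObject: the
/// spec is the desired state written via the SCL API, the status is the last
/// state reported back by the responsible controller.
struct SclObjectRecord<Spec, Status> {
    /// Desired state, authored by users (and partly by the SCL API itself).
    spec: Spec,
    /// Last reported status; absent until a controller has reported for the
    /// first time.
    status: Option<Status>,
}
```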
Additional measures#
In addition to the measures necessary for fulfilling the aforementioned goals, the SCL development implements further measures that reduce the number of possible errors, boost correctness, and aim to make audits easier.
| # | Description of implemented measures |
|---|---|
| A1 | Security-oriented technology stack and codebase |
| A2 | "Total" data structures |
| A3 | Tiny Infrastructure Management API |
| A4 | Stateless controller design |
| A5 | Strong SCL API validation |
Security-oriented technology stack and codebase#
The technologies used for developing the SCL management system have been carefully selected with memory safety, safe concurrency, and a minimal overall codebase in mind. The SCL components are developed from scratch in the Rust programming language to ensure that they are tailored to the job, with a minimal TCB, implemented in a language specifically designed for reducing the attack surface.
In addition to that, dependencies of the SCL such as the etcd database or Rust dependencies have been carefully evaluated for the overall extent of their codebases (LOC, transitive dependencies) and the community activity behind them (e.g., dependencies with slightly larger codebases but very active maintenance and broad usage are preferred over smaller libraries with barely any development activity or usage).
"Total" data structures#
We try to make the representation of illegal state in our code base, especially regarding SclObjects, impossible. This reduces the need for repeated (and the risk of forgotten) validation, reduces opportunities for corner cases, and makes writing total functions and business logic tests easier.
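As a hypothetical example (not taken from the actual codebase) of how illegal state is made unrepresentable, the following Rust sketch ties deletion metadata to a dedicated lifecycle variant, so an object "marked for deletion but without finalizers" simply cannot exist:

```rust
/// Illustrative only: the deletion-related data exists solely in the
/// Deleting variant, so a "deleting object without finalizers" cannot be
/// represented and never needs to be validated away.
struct Finalizer(String);

enum LifecyclePhase {
    Active,
    Deleting { remaining_finalizers: Vec<Finalizer> },
}

struct Volume {
    name: String,
    size_gib: u64,
    phase: LifecyclePhase,
}
```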
An obligatory merge request check-list prompts developers to reflect about the possibility of illegal state before handing over the MR to a reviewer. Reviewers are asked to consider this aspect as well. This approach relies on the judgement of the team.
Tiny Infrastructure Management API#
The IM-API acts as a proxy (with reduced functionality -- no management of Nodes is possible) for the SCL API. The gatekeeper role as described above makes it crucial that this component can be easily reviewed.
Note that all SCL components (ignoring 3rd party dependencies) currently (September 2022) amount to fewer than 11k source lines of Rust code -- including tests.
Stateless controller design#
The controllers are stateless. They implement simple state machines considering the desired state vs. the last state (as reported by the SCL API) and local state. Side effects are deterministic, allowing idempotent actions. This can also help to identify and clean up side effects (network devices, VMs, ...) that should not be present (anymore).
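The following minimal Rust sketch illustrates the stateless, idempotent controller pattern described above; all type and method names are assumptions for illustration and do not mirror the actual controller code.

```rust
use std::collections::HashMap;

/// Desired state as served by the SCL API (illustrative).
#[derive(Clone, PartialEq)]
struct DesiredVolume {
    size_gib: u64,
}

/// Locally observed state of the managed resource (illustrative).
#[derive(Clone, PartialEq)]
struct ObservedVolume {
    size_gib: u64,
}

/// Minimal view of the SCL API as seen by a controller (illustrative).
trait SclApi {
    fn desired_volumes(&self) -> Vec<(String, DesiredVolume)>;
    fn report_status(&self, id: &str, observed: &ObservedVolume);
}

/// One reconciliation pass: compare desired vs. observed state, apply
/// idempotent side effects, and report the result back to the SCL API.
fn reconcile(api: &dyn SclApi, observed: &mut HashMap<String, ObservedVolume>) {
    let desired = api.desired_volumes();

    for (id, want) in &desired {
        let have = observed
            .entry(id.clone())
            .or_insert(ObservedVolume { size_gib: 0 });
        if have.size_gib != want.size_gib {
            // Idempotent action: (re)provision the volume to the desired size.
            have.size_gib = want.size_gib;
        }
        api.report_status(id, have);
    }

    // Cleanup: forget side effects that should no longer be present.
    observed.retain(|id, _| desired.iter().any(|(d, _)| d == id));
}
```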
Strong SCL API validation#
The SCL API tries to trust incoming requests as little as possible. Beyond parsing the request data into "total" data structures, validation is performed. Any implementation of the SclObject interface (e.g., SeparationContext or Volume) must implement methods that validate initial values and value transitions (see the sketch after the list below). These methods are organized in the same file as the (total) data structure, which reduces the risk that updates to the structure are not reflected in the crucial validation logic.
Some fields are exclusively initialized by the SCL API, such as Finalizers.
It is planned to embed the role of an API consumer (e.g., ControllerKind) in the TLS certificates. In addition to checking what is requested, some kind of internal access management is implemented. An SclObject roughly undergoes three phases: creation, regular updates, and deletion. Each phase comes with its limitations concerning which role is allowed to change which field:
- During creation, users may fully specify the desired state. (Whitelisting or blacklisting roles with the ability to create objects could be implemented in the future.)
- After creation and before entering the deletion process, users may only change a specific subset of the desired state. Controllers are allowed to change the current state of the object.
- Deletion is triggered with an HTTP delete request, which causes the SCL API to mark the object for deletion and to initialize finalizers. Afterwards, users are not allowed to perform changes anymore, and predetermined controllers may only remove Finalizers in a predetermined order, one by one. For each SclObject and Finalizer requested to be removed, the SCL API can perform additional validation, leveraging etcd if necessary, to check if the transition seems plausible.
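A hedged sketch of the validation interface and role-dependent phase checks described in this subsection follows; trait and type names are illustrative assumptions, not the actual SclObject interface.

```rust
/// Illustrative sketch only; the actual SclObject interface differs in detail.
/// The requesting role would be taken from the authenticated channel (e.g.,
/// the TLS client certificate, as planned).
enum Role {
    User,
    Controller { kind: String }, // e.g., "BlockStorageController"
    SclApi,
}

enum ValidationError {
    FieldNotWritableByRole,
    IllegalTransition(String),
}

trait SclObjectValidation: Sized {
    /// Validate a freshly created object (creation phase).
    fn validate_initial(&self, requested_by: &Role) -> Result<(), ValidationError>;

    /// Validate a transition from the stored object to the requested new
    /// version, taking the lifecycle phase and the requesting role into account.
    fn validate_transition(
        &self,
        new: &Self,
        requested_by: &Role,
    ) -> Result<(), ValidationError>;
}
```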
Extension proposals#
The mechanisms outlined here are not part of the current plan of record but may be considered for integration in a future revision.
Network segregation of SCL components#
SCL components should only be able to interact with other components as specified by the architecture (e.g., only the SCL API may interact with etcd). This way, interactions can be limited to the necessary minimum using dedicated networks, making lateral movement (compromising further components) harder.
Audit log / signature chains (especially multi node setup)#
We currently have to trust the controllers that they actually did the work
they report to the SCL API. To improve this situation, other APIs (e.g.,
SmartNIC API etc.) could sign what work has been done and include this in
responses to the controllers, which in turn present this to the SCL API
server. This approach could be extended to most of the involved systems,
resulting in a more or less verifiable chain, which could also be used
for audit purposes. The signatures could be embedded in the SclObjects,
which makes it possible to see all involved actors of the currently stored
state.
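As a sketch of what one element of such a signature chain might look like, the following Rust structure shows a signed work report that a downstream API (e.g., the SmartNIC API) could attach to its responses; all fields and the signature scheme are open design questions of this proposal, not decided parts of the architecture.

```rust
/// Illustrative only: one link of the proposed signature chain, embedded in a
/// response to a controller and ultimately stored alongside the SclObject.
struct SignedWorkReport {
    /// Identity of the component that claims to have performed the work.
    signer: String,
    /// Which object and which action the work relates to.
    object_id: String,
    action: String,
    /// Signature over the fields above, verifiable by the SCL API and auditors.
    /// The concrete scheme (e.g., Ed25519) is an open choice.
    signature: Vec<u8>,
}

/// A chain of reports makes all actors behind the currently stored state visible.
struct SignatureChain {
    links: Vec<SignedWorkReport>,
}
```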
Discussion#
This section discusses the measures provided by the SCL as described in the previous section with respect to the attacker model defined in the second section of this document. In the course of this, for each attacker position, it is theoretically evaluated how the specified protection goals are upheld.
Outside attacker (1)#
Attacker position 1 (in both deployment scenarios) is an attacker outside of the SCL components and networks. This position is further differentiated:
- 1a: Attacker has no valid IM-API access token for any tenant and no tenant network access (but access to the IM-API and the IDM)
- 1b: Attacker has access to a tenant network (e.g., via VPN) but no valid IM-API access token (similar to 5, but without having access to a specific virtual machine of the tenant)
- 1c: Attacker has access to a tenant network and a valid IM-API access token for the given tenant
In case 1a, the attacker has three possibilities to gain further access:
- Authenticate with the IDM as a valid tenant (either the attacker's target or another tenant, with the goal to gain 1c-type access). IDM security is out of scope, but this attack vector is briefly discussed here nevertheless. To prevent this attack vector from being successful, the IDM should only allow strong authentication mechanisms. It is advised to leverage two-factor authentication and either strong passwords or (even better) public-key-based authentication for tenants. Always having two independent mechanisms in place for authentication significantly raises the bar for breaking into a tenant account using stolen or brute-forced credentials.
- Compromise the IDM. This requires a security vulnerability in the IDM software, which in itself should be specifically hardened and tested against exploitation. However, as there is a public interface, the possibility of exploitation has to be considered.
- Compromise the IM-API. As with the IDM, this requires a security vulnerability to be present in the IM-API. Due to the deliberately small and controlled codebase (A3, A1), this risk is significantly reduced. Additionally, all actions executed via the IM-API are logged (D7) and, thus, attack attempts can be easily traced.
As case 1b is a subset of 5, this will be discussed in the following subsection focusing on position 5.
Attacker 1c has access to data of the tenant for which the token was issued. The platform intentionally does not provide security inside tenant contexts due to the complexity and possibly-wide variety of tenant-controlled applications. Tenants may, however, use features such as SGX to secure their applications on their own. From the side of the SCL, however, in the case of attacker 1c it is ensured that no further access to data other than that of the tenant impersonated by the attacker can be gained. Ideally, attacker 1c has the same possibilities as attacker 1a with respect to the other tenant.
Beside the mechanisms protecting tenant authentication and the IM-API itself, several platform mechanisms (P1, P2, P5, P6, and P7) apply to ensure that an attacker cannot easily move to another tenant context.
The steps expected to be needed by an attacker in position 1 to compromise the defined security goals are outlined in the following table.
| Protection goal | How to compromise as attacker 1a, or as attacker 1b/1c targeting another tenant |
|---|---|
| C1 | exploit IM-API, then SCL API |
| C2 | exploit IM-API or IDM; or brute-force tenant credentials |
| I1 | exploit IM-API, then hypervisor, then break secure boot chain; alt.: perform a supply-chain attack |
| I2 | exploit IM-API, then SCL API |
| I3 | exploit IM-API or IDM; or brute-force tenant credentials |
| Ac1 | exploit IM-API, then hypervisor, then break secure boot chain; alt.: perform a supply-chain attack |
| Ac2 | exploit IM-API, then SCL API |
| Av1 | exploit IM-API, then launch DoS against SCL API or exploit SCL API or hypervisor |
| Av2 | exploit IM-API or IDM; or brute-force tenant credentials |
It becomes evident that in most cases more than one step of exploitation is necessary. Unfortunately, to break confidentiality, integrity, availability, and controlled access to tenant data, only one out of the following has to be achieved:
- Exploitation of the IDM
- Exploitation of the IM-API
- Acquisition of target tenant credentials (e.g., via brute force)
Furthermore, the exploitation of the IM-API is one step on the path to the compromise of many of the defined security goals. This again highlights that the IDM and especially the IM-API deserve special attention for securing the SCL system, as already discussed above (bullets 2 and 3 in this section). In addition to exploitation, in some cases supply chain attacks are another possible attack vector. These are discussed in a dedicated subsection below.
Tenant attacker (5)#
As with attacker 1c, from the SCL point of view, the tenant context to which the attacker has access in this case can be considered compromised as a whole. However, access to other tenant contexts is not possible due to 1) the platform-inherent separation by virtualization and encryption and 2) the need for authentication as another tenant via the IDM (see description of attacker 1c).
In addition to 1c, attacker position 5 has full access also to running infrastructure in the affected tenant context, whereas attacker 1c would need to spawn or take over such infrastructure first, which might be detectable.
However, the platform mechanisms to separate tenants apply in both cases in the same manner and when the attacker tries to authenticate or execute actions as another tenant, this can be tracked using IM-API logs (D7).
For taking over other tenant contexts in a cloud system, either the IDM, the IM-API, or the hypervisor has to be exploited. The former two were already discussed above. Concerning the latter, the platform leverages a specifically hardened microhypervisor (P2) to reduce the chances of exploitation.
For the attack routes via the compute node in multi-node deployments (not considering the possibility to access the IM-API from there as this is already discussed in the previous subsection), the steps expected to be needed by an attacker to compromise the stated security goals are summarized in the following table.
| Protection goal | How to compromise as attacker 5 for other tenant |
|---|---|
| C1 | exploit hypervisor, then SmartNIC |
| C2 | exploit hypervisor |
| I1 | exploit hypervisor, then SCL API, then management node hypervisor, then break secure boot chain |
| I2 | exploit hypervisor, then SCL API |
| I3 | exploit hypervisor |
| Ac1 | exploit hypervisor, then SCL API, then management node hypervisor, then break secure boot chain |
| Ac2 | exploit hypervisor, then SCL API |
| Av1 | exploit hypervisor, then launch DoS against SCL API or exploit SCL API |
| Av2 | exploit hypervisor |
For security goals such as C2, I3, and Av2, additional protection can be offered by exclusively assigning compute nodes to individual tenants in the SCL scheduler. This can easily be achieved through configuration if the increased overhead incurred by it is deemed worth the security benefits. In this case, not only the hypervisor has to be compromised to break into another tenant context but, in addition to that, either the SCL API, the SmartNIC, or another compute node's Node API.
Overall, however, it becomes clear that the hypervisor is the core element for the platform security.
Supply chain attacks#
The overall project as well as the SCL management system development put a great emphasis on supply chain security.
Upstream dependencies of the created components are analyzed for the extent of their codebase, the activity of maintenance, the number of transitive dependencies, and so on. All dependency updates have to be performed explicitly as the dependency versions are "pinned" and recorded along with the SCL management system code. (D4)
Secondly, the development process employed for the SCL management system matches and tightly integrates with the one used for other components (P3, P4).
Multiple testing and review steps are necessary for each code change to enter the path to the production system:
After an SCL developer has proposed a change via a "Merge Request" and the automated testing pipeline has passed, a review is performed by other developers. Once review clearance is given, the change is merged into the main Git branch of the SCL.
Attackers that already compromised infrastructure (2,3,4)#
For attackers that managed to compromise parts of the infrastructure already, in specific cases relevant to the SCL, the SCL management system foresees measures to reduce the chances of lateral movement. A discussion of these follows below.
Compromised ID Management (2)#
If an attacker manages to successfully compromise the IDM system, they can issue arbitrary valid access tokens for the IM-API. This means that access to all tenant contexts managed by the SCL is possible - the attacker can, e.g., launch additional virtual machines attached to the tenant networks and storage devices and upload/download arbitrary data.
Due to the isolation of the IDM in a dedicated virtual machine (in case it is not deployed fully outside of the infrastructure, which is another option), such an attacker will not be able to directly compromise further management systems. Thus, a re-deployment of the IDM and all tenant workloads can in principle remove the attacker from the system.
Furthermore, all accesses to tenant contexts via valid IDM tokens are recorded by the IM-API. Thus, the actions performed by the attacker can be audited to analyze the potential damage in case such an attack was executed.
Compromised IM-API (3)#
The IM-API is the second component which exposes an interface to users before authentication. Thus, it may be another target for exploitation. Despite the various hardening features that drastically reduce the chances of exploitation, successful attacks may still be possible.
In this case, the attacker has access to all tenant resources, as is the case with an IDM compromise. Additionally, the IM-API logging cannot be relied upon anymore. However, the SCL API, i.e., the only component accessible by the IM-API for executing SCL actions, features its own logging mechanisms that allow for tracking the actions performed by the attacker.
The IM-API is also isolated in a dedicated virtual machine and uses specific credentials for authenticating with the SCL API that do not allow for manipulating system settings such as node assignment. Therefore, such an attack can potentially also be resolved by re-deploying the IM-API and all resources in tenant contexts manipulated by the attacker.
Compromised SCL controller context (4)#
In a multi-node setup, SCL controllers may be deployed in dedicated contexts (e.g., in a dedicated VM), such that they are isolated from the rest of the SCL system. The TLS client certificates issued to the controllers restrict them to the kind of resources they are expected to take care of, e.g., block storage controllers may only post changes to block storage volumes toward the SCL API. This reduces the possible influence of the specific controller on the database state and, thus, may be especially considered in case there is a chance of exploitation of a controller, e.g., if it has to parse binary packet data or similar. (Though, currently, this is not expected and in single-node setups without Node API and SmartNIC all controllers have to run in the privileged Control VM context.)
Compromised Compute Node (13)#
In a multi-node setup, compute nodes are physically separated from management nodes, thus, even if the hypervisor has been broken on one node, the attacker may not take over infrastructure other than the tenant workloads running on that single node.
Furthermore, in the multi-node setup, cryptographic keys are protected inside the sealed SmartNICs (P5), such that no persistent key material can be stolen and the attacker only has access to the attacked tenant networks and storage devices for the duration they have access to the given compute node.
Due to the SCL controllers only unidirectionally calling the compute node using its APIs, the chance of exploitation on the reverse channel (taking over an SCL controller) is reduced, as no custom requests can be sent to the controllers - the exploited compute node can only modify and forge the responses. Additionally, as mutual TLS authentication is used for the Node API, an exploited compute node cannot easily assume the identity of another node, even if it manages to adjust network settings such as its own IP address appropriately.