Telecommunication customers expect their voice and data services to always be available. System availability is dependent on the availability of individual components in the system. To help ensure 24/7 service, it must be possible to perform system maintenance and system expansion on running telecommunication networks and servers without disrupting the services they implement. Systems must be able to withstand component failures, making redundancy of components such as power supplies, fans, network adapters, storage, and storage paths essential. Software failures can also significantly impact the availability of a compute node, so robust application software, middleware, and operating system software are required for single node availability.
This section is a collection of requirements that address the robustness of a single computing node. Availability is further enhanced by clustering individual computing nodes so that a node cannot represent a single point of failure. The single node requirements in the Availability section can be categorized as:
- On-line operations
- Redundancy
- Monitoring
- Robustness
On-line operations enable the system to continue to provide a service while the software or the hardware is replaced or upgraded on the system. For instance, when a file system needs repair, repair procedures may require rebooting the system. However, CGL requires that it be possible to forcibly un-mount a file system, allowing repair and remounting without rebooting. The ability to replace or upgrade hardware such as disks, processors, memory, or even entire processor/memory blades without bringing down that node or the network contributes significantly to continuous service availability.
A highly available system must be composed of redundant components and must be able to take advantage of redundant hardware such that the system continues to function when a component fails. Ideally, designs can eliminate all single points of failure from a system. Using redundant communication paths, such as redundant network ports and host adapters, together with network fail-over software capabilities, such as Ethernet bonding, improves network availability. Redundant storage paths, such as redundant Fibre Channel ports and host adapters used with multipath I/O, improve storage availability. Redundancy of memory components may not be possible, but error detection and correction can be used to mask memory cell failures; CGL requires software Error Correction Code (ECC) support. Single-bit errors are reported when they are detected in the hardware and logged by the kernel. The kernel invokes a panic routine whenever uncorrectable multi-bit errors are detected.
Rapid detection of hardware or software failures requires health monitoring. Health monitoring is also needed to check for hardware or software that is beginning to fail, such as ECC memory checking, predictive analysis for disks, and processes that do not respond in a predicted way. Examples of CGL monitoring requirements include Non-Intrusive Monitoring of Processes and Memory Over-commit Actions. The Non-Intrusive Monitoring of Processes requirement detects abnormal behavior by a process, such as process death, and initiates an action, such as the creation of a new process. The Memory Over-commit Actions requirement monitors system memory usage and controls process activity when memory usage exceeds specified thresholds.
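The non-intrusive monitoring behavior described above (detect process death, then create a new process) can be illustrated with a minimal sketch. It is not part of the CGL specification: the function name `supervise` and the parameters `max_restarts` and `poll_interval` are assumptions chosen for illustration, and a real monitor would also need logging, back-off, and escalation policies.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, poll_interval=0.05):
    """Run cmd as a child process and respawn it after each abnormal exit.

    Returns the number of restarts performed. The monitored program needs no
    monitoring API; the supervisor only observes its exit status.
    """
    restarts = 0
    child = subprocess.Popen(cmd)
    while True:
        status = child.poll()          # None while the child is still running
        if status is None:
            time.sleep(poll_interval)
            continue
        if status == 0 or restarts >= max_restarts:
            return restarts            # clean exit, or restart budget exhausted
        restarts += 1
        child = subprocess.Popen(cmd)  # recovery action: create a new process
```

Calling `supervise([sys.executable, "-c", "import sys; sys.exit(1)"], max_restarts=2)` supervises a hypothetical workload that always fails; the restart budget keeps a permanently broken process from being respawned forever.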
ID | Name | Category | Priority |
---|---|---|---|
AVL.2.0 | Single-bit ECC handling | Availability | P2 |
CGL specifies that carrier grade Linux shall provide a mechanism for reporting when hardware error checking and correcting (ECC) detects and/or recovers from a single-bit ECC error. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.2.1 | Multi-bit ECC handling | Availability | P2 |
CGL specifies that carrier grade Linux shall provide a panic trigger mechanism when hardware error checking and correcting (ECC) detects multi-bit ECC errors. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.4.1 | VM Strict Over-Commit | Availability | P1 |
CGL specifies that carrier grade Linux shall provide the ability to control kernel virtual memory allocation adjustments based on the specific needs of the system. Control of virtual memory shall include but not be limited to the following:
|
ID | Name | Category | Priority |
---|---|---|---|
AVL.5.3 | Process-Level Non-Intrusive Application Monitor | Availability | P1 |
CGL specifies that carrier grade Linux shall provide control and management capabilities for processes that cannot be altered to incorporate a monitoring API. Such capabilities are known as non-intrusive monitoring. These capabilities must be implemented programmatically using commands or scripts. Another issue for many such processes is that the start script itself may spawn an application process that is not under the control of the management process. This sub-requirement assumes that this does not happen, and the child process remains under the control of the management entity. Capabilities required:
|
ID | Name | Category | Priority |
---|---|---|---|
AVL.6.0 | Disk Predictive Analysis | Availability | P1 |
CGL specifies that carrier grade Linux shall provide capabilities to assist in monitoring storage systems. The aim of this support is to assist in predicting situations likely to lead to failure of disks. This allows preventive action to be taken to avoid the failure and resulting disruption of service. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.7.1.1 | Multi-Path Access to Storage: Multi-Path Detection | Availability | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism to enable multiple access paths from a node to storage devices. The software shall determine if multiple paths exist to the same port of the I/O device. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.7.1.2 | Multi-Path Access to Storage: I/O Balancing | Availability | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism to enable multiple access paths from a node to storage devices. The software shall determine if multiple paths exist to the same port of the I/O device, and, with configurable controls, balance I/O requests across multiple host bus adapters. If multiple paths exist to the same device over two separate device ports on the same host bus adapter, those I/Os will not be balanced. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.7.1.3 | Multi-Path Access to Storage: Automatic Path Failover | Availability | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism to enable multiple access paths from a node to storage devices. Handling a path failure must be automatic. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.7.1.4 | Multi-Path Access to Storage: Failed Path Reactivation | Availability | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism to enable multiple access paths from a node to storage devices. A mechanism must be provided for the reactivation of failed paths, allowing them to be placed back in service. |
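The path handling described in AVL.7.1.2 through AVL.7.1.4 (balancing, automatic failover, and reactivation) can be sketched as a toy model. This is an illustration only and does not reflect how any particular Linux multipath implementation works; the class `MultipathDevice`, its method names, and the round-robin policy are assumptions.

```python
class MultipathDevice:
    """Toy model of one storage device reachable over several paths."""

    def __init__(self, paths):
        self.paths = list(paths)      # e.g. one path per host bus adapter
        self.failed = set()           # paths currently out of service
        self._next = 0                # round-robin counter

    def active_paths(self):
        return [p for p in self.paths if p not in self.failed]

    def fail_path(self, path):
        """Take a path out of service; later I/O avoids it automatically."""
        self.failed.add(path)

    def reactivate_path(self, path):
        """Place a repaired path back in service."""
        self.failed.discard(path)

    def select_path(self):
        """Pick the next healthy path round-robin; raise if none remain."""
        active = self.active_paths()
        if not active:
            raise IOError("all paths to the device have failed")
        path = active[self._next % len(active)]
        self._next += 1
        return path
```

With two hypothetical paths `"hba0"` and `"hba1"`, successive `select_path()` calls alternate between them; after `fail_path("hba0")` every request goes to `"hba1"` without caller involvement, and `reactivate_path("hba0")` restores balancing.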
ID | Name | Category | Priority |
---|---|---|---|
AVL.7.1.5 | Multi-Path Access to Storage: Automatic Path Configuration | Availability | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism to enable multiple access paths from a node to storage devices. It must be possible to automatically determine and configure multiple paths. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.7.1.6 | Multi-Path Access to Storage: Automatic Volume Configuration | Availability | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism to enable multiple access paths from a node to storage devices. Automatic configuration shall allow automatic multi-path configuration of complete disks and partitions located on those disks. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.7.1.7 | Multi-Path Access to Storage: Root File System Hosting | Availability | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism to enable multiple access paths from a node to storage devices. A multipath device feature that allows multipath detection and mapping early in the boot process must be provided so that the root file system can exist on a multipath device. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.7.1.8 | Multi-Path Access to Storage: Link Failure Reporting | Availability | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism to enable multiple access paths from a node to storage devices. The mechanism should implement error logging functions that clearly identify the failing device path. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.8.1 | Fast Linux Restart Bypassing System Firmware | Availability | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism to speed up operating system initialization by bypassing the system firmware when one instance of Linux reboots to another instance of Linux. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.9.0 | Boot Image Fallback Mechanism | Availability | P2 |
CGL specifies that carrier grade Linux shall provide a mechanism that enables a system to fall back to a previous "known good" boot image in the event of a catastrophic boot failure (i.e. failure to boot, panic on boot, failure to initialize HW/SW). System images are captured from the "known good" system and the system reboots to the latest good image. This mechanism would allow an automatic fallback mechanism to protect against problems resulting from system changes, such as program updates, installations, kernel changes, and configuration changes. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.10.0 | Application Live Patching | Availability | P2 |
CGL specifies that carrier grade Linux shall provide a mechanism and framework by which a custom application can be built so that it can be upgraded by replacing symbols in its live process. Dynamic replacement of symbols allows a process to access upgraded functions or values without requiring a process restart and in many circumstances can lead to improved process availability and uptime. The mechanism should be applied only to user applications. Patching an underlying distribution software component may result in the loss of distribution support. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.12.0 | NFS Client Protection Across Server Failures | Availability | P2 |
CGL specifies that carrier grade Linux shall provide mechanisms that allow an NFS server to have failover capability to provide service continuity upon a node failure. The NFS service has to be resumed on another node without any impact on NFS clients other than the retransmission of pending requests (open files must remain open). Clients authenticated on the old server must remain authenticated on the new server. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.13.1 | Parallel User Initialization During Startup | Availability | P2 |
CGL specifies that the user initialization procedure executed by the program /sbin/init shall provide a mechanism to allow multiple init scripts to run in parallel. CGL further specifies that a service is only started once the services it depends on have started. |
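One way to reason about AVL.13.1 is to group services into "waves": every service in a wave has all of its prerequisites already started, so the wave can run in parallel. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) purely as an illustration; the function name `startup_waves` and the service names in the usage note are hypothetical and do not describe any particular init system.

```python
from graphlib import TopologicalSorter


def startup_waves(dependencies):
    """Group services into waves that can each be started in parallel.

    `dependencies` maps a service name to the set of services it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                        # raises CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # prerequisites of these are all started
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave as started
    return waves
```

For example, if `network` depends on `syslog`, and both `sshd` and `ntpd` depend on `network`, the function yields three waves in which `sshd` and `ntpd` share the last one and can be launched concurrently.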
ID | Name | Category | Priority |
---|---|---|---|
AVL.15.0 | Fast Application Restart Mechanism | Availability | P2 |
CGL specifies that carrier grade Linux shall provide a mechanism that enables a quick application restart. Typical applications in a carrier environment use multiple processes with inter-process communications. As applications become more complex, application initialization times become longer. To speed up application initialization, the mechanism shall provide the functionality to simultaneously save memory images of multiple processes (including the kernel resources used by each process) and to restore the images. When the application completes initialization, including making connections between processes and setting up kernel resources for inter-process communication, the application invokes a save function that makes a copy of the memory images of the process and kernel resources. If the application hangs, the mechanism restores the memory images and kernel resources and restarts the application. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.17.0 | Multiple FIB Support | Availability | P2 |
CGL specifies that Linux shall support multiple Forwarding Information Base (FIB) quick look-up tables with forwarding addresses to allow better server virtualization of overlapping addresses. An FIB is a table that contains a copy of the forwarding information in the IP routing table. All hooks/changes required to support multiple FIBs shall be added. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.21.0 | Ethernet link bonding using IPV4 | Availability | P1 |
CGL specifies that carrier grade Linux shall support bonding of multiple Ethernet NICs within a single node using IPV4. The bonding supports the following functions:
|
ID | Name | Category | Priority |
---|---|---|---|
AVL.21.1 | Ethernet link bonding using IPV6 | Availability | P1 |
CGL specifies that carrier grade Linux shall support bonding of multiple Ethernet NICs within a single node using IPV6. The bonding supports the following functions:
|
ID | Name | Category | Priority |
---|---|---|---|
AVL.22.0 | Software RAID 1 support | Availability | P1 |
CGL specifies that carrier grade Linux shall provide RAID 1 (mirroring) support so that the OS maintains duplicate sets of all data on separate disk drives. RAID 1 support shall allow booting from a selected mirror disk drive even if the other drive has failed. The RAID 1 implementation shall provide a user-controllable parameter to throttle the syncing operation. Support can be configured out if desired. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.23.0 | Watchdog Timer Pre-Timeout Interrupt | Availability | P1 |
CGL specifies that carrier grade Linux shall provide support for a watchdog timer pre-timeout interrupt. Where the hardware supports such a capability an interrupt handler routine will be called before the real timeout occurs. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.24.0 | Watchdog Timer Interface Requirements | Availability | P1 |
CGL specifies that carrier grade Linux shall provide the ability to use an interface to reset the hardware watchdog timer, where the hardware supports such a capability. This timeout value shall be a configurable item. A configurable action can be performed when a timeout occurs. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.25.0 | Application Heartbeat Monitor | Availability | P1 |
CGL specifies that carrier grade Linux shall provide an application heartbeat service that allows applications to register to be monitored via specified APIs. The mechanism shall use periodic synchronized events (heartbeats) between an application and the monitor. If a registered application fails to provide a heartbeat, the monitor shall report the events. The application heartbeat service shall be available to any process or sub-process (thread) entity on the system. A process or thread may register for multiple heartbeats. |
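The registration and reporting behavior in AVL.25.0 can be sketched as follows. This is an assumption-laden illustration, not a CGL-defined API: the class `HeartbeatMonitor`, its methods, and the use of an `(owner, name)` pair to let one process or thread hold multiple registrations are all hypothetical.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per registration and reports overdue ones."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock               # injectable so the logic can be tested
        self._entries = {}                # (owner, name) -> (interval, last_beat)

    def register(self, owner, name, interval):
        """Register a heartbeat; one owner may register several names."""
        self._entries[(owner, name)] = (interval, self._clock())

    def beat(self, owner, name):
        """Record a heartbeat from a registered process or thread."""
        interval, _ = self._entries[(owner, name)]
        self._entries[(owner, name)] = (interval, self._clock())

    def missed(self):
        """Return the registrations whose heartbeat is overdue."""
        now = self._clock()
        return sorted(key for key, (interval, last) in self._entries.items()
                      if now - last > interval)
```

A monitor loop would call `missed()` periodically and report each returned registration as an event; what happens after the report (restart, failover, alarm) is left to policy.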
ID | Name | Category | Priority |
---|---|---|---|
AVL.26.0 | Resilient File System Support | Availability | P1 |
CGL specifies that carrier grade Linux shall provide support for the installation of a file system that is resilient against system failures in terms of recovering rapidly upon reboot without requiring a full, traditional fsck. This is normally achieved using logging or journaling techniques. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.27.0 | Kernel Live Patching | Availability | P2 |
CGL specifies that carrier grade Linux shall provide a mechanism for symbols, functions, or variables within a running kernel to be replaced with new symbols, functions, or variables. CGL further specifies this operation be completed without a system shutdown or restart. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.28.1 | File System De-fragmentation | Availability | P1 |
CGL specifies that carrier grade Linux shall provide support for a file system that allows for de-fragmentation of on-disk data. It is expected that the file system will not be mounted or otherwise in use at the time. Reference: SCOPE Alliance Carrier Grade Gap CGOS-1.6 |
ID | Name | Category | Priority |
---|---|---|---|
AVL.28.2 | Multi-Architecture File System Support | Availability | P1 |
Linux Foundation CGL specifies that carrier grade Linux shall provide support for a file system where the metadata and data are stored independent of host CPU word length and endianness. Reference: SCOPE Alliance Carrier Grade Gap CGOS-1.9 Proof-of-Concept: ext2, ext3, etc. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.28.3 | File System Metadata Integrity Checksum | Availability | P1 |
Linux Foundation CGL specifies that carrier grade Linux shall provide support for a file system that guarantees file system metadata and data consistency and fast recovery in the event of interrupted updates with checksums on all metadata. Reference: SCOPE Alliance Carrier Grade Gap CGOS-1.2 Proof-of-Concept: ext4, BTRFS |
ID | Name | Category | Priority |
---|---|---|---|
AVL.28.4 | File System Block Checksumming | Availability | P2 |
Linux Foundation CGL specifies that carrier grade Linux shall provide support for a file system that provides end-to-end checksums of all blocks currently in use on the file system. Reference: SCOPE Alliance Carrier Grade Gap CGOS-1.1 Proof-of-Concept: BTRFS, ZFS on FUSE |
ID | Name | Category | Priority |
---|---|---|---|
AVL.28.5 | File System Multiple Access Protection | Availability | P2 |
Linux Foundation CGL specifies that carrier grade Linux shall provide support for shared, simultaneous read and write access to file system data that is assured protection against accidental corruption of the data and/or metadata. |
ID | Name | Category | Priority |
---|---|---|---|
AVL.28.6 | File System Snapshots | Availability | P2 |
Linux Foundation CGL specifies that carrier grade Linux shall provide support for a file system that allows the creation of atomic snapshots of volumes while mounted. These snapshots must be valid filesystem images that can be mounted as if they were the original volume at the time of the snapshot. Reference: SCOPE Alliance Carrier Grade Gap CGOS-1.7 |
ID | Name | Category | Priority |
---|---|---|---|
AVL.28.7 | File System Clones | Availability | P2 |
Linux Foundation CGL specifies that carrier grade Linux shall provide support for a file system that allows atomic backups while the volume is mounted and in use. These backups should be writable where subsequent updates to the file system will not be reflected in the original and therefore each can be considered a fork of a single, live file system image. Reference: SCOPE Alliance Carrier Grade Gap CGOS-1.7 |
- POSIX:
Open Group References:
POSIX conformance data on Linux:
POSIX Technical Corrigendum 1 text:
POSIX Specification with current Technical Corrigendum:
Linux Standard Base (LSB) http://www.linuxbase.org/
Free Standards Group http://www.freestandards.org/
Service Availability Forum (SAF) http://www.saforum.org/
Internet Engineering Task Force (IETF) http://www.ietf.org/rfc.html
The CGL working group conducted a clusters usage model study from which they learned that no single clustering model meets the needs of all carrier applications. So CGL takes a more general approach to defining clustering requirements. CGL defines the functional components of a carrier grade High Availability Cluster (HAC). The requirements for other cluster models, such as a scalability cluster, a server consolidation cluster, and a High Performance Computing (HPC) cluster, have been treated as secondary to requirements for the HAC cluster model. See Illustration 3.
A CGL high availability cluster is characterized by a set of two or more computing nodes between which an application or workload can migrate depending on a policy-based failover mechanism. Essentially, the cluster nodes can “cover” for each other. Carrier grade services must maintain an uptime of 5 nines (99.999%) or better and, quite often, a failing service must restart in sub-second time frames to maintain continuous operation.
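To make the "5 nines" target concrete, the downtime budget can be computed directly: 99.999% availability over a 365-day year leaves about 315 seconds (roughly 5.3 minutes) of total downtime. The helper below is a small illustrative calculation; the function name and the 365-day default are assumptions.

```python
def allowed_downtime_seconds(availability_percent, period_days=365):
    """Downtime budget, in seconds, for a given availability over a period."""
    period_seconds = period_days * 24 * 60 * 60
    return period_seconds * (1 - availability_percent / 100)
```

This budget covers every outage in the period, which is why failover and restart times measured in seconds, or less, matter so much.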
A loosely coupled cluster model with no shared storage is a basic clustering technique that is suitable for many types of telecommunications application servers. This model eliminates the possibility of a failed shared component affecting the availability of the service or the availability of the system.
Whether shared storage is implied or not, a cluster provides the following advantages:
- Prevents a node from being a single point of failure. With hardware faults, the failing node can be replaced or repaired without affecting the service uptime (no unscheduled downtime)
- Allows a software or kernel upgrade to be completed on each node separately without affecting the availability of the service
- Isolates failing nodes from the cluster and enables service to continue using the remaining healthy nodes
- Allows hardware upgrades on each node separately without affecting service availability
- Enables increased capacity to meet load/traffic increases
CGL clustering functional requirements include support for redundancy (no single point of failure), not only at the cluster node level, but at the hardware level as well, including fans, power supplies, memory ECC, communication paths, and storage paths. To support continuous operation of carrier grade services, requirements are defined for node failure detection and various forms of service failover, such as application, node address, and connections failovers.
The CGL clustering requirements are framed around industry standard programming interfaces. The Service Availability Forum (SA Forum) has developed an Application Interface Specification (AIS) that defines service interfaces for clustered applications. The specification is OS-independent and is being used in both proprietary and open source cluster developments. The SA Forum AIS specifies a membership service API, a checkpoint service API, an event service API, a message service API, and a lock service API. AIS also specifies an availability management framework (AMF) that provides resource management and application failover policy in the cluster.
As stated previously, we learned from our usage model study that no single clustering model meets the needs of all carrier applications, and we are not going to create such a model. Instead, a more generalized CGL clustering model is presented in this document that serves to identify the functional need of each component of a High Availability Cluster environment. This general model is illustrated in the diagram below, which shows the need for redundancy, stateful failover, and shared storage in a cluster application. This diagram is not a topology of any specific cluster deployment. It is up to application developers and system administrators to determine the usage and configuration of their cluster systems.
The functions shown in Illustration 3 are described below:
- 1+1 Hot Standby Cluster is composed of one active primary node and one hot standby node and possibly a set of shared storage. It includes redundant paths between cluster nodes and to the storage.
- Shared Storage provides a set of mirrored disks (for redundant data) and can be achieved with software or hardware.
- Redundant Paths include the multiple communication paths between cluster nodes (CCPs) and the multiple paths from a node to access the storage (CSPs).
- N+M Cluster is the extension of a 1+1 hot standby cluster. In this model, the cluster can be configured with additional hot or cold standby nodes as needed by the application. Functional needs of the data check pointing capability and the access to the shared storage remain the same.
- Data Check Pointing is part of the cluster services. It constantly synchronizes the in-memory states and data of an application allowing the cluster to provide stateful failover of the application from one node to another node.
- Access Shared Storage – A cluster application stores and retrieves application data to and from the redundant shared storage. These data are persistent on the mirrored disks.
- Service Entry Point Director routes and directs which cluster node shall provide the service to the service requester.
- Cluster Management Console is a node in the system that manages all cluster nodes, but is not part of the cluster membership. It provides a view of the cluster to an operator. It monitors the hardware status of the cluster nodes and monitors cluster events such as cluster node failure. The operator can use it to perform some cluster node failure recovery functions, such as the re-boot of a cluster node allowing the node to re-join the cluster membership.
- Users are the service requesters. A user can be a human being, an external device, or another computer system.
End users of carrier grade equipment have prioritized the need for HAC cluster configurations as follows:
- 2-node (active/hot standby) cluster that supports
  - Checkpointing of in-memory application states for rapid application failover
  - Shared storage access from a single node at a time
  - Redundant access to shared storage from a single node
  - Redundant inter-node communication paths
- 2-node (active/active) cluster that supports
  - Concurrent access to shared storage
- N-node (active/active) cluster that supports
  - Storage “scalability”
  - Improved service performance in accessing shared storage
- N+M node (active/hot or cold standby) cluster that supports
  - Extension of active/standby pair
The requirements described in this section are intended to be independent of specific projects, products, or implementations.
The cluster requirements are framed around industry standard application programming interfaces. For these clustering requirements, the SA Forum Application Interface Specification will be used. The SA Forum AIS services that apply to this specification are:
- SA Cluster Membership Service API (Chapter 6)
- SA Checkpoint Service API (Chapter 7)
- SA Event Service API (Chapter 8)
- SA Message Service API (Chapter 9)
- SA Lock Service API (Chapter 10)
The Availability Management Framework API (Chapter 5) provides the following services to SA-aware applications:
- Registration and un-registration
- Health monitoring
- Availability management
- Protection group management
- Error reporting
Other requirements described in this document are not related to cluster application APIs, but define capabilities that are needed in a cluster. These include items such as shared storage support, synchronized time, and cluster management functions such as monitoring, control, and diagnostics. Items such as a clustered file system and clustered volume manager are also included in this document as they are essential building blocks for HA clustering, although they have no established APIs.
CLUSTERING REQUIREMENT SUB-CATEGORIES
Requirement Sub-Category | Sub-Category Description |
---|---|
CMS | Membership Service |
CES | Event Service |
CCS | Checkpoint Service |
CCM | Communication and Messaging |
CLS | Lock Service |
CAF | Availability Framework |
CMON | Monitoring |
CCON | Control |
DIAG | Diagnostics |
CSM | Shared Storage Management |
CFH | Fault Handling |
ID | Name | Category | Priority |
---|---|---|---|
CFH.1.0 | Cluster Node Failure Detection | Cluster | P2 |
CGL specifies that carrier grade Linux shall provide a fast, communication-based cluster node failure mechanism that is reflected in a cluster membership service. At a minimum, the cluster node failure mechanism maintains a list of the nodes that are currently active in the cluster. Changes in cluster membership must result in a membership event that can be monitored by cluster services, applications, and middleware that register to be notified of membership events. Fast node failure detection must not depend on a failing node reporting that the node is failing. However, self-diagnosis may be leveraged to speed up failure detection in the cluster. This requirement does not address the issue of how to prevent failing nodes from accessing shared resources (see CFH.3.0 Application Fail-Over Enabling). Fast node failure detection shall include the following capabilities:
Cluster node failure detection must use only a small percentage of the total cluster communication bandwidth for membership health monitoring. The guideline is that the bandwidth used by the health monitoring mechanism shall be linear with respect to the number of bytes per second per node. |
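The membership behavior in CFH.1.0 (an active-node list, membership events delivered to registered listeners, and detection that does not rely on the failing node reporting itself) can be sketched as below. This is a simplified, single-observer illustration rather than a cluster protocol; the class `Membership`, the event names `"joined"` and `"failed"`, and the silence-based `timeout` are all assumptions.

```python
class Membership:
    """Active-node list that notifies subscribers on every membership change."""

    def __init__(self, timeout):
        self.timeout = timeout            # seconds of silence before a node is declared failed
        self._last_seen = {}              # node -> time of last message received from it
        self._subscribers = []

    def subscribe(self, callback):
        """Register callback(event, node) for membership events."""
        self._subscribers.append(callback)

    def _notify(self, event, node):
        for callback in self._subscribers:
            callback(event, node)

    def heard_from(self, node, now):
        """Record traffic from a node; a previously unknown node joins."""
        is_new = node not in self._last_seen
        self._last_seen[node] = now
        if is_new:
            self._notify("joined", node)

    def sweep(self, now):
        """Declare failed every node that has been silent longer than the timeout."""
        for node, last in list(self._last_seen.items()):
            if now - last > self.timeout:
                del self._last_seen[node]
                self._notify("failed", node)

    def active_nodes(self):
        return sorted(self._last_seen)
```

Because failure is inferred from the absence of messages, a node that crashes outright is still detected; the timeout trades detection speed against the bandwidth spent on health messages.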
ID | Name | Category | Priority |
---|---|---|---|
CFH.2.0 | Prevent Failed Node From Corrupting Shared Resources | Cluster | P1 |
CGL specifies that carrier grade Linux shall provide a way to fence a failed or errant node from shared resources, such as SAN storage, to prevent the failed node from causing damage to shared resources. Since the surviving nodes in the cluster will want to fail over resources, applications, and/or middleware to other surviving nodes in the cluster, the cluster must make sure it is safe to do the failover. Killing the failed node is the easiest and safest way to protect shared resources from a failing node. If a failing node can detect that it is failing, the failing node could kill itself (suicide) or disable its ability to access shared resources to augment the node isolation process. However, the cluster cannot depend on the failing node to alert the cluster when it is failing, so the cluster must be proactive in protecting shared resources. External Specification Dependencies: This requirement is dependent on hardware to provide a mechanism to reset or isolate a failed or failing node. |
ID | Name | Category | Priority |
---|---|---|---|
CFH.3.0 | Application Fail-Over Enabling | Cluster | P2 |
CGL specifies that carrier grade Linux shall provide mechanisms for failing over applications in a cluster from one node to another. Applications and nodes are monitored and a failover mechanism is invoked when a failure is detected. Once a failure is detected, the application failover mechanism must determine which policies apply to this failover scenario and then begin the process to start a standby application or initiate the re-spawn of an application within 1 second. Note: The full application failover time is dependent upon application and node failure detection, the time to apply the failover policies, and the time it takes to start or restart the application. The aggregate failover time for an application must allow the cluster to maintain carrier grade application availability. |
ID | Name | Category | Priority |
---|---|---|---|
CSM.1.0 | Storage Network Replication | Cluster | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism for storage network replication. The storage network replication shall provide the following:
|
ID | Name | Category | Priority |
---|---|---|---|
CSM.2.0 | Cluster-aware Volume Management for Shared Storage | Cluster | P2 |
CGL specifies that carrier grade Linux shall provide management of logical volumes on shared storage from different cluster nodes. Volumes in such an environment are usually on physical disks accessible to multiple nodes. Volume management shall include the following:
|
ID | Name | Category | Priority |
---|---|---|---|
CSM.4.0 | Redundant Cluster Storage Path | Cluster | P1 |
CGL specifies that Linux shall provide each cluster node with the ability to have redundant access paths to shared storage. CGL Availability Requirement: AVL.7.1.x Multi-Path Access To Storage |
ID | Name | Category | Priority |
---|---|---|---|
CSM.6.0 | Cluster File System | Cluster | P1 |
CGL specifies that carrier grade Linux shall provide a cluster-wide file system. A clustered file system must allow simultaneous access to shared files by multiple computers. Node failure must be transparent to file system users on all surviving nodes. A clustered file system must provide the same user API and semantics as a file system associated with private, single-node storage. |
ID | Name | Category | Priority |
---|---|---|---|
CSM.7.0 | Shared Storage Consistent Access | Cluster | P1 |
CGL specifies that carrier grade Linux shall provide a consistent method to access shared storage from different nodes to ensure that partition information is not changed on one node while that partition is in use on another node, where such use would prevent the change. |
ID | Name | Category | Priority |
---|---|---|---|
CCM.2.2 | Cluster Communication Service: Fault Handling | Cluster | P1 |
CGL specifies that carrier grade Linux shall provide a reliable communication service that detects a connection failure, aborts the connection, and reports the connection failure. An established connection must react to and report a problem to the application within 100 ms upon any kind of service failure, such as a process or node crash. The connection failure detection requirement must offer controls that allow it to be tailored to specific conditions in different clusters. An example is to allow the specification of the duration of timeouts or the number of lost packets before declaring a connection failed. |
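The tunable detection controls described above can be pictured with a minimal sketch. The class name, the heartbeat scheme, and the two parameters (`heartbeat_interval_ms`, `max_missed`) are assumptions made for illustration; CGL does not prescribe this design.

```python
from dataclasses import dataclass


@dataclass
class ConnectionMonitor:
    """Declares a connection failed after too many missed heartbeats.

    heartbeat_interval_ms and max_missed are hypothetical tunables standing
    in for the per-cluster controls the requirement asks for.
    """
    heartbeat_interval_ms: int = 20
    max_missed: int = 3
    last_seen_ms: int = 0

    def heartbeat(self, now_ms: int) -> None:
        """Record that a heartbeat arrived from the peer at time now_ms."""
        self.last_seen_ms = now_ms

    def detection_budget_ms(self) -> int:
        """Longest silence tolerated before the connection is declared failed."""
        return self.heartbeat_interval_ms * self.max_missed

    def has_failed(self, now_ms: int) -> bool:
        """True once the peer has been silent for longer than the budget."""
        return now_ms - self.last_seen_ms > self.detection_budget_ms()


monitor = ConnectionMonitor(heartbeat_interval_ms=20, max_missed=3)
monitor.heartbeat(now_ms=1000)
failed_at_1050 = monitor.has_failed(now_ms=1050)  # 50 ms of silence
failed_at_1061 = monitor.has_failed(now_ms=1061)  # 61 ms of silence
```

With these illustrative settings the detection budget is 60 ms, which leaves some margin inside the 100 ms reporting limit; a cluster with a noisier interconnect might raise `max_missed` and shorten the interval instead.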
ID | Name | Category | Priority |
---|---|---|---|
CAF.2.1 | Ethernet MAC Address Takeover | Cluster | P1 |
CGL specifies a mechanism to program and announce MAC addresses on Ethernet interfaces so that when a SW Failure event occurs, redundant nodes may begin receiving traffic for failed nodes. |
ID | Name | Category | Priority |
---|---|---|---|
CAF.2.2 | IP Takeover | Cluster | P1 |
CGL specifies a mechanism to program and announce IP addresses (using gratuitous ARP) so that when a SW Failure event occurs, redundant nodes may begin receiving traffic for failed nodes. |
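As an illustration of the announcement step, the sketch below builds the bytes of a gratuitous ARP request for an address being taken over. The function name and the example MAC and IP values are assumptions; the frame is only constructed, not transmitted, and a real takeover mechanism would also have to configure the address on the interface.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Return an Ethernet frame carrying a gratuitous ARP request for ip."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    ethernet = broadcast + hw + struct.pack("!H", 0x0806)  # EtherType: ARP
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # operation: request
        hw,           # sender hardware address (the node taking over)
        addr,         # sender protocol address (the address taken over)
        b"\x00" * 6,  # target hardware address, unused here
        addr,         # target protocol address equals the sender address
    )
    return ethernet + arp


frame = build_gratuitous_arp("02:00:00:00:00:01", "192.0.2.10")
```

Because the sender and target protocol addresses are identical and the frame is broadcast, neighbors that already cache the address can update their entry to the new MAC address. Sending the frame on Linux would typically require a raw packet socket and elevated privileges.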
ID | Name | Category | Priority |
---|---|---|---|
CDIAG.2.1 | Cluster-Wide Identified Application Core Dump | Cluster | P1 |
CGL specifies that carrier grade Linux shall provide a cluster-aware application core dump that uniquely identifies which node produced the core dump. For instance, if a diskless node dumps core files to network storage, the core dump will be uniquely identified as originating from that node. |
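One way to picture node-identified core files is a naming template that embeds the host name. The Linux `core_pattern` setting accepts specifiers such as `%h` (host name), `%e` (executable), `%p` (PID), and `%t` (time of dump); the sketch below merely expands such a template in user space to show the resulting name. The directory, the helper name, and the sample values are hypothetical, and the helper does not handle every specifier the kernel supports.

```python
# Hypothetical template; the /var/crash directory is an assumption.
CORE_TEMPLATE = "/var/crash/core.%h.%e.%p.%t"


def expand_core_template(template, *, hostname, executable, pid, timestamp):
    """Expand the four specifiers used above; other specifiers are left as-is."""
    replacements = {
        "%h": hostname,
        "%e": executable,
        "%p": str(pid),
        "%t": str(timestamp),
    }
    for specifier, value in replacements.items():
        template = template.replace(specifier, value)
    return template


core_name = expand_core_template(
    CORE_TEMPLATE,
    hostname="node-7",
    executable="sigd",
    pid=4242,
    timestamp=1700000000,
)
```

Two diskless nodes writing to the same network directory would then produce distinct file names as long as their host names differ.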
ID | Name | Category | Priority |
---|---|---|---|
CDIAG.2.2 | Cluster-Wide Kernel Crash Dump | Cluster | P1 |
CGL specifies that carrier grade Linux shall provide a cluster-aware kernel crash dump that uniquely identifies which node produced the crash dump. For instance, if a diskless node dumps crash data to network storage, the data will be uniquely identified as originating from that node. |
ID | Name | Category | Priority |
---|---|---|---|
CDIAG.2.3 | Cluster Wide Log Collection | Cluster | P1 |
CGL specifies that carrier grade Linux shall provide a cluster-wide logging mechanism. A cluster-wide log shall contain node identification, message type, and cluster time identification. This cluster-wide log may be implemented as a central log or as the collection of specific node logs. |
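The "collection of specific node logs" option can be sketched as a time-ordered merge of per-node records. The record layout and field names below are assumptions chosen to carry the three required attributes; they are not defined by CGL.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class LogRecord(NamedTuple):
    cluster_time: float   # cluster-wide timestamp, in seconds
    node_id: str          # node identification
    message_type: str     # for example INFO, WARN, ERROR
    text: str


def merge_node_logs(*node_logs: Iterable[LogRecord]) -> Iterator[LogRecord]:
    """Merge per-node logs, each already sorted by time, into one stream."""
    return heapq.merge(*node_logs, key=lambda record: record.cluster_time)


node_a = [
    LogRecord(10.0, "node-a", "INFO", "service started"),
    LogRecord(12.5, "node-a", "ERROR", "peer unreachable"),
]
node_b = [LogRecord(11.2, "node-b", "WARN", "link flap on eth1")]
cluster_log = list(merge_node_logs(node_a, node_b))
```

The merged view is only as trustworthy as the timestamps, which is why a cluster-wide log depends on the nodes sharing a synchronized clock.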
ID | Name | Category | Priority |
---|---|---|---|
CDIAG.2.4 | Synchronized/Atomic Time Across Cluster | Cluster | P1 |
CGL specifies that carrier grade Linux shall provide cluster-wide time synchronization within 500 ms, and must synchronize within 10 seconds once the time synchronization service is initiated. In a cluster, each node must be synchronized to the same wall-clock time to provide consistency in access times to shared resources (e.g., clustered file system modification and access times) as well as time stamps in cluster-wide logs. |
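A simple check of the synchronization bound might look like the sketch below. It assumes, for illustration only, that the bound is interpreted as the largest difference between any two node clocks sampled at the same instant; the function names and sample readings are hypothetical.

```python
def max_clock_skew_ms(node_clocks_ms: dict) -> float:
    """Largest difference between any two node clocks, in milliseconds."""
    readings = list(node_clocks_ms.values())
    return max(readings) - min(readings)


def is_synchronized(node_clocks_ms: dict, tolerance_ms: float = 500.0) -> bool:
    """True if every pair of nodes agrees to within tolerance_ms."""
    return max_clock_skew_ms(node_clocks_ms) <= tolerance_ms


clocks = {"node-a": 1_000_000.0, "node-b": 1_000_120.0, "node-c": 999_950.0}
skew = max_clock_skew_ms(clocks)
in_sync = is_synchronized(clocks)
drifted = is_synchronized({"node-a": 0.0, "node-b": 750.0})
```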
- Birman, Kenneth P. 1997. Building Secure and Reliable Network Applications. Manning Publishing Company and Prentice Hall.
- Birman, Ken, et al (circa 2000). “The Horus and Ensemble Projects: Accomplishments and Limitations.”
- Chandra, Tushar, Vassos Hadzilacos, Sam Toueg. June 1996. “The Weakest Failure Detector for Solving Consensus”.
- Davis, Roy G. 1993. VAX Cluster Principles. Digital Press.
- Dolev, Danny, and Dalia Malki. 1996. “The Transis Approach to High Availability Cluster Communication.” Comm. of the ACM 39 (April): 64-70.
- Pfister, Greg. 1998. “In Search of Clusters”, Second Edition, Prentice Hall PTR.
- Simmons, Chuck, and Patty Greenwald. 1994. “Oracle Lock Manager Requirements,” Oracle Corporation.
- Thomas, Kristin. 2001. “Programming Locking Applications,” IBM Corporation.
- van Renesse, Robbert, Kenneth P. Birman, and Silvano Maffeis. 1996. “HORUS: A flexible Group Communication System.” Comm. of the ACM 39 (April): 76-83.
- Service Availability Forum http://www.saforum.org/
- Open Cluster Framework http://www.opencf.org
The following references discuss virtual synchrony:
- Birman, Kenneth. 1987. "Exploiting virtual synchrony in distributed systems"
- Extended Virtual Synchrony: http://www.cs.jhu.edu/~yairamir/dcs-94.ps
The following cluster-related whitepapers can be found at http://developer.osdl.org/cherry/cluster-whitepapers/.
- OSDL Cluster Architecture (OSDL-cluster.html)
- Carrier Grade Linux Clustering Model (cluster_alcatel.doc)
- Ericsson Clustering Model Proposal (cluster_ericsson.pdf)
- The Telecom System View (cluster_intel.pdf)
- Foundational Components of Service Availability (cluster_mv.pdf)
- NTT Clustering Model (cluster_ntt.pdf)
[ ] indicates a term that is defined elsewhere in the definitions of terms.
A set of [processes], running on a computer [system], that provides a service to the [users] of this [system]. An application is usually referred to as the non operating system portion of the software in a [system].
Availability is the amount of time that a [system] [service] is provided in relation to the amount of time the [system] [service] is not provided. [System] [service] downtime could be the result of [system] [failures] (unscheduled downtime) or for things like upgrades, system relocation, or backups (scheduled downtime). A [system] [service] is provided if the [service] is functioning at an acceptable level of [performance] or [scalability]. Availability is commonly expressed as a percentage (see [five-nines] or [six-nines]).
Percent Availability = (time service is provided / total time) X 100
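The formula above can be worked through with a short sketch. The function name and the sample figures (8.76 hours of downtime in a 365-day year) are illustrative assumptions.

```python
def percent_availability(time_provided: float, total_time: float) -> float:
    """Percent availability = (time service is provided / total time) x 100."""
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    return time_provided / total_time * 100


hours_in_year = 365 * 24          # 8760 hours, ignoring leap years
downtime_hours = 8.76             # scheduled plus unscheduled downtime
availability = percent_availability(hours_in_year - downtime_hours, hours_in_year)
```

With these inputs the result is about 99.9 percent, often described as three nines.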
Two or more computer [nodes] in a [system] used as a single computing entity to provide a [service] or run an [application] for the purpose of [high availability], [scalability], and distribution of tasks.
The exchange of information between [processes]. These [processes] can be running on the same [node] (intra-node) or on different [nodes] (inter-nodes). The information includes [events] and [messages].
Numerical or other information represented in a form suitable for processing by a [process].
The mechanism by which [application] state is transmitted from an active [service unit] to one or more standby [service units].
A [communication] with or without data which notifies a set of zero or more [processes] that something took place. This communication can be either within a [node] and/or between [nodes].
A publish/subscribe event service that manages [events]. [Events] may be grouped into named channels and handle attributes such as priority, ordering, retention times, and persistence. A [subscriber] informs the event mechanism that it wishes to receive a certain event. A [publisher] posts an event to the event mechanism to be delivered to all [subscribers] of that event. This way the [publisher] and [subscriber] are decoupled, they do not have to directly know about each other, just about the event. Events may be asynchronous or synchronous. A [publisher] posting a synchronous event will block or be informed when all [subscribers] have received the event. The [publisher] of an asynchronous event will not block waiting for delivery or be informed when the event is delivered to any [process].
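The decoupling between publishers and subscribers can be illustrated with a minimal in-process sketch. The class and method names are assumptions, delivery here is synchronous only, and channels, priorities, retention, and persistence are deliberately left out.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class EventService:
    """In-process stand-in for an event service; names are illustrative."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = (
            defaultdict(list)
        )

    def subscribe(self, event_name: str, handler: Callable[[dict], None]) -> None:
        """Register interest in an event without knowing who publishes it."""
        self._subscribers[event_name].append(handler)

    def publish(self, event_name: str, payload: dict) -> int:
        """Deliver payload to every subscriber; return how many received it."""
        handlers = list(self._subscribers.get(event_name, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


received = []
service = EventService()
service.subscribe("link-down", received.append)
delivered = service.publish("link-down", {"node": "node-a", "port": "eth1"})
undelivered = service.publish("fan-failure", {"node": "node-b"})
```

The publisher learns only how many subscribers were reached, never who they are, which matches the zero-or-more delivery described for events.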
The process to migrate back to a [node] after it has been [repaired]. It can be controlled or automatic.
The ability to automatically switch a [service] or capability to a [redundant] [node], [system], or [network] upon the [failure] or abnormal termination of the currently-active [node], [system], or [network].
The inability of a [system] or [system] component to perform a required function within specified limits. A failure may be produced when a [fault] is encountered. Examples of failures include invalid data being provided, slow response time, and the inability for a [service] to take a request. Causes of failure can be hardware, firmware, software, network, or anything else that interrupts the [service].
A failure is ultimately caused by an unmasked [fault] in the [system]. Failure detection is the process, usually from external view, to detect a [failure] of the [service] the [system] is providing.
An error in a computer [system] or the [service] it provides. A fault may be masked and not impact the [application] or the [service] it provides. A fault can also be classified as transient or permanent. A fault is often associated with a [system] defect in the software or hardware. A fault can be caused by external stimulus to the [system].
Equivalent to [fault isolation].
Ability to detect an abnormal condition (device failure, temperature error, etc.) in the [system].
The localization of a [fault] to its repair unit.
Ability to protect the rest of the [system] from the effects of a [fault].
Detecting or forecasting [faults].
Ability for a [system] to mask a set of [failures] from impacting the [service] it provides.
Five-nines is measured as 99.999% [service] [availability]. It is equivalent to about 5 minutes (5.26 minutes) a year of total planned and unplanned downtime of the [service] provided by the [system].
The sending of a single [message] to a set of destination [processes].
Equivalent to [switch-over].
The lock [service] is a distributed lock [service], suitable for use in a [cluster], where [processes] in different [nodes] might compete with each other for access to shared resources. A lock [service] may provide the following capabilities: exclusive and shared access, synchronous and asynchronous calls, lock timeout, trylock, deadlock detection, orphan locks, and notification of waiters.
A [communication] with [data] in a form suitable for transmission. A message may contain attributes of the [communication] such as source, destination, time stamps, and authorization information, etc. It may also contain [application] specific information.
Mean Time To [Failure]. The interval in time which the [system] can provide [service] without [failure].
Mean Time To [Repair]. The interval in time it takes to resume [service] after a [failure] has been experienced.
A connection of [nodes] which facilitates [communication] among them. Usually, the connected nodes in a network use a well defined [network protocol] to communicate with each other.
Rules for determining the format and transmission of data. Examples of network protocols include TCP/IP, UDP, etc.
The state of a [system] having a very high ratio of [service] uptime compared to [service] downtime. Highly available systems are typically rated in terms of number of nines such as [five-nines] or [six-nines].
A single computer unit, in a [network], that runs with one instance of a real or virtual operating system.
The mechanism by which computer [nodes] join and leave a cluster as well as the mechanism to detect [node] [failure]. A [node] is deemed to be a member if it has joined the [cluster] successfully. A [node] is deemed to be a non-member if it has not joined the cluster or if it has left the cluster. A detected [failure] may result in the [node] leaving the cluster or being isolated from the cluster, depending on node membership policy.
The efficiency of a [system] while performing tasks. Performance characteristics include the total throughput of an operation and its impact on a [system]. The combination of these characteristics determines the total number of activities that can be accomplished over a given amount of time.
A single instance of a software program running on a single [node].
A collection of processes registered within [cluster] software.
The mechanism by which [process] registration, un-registration, and [failure detection] is managed. A [process] is deemed to be a member if it has registered with the [process group] successfully. A [process] is deemed to be a non-member if it has not registered with the process group. A [detected] failure may cause the [process] to become a non-member, depending on the process group membership policy. A [process] can gracefully un-register to depart from the process group. The process group membership also handles authorization to join the membership. Process group membership depends upon [node membership] if process group membership is available on multiple [nodes]. Process group membership is used to execute application [failover] policy.
A [process] that sends [events].
[Reliability], [availability], and [serviceability]
To return a failing component, [node] or [system] to a working state. A failing component can be a hardware or a software component of a [node] or [network]. Recovery can also be initiated to work around a [fault] that has been detected; ultimately restoring the [service].
Duplication of hardware, software, or network components in a [system] to avoid [Single Points of Failure].
The continuation of [service] in the absence of [failure]. Reliability is commonly measured as the [MTTF] of a [system].
The process to remove a [fault].
A component, [node], or [system] which is configured identically to a base component, [node] or [system] for the purpose of [fault tolerance], [performance], or ease of [service].
How well a solution to some problem will work when the size of the problem increases. In the CGL context, scalability is defined as the ability of a [system] to provide the same level of [high availability] performance when the work load of the [service] increases. The solution to increase the [system] or [service] scalability can be software or hardware oriented.
A set of functions provided by a computer [system]. Examples of communications services include media gateway, signal, or soft switch types of applications. Some general examples of services include web based or database transaction types of applications.
A collection of one or more software [processes] that provide [service] to a [user].
The capability for a [system] to be maintained and updated. Often, serviceability is measured by how easy a maintenance task can be performed or how quickly a [system] [fault] can be tracked down and repaired so that the [system] can resume the [service].
Any component or [communication] path within a computer [system] that would result in an interruption of the [service] if it failed.
Six-nines is measured as 99.9999% [service] [availability]. It is equivalent to about 30 seconds (31.5 seconds) a year of total planned and unplanned downtime of the [service] provided by the [system].
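The yearly downtime figures quoted for [five-nines] and [six-nines] can be checked with a short calculation. The sketch assumes a 365-day year; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000 seconds, ignoring leap years


def allowed_downtime_seconds(availability_percent: float) -> float:
    """Yearly downtime permitted by a given percent availability."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


five_nines = allowed_downtime_seconds(99.999)    # a little over 5 minutes
six_nines = allowed_downtime_seconds(99.9999)    # a little over 30 seconds
```

Each additional nine divides the yearly downtime budget by ten, which is why planned maintenance alone can consume the whole allowance at these levels.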
A [process] that receives [events]. A [subscriber] may subscribe to one or many [events]. A subscriber may join and leave an event subscription at any time without involving the publishers.
Ability to switch to a [redundant] [node], [system], or [network] upon a normal termination of the currently-active [node], [system], or [network]. Switch-over can happen with or without human intervention.
A computer system that consists of one computer [node] or many nodes connected via a computer network mechanism.
An external entity that acquires [service] from a computer [system]. It can be a human being, an external device, or another computer [system].
This section specifies a set of useful and necessary features for servicing and maintaining a system. Telecommunication systems such as management servers, signaling servers, and gateways must have the capability to be managed and monitored remotely, have robust software package management for installations and upgrades, and have mechanisms for capturing and analyzing failure information. A single point of control is required for applications, software, hardware, and data for functions such as data movement, security, backup, and recovery.
CGL systems will support remote management standards such as Simple Network Management Protocol (SNMP), Common Information Model (CIM), and Web-Based Enterprise Management (WBEM). Local management standards include IPMI and the Service Availability Forum's Hardware Platform Interface (HPI).
Debuggers, application and kernel dumpers, watchdog triggers, and error analysis tools are needed to debug and isolate failures in a system. Diagnostic monitoring of temperature controls, fans, power supplies, storage media, the network, CPUs, and memory is needed for quick failure detection and failure diagnosis.
Serviceability Sub-Categories
Requirement Sub-Category | Sub-Category Description |
---|---|
SMM | Management and Monitoring |
SPM | Software Package Management |
SFA | Failure Analysis |
ID | Name | Category | Priority |
---|---|---|---|
SMM.3.1 | Serial Console Operation | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide support for a connection to a system console via a serial port on the system where a serial port exists. All output that would appear on a local console must appear on the remote console. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.3.2 | Network Console Operation | Serviceability | P1 |
CGL specifies that Linux shall provide support for a management console connection via a network port in addition to providing the standard support for a management console connection via a serial port. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.4.0 | Persistent Device Naming | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide consistent device naming functionality. The user-space system name of the device shall be maintained when the device is removed and reinstalled even if the device is plugged into a different bus, slot, or adapter. A device name shall be assigned, based on hardware identification information using policies set by the administrator. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.5.0 | Kernel Profiling | Serviceability | P1 |
CGL specifies that Linux shall support profiling of a running kernel and applications to identify bottlenecks and other kernel and application statistics. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.5.1 | Application Profiler (was AVL.19.0) | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism to profile critical resources of the kernel and applications. The critical resources that are profiled by this mechanism shall include (but are not limited to):
|
ID | Name | Category | Priority |
---|---|---|---|
SMM.7.1 | Temperature Monitoring | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide a capability that supports the monitoring of system temperature settings and conditions. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.7.2 | Fan Monitoring | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide a capability that supports the monitoring of system fan settings and conditions. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.7.3 | Power Monitoring | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide a capability that supports the monitoring of system power settings and conditions. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.7.4 | Media Monitoring | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide a capability that supports the monitoring of media settings and conditions for system media, such as hard disks or hardware specific disk sub-systems. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.7.5 | Network Monitoring | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide a capability that supports the monitoring of system network settings and conditions. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.7.6 | CPU Monitoring | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide a capability that supports the monitoring of CPU settings and conditions, such as current utilization totals, per process totals and trends, and current speed settings. |
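As one possible illustration of a utilization total, the sketch below derives a percentage from two snapshots of the aggregate `cpu` line in `/proc/stat`. The function names and the sample snapshots are assumptions; CGL does not mandate this data source.

```python
def parse_cpu_line(stat_text: str) -> tuple:
    """Return (busy, total) jiffies from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(value) for value in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal ...
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            total = sum(values[:8])
            return total - idle, total
    raise ValueError("no aggregate cpu line found")


def utilization_percent(before: str, after: str) -> float:
    """CPU utilization between two /proc/stat snapshots, in percent."""
    busy_before, total_before = parse_cpu_line(before)
    busy_after, total_after = parse_cpu_line(after)
    elapsed = total_after - total_before
    return 100.0 * (busy_after - busy_before) / elapsed if elapsed else 0.0


# Hand-written sample snapshots standing in for two reads of /proc/stat.
sample_1 = "cpu  100 0 50 800 50 0 0 0 0 0\ncpu0 100 0 50 800 50 0 0 0 0 0\n"
sample_2 = "cpu  150 0 60 880 60 0 0 0 0 0\ncpu0 150 0 60 880 60 0 0 0 0 0\n"
utilization = utilization_percent(sample_1, sample_2)
```

Sampling at a fixed interval and keeping the results gives the trend data the requirement mentions; idle and I/O wait time are both counted as not busy in this sketch.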
ID | Name | Category | Priority |
---|---|---|---|
SMM.7.7 | Memory Monitoring | Serviceability | P2 |
CGL specifies that carrier grade Linux shall provide a capability that supports the monitoring of memory conditions, such as current utilization totals, and per process totals and trends. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.8.1 | Kernel Message Structuring | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide support that allows the structuring of kernel messages using an event log format to provide more information to identify the problem and its severity, and to allow client applications registered for the fault event to take policy-based corrective action. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.8.2 | Platform Signal Handler | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide an infrastructure to allow "hardware errors" to be logged using the event logging mechanism. A default handler shall be provided. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.8.3 | Remote Access to Event Log | Serviceability | P2 |
CGL specifies that carrier grade Linux shall provide support for a remote access capability that allows a centralized system to access the Linux OS event log information of a remote system. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.9.0 | Disk and Volume Management | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide support for the installation of a subsystem that supports hard disks to be managed without incurring downtime:
|
ID | Name | Category | Priority |
---|---|---|---|
SMM.12.0 | Remote Boot Support (was PMT.2.0) | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide support for remote booting across common LAN and WAN communication media to support diskless systems. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.13.0 | Diskless Systems (was PMS.4.0) | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide for Linux on diskless systems. |
ID | Name | Category | Priority |
---|---|---|---|
SMM.15 | Thread Naming | Serviceability | P2 |
Linux Foundation CGL specifies that carrier grade Linux shall provide the ability to uniquely identify threads with a symbolic name in addition to the existing process and thread ID mechanism. These symbolic names can be assigned via an API exposed to applications and can be assigned either at process / thread creation time or at any time after the process / thread has been started. Reference: SCOPE Alliance Carrier Grade Gap CGOS_V3-3.0 Proof-of-Concept: Linux kernel |
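The idea of naming a thread at creation and renaming it later can be illustrated at the language-runtime level. The sketch uses Python's own `threading` names as an analogue; it does not demonstrate the kernel-level mechanism the requirement refers to, and the thread names are made up for the example.

```python
import threading


def worker(stop_event: threading.Event) -> None:
    """Idle until asked to stop; stands in for real application work."""
    stop_event.wait()


stop = threading.Event()
thread = threading.Thread(target=worker, args=(stop,), name="sig-handler")
thread.start()
name_at_creation = thread.name

# A symbolic name can also be assigned after the thread has started.
thread.name = "sig-handler-active"
names_while_running = sorted(t.name for t in threading.enumerate())

stop.set()
thread.join()
```

Symbolic names make it much easier to match a thread seen in a debugger, log, or dump to its role than a numeric ID alone.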
ID | Name | Category | Priority |
---|---|---|---|
SMM.16 | System Black Box | Serviceability | P2 |
Linux Foundation CGL specifies that carrier grade Linux shall provide a system-wide monitoring and logging facility, a system black box, with at least the following attributes:
Reference: SCOPE Alliance Carrier Grade Gap CGOS_V3-4.0 |
ID | Name | Category | Priority |
---|---|---|---|
SMM.17 | Discovery of Platform CPU Architecture | Serviceability | P1 |
Linux Foundation CGL specifies that carrier grade Linux shall provide a mechanism for applications to discover at runtime the number of caches and the sizes of each. This mechanism must present such architectural information in a format that is uniform across platforms. Reference: SCOPE Alliance Carrier Grade Gap CGOS-6.1 Proof-of-Concept: sysfs |
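Since sysfs is named as the proof-of-concept, a minimal sketch of runtime cache discovery follows. It assumes the common Linux layout in which each cache appears as an `index*` directory under a CPU's `cache` directory with `level`, `type`, and `size` attributes; the function name is illustrative, and the attributes present can vary by platform and kernel.

```python
from pathlib import Path


def read_cpu_caches(cpu_dir: str = "/sys/devices/system/cpu/cpu0") -> list:
    """Return one dict per cache index directory found under cpu_dir/cache."""
    caches = []
    for index in sorted(Path(cpu_dir, "cache").glob("index*")):
        entry = {}
        for attribute in ("level", "type", "size"):
            path = index / attribute
            if path.is_file():
                entry[attribute] = path.read_text().strip()
        caches.append(entry)
    return caches


caches = read_cpu_caches()        # empty list if the directory is absent
cache_count = len(caches)
```

Returning plain dictionaries keeps the result uniform across platforms: an application can count the caches and read their sizes without knowing the processor model.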
ID | Name | Category | Priority |
---|---|---|---|
SMM.18 | API for Non-Uniform Memory Architectures | Serviceability | P1 |
Linux Foundation CGL specifies that carrier grade Linux shall implement the notion of a latency domain, defined as a set of CPUs with directly attached, local memory. All systems shall have at least one latency domain, representing a uniform memory architecture. Additional latency domains can exist for non-uniform memory architectures, in which case carrier grade Linux will provide an API that allows a process to:
Reference: SCOPE Alliance Carrier Grade Gap CGOS-6.2 Proof-of-Concept: libnuma |
ID | Name | Category | Priority |
---|---|---|---|
SPM.1.0 | Remote Package Update and Installation | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide a remote software package update feature. The package shall include functions that allow kernel modules and application software to be installed or upgraded remotely, while minimizing downtime of the system. The use of the term "remotely" does not imply a central package management platform, nor does it preclude such a system. This requirement only necessitates that a single device may be upgraded without requiring the administrator to be physically at the device. Note: Due to the wide range of platforms and applications in use, CGL does not specify a specific downtime limit metric. Downtime targets will vary based on the system application. |
ID | Name | Category | Priority |
---|---|---|---|
SPM.2.0 | No System Reboot for Upgrade of Kernel Modules | Serviceability | P2 |
CGL specifies that carrier grade Linux shall provide remote software installation and upgrade mechanisms that require no system reboots:
|
ID | Name | Category | Priority |
---|---|---|---|
SPM.2.1 | No System Reboot for Application Package Update | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide remote software installation and upgrade mechanisms that require no system reboots:
|
ID | Name | Category | Priority |
---|---|---|---|
SPM.3.0 | Version and Dependency Checking via Package Management | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide remote software installation and upgrade capabilities that include provisions for version compatibility and dependency checking at the package level. |
ID | Name | Category | Priority |
---|---|---|---|
SPM.4.0 | Upgrade Log | Serviceability | P2 |
CGL specifies that carrier grade Linux shall provide remote software installation and upgrade mechanisms that perform transaction logging of dates, times, changes, and the identity of the user performing a change. |
ID | Name | Category | Priority |
---|---|---|---|
SFA.1.0 | Kernel Panic Handler Enhancements | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide enriched capabilities in response to a system panic. Currently the default system panic behavior is to print a short message to the console and halt the system. CGL systems shall provide a set of configurable functions, including:
CGL shall support enhanced kernel panic reporting, at a minimum supporting proper resolution of in-kernel symbols. This will make kernel panic reports useful to administrators that do not have access to the kernel for which the report was generated. |
ID | Name | Category | Priority |
---|---|---|---|
SFA.2.1 | Live Kernel Remote Debugger | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide support for remote debugging of a live kernel. This shall include support over serial and/or local Ethernet. |
ID | Name | Category | Priority |
---|---|---|---|
SFA.2.2 | Dynamic Probe Insertion | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide support for the ability to dynamically insert software instrumentation into a running system in the kernel or applications.
|
ID | Name | Category | Priority |
---|---|---|---|
SFA.2.3 | User Space Debug Support for Threads | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide support to fully enable debugging of multi-threaded programs. This support should allow any action available for debugging a single-threaded (non-threaded) process to be available for every thread in a multi-threaded process. CGL shall provide specific additional debugging capabilities that are unique to multi-threaded applications:
|
ID | Name | Category | Priority |
---|---|---|---|
SFA.2.4 | Multithreaded Core Dump Support for Threaded Applications | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide support for correctly storing core dumps of multi-threaded user-space applications. |
ID | Name | Category | Priority |
---|---|---|---|
SFA.3.0 | Kernel Dump: Analysis | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide support for tools to enable enhanced analysis of kernel dumps. These enhancements must include, but not be limited to, the following capabilities:
|
ID | Name | Category | Priority |
---|---|---|---|
SFA.4.0 | Kernel Dump: Limit Scope | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide support for configuring the amount of system information that is retained. The minimum type of configuration would be only kernel memory or all system memory. A way must be provided for a system administrator to specify which type of system dump should be performed. |
ID | Name | Category | Priority |
---|---|---|---|
SFA.8.0 | Kernel Flat/Graph Execution Profiling | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide support for profiling of the running kernel using a prof or gprof style of recording trace information during system execution. |
ID | Name | Category | Priority |
---|---|---|---|
SFA.10.0 | Kernel Dump: Configurable Destinations | Serviceability | P1 |
CGL specifies that carrier grade Linux shall provide support for producing and storing kernel dumps as follows:
|
This section is a collection of requirements for the Linux operating system that describe the performance and scalability requirements of typical communications systems. Key requirements include a system's ability to meet service deadlines; to scale in order to take advantage of symmetric multiprocessing (SMP), simultaneous multithreading (SMT) technology, and large memory systems; and to provide efficient, low latency communication.
Without predictable execution latencies, it is possible that service deadlines would not be met, resulting in dropped calls, unreasonable call-response characteristics, or even dropping the entire service from active operation. Soft real-time scheduling provides predictable CPU scheduling latencies within defined loads. Latency and scheduling parameters are required to be configurable at runtime, including the scheduling quantum being configurable to 1ms or less. However, the services use many resources other than the CPU; therefore, protection against priority inversion, priority inheritance to system resources, and appropriate system resource scheduling are also required to maintain predictable scheduling.
To take advantage of scalable hardware architectures, CGL specifies support for SMP and SMT, which includes process affinity, task exclusive binding to logical CPUs and interrupt affinity capabilities. Large memory systems of more than 4GB of physical memory are needed to handle the memory demands of scalable communication applications.
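Process affinity can be illustrated with the scheduler affinity calls that Python exposes on Linux. The helper name is an assumption, and the sketch falls back to doing nothing on platforms where these calls are unavailable.

```python
import os


def pin_current_process(cpus):
    """Bind the calling process to the given logical CPUs.

    Returns the resulting affinity set, or None where the platform does not
    expose sched_setaffinity through Python's os module.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cpus)      # pid 0 means the calling process
    return os.sched_getaffinity(0)


original = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else None
pinned = pin_current_process({min(original)}) if original else None
if original:
    os.sched_setaffinity(0, original)  # restore the mask found at start
```

Binding a latency-sensitive process to one logical CPU, and keeping other work off that CPU, is the usual reason for wanting this control.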
Protocol stacks are required to be prioritized so certain protocols may take scheduling priority over less important network protocols. To improve latency and reduce CPU usage in network communications, zero-copy network protocols may be needed. IPv6 forwarding tables are required to be compact and use a small amount of memory. Support in the Linux Kernel for a 9000 byte Maximum Transfer Unit (MTU) is required.
The telecommunications application market faces new technical challenges with the introduction of architectures such as Next Generation Networks and IP multimedia services for mobile networks.
Real-time behavior is a major issue for new applications and protocol classes based on IP services such as VoIP, SIGTRAN, and RTP, where real time behavior drives the quality of service for end-users. Enhancements in real-time behavior would allow Linux to be used for some applications that are currently run on other real-time operating systems.
This document does not make a distinction between hard real-time and soft real-time support in the Linux kernel. Real-time capabilities are defined in terms such as maximum scheduling latency.
Incorporating high-resolution timers based on a 1 ms tick, rather than the currently supported 10 ms tick, will enhance the real-time task scheduling capabilities of Linux. If hardware platform support is provided for a 1 ms tick, the kernel will no longer be required to program a specific timer to elapse after 1 ms, eliminating overhead.
This feature enables:
- A 1 ms quantum to be managed for task scheduling.
- A 1 ms timer to be managed without requiring the kernel to program a specific clock.
Configuring the kernel with a 1 ms tick value rather than the current 10 ms tick value allows rescheduling to occur every 1 ms in response to a periodic clock timer interrupt.
POSIX real-time and advanced real-time features enable better support for real-time, portable applications at the API level.
Priority inversion is an issue for real-time application programming because scheduling priorities defined by design may be inverted, causing unexpected latencies. Priority inversion happens when a lower priority thread blocks a higher priority one. The most general case is when a lower priority thread holds a resource needed by the higher priority thread.
Priority inversion protection can be provided in the Linux kernel by dynamically modifying the thread scheduling priority when lower priority threads are holding resources.
Transitive priority inheritance is required to deal with cases where several mutexes are used by several threads.
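The transitive case can be illustrated with a small model. The following is a minimal sketch in Python, not kernel code: the `Thread` and `Mutex` classes and the `effective_priority` helper are hypothetical names introduced only for this illustration, higher numbers are assumed to mean higher priority, and the model assumes there is no deadlock cycle among the mutexes.

```python
class Mutex:
    def __init__(self, name):
        self.name = name
        self.owner = None      # Thread currently holding the mutex
        self.waiters = []      # Threads blocked waiting for the mutex


class Thread:
    def __init__(self, name, base_priority):
        self.name = name
        self.base_priority = base_priority  # higher number = higher priority (assumption)


def effective_priority(thread, mutexes):
    """Base priority raised to that of any thread waiting, directly or
    through a chain of mutexes, on a mutex that `thread` owns."""
    priority = thread.base_priority
    for mutex in mutexes:
        if mutex.owner is thread:
            for waiter in mutex.waiters:
                # Recursing through the waiter makes the inheritance transitive.
                priority = max(priority, effective_priority(waiter, mutexes))
    return priority


# low holds A; mid holds B and waits for A; high waits for B.
low, mid, high = Thread("low", 1), Thread("mid", 5), Thread("high", 10)
a, b = Mutex("A"), Mutex("B")
a.owner = low
b.owner = mid
a.waiters.append(mid)
b.waiters.append(high)
mutexes = [a, b]

for t in (low, mid, high):
    print(t.name, effective_priority(t, mutexes))
```

In this scenario a non-transitive scheme would raise `low` only to the priority of `mid`, its direct waiter. Following the chain through mutex B raises it to the priority of `high`, which is the thread that is ultimately delayed.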
Scheduling policy can also be dynamically modified by the protection mechanism. For example, time-sharing threads can be promoted to real-time FIFO threads. This can have undesired consequences, however, as timesharing processes are generally not coded with FIFO policy in mind. A means should be provided for the client application to specify priority inheritance or priority protection capabilities for the internal mutexes that they use.
APIs providing this capability should be implemented in such a way that they will perform correctly if they are promoted to real-time policies.
The priority inheritance protection mechanism can be extended by using a dynamic priority promotion system for message queues. In such a system, the priority of the receiver thread is promoted by the scheduler according to the message priority, enabling processing of urgent messages with high scheduling priority.
Since interrupt service routines are not allowed to sleep, preemption locks in interrupt handlers normally can't be changed to mutexes. To change preemption locks that are placed in interrupt service routines, interrupt service routines (aside from the timer interrupt routines) could be handled by kernel threads.
Mapping interrupt service routines onto real-time kernel threads enables interrupt handlers to be assigned priorities and soft real-time processes to be given higher priorities than interrupt handlers, allowing better designs. An additional benefit is the reduction of critical sections in interrupt handlers.
Improving performance and scalability in an SMP system can be accomplished by reducing resource contention through process affinity, interrupt affinity, and Hyper-Threading support.
SMP kernel critical sections can be handled by:
- A spin-lock
- A mutex, if not used in an interrupt handler
Generally, the spin-lock option is faster in terms of CPU time, but it requires that preemption be disabled and introduces processor-level latency when the resource is already locked. The mutex option adds mutex and context switching costs, but latency remains at the process level.
Using spin-locks with a high number of processors can lead to high latency depending on the critical section length.
Quality of service must be taken into account for the following cases:
- When timers are armed in parallel on several processors
- When concurrent file accesses occur
- When shared-memory is accessed by several processors
Process affinity provides for load balancing at the application level. When process affinity is used, it provides more efficient caching. For example, it must be possible to bind real-time processes to specified processors. Other processes in the systems do not need to be assigned to specified processors.
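As an illustration of binding a process to a specified processor, the sketch below uses Python's `os.sched_getaffinity` and `os.sched_setaffinity`, which wrap the Linux affinity system calls. The helper name `pin_current_process` is hypothetical, and the code assumes a Linux host; on other platforms it reports an empty set and does nothing.

```python
import os


def pin_current_process(cpu):
    """Bind the calling process to one logical CPU and return the
    resulting affinity set (empty if affinity calls are unavailable)."""
    if not hasattr(os, "sched_setaffinity"):
        return set()                       # non-Linux platform: nothing to do
    allowed = os.sched_getaffinity(0)      # pid 0 means "the calling process"
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if hasattr(os, "sched_getaffinity"):
    # Pick a CPU the process is already allowed to run on.
    first_cpu = min(os.sched_getaffinity(0))
    bound = pin_current_process(first_cpu)
else:
    bound = set()

print("affinity after binding:", sorted(bound))
```

Only the process that asks for it is bound; other processes keep their existing affinity, which matches the statement that they do not need to be assigned to specified processors.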
Assigning the top half of interrupt handlers to a single processor enables load balancing of interrupt handlers. The bottom half and top half of each interrupt handler should be assigned to the same CPU to reduce inter-processor contention.
Because the logical Hyper-Threaded processors share a cache, the scheduler only needs to keep threads attached to one of the adjacent logical processors. The scheduler can move threads between adjacent logical processors with no performance degradation because the cache is stable between the two logical processors.
As CPU capabilities increase, memory demands also increase as more communication contexts can be handled per system. Memory related requirements are oriented toward high physical memory (HIGHMEM) and virtual memory.
Support for more than 4 GB of physical memory is a requirement for 32-bit and 64-bit processor architectures.
Communication services have a major impact on performance of telecommunications applications. Performance of Linux stacks should be evaluated as follows:
- Message delivery latency and throughput
- Resource usage including CPU and memory usage
- Load balancing capability on an SMP system
The speed at which packets can be routed is limited by the time it takes to perform the forwarding table lookup for each packet.
When a basic lookup method is used, such as the BSD binary trie, a number of nodes equal to the length of the address in bits may be traversed in the forwarding table, generating an equivalent number of memory accesses. The current Linux implementation is not highly scalable.
Methods faster than those currently available should be implemented to support 2000 routes updated per second and up to 500,000 routes with low lookup latency. The tradeoff between memory and access latency should also be addressed.
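To make the cost of a basic lookup concrete, the following is a minimal sketch of a one-bit-per-level binary trie with longest-prefix matching. It is an illustration only, not the Linux or BSD implementation; the class names `TrieNode` and `BinaryTrie`, the route prefixes, and the next-hop labels are all hypothetical. The `lookup` method also returns the number of nodes visited, which is bounded by the address length in bits.

```python
import ipaddress


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]   # child for next address bit 0 / 1
        self.next_hop = None           # set only where a route prefix ends


class BinaryTrie:
    def __init__(self, bits=32):       # 32 for IPv4, 128 for IPv6
        self.root = TrieNode()
        self.bits = bits

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        value = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (value >> (self.bits - 1 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address):
        """Return (best matching next hop, number of nodes visited)."""
        value = int(ipaddress.ip_address(address))
        node = self.root
        best = node.next_hop           # default route, if one was inserted
        visited = 0
        for i in range(self.bits):
            bit = (value >> (self.bits - 1 - i)) & 1
            node = node.children[bit]
            if node is None:
                break
            visited += 1               # one memory access per trie level
            if node.next_hop is not None:
                best = node.next_hop   # remember the longest match so far
        return best, visited


table = BinaryTrie()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "A")
table.insert("10.1.0.0/16", "B")

for addr in ("10.1.2.3", "10.2.0.1", "192.168.0.1"):
    print(addr, table.lookup(addr))
```

Each level costs one dependent memory access, so a full-length match on an IPv6 table can touch up to 128 nodes. Faster schemes generally trade memory for fewer levels, which is the memory versus latency tradeoff mentioned above.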
See “Survey and taxonomy of IP address lookup algorithms” at http://mia.ece.uic.edu/~papers/Surveys/pdf00000.pdf .
A cluster benefits from a cluster specific communication service that addresses specific issues such as latency, ordering, and recovery. A cluster communication service can achieve better performance than a general communication service when used in a cluster, because it has knowledge of the local topology, including the cluster membership.
Support should be provided for Differentiated Services (RFCs 2474 and 2475) for IPv4 to enable quality of service and traffic control.
A prioritized protocol processing mechanism enables a high-priority process to quickly obtain data from the network even if a large number of packets arrive for multiple processes. It is based on a protocol priority assignment mechanism that allows a higher scheduling priority to be given to the protocol with higher priority.
A network storage replication service uses local network and device resources. Performance depends on the local network and storage devices used.
A network storage replication service provides a lower performance level compared to local storage access. The relative difference must be less than 30% in terms of user throughput in normal conditions when mirrored devices are synchronized.
Upon device resynchronization, the user throughput should not be reduced more than 25% compared to normal conditions.
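The two thresholds above can be checked with simple arithmetic. The sketch below is one possible reading of them, with hypothetical function and parameter names and made-up throughput figures: the normal-condition loss is measured relative to local storage throughput, and the resynchronization loss relative to replicated throughput in normal conditions.

```python
def replication_within_limits(local_mbps, replicated_mbps, resync_mbps):
    """Return (normal_ok, resync_ok) for the two replication thresholds.

    local_mbps      -- user throughput on local storage
    replicated_mbps -- user throughput with mirrors synchronized
    resync_mbps     -- user throughput during device resynchronization
    """
    normal_loss = (local_mbps - replicated_mbps) / local_mbps
    resync_loss = (replicated_mbps - resync_mbps) / replicated_mbps
    # "less than 30%" in normal conditions, "not more than 25%" during resync
    return normal_loss < 0.30, resync_loss <= 0.25


# Illustrative measurements only (units are arbitrary but must match).
print(replication_within_limits(100.0, 75.0, 60.0))
print(replication_within_limits(100.0, 65.0, 60.0))
```

The first set of figures satisfies both limits. The second fails the normal-condition limit because the replicated throughput is 35% below local throughput.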
The CGL 2.0 requirement for application pre-loading should be extended to enhance dynamic loading performance. Often, several seconds are spent in the dynamic ELF loader for symbol relocation.
ID | Name | Category | Priority |
---|---|---|---|
PRF.1.4 | High-Resolution Timers | Performance | P1 |
CGL specifies that carrier grade Linux shall provide high-resolution timer support as specified by POSIX 1003.1b section 14, Clocks and Timers API. |
ID | Name | Category | Priority |
---|---|---|---|
PRF.1.7 | Handling Interrupts As Threads | Performance | P2 |
CGL specifies that carrier grade Linux shall enable handling of interrupt handlers (top half and bottom half) as a task-based process rather than in an interrupt processing routine mechanism to allow:
|
ID | Name | Category | Priority |
---|---|---|---|
PRF.2.1 | Enabling Process Affinity | Performance | P1 |
CGL specifies that carrier grade Linux shall enable process affinity. Process affinity enables a process to run on an explicitly designated processor. When process affinity is used, it provides more efficient caching. For example, it must be possible to bind real-time processes to specified processors. |
ID | Name | Category | Priority |
---|---|---|---|
PRF.2.2 | Enabling Interrupt CPU Affinity | Performance | P1 |
CGL specifies that carrier grade Linux shall enable interrupt CPU affinity. The interrupts are divided into a critical urgent part that the kernel needs to execute quickly and a deferrable part. CGL should enable interrupt CPU affinity on the critical urgent part. Note: The latest stable kernel enables interrupt affinity based on the /proc configuration interface. |
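On Linux, the /proc interface mentioned in the note is the per-interrupt file `/proc/irq/<irq>/smp_affinity`, which takes a hexadecimal CPU bitmask. The sketch below only builds the mask and the path; the helper names are hypothetical, the interrupt number is made up, and the simple mask format shown is assumed to apply to systems with up to 32 logical CPUs.

```python
def smp_affinity_mask(cpus):
    """Return the hexadecimal bitmask selecting the given logical CPUs."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu               # bit n selects logical CPU n
    return format(mask, "x")


def smp_affinity_path(irq):
    """Return the /proc file that controls affinity for one interrupt."""
    return f"/proc/irq/{irq}/smp_affinity"


# Hypothetical example: route interrupt 19 to logical CPUs 1 and 3.
print(smp_affinity_path(19), smp_affinity_mask([1, 3]))
```

Writing the mask to the file requires root privileges, so the sketch stops short of doing so.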
ID | Name | Category | Priority |
---|---|---|---|
PRF.2.3 | (Hyper-Threading) Optimized SMT Support | Performance | P1 |
CGL specifies that carrier grade Linux shall enable optimized simultaneous multithreading (SMT) processors and interrupt migration between logical processors. Note: The latest stable kernel enables this feature. |
ID | Name | Category | Priority |
---|---|---|---|
PRF.4.2 | Support of Gigabit Ethernet Jumbo MTU | Performance | P1 |
CGL specifies that carrier grade Linux shall enable support for a 9000 byte Maximum Transmission Unit (MTU) for the Gigabit Ethernet protocol to enable lower CPU overhead and better throughput. This shall be a configurable option as some applications may prefer low latency to large message sizes. Hardware support is required. |
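The CPU overhead argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes untagged Ethernet framing (8 bytes of preamble and start delimiter, 14 bytes of header, 4 bytes of frame check sequence, and a 12 byte inter-frame gap) on a fully loaded 1 Gbit/s link; the function names are hypothetical.

```python
LINK_BPS = 1_000_000_000   # Gigabit Ethernet line rate in bits per second
WIRE_OVERHEAD = 38         # bytes per frame: 8 preamble/SFD + 14 header + 4 FCS + 12 gap


def frames_per_second(mtu):
    """Maximum full-size frames per second at line rate."""
    return LINK_BPS / ((mtu + WIRE_OVERHEAD) * 8)


def payload_efficiency(mtu):
    """Fraction of wire bytes that carry the MTU-sized packet."""
    return mtu / (mtu + WIRE_OVERHEAD)


for mtu in (1500, 9000):
    print(mtu, round(frames_per_second(mtu)), round(payload_efficiency(mtu), 4))
```

At line rate the 9000 byte MTU cuts the number of frames, and therefore per-frame processing, by nearly a factor of six, while the gain in wire efficiency is only about two percentage points. A larger frame also takes longer to serialize, which is one reason a latency-sensitive application may prefer the smaller MTU.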
ID | Name | Category | Priority |
---|---|---|---|
PRF.5.0 | Efficient Low-Level Asynchronous Events | Performance | P1 |
CGL specifies that carrier grade Linux shall provide an API for applications that allows asynchronous notifications to be delivered based on either level or edge triggers. |
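The difference between the two trigger modes can be observed with epoll, one Linux interface that offers both; the requirement itself does not name a specific API, so treating epoll as the example is an assumption. The sketch below writes one byte to a pipe, never reads it, and counts how many successive polls report the descriptor as ready. The helper name `count_wakeups` is hypothetical, and the code runs only where `select.epoll` exists.

```python
import os
import select


def count_wakeups(edge_triggered, polls=3):
    """Count how many of `polls` non-blocking waits report unread data."""
    r, w = os.pipe()
    ep = select.epoll()
    try:
        flags = select.EPOLLIN | (select.EPOLLET if edge_triggered else 0)
        ep.register(r, flags)
        os.write(w, b"x")                  # data arrives once and is never read
        return sum(1 for _ in range(polls) if ep.poll(0))
    finally:
        ep.close()
        os.close(r)
        os.close(w)


if hasattr(select, "epoll"):               # Linux only
    print("level-triggered wakeups:", count_wakeups(edge_triggered=False))
    print("edge-triggered wakeups: ", count_wakeups(edge_triggered=True))
```

A level-triggered registration keeps reporting the descriptor while data remains unread, whereas an edge-triggered one reports only the transition, so the application must drain the descriptor before waiting again.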
ID | Name | Category | Priority |
---|---|---|---|
PRF.6.0 | Managing Transient Data | Performance | P1 |
CGL specifies that carrier grade Linux shall provide support for a self-resizing file system stored in virtual memory for transient data that can be limited to a maximum size. |
ID | Name | Category | Priority |
---|---|---|---|
PRF.7.0 | Interruptless Ethernet Delivery | Performance | P1 |
CGL specifies that carrier grade Linux shall provide the capability for Ethernet drivers to operate in a pure polling mode in which they do not generate interrupts for arriving frames. This prevents interrupt storms from consuming too many CPU cycles, which is primarily an issue for Gigabit Ethernet. |
ID | Name | Category | Priority |
---|---|---|---|
PRF.8.0 | Network Storage block level Replication Performances | Performance | P2 |
CGL specifies that carrier grade Linux shall provide a network storage replication service with the following performance levels:
|
ID | Name | Category | Priority |
---|---|---|---|
PRF.14.0 | RAID 0 Support | Performance | P1 |
CGL specifies that carrier grade Linux shall provide RAID 0 (striping) support that stripes data across multiple disks without any redundant information to enhance performance in either a request-rate-intensive or transfer-rate-intensive environment. |
To achieve high reliability, availability, and serviceability (RAS) and application portability, one goal of the CGL effort is to leverage mature and well-established industry standards that are common and relevant to the carrier-grade environment, and to include them as part of the CGL requirements.
Open standards are important because they are freely available for anyone or any organization to use and because open standards can evolve with wide community feedback and validation. The CGL WG is actively working with recognized standards bodies, such as the Linux Standard Base (LSB – a workgroup of the Linux Foundation) and the Service Availability Forum (SA Forum). These organizations are producing standards and specifications that address the RAS and application portability gaps between Linux as it exists today and where it needs to be to support highly available communications applications.
The first requirement in this section shows the CGL working group's desire to work alongside recognized standards bodies:
CGL specifies the need for compliance with the Linux Standard Base (LSB) version 3.0 to ensure that a CGL 5.0 distribution supports the same level of application binary compatibility as is required by the LSB standard.
CGL 5.0 requires implementation of the latest interface specifications from the SA Forum to provide a common set of standards and building blocks for high availability architectures and platform management. The SA Forum provides standards specifications that define interfaces for cluster-aware applications (Application Interface Specification - AIS version B.01.01) and for platform management applications (Hardware Platform Interface - HPI version B.01.01). See the SA Forum site (www.saforum.org) for the B.01.01 versions of the AIS and HPI specifications.
Continuing from previous versions of the CGL specifications, the CGL Standards Definition adds more POSIX compliance requirements based on IEEE Std 1003.1-2001. These additional areas of POSIX compliance are intended to bridge the application portability gaps as mainstream communications applications are ported to Linux application environments.
A variety of other standards requirements are included in the CGL Standards Definition to address the networking, communications, and platform needs of carrier environments. Standards requirements such as Stream Control Transmission Protocol (SCTP), Internet Protocols (IPv4/IPv6), Mobile Internet Protocol (MIPv6), Simple Network Management Protocol (SNMP), Intelligent Platform Management Interface (IPMI), IEEE 802.1Q (virtual LAN), Diameter, Common Information Model (CIM), Web-Based Enterprise Management (WBEM), Advanced Configuration and Power Interface (ACPI), and PCI Express are included.
More open industry standards will become mature and recognized over time. The CGL working group will evaluate them for consideration in future versions of the CGL requirements. The CGL working group believes that the adoption of open standards in mainline Linux offerings will benefit application developers and solution providers and will carry Linux to the next level of popularity in the communications industry as well as the general Linux user community.
ID | Name | Category | Priority |
---|---|---|---|
STD.1.0 | Linux Standard Base Compliance | Standards | P1 |
http://www.linuxbase.org CGL specifies that carrier grade Linux shall be compliant with the Linux Standard Base (LSB) 3.0. The LSB 3.0 specification has been split into a generic LSB core, a generic module for C++, and a set of architecture specific modules. Required LSB 3.0 modules for CGL are:
The developer may choose to implement more than one architecture platform. In this case, each supported architecture platform shall contain an implementation of at least one architecture specific LSB-Core module and one architecture specific LSB-CXX module. |
ID | Name | Category | Priority |
---|---|---|---|
STD.3.1 | SCTP: Base Features | Standards | P1 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs below. |
ID | Name | Category | Priority |
---|---|---|---|
STD.3.2.1 | SCTP: Additional Features | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.3.2.2 | Extensions to BSD Sockets to support SCTP | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the Internet draft below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.3.2.3 | RFC 3873 MIB for SCTP | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the Internet draft below.
|
ID | Name | Category | Priority |
---|---|---|---|
STD.3.2.4 | Extension for adding IP addresses to SCTP association | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the Internet draft below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.3.2.5 | RFC 3758 Partial reliability | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFC below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.3.2.6 | SCTP Threats | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the Internet draft below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.4.1 | IPv6 Base Features | Standards | P1 |
CGL specifies that carrier grade Linux shall provide the IPv6 functionality listed in the RFCs below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.4.2.1 | IPv6 Additional Features: RFC 2451 Ciphers | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.4.2.2 | IPv6 Additional Features: RFC 4213/2893 Tunnels | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below: |
ID | Name | Category | Priority |
---|---|---|---|
STD.4.2.3 | IPv6 Additional Features: RFC 3484 Default Address Selection | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.4.2.4 | IPv6 Additional Features: RFC 3315 Dynamic Host Configuration | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.4.2.5 | IPv6 Additional Features: RFC 3633 Prefix Options for Dynamic Host Configuration Protocol | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.4.2.6 | IPv6 Additional Features: RFC 4191 Default Router Preferences | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.4.2.7 | IPv6 Additional Features: RFC 2428 FTP Extensions | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.4.2.8 | IPv6 Additional Features: RFC 3596 DNS Extensions | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.4.2.9 | IPv6 Additional Features: RFC 2874 DNS Address Aggregation and Renumbering | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.4.2.10 | IPv6 Additional Features: RFC 3646 DNS options for DHCP | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.4.2.13 | IPv6 Additional Features: NFS | Standards | P2 |
Linux Foundation CGL specifies that carrier grade Linux shall provide support for IPv6-based NFS. Reference: SCOPE Alliance Carrier Grade Gap CGOS-5.2 Proof-of-Concept: Mainline kernel and NFSv4 / http://wiki.linux-nfs.org/wiki/index.php/Ipv6PlanningDocument |
ID | Name | Category | Priority |
---|---|---|---|
STD.5.1 | IPSec Major CGL Features | Standards | P1 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs below.
|
ID | Name | Category | Priority |
---|---|---|---|
STD.5.2.1 | IPSec Minor CGL Features: RFC 4301 Security Architecture for IP | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.5.2.2 | IPSec Minor CGL Features: RFC 4302 IP Authentication Header | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.5.2.3 | IPSec Minor CGL Features: RFC 4303 IP Encapsulating Security Payload | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.5.2.4 | IPSec Minor CGL Features: RFC 4305 Cryptographic Algorithm Requirements | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below: |
ID | Name | Category | Priority |
---|---|---|---|
STD.5.2.5 | IPSec Minor CGL Features: RFC 4307 Cryptographic Algorithms for Use in IKE | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.5.2.6 | IPSec Minor CGL Features: RFC 4322 Opportunistic Encryption using IKE | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.5.2.7 | IPSec Minor CGL Features: RFC 4434 AES Algorithm for IKE | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs and internet drafts below:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.6.1 | MIPv6 CGL Major Features | Standards | P1 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFC below.
|
ID | Name | Category | Priority |
---|---|---|---|
STD.6.2 | IPv6 Minor CGL Features | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality listed in the RFCs below.
|
ID | Name | Category | Priority |
---|---|---|---|
STD.7.1 | SNMP v1, v2, v3 | Standards | P1 |
CGL specifies that carrier grade Linux shall provide SNMPv1, SNMPv2, and SNMPv3 functionality as defined in the RFCs listed below. |
ID | Name | Category | Priority |
---|---|---|---|
STD.7.2 | SNMP MIBs for IPv6/IPv4 | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality for the SNMP IPv6/IPv4 MIBs as defined by the RFCs listed below:
Note: There is currently an ongoing effort within IETF to combine IPv4 and IPv6 MIBs into unified MIBs. The developer may choose to implement RFC 2011, RFC 2466. |
ID | Name | Category | Priority |
---|---|---|---|
STD.8.1 | SA Forum AIS http://www.saforum.org | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the APIs as defined by the SA Forum AIS Release 5 or a subsequent level of the relevant AIS specification. |
ID | Name | Category | Priority |
---|---|---|---|
STD.8.8 | SA Forum HPI http://www.saforum.org | Standards | P1 |
CGL specifies that carrier grade Linux shall provide the functionality defined in the SA Forum HPI B.02.01 specification or a subsequent level of the relevant HPI specification. |
ID | Name | Category | Priority |
---|---|---|---|
STD.9.0 | IPMI | Standards | P1 |
CGL specifies that carrier grade Linux shall provide the System Management Software (SMS) functionality to interface with the below-listed levels of the Intelligent Platform Management Interface (IPMI):
|
ID | Name | Category | Priority |
---|---|---|---|
STD.10.0 | 802.1Q VLAN Endpoint | Standards | P1 |
CGL specifies that carrier grade Linux shall provide the functionality defined in the IEEE Std 802.1Q-1998 specification. This standard defines the operation of virtual LAN (VLAN) endpoints that permit the definition, operation and administration of Virtual LAN topologies within a LAN infrastructure. |
ID | Name | Category | Priority |
---|---|---|---|
STD.11.1 | Diameter Protocol CGL Major Features | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality defined in the following RFCs and Internet drafts.
|
ID | Name | Category | Priority |
---|---|---|---|
STD.11.2 | Diameter Protocol Minor CGL Features | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality defined in the following Internet drafts. RFC 4004 |
ID | Name | Category | Priority |
---|---|---|---|
STD.17.1 | iSCSI Support: RFC 3720 iSCSI | Standards | P1 |
CGL specifies that carrier grade Linux shall provide support for Internet Small Computer Systems Interface (iSCSI) Initiators. The iSCSI Initiators shall support IPv6, SNMP MIBs, error handling, target discovery, and multiple sessions. This functionality is defined in the following RFCs:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.17.2 | iSCSI Support: RFC 3721 iSCSI Naming & Discovery | Standards | P1 |
CGL specifies that carrier grade Linux shall provide support for Internet Small Computer Systems Interface (iSCSI) Initiators. The iSCSI Initiators shall support IPv6, SNMP MIBs, error handling, target discovery, and multiple sessions. This functionality is defined in the following RFCs:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.17.3 | iSCSI Support: RFC 3723 iSCSI Securing Block Storage Protocols over IP | Standards | P1 |
CGL specifies that carrier grade Linux shall provide support for Internet Small Computer Systems Interface (iSCSI) Initiators. The iSCSI Initiators shall support IPv6, SNMP MIBs, error handling, target discovery, and multiple sessions. This functionality is defined in the following RFCs:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.18.1 | Differentiated Services: RFC 2474 Definition | Standards | P2 |
CGL specifies that carrier grade Linux shall provide support for differentiated services for IPv4 protocol as defined by the RFCs below. Differentiated services provide network traffic with different levels of service to enable quality of service and traffic control.
|
ID | Name | Category | Priority |
---|---|---|---|
STD.18.2 | Differentiated Services: RFC 2475 Definition | Standards | P2 |
CGL specifies that carrier grade Linux shall provide support for differentiated services for IPv4 protocol as defined by the RFCs below. Differentiated services provide network traffic with different levels of service to enable quality of service and traffic control.
|
ID | Name | Category | Priority |
---|---|---|---|
STD.20.1 | PKI CA: RFC 2527 X.509 PKI | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality for public key infrastructure (PKI) support as defined in the standards:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.20.2 | PKI CA: RFC 2527 X.509 PKI | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality for public key infrastructure (PKI) support as defined in the standards:
|
ID | Name | Category | Priority |
---|---|---|---|
STD.20.3 | PKI CA: RFC 3279 Algorithms for X.509 PKI | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality for public key infrastructure (PKI) support as defined in the standards: RFC 3279 - Algorithms and Identifiers for the Internet X.509 Public Key Infrastructure |
ID | Name | Category | Priority |
---|---|---|---|
STD.20.4 | PKI CA: RFC 3280 X.509 PKI Certificate Stuff | Standards | P2 |
CGL specifies that carrier grade Linux shall provide the functionality for public key infrastructure (PKI) support as defined in the standards: RFC 3280 - Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile |
ID | Name | Category | Priority |
---|---|---|---|
STD.26.1 | Layer 2 Tunneling Protocol Support | Standards | P1 |
Linux Foundation CGL specifies that carrier grade Linux shall provide support for Layer 2 Tunneling Protocol (L2TP) as described in RFC 2661: Layer Two Tunneling Protocol "L2TP". Reference: SCOPE Alliance Carrier Grade Gap CGOS-5.3 Proof-of-Concept: Mainline kernel. |
ID | Name | Category | Priority |
---|---|---|---|
STD.26.2 | Layer 2 Tunneling Protocol Support Version 3 | Standards | P1 |
Linux Foundation CGL specifies that carrier grade Linux shall provide support for Layer 2 Tunneling Protocol (L2TP) as described in RFC 3931: Layer Two Tunneling Protocol - Version 3 (L2TPv3). Reference: SCOPE Alliance Carrier Grade Gap CGOS-5.3 Proof-of-Concept: Mainline kernel. |
The telecommunications environment is different from a general-purpose computing environment. The most salient differences to consider in developing a CGL threat model are:
- CGL systems do not have many user accounts.
- User accounts do not reflect individual users.
- CGL systems are configured through custom user interfaces.
- CGL systems are typically configured without shell access.
- Administrators are trusted and competent.
The major threat to the telecommunications environment is, therefore, unauthorized access to management and control interfaces by outsiders. These outsiders can gain access by subverting the operating system or one of the applications it is running.
A severe potential security threat arises when applications need to touch multiple security planes. Many telecommunication services can be provisioned remotely by the end-user.
For example, many ISPs that offer domain hosting let customers create new mailboxes, or route incoming calls for 5-digit work extensions to any telephone number in the world, with just a few clicks on a web page. Facilities like these create a new set of risks:
- Unauthorized rerouting of email and telephone calls by disgruntled associates or unscrupulous competitors.
- Exploitation of vulnerabilities in software to “jump” from one security plane to another, which can lead to many types of risks.
Mitigating these risks requires forethought: users of these systems must be properly authenticated and authorized, and information traveling between planes must pass through narrowly defined interfaces that protect against unauthorized access.
The security objectives and requirements in this document are aimed at analyzing and mitigating threats and improving resiliency to attacks on CGL systems. The requirements in this section attempt to implement security objectives for CGL systems and are based on an intersection of assumptions about CGL systems:
- Intended use
- Environment
- Security policies
- Exposure to expected threats and vulnerabilities
The security requirements are firmly rooted in sound security practices. These practices and terminology borrow heavily from [CSPP-OS03], an example Common Criteria profile for commercial off-the-shelf (COTS) operating systems.
Given the environment described in the previous section, the significant threat to carrier grade systems is unauthorized access to management and control interfaces by intruders.
The CGL Security Requirements have been derived using the approach of the Common Criteria Protection Profiles:
- Identify the assumptions about CGL systems based upon their use and their environment.
- Draft a set of security policies to which CGL systems shall adhere.
- Identify common threats to which CGL systems are exposed.
- Derive the set of functional objectives that CGL systems shall implement.
- Derive a coherent set of requirements that address the functional objectives.
This section identifies the security objectives met by the requirements in this specification. A more complete list from which these security objectives were taken is found in section 10.7. A Target of Evaluation (TOE) is the system and environment to which these objectives are applied.
The following table specifies the security objectives met by requirements listed in section of this document.
Security Objective | Description |
O.DETECT-SOPHISTICATED | The environment must provide the ability to detect sophisticated attacks and the results of such attacks (e.g. corrupted system state). |
O.ENTRY-NON-TECHNICAL | The environment must provide sufficient protection against non-technical attacks by other than authenticated users. |
O.PHYSICAL | Those responsible for the system must ensure that those parts of the system critical to security policy are protected from physical attack that might compromise security. |
O.ACCESS-TOE | The system must provide public access and access by authenticated users to those resources and actions for which they have been authorized. |
O.ACCOUNT-TOE | The system must ensure, for actions under its control or knowledge, that all users can subsequently be held accountable for their security relevant actions. It is anticipated that individual accountability might not be achieved for some actions. |
O.AUTHORIZE-TOE | The system must provide the ability to specify and manage user and system process access rights to individual processing resources and data elements under its control, supporting the organization's security policy for access control. |
O.BYPASS-TOE | The system must prevent errant or non-malicious, authorized software or users from bypassing or circumventing security policy enforcement. NOTE: This objective is limited to "non-malicious" because CSPP-OS controls are not expected to provide sufficient mitigation for the greater negative impact that "malicious" implies. |
O.DETECT-TOE | The system must enable the detection of a specified set of vulnerabilities. |
O.ENTRY-TOE | The system must prevent logical entry to itself using unsophisticated technical methods by persons without authority for such access. |
O.KNOWN-TOE | The system must ensure that, for all actions under its control and except for a well-defined set of allowed actions, all users are identified and authenticated before being granted access. |
O.OBSERVE-TOE | The system must ensure that its security status is not misrepresented to the administrator or user. This is a combination of prevention and detection. |
O.RESOURCES | The system must protect itself from user or system errors that result in shared resource exhaustion. |
O.APPLICATION-TOOLS | The system must provide a reasonable, up-to-date set of security tools and libraries for use by applications. |
O.ACCESS-MALICIOUS | System and environmental controls are required to sufficiently mitigate the threat of malicious actions by authenticated users. |
O.DETECT-SYSTEM | The system, in conjunction with other entities in the environment, must enable the detection of system insecurities. |
O.NETWORK | The system must be able to meet its security objectives in a distributed environment. |
O.ENTRY-SOPHISTICATED | The system and environment must sufficiently mitigate the threat of an individual (other than an authenticated user) gaining unauthorized access via sophisticated, technical attack. |
O.CONTAINMENT | The system and environment must provide the ability to contain the effect of a security failure of an application to that application. |
The following table specifies the security objectives not met by requirements in section of this document.
Security Objective | Rationale for not including in specification |
O.ACCESS-NON-TECHNICAL | The environment must provide sufficient protection against non-technical attacks by authenticated users for non-malicious purposes. |
O.AVAILABLE-TOE | The system must protect itself from unsophisticated denial-of-service attacks. |
O.INFO-FLOW | The environment must ensure that any information flow control policies are enforced between system components and at the system external interfaces. |
O.RECOVER-TOE, O.RECOVER-SYSTEM | Fail-secure is not something that CGL can provide. |
O.COMPLY | There are many regulations that might apply to CGL. It is not the responsibility of this specification to enumerate requirements to conform to this myriad of regulations. |
O.DUE-CARE | It is the responsibility of the administrative personnel to properly secure and maintain a system. |
O.MANAGE | It is the responsibility of administrative personnel to properly secure and maintain a system. This includes periodic audits of system configuration (not log analysis). However, no such software is being required by CGL. |
O.OPERATE | Mostly this is the responsibility of administrative personnel. Secure default configuration settings will not be listed in this specification. |
O.DENIAL-SOPHISTICATED | CGL is not directly able to mitigate most denial of service attacks, as mitigating them would require redesign of protocols and interfaces. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.1.1 | Dynamic Kernel Security Module Mechanism | Security | P1 |
CGL specifies that carrier grade Linux shall support an interface that allows the addition of new access control policy implementations to the kernel without requiring patching or recompilation. This support must allow for the dynamic loading of such policy implementations. The mechanism must govern all of the kernel objects. This requirement does not specify any particular policies. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.1.2 | Process Containment using File System Restrictions | Security | P1 |
CGL specifies that carrier grade Linux shall provide support for constraining the privileges and access to system resources of a process independently of the user account under which the process runs by limiting a process' access to a subset of the file system hierarchy. This limits the effects of a security compromise of a process (such as a buffer overflow exploit). |
ID | Name | Category | Priority |
---|---|---|---|
SEC.1.3 | Process Containment Using MAC-based Mechanism | Security | P1 |
CGL specifies that carrier grade Linux shall provide support for constraining the privileges and access to system resources of a process independently of the user account under which the process runs, using a mandatory access control (MAC) mechanism. This limits the effects of a security compromise of a process, such as a buffer overflow exploit, even if it is running as root. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.1.3.1 | MAC-based Policy Administration Tools | Security | P2 |
CGL specifies that carrier grade Linux shall provide tools for the administration of MAC-based access control policies. These tools should facilitate the creation, maintenance, and management of policies. The tools should provide at least one of a command line or graphical interface. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.1.4 | Buffer Overflow Protection | Security | P1 |
CGL specifies that carrier grade Linux shall provide at least one mechanism to protect against the exploitation of software bugs in which missing boundary checks let an attacker gain access to a task's address space by writing outside of buffer bounds. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.1.5 | Access Control List Support for File Systems | Security | P1 |
CGL specifies that carrier grade Linux shall provide access control list (ACL) capabilities on file systems that allow the specification of access rights for multiple users and groups. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.2.1 | Generic Authentication Modules | Security | P1 |
CGL specifies that carrier grade Linux shall support a mechanism for implementing new operating system authentication mechanisms. This support must allow for the dynamic loading of authentication modules. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.2.2 | Password Integrity Checking | Security | P1 |
CGL specifies that carrier grade Linux shall provide tools to check passwords to ensure they cannot be cracked using common attack methods. These tools shall support at least the DES cipher text format and allow the user to specify rules for rejecting passwords. |
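As an illustration of user-specified rejection rules, the following is a minimal sketch in Python. The rule set (minimum length, a small word list, user name containment, character classes) and the names `COMMON_PASSWORDS` and `password_problems` are assumptions chosen for the example; they are not mandated by this requirement, and the sketch does not cover checking stored DES cipher text.

```python
import string

# Hypothetical word list; a real checker would load a large dictionary.
COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein", "admin"}

def password_problems(password, username="", min_length=12):
    """Return a list of reasons to reject the password (empty if acceptable)."""
    problems = []
    if len(password) < min_length:
        problems.append("too short")
    if password.lower() in COMMON_PASSWORDS:
        problems.append("common password")
    if username and username.lower() in password.lower():
        problems.append("contains user name")
    # Count how many character classes the password uses.
    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    if sum(classes) < 3:
        problems.append("fewer than three character classes")
    return problems
```

Returning every violated rule, rather than stopping at the first, lets a tool report all of the reasons for rejection at once.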
ID | Name | Category | Priority |
---|---|---|---|
SEC.3.1 | Auditing | Security | P1 |
CGL specifies that carrier grade Linux shall provide auditing mechanisms that flag security-relevant events and alert a system administrator. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.3.2 | Secure Transport of Log Information | Security | P1 |
CGL specifies that carrier grade Linux shall provide secure transport of log information over a network to the log files. The transport mechanism shall ensure that the information remains confidential, cannot be modified, is not a replay of an earlier log message, and originated at the source it claims. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.3.3 | Periodic Automated Log Analysis | Security | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism for periodically and automatically analyzing log files. This mechanism shall be able to generate reports if any suspicious or unrecognized log entry is detected. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.3.4 | Active Log Monitoring | Security | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism for automatically analyzing security-relevant log information. This mechanism shall be able to generate alarms if criteria set by a system administrator are met. |
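The idea of administrator-defined alarm criteria can be sketched as a list of patterns with thresholds. This is a minimal Python illustration only; the criteria names, the patterns, and the sample message formats they match are hypothetical and would be replaced by whatever the deployed logging system actually emits.

```python
import re

# Hypothetical alarm criteria an administrator might configure:
# (name, pattern, number of matches that triggers an alarm).
CRITERIA = [
    ("failed-login", re.compile(r"Failed password for (\S+)"), 3),
    ("root-session", re.compile(r"session opened for user root"), 1),
]

def scan_log(lines, criteria=CRITERIA):
    """Return the names of criteria whose match count reached the threshold."""
    counts = {name: 0 for name, _, _ in criteria}
    for line in lines:
        for name, pattern, _ in criteria:
            if pattern.search(line):
                counts[name] += 1
    return [name for name, _, threshold in criteria if counts[name] >= threshold]
```

A production monitor would typically read the log incrementally and apply time windows to the counts; both are omitted here for brevity.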
ID | Name | Category | Priority |
---|---|---|---|
SEC.3.5 | Log Integrity and Origin Authentication | Security | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism to check that log files have not been modified (integrity), even by most insiders. In addition, CGL specifies that carrier grade Linux shall provide a mechanism to verify the origin of a log message. CGL specifies that carrier grade Linux shall provide a mechanism to prevent replay attacks of a log message. |
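One possible way to obtain the three properties named above (integrity, origin authentication, replay protection) is a keyed hash chain over the log entries. The following Python sketch is an assumption about how such a mechanism could look, not a description of any particular CGL implementation; the function names and the in-memory log representation are hypothetical.

```python
import hashlib
import hmac

def sign_entry(key, seq, prev_tag, message):
    """Tag covers the sequence number, the previous tag, and the message."""
    data = seq.to_bytes(8, "big") + prev_tag + message.encode("utf-8")
    return hmac.new(key, data, hashlib.sha256).digest()

def append_entry(log, key, message):
    """Append (seq, message, tag) to the in-memory log list."""
    seq = len(log)
    prev_tag = log[-1][2] if log else b"\x00" * 32
    log.append((seq, message, sign_entry(key, seq, prev_tag, message)))

def verify_log(log, key):
    """Return True only if every entry is in order, unmodified, and keyed correctly."""
    prev_tag = b"\x00" * 32
    for expected_seq, (seq, message, tag) in enumerate(log):
        if seq != expected_seq:
            return False  # reordered, dropped, or replayed entry
        if not hmac.compare_digest(tag, sign_entry(key, seq, prev_tag, message)):
            return False  # modified entry or wrong origin key
        prev_tag = tag
    return True
```

Because the key is shared between writer and verifier, anyone holding it can forge entries; keeping the verification key off the logging host, or using asymmetric signatures instead, would be needed to resist insiders with access to that host.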
ID | Name | Category | Priority |
---|---|---|---|
SEC.4.1 | IPsec | Security | P1 |
CGL specifies that carrier grade Linux shall provide IPsec support for network level confidentiality and integrity. The implementation shall conform to RFC 2401, 2402, 2406 and at least one encapsulating security payload (ESP) algorithm such as specified by RFC 2451. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.4.2 | IKE | Security | P1 |
CGL specifies that carrier grade Linux shall provide an Internet Key Exchange (IKE) service to perform standards-based key exchange for IPsec. The service shall conform to RFC 2409. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.4.3 | PF_KEY Version 2 | Security | P1 |
CGL specifies that carrier grade Linux shall provide PF_KEY support, as defined by RFC 2367, for key management for the IPsec module and the IKE service. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.4.4 | PKI Support for Applications | Security | P1 |
CGL specifies that carrier grade Linux shall provide basic PKI features, which shall conform to the IETF PKIX standards, specifically RFC 2527, 3279 and 3280. Support for processing certificate revocation lists (CRLs) is required, although a delivery mechanism (such as HTTP/FTP, RFC 2585) is not specified. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.4.5 | SSL/TLS Support for Applications | Security | P1 |
CGL specifies that carrier grade Linux shall provide basic SSL/TLS support, which shall conform to the legacy SSL and IETF TLS standards. |
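From an application's point of view, TLS support usually means obtaining a configured context from the platform's TLS library. The sketch below uses Python's standard `ssl` module; the helper name `make_client_context` and the choice to refuse protocol versions older than TLS 1.2 are assumptions reflecting current practice rather than anything stated in this requirement.

```python
import ssl

def make_client_context():
    """Client-side TLS context with certificate and host name verification."""
    context = ssl.create_default_context()            # loads the system CA certificates
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumption: refuse older protocols
    return context
```

The returned context can be passed to `wrap_socket` or to higher-level clients that accept an `ssl.SSLContext`.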
ID | Name | Category | Priority |
---|---|---|---|
SEC.4.6 | PKI Certificate Authority (CA) | Security | P1 |
CGL specifies that carrier grade Linux shall provide a basic PKI CA service. This service shall conform to the IETF PKIX standards, specifically RFC 2527, RFC 3279 and 3280. Support for the management of certificate revocation lists (CRLs) is required. Certificate management and request protocols as defined by RFC 2527, 3279, and 3280 are not requirements. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.5.1 | Periodic User-Level File Integrity Checking | Security | P1 |
CGL specifies that carrier grade Linux shall provide a mechanism to enable a periodic checking of the integrity of files at user-level. Files to be checked are both binary files, which should not change after installation, and text files, such as configuration and log files, which may change. File integrity checks shall be able to be scheduled at any time of the day. The checking mechanism shall be able to send alarms to a system administrator when inconsistencies are detected. |
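A user-level integrity check of this kind is commonly built on a baseline of cryptographic digests. The following Python sketch shows the idea under simple assumptions: SHA-256 as the digest, an in-memory baseline, and hypothetical function names. Scheduling the check and delivering alarms are left to the surrounding system.

```python
import hashlib
import os

def file_digest(path):
    """SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def make_baseline(paths):
    """Record the digest of every monitored file."""
    return {path: file_digest(path) for path in paths}

def check_baseline(baseline):
    """Return {path: 'missing' | 'modified'} for files that differ from the baseline."""
    findings = {}
    for path, expected in baseline.items():
        if not os.path.exists(path):
            findings[path] = "missing"
        elif file_digest(path) != expected:
            findings[path] = "modified"
    return findings
```

For files that are expected to change, such as configuration files, the baseline would be regenerated after each authorized change; the baseline itself should be stored where an intruder cannot rewrite it.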
ID | Name | Category | Priority |
---|---|---|---|
SEC.7.1 | Memory Limits | Security | P1 |
CGL specifies that carrier grade Linux shall provide support for per-process limits for the use of system memory. |
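On Linux, one widely available per-process memory limit is the `RLIMIT_AS` resource limit. The Python sketch below starts a child process under such a limit; the helper name `run_with_memory_limit` is hypothetical, and the use of `RLIMIT_AS` is one assumption about how the requirement could be met (control groups are another option).

```python
import resource
import subprocess
import sys

def run_with_memory_limit(argv, limit_bytes):
    """Run argv in a child whose address space is capped at limit_bytes."""
    def apply_limit():
        # Runs in the child between fork() and exec(); Unix only.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))
    return subprocess.run(argv, preexec_fn=apply_limit,
                          stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
```

For example, `run_with_memory_limit([sys.executable, "-c", "x = bytearray(1 << 31)"], 1 << 30)` is expected to return a non-zero exit status, because the 2 GiB allocation exceeds the 1 GiB cap. Note that `preexec_fn` is not safe to use in multi-threaded parent processes.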
ID | Name | Category | Priority |
---|---|---|---|
SEC.7.2 | File System Quotas | Security | P1 |
CGL specifies that carrier grade Linux shall provide support for per-user file system quotas. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.7.3 | Process Quotas | Security | P1 |
CGL specifies that carrier grade Linux shall provide support for per-user quotas on the number of processes which may be created. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.8 | Trusted Platform Module (TPM) Support | Security | P2 |
CGL specifies that, if and only if it is installed and executing on a TPM-enabled platform, carrier grade Linux shall provide OS support for the TPM hardware, as defined in TCG TPM Specification, version 2. |
ID | Name | Category | Priority |
---|---|---|---|
SEC.9.1 | Role-Based Access Control | Security | P1 |
Linux Foundation CGL specifies that carrier grade Linux shall provide a mechanism to associate a name with a set of privileges and commands to be executed, defining a role within the system. It must be possible to assign a list of authorized users to a role, to remove users from a role and to log and audit actions performed within the role. Each role must have a symbolic name and be able to be uniquely identified within the system. Reference: SCOPE Alliance Carrier Grade Gap CGOS-3.4 Proof-of-Concept: SELinux |
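The data model implied by this requirement (uniquely named roles, a privilege set per role, an assignable user list, and an audit trail) can be sketched in a few lines of Python. The class and method names are hypothetical, and the sketch models only the bookkeeping; real enforcement would be done by a kernel mechanism such as the proof-of-concept named above.

```python
class RoleRegistry:
    """Named roles with privileges, member lists, and an audit trail."""

    def __init__(self):
        self.roles = {}      # role name -> {"privileges": set, "users": set}
        self.audit_log = []  # (user, role, privilege, allowed)

    def define_role(self, name, privileges):
        # Role names must be unique within the system.
        if name in self.roles:
            raise ValueError(f"role {name!r} already exists")
        self.roles[name] = {"privileges": set(privileges), "users": set()}

    def assign(self, name, user):
        self.roles[name]["users"].add(user)

    def remove(self, name, user):
        self.roles[name]["users"].discard(user)

    def check(self, user, name, privilege):
        """Decide whether user may exercise privilege via the role; log the decision."""
        role = self.roles.get(name)
        allowed = bool(role and user in role["users"] and privilege in role["privileges"])
        self.audit_log.append((user, name, privilege, allowed))
        return allowed
```

Every call to `check` is recorded, whether it succeeds or not, so that actions attempted within a role can be audited later.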
ID | Name | Category | Priority |
---|---|---|---|
SEC.9.2 | Advanced Role-Based Access Control | Security | P2 |
Linux Foundation CGL specifies that carrier grade Linux shall implement the Common Criteria Role-Based Access Control protection profile, version 1.0. Reference: SCOPE Alliance Carrier Grade Gap CGOS-3.4, http://www.commoncriteriaportal.org/files/ppfiles/RBAC_987.pdf Proof-of-Concept: SELinux |
ID | Name | Category | Priority |
---|---|---|---|
SEC.10 | Tamper-Resistant Storage | Security | P2 |
Linux Foundation CGL specifies that carrier grade Linux shall provide secure, tamper-resistant storage for security-relevant data such as keys and certificates. It must be possible for both kernel and user space to request validation of such data and to receive an assessment whether such data has been modified either via the operating system or some external source. Reference: SCOPE Alliance Carrier Grade Gap CGOS-3.3 |
ID | Name | Category | Priority |
---|---|---|---|
SEC.11.1 | File Access Tracing | Security | P1 |
Linux Foundation CGL specifies that carrier grade Linux shall provide the ability to record and report, via the normal system event reporting mechanism, file access events. At least the following file access events must be recorded and reported:
The reports must at least include the event that is being recorded and some uniquely identifiable information about the issuer of the operation. Reference: SCOPE Alliance Carrier Grade Gap CGOS_V3-2.0 Proof-of-Concept: Linux Audit Framework |
ID | Name | Category | Priority |
---|---|---|---|
SEC.11.2 | File Access Tracing: Limiting | Security | P2 |
Linux Foundation CGL specifies that carrier grade Linux shall provide the ability to record and report file access events. It must be possible to include or exclude arbitrary files and/or directory hierarchies from the file access tracing and the types of events that shall be logged. Reference: SCOPE Alliance Carrier Grade Gap CGOS_V3-2.0 Proof-of-Concept: AIDE, Samhain, GRSecurity |
Principle | Description |
Relevance | The requirement must be relevant and implement the functional CGL objectives. |
Correctness of Implementation | The requirement must faithfully implement the security model upon which it is based. |
Simplicity | The requirements should be simple to implement. Complexity is the enemy of security. Common uses should be easy to handle and defaults should be sensible. |
Robustness | The implementations of the requirements should be difficult to configure incorrectly, fail in secure ways, and produce useful error messages. |
Orthogonality | Requirements should be useful individually without significant overlap in functionality. |
Interface Stability | Changes and additions to the Linux APIs should be done with backward compatibility in mind for both source code and binary code. |
Provision of Defense-in-Depth | Multiple security mechanisms should exist to provide additional security protection. |
Designed for Testing | A test suite should be provided for unit testing of the requirement implementations. |
The International Telecommunications Union (ITU) has published many standards that are relevant to the security of telecommunications systems. The specification defers to the ITU standards for telecommunications-specific security requirements. The CGL Security Requirements Definition is limited to issues relating to security of the underlying operating system.
X.805 defines security in terms of two major concepts: layers and planes.
The three layers are:
- Infrastructure - security of routers, switches, servers, communication links, etc.
- Services - security of services offered to the customer, such as leased lines, e-mail, SMS.
- Application - security of customer applications using services.
The three planes are:
- Management - security of OAM&P
- Control - security of signaling, i.e., session creation and modification
- End-user - security of end-user data flows
Layers and planes intersect, forming a 3 by 3 matrix. Orthogonal to this, X.805 defines eight security dimensions:
- Privacy
- Data confidentiality
- Authentication
- Integrity
- Non-repudiation
- Access Control
- Communication security
- Availability
These dimensions touch each of the cells of the layers/planes matrix. For brevity's sake, we refer to the definitions in [ITU03].
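The structure described above can be enumerated mechanically, which is sometimes useful when building a review checklist. The Python sketch below is illustrative only; the constant names are hypothetical, and the dimension names follow the wording of X.805.

```python
from itertools import product

LAYERS = ("Infrastructure", "Services", "Application")
PLANES = ("Management", "Control", "End-user")
DIMENSIONS = (
    "Access control", "Authentication", "Non-repudiation", "Data confidentiality",
    "Communication security", "Data integrity", "Availability", "Privacy",
)

# Each (layer, plane) pair is one cell of the 3 by 3 matrix.
CELLS = list(product(LAYERS, PLANES))

# Every security dimension applies to every cell.
CHECKLIST = [(layer, plane, dim) for (layer, plane) in CELLS for dim in DIMENSIONS]
```

This yields 9 cells and 72 (layer, plane, dimension) combinations, of which only the subset that falls within the scope of an operating system is considered here.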
Many of the issues addressed by X.805 are not relevant to our analysis, because they are outside the scope of an operating system.
All discussion of security revolves around risk. Risks are created when a security vulnerability is combined with the threat of that vulnerability being exploited. In the common buffer overflow attack scenario, a vulnerability (the lack of input validation in the software) and a threat (an attacker using software that exploits the vulnerability) together create the risk of a successful attack. The risk can be mitigated in different ways. The vulnerability is removed by fixing the software. The threat is removed by preventing the attack.
Risks do not necessarily have to be mitigated in software; the environment in which a system is embedded can also mitigate them. This is an important point because it is nearly impossible to construct systems that are invulnerable to attack.
All software contains vulnerabilities and it is impractical to find and remove all of them in a system. Some methods for lowering the risks relating to vulnerabilities are:
- Not exposing the system running the software to insecure networks. This is practical for certain limited purposes, for instance controlling a power plant. In the CGL environment one could segregate network traffic from different security planes, which would eliminate the threat of intruders attacking software operating in the management and control plane.
- Overflow detection through the use of programming languages and development tools. One example is the gcc compiler using the stack protection (previously known as ProPolice) extension. Most stack buffer overflows will result in the premature termination of a program. This termination transforms the risk of a successful buffer overflow attack into a denial of service attack.
- Limiting software privileges. A common approach is the use of 'chroot' jails, a method of restricting a program's access to a very limited part of the file system. Another approach is the use of a security manager that decides whether an application is allowed to perform certain operations. A common example is the Java sandbox which prevents access of applets to most system resources.
- Restricting network access using a DMZ. The application and the system running it may still be compromised, but the problem is somewhat contained.
The solution of many security problems will be a combination of the correct application of OS facilities, and a correct design of the environment in which the systems operate.
A particular issue exists where applications need to access multiple security planes. Many CGL services can be provisioned remotely by the end-user. Many ISPs that offer domain hosting allow the creation of new mailboxes by the customer. These facilities create new risks:
- Unauthorized rerouting of e-mail and telephone calls by disgruntled employees or unscrupulous competitors.
- Exploitation of vulnerabilities in software to 'jump' from one security plane to another.
Mitigating these risks requires forethought.
- The users of these systems need to be properly authenticated and authorized.
- Information traveling between planes should pass through narrowly defined interfaces that protect against unauthorized access to the control and management planes from the end-user plane. A security failure in an exposed part of the system should not result in failure of the system as a whole.
Facilities that limit information flow between planes are not commonly available. Possible approaches could be:
- Running software on multiple hosts, with very limited connectivity between them.
- Running multiple processes on the same host, using operating system facilities to contain each process in its own security domain.
Unix-like systems such as Linux share a few common security facilities:
- Discretionary access control using user IDs, group IDs, and file system privileges.
- Restriction of processes to a portion of the filesystem.
Some Unix-like systems provide additional facilities which can be useful under certain circumstances, such as:
- Access Control Lists: Some access control policies are difficult to implement with the classical Unix access control mechanism. ACLs provide a more powerful mechanism to describe access rules. The lack of users on typical carrier grade equipment makes ACLs not overly useful.
- Role Based Access Control: Users of the system can be assigned 'roles' which grant privileges to resources. The role 'help desk' for example could include privileges to change passwords for non-administrative users. RBAC is most useful if there are many instances of the role. This is not commonly the case for CGL systems.
To mitigate risks precipitated by software design or implementation errors, CGL requires much more fine-grained control over system privileges. The common way to handle programs that need certain privileges is to give them full privileges at start-up time and let each program drop the privileges it does not need. This causes a few problems. The privileges that need to be dropped are not necessarily the same on all systems, and privilege-manipulation code proliferates across the system. Tools that allow the designer or administrator to start software with the minimal set of privileges are required.
Another issue is that Linux systems do not have a sufficiently fine-grained privilege model. For example, it is impossible to restrict the use of a specific IP address and/or port range to a limited number of processes. Ideally, it should be possible to allow a specific process to bind to port 80 (WWW) on a single interface. Multi-level security (MLS) implementations can be used to prohibit processes from accessing network interfaces they do not need to access.
The following sections borrow heavily from [CSPP-OS03], an example Common Criteria profile for COTS operating systems.
Name | Assumption | Rationale |
A.COTS | The TOE is constructed from near-term achievable off the shelf Linux technology. | This follows from the charter of CGL. |
A.MALICIOUS-INSIDER | The TOE is not expected to be able to sufficiently mitigate the risks resulting from the malicious abuse of authorized privileges. | In CGL environments the primary threats are network-based attacks, so the focus is on this type of threat. |
A.SOPHISTICATED-ATTACK | The TOE is expected to be able to mitigate risks resulting from the application of moderately sophisticated attack methods. | Internet-based CGL applications are subject to network-based attacks, and should be more resistant to attacks than general-purpose systems. |
A.APPLICATION-HOSTILE | The network containing the TOE is used to provide a limited set of applications to an untrusted network, not to provide shell access to users at different trust levels. | Communications architectures are moving away from general-purpose computing to application servers in hostile environments. |
Name | Assumption | Rationale |
A.ADMIN | The security features of the TOE are competently administered on a continuous basis. | It is essential for security that administration is both competent and continuous. |
A.ADMIN-ONLY | Authenticated access to the TOE is only provided to those charged with maintaining the TOE and the applications it provides. | CGL is not targeting general purpose computing. |
A.USER-NEED | Authenticated users, such as administrators, recognize the need for a secure CGL environment. | Application administrators value security of applications which they maintain. |
A.USER-TRUST | Authenticated users, such as administrators, are generally trusted to perform discretionary actions in accordance with security policies. | Access is restricted to administrators maintaining applications. |
A.NET-SEGREGATION | Network connections in the management, control and end-user planes are adequately segregated. One approach is to use physically separate networks. Another approach is the use of cryptographic methods for authentication, integrity verification and data confidentiality. | The end user should not be able to gain access to either the control or management plane. |
A.CLUSTER-SEGREGATION | If the TOE is part of a cluster the intra-cluster communications should be adequately segregated from any other traffic, either by physical separation or by the use of cryptographic methods for authentication, integrity verification and data confidentiality. | Results are likely to be disruptive if cluster traffic is tampered with or captured. For this reason, separate interconnect is preferable. |
A.PROCESS-UNTRUSTED | Processes running on the TOE cannot always be trusted to perform their duties as designed, and may attempt to access resources they are not meant to access. | It is often impossible to run legacy code in restricted environments such as chroot jails. The TOE should support a safe way to run this type of code in such a way that program bugs or vulnerability exploits only have limited consequences. |
Name | Policy | Rationale |
P.ACCESS | Access rights to specific data objects are determined by object attributes assigned to that object, user identity, user attributes, and environmental conditions as defined by the security policy. | Linux supports policies that grant or deny access to objects using rules driven by attributes of the user (such as user identity), attributes of the object (such as permission bits), type of access (such as read or write), and environmental conditions (such as time-of-day). |
P.ACCOUNT | Users must be held accountable for security-relevant actions. | Organizational policies should require that users are held accountable for their actions. This facilitates after-the-fact investigations and provides some deterrence to improper actions. |
P.COMPLY | The implementation and use of the organization's CGL systems must comply with all applicable laws, regulations, and contractual agreements imposed on the organization. | The organization will meet all requirements imposed upon it from outside governmental or contractual obligations. |
P.DUE-CARE | The organization’s CGL systems must be implemented and operated in a manner that represents due care and diligence with respect to the risks to the organization. | It is important that the level of security afforded by the CGL system be in accordance with best practices within the business or government sector in which the organization is placed. |
P.INFO-FLOW | Information flow between application components must be in accordance with established information flow policies. | This document includes information flow control as this is needed in many environments. While this might not be implemented by mechanisms within the Linux TOE, the CGL system, of which the TOE is a part, will likely have to meet this policy. |
P.KNOWN | Except for a well-defined set of allowed operations, users of the TOE must be identified and authenticated before TOE access is granted. | Beyond a well-defined set of actions such as read access to a public web server, there is a finite community of known users who are authenticated before being allowed access. |
P.NETWORK | The organization's IT security policy must be maintained in the environment of distributed systems interconnected via insecure networking. | CGL systems will likely connect through untrusted networks, and these connections should not compromise the security of a CGL system. |
P.PHYSICAL | The processing resources of the TOE that must be physically protected in order to ensure that security objectives are met will be located within controlled access facilities that mitigate unauthorized, physical access. | A TOE will not be able to meet its security requirements unless at least a minimum degree of physical security is provided. |
P.SURVIVE | The IT system, in conjunction with its environment, must resist, be resilient to, and detect a security breach and recover from the breach when possible. | Linux systems will provide a measure of their resilience through functionality and assurances that resist, detect, and recover from security breaches. For sophisticated attacks, a large portion of this resilience is provided by the TOE environment. |
P.TRAINING | Authenticated users of the system must be adequately trained. This enables the users to effectively implement organizational security policies with respect to their discretionary actions. It also supports the need for non-discretionary controls implemented to enforce these policies. | Once granted legitimate access, authenticated users are expected to use CGL resources and information only in accordance with the organizational security policy. For this to be possible, these users must be adequately trained both to understand the purpose and need for security controls and to be able to make secure decisions with respect to their discretionary actions. |
P.USAGE | The organization's IT resources must be used only for authorized purposes. | Linux systems must, in conjunction with their environment, ensure that the organization's information technology is used only for authorized purposes. |
P.CONTAINMENT | The TOE must be able to mitigate the risks of common threats to the integrity of applications and data caused by security-relevant errors in applications. | Linux systems should limit the damage done by buffer overflows and other common attacks. This is achieved through privilege minimization and process containment mechanisms such as jails. |
P.PRIVILEGE-MIN | The TOE must be able to run applications with a minimal set of necessary privileges. | Linux systems should allow granting of privileges on a need-only basis. The nothing-or-everything model of 'root' privileges is not acceptable. |
P.NET-SEGREGATION | The TOE must be configured to provide adequate segregation between the management, control and end-user planes, using separate networks, cryptographic methods, or both. | As per the requirements in X.805, the planes should be adequately segregated. |
P.CLUSTER-SEGREGATION | If the TOE is part of a cluster the intra-cluster traffic must be adequately segregated from any other traffic. | As per the requirements in X.805, the planes should be adequately segregated including intra-cluster traffic. |
P.PROCESS-NET-SEGREGATION | The TOE must allow the configuration of access controls on network resources in such a way that a process's network access can be restricted to the minimum subset necessary. | Network resources should be segregated such that access is limited to the planes required for the network process’s operation. |
P.PROCESS-FILE-SEGREGATION | The TOE must allow the configuration of access controls on files in such a way that the process can only access necessary files. | This limits the impact of the subversion of a process through buffer overflow attacks, insertion attacks, and other common attacks. |
P.TRACEABLE-TOE | The TOE should log sufficient information for security-relevant events. | Information such as user and process identifiers are needed for forensics and log file analysis. |
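As an illustration of the kind of rule P.ACCESS describes, the following minimal sketch combines user identity, object permission bits, type of access, and time of day into a single decision. The function name `access_allowed`, the default time window, and the simplified owner/other model (group permissions are ignored) are assumptions made for this example only; they do not describe how the Linux kernel or any particular security module implements access control.

```python
import stat
from datetime import time


def access_allowed(uid, file_uid, mode, want, now,
                   window=(time(8, 0), time(18, 0))):
    """Toy access decision (hypothetical helper, not a kernel interface).

    uid      -- identity of the requesting user
    file_uid -- owner of the object
    mode     -- permission bits of the object, e.g. 0o640
    want     -- type of access, "read" or "write"
    now      -- current time of day (environmental condition)
    window   -- assumed policy window during which access is permitted
    """
    # Environmental condition: deny outside the permitted time window.
    if not (window[0] <= now <= window[1]):
        return False
    # User attribute: owner bits apply to the owner, "other" bits to everyone else.
    if uid == file_uid:
        bit = stat.S_IRUSR if want == "read" else stat.S_IWUSR
    else:
        bit = stat.S_IROTH if want == "read" else stat.S_IWOTH
    # Object attribute: the permission bit must be set for the requested access type.
    return bool(mode & bit)
```

A real policy would also consider group membership, capabilities, and any mandatory access control rules in force.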
This section borrows from a published example Common Criteria protection profile. According to [CSPP-OS03], the following threats do not have to be addressed by the target of evaluation. We believe that, given some of the intended uses of this document, these two threats do need to be addressed where possible.
Threat | Description of Threat |
Access rights to specific data objects are determined by object attributes assigned to that object, user identity, user attributes, and environmental conditions as defined by the security policy | |
P.ACCESS | Linux supports organizational policies that grant or deny access to objects using rules driven by attributes of the user (such as user identity), attributes of the object (such as permission bits), type of access (such as read or write), and environmental conditions (such as time-of-day). |
T.DENIAL-SOPHISTICATED | Sophisticated denial of network attacks include such threats as:
|
T.ENTRY-SOPHISTICATED | Sophisticated technical attacks by unauthenticated users, such as:
|
The following threats must be addressed by the target of evaluation:
Threat | Description of Threat |
T.ACCESS-TOE | An authorized user may gain non-malicious access to a resource or information controlled by the TOE. Such attacks include:
|
T.AUDIT-CONFIDENTIALITY-TOE | Disclosure of security event records to unauthorized users or processes. This is caused by:
|
T.AUDIT-CORRUPTED-TOE | Unauthorized modification or destruction of security event records. This is caused by:
|
T.CRASH-TOE | Compromise of secure state when system crashes because the system does not fail securely. |
T.DENIAL-TOE | Unsophisticated denial-of-service attacks. Examples include:
|
T.OBSERVE-TOE | Security compromise going undetected, for example:
|
T.RECORD-EVENT-TOE | Security-relevant events going unrecorded, which is caused by:
|
T.RESOURCES | Exhaustion of system resources, which can be caused by:
|
T.TOE-CORRUPTED | The security of the TOE is intentionally corrupted, enabling future attack. This can include back doors left by programmers or intentional improper configuration of security-relevant systems (e.g. through the use of unauthenticated install media) |
According to [CSPP-OS03] the following set of threats does not have to be addressed by the OS (TOE) alone. The environment should also play a role in addressing these vulnerabilities:
Objective | Description | Threat or Policy |
O.ACCESS-NON-TECHNICAL | The IT other than the TOE environment must provide sufficient protection against non-technical attacks by authenticated users for non-malicious purposes. This will be accomplished primarily via prevention with a goal of high effectiveness. Personnel security and user training and awareness will provide a major part of achieving this objective. | P.TRAINING |
O.ACCESS-NON-TOE | The IT other than the TOE must provide public access and access by authenticated users to the resources and actions for which they have been authorized and over which the TOE does not exercise control. The focus is on prevention with a high degree of effectiveness. | P.ACCESS |
O.ACCOUNT-NON-TOE | The TOE must ensure, for actions under its control or knowledge, that all users can subsequently be held accountable for their security relevant actions. This is expected with a high degree of effectiveness. | P.ACCOUNT T.TRACEABLE-NON-TOE T.RECORD-EVENT-NON-TOE T.AUDIT-CORRUPTED-NON-TOE T.AUDIT-CONFIDENTIALITY-NON-TOE |
O.APPLICATION-TOOLS | The TOE must provide a reasonable, current set of security tools and libraries for use by applications. | P.DUE-CARE T.INSTALL T.OPERATE |
O.AUTHORIZE-NON-TOE | The TOE must provide the ability to specify and manage user and system process access rights to individual processing resources and data elements under its control, supporting the organization’s security policy for access control. This is expected with a high degree of effectiveness. NOTE: This includes initializing, specifying and managing (1) object security attributes, (2) active entity identity and security attributes, and (3) security relevant environmental conditions. | P.ACCESS |
O.AVAILABLE-NON-TOE | The IT other than the TOE must protect itself from unsophisticated denial-of-service attacks. This is a combination of prevention, detection, and recovery with a high degree of effectiveness. | P.SURVIVE T.DENIAL-NON-TOE |
O.BYPASS-NON-TOE | For access not controlled by the TOE, IT other than the TOE must prevent errant or non-malicious, authorized software or users from bypassing or circumventing security policy enforcement. This will be accomplished with high effectiveness. NOTE: This objective is limited to ‘non-malicious’ because IT controls in the notional CSPP system are not expected to provide sufficient mitigation for the greater negative impact that ‘malicious’ implies. | T.ACCESS-NON-TOE |
O.DETECT-SOPHISTICATED | The TOE environment must provide the ability to detect sophisticated attacks and the results of such attacks (e.g., corrupted system state). The goal is for moderate effectiveness. | P.SURVIVE T.SYSTEM-CORRUPTED |
O.ENTRY-NON-TECHNICAL | The TOE environment must provide sufficient protection against non-technical attacks by other than authenticated users. This will be accomplished primarily via prevention with a goal of high effectiveness. User training and awareness will provide a major part of achieving this objective. | P.TRAINING |
O.ENTRY-NON-TOE | For resources not controlled by the TOE, IT other than the TOE must prevent logical entry using unsophisticated, technical methods, by persons without authority for such access. This is clearly a prevent focus and is to be achieved with a high degree of effectiveness. | P.USAGE T.ENTRY-NON-TOE |
O.INFO-FLOW | The TOE environment must ensure that any information flow control policies are enforced - (1) between system components and (2) at the system external interfaces. This will be accomplished by preventing unauthorized flows with high effectiveness. | P.INFO-FLOW |
O.KNOWN-NON-TOE | The IT other than the TOE must ensure that, for all actions under its control and except for a well-defined set of allowed actions, all users are identified and authenticated before being granted access. This is expected with a high degree of effectiveness. | P.KNOWN |
O.OBSERVE-NON-TOE | The IT other than the TOE must ensure that its security status is not misrepresented to the administrator or user. This is a combination of prevent and detect and, considering the potentially large number of possible failure modes, is to be achieved with a moderate, versus high, degree of effectiveness. | T.OBSERVE-NON-TOE |
O.PHYSICAL | Those responsible for the TOE must ensure that those parts of the TOE critical to security policy are protected from physical attack that might compromise IT security. This will be accomplished primarily via prevention with a goal of high effectiveness. | P.PHYSICAL T.PHYSICAL |
Objective | Description | Threat or Policy |
O.ACCESS-TOE | The TOE must provide public access and access by authenticated users to those TOE resources and actions for which they have been authorized. This will be accomplished with high effectiveness. | P.ACCESS |
O.ACCOUNT-TOE | The TOE must ensure, for actions under its control or knowledge, that all TOE users can subsequently be held accountable for their security relevant actions. This will be done with moderate effectiveness, in that it is anticipated that individual accountability might not be achieved for some actions. | P.ACCOUNT T.TRACEABLE-TOE T.RECORD-EVENT-TOE T.AUDIT-CORRUPTED-TOE T.AUDIT-CONFIDENTIALITY-TOE |
O.AUTHORIZE-TOE | The TOE must provide the ability to specify and manage user and system process access rights to individual processing resources and data elements under its control, supporting the organization’s security policy for access control. This will be accomplished with high effectiveness. | P.ACCESS |
O.AVAILABLE-TOE | The TOE must protect itself from unsophisticated, denial-of-service attacks. This will include a combination of protection and detection with high effectiveness. | P.SURVIVE T.DENIAL-TOE |
O.BYPASS-TOE | The TOE must prevent errant or non-malicious, authorized software or users from bypassing or circumventing TOE security policy enforcement. This will be accomplished with high effectiveness. NOTE: This objective is limited to ‘non-malicious’ because CSPP-OS controls are not expected to be sufficient mitigation for the greater negative impact that ‘malicious’ implies. | T.ACCESS-TOE |
O.DETECT-TOE | The TOE must enable the detection of TOE specific insecurities. The goal is high effectiveness for lower grade attacks. | P.SURVIVE T.TOE-CORRUPTED |
O.ENTRY-TOE | The TOE must prevent logical entry to the TOE using unsophisticated, technical methods, by persons without authority for such access. This will be accomplished with high effectiveness. | P.USAGE T.ENTRY-TOE |
O.KNOWN-TOE | The TOE must ensure that, for all actions under its control and except for a well-defined set of allowed actions, all users are identified and authenticated before being granted access. This will be accomplished with high effectiveness. | P.KNOWN |
O.OBSERVE-TOE | The TOE must ensure that its security status is not misrepresented to the administrator or user. This is a combination of prevent and detect and, considering the potentially large number of possible failure modes, is to be achieved with a moderate, versus high, degree of effectiveness. | T.OBSERVE-TOE |
O.RECOVER-TOE | The TOE must provide for recovery to a secure state following a system failure, discontinuity of service, or detection of an insecurity. This will be accomplished with high effectiveness for specified failures and low effectiveness for failures in general. | P.SURVIVE T.CRASH-TOE |
O.RESOURCES | The TOE must protect itself from user or system errors that result in shared resource exhaustion. This will be accomplished via protection with high effectiveness. | P.SURVIVE T.RESOURCES |
Objective | Description | Threat or Policy |
O.ACCESS-MALICIOUS | The TOE controls will help in achieving this objective, but will not be sufficient. Additional, environmental controls are required to sufficiently mitigate the threat of malicious actions by authenticated users. This will be accomplished by focusing on deterrence, detection, and response with a goal of moderate effectiveness. | T.ACCESS-MALICIOUS |
O.COMPLY | The TOE environment, in conjunction with controls implemented by the TOE, must support full compliance with applicable laws, regulations, and contractual agreements. This will be accomplished via some technical controls, yet with a focus on non-technical controls to achieve this objective with high effectiveness. | P.COMPLY |
O.DETECT-SYSTEM | The TOE, in conjunction with other IT in the system, must enable the detection of system insecurities. The goal is high effectiveness for lower grade attacks. | P.SURVIVE T.SYSTEM-CORRUPTED |
O.DUE-CARE | The TOE environment, in conjunction with the TOE itself, must be implemented and operated in a manner that clearly demonstrates due-care and diligence with respect to IT-related risks to the organization. This will be accomplished via a combination of technical and non-technical controls to achieve this objective with high effectiveness. | P.DUE-CARE |
O.MANAGE | Those responsible for the system (in conjunction with mechanisms provided by the TOE) must ensure that it is managed and administered in a manner that maintains IT security. This will be accomplished with moderate effectiveness. | T.ADMIN-ERROR |
O.NETWORK | The system must be able to meet its security objectives in a distributed environment. This will be accomplished with high effectiveness. | P.NETWORK |
O.OPERATE | Those responsible for the system (in conjunction with mechanisms provided by the TOE) must ensure that the system is delivered, installed, and operated in a manner which maintains IT security. This will be accomplished with moderate effectiveness. | T.INSTALL T.OPERATE P.TRAINING |
O.RECOVER-SYSTEM | The system must provide for recovery to a secure state following a system failure, discontinuity of service, or detection of an insecurity. This will be accomplished with some prevention and a majority of detect and respond, with high effectiveness for specified failures. For general failure, this will be accomplished with low effectiveness. | P.SURVIVE T.CRASH-SYSTEM |
O.ENTRY-SOPHISTICATED | The TOE and the environment must sufficiently mitigate the threat of an individual unauthenticated user gaining unauthorized access via sophisticated, technical attack. This is accomplished by focusing on prevention, detection and response with a goal of high effectiveness. | T.ENTRY-SOPHISTICATED |
O.DENIAL-SOPHISTICATED | The TOE and the environment must maintain system availability in the face of sophisticated denial-of-service attacks. The focus is on prevention, detection and response with a goal of high effectiveness. | P.SURVIVE T.DENIAL-SOPHISTICATED |
O.DETECT-SOPHISTICATED | The TOE and the environment must provide the ability to detect sophisticated attacks and the results of such attacks such as corrupted system state. The goal is for high effectiveness. | P.SURVIVE T.SYSTEM-CORRUPTED |
O.CONTAINMENT | The TOE and the environment must provide the ability to constrain the effect of a security failure of an application to that application. | P.CONTAINMENT P.PRIVILEGE-MIN P.SURVIVE T.SYSTEM-CORRUPTED |
- ITU03: ITU-T, Security in Telecommunications and Information Technology, 2003
- CSPP-OS03: Gary Stoneburner, COTS Security Protection Profile - Operating Systems (CSPP-OS), 20
To stay competitive and profitable in the telecommunication industry, standards-based, modular, commercial-off-the-shelf (COTS) hardware components are being used along with open software, including operating systems, middleware, and applications. A goal of the CGL working group is to promote the migration of the telecommunication industry from proprietary hardware platforms to COTS hardware by ensuring that the Linux environment provides adequate support for these COTS platforms. The CGL Hardware Requirements Definition – Version 4.0 identifies a set of widely-used industry hardware platforms and defines the support that is needed in the operating system for these platforms. The scope of these hardware requirements applies to the Linux kernel, kernel interfaces (APIs and libraries), system software, and tools.
This section specifies a set of generic requirements that are common across platform types. It includes support for blade servers, for hardware management interfaces, and for blade hot swap events. To address the need to manage highly available carrier grade systems through hardware out-of-band mechanisms, management capabilities such as those found in the Intelligent Platform Management Interface (IPMI) are also described.
Carrier-grade systems require high-performance, high-throughput interconnections within a system and between system nodes. Hardware-related requirements, such as PCI Express support and PCI Express Device Hot Plug, are included. Other hardware-related requirements, such as a CPU throttle mechanism, “iSCSI Initiator Support”, and “iSCSI Target Discovery”, are also specified.
Considering the diversity of hardware platforms used in a carrier grade environment, the CGL Hardware Requirements Definition - Version 4.0 does not define requirements for just one type of industry platform. Instead it defines generic platform requirements and then provides an “Industry Platforms” section to provide implementation guidelines for specific architectures. Examples of such industry platforms include AdvancedTCA, BladeCenter, CompactPCI and rack mount types of servers.
HARDWARE SUB-CATEGORIES | |
Requirement Sub-Category | Sub-Category Description |
PLT | General Platform |
PIC | Platform Interconnect |
PMT | Platform Management |
PMS | Platform Miscellaneous |
ID | Name | Category | Priority |
---|---|---|---|
PMS.1.0 | CPU Throttle | Hardware | P2 |
CGL specifies that carrier grade Linux shall provide a CPU power consumption management capability that enables adjustment of the CPU frequency. Any power, voltage and frequency settings shall be within the allowed range for the hardware. |
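The constraint in PMS.1.0 that any frequency setting stay within the range allowed by the hardware can be sketched as follows. This is a minimal illustration that assumes the Linux cpufreq sysfs layout (files `cpuinfo_min_freq`, `cpuinfo_max_freq`, and `scaling_max_freq`, with values in kHz); CGL does not mandate this interface, and the helper names `clamp_khz` and `set_max_frequency` are hypothetical.

```python
from pathlib import Path


def clamp_khz(requested_khz, hw_min_khz, hw_max_khz):
    """Force a requested frequency into the range the hardware allows."""
    return max(hw_min_khz, min(requested_khz, hw_max_khz))


def set_max_frequency(cpufreq_dir, requested_khz):
    """Cap the CPU frequency, never exceeding the hardware limits.

    cpufreq_dir is assumed to be a cpufreq policy directory such as
    /sys/devices/system/cpu/cpu0/cpufreq (writing there requires root).
    Returns the value actually written, in kHz.
    """
    base = Path(cpufreq_dir)
    hw_min = int((base / "cpuinfo_min_freq").read_text().strip())
    hw_max = int((base / "cpuinfo_max_freq").read_text().strip())
    target = clamp_khz(requested_khz, hw_min, hw_max)
    (base / "scaling_max_freq").write_text(f"{target}\n")
    return target
```

Passing the directory as a parameter keeps the sketch testable against a scratch directory instead of the live sysfs tree.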
ID | Name | Category | Priority |
---|---|---|---|
PMS.5.1 | iSCSI Initiator Support | Security | P1 |
CGL specifies that carrier grade Linux shall support the iSCSI protocol to enable block level access to SCSI storage devices using the TCP/IP transport. The support shall be compliant with the RFC 3720 specification and should provide iSCSI initiator support. At a minimum, the supported iSCSI initiators should be able to authenticate themselves to potential iSCSI targets using the two-way CHAP authentication algorithm. See STD.17.0 iSCSI. |
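For readers unfamiliar with CHAP, the response calculation defined in RFC 1994 can be sketched as below: the responder hashes the one-byte identifier, the shared secret, and the challenge with MD5. In two-way (mutual) CHAP the same exchange is run in each direction, normally with a different secret per direction. The function names are hypothetical, and the sketch covers only the hash computation and its verification, not the iSCSI login negotiation that carries these values.

```python
import hashlib
import hmac


def chap_response(identifier, secret, challenge):
    """Compute a CHAP response: MD5(identifier || secret || challenge).

    identifier -- integer 0..255 chosen by the challenger
    secret     -- shared secret as bytes
    challenge  -- random challenge bytes sent by the challenger
    """
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def verify_chap(identifier, secret, challenge, response):
    """Check a received response using a constant-time comparison."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)
```

In a mutual exchange, the target would call `verify_chap` on the initiator's response, and the initiator would then issue its own challenge and verify the target's response in the same way.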
ID | Name | Category | Priority |
---|---|---|---|
PMS.5.3 | iSCSI Target Discovery | Security | P1 |
CGL specifies that the iSCSI initiators implemented by carrier grade Linux shall support the SendTargets discovery mechanism to discover potential iSCSI targets to which they can connect. See STD.17.0 iSCSI. |
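A SendTargets response is a sequence of NUL-terminated `key=value` text pairs in which each `TargetName` is followed by zero or more `TargetAddress` entries. The following minimal sketch groups the addresses under their target names. The function name `parse_sendtargets` and the example IQNs and addresses in the usage note are hypothetical, and the sketch assumes the complete response data has already been received from the target.

```python
def parse_sendtargets(data):
    """Group TargetAddress values under the preceding TargetName.

    data -- raw response bytes made of NUL-terminated key=value pairs.
    Returns a dict mapping each target name to a list of address strings.
    """
    targets = {}
    current = None
    for item in data.split(b"\x00"):
        if not item:
            continue  # skip the empty piece after the final terminator
        key, _, value = item.decode("utf-8").partition("=")
        if key == "TargetName":
            current = value
            targets.setdefault(current, [])
        elif key == "TargetAddress" and current is not None:
            targets[current].append(value)
    return targets
```

For example, a response naming `iqn.2001-04.com.example:disk1` with addresses `10.0.0.1:3260,1` and `10.0.0.2:3260,2` yields one dictionary entry whose list holds both addresses; the trailing number in each address is the portal group tag.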
This section provides background information for some of the hardware referred to in this specification.
- Intelligent Platform Management Interface (IPMI) Specifications: http://developer.intel.com/design/servers/ipmi
- PCI Express at the PCI-SIG web site: http://www.pcisig.com/
- Intel® Developer Network for PCI Express Architecture: http://www.express-lane.org
- Advanced Switching (ASI-SIG web site): http://www.asi-sig.com/
- Rapid I/O: http://www.rapidio.org
- Advanced Configuration and Power Interface (ACPI): http://www.acpi.info/