CellularLint

Abstract

In recent years, there has been a growing focus on scrutinizing the security of cellular networks, often attributing security vulnerabilities to issues in the underlying protocol design descriptions. These protocol design specifications, typically extensive documents that are thousands of pages long, can harbor inaccuracies, underspecifications, implicit assumptions, and internal inconsistencies. In light of the evolving landscape, we introduce CellularLint - a semi-automatic framework for inconsistency detection within the standards of 4G and 5G, capitalizing on a suite of natural language processing techniques. Our proposed method uses a revamped few-shot learning mechanism on domain-adapted large language models. Pre-trained on a vast corpus of cellular network protocols, this method enables CellularLint to simultaneously detect inconsistencies at various levels of semantics and practical use cases. In doing so, CellularLint significantly advances the automated analysis of protocol specifications in a scalable fashion. In our investigation, we focused on the Non-Access Stratum (NAS) and the security specifications of 4G and 5G networks, ultimately uncovering 157 inconsistencies with 82.67% accuracy. After verification of these inconsistencies on 3 open-source implementations and 17 commercial devices, we confirm that they indeed have a substantial impact on design decisions, potentially leading to concerns related to privacy, integrity, availability, and interoperability.

1. NAS Message Ciphering

Finding: After the completion of security mode procedures, the specifications suggest that, except for a few messages, all messages should be integrity-protected and ciphered. However, on two occasions (shown in Figure 6), it is emphasized that all messages after the security procedure should be integrity-protected and ciphered. Failure to enforce integrity and ciphering is a serious design flaw and can leave implementations subject to vulnerabilities and interoperability issues.

T₁: From this time onward the UE shall cipher and integrity protect all NAS signalling messages with the selected NAS ciphering and NAS integrity algorithms.

T₂: From this time onward, all NAS messages exchanged between the UE and the MME are sent integrity protected and except for the messages specified in clause 4.4.5, all NAS messages exchanged between the UE and the MME are sent ciphered

Inconsistency to exploit: In the case of T₁, the specification states that after the security context has been established, the UE shall cipher and integrity protect all NAS signaling messages with the selected NAS ciphering and integrity algorithms. However, T₁ does not mention the scenarios where the UE accepts plain-text messages. In contrast, T₂ is clearer and explicitly mentions such scenarios. Because T₁ does not explicitly mention the exception cases, an implementation might accept unexpected plain-text messages even after the security context is established. Therefore, we design an attack based on this scenario.

Attack: The adversary connects with a victim UE through a fake base station and sends a plain-text identity_request or authentication_request [53]. Alternatively, the attacker can overshadow downlink packets to inject plain-text messages. The affected UE accepts, processes, and responds to the messages.

Investigation: In open-source implementations, we found that all implementations accept plain-text authentication and identity requests even after the security context has been established. On commercial UEs, we found 4 UEs accepting and responding to such plain-text messages (Table 6). Interestingly, a very recent UE (Google Pixel 7a, 2023) accepts a plain-text authentication_request but not a plain-text identity_request. This shows that vendors implement different behaviors for different exception-case messages.

Impact: Accepting plain-text or integrity-failed NAS messages can be catastrophic: attackers can exploit it to fingerprint users, track them, and mount denial-of-service attacks. Note that DoLTEst [53] also reports several UEs accepting unprotected messages after the security context has been established, albeit from the implementation testing perspective. We, on the other hand, show that this issue in different implementations can be traced to the inconsistent behavior defined in the standards.
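The two readings of T₁ and T₂ can be sketched as a UE-side acceptance check. This is a minimal illustration in Python, not code from any implementation examined here: NasMessage, accept_downlink, and the ciphering_exempt set are hypothetical names, and the exempt set only stands in for the messages listed in clause 4.4.5 rather than reproducing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NasMessage:
    """Hypothetical view of a received downlink NAS message."""
    name: str
    integrity_ok: bool  # NAS-MAC present and successfully verified
    ciphered: bool


def accept_downlink(msg, security_context_active, ciphering_exempt=frozenset()):
    """Return True if the UE should process msg.

    With an empty ciphering_exempt set this follows the strict reading of T1.
    A non-empty set models the explicit ciphering exceptions T2 refers to.
    """
    if not security_context_active:
        # Handling before security mode completion is out of scope here.
        return True
    if not msg.integrity_ok:
        # Both T1 and T2 require integrity protection from this point on.
        return False
    return msg.ciphered or msg.name in ciphering_exempt
```

Under either reading, a plain-text identity_request that arrives after the security context is established (integrity_ok=False, ciphered=False) is rejected by this check. The UEs described above behave as if this check were missing.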

2. Condition Over Integrity Check

Finding: On many occasions, the UE sets the counter for "SIM/USIM considered invalid for GPRS/non-GPRS/5GS services" to an implementation-specific maximum value (Figure 7). While this flexibility is acceptable and standard practice, the precondition to check the integrity of the received message is often neglected.

T₁: The UE shall consider the USIM as invalid for EPS services and non-EPS services until switching off or the UICC containing the USIM is removed or the timer T3245 expires as described in clause 5.3.7a. Additionally, the UE shall delete the list of equivalent PLMNs and enter state EMM-DEREGISTERED.NO-IMSI. If the message has been successfully integrity checked by the NAS and the UE maintains a counter for "SIM/USIM considered invalid for GPRS services", then the UE shall set this counter to UE implementation-specific maximum value.

T₂: The UE shall consider the USIM as invalid for EPS services until switching off or the UICC containing the USIM is removed or the timer T3245 expires as described in clause 5.3.7a. The UE shall delete the list of equivalent PLMNs and shall enter the state EMM-DEREGISTERED.NO-IMSI. If the UE maintains a counter for "SIM/USIM considered invalid for GPRS services", then the UE shall set this counter to UE implementation-specific maximum value.

Inconsistency to exploit: In the case of T₁, the protocol requires a successful integrity check by the NAS. However, in T₂, this integrity check is skipped. These PoS are related to attach_reject and detach_request. We have also found instances of network-initiated detach_request where similar inconsistencies exist. As the inconsistency is related to the integrity checking of messages, the attack steps and impact are the same as in Finding 1.

Investigation: All the open-source implementations we examined, in both 4G and 5G, are vulnerable. In commercial UEs, however, we found no instances of vulnerable behavior.
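The effect of the missing precondition can be shown with a small sketch. It assumes a hypothetical UeState record and an arbitrary placeholder value for the "UE implementation-specific maximum value"; the require_integrity flag switches between the T₁ and T₂ readings and is not part of any specification or implementation.

```python
from dataclasses import dataclass

# Placeholder for the "UE implementation-specific maximum value" (assumed).
COUNTER_MAX = 10


@dataclass
class UeState:
    """Hypothetical subset of UE state touched by the quoted segments."""
    usim_valid: bool = True
    invalid_counter: int = 0


def on_reject(ue, integrity_checked, require_integrity):
    """Apply the reject handling quoted in T1/T2.

    require_integrity=True follows T1: the counter is set only after a
    successful integrity check. require_integrity=False follows T2, which
    states no such precondition.
    """
    ue.usim_valid = False  # both segments consider the USIM invalid
    if integrity_checked or not require_integrity:
        ue.invalid_counter = COUNTER_MAX
    return ue
```

With an unprotected (possibly forged) reject, on_reject(UeState(), integrity_checked=False, require_integrity=True) leaves the counter at 0, while the same call with require_integrity=False drives it straight to the maximum.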

3. Integrity check failure

Finding: We found two conflicting PoS where different statements are made about failed integrity checks of control plane messages (Figure 8). Following this PoS into the 4G security specification, we ultimately found three (one segment is common) instances of different statements. Two of them clearly state that messages with a faulty MAC should be discarded, whereas the third one allows a slightly more flexible strategy, suggesting that certain messages be processed even if they fail integrity checks.

T₁: In case of failed integrity check (i.e. faulty or missing MAC-I) is detected after the start of integrity protection, the concerned message shall be discarded. This can happen on the UE side or on the eNB side.

T₂: In case of failed integrity check (i.e. faulty or missing NAS-MAC) is detected after the start of NAS integrity protection the concerned message shall be discarded except for some NAS messages specified in TS 24.301 [9]. For those exceptions the MME shall take the actions · · · NAS message with faulty or missing NAS-MAC.

Inconsistency to exploit: In T₂, the specification explicitly mentions that some exceptional NAS messages can trigger further MME actions even with failed integrity. However, T₁, which covers RRC, does not specify such exceptional cases, though they exist. For instance, when RRC_connection_resume fails an integrity check, there are further steps rather than just discarding the message. Through further manual analysis, we find these cases are not mentioned in the TS 33.501 (Security Architecture and Procedures for 5G Systems) specification, but in TS 38.331 (RRC Specification), which was not included in the scope of CellularLint. In a practical scenario, an implementor would use the RRC specification first, which clearly states the exceptional messages with failed integrity. Later, when taking the security specifications into account, the implementor would be confused by the stricter description. Such an inconsistency may not directly result in vulnerabilities. Still, it may cause differing implementations, as both specifications cannot logically be satisfied at the same time. Hence, some implementations might not properly handle the integrity failure scenarios.

Investigation: In open-source implementations, we found 1 implementation accepting integrity-failed messages. In contrast, none of the commercial UEs accept control-plane messages with failed integrity. However, our investigation revealed another interesting behavior: 16 UEs dropped the connection after receiving an RRC packet with failed integrity, whereas 1 UE did not (Table 6). These inconsistent UE behaviors can be traced back to T₁, which specifies neither the exception cases nor what to do in such scenarios.

4. NCC Reusage

Finding: We found a conflicting PoS in the 5G Security specification where two different conditions are expressed for the validity of the Next hop Chaining Counter (NCC) (shown in Figure 9). NCC is used for cryptographic derivation of the AS security algorithms and, hence, is a very important identifier. One segment dictates that the NCC value has to be fresh and previously unused to be accepted. On the contrary, the other segment claims that the only condition for acceptance is that the NCC value has to be different.

T₁: If the sent NCC value is fresh and belongs to an unused pair of NCC, NH, the gNB shall save the pair of {NCC, NH} in the current UE AS security context and shall delete the current AS key K_gNB.

T₂: The UE shall take the received NCC value and save it as stored NCC... . If the stored NCC value is different from the NCC value associated with the current K_gNB, the UE shall delete the current AS key K_gNB.

Implication: NCC and NH are critical parameters that are used to establish AS security and derive K_gNB. These parameters are used during the RRC_Reestablish procedure to re-establish the RRC connection. This procedure is particularly important during handover. The expectation is that these session key creation parameters (NCC and NH) are fresh and unused so that diverse keys are created (precisely described in T₁). This is an important assumption for the forward and future secrecy guarantees of the keys. Forward and future secrecy ensure that the protocol protects past and future sessions even if the current session is compromised [16]. These parameters are essentially used as nonces to ensure the diversification of the keys. If the use of fresh and unused NCC/NH values is not mandated, the forward and future secrecy guarantees can be broken. Furthermore, as the RRC_Reestablish message (containing these parameters) is unencrypted, the attacker can easily detect the sessions where the same NCC/NH values are used for key derivation.

Investigation: We found that none of the open-source implementations properly check these parameters; they simply accept the values if they differ from the previously accepted ones.
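The gap between the two acceptance conditions can be made concrete with a short sketch. Both functions are hypothetical, and modelling "fresh and unused" as membership in a set of previously used values is a simplifying assumption, not the check mandated by the specification.

```python
def accept_fresh_unused(ncc, used_nccs):
    """T1-style check: reject any NCC value that was used before."""
    if ncc in used_nccs:
        return False
    used_nccs.add(ncc)  # remember the value so it cannot be replayed
    return True


def accept_if_different(ncc, current_ncc):
    """T2-style check: the NCC only has to differ from the current one."""
    return ncc != current_ncc
```

Suppose the values 0, 1, and 2 have already been used and 2 is current. A replayed value of 1 is rejected by accept_fresh_unused(1, {0, 1, 2}) but accepted by accept_if_different(1, 2), which matches the behavior observed in the open-source implementations.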

5. GUTI Deletion

Finding: On many occasions, when a reject message is received from the network, the UE updates its EPS/5GS status, clears the context, and subsequently moves to a deregistered state. We observe that in many cases, a rejection cause TC in one reject message RM directs the UE to clear the context and move to the deregistered state, while the same cause TC in another reject message keeps the UE in the registered state without clearing the security context (shown in Figure 10). For example, 5GMM cause #13 received through registration_reject directs the UE to delete the GUTI, TAI, and ngKSI, while the same cause received through a tau_reject or service_reject message keeps the UE in a registered state without deleting the context. One might argue that since the messages are different, the same cause may legitimately trigger different behavior. However, other cause values such as #12 (tracking area not allowed) trigger similar behavior and state transitions across different reject messages. This confirms that the directions about the security context conflict.

T₁: #13 (Roaming not allowed in this tracking area) The UE shall set the 5GS update status to 5U3 ROAMING NOT ALLOWED (and shall store it according to subclause 5.1.3.2.2) and shall delete 5G-GUTI, last visited registered TAI, TAI list and ngKSI.

T₂: #13 (Roaming not allowed in this tracking area) The UE shall set the 5GS update status to 5U3 ROAMING NOT ALLOWED (and shall store it according to subclause 5.1.3.2.2) and shall delete the list of equivalent PLMNs (if available).

Inconsistency to Exploit: The GUTI (Globally Unique Temporary ID) is a temporary ID used to identify UEs. Each UE also has several other unique IDs, such as the IMSI and IMEI. Temporary identifiers like the GUTI are used to prevent attackers from tracking users. However, if these identifiers are not changed or reset, several privacy issues can arise. An adversary can use an old GUTI to track a UE through a linkability attack, violating privacy [32].

Investigation: In commercial UEs, we found a total of 16 devices not properly deleting GUTI for these reject messages (Table 6). One recent UE (Apple iPhone 12 Pro), however, deletes the security context with GUTI after receiving service_reject.
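The kind of conflict described in this finding can be expressed as a simple consistency check over a rule table. This sketch is only an illustration of the conflict itself and is not how CellularLint detects it (the framework works on the specification text with NLP techniques). The table layout and the function name are hypothetical, and attributing T₂ to service_reject is an assumption based on the description above.

```python
from collections import defaultdict

# (reject message, cause number) -> items the UE is told to delete.
# Entries paraphrase T1 and T2; the message assigned to T2 is assumed.
RULES = {
    ("registration_reject", 13): frozenset(
        {"5G-GUTI", "last visited registered TAI", "TAI list", "ngKSI"}
    ),
    ("service_reject", 13): frozenset({"list of equivalent PLMNs"}),
}


def conflicting_causes(rules):
    """Return causes whose deletion sets differ across reject messages."""
    variants_by_cause = defaultdict(set)
    for (_message, cause), deleted_items in rules.items():
        variants_by_cause[cause].add(deleted_items)
    return sorted(
        cause for cause, variants in variants_by_cause.items()
        if len(variants) > 1
    )
```

Calling conflicting_causes(RULES) flags cause 13, because the two reject messages prescribe different deletions for it.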

6. TAU and Detach Collision

Finding: We found a PoS of conflicting directions when TAU and detach procedure collide (shown in Figure 11). The first one states that the tracking_area_update procedure shall be aborted and the detach procedure shall be progressed. On the other hand, the second segment states that the detach procedure shall be aborted and re-initiated while the tracking_area_update procedure is fully performed.

T₁: Tracking area updating and detach procedure collision EPS detach containing detach type "re-attach required" or "re-attach not required": If the UE receives a DETACH REQUEST message before the tracking area updating procedure has been completed, the tracking area updating procedure shall be aborted and the detach procedure shall be progressed.

T₂: If a cell change into a new tracking area that is not in the stored TAI list occurs before the UE initiated detach procedure is completed, the UE proceeds as follows: 1) If the detach procedure was initiated for reasons other than removal of the USIM or the UE is to be switched off, the detach procedure shall be aborted and re-initiated after successfully performing a tracking area updating procedure.

Inconsistency to Exploit: An attacker can achieve downgrade or denial of service by injecting detach_request through fake base stations or signal-injection attacks [34]. In this case, an implementation that tries to implement both scenarios can exacerbate the situation by creating a deadlock state. As the statements are mutually exclusive, an implementation violates the specification either way.

Investigation: In commercial implementations, we found that all the UEs proceed with the tracking_area_update procedure when such a collision occurs (Table 6).

7. Sub-state Transition Confusion

Finding: When an attach_reject message is received with EMM cause #14, the 4G specification gives differing state transition descriptions (shown in Figure 1). This can certainly cause confusion in implementation design.

T₁: Whenever an ATTACH REJECT message with the EMM cause #14 "EPS services not allowed in this PLMN" is received by the UE · · · Additionally the attach attempt counter shall be reset when the UE is in substate EMM-DEREGISTERED.ATTEMPTING-TO-ATTACH.

T₂: #14 (EPS services not allowed in this PLMN); The UE shall set the EPS update status to EU3 ROAMING NOT ALLOWED · · · the UE shall reset the attach attempt counter and enter the state EMM-DEREGISTERED.PLMN-SEARCH.

Investigation: While looking further into open-source implementations, we found a very interesting scenario. In srsRAN, the developers tried to implement both descriptions. We show the srsRAN code snippet in Figure 12. Both conditional statements test the same cause (lines 1 and 6) but lead to different sub-state transitions (lines 3 and 7). Of course, the UE will go to the EMM_DEREGISTERED.PLMN_SEARCH sub-state because the first "if" is obscured. Such conflicts can cause interoperability issues and leave room for further synchronization problems.

8. Registration Counter Reset

Finding: When a UE receives a registration_reject message with one of several causes, such as #11 (PLMN not allowed), it resets the registration attempt counter. For the same cause value received through a service_reject message, the UE should reset the associated service request attempt counter. However, this directive is missing from the specification, although in the same 5G NAS specification, the general guideline for service_reject with a list of causes clearly calls for resetting the counter. Failure to clarify this in the corresponding cause descriptions affects interoperability, as some implementations take the general guideline into account while others focus on the specific scenarios.

T₁: #73 (Serving network not authorized). ... The UE shall delete the list of equivalent PLMNs, shall reset the registration attempt counter. For 3GPP access the UE shall enter the state 5GMM-DEREGISTERED.PLMN-SEARCH, and for non-3GPP access the UE shall enter state 5GMM-DEREGISTERED.LIMITED-SERVICE. The UE shall store the PLMN identity in the forbidden PLMN list as specified in subclause 5.3.13A and if the UE is configured to use timer T3245 then the UE shall start timer T3245 and proceed as described in clause 5.3.19a.1.

T₂: #73 (Serving network not authorized). ... The UE shall delete the list of equivalent PLMNs and store the PLMN identity in the forbidden PLMN list as specified in subclause 5.3.13A and if the UE is configured to use timer T3245 then the UE shall start timer T3245 and proceed as described in clause 5.3.19a.1.

9. Timer Expiry as a Precondition

Finding: When a registration_reject is received in the case of a 5G Standalone Non-Public Network (SNPN), the "list of subscriber data" is considered invalid until switch off, entry update, or expiry of timer T3245. However, the authentication_reject under the same condition does not consider the expiry of timer T3245 for the list validity. This is another conundrum that causes interoperability issues.

T₁: In case of SNPN, if the UE is neither registered for onboarding services in SNPN nor performing initial registration for onboarding services in SNPN and the UE supports access to an SNPN using credentials from a credentials holder, the UE shall consider the selected entry of the "list of subscriber data" as invalid for 3GPP access until the UE is switched off or the entry is updated.

T₂: In case of SNPN, if the UE is not performing initial registration for onboarding services in SNPN and the UE supports access to an SNPN using credentials from a credentials holder, the UE shall consider the selected entry of the "list of subscriber data" as invalid for 3GPP access until the UE is switched off, the entry is updated or the timer T3245 expires as described in clause 5.3.19a.2.

10. Unutilized Service Attempt Counter

Finding: We observe that the service request attempt counter is used less than the registration attempt or tracking area update attempt counter for similar events. When a registration attempt is rejected with a proper cause value, the counter is reset, but for many service_reject cases the service attempt counter is completely ignored in the description. This may cause interoperability issues among different vendor implementations.

11. PDCP Counter Set

Finding: We observed that for the protection of RRC messages while transitioning from RRC_INACTIVE to RRC_CONNECTED, the UE derives a set of keys such as K_RRCint, K_RRCenc, and so on. However, for connection resume in CM-IDLE, which involves similar key derivation actions, the UE additionally resets the PDCP COUNT to 0 and activates the new AS keys in the PDCP layer. This final action is skipped entirely in the first case (Figure 13).

T₁: For protection of all RRC messages except RRCReject message following the sent RRCResumeRequest message, the UE shall derive a K_NG−RAN^* using the target PCI, target ARFCN-DL/EARFCN-DL and the KgNB/NH based on either a horizontal key derivation or a vertical key derivation as defined in clause 6.9.2.1.1 and Annex A.11/Annex A.12. The UE shall further derive K_RRCint, K_RRCenc, K_UPenc (optionally), and K_UPint (optionally) from the newly derived K_NG−RAN^*.

T₂: For protection of all RRC messages except RRC Reject message following the sent RRC Resume Request message, the UE shall derive a K_NG−RAN^* using the target PCI, target EARFCN-DL and the KgNB/NH based on either a horizontal key derivation or a vertical key derivation as defined in clause 6.9.2.1.1 and Annex A.12. The UE shall further derive K_RRCint, K_RRCenc, K_UPenc (optionally), and K_UPint (optionally) from the newly derived K_NG−RAN^*. Then the UE resets all PDCP COUNTs to 0 and activates the new AS keys in PDCP layer.

Impact: This subtle omission can cause significant de-synchronization if a COTS implementation follows one of the descriptions and ignores the other.
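The de-synchronization can be illustrated with a toy model in which both endpoints compute an integrity tag over the current COUNT and the payload. This is a sketch under stated assumptions: toy_mac uses a truncated HMAC-SHA-256 as a stand-in and is not a 3GPP integrity algorithm, and PdcpEndpoint with its reset_count flag is a hypothetical class that only models the final step of T₂.

```python
import hashlib
import hmac


def toy_mac(key, count, payload):
    """Stand-in integrity tag over COUNT and payload (not a 3GPP algorithm)."""
    data = count.to_bytes(4, "big") + payload
    return hmac.new(key, data, hashlib.sha256).digest()[:4]


class PdcpEndpoint:
    """Hypothetical endpoint that tracks a key and a single COUNT value."""

    def __init__(self, key, count=0):
        self.key = key
        self.count = count

    def resume(self, new_key, reset_count):
        """Activate a new key; reset COUNT only if the T2 step is followed."""
        self.key = new_key
        if reset_count:
            self.count = 0

    def protect(self, payload):
        """Compute a tag with the current COUNT, then advance COUNT."""
        tag = toy_mac(self.key, self.count, payload)
        self.count += 1
        return tag

    def verify(self, payload, tag):
        """Check a tag against the local COUNT; advance COUNT on success."""
        ok = hmac.compare_digest(toy_mac(self.key, self.count, payload), tag)
        if ok:
            self.count += 1
        return ok
```

If both endpoints start at COUNT 5 and only one of them resets COUNT on resume, the first protected message after the resume fails verification at the other side even though both hold the same new key.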

BibTeX

@inproceedings {298168,
author = {Mirza Masfiqur Rahman and Imtiaz Karim and Elisa Bertino},
title = {{CellularLint}: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications},
booktitle = {33rd USENIX Security Symposium (USENIX Security 24)},
year = {2024},
isbn = {978-1-939133-44-1},
address = {Philadelphia, PA},
pages = {5215--5232},
url = {https://www.usenix.org/conference/usenixsecurity24/presentation/rahman},
publisher = {USENIX Association},
month = aug
}
