Antonio Policek looks at the issues surrounding the troubleshooting and optimisation of UMTS Networks, and explains how operators need to break down their KPIs service by service, network element by network element.
With UMTS networks offering a variety of customer services, from voice to video to online gaming, troubleshooting and optimizing networks is becoming increasingly complex for mobile network operators, who have looked to the traditional network key performance indicators (KPIs) as a tool for understanding and troubleshooting network performance.
Unfortunately, the term ‘KPI’ is loosely defined. Some operators have turned to the KPIs found on their network equipment only to discover that, as the network load increases, these KPIs fail to identify problems correctly. The result is that operators still face the challenge of identifying the best metrics for describing their customers’ perceptions of service quality and are looking for tools that go beyond merely recording network problems — and instead provide insights into why those problems occurred.
KPI protocol analysers
Traditional KPIs used in 2G and 2.5G networks are giving way to new, dynamic KPIs that take UMTS radio interface parameters into account and use protocol-related event counters. For performance-related data and dynamic KPIs to be useful to operators as UMTS network optimization and troubleshooting tools, and to equipment manufacturers as protocol testers, KPI hardware and software measurement tools must accomplish several things:
* Capture performance related data accurately
* Contain a set of pre-defined, non-vendor specific KPIs useable in any UMTS network environment
* Provide a software-based graphical user environment designed to adapt to changing requirements for data acquisition and measurement, including the ability to update KPI definitions and calculations
* Offer drill-down capabilities for network error analysis and troubleshooting
For UMTS network optimization and troubleshooting, protocol analysers must work in typical network environments that involve multiple vendors. They provide operators with insights into the problems in 3G networks and into customers’ needs. Rather than merely collecting and reporting data, they become tools for exploring the root causes of network problems.
KPIs and how to use them
A KPI is a mathematical formula that describes network quality and behavior and is used for optimizing a network. A clear and easily understandable comparison of ideal KPI values with actual observed network performance helps an operator determine whether actions taken to improve network and service quality have succeeded.
KPIs are made up of performance related data: for example, event counters can be used to count protocol messages that indicate successful (or unsuccessful) procedures. Standards groups assist with this process. For example, 3GPP 32.403 documents a long list of easy-to-define Success and Failure Ratios based on performance measurement definitions. These values are useful because they give an overview of network quality and behavior and help identify problems in defined network areas. For example, an extremely high GPRS Attach Failure Ratio in a defined SGSN area indicates a potential problem.
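A failure ratio of this kind can be sketched in a few lines. The example below is a minimal illustration, assuming the ratio is simply failures divided by attempts; the `ProcedureCounters` structure, the area names and all numbers are hypothetical and are not taken from 3GPP 32.403 or from any measurement.

```python
from dataclasses import dataclass


@dataclass
class ProcedureCounters:
    """Event counters for one procedure in one network area (hypothetical structure)."""
    attempts: int  # e.g. Attach Request messages observed
    failures: int  # e.g. Attach Reject messages observed


def failure_ratio(counters: ProcedureCounters) -> float:
    """Return failures / attempts, or 0.0 when no attempts were observed."""
    if counters.attempts == 0:
        return 0.0
    return counters.failures / counters.attempts


# Illustrative numbers only, not real measurements.
per_sgsn = {
    "SGSN-A": ProcedureCounters(attempts=2000, failures=40),
    "SGSN-B": ProcedureCounters(attempts=1500, failures=600),
}

ratios = {area: failure_ratio(c) for area, c in per_sgsn.items()}

# Pick the area with the highest ratio as the starting point for drill-down.
worst_area = max(ratios, key=ratios.get)
```

With these invented counters, `worst_area` is the area an operator would examine first; the ratio only says that something is wrong there, not why.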
Finding root causes
Whenever a KPI value indicates a problem, it is then necessary to find the root causes using a drill-down analysis procedure. In the Attach example, the first step is to analyze the reject cause values of Attach Reject messages. The number of occurrences of each possible Attach Reject cause value is displayed as a column in a table on a protocol analyser. In addition, graphical overviews (bar diagrams or pie diagrams) make it clear which are the most common reject cause values.
Sometimes it is not enough to determine whether or not the KPI results indicate an erroneous procedure. Another problem emerges when analyzing the results: if the reject cause value is, for example, ‘network failure,’ this is typically not sufficient to determine the root cause of the problem and more analysis is required.
From 3GPP 24.008 (Mobility Management, Call Control, Session Management) an operator knows that the cause value ‘network failure’ is used “if the MSC or SGSN cannot service an MS generated request because of PLMN failures, e.g. problems in MAP.”
Most likely, the root problem of this Attach Reject lies inside the core network. It can be analyzed in more detail in the next step, using the parameters of Attach Request (ATRQ) messages that are essential for routing core network messages, especially the IMSI, and determining their relation to the appropriate Attach Reject (ATRJ) cause values. This is an example of a procedure that requires not only data acquisition, but also context-related filtering, extracting and storing of call parameters belonging to single users and/or call procedures.
After IMSI analysis, it might be discovered, for example, that all ‘network failure’ rejected attach attempts were sent by roaming subscribers. Further investigation might show that there were no roaming contracts signed between the monitored PLMN operator and the Home PLMN operator of the roaming users. Therefore, the SGSN was not able to connect to the roaming users’ HLR, as indicated by cause value ‘network failure.’ From the point of view of the roaming subscriber, this result is unacceptable network behaviour, but from the point of view of a network operator, the network behaved correctly.
In this case, the root cause was easily found thanks to flexible presentation of the analyzed data. (Step one: show the number of ATRJ messages sorted by included cause value. Step two: show the IMSIs related to Attach Rejects with cause equal to ‘network failure’.) Such techniques are often offered by existing performance measurement and troubleshooting software tools.
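The two steps can be sketched as follows. This is a minimal illustration, assuming each Attach Reject has already been paired with the IMSI from its Attach Request; the record layout, the IMSIs and the home PLMN prefix are invented for the example, and matching on the IMSI prefix is a simplification of real home-network detection.

```python
from collections import Counter

# Hypothetical records: one per Attach Reject, with the IMSI of the related request.
attach_rejects = [
    {"imsi": "999010000000001", "cause": "network failure"},
    {"imsi": "999010000000002", "cause": "network failure"},
    {"imsi": "001010000000001", "cause": "GPRS services not allowed"},
    {"imsi": "999020000000003", "cause": "network failure"},
]

HOME_PLMN = "00101"  # assumed MCC+MNC of the monitored network


def count_by_cause(records):
    """Step one: number of Attach Reject messages per cause value."""
    return Counter(r["cause"] for r in records)


def imsis_for_cause(records, cause):
    """Step two: IMSIs related to Attach Rejects carrying the given cause."""
    return [r["imsi"] for r in records if r["cause"] == cause]


def is_roamer(imsi, home_plmn=HOME_PLMN):
    """Treat any IMSI that does not start with the home PLMN code as a roamer."""
    return not imsi.startswith(home_plmn)


by_cause = count_by_cause(attach_rejects)
suspects = imsis_for_cause(attach_rejects, "network failure")
all_roamers = all(is_roamer(imsi) for imsi in suspects)
```

If `all_roamers` is true, as it is for this invented data, the missing roaming agreement becomes a plausible explanation worth checking.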
What is missing, however, is a drill-down analysis that goes further and includes steps and answers to deeper questions. Unlike the example above, not all ‘network failures’ indicate correct network behavior. And how can such ‘correct network failures’ be distinguished from ‘unacceptable network failures’ that may show the same reject cause value, but have been caused by incorrect network behavior, for example interconnection problems between home and visited PLMN? It is important to know if measurement equipment offers drill-down analysis capability, and equally important to know the type of drill-down analysis provided.
For example, it’s important that a KPI protocol analyser offers highly flexible presentation methods for extracted data parameters, such as IMSI and cause values. This flexibility allows users to determine which parameters are shown in a single statistics table and how parameters are arranged in columns and rows. Advanced drill-down analysis capabilities are achieved by linking KPIs to the detailed list of calls that contributed to the KPIs themselves. A multi-interface call trace application that supports the Iub/Iur/Iu/Gr interfaces is used to identify the network area responsible for the problem.
Custom configurable KPIs
For accurate benchmarking activities, a predefined set of KPIs is invariably not sufficient. Operators need the ability to define their own KPIs according to their own rules. Using a protocol tester, operators can create KPIs containing the specific parameters that they need to carry out ad hoc network analysis. They can capture specific messages or specific abnormal network conditions and then identify problem root causes using simple arithmetic operations, simple Boolean operations, or simple correlation functions (for example, counting all cell reselection procedures triggered by a paging message). With drill-down capabilities they can then analyze the problem on a per-call and per-service basis.
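A simple correlation function of this kind might look like the sketch below. It assumes a decoded event stream of `(timestamp, subscriber, event type)` tuples and treats a cell reselection as "triggered" when it follows a paging event for the same subscriber within a time window; the event names, the window and the data are hypothetical, and a real tool would correlate on protocol fields rather than on timing alone.

```python
def count_correlated(events, trigger, follow_up, window_s):
    """Count follow_up events that occur within window_s seconds
    after a trigger event for the same subscriber."""
    last_trigger = {}  # subscriber -> timestamp of the most recent unmatched trigger
    count = 0
    for timestamp, subscriber, kind in sorted(events):
        if kind == trigger:
            last_trigger[subscriber] = timestamp
        elif kind == follow_up:
            # Each trigger is consumed by the first follow-up that comes after it.
            start = last_trigger.pop(subscriber, None)
            if start is not None and timestamp - start <= window_s:
                count += 1
    return count


# Invented event stream: (seconds, subscriber id, event type).
events = [
    (0.0, "A", "PAGING"),
    (1.2, "A", "CELL_RESELECTION"),   # counted: 1.2 s after paging
    (3.0, "B", "CELL_RESELECTION"),   # not counted: no preceding paging
    (5.0, "B", "PAGING"),
    (6.0, "C", "PAGING"),
    (6.5, "C", "CELL_RESELECTION"),   # counted: 0.5 s after paging
    (20.0, "B", "CELL_RESELECTION"),  # not counted: 15 s is outside the window
]

paging_triggered = count_correlated(events, "PAGING", "CELL_RESELECTION", window_s=5.0)
```

Keeping the trigger, the follow-up event and the window as parameters lets the same function express other user-defined correlations.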
Statistics & display options
As operators optimize their networks and identify the problem domains, they want to review and analyze the same set of KPI measurements from several perspectives and in different ways. They may start their analysis from diverse starting points that vary depending on the type of analysis and service they focus on. Once they have seen the first measurement KPI results table, they will often want to gain deeper insights into data details by browsing through the data.
To help accomplish this, some protocol analysers offer data entry points called ‘dimensions’ or views, along with a main analysis path that guides users as they browse through the data to follow a line of inquiry. These measurement systems allow operators to visualize KPI measurement results in these ways:
* per Service
* per Network Element
* per Subscriber
Once the operator measures a KPI, these ‘dimensions’ can provide a quick means of looking through the acquired data to narrow down the problem and ultimately find the root cause.
A common KPI, like call drop rate, is meaningless if it is not possible to break down the measurement by service, by network element and by subscriber. A very high call drop rate, for example, can occur just for video calls and not for voice calls. The call drop rate might be higher in a portion of the network connected to one specific radio network controller (RNC) than in another area of the network connected to another RNC.
Finally, the call drop rate may affect particular subscribers, such as roamers and business users, who might be more sensitive to quality of service. Because of this, once a KPI has been measured it is necessary to have access to the list of calls that generated that specific KPI in order to understand the impact of the quality degradation on the end subscriber.
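The breakdown by dimension can be sketched as a single grouping function. The example is a minimal illustration, assuming one record per call with its service, RNC, IMSI and drop status; the field names and all values are invented.

```python
from collections import defaultdict

# Hypothetical call records, one per call.
calls = [
    {"service": "voice", "rnc": "RNC-1", "imsi": "001010000000001", "dropped": False},
    {"service": "voice", "rnc": "RNC-1", "imsi": "001010000000002", "dropped": False},
    {"service": "voice", "rnc": "RNC-2", "imsi": "001010000000003", "dropped": False},
    {"service": "voice", "rnc": "RNC-2", "imsi": "001010000000001", "dropped": True},
    {"service": "video", "rnc": "RNC-1", "imsi": "001010000000002", "dropped": False},
    {"service": "video", "rnc": "RNC-2", "imsi": "001010000000003", "dropped": True},
    {"service": "video", "rnc": "RNC-2", "imsi": "001010000000003", "dropped": True},
    {"service": "video", "rnc": "RNC-2", "imsi": "001010000000002", "dropped": False},
]


def drop_rate_by(records, dimension):
    """Call drop rate per value of one dimension ('service', 'rnc' or 'imsi')."""
    totals = defaultdict(int)
    drops = defaultdict(int)
    for call in records:
        key = call[dimension]
        totals[key] += 1
        if call["dropped"]:
            drops[key] += 1
    return {key: drops[key] / totals[key] for key in totals}


def dropped_calls(records, dimension, value):
    """The list of dropped calls behind one cell of the breakdown."""
    return [c for c in records if c[dimension] == value and c["dropped"]]


overall = sum(c["dropped"] for c in calls) / len(calls)
by_service = drop_rate_by(calls, "service")
by_rnc = drop_rate_by(calls, "rnc")
behind_rnc2 = dropped_calls(calls, "rnc", "RNC-2")
```

In this invented data the overall rate hides the fact that every drop occurs behind one RNC, and `behind_rnc2` gives the individual calls, and therefore the subscribers, that were affected.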
UTRAN specific KPIs
The ‘classic’ success and failure ratio KPIs, based on protocol events counted on land line interfaces, as well as some throughput rates measured on the same interfaces, are not sufficient to provide an overview of UMTS network and service quality. In fact, nearly 70 percent of UMTS problems are found in the UTRAN. The bigger problems involve the radio interface (Uu) or interfaces close to the radio interface (Iub, Iur).
When using a protocol tester, new, dynamic KPI definitions can be based on protocol-related event counters that also take cellular radio interface parameters into account: the signalling events that are regulated by network protocols can be correlated with the conditions of the radio interface and, therefore, indicate which network problems most affect the ultimate quality of service perceived by subscribers. For example, the sustained high throughput of a subscriber session performing a file download can be reduced by frequent retransmissions on the radio interface, which typically indicate a radio link quality issue.
Service specific performance measurement
It is also necessary to look at what kind of service is behind each call, because optimization and troubleshooting processes may differ from service to service.
A protocol analyser needs to examine different services separately by breaking them out into the appropriate category, such as: Adaptive Multi-Rate (AMR) voice calls; PS calls (PDP contexts); Multimedia calls (H.324M); and Short Message Service (SMS). But even further classification is possible, and all measured and displayed data can be arranged so that it is displayed by cell, RNC, MSC, SGSN, APN, etc.
Protocol tester users can look at a long list of 3G-H.324M KPIs to gain insights into what’s happening on their network:
* Service Access or Service Release Times
* Service Set-Up Success Rate or Dropped Session Rate
* Peak Signal to Noise Ratio
* Frame Rate, Payload Bitrate: the bitrate carrying video and audio signals reflecting the effective bandwidth utilization.
* Dropped Pictures and video delay
* Audio frame error rate, frame loss rate or Audio-Video Sync
* Jitter
* H.245 retransmissions
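Of the metrics listed, Peak Signal to Noise Ratio has a compact standard definition: 10·log10(MAX² / MSE), where MSE is the mean squared error between reference and received samples. The sketch below assumes 8-bit samples supplied as flat sequences of equal length; how a given protocol tester obtains the reference and received frames is not specified here, and the function name is hypothetical.

```python
import math


def psnr(reference, received, max_value=255):
    """Peak Signal to Noise Ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(received) or len(reference) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, received)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no measurable distortion
    return 10 * math.log10(max_value ** 2 / mse)


# Invented pixel values for illustration.
reference_frame = [52, 55, 61, 66, 70, 61, 64, 73]
received_frame = [52, 54, 61, 67, 70, 60, 64, 73]
quality_db = psnr(reference_frame, received_frame)
```

Higher values indicate that the received video is closer to the reference.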
Conclusion
KPIs broken down per service (voice, photo, video or packet), per network element (cell, NodeB, RNC, SGSN, MSC and others) and per call and subscriber (identified, for example, by IMSI or MSISDN) are needed to navigate through data acquisition results and to narrow the scope of problems to isolate their root causes. KPI protocol analysers help in this analysis by providing an independent third-party analysis tool for UMTS network performance.
Measurement-based KPIs not only help report the presence of UMTS network problems, but also help operators derive the root causes of problems. To do so, however, once a problem is known, operators must be able to drill down to its root cause in the network.