GSMA launches benchmarks to assess LLMs’ usefulness in telco

The new community provides an open-source framework to assess large language models for capability, energy efficiency and safety

The GSMA Foundry has launched GSMA Open-Telco LLM Benchmarks, an open-source community aimed at improving the performance of large language models (LLMs) for telecom-specific applications. The community provides “an industry-first framework for evaluating AI models in real-world telecom use cases,” according to the GSMA. It is supported by Hugging Face, Khalifa University, The Linux Foundation, and mobile network operators and vendors.

The GSMA says LLMs have shown shortcomings in technical telecom knowledge, regulatory compliance and network troubleshooting. In recent tests, GPT-4 scored less than 75% on TeleQnA, a dataset tailored to assess the knowledge of LLMs in the field of telecoms, and less than 40% on 3GPP TDocs Classification, a dataset based on 3GPP standards documentation.

Microsoft’s Phi-2, a much smaller model, scored only 10% on MATH500, a benchmark of 500 general maths questions.

Current limitations

These results highlight the current limitations of AI models in addressing telecom-specific queries. The GSMA says its innovation hub’s Open-Telco LLM Benchmarks will address these gaps “by providing transparent, open evaluations of AI models across capabilities, energy efficiency and safety”. 

“Today’s AI models struggle with telecom-specific queries, often producing inaccurate, misleading or impractical recommendations,” said Louis Powell, Head of AI Initiatives, GSMA. “By creating an industry-wide set of benchmarks, we’re not only improving model performance but also ensuring AI in telecoms is safe, reliable and aligned with real-world operational needs.” 

The mobile network operators supporting the launch of GSMA Open-Telco LLM Benchmarks include Deutsche Telekom, LG Uplus, SK Telecom and Turkcell, alongside vendor Huawei.

Submitting cases

Mobile network operators, AI researchers and developers can submit use cases, datasets and models for evaluation. A standardised benchmarking framework is intended to ensure that all AI models are evaluated against real-world challenges in areas such as telecoms domain knowledge, mathematical reasoning, energy consumption and safety. The resulting benchmarks will be hosted on Hugging Face to ensure transparency and encourage community engagement.  
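To illustrate the kind of evaluation such a framework performs, the sketch below scores a model on a TeleQnA-style multiple-choice telecom question set. This is a hypothetical illustration, not the GSMA's actual harness: the `Question` structure, the toy questions and the always-pick-first "model" are all invented for the example; real benchmarks run far larger datasets against live model APIs.

```python
# Hypothetical sketch of multiple-choice benchmark scoring,
# in the spirit of TeleQnA-style evaluation. All names and
# questions here are illustrative, not from the GSMA framework.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Question:
    prompt: str
    options: List[str]
    answer: int  # index of the correct option


def accuracy(questions: List[Question],
             pick: Callable[[Question], int]) -> float:
    """Fraction of questions where the model's chosen index is correct."""
    correct = sum(1 for q in questions if pick(q) == q.answer)
    return correct / len(questions)


# Toy two-question set with a trivial "model" that always picks option 0.
qs = [
    Question("Which 3GPP release introduced 5G NR?",
             ["Rel-15", "Rel-12"], 0),
    Question("What does RAN stand for?",
             ["Radio Access Network", "Remote Area Node"], 0),
]
print(accuracy(qs, lambda q: 0))  # prints 1.0 for this toy set
```

A real leaderboard entry would replace the lambda with a call to the model under test and aggregate accuracy per category (domain knowledge, maths reasoning, and so on) alongside energy and safety metrics.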

Mobile network operators, vendors, startups and researchers are now encouraged to contribute by submitting their interest and telco LLM use cases to aiusecase@gsma.com. More information is available here.

The launch follows last year’s industry-wide commitment to exploring telco AI use cases ethically and sustainably, central to which was the GSMA’s Responsible AI Maturity Roadmap, to help mobile operators apply best practices from inception through evolution.   

AI at MWC25 Barcelona 

The Gen AI Summit: Experimentation to Transformation at MWC25 Barcelona will have sessions that explore practical applications and the transformative potential of GenAI within telecoms. They will include discussions about optimising AI-driven networks, personalising customer experiences and integrating GenAI into 5G and beyond.