Mass data awakening highlights importance of scaling AI infrastructure

Intelligence starts with building AI data infrastructure

PARTNER CONTENT Data has become the backbone of rapid artificial intelligence (AI) advancement, which drives industrial transformation and intelligence.

In a 2025 report, the World Bank identified the essential elements for AI adoption, adaptation, and innovation: infrastructure; processing power; training data, algorithms, and applications; and digital skills. With these essentials in mind, storage plays a particularly important role, and a robust AI data infrastructure becomes a necessity.

Firstly, data silos will hinder connectivity while storage bottlenecks constrain compute capacity. As data volumes balloon, these silos and bottlenecks will make raw data ingestion and data access across different storage sites inefficient.

The result is low graphics processing unit (GPU) utilization during large language model (LLM) training: when storage I/O is slow, data supply cannot keep pace with compute, and GPUs sit idle waiting for data.
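The relationship between data supply and GPU utilization can be illustrated with a rough sketch. It assumes a simplified model in which utilization is capped by the ratio of storage throughput to the throughput the GPUs could consume. The function name and the numbers are hypothetical and do not come from Huawei.

```python
def gpu_utilization_bound(supply_gb_per_s: float, demand_gb_per_s: float) -> float:
    """Upper bound on GPU utilization when training is limited by data supply.

    supply_gb_per_s: throughput the storage layer can deliver (assumed value).
    demand_gb_per_s: throughput the GPUs could consume at full speed (assumed value).
    """
    if demand_gb_per_s <= 0:
        raise ValueError("demand_gb_per_s must be positive")
    # GPUs cannot be busier than the fraction of their demand that storage satisfies.
    return min(1.0, max(0.0, supply_gb_per_s) / demand_gb_per_s)


# Hypothetical cluster: storage delivers 20 GB/s, GPUs could consume 50 GB/s.
print(f"{gpu_utilization_bound(20.0, 50.0):.0%}")  # 40%
```

In this simplified model, adding GPUs raises demand without raising supply, so the utilization bound falls unless storage throughput grows as well.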

Further, inference will become slow as tasks transition from being compute-intensive to memory-intensive. During long-context processing and heavy workloads, Key-Value (KV) cache rapidly consumes GPU memory resources. This hinders concurrent processing and increases response latency.
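To see why the KV cache grows so quickly, consider the standard sizing formula for a transformer decoder: two tensors (keys and values) per layer, each holding one vector per KV head per token. The sketch below applies that formula to a hypothetical model configuration. The layer count, head count, and head dimension are illustrative assumptions, not figures from the article.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size in bytes for a transformer decoder.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes 16-bit (FP16/BF16) cache entries.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value


GIB = 2 ** 30

# Hypothetical model: 32 layers, 8 KV heads, head dimension 128, 32,768-token context.
one_request = kv_cache_bytes(32, 8, 128, seq_len=32_768, batch=1)
sixteen_requests = kv_cache_bytes(32, 8, 128, seq_len=32_768, batch=16)

print(one_request / GIB)       # 4.0
print(sixteen_requests / GIB)  # 64.0
```

Under these assumptions a single long-context request needs 4 GiB of cache, and 16 concurrent requests need 64 GiB, which is why long contexts and high concurrency compete for the same GPU memory.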

"Traditional storage architectures cannot support the ingestion, real-time extraction, cleansing, labeling and cost-efficient storage of massive amounts of data," said Michael Qiu, president of Huawei's Global Data Storage Marketing and Solution Sales Department. "The AI era is an era of mass data awakening. Huawei aims to reshape AI data infrastructure to create greater value for customers and partners."

Data infrastructure for industrial intelligence

At Mobile World Congress (MWC) 2026, Huawei Enterprise Business discussed the topic of "Advancing Industrial All Intelligence", with the aim of working with global customers and partners to achieve growth in the AI era.

"Storage will be a must-have for enterprises seeking to go deeper into AI development even as the integration of data, knowledge and memory becomes essential for enterprise AI over the next three to five years," said Qiu. "Huawei Data Storage is committed to solving the data challenges of AI by streamlining the entire pipeline from data to corpus, and from corpus to knowledge and memory."

"For CIOs, CTOs and ICT professionals, AI integration in the era of data awakening is shifting from experimentation to comprehensive efficiency enhancement," Qiu added. In line with this trend, Huawei Storage is improving corpus provisioning and delivering efficient storage infrastructure for training and inference. Huawei's AI Data Lake Solution enables quality data supply while the AI Data Platform (AIDP) facilitates efficient training and inference.

Huawei AI Data Lake Solution

Quality, multimodal, and large-scale data is essential for AI. Huawei AI Data Lake Solution is designed specifically to address key challenges in data storage, management, and utilization.

This solution uses Huawei's OceanStor Pacific scale-out storage to offer a high density of 4 petabytes per 2 rack units and low power consumption of 0.25 watts per terabyte. This cost efficiency improves AI applications like model training for autonomous driving. An autonomous driving company can continuously ingest data and generate over 1 petabyte of new data every day from millions of vehicles on the road. The ability to store mass data at low cost is a distinctive competitive advantage of Huawei Data Storage.
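As a back-of-the-envelope illustration, the density and power figures above can be combined with the 1 petabyte per day ingest rate to estimate a yearly footprint. The sketch assumes decimal units (1 PB = 1,000 TB), that the quoted figures apply directly to stored data, and no overhead for redundancy, replication, or retention policy. These assumptions are ours, not Huawei's.

```python
import math

PB_PER_2U_CHASSIS = 4   # from the article: 4 PB per 2 rack units
WATTS_PER_TB = 0.25     # from the article: 0.25 W per TB
TB_PER_PB = 1000        # assumption: decimal units


def storage_footprint(daily_ingest_pb: float, days: int) -> dict:
    """Estimate chassis count, rack units, and power for retained data."""
    total_pb = daily_ingest_pb * days
    chassis = math.ceil(total_pb / PB_PER_2U_CHASSIS)
    return {
        "total_pb": total_pb,
        "chassis": chassis,
        "rack_units": chassis * 2,
        "power_kw": total_pb * TB_PER_PB * WATTS_PER_TB / 1000,
    }


# One year of retained data at 1 PB per day.
print(storage_footprint(daily_ingest_pb=1, days=365))
# {'total_pb': 365, 'chassis': 92, 'rack_units': 184, 'power_kw': 91.25}
```

Under these assumptions, a year of data at that ingest rate would occupy 92 two-unit chassis and draw roughly 91 kW. Real deployments would differ once data protection overhead and retention policies are taken into account.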

Using OceanStor Pacific, Huawei has helped a world-leading science and engineering university build a data lake infrastructure for research data management (RDM) spanning four universities. This provides research data services for numerous research institutes and universities in Germany. The solution enables collaborative research innovation by overcoming challenges in research and education like limited storage capacity, inefficient data mobility, and difficult data sharing.

Unlocking the value of mass corpus data

Omni-Dataverse is a key component of the Huawei AI Data Lake Solution. It empowers efficient mass data management, provides a unified data view across regions, and enables the retrieval of hundreds of billions of files in seconds. Omni-Dataverse meets the complex and diversified data management needs of the AI era, breaks down data silos, and enables corpora to be aggregated, retrieved, and transferred, unleashing their strategic value as AI assets.

Developing and deploying AI applications is often slow and inefficient. The AI model lifecycle from training to application is a lengthy process spanning data ingestion, cleaning, format conversion, labeling, training, optimization, model deployment, and application generation. Huawei's one-stop AI toolchain ModelEngine streamlines the development and deployment of large-scale AI models to help users convert data into AI applications faster.

"The Huawei AI Data Lake Solution ensures efficient data ingestion, cost-effective storage, on-demand flow, and one-stop processing of mass data across regions. This accelerates the development from models to applications," Qiu affirmed.

Huawei AIDP

As AI technology continues to evolve, enterprises are entering a new phase centered around inference. There is an urgent need to overcome critical challenges plaguing real-world adoption and deployment across industries. These include low accuracy in multimodal knowledge retrieval, sharp increases in inference latency, and AI agents that fail to accumulate experience over time because they lack personalized memory.

"AI poses storage challenges unlike those of traditional IT systems," said Qiu. "Today, storage systems are not just about processing structured and unstructured data in databases or data lakes. Storage systems now also need to support new data semantics like vectors, graphs, and KV data. Enterprises need to transform data into knowledge and memory that AI can directly use."

At MWC 2026, Huawei released its AIDP. This platform integrates a knowledge base, KV cache acceleration, and a memory bank, and uses the Unified Cache Manager (UCM) to manage and schedule inference memory data. It breaks down data barriers to transform enterprise AI agents from demo-stage prototypes into mission-critical production tools.

The AIDP can be deployed in integrated or standalone mode. Integrated deployment uses OceanStor A800 as the foundation of the full-stack system. Standalone deployment adopts a data engine node + OceanStor Dorado all-flash storage architecture. By adding data engine nodes to existing storage systems, it protects legacy investments while ensuring smooth service transitions.

AI agent implementation: Beyond computing power and models

More and more enterprises are seeing that the real challenge is whether AI can be stably and efficiently integrated into their service processes.

For example, a top hospital in China plans to launch an AI agent system to support doctors' diagnosis and medication decision-making, and provide literature reviews for academic research. During inference, the hospital's system accesses large amounts of data, including medical papers, pathology guidelines, and clinical records. However, the accuracy of knowledge generated from multimodal data is low, which affects the final inference quality. In addition, the processing of many documents often exceeds the sequence length or context window supported by the inference system, leading to poor user experience from input truncation and high end-to-end latency.

With Huawei AIDP deployed, the hospital can accurately parse multimodal data such as text, images, and tables. Its adoption of UCM-based KV cache sparsification also expanded the effective inference context window by 2.5 times. The KV cache generated during inference is stored persistently, so the hospital no longer needs to reprocess the same papers or guidelines. Because cached results are looked up rather than recomputed, the time to first token (TTFT) has been cut by 90%, improving inference efficiency and accelerating the rollout of the hospital's AI agent applications.
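The general idea of reusing a persisted KV cache can be shown with a toy sketch. It is not UCM's API or Huawei's implementation: the class, the whitespace "tokenizer", and the placeholder cache contents are all hypothetical. It also assumes each document is used as an identical prompt prefix, since real KV entries depend on every token that precedes them.

```python
import hashlib


class PersistentKVStore:
    """Toy stand-in for a persistent KV cache keyed by document content."""

    def __init__(self) -> None:
        self._store: dict[str, list[int]] = {}
        self.prefilled_tokens = 0  # proxy for prefill compute actually spent

    def get_or_prefill(self, document: str) -> tuple[list[int], bool]:
        """Return (kv_cache, cache_hit); prefill only on a miss."""
        key = hashlib.sha256(document.encode("utf-8")).hexdigest()
        cached = self._store.get(key)
        if cached is not None:
            return cached, True  # hit: the prefill pass is skipped entirely
        tokens = document.split()  # toy tokenizer (assumption)
        self.prefilled_tokens += len(tokens)
        kv_cache = [len(token) for token in tokens]  # placeholder for real KV tensors
        self._store[key] = kv_cache
        return kv_cache, False


store = PersistentKVStore()
guideline = "hypothetical clinical guideline text reused across many queries"
_, first_hit = store.get_or_prefill(guideline)
_, second_hit = store.get_or_prefill(guideline)
print(first_hit, second_hit, store.prefilled_tokens)
```

The second lookup returns the stored cache without adding to `prefilled_tokens`. Since prefill over a long document usually dominates TTFT, skipping it on repeat queries is where the latency saving comes from. The 90% figure quoted above is Huawei's reported result and is not derived from this sketch.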

"AI is reshaping data infrastructure," Qiu said. "Planning AI training and inference platforms requires more than just focusing on computing power and models. Deep collaboration between storage and compute is essential for improving system-level efficiency and user experience.

"Intelligence starts with data. Huawei Data Storage will innovate non-stop to ensure that every step forward is grounded in data."

Contributed by Huawei.