By Sandeep Bharathi, president, Data Center Group, Marvell
This blog was originally posted at Fortune.
Semiconductors have transformed virtually every aspect of our lives. Now, the semiconductor industry is on the verge of a profound transformation itself.
Customized silicon, meaning chips uniquely tailored to meet the performance and power requirements of an individual customer for a particular use case, will become increasingly pervasive as data center operators and AI developers seek to harness the power of AI. Expanded educational opportunities, better decision making, and ways to improve the sustainability of the planet all become possible if we get the computational infrastructure right.
The turn to custom, in fact, is already underway. The number of GPUs—the merchant chips employed for AI training and inference—produced today is nearly double the number of custom XPUs built for the same tasks. By 2028, custom accelerators will likely pass GPUs in units shipped, with the gap expected to grow.1

By Khurram Malik, Senior Director of Marketing, Custom Cloud Solutions, Marvell
Near-memory compute technologies have always been compelling. They can offload tasks from CPUs to boost utilization and revenue opportunities for cloud providers. They can reduce data movement, one of the primary contributors to power consumption,1 while also increasing memory bandwidth for better performance.
Yet they have been deployed only sporadically: thermal problems, a lack of standards, cost and other issues have prevented many of these ideas from giving developers the Goldilocks combination of features needed to jump-start commercial adoption.2
This picture is now changing with CXL compute accelerators, which leverage open standards, familiar technologies and a broad ecosystem. And, in a demonstration at OCP 2025, Samsung Electronics, software-defined composable solution provider Liqid, and Marvell showed how CXL accelerators can deliver outsized gains in performance.
The Liqid EX5410C is a demonstration of a CXL memory pooling and sharing appliance capable of scaling up to 20TB of additional memory. Five of the 4RU appliances can then be integrated into a pod for a whopping 100TB of memory and 5.1Tbps of additional memory bandwidth. The CXL fabric is managed by Liqid’s Matrix software that enables real-time and precise memory deployment based on workload requirements.

By Michael Kanellos, Head of Influencer Relations, Marvell
Chiplets—devices made up of smaller, specialized cores linked together to function like a unified device—have dramatically transformed semiconductors over the past decade. Here’s a quick overview of their history and where the design concept goes next.
1. Initially, they went by the name RAMP
In 2006, Dave Patterson, the storied professor of computer science at UC Berkeley, and his lab published a paper describing how semiconductors would shift from monolithic silicon to devices in which different dies are connected and combined into a package that, to the rest of the system, acts like a single device.1
While the paper also coined the term chiplet, the Berkeley team preferred RAMP (Research Accelerator for Multiple Processors).
2. In Silicon Valley fashion, the early R&D took place in a garage
Marvell co-founder and former CEO Sehat Sutardja started experimenting with combining different chips into a unified package in his garage in the 2010s, according to journalist Junko Yoshida.2 He unveiled the MoChi (Modular Chip) concept, often credited as the first commercial platform for chiplets, in a keynote at ISSCC in February 2015.3
The first products came out a few months later in October.
“The introduction of Marvell’s AP806 MoChi module is the first step in creating a new process that can change the way that the industry designs chips,” wrote Linley Gwennap in Microprocessor Report.4

An early MoChi concept combining CPUs, a GPU and an FLC (final level cache) controller that distributes data across flash and DRAM to optimize power. Credit: Microprocessor Forum.
By Kirt Zimmer, Head of Social Media Marketing, Marvell
The OFC 2025 event in San Francisco was so vast that it would be easy to miss a few stellar demos from your favorite optical networking companies. That’s why we took the time to create videos featuring the latest Marvell technology.
Put them all together and you have a wonderful film festival for technophiles. Enjoy!
Annie Liao, Product Management Director, Connectivity Marketing at Marvell, showcased how PCIe Gen6 signals can be converted to optical, extending trace length beyond traditional electrical limitations. The result? A 10-meter cable reach that spans multiple racks. Whether connecting GPUs, accelerators, or storage across racks, this kind of extended PCIe connectivity is critical for building the infrastructure that powers advanced AI workloads.
By Kishore Atreya, Senior Director of Cloud Platform Marketing, Marvell
Milliseconds matter.
It’s one of the fundamental laws of AI and cloud computing. Reducing the time required to run an individual workload frees up infrastructure to perform more work, which in turn creates an opportunity for cloud operators to potentially generate more revenue. Because they perform billions of simultaneous operations and operate on a 24/7/365 basis, time literally is money to cloud operators.
Marvell specifically designed the Marvell® Teralynx® 10 switch to optimize infrastructure for the intense performance demands of the cloud and AI era. Benchmark tests show that Teralynx 10 operates at a low and predictable latency of 500 nanoseconds, a critical precursor for reducing time-to-completion.1 The 512-radix design of Teralynx 10 also means that large clusters or data centers with networks built around the device (versus 256-radix switch silicon) need up to 40% fewer switches, 33% fewer networking layers and 40% fewer connections to provide an equivalent level of aggregate bandwidth.2 Less equipment, of course, paves the way for lower costs, lower energy consumption and better use of real estate.
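A rough sense of why radix matters comes from the textbook capacity of a non-blocking folded-Clos (fat-tree) network: with switch radix k and t tiers, it can connect up to 2 x (k/2)^t endpoints. The Python sketch below works only from that assumption. The function names, the 131,072-endpoint example and the switch-count estimate are hypothetical, and this is not the methodology behind the percentages cited above, which depend on port speeds and oversubscription choices not given here.

```python
import math


def fat_tree_capacity(radix: int, tiers: int) -> int:
    """Maximum endpoints of a non-blocking fat tree: 2 * (radix / 2) ** tiers."""
    return 2 * (radix // 2) ** tiers


def tiers_needed(radix: int, endpoints: int) -> int:
    """Smallest number of tiers whose capacity covers the endpoint count."""
    tiers = 1
    while fat_tree_capacity(radix, tiers) < endpoints:
        tiers += 1
    return tiers


def estimated_switches(radix: int, endpoints: int) -> int:
    """Rough switch count (an estimate, not an exact design).

    In a non-blocking fat tree each endpoint consumes (2 * tiers - 1)
    switch ports, so we divide the total ports needed by the radix.
    """
    tiers = tiers_needed(radix, endpoints)
    return math.ceil((2 * tiers - 1) * endpoints / radix)


if __name__ == "__main__":
    endpoints = 131_072  # hypothetical cluster size chosen for illustration
    for radix in (256, 512):
        print(
            f"radix {radix}: {tiers_needed(radix, endpoints)} tiers, "
            f"~{estimated_switches(radix, endpoints)} switches"
        )
```

Under these assumptions, the 256-radix network needs three tiers for this endpoint count while the 512-radix network needs two, which is one way a one-third reduction in networking layers can arise.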
Recently, we also teamed up with Keysight to provide deeper detail on another crucial feature: auto-load balancing (ALB), or the ability of Teralynx 10 to even out traffic between ports based on current and anticipated loads. Like a highway system, a network that spreads traffic more evenly across its lanes prevents congestion and reduces cumulative travel time. Without load balancing, a crisis in one location becomes a problem for the entire system.
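The general idea of load-aware port selection can be illustrated with a toy comparison. The Python sketch below is not the ALB algorithm in Teralynx 10, whose internals are not described here. It contrasts a static mapping (flow ID modulo port count, standing in for hash-based selection) with a greedy choice of the currently least-loaded port. The flow sizes and function names are made up for illustration.

```python
from typing import List, Tuple

Flow = Tuple[int, int]  # (flow_id, size in arbitrary traffic units)


def static_assignment(flows: List[Flow], n_ports: int) -> List[int]:
    """Map each flow to a fixed port, ignoring how busy the port already is."""
    loads = [0] * n_ports
    for flow_id, size in flows:
        loads[flow_id % n_ports] += size
    return loads


def least_loaded_assignment(flows: List[Flow], n_ports: int) -> List[int]:
    """Place each flow on whichever port currently carries the least traffic."""
    loads = [0] * n_ports
    for _, size in flows:
        port = loads.index(min(loads))  # ties go to the lowest port index
        loads[port] += size
    return loads


if __name__ == "__main__":
    # Hypothetical mix of two large flows and six small ones.
    flows = [(0, 90), (1, 10), (2, 10), (3, 10),
             (4, 80), (5, 10), (6, 10), (7, 10)]
    print("static      :", static_assignment(flows, 4))
    print("least-loaded:", least_loaded_assignment(flows, 4))
```

With these made-up flows, the static mapping lands both large flows on port 0 (170 units), while the least-loaded choice caps the busiest port at 90 units for the same total traffic.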
Better Load Balancing, Better Traffic Flow
To test the hypothesis that smarter load balancing improves load distribution, we created a scenario with Keysight AI Data Center Builder (KAI DC Builder) to measure port utilization and job completion time across different AI collective workloads. The scenario was built around a spine-leaf topology with four nodes. KAI DC Builder supports a range of collective algorithms, including all-to-all, all-reduce, all-gather, reduce-scatter, and gather. It facilitates the generation of RDMA traffic and operates using the RoCEv2 protocol. (In layperson’s terms, KAI DC Builder along with Keysight’s AresONE-M 800GE hardware platform enabled us to create a spectrum of test tracks.)
To generate AI traffic workloads, we used the Keysight Collective Communication Benchmark (KCCB) application. This application is installed as a container on the server, along with the supporting Docker containers provided by Keysight.
In our tests, the Keysight AresONE-M 800GE was connected to a Teralynx 10 Top-of-Rack (ToR) switch via sixteen 400G OSFP ports. The ToR switch in turn was linked to a Teralynx 10 system configured as a leaf switch. We then measured port utilization and time-of-completion. All Teralynx 10 systems were loaded with SONiC.
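For readers unfamiliar with the metric, port utilization is commonly expressed as the fraction of a port's line rate actually used over a measurement interval. The Python sketch below shows that arithmetic for 400G ports. The byte counts are invented for illustration and are not results from these tests, and the helper names are hypothetical rather than part of any Keysight or Marvell tool.

```python
from typing import List


def port_utilization(bytes_transferred: int, line_rate_gbps: float,
                     interval_s: float) -> float:
    """Fraction of line rate used: bits sent / (line rate * interval)."""
    bits_sent = bytes_transferred * 8
    capacity_bits = line_rate_gbps * 1e9 * interval_s
    return bits_sent / capacity_bits


def utilization_spread(utilizations: List[float]) -> float:
    """Gap between the busiest and idlest port; smaller means more even load."""
    return max(utilizations) - min(utilizations)


if __name__ == "__main__":
    # Invented per-port byte counts over a one-second interval on 400G ports.
    sample_bytes = [25_000_000_000, 40_000_000_000, 10_000_000_000]
    utils = [port_utilization(b, 400, 1.0) for b in sample_bytes]
    print("per-port utilization:", utils)
    print("spread:", utilization_spread(utils))
```

A smaller spread between ports indicates that traffic is being distributed more evenly, which is the effect load balancing aims for.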