VAST Data Rewrites AI Inference Around Memory, Not Compute

The quiet truth about AI inference is that compute stopped being the hard part. Memory did. Context did. The moment AI systems moved past single prompts and started reasoning across time, the old architecture showed its age. GPUs run screaming fast, then stand around waiting for memory, like a Ferrari stuck in city traffic. That friction is where performance dies and economics get ugly. VAST Data decided to confront it head-on, with NVIDIA locked in beside them.

In early January 2026, VAST Data, headquartered in New York with deep engineering roots in Israel, announced a full redesign of AI inference architecture built specifically for the agentic era. Not tuned. Not patched. Rebuilt. The VAST AI Operating System now runs natively on NVIDIA BlueField-4 DPUs, embedding context memory directly inside the GPU server itself. No external client servers. No detours. Context lives where inference happens, shared at pod scale, deterministic, governed, and fast by default.

Renen Hallak, Chief Executive Officer and Co-Founder of VAST Data, has been here before. At XtremIO, he helped turn architectural conviction into a $1B outcome. This moment rhymes. Inference is no longer a stateless transaction. It is a long conversation, sometimes 100K-plus tokens deep, spanning agents, sessions, and users. Treating context like a sidecar is how GPUs idle and margins evaporate.

John Mao, Vice President of Global Technology Alliances at VAST Data, put it cleanly at CES 2026: inference is becoming a memory system, not a compute job. VAST's DASE architecture, fused with NVIDIA BlueField-4 and Spectrum-X networking, turns KV cache into shared infrastructure. GPUs stop waiting. Time to first token drops. Throughput stays predictable as concurrency scales from dozens to thousands.

Kevin Deierling, Senior Vice President of Networking at NVIDIA, framed it even more plainly: context is the fuel of thinking. This architecture makes context persistent, accessible, and governed at line rate, aligning directly with NVIDIA's Vera Rubin platform vision for multi-turn, multi-user AI systems.

This is not a storage announcement dressed up as something bigger. VAST Data has crossed $2B in cumulative bookings, supports over 1M GPUs globally, and recently signed a $1.17B commercial partnership with CoreWeave. The company is no longer asking where AI infrastructure is going. It is building where it is landing.