The Physical AI Stack, Decoded: What Each Domain Actually Needs — And Who’s Building It

Physical AI is having its inflection year. At GTC 2026, NVIDIA called it the ChatGPT moment for robots. Deloitte says 58% of enterprises are already using it in some form. Accenture just backed General Robotics. ABB and Siemens are embedding NVIDIA’s simulation libraries directly into their industrial platforms.
But here’s the thing — and most people working in this space already know it — a surgical robot and a warehouse bot will never share the same brain. Not because it’s technically impossible to build a big enough model, but because the data they need, the latency they demand, and the consequences of getting it wrong are fundamentally different.
What’s less obvious, and far more useful, is understanding what each domain actually requires — which models work where, how the data gets collected, and which companies occupy which layer of the stack. That’s what this article maps out.
Four Domains at a Glance
Physical AI gets discussed as one field. In practice, it’s at least four distinct challenges that share a label — each gated by a different bottleneck.
Manufacturing & Warehousing is the proving ground. Structured environments, predictable physics, high tolerance for occasional failure. A mis-sorted package gets re-routed. This is where simulation-first training and cross-embodiment foundation models actually work: ABB’s RobotStudio HyperReality claims 99% sim-to-real correlation, which is plausible because rigid, repeatable equipment like a conveyor belt can be simulated faithfully. It is the furthest along, the most forgiving, and the best fit for today’s vision-language-action (VLA) models.
Surgical Robotics is gated by data, not compute. Hospitals generate massive clinical data — but almost none of what surgical AI actually needs: hand trajectories, force feedback, tissue deformation. Most ORs were never built for structured data capture. And the tissue itself — deformable, fluid, anatomically unique per patient — is the hardest simulation challenge in Physical AI. Moon Surgical is tackling this by generating synthetic OR environments through NVIDIA Isaac for Healthcare, sidestepping the real-world data bottleneck. Regulatory clearance (FDA 510(k), Level 1 autonomy) keeps everything human-in-the-loop for now.
Autonomous Vehicles are gated by certification. The perception models are good enough. But safety standards like ISO 26262 require deterministic verification, and a model that continues learning after deployment is, by regulatory definition, a different system every time it updates. The industry response is modular stacks: large, retrainable vision-language models (VLMs) for perception and frozen deterministic models for planning and control, because certification requires you to verify each layer independently.
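The modular split described for autonomous vehicles can be pictured with a minimal Python sketch. Everything in it is hypothetical: the names (Detection, perception_v1, plan_speed, planner_fingerprint), the toy headway rule, and the test vectors are illustrative assumptions, and a hash over a few outputs is only a stand-in for the far more involved verification that ISO 26262 actually calls for. The point is the shape: perception can be swapped for a retrained version, while the planner is a pure function whose behaviour can be checked on its own.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Detection:
    """Hypothetical perception output: one obstacle ahead in the lane."""
    distance_m: float
    speed_mps: float


# Any function from a camera frame to a Detection can serve as perception.
Perception = Callable[[bytes], Detection]


def perception_v1(frame: bytes) -> Detection:
    """Stand-in for a learned model; a real one would run a network here."""
    return Detection(distance_m=float(len(frame)), speed_mps=10.0)


def perception_v2(frame: bytes) -> Detection:
    """Stand-in for a retrained model with a slightly different estimate."""
    return Detection(distance_m=float(len(frame)), speed_mps=9.5)


def plan_speed(ego_speed_mps: float, obstacle: Detection,
               min_gap_m: float = 10.0, headway_s: float = 2.0) -> float:
    """Frozen planner: a pure function, so equal inputs give equal outputs."""
    desired_gap = min_gap_m + headway_s * ego_speed_mps
    if obstacle.distance_m >= desired_gap:
        return ego_speed_mps  # gap is sufficient, hold current speed
    # Gap too small: blend towards the obstacle's speed, never speeding up.
    ratio = max(obstacle.distance_m / desired_gap, 0.0)
    blended = obstacle.speed_mps + ratio * (ego_speed_mps - obstacle.speed_mps)
    return min(ego_speed_mps, blended)


# Fixed inputs used to fingerprint the planner independently of perception.
PLANNER_TEST_VECTORS = (
    (20.0, Detection(100.0, 20.0)),
    (20.0, Detection(25.0, 10.0)),
    (10.0, Detection(5.0, 0.0)),
)


def planner_fingerprint() -> str:
    """Hash the planner's outputs on the fixed vectors."""
    outputs = [round(plan_speed(v, d), 6) for v, d in PLANNER_TEST_VECTORS]
    return hashlib.sha256(json.dumps(outputs).encode()).hexdigest()


def drive_step(frame: bytes, ego_speed_mps: float, perceive: Perception) -> float:
    """One control step: learned perception feeds the frozen planner."""
    return plan_speed(ego_speed_mps, perceive(frame))


if __name__ == "__main__":
    approved = planner_fingerprint()
    frame = bytes(25)  # dummy frame; its length doubles as a fake distance
    print(drive_step(frame, 20.0, perception_v1))
    print(drive_step(frame, 20.0, perception_v2))
    # Swapping perception versions leaves the planner's fingerprint unchanged.
    assert planner_fingerprint() == approved
```

In this sketch, updating perception changes what the vehicle sees but not the planner's fingerprint, which is the property that lets each layer be reviewed separately.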
Smart Prosthetics are gated by biology. The intelligence layer interfaces with a human nervous system, not a warehouse floor. That means sub-200ms decoding latency for electromyography (EMG) signals, per-user biomechanical adaptation, and hardware that survives moisture and daily wear. Data from one patient doesn’t transfer to another. Models here are small, fast, and deeply personal, the opposite of everything else in Physical AI.
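To make the sub-200ms figure concrete, here is a rough latency-budget check in Python. It assumes a sliding-window EMG decoder and uses a deliberately conservative bound (window length plus hop plus inference time); that bound, the DecoderConfig name, and the example numbers are assumptions for illustration, not measurements from any real device.

```python
from dataclasses import dataclass

LATENCY_BUDGET_MS = 200.0  # the sub-200 ms target quoted above


@dataclass(frozen=True)
class DecoderConfig:
    """Hypothetical sliding-window EMG decoder settings, all in milliseconds."""
    window_ms: float     # length of signal the model sees per prediction
    hop_ms: float        # spacing between consecutive predictions
    inference_ms: float  # time to run the model on one window

    def worst_case_latency_ms(self) -> float:
        # Conservative assumption: the decoder needs a full window of signal
        # after onset, may wait up to one hop for the next window boundary,
        # and then spends inference_ms computing the output.
        return self.window_ms + self.hop_ms + self.inference_ms

    def within_budget(self, budget_ms: float = LATENCY_BUDGET_MS) -> bool:
        return self.worst_case_latency_ms() < budget_ms


if __name__ == "__main__":
    for cfg in (DecoderConfig(150.0, 25.0, 10.0), DecoderConfig(200.0, 50.0, 10.0)):
        print(cfg, cfg.worst_case_latency_ms(), cfg.within_budget())
```

Under these assumptions the window length dominates the budget, which is one reason small, fast models matter more here than raw model capacity.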
How Data Gets Collected: Three Strategies Converging
Across all four domains, a consensus is forming around a three-layer data strategy that blends synthetic and real-world approaches.
Simulation-first training uses platforms like NVIDIA Isaac Sim, Omniverse, and Cosmos world foundation models to generate millions of training scenarios before any physical hardware is involved. This is where 80–90% of initial training data comes from in manufacturing and AV applications. The economics are compelling: what used to take months of real-world data collection can now be generated in 36 hours.
Real-world fine-tuning bridges the sim-to-real gap. Synthetic data gets you 80% of the way, but the last 20% — the long tail of edge cases, material interactions, and environmental noise that simulation can’t perfectly capture — requires real-world data. This is where approaches like AGIBOT’s hierarchical annotation (including error-recovery trajectories) and teleoperation-based data collection become critical.
Continuous deployment feedback closes the loop. Once robots are in the field, they generate operational data that feeds back into the training pipeline. ABB’s RobotStudio HyperReality is designed for exactly this: the same firmware runs in simulation and on physical hardware, so field data can be directly replayed and augmented in the digital twin.
The balance between these three layers varies dramatically by domain. Warehousing leans heavily synthetic. Surgery leans heavily real-world (where it can get the data at all). Prosthetics is almost entirely patient-specific. AVs run the most sophisticated blend of all three.
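As a rough illustration of how that balance might be expressed in a training pipeline, the Python sketch below samples the source of each training example according to a per-domain mix. The weights are placeholders chosen only to echo the qualitative picture above (warehousing and AVs mostly synthetic, surgery and prosthetics mostly real or patient-specific); they are not reported figures, and the names SOURCES, DOMAIN_MIX, and sample_sources are hypothetical.

```python
import random
from collections import Counter
from typing import List

# The three layers of the data strategy described above.
SOURCES = ("synthetic", "real_world", "deployment_feedback")

# Illustrative weights only (assumptions, not measured ratios).
# For prosthetics, "real_world" stands for patient-specific recordings.
DOMAIN_MIX = {
    "warehousing": (0.85, 0.10, 0.05),
    "autonomous_vehicles": (0.80, 0.10, 0.10),
    "surgery": (0.20, 0.70, 0.10),
    "prosthetics": (0.05, 0.90, 0.05),
}


def sample_sources(domain: str, batch_size: int, rng: random.Random) -> List[str]:
    """Pick a data source for each example in a batch, weighted by domain."""
    weights = DOMAIN_MIX[domain]
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError(f"weights for {domain!r} must sum to 1")
    return rng.choices(SOURCES, weights=weights, k=batch_size)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so runs are repeatable
    for domain in DOMAIN_MIX:
        counts = Counter(sample_sources(domain, 1000, rng))
        print(domain, dict(counts))
```

Keeping the mix in one table makes it easy to shift weight from synthetic towards real-world and deployment data as a programme matures.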
The Physical AI Company Stack
Perhaps the most actionable lens for understanding this space is the question of who builds what. Physical AI is assembling itself into a layered stack, and knowing which layer a company occupies tells you what it actually does and what it needs partners for.
Layer 1: Silicon and Edge Compute
This is the hardware foundation: the chips and edge modules that run Physical AI models in the real world. NVIDIA dominates with Jetson (Orin for today’s deployments, Thor for next-generation humanoids), but Arm’s CPU architecture sits underneath both NVIDIA’s and Qualcomm’s robotics silicon. For prosthetics and medical devices, the conversation shifts to neuromorphic chips and ultra-low-power inference ASICs that can run on-body.
Key players: NVIDIA (Jetson family), Qualcomm (Robotics RB series), Arm (architecture layer), Intel (Mobileye for AV), specialised chip startups for neuromorphic/medical edge.
Layer 2: Simulation and World Models
The simulation layer is where training data gets generated at scale. NVIDIA’s Omniverse + Isaac Sim + Cosmos stack is the de facto platform, but it’s not the only game in town. NVIDIA’s Physical AI Data Factory blueprint, released at GTC 2026, provides an open reference architecture that cloud providers like Microsoft Azure and Nebius are deploying as managed services. Siemens is integrating Omniverse into its industrial digital twin platform. ABB is building RobotStudio HyperReality on top of it.
Key players: NVIDIA (Omniverse, Isaac Sim, Cosmos), Siemens (industrial digital twins), ABB (RobotStudio HyperReality), Applied Intuition (AV simulation/validation), Genesis AI (sim-to-real infrastructure), cloud providers (Azure, Nebius, Alibaba Cloud) offering managed Physical AI environments.
Layer 3: Foundation Models and Robot Brains
This is the intelligence layer — the VLA models, policy networks, and control algorithms that give robots the ability to perceive, reason, and act. The landscape here is rapidly consolidating around a few architectures, but domain specialisation remains the rule.
Key players: NVIDIA (GR00T N1 for humanoids), Google DeepMind (Gemini Robotics for general manipulation), Physical Intelligence (π₀ for dexterous tasks), Skild AI (general-purpose robot brains across embodiments), Genesis AI (GENE-26.5 for manipulation), Intuitive Surgical (proprietary surgical AI), RobotEra (VLA for logistics), General Robotics (modular intelligence grid).
Layer 4: Robot Hardware / OEMs
The physical embodiments — the actual robots, vehicles, and devices that the intelligence stack runs on.
Key players by domain:
Manufacturing/logistics: ABB, FANUC, Universal Robots, Boston Dynamics, Unitree, Agility Robotics, Amazon (Sparrow/Sequoia), AGIBOT.
Surgical: Intuitive Surgical (da Vinci), Medtronic (Hugo), CMR Surgical (Versius), LEM Surgical (Dynamis), Moon Surgical (Maestro).
Automotive: Waymo, Tesla, Mercedes-Benz, BMW, Porsche, plus Tier 1 suppliers like Bosch and Continental.
Prosthetics: Össur, Ottobock, Open Bionics (smaller companies where the hardware-software integration is extremely tight).
Layer 5: System Integrators and Services
This is where the stack meets the enterprise. Large consulting and engineering firms take the platforms, models, and hardware from Layers 1–4 and deploy them into client environments — handling the messy work of IT/OT convergence, change management, regulatory compliance, and workforce integration.
Key players: Deloitte (Physical AI centres of excellence with NVIDIA, across automotive, life sciences, energy), Accenture (Physical AI Orchestrator for software-defined factories, investment in General Robotics), Capgemini (embedded AI, robotics software development, autonomous systems engineering), and a growing tier of specialist integrators like Azilen, SoftServe, and GlobalLogic.
The Real Takeaway
Physical AI isn’t one market. It’s a stack, and every domain sits differently on that stack. Warehousing can lean heavily on Layers 2 and 3 — simulation and foundation models do most of the heavy lifting. Surgery needs deep investment in Layer 4 (specialised hardware with proprietary data flywheels) and Layer 5 (regulatory navigation). AVs need the full stack, vertically integrated, with Layer 2 and Layer 3 in constant tension with certification requirements. Prosthetics need almost nothing from Layers 2 and 3 as they exist today, and everything from on-device intelligence and patient-specific adaptation.
The companies that win won’t be the ones with the biggest model. They’ll be the ones that understand which layer matters most for their domain — and build accordingly.