Autonomous from Scratch: Building Datameister's Physical AI Foundation

TL;DR: We built and operate an autonomous navigation and data-capture skill running end-to-end on a Unitree GO2, an NVIDIA Jetson, and a Livox MID-360. Unified under a single ROS 2 framework, the robot navigates a site autonomously, builds a persistent 3D map of what it sees, captures the run as a replayable bag, and lets you watch it live from a browser, all without a tethered laptop.

This is the on-robot side of Datameister's Physical AI work, paired with the Datameister Platform on the training side. The quadruped is the mobile base, the legs, running in parallel with our stationary cobot manipulation work. The two converge into mobile manipulators and eventually humanoid platforms. Every run also generates the real-world data our training pipeline needs.

Result: a Physical AI stack we own end-to-end, ready to drive autonomous inspection runs today and to feed the manipulation skills coming next.

Applications: autonomous inspection patrol, indoor and outdoor 3D mapping, digital twin capture, real-to-sim data generation, repeatable site walkthroughs, mobile-manipulator roadmap, foundation for humanoid platforms.

Robots are stepping out of the lab. Quadrupeds are scanning construction sites, humanoids are picking parts on factory floors, mobile manipulators are running restocking shifts in retail. The hardware is moving faster than the software underneath it, and most teams trying to deploy these systems into real environments are stuck on the same gap: the autonomy stack between perception and reliable, deployable action does not exist as a product you can buy.

That stack is what makes the rest of the Physical AI work possible: the manipulation skills, the deployment feedback loops, the cognitive work above. It is also the kind of stack a hardware-agnostic robotics integrator would build if they had an in-house AI research team. We built ours from scratch on a Unitree GO2, an NVIDIA Jetson, and a Livox MID-360, unified under a single ROS 2 framework. This post walks through why we built it, what it lets us do today, and what it now enables on top.

Why we built our own robotics stack

What the existing options leave you with

Anyone who has tried to take a vendor demo from the lab to a real site knows the gap. Consumer-tier firmware is built around scripted demos. The platform will follow you around a room or run those scripts, but underneath it functions as a black box: no persistent map, no goal-driven navigation, no observability worth the name. Push past that, asking the platform to reliably execute a route while logging the full state of its perception, planning, and control pipelines, and you hit a wall.

A second wave of robotics companies tries to bridge that gap with a Robot-as-a-Service wrapper: source the hardware from a major OEM, plug in a foundation model, and sell warehouse operations or machine tending as a managed service. It looks like the answer on the surface. In practice the wrapper hides the layers that need to be hardened (SLAM, planners, transforms, sensor sync, control gating); they show up the moment you try to take the system from a controlled proof-of-concept to a deployable fleet, and a vendor whose differentiator is the wrapper cannot solve them on a project timeline.

For a European robotics integrator, neither is really the alternative anyway. The actual alternative is hiring and retaining an in-house AI research team of five to ten engineers across perception, planning, control, MLOps, and simulation, to build the same stack themselves. Most integrators of the 20-to-80-person shape cannot justify that headcount per customer engagement. Closing the gap between the OEM hardware they already buy and the skill their customer needs is systems engineering work, and no vendor sells it as a product.

The Unitree GO2 (left) and the Unitree G1 (right). Sources: New Atlas and Unitree.

What we wanted instead: Autonomous from Scratch

We realized that to work seriously in Physical AI, we had to build the autonomy stack ourselves, from scratch. The brief was simple to state but demanding to deliver. We needed an on-robot stack that pairs with the Datameister Platform on the training side: a setup where the robot acts as a mobile, intelligent data-harvesting node, and where the skills the Platform produces have a body to run on.

Instead of a black box, we wanted a transparent architecture we own end-to-end, delivering:

  • Persistent 3D Mapping & SLAM: Real-time, LiDAR-based 3D SLAM capable of generating high-resolution point clouds. We needed the system to build highly accurate, persistent maps on the fly, with the ability to refine them using loop closures.
  • Goal-Driven Autonomous Navigation: Point-to-point routing and continuous loop patrolling powered by Nav2, utilizing an adaptive gait controller to handle complex terrain.
  • Always-On Data Harvesting: Continuous capture of the platform's complete operational state. We needed every sensor feed, planning decision, and control output to be seamlessly logged, with built-in conversion pipelines making that data ready for machine learning ingestion.
  • Headless, Untethered Operations: A custom boot-and-provisioning sequence that allows the robot to power on, serve its own network or join the site's Wi-Fi, and start working immediately, with no tethered laptop and no manually run scripts.

That is the system we shipped: a hardware-agnostic platform for cognitive robots, built on the Unitree GO2, where the exact same software principles and learning skills carry forward to the coming wave of humanoids.

How this fits alongside the Datameister Platform

A note on terminology before going further. The Datameister Platform is our training and deployment infrastructure for visual AI and 3D AI. It lives off the robot, in the cloud or on-prem at a customer site, and handles capture, real-to-sim reconstruction, simulation orchestration, training pipelines, and evaluation. The autonomy stack this post is about lives on the robot. It hosts the skills the Platform produces, captures the real-world data the next training round runs on, and handles the perception, navigation, and control underneath. Two distinct pieces, designed to feed each other.

The hardware: Unitree GO2 + NVIDIA Jetson + Livox MID-360

The hardware footprint is deliberately lean and self-contained. The Unitree GO2 provides a capable quadruped chassis with a front-mounted RGB camera and an SDK that lets us override its default vendor behaviors to inject our own Nav2 velocity commands. The NVIDIA Jetson J4012 sits on the robot's back and runs everything: the drivers, SLAM, Nav2, elevation mapping, the Foxglove bridge, and the heavy data-recording pipelines. No tethered laptop, no companion PC. The Livox MID-360 is a 360° non-repetitive scanning LiDAR. A single scan is sparse, but because the scan pattern does not repeat, coverage densifies as scans accumulate. That makes it a good lightweight fit for a mobile robot that cannot carry a heavy, costly spinning sensor.

This is the same standardized off-the-shelf hardware tier our broader work targets; integrators deploying mobile manipulators or humanoids end up on the same compute footprint, which is part of why owning the stack at this level compounds across platforms.

The full hardware stack: Unitree GO2 with NVIDIA Jetson and Livox MID-360 mounted on the robot's back. Compute, perception, and locomotion are all on-board, with no off-robot dependency. (source: Datameister)

The live system today: sensing, navigating, recording

The first complete skill running end-to-end on this stack is autonomous navigation and data capture: give the robot a site and a route, and it patrols autonomously, builds a persistent 3D map of what it sees, captures the run as a replayable bag, and lets you watch it live from a remote browser. The rest of this section unpacks the subsystems underneath.

The same on-board compute we already proved out for advanced indoor 3D perception now also runs that skill, so perception and action share one platform with no off-board computer in the loop. The stack is organized around the perception-to-action loop: it anchors observations in a shared map, turns them into navigation context, acts through Nav2, and records the run for replay and improvement. Each subsystem matters on its own, but the real value is how they connect inside one deployable ROS 2 workflow.

Multi-modal perception

The robot captures the world and itself simultaneously, across multiple streams. GLIM fuses the Livox LiDAR and IMU into real-time SLAM, producing odometry and geometry anchored in a shared spatial frame. The left image below shows the result: a complete city block mapped consistently from thousands of individual scans.

The GO2 front camera is synchronized with that frame and projected onto the geometry, producing a colorized point cloud shown on the right. The color layer adds object and scene context that geometry alone does not carry.
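The projection step can be pictured with a plain pinhole camera model. The sketch below is an illustration under stated assumptions, not the production pipeline: it assumes the points have already been transformed into the camera frame, that the image is undistorted, and that the intrinsics `fx`, `fy`, `cx`, `cy` are known. The function name and data layout are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in the camera frame, metres
Color = Tuple[int, int, int]         # (r, g, b)


def colorize_points(
    points_cam: List[Point],
    image: List[List[Color]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Optional[Color]]:
    """Give each 3D point the color of the pixel it projects onto.

    Assumes x points right, y points down, z points forward, and an
    undistorted pinhole camera (hypothetical intrinsics fx, fy, cx, cy).
    Points behind the camera or outside the image get None.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    colors: List[Optional[Color]] = []
    for x, y, z in points_cam:
        if z <= 0.0:
            # Behind the image plane: no valid projection.
            colors.append(None)
            continue
        u = int(round(fx * x / z + cx))  # pixel column
        v = int(round(fy * y / z + cy))  # pixel row
        if 0 <= u < width and 0 <= v < height:
            colors.append(image[v][u])
        else:
            colors.append(None)
    return colors
```

Points that return None simply stay uncolored, which is why a colorized cloud usually covers only the part of the map the camera actually saw.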

Beyond the spatial picture, the stack continuously logs every joint state, motor command, battery level, and control decision. The result is a complete, time-synced operational record of what the robot perceived, decided, and did, built in real time.

Real-time spatial mapping using GLIM (left) alongside the synchronized, camera-colorized point cloud (right). (source: Datameister)

Autonomous navigation and terrain adaptation

Once the robot has a shared spatial frame, it can act inside it. Nav2 handles goal-driven navigation on top of GLIM odometry, publishing updated plans and costmaps as it moves. Through Foxglove, shown in the left image below, an operator can direct the robot remotely for simple point-to-point movement or set it on repeatable waypoint routes, so it revisits the same paths for patrols, inspections, and repeatable data capture.
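A repeatable waypoint route boils down to an ordered list of goal poses that is replayed lap after lap. The sketch below shows one way to expand a route into goals, with each goal's heading pointed at the next waypoint and expressed as a quaternion, since ROS pose messages carry orientation that way. The `Goal` type and `patrol_goals` helper are hypothetical names for illustration; handing the goals to Nav2 is left out.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Quaternion = Tuple[float, float, float, float]  # (x, y, z, w)


@dataclass(frozen=True)
class Goal:
    x: float
    y: float
    orientation: Quaternion


def yaw_to_quaternion(yaw: float) -> Quaternion:
    """Rotation about the vertical (z) axis as a unit quaternion."""
    half = yaw / 2.0
    return (0.0, 0.0, math.sin(half), math.cos(half))


def patrol_goals(route: List[Tuple[float, float]], loops: int) -> List[Goal]:
    """Expand a closed route into goals for the requested number of laps.

    Each goal faces the next waypoint; the last waypoint faces the first,
    so the robot arrives already oriented for the next lap.
    """
    goals: List[Goal] = []
    n = len(route)
    for _ in range(loops):
        for i, (x, y) in enumerate(route):
            next_x, next_y = route[(i + 1) % n]
            yaw = math.atan2(next_y - y, next_x - x)
            goals.append(Goal(x, y, yaw_to_quaternion(yaw)))
    return goals
```

Because the expansion is deterministic, two patrols over the same route produce identical goal sequences, which is what makes runs comparable over time.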

Reaching those waypoints reliably means accounting for what's underfoot. To handle uneven environments, elevation_mapping_cupy adds a local 2.5D height map and traversability layer on top of the SLAM output, visible on the right of the image below. This signal feeds both planning and locomotion: Nav2 switches to a terrain-aware parameter set, while an adaptive gait controller adjusts the robot's steps when the ground becomes harder to cross. The result turns a static map into a navigable one that tells the robot where to step, where to slow, and where to stop.
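To make the idea of a terrain signal driving a parameter switch concrete, here is a deliberately simple stand-in. It does not reproduce how elevation_mapping_cupy computes traversability; it just takes the largest height step between neighboring cells of a 2.5D grid and maps it to a profile name. The thresholds and the profile names (`default`, `terrain_aware`, `stop`) are assumptions chosen for the example.

```python
from typing import List


def max_step_height(height_map: List[List[float]]) -> float:
    """Largest height difference (metres) between 4-connected neighbors."""
    rows = len(height_map)
    cols = len(height_map[0]) if rows else 0
    worst = 0.0
    for r in range(rows):
        for c in range(cols):
            here = height_map[r][c]
            if r + 1 < rows:
                worst = max(worst, abs(here - height_map[r + 1][c]))
            if c + 1 < cols:
                worst = max(worst, abs(here - height_map[r][c + 1]))
    return worst


def select_profile(
    height_map: List[List[float]],
    rough_step_m: float = 0.05,    # assumed threshold, tune per robot
    blocked_step_m: float = 0.20,  # assumed threshold, tune per robot
) -> str:
    """Map local terrain roughness to a navigation profile name."""
    step = max_step_height(height_map)
    if step >= blocked_step_m:
        return "stop"
    if step >= rough_step_m:
        return "terrain_aware"
    return "default"
```

A real traversability layer would also account for slope, roughness over a wider window, and sensor noise; the point here is only the shape of the decision.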

Foxglove route planning (left) and the traversability-aware elevation map (right). (source: Datameister)

Remote ops and data harvesting

Every run leaves a trail. The stack records live runs to ROS 2 bags by default, ensuring each session can be inspected or replayed later instead of disappearing when the robot stops. The recording captures the live point-cloud and state streams, so each bag can be replayed through the launch flow or exported into ML-ready datasets for training, evaluation, and map review. Every walkthrough becomes both an autonomy run and a reusable data capture session.
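One common step when turning recorded streams into training samples is pairing messages from different sensors by timestamp. The sketch below assumes the timestamps have already been read out of a bag as sorted nanosecond integers; it pairs each camera frame with the nearest point cloud and drops frames with no cloud inside a tolerance. The function names and the tolerance are hypothetical, and reading the bag itself is out of scope.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def nearest_index(sorted_stamps: List[int], t: int) -> Optional[int]:
    """Index of the timestamp closest to t, or None if the list is empty."""
    if not sorted_stamps:
        return None
    i = bisect_left(sorted_stamps, t)
    if i == 0:
        return 0
    if i == len(sorted_stamps):
        return len(sorted_stamps) - 1
    before, after = sorted_stamps[i - 1], sorted_stamps[i]
    # On a tie, prefer the earlier message.
    return i - 1 if t - before <= after - t else i


def pair_streams(
    image_stamps: List[int],
    cloud_stamps: List[int],
    tolerance_ns: int,
) -> List[Tuple[int, int]]:
    """Return (image_index, cloud_index) pairs within the time tolerance."""
    pairs: List[Tuple[int, int]] = []
    for img_idx, t in enumerate(image_stamps):
        j = nearest_index(cloud_stamps, t)
        if j is not None and abs(cloud_stamps[j] - t) <= tolerance_ns:
            pairs.append((img_idx, j))
    return pairs
```

Dropping unmatched frames instead of pairing them with a stale cloud keeps each exported sample internally consistent.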

Simultaneously, the entire stack is built for untethered, remote observability. The robot exposes its live ROS graph and URDF description through Foxglove, rendering a live digital twin in the same scene as the map, point cloud, and planned motion directly from a remote browser. The field workflow is completely self-contained. On boot, the Jetson starts the stack through a systemd service and connects over Tailscale for remote viewing, development, and debugging. If it cannot join a known network, it falls back to Wi-Fi provisioning mode. The bags those runs produce are also the input the next section is built on.

From autonomy to manipulation

The autonomy stack is the foundation. The next round of work sits on top of it, and it runs in two directions at once.

The data flywheel

Every run leaves more than a successful patrol or a clean map. The same ROS 2 bags that record live runs are exactly the input the Datameister Platform needs upstream: real-to-sim reconstruction, scenario augmentation, simulation-grounded policy training, evaluation against held-out scenes. This is the same analytics loop we already run on the Platform for visual AI, now fed by a body that walks around. Every walkthrough produces a complete digital twin of the space; that twin is exactly what our indoor semantic segmentation pipeline was built to consume, with semantic labels for every object and space.

Trained skills come back down to the robot, where the autonomy stack hosts them at runtime. That is the flywheel: capture on the robot, train upstream on the Platform, deploy back to the robot. The autonomy stack and the Platform are designed to feed each other rather than to overlap.

A semantically segmented indoor point cloud, with each object class shown in a different color: advanced 3D perception running on our autonomy stack. (source: Datameister, from "Indoor Semantic Segmentation of Point Clouds")

Legs first, then arms

The quadruped is the mobile base, the legs. In parallel, we have a stationary cobot manipulation track running on standardized off-the-shelf arms (Franka, UR, AGIBOT). The two converge into mobile manipulators, and beyond that, into humanoid platforms. Manipulation is where most of the current industrial pain concentrates: precision insertion, bin picking, machine tending, SKU-variant grasping, and depalletization, all on hardware that integrators are already buying. We are currently #1 on the Intrinsic AI for Industry Challenge leaderboard as of mid-May 2026, with the precision-insertion track being pushed toward production through the challenge.

A foundation that ports across robots

None of this is GO2-specific. With a different driver and URDF, the same stack runs on a humanoid or another mobile platform, and the investment compounds across robots. The on-board compute footprint (NVIDIA Jetson J4012 + Livox MID-360) is the same tier that runs production-grade perception, navigation, and the policy classes downstream of them: classical control where it works, learned policies where they earn their cost. That portability matters for any team running multiple hardware platforms across customer deployments.
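One way to picture that portability is a narrow driver interface that the rest of the stack talks to, with each robot supplying its own implementation. The sketch below is hypothetical throughout: `BaseDriver`, `VelocityLimits`, `forward_command`, and `RecordingDriver` are illustrative names, not the stack's actual classes, and a real driver would call the vendor SDK where the stand-in only records commands.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


@dataclass(frozen=True)
class VelocityLimits:
    max_linear: float   # m/s
    max_angular: float  # rad/s


class BaseDriver(Protocol):
    """What the navigation layer needs from any mobile base."""

    limits: VelocityLimits

    def send_velocity(self, vx: float, vy: float, wz: float) -> None:
        ...


def clamp(value: float, limit: float) -> float:
    """Restrict value to the symmetric range [-limit, limit]."""
    return max(-limit, min(limit, value))


def forward_command(driver: BaseDriver, vx: float, vy: float, wz: float) -> None:
    """Clamp a velocity command to the platform's limits, then send it."""
    lim = driver.limits
    driver.send_velocity(
        clamp(vx, lim.max_linear),
        clamp(vy, lim.max_linear),
        clamp(wz, lim.max_angular),
    )


class RecordingDriver:
    """Stand-in driver for the example; it only records what it is sent."""

    def __init__(self, limits: VelocityLimits) -> None:
        self.limits = limits
        self.sent: List[Tuple[float, float, float]] = []

    def send_velocity(self, vx: float, vy: float, wz: float) -> None:
        self.sent.append((vx, vy, wz))
```

With an interface like this, swapping robots means writing a new driver and supplying a new URDF, while planning and recording code stays untouched.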

Where this fits, and working with us

The autonomy stack is the on-robot side of Datameister's Physical AI work. The off-robot side is the Datameister Platform (training infrastructure that handles capture, real-to-sim, simulation, training, and evaluation), a perception and 3D-AI library that has been live in production for years as the building blocks, and a focused catalog of reference manipulation skills being built compositionally on top of those primitives. Datameister is currently #1 on the Intrinsic AI for Industry Challenge leaderboard as of mid-May 2026.

If you are a European robotics integrator building skills on your customers' hardware (inspection today, mobile manipulation and humanoids next), three buyer paths sit on the same stack:

  • Use what we ship today: the autonomy stack as a reference, plus production-grade primitives in perception, 3D deep learning, and generation. Reference manipulation skills land progressively over the course of 2026.
  • Fine-tune anything in the Library on our infrastructure, against your customer's hardware and environment. You own the resulting Client-Specific Code.
  • Build your own skills on our infrastructure, with Datameister upstream of the skill behavior. You own the IP that differentiates you.

Our MSA framework (Platform, Library, Client-Specific Code, Buy-Out License) sits up front so the IP questions are settled before engineering starts. We are on-prem by default for industrial deployments, EU-domiciled, and ISO 27001-tracked.

If you are working on robotics in a different shape (a foundation-model lab with a deep cognitive subproblem you would rather not own end-to-end, an OEM looking at training infrastructure for your own platform, or a team trying to bring a specific robot to life in the real world) we are also glad to talk. We have handled the foundational systems engineering so that, together, we can focus on what your customer actually needs the robot to do.
