Data Infrastructure for Physical AI

Build the data layer for physical intelligence.

We capture and structure multi-modal real-world interaction data — synchronized video, depth, pose, language, and sensor streams — for robotics, embodied agents, and world models.

View Dataset · Talk to Research →

PHYSICAL AI ◈ EMBODIED DATA ◈ WORLD MODELS ◈ ROBOTICS ◈ MULTI-MODAL ◈ 4D CAPTURE ◈ DEPTH SENSING ◈ HAND POSE ◈ SEMANTIC ANNOTATION ◈ FOUNDATION MODELS

TRUSTED BY LEADING AI LABS, UNIVERSITIES AND INDUSTRY PARTNERS

§ 03 DATASET.CORE ─── scale · fidelity · coverage ┤ ◈ ├

Core Dataset — v2.1

ZILLIA WorldTrace

The most comprehensive synchronized multi-modal dataset for real-world embodied interaction. Built to train robots that understand how humans navigate, grasp, and manipulate the physical world.

CAPTURE PIPELINE

capture → sync → annotate → package → index → deploy

Request Access · Explore Dataset →

WorldTrace.db — LIVE INDEX

episodes 248,601

hours 8,240 hrs

frames 893M

modalities RGB · Depth · Pose · IMU · Lang · Seg

storage ~120 TB

annotations 12.4M labels

environments 340 unique

format MCAP · HDF5 · LeRobot

data coverage 87%

annotation quality 94%

sync accuracy 99%

last updated: 2026-01-15 · version: 2.1.3
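
The index figures above imply a few per-episode and per-frame averages that can help when budgeting storage or dataloader throughput. What follows is a minimal sketch of that arithmetic in Python, not an official ZILLIA tool. The constant names and the derived_stats helper are illustrative, "893M" and "12.4M" are read as rounded counts, and "~120 TB" is assumed to mean decimal terabytes.

```python
# Figures copied from the WorldTrace.db index; names are illustrative only.
EPISODES = 248_601
HOURS = 8_240
FRAMES = 893_000_000      # "893M", treated as a rounded count
LABELS = 12_400_000       # "12.4M", treated as a rounded count
STORAGE_BYTES = 120e12    # "~120 TB", assuming decimal terabytes


def derived_stats():
    """Return dataset-wide averages implied by the published totals."""
    seconds = HOURS * 3600
    return {
        "mean_fps": FRAMES / seconds,                  # frames per captured second
        "mean_episode_s": seconds / EPISODES,          # average episode length
        "labels_per_episode": LABELS / EPISODES,
        "gb_per_hour": STORAGE_BYTES / HOURS / 1e9,    # packaged size per hour
        "kb_per_frame": STORAGE_BYTES / FRAMES / 1e3,  # packaged size per frame
    }


stats = derived_stats()
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under those assumptions the averages come to roughly 30 frames per captured second, about two minutes per episode, about 50 labels per episode, and about 14.6 GB of packaged data per captured hour.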

§ 04 DATASET.METRICS ─── depth · scale · precision ┤ ◈ ├

Interaction Episodes

248,601 episodes

Unique task-completion sequences captured in-the-wild across diverse real environments

Capture Hours

8,240 hours

Continuous multi-modal recording at 30–120 fps with sub-ms sensor synchronization

Video Frames

893M frames

RGB + depth frames with paired semantic segmentation and object pose ground truth

Annotation Labels

12.4M+ labels

Human-verified bounding boxes, keypoints, action labels, and natural language descriptions

Sensor Streams

6 modalities

RGB, Depth, Hand Pose, IMU, Language, Semantic Segmentation — all synchronized

Environments

340 unique

Indoor, outdoor, industrial, domestic, lab — representing real deployment conditions

Structured Storage

120 TB+

Compressed, indexed, and packaged in MCAP and HDF5 for direct model ingestion

Sync Precision

Hardware-level timestamp alignment across all sensor modalities in real capture sessions
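
Sub-millisecond timestamp alignment puts every stream on one clock, but streams sampled at different rates still have to be resampled onto a common timeline before training. The sketch below illustrates this with a 200 Hz IMU and a 30 fps camera, the rates quoted for the mobile capture unit. It is an illustration under stated assumptions, not ZILLIA's pipeline or API: the helper names, the synthetic signal, and the use of linear interpolation are all hypothetical choices.

```python
from bisect import bisect_left


def nearest_offset_us(sample_ts, t):
    """Gap in microseconds between t and the closest sample timestamp."""
    i = bisect_left(sample_ts, t)
    neighbours = sample_ts[max(i - 1, 0):i + 1]
    return min(abs(t - s) for s in neighbours)


def interpolate_at(sample_ts, values, t):
    """Linearly interpolate a sampled signal at time t (sorted timestamps)."""
    if not sample_ts[0] <= t <= sample_ts[-1]:
        raise ValueError("t lies outside the sampled range")
    i = bisect_left(sample_ts, t)
    if sample_ts[i] == t:
        return values[i]
    t0, t1 = sample_ts[i - 1], sample_ts[i]
    w = (t - t0) / (t1 - t0)
    return values[i - 1] + w * (values[i] - values[i - 1])


# Hypothetical one-second session on a shared clock, timestamps in microseconds.
imu_ts = [k * 5_000 for k in range(201)]                    # 200 Hz IMU
imu_vals = [0.5 * t for t in imu_ts]                        # synthetic signal
frame_ts = [round(k * 1_000_000 / 30) for k in range(31)]   # 30 fps camera

# Largest distance from any frame to its closest IMU sample.
worst_gap_us = max(nearest_offset_us(imu_ts, t) for t in frame_ts)

# IMU signal resampled at the exact frame timestamps.
resampled = [interpolate_at(imu_ts, imu_vals, t) for t in frame_ts]
```

In this synthetic session the closest IMU sample can sit more than 1 ms away from a frame timestamp even though both streams share one clock, which is why the sketch interpolates rather than picking the nearest sample.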

§ 06 CAPTURE.SYSTEM ─── hardware · precision · coverage ┤ ◈ ├

DEVICE A — MOBILE

NP-Scout Mobile

Wearable egocentric capture unit for in-the-wild data collection. Lightweight, battery-powered, and optimized for unconstrained real-world environments.

Resolution: 4K RGB + 640×480 Depth

Frame Rate: 30 / 60 fps selectable

Battery: 8 hrs continuous

IMU: 9-DOF @ 200 Hz

Request Unit →

DEVICE B — LAB GRADE

NP-Precision Lab

High-fidelity multi-camera rig for controlled manipulation studies. Sub-millimeter pose accuracy with time-of-flight depth and structured light fusion.

Resolution: 6K multi-camera array

Depth Mode: ToF + Structured Light

Sync: Hardware PTP <0.1 ms

Pose Accuracy: ±0.5 mm / ±0.3°

Request Demo →

§ 07 WHY.ZILLIA ─── differentiation ┤ ◈ ├

01 //

In-the-Wild Scale

Data captured across 340 unique real-world environments, not synthetic scenes. Models trained on ZILLIA generalize to deployment conditions from day one.

02 //

Multi-Modal Synchronization

Hardware-locked timestamps align RGB, depth, IMU, hand pose, and language streams to under 1 ms. No drift. No post-hoc alignment hacks.

03 //

Structured for Models

Data ships in MCAP and HDF5 with LeRobot-compatible episode indexing. Plug directly into your training loop with one import.

04 //

End-to-End Pipeline

From sensor capture to cloud-indexed, annotation-complete datasets. We own the full stack — no third-party annotation handoffs, no broken metadata.

§ 08 MISSION.STATEMENT ─── why we exist ┤ ◈ ├

The physical world deserves better data.

Robots fail in the real world not because the models are wrong — but because the data they were trained on was never real. Simulations miss the chaos. Synthetic datasets miss the nuance. Lab recordings miss the scale.

ZILLIA was built to close this gap. We put sensors in the world, capture how humans actually move and interact with objects, and structure that signal into the cleanest possible training substrate for embodied intelligence.

We believe the next generation of physically capable AI will be trained on data that looks like ours.

2022 Founded

6 Team Members

7 Research Partners

4 Countries and Regions Active

Open for Collaboration — 2026

Build with us.

Whether you're training a foundation model, scaling a robotics product, or exploring embodied AI — we have the data infrastructure to accelerate your work.

Request Access · Talk to Research →

GitHub · Hugging Face · Docs · Papers
