Continuum robots are advancing bronchoscopy procedures by accessing complex lung airways and enabling
targeted interventions. However, their development is limited by the lack of realistic training and test
environments: Real data is difficult to collect due to ethical constraints and patient safety concerns,
and developing autonomy algorithms requires realistic imaging and physical feedback.
We present ROOM (Realistic Optical Observation in Medicine), a comprehensive simulation
framework designed for generating photorealistic bronchoscopy training data. By leveraging patient CT scans,
our pipeline renders multi-modal sensor data including RGB images with realistic noise and specular highlights,
metric depth maps, surface normals, optical flow, and point clouds at medically relevant scales.
We validate the data generated by ROOM through monocular depth estimation experiments, revealing the
diverse challenges that state-of-the-art methods must overcome to transfer to this medical setting.
Furthermore, we show that the data produced by ROOM can be used to fine-tune existing depth estimation
models to overcome these challenges, enabling downstream applications such as vision-based navigation
with Model Predictive Control.
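To make the fine-tuning step concrete, a minimal sketch follows of a loss commonly used when adapting monocular depth models to metric depth maps like those ROOM renders. This is an illustration, not the authors' code: the function name and the toy airway-scale depths are hypothetical, and the loss shown is the standard scale-invariant log (SILog) formulation rather than whatever objective ROOM's experiments actually use.

```python
import numpy as np

def silog_loss(pred, gt, lam=0.5, eps=1e-6):
    """Scale-invariant log depth loss (SILog-style).

    pred, gt: metric depth maps in metres; lam in [0, 1] controls how
    much a global scale error is forgiven (lam=1 ignores scale entirely).
    """
    mask = gt > eps                                   # skip invalid pixels
    d = np.log(pred[mask] + eps) - np.log(gt[mask] + eps)
    return float(np.mean(d ** 2) - lam * np.mean(d) ** 2)

# Hypothetical rendered sample: airway-scale ground-truth depth (~2 cm).
gt = np.full((4, 4), 0.02)
perfect = silog_loss(gt, gt)        # identical prediction -> zero loss
scaled = silog_loss(2.0 * gt, gt)   # uniform 2x scale error is only
                                    # partially penalised (scale-invariance)
```

A scale-tolerant loss like this matters at bronchoscopic scales, where millimetre-range depths make models pretrained on outdoor or indoor scenes systematically mis-scaled.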