Vision-Language-Action (VLA) models demonstrate impressive reasoning over visual, semantic, and spatial task variations by leveraging large-scale vision and language pre-training. They remain, however, largely blind to contact forces, which seldom manifest clearly in visual feedback but are central to contact-rich manipulation. Tactile sensing measures these forces directly, but integrating it into VLAs is difficult: tactile data is absent from the large-scale corpora used to pre-train VLAs, so adding it as a new input modality induces a distribution shift that erodes the very pre-training that makes VLAs effective. We propose Tactile Annotation Prompting for Vision-Language-Action models (TAP-VLA), a simple framework that supplies tactile feedback through visual augmentation rather than architectural change. TAP-VLA extracts shear fields from visuo-tactile sensors and overlays them as spatially grounded vectors onto the multi-view RGB images the policy already consumes, yielding a clear, interpretable tactile cue in the VLA's native observation space. Because the architecture is untouched, the approach requires no tactile pre-training, adds negligible compute, and stays close to the pre-training distribution. Across four contact-rich tasks, TAP-VLA succeeds on 78% of trials, compared to under 50% for vision-only fine-tuning and alternative tactile-fusion baselines, including on tasks where the baselines perform no better than chance.
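To make the overlay idea concrete, the following is a minimal sketch of drawing a shear field as line segments on an RGB frame, using only NumPy. It is an illustration under stated assumptions, not the TAP-VLA implementation: the function names (`draw_line`, `overlay_shear`), the grid layout, the `origin`, `spacing`, and `scale` parameters, and the choice of plain segments without arrowheads are all hypothetical, and the step that projects the sensor's location into each camera view is omitted.

```python
import numpy as np


def draw_line(img, p0, p1, color):
    """Rasterize a straight segment from p0 to p1 (x, y pixels) into img in place."""
    (x0, y0), (x1, y1) = p0, p1
    # One sample per pixel along the longer axis, so the segment has no gaps.
    n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
    xs = np.rint(np.linspace(x0, x1, n)).astype(int)
    ys = np.rint(np.linspace(y0, y1, n)).astype(int)
    h, w = img.shape[:2]
    keep = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)  # clip to the image
    img[ys[keep], xs[keep]] = color


def overlay_shear(rgb, shear, origin, spacing=8, scale=4.0, color=(255, 0, 0)):
    """Return a copy of rgb with one segment drawn per shear-field cell.

    rgb:    (H, W, 3) uint8 camera image.
    shear:  (rows, cols, 2) float array of per-cell (dx, dy) shear.
    origin: (x, y) pixel where the top-left grid cell is anchored
            (assumed here; e.g. the fingertip's location in the image).
    """
    out = rgb.copy()  # leave the input frame untouched
    rows, cols, _ = shear.shape
    for r in range(rows):
        for c in range(cols):
            x0 = origin[0] + c * spacing
            y0 = origin[1] + r * spacing
            dx, dy = shear[r, c] * scale  # exaggerate small shear for visibility
            draw_line(out, (x0, y0), (x0 + dx, y0 + dy), color)
    return out


# Usage: a 2x2 shear grid with one cell sheared to the right.
frame = np.zeros((64, 64, 3), dtype=np.uint8)
field = np.zeros((2, 2, 2))
field[0, 0] = (2.0, 0.0)
annotated = overlay_shear(frame, field, origin=(10, 10))
```

The annotated frame has the same shape and dtype as the input, which is the point of the approach described above: the policy's input interface does not change, only the pixel content does.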
Goal: If the bottle is empty, place it in the blue bin; if it is full, place it in the orange bin.
Successfully places empty bottle in blue bin.
Successfully places full bottle in orange bin.
Goal: Balance the white object with adjustable CoM on the blue platform.
Successfully balances object with CoM "far" from camera perspective.
Successfully balances object with CoM "near" from camera perspective.
Goal: Place the gear onto a peg mounted on the tabletop.
Goal: Insert cordless plug end into socket mounted on the tabletop.
Goal: Bottle is full, should be placed in orange bin.