5. Autonomous shopping for frictionless customer experience

Autonomous shopping is gaining adoption rapidly. It lets customers enter a store, shop, and leave without ever interacting with a cashier or waiting in long lines at automated checkout counters that don’t always work reliably. Among the many models needed to power such a system is one that accurately recognizes human actions.

5.1 Challenges with recognizing human actions

Recognizing human actions provides a deeper understanding of a scene than simply detecting people. In a retail environment, it can identify shopper activities such as picking up an item from a shelf, inspecting it, or placing it in a cart, all of which are essential to a frictionless shopping experience. However, action recognition is a difficult problem. First, it requires labeled video data, which is more expensive to produce than labeled image data. Second, action recognition is more complex and compute-intensive than image classification: recognizing an action requires temporal context, so the model must operate on a sequence of frames rather than a single frame.
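To make the temporal point concrete, the minimal sketch below samples one fixed-length clip of frame indices from a video and compares how many input values a single image and a clip carry. The 224x224 RGB resolution, the 16-frame clip length, the stride of 2, and the helper names are illustrative assumptions, not settings from this experiment.

```python
# Sketch: a video model consumes a clip (a sequence of frames), not one image.
# Assumed, illustrative values: 224x224 RGB frames, 16-frame clips, stride 2.


def sample_clip_indices(num_frames: int, clip_len: int, stride: int) -> list:
    """Return frame indices for one clip centered in a video of num_frames frames.

    Frames are taken every `stride` frames. For videos too short to hold the
    whole clip, indices past the end are clamped to the last frame (one common
    padding choice among several).
    """
    if num_frames <= 0 or clip_len <= 0 or stride <= 0:
        raise ValueError("num_frames, clip_len and stride must be positive")
    span = (clip_len - 1) * stride + 1        # number of frames the clip covers
    start = max(0, (num_frames - span) // 2)  # center the clip in the video
    return [min(start + i * stride, num_frames - 1) for i in range(clip_len)]


def input_elements(frames: int, height: int, width: int, channels: int = 3) -> int:
    """Number of scalar values fed to the network for one sample."""
    return frames * height * width * channels


if __name__ == "__main__":
    video_len = 100  # hypothetical video length in frames
    indices = sample_clip_indices(video_len, clip_len=16, stride=2)
    print("clip frame indices:", indices)
    image = input_elements(1, 224, 224)
    clip = input_elements(len(indices), 224, 224)
    print(f"image: {image} values, clip: {clip} values ({clip // image}x more)")
```

The input grows linearly with clip length before any 3D convolution is applied, which is one concrete reason video models cost more to train and run than image models.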

5.2 Customizing a pretrained action recognition model

The solution starts with a pretrained action recognition model. For this experiment, we chose the Inception-based I3D (Inflated 3D ConvNet) architecture described in this paper (Carreira and Zisserman, 2017), which can be found at this repository.

Fig. 11 3-D Action Classification Model
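The "inflated" in I3D refers to how the I3D paper bootstraps 3D filters from a pretrained 2D image network: each 2D k x k kernel is repeated N times along a new time axis and rescaled by 1/N, so a "boring" video made of one frame repeated N times produces the same response as the 2D filter on that image. The single-channel, pure-Python sketch below checks that property at individual output positions. The function names and toy values are hypothetical, and this is an illustration of the idea, not the implementation used by TAO or the reference repository.

```python
# Sketch of I3D-style kernel inflation: repeat a 2D kernel along time and
# divide by the time depth, so a clip of one repeated frame gives the same
# response as the original 2D kernel on that frame. Single channel, one
# output position at a time, plain lists: illustrative only.


def inflate_kernel(kernel_2d, time_depth):
    """Repeat a 2D kernel time_depth times along time and rescale by 1/time_depth."""
    return [[[w / time_depth for w in row] for row in kernel_2d]
            for _ in range(time_depth)]


def conv2d_at(image, kernel, top, left):
    """2D cross-correlation of `kernel` with `image` at one (top, left) position."""
    return sum(kernel[i][j] * image[top + i][left + j]
               for i in range(len(kernel)) for j in range(len(kernel[0])))


def conv3d_at(clip, kernel, t0, top, left):
    """3D cross-correlation of `kernel` with `clip` (frames x rows x cols) at one position."""
    return sum(conv2d_at(clip[t0 + t], kernel[t], top, left)
               for t in range(len(kernel)))


if __name__ == "__main__":
    image = [[1.0, 2.0, 0.0],
             [0.5, 3.0, 1.0],
             [2.0, 1.0, 4.0]]
    kernel_2d = [[1.0, -1.0],
                 [0.5, 2.0]]      # stands in for a pretrained 2D filter
    T = 4                         # assumed temporal depth of the inflated kernel
    boring_clip = [image] * T     # the same frame repeated T times
    k3 = inflate_kernel(kernel_2d, T)
    print("2D response:", conv2d_at(image, kernel_2d, 0, 0))
    print("inflated 3D response:", conv3d_at(boring_clip, k3, 0, 0, 0))
```

The 1/N rescaling is what makes the responses match; without it, the 3D output on the repeated-frame clip would be N times larger than the 2D output.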

TAO supports both RGB and optical flow I3D pretrained models. We take the RGB model and fine-tune it on classes representing typical shopper actions in a retail setting.
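Fine-tuning on a new label set usually means replacing the pretrained classifier with a layer sized to the new classes and training on the target data. The sketch below shows the simplest version of that idea under stated assumptions: a frozen backbone that yields one feature vector per clip, and a new linear softmax head trained with per-sample gradient descent. The class names, feature size, and toy features are hypothetical, the article does not say whether the backbone was frozen, and TAO drives this step through its own training workflow rather than code like this.

```python
# Sketch: train a new linear classification head for retail actions on top of
# features from a frozen, pretrained backbone. Class names and toy features
# are hypothetical placeholders, not the experiment's data or TAO's API.
import math
import random

RETAIL_CLASSES = ["pick_up_item", "inspect_item", "place_in_cart", "other"]  # hypothetical


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """New classifier layer: one weight row and one bias per retail class."""

    def __init__(self, feature_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 0.01) for _ in range(feature_dim)]
                  for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def logits(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi
                for row, bi in zip(self.w, self.b)]

    def train_step(self, x, label, lr=0.1):
        """One gradient step on softmax cross-entropy; returns the loss before the update."""
        probs = softmax(self.logits(x))
        loss = -math.log(probs[label])
        for c, p in enumerate(probs):
            grad = p - (1.0 if c == label else 0.0)  # dLoss/dlogit_c
            self.b[c] -= lr * grad
            self.w[c] = [wi - lr * grad * xi for wi, xi in zip(self.w[c], x)]
        return loss


if __name__ == "__main__":
    rng = random.Random(1)
    # Stand-in for frozen backbone features: one toy 8-dim vector per class.
    features = [[rng.uniform(-1, 1) for _ in range(8)] for _ in RETAIL_CLASSES]
    head = LinearHead(feature_dim=8, num_classes=len(RETAIL_CLASSES))
    losses = []
    for _ in range(50):
        losses.append(sum(head.train_step(x, y) for y, x in enumerate(features)))
    print(f"epoch 0 loss {losses[0]:.3f} -> epoch 49 loss {losses[-1]:.3f}")
```

Freezing the backbone keeps the example short; in practice, fine-tuning often also updates some backbone layers, typically at a lower learning rate.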