Zero-Shot Detection via Vision and Language Knowledge Distillation

Where: Lab
Keywords: Multi-Modal