modality-encoder

1 note tagged “modality-encoder”

[CLIP] Learning Transferable Visual Models From Natural Language Supervision

Trains an image encoder and a text encoder to match images with their captions (contrastive) on 400M web pairs — enabling open-vocabulary zero-shot transfer, and becoming the vision encoder most VLMs freeze and reuse.

ICML 2021

2021 · modality-encoder