#11 SigLIP: Sigmoid Loss for Language Image Pre-Training
In contrastive image-text training, you can replace the usual softmax (InfoNCE) loss with a sigmoid loss that treats every image-text pair as an independent binary classification problem. Because the sigmoid loss needs no normalization across the whole batch of pairwise similarities, it is more memory-efficient and lets you scale batch sizes up significantly (though the paper finds diminishing returns beyond roughly 32k), giving you overall efficiency gains.
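Here is a minimal sketch of the sigmoid loss in PyTorch. The function name `siglip_loss` and the shapes are illustrative; `t` and `b` correspond to the learnable temperature and bias described in the paper, and the embeddings are L2-normalized inside the function:

```python
import torch
import torch.nn.functional as F

def siglip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                t: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Sigmoid loss over all image-text pairs in a batch (a sketch).

    img_emb, txt_emb: (n, d) embeddings where row i of each is a matching pair.
    t, b: learnable scalar temperature and bias.
    """
    # Normalize so the pairwise logits are scaled cosine similarities.
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)

    # (n, n) similarity matrix; entry (i, j) scores image i against text j.
    logits = img_emb @ txt_emb.T * t + b

    # +1 on the diagonal (matching pairs), -1 everywhere else (negatives).
    n = logits.size(0)
    labels = 2 * torch.eye(n, device=logits.device) - 1

    # Each pair is an independent binary classification; sum all n^2 terms
    # and average over the batch size n, as in the paper's formulation.
    return -F.logsigmoid(labels * logits).sum() / n
```

Note that each term touches only a single (i, j) similarity, with no softmax over a row or column. This is what decouples the loss from the global batch: in a distributed setting you can, as the paper's chunked implementation does, compute it piecewise by passing embedding chunks between devices instead of materializing the full similarity matrix at once.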