#6 Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
As we’ve seen before, LLM-based visual agents are pretty good at planning what to do when completing high-level tasks, but pretty bad at “grounding”, i.e. turning the plan into an executable action.
Set-of-Mark prompting is a technique proposed to make grounding easier: the input image is annotated with masks and labels (the "marks"), so the model can point at a region by naming its label instead of having to locate it from scratch.
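To make the idea concrete, here is a minimal sketch of the bookkeeping around the marks, not the paper's implementation. It assumes the region masks already come from some segmentation model and represents each one as a plain list of `(x, y)` pixels. The names `Mark`, `assign_marks`, `build_prompt` and `resolve_answer` are hypothetical, and drawing the labels onto the image and calling the model are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Mark:
    mark_id: int   # number shown on the image for this region
    center: tuple  # (x, y) pixel where the label would be drawn
    box: tuple     # (x_min, y_min, x_max, y_max) of the region


def assign_marks(masks):
    """Give each region a numeric mark, placed at the region's centroid.

    `masks` is a list of regions, each a list of (x, y) pixels. This is an
    assumed stand-in for the output of a segmentation model.
    """
    marks = []
    for mark_id, pixels in enumerate(masks, start=1):
        xs = [x for x, _ in pixels]
        ys = [y for _, y in pixels]
        center = (round(sum(xs) / len(xs)), round(sum(ys) / len(ys)))
        box = (min(xs), min(ys), max(xs), max(ys))
        marks.append(Mark(mark_id, center, box))
    return marks


def build_prompt(task, marks):
    """Text that accompanies the annotated image (wording is an assumption)."""
    ids = ", ".join(str(m.mark_id) for m in marks)
    return (
        f"The image is annotated with numbered marks: {ids}.\n"
        f"Task: {task}\n"
        "Answer with the number of the mark to act on."
    )


def resolve_answer(answer, marks):
    """Map the model's textual answer back to a pixel coordinate."""
    match = re.search(r"\d+", answer)
    if match is None:
        return None
    by_id = {m.mark_id: m for m in marks}
    mark = by_id.get(int(match.group()))
    return mark.center if mark else None


# Two toy regions standing in for segmented UI elements.
masks = [
    [(0, 0), (2, 0), (0, 2), (2, 2)],
    [(10, 10), (12, 10), (11, 13)],
]
marks = assign_marks(masks)
prompt = build_prompt("Click the search button", marks)
click_at = resolve_answer("Mark 2", marks)  # pixel to act on
```

The point of the sketch is that grounding is reduced to choosing a mark number: the model only has to answer "2", and the coordinate lookup happens outside the model. Using the centroid as the label position is a simplification, since for concave regions it can fall outside the region.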