Alibaba's Qwen team built HopChain to fix how AI vision models fall apart during multi-step reasoning
The framework targets the cascading errors that derail vision-language models during multi-step reasoning.

Alibaba's Qwen team, in collaboration with Tsinghua University, has developed HopChain, a framework that targets a key weakness of vision-language models (VLMs) in multi-step reasoning tasks: errors made early in a reasoning chain cascade through later steps and lead to incorrect conclusions. HopChain generates multi-stage image questions that compel models to re-examine the image at each step, yielding improved accuracy across a range of benchmarks.
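HopChain's actual data pipeline has not been published; as a rough sketch of the core idea (chaining image questions so that each hop folds earlier answers into the next prompt, forcing the model back to the image with fresh context), one might structure the training examples like this. All names here are hypothetical illustrations, not HopChain's real schema:

```python
from dataclasses import dataclass

@dataclass
class Hop:
    """One step in a multi-hop visual question chain (illustrative).

    `question` is asked about the image; `answer` is the ground truth
    the model must produce before the next hop is revealed.
    """
    question: str
    answer: str

def build_chain(hops: list[Hop]) -> list[dict]:
    """Fold earlier Q&A pairs into each later prompt, so every hop
    requires re-examining the image in light of prior answers."""
    examples = []
    context: list[str] = []
    for hop in hops:
        examples.append({
            "prompt": " ".join(context + [hop.question]),
            "target": hop.answer,
        })
        context.append(f"{hop.question} {hop.answer}.")
    return examples

chain = build_chain([
    Hop("What object is on the table?", "A red mug"),
    Hop("What is printed on that object?", "A company logo"),
])
print(len(chain))          # one training example per hop
print(chain[1]["prompt"])  # second prompt carries the first Q&A as context
```

The point of the chained-context design is that the second question is unanswerable without both the prior answer and a fresh look at the image, which is the behavior the reported training data is meant to reward.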
Key Takeaways
1. HopChain improved performance on 20 out of 24 benchmarks for the AI models tested.
2. The framework generates roughly 60,000 to 80,000 training examples per model.
3. Models trained with HopChain showed significant gains on multi-step reasoning tasks.