Releases ChartNet, a million-scale, high-quality multimodal dataset for chart understanding spanning 24 chart types and 1.5 million samples.
March 31, 2026
Original Paper
ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding
arXiv · 2603.27064
The Takeaway
Current VLMs consistently fail at complex chart reasoning; this massive open-source dataset provides the scale needed to fine-tune models on aligned code, data tables, and QA pairs. It is a major contribution to the open-source multimodal research community.
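To make the "aligned code, data tables, and QA" idea concrete, here is a minimal sketch of what one such sample might look like. The schema (field names like `chart_code`, `table`, `qa`) is hypothetical, not taken from the paper; the key point it illustrates is that because the data is synthesized first, QA answers can be derived exactly from the table rather than estimated from pixels.

```python
import json

# Hypothetical aligned sample: the underlying data table, the plotting
# code that renders it, and a QA pair computed directly from the data.
data_table = {"quarter": ["Q1", "Q2", "Q3", "Q4"],
              "revenue": [12.0, 15.5, 9.8, 18.2]}

# The plotting code is kept as a string so the sample stays
# renderable and auditable (one of the paper's 6 plotting libraries).
chart_code = """\
import matplotlib.pyplot as plt
plt.bar({quarter!r}, {revenue!r})
plt.ylabel("Revenue ($M)")
plt.savefig("chart.png")
""".format(**data_table)

# Since the data is known at generation time, the QA label is exact.
peak = max(zip(data_table["revenue"], data_table["quarter"]))
sample = {
    "chart_type": "bar",
    "library": "matplotlib",
    "code": chart_code,
    "table": data_table,
    "qa": [{"q": "Which quarter has the highest revenue?",
            "a": peak[1]}],
}
print(json.dumps(sample["qa"], indent=2))
```

A code-guided pipeline of this shape scales cheaply: varying the data generator, chart type, and plotting library yields diverse samples whose ground-truth answers never require human annotation.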
From the abstract
Understanding charts requires models to jointly reason over geometric visual patterns, structured numerical data, and natural language -- a capability where current vision-language models (VLMs) remain limited. We introduce ChartNet, a high-quality, million-scale multimodal dataset designed to advance chart interpretation and reasoning. ChartNet leverages a novel code-guided synthesis pipeline to generate 1.5 million diverse chart samples spanning 24 chart types and 6 plotting libraries. Each sa