LG Releases EXAONE 4.5 — The 33B Open VLM That Outscores GPT-5-mini and Claude 4.5 Sonnet on STEM
LG AI Research's EXAONE 4.5 is a 33B-parameter multimodal vision-language model that averages 77.3 across five STEM benchmarks, outperforming GPT-5-mini (73.5) and Claude 4.5 Sonnet (74.6). It handles text and images across Korean, English, Spanish, German, Japanese, and Vietnamese, and the weights are available on HuggingFace for research and academic use.
LG AI Research has released EXAONE 4.5, a 33B-parameter multimodal vision-language model that claims top performance among open models on scientific and technical reasoning tasks. The model scores an average of 77.3 across five STEM benchmarks, beating GPT-5-mini at 73.5 and Claude 4.5 Sonnet at 74.6.
The model is fully multimodal, handling both text and images, with expanded language support spanning Korean, English, Spanish, German, Japanese, and Vietnamese. LG has made the weights available on HuggingFace under a research and academic use license, with commercial licensing available via direct inquiry.
EXAONE 4.5 follows LG's EXAONE 4.0 series, which focused primarily on Korean and English language tasks. The expansion to multilingual STEM reasoning represents a meaningful capability leap and positions the model for use in scientific research tooling, technical documentation systems, and multilingual educational applications.
The context here matters: LG is not traditionally a frontier AI company. That a consumer electronics and industrial conglomerate is shipping models that outperform Anthropic's current-generation Sonnet on STEM benchmarks says something important about how broadly AI capability is diffusing across the industry. Enterprise buyers who assumed OpenAI and Anthropic were the only options for serious technical applications should be paying attention.
For builders, the research-use license is a constraint worth noting — you'll need to negotiate commercial terms for production applications. But for academic researchers, the model is available today.
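The announcement doesn't include a loading snippet, so here is a minimal sketch of pulling the weights with HuggingFace `transformers`, assuming the release follows the standard vision-language checkpoint layout. The repo id and the auto-class choice below are assumptions, not confirmed by LG; check the model card on HuggingFace before use.

```python
def load_exaone(model_id: str = "LGAI-EXAONE/EXAONE-4.5"):  # hypothetical repo id; check LG's HF page
    """Sketch: load an EXAONE 4.5 processor and model (research/academic license applies)."""
    # Deferred imports so the sketch can be read without transformers installed.
    from transformers import AutoModelForVision2Seq, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForVision2Seq.from_pretrained(
        model_id,
        torch_dtype="auto",      # use the checkpoint's native dtype
        device_map="auto",       # requires `accelerate`; shards a 33B model across GPUs
        trust_remote_code=True,  # in case the repo ships custom model code
    )
    return processor, model
```

Keep the license in mind: downloading is gated to research and academic use, so a production deployment still needs the commercial terms mentioned above.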
Panel Takes
The Builder
Developer Perspective
“77.3 STEM average from a non-frontier lab using a 33B model is a genuine surprise. If the benchmarks hold up, this is competitive with models that cost 5x more per token. The multilingual STEM support is particularly valuable for global scientific tooling.”
The Skeptic
Reality Check
“The research-only license is a dealbreaker for most commercial applications, and LG's benchmarks are self-reported against a selective set of evaluations. Outperforming Claude 4.5 Sonnet on five STEM benchmarks doesn't tell you much about general capability or production reliability.”
The Futurist
Big Picture
“LG shipping a state-of-the-art STEM model is another data point in the story of AI capability becoming a commodity. When a TV and appliance company can beat Anthropic on STEM benchmarks, the moat for frontier labs is narrowing faster than their roadmaps assume.”