If there's Intelligent Life out There
Abigail Staley edited this page 1 week ago


Optimizing LLMs to be proficient at particular tests backfires on Meta, Stability.



Hugging Face has released its second LLM leaderboard to rank the best language models it has tested. The new leaderboard aims to be a more challenging, uniform standard for evaluating open large language model (LLM) performance across a variety of tasks. Alibaba’s Qwen models dominate the leaderboard’s inaugural rankings, taking three spots in the top 10.

Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-pro for all major open LLMs! Some learning:
- Qwen 72B is the king and Chinese open models are dominating overall
- Previous evaluations have become too easy for recent ...
June 26, 2024

Hugging Face’s second leaderboard tests language models across four tasks: knowledge testing, reasoning on extremely long contexts, complex math abilities, and instruction following. Six benchmarks are used to test these qualities, with tests including solving 1,000-word murder mysteries, explaining PhD-level questions in layman’s terms, and, most daunting of all, high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face’s blog.

The frontrunner of the new leaderboard is Qwen, Alibaba’s LLM, which takes 1st, 3rd, and 10th place with its handful of variants. Also showing up are Llama3-70B, Meta’s LLM, and a handful of smaller open-source projects that managed to outperform the pack. Notably absent is any sign of ChatGPT