Would Large Language Models Be Better If They Weren’t So Large? – The New York Times

Teaching fewer words to large language models might help them sound more human.

When it comes to artificial intelligence chatbots, bigger is typically better.
Large language models like ChatGPT and Bard, which generate conversational, original text, improve as they are fed more data. Every day, bloggers take to the internet to explain how the latest advances — an app that summarizes articles, A.I.-generated podcasts, a fine-tuned model that can answer any question related to professional basketball — will “change everything.”
But making bigger and more capable A.I. requires processing power that few companies possess, and there is growing concern that a small group, including Google, Meta, OpenAI and Microsoft, will exercise near-total control over the technology.
Also, bigger language models are harder to understand. They are often described as “black boxes,” even by the people who design them, and leading figures in the field have expressed unease that A.I.’s goals may ultimately not align with our own. If bigger is better, it is also more opaque and more exclusive.
In January, a group of young academics working in natural language processing — the branch of A.I. focused on linguistic understanding — issued a challenge to try to turn this paradigm on its head. The group called for teams to create functional language models using data sets that are less than one-ten-thousandth the size of those used by the most advanced large language models. A successful mini-model would be nearly as capable as the high-end models but much smaller, more accessible and more compatible with humans. The project is called the BabyLM Challenge.
“We’re challenging people to think small and focus more on building efficient systems that way more people can use,” said Aaron Mueller, a computer scientist at Johns Hopkins University and an organizer of BabyLM.
Alex Warstadt, a computer scientist at ETH Zurich and another organizer of the project, added, “The challenge puts questions about human language learning, rather than ‘How big can we make our models?’ at the center of the conversation.”

