Marc Benioff and Sam Altman at odds over core values of tech companies

Published 1:19 pm Wednesday, January 17, 2024

By Ian Krietzberg

Altman told Bloomberg that, as AI models get more powerful, “no one knows what happens next.”

The tech industry has lately been at odds with the creative industry, a conflict most recently exemplified by a copyright lawsuit filed by the New York Times against Microsoft (MSFT) and its artificial intelligence partner OpenAI.

Marc Benioff: ‘All the training data has been stolen’

“We need to address tech companies’ core values,” Benioff told Bloomberg at the World Economic Forum in Davos, Switzerland, on Tuesday. “What is really important to these tech companies and how they operate is everybody’s business. Our intellectual property, your stories, your work, surfacing in these results because all the training data has been stolen.”

Salesforce CEO Marc Benioff called the training data that powers Large Language Models (LLMs) “stolen.”

On the front end, Benioff said, people now have access to highly commoditized user interfaces, such as ChatGPT, that are powered by large language models (LLMs).

OpenAI, the maker of ChatGPT, is valued at $86 billion; Microsoft, which invested $13 billion in the startup, is valued at nearly $3 trillion. A subscription to ChatGPT Plus costs users $20 per month.

But the thing powering those “highly commoditized” LLMs, Benioff said, is “this broad set of training data which has been basically ripped off.”

“If you’re going to use this data, I think probably there’s a pretty great company to be built on a standardized set of training data that lets all these companies play a fair game and lets the content creators get paid fairly for their work,” he said. “I think that bridge has not yet been crossed and that’s a mistake by the AI companies.”

OpenAI: ‘It would be impossible’ to train without violating copyright

OpenAI, however, which has recently signed licensing deals with The Associated Press and Axel Springer, has said that “it would be impossible to train today’s leading AI models without using copyrighted materials.”

In response to the Times’ lawsuit, the company reiterated that it believes training is fair use; despite this belief, it offers an opt-out process for publishers who don’t want their data scraped.

Opting out does not erase previously collected data from existing models.

“OpenAI’s lobbying campaign, simply put, is based on a false dichotomy (give everything to us free or we will die) — and also a threat: either we get to use all the existing IP we want for free, or you won’t get to generative AI anymore,” AI researcher Gary Marcus said at the time. “But the argument is hugely flawed.”

Marcus said that content creators are not arguing that tech companies should not be allowed to use their content; the argument, similar to Benioff’s own stance, is that tech companies should pay to use that content.

OpenAI CEO Sam Altman, also speaking to Bloomberg at the World Economic Forum on Tuesday, said that the recent spate of copyright lawsuits filed against his company is “important, but not for the reason people think.”

“There is this belief held by some people that ‘you need all of my training data and my training data is so valuable,’ and actually, that is generally not the case. We do not want to train on the New York Times’ data, for example,” he said.

The focus of OpenAI’s research now, he said, is on how well models trained on smaller sets of high-quality data can match those trained on enormous sets of data.

Altman’s goal is to work with publishers to essentially provide news snippets as sourced AI-generated output. He said OpenAI is currently striking a lot of partnerships that will soon be announced, though he didn’t provide any details.

The Information recently reported that OpenAI was offering between $1 million and $5 million in annual licensing fees to media publishers.

“We don’t want to regurgitate someone else’s content, but the problem is not as easy as it sounds in a vacuum,” Altman said, adding that the web might be full of previously stolen New York Times articles that don’t include attribution, making training without violating copyright a “tricky thing.”

He said, however, that if publishers provide OpenAI with a set of articles to exclude from its training data, the company should be able to reduce the number of copyright-infringing instances.

“The positives are, I think there’s going to be great new ways to consume and monetize news, and for every one New York Times situation we have, we have many more super productive things about people that are excited to build the future and not do the theatrics,” Altman said.

Contact Ian with AI stories via email, ian.krietzberg@thearenagroup.net, or Signal 732-804-1223.
