Marc Benioff and Sam Altman at odds over core values of tech companies

Published 1:19 pm Wednesday, January 17, 2024

The tech industry has lately been at odds with the creative industry, a conflict most recently exemplified by a copyright lawsuit filed by the New York Times against Microsoft (MSFT) and its artificial intelligence partner OpenAI.

The focus of this conflict is a difference of opinion over the “fair use” doctrine, which has yet to be clarified by the U.S. Copyright Office, and how it applies to the construction of commercialized AI models.

The tech and AI companies have largely argued that it is fair use to train their models on content scraped from every corner of the internet; the artists and organizations creating that content disagree. 

The Times’ lawsuit alleges rampant copyright infringement by OpenAI and Microsoft, both in the inputs and outputs of their generative AI models, something the Times argued represents a significant threat to its business.

The suit cited dozens of examples of AI-generated output that copied Times articles almost verbatim. 

“If Microsoft and OpenAI want to use our work for commercial purposes, the law requires that they first obtain our permission,” a Times spokesperson said. “They have not done so.”

It is a standpoint shared by Marc Benioff, the CEO of the software giant Salesforce (CRM) and the owner of Time magazine.

Marc Benioff: ‘All the training data has been stolen’

“We need to address tech companies’ core values,” Benioff told Bloomberg at the World Economic Forum in Davos, Switzerland, on Tuesday. “What is really important to these tech companies and how they operate is everybody’s business. Our intellectual property, your stories, your work, surfacing in these results because all the training data has been stolen.”

On the front end, Benioff said, people now have access to highly commoditized user interfaces powered by large language models (LLMs) such as ChatGPT. 

OpenAI, the maker of ChatGPT, is valued at $86 billion; Microsoft, which invested $13 billion into the startup, is valued at nearly $3 trillion. A subscription to ChatGPT Plus costs users $20 per month. 

But the thing powering those “highly commoditized” LLMs, Benioff said, is “this broad set of training data which has been basically ripped off.” 

“If you’re going to use this data, I think probably there’s a pretty great company to be built on a standardized set of training data that lets all these companies play a fair game and lets the content creators get paid fairly for their work,” he said. “I think that bridge has not yet been crossed and that’s a mistake by the AI companies.”

OpenAI: ‘It would be impossible’ to train without violating copyright

OpenAI, however, which has recently signed licensing deals with The Associated Press and Axel Springer, has said that “it would be impossible to train today’s leading AI models without using copyrighted materials.”

The company reiterated in response to the Times’ lawsuit that it believes training is fair use and that, despite this belief, it offers an opt-out process for publishers who don’t want their data scraped.

Opting out does not erase previously garnered data from existing models. 

“OpenAI’s lobbying campaign, simply put, is based on a false dichotomy (give everything to us free or we will die) — and also a threat: either we get to use all the existing IP we want for free, or you won’t get to generative AI anymore,” AI researcher Gary Marcus said at the time. “But the argument is hugely flawed.”

Marcus said that content creators are not arguing that tech companies should not be allowed to use their content; the argument, similar to Benioff’s own stance, is that tech companies should pay to use that content. 

OpenAI CEO Sam Altman, also speaking to Bloomberg at the World Economic Forum on Tuesday, said that the recent spate of copyright lawsuits filed against his company is “important, but not for the reason people think.”

“There is this belief held by some people that ‘you need all of my training data and my training data is so valuable,’ and actually, that is generally not the case. We do not want to train on the New York Times’ data, for example,” he said. 

Altman told Bloomberg that, as AI models get more powerful, “no one knows what happens next.”

The focus of OpenAI’s research now, he said, is how far it can leverage smaller sets of high-quality data to train models that are on par with those trained on enormous sets of data.

Altman’s goal is to work with publishers to essentially provide news snippets as sourced AI-generated output. He said OpenAI is currently striking a lot of partnerships that will soon be announced, though he didn’t provide any details.

The Information recently reported that OpenAI was offering between $1 million and $5 million in annual licensing fees to media publishers. 

“We don’t want to regurgitate someone else’s content, but the problem is not as easy as it sounds in a vacuum,” Altman said, adding that the web might be full of previously stolen New York Times articles that don’t include attribution, making training without violating copyright a “tricky thing.” 

He said, however, that if publishers provide OpenAI with a set of articles not to include in its training data, the company should be able to reduce the number of copyright-infringing instances.

“The positives are, I think there’s going to be great new ways to consume and monetize news, and for every one New York Times situation we have, we have many more super productive things about people that are excited to build the future and not do the theatrics,” Altman said. 

Contact Ian with AI stories via email, ian.krietzberg@thearenagroup.net, or Signal 732-804-1223.
