{"id":8782,"date":"2026-05-27T10:01:22","date_gmt":"2026-05-27T02:01:22","guid":{"rendered":"http:\/\/longzhuplatform.com\/?p=8782"},"modified":"2026-05-27T10:01:22","modified_gmt":"2026-05-27T02:01:22","slug":"reddit-ceo-llms-would-not-exist-without-reddit-data-via-sejournal-mattgsouthern","status":"publish","type":"post","link":"http:\/\/longzhuplatform.com\/?p=8782","title":{"rendered":"Reddit CEO: LLMs \u2018Would Not Exist\u2019 Without Reddit Data via @sejournal, @MattGSouthern"},"content":{"rendered":"<p><\/p> <div id=\"narrow-cont\"> <p>Reddit CEO Steve Huffman said large language models \u201cwould not exist as we know them\u201d without Reddit\u2019s content. He called the platform\u2019s user-generated data \u201cmodern oil\u201d for AI.<\/p> <p>Huffman made the comments during an interview at Fast Company\u2019s Most Innovative Companies Summit.<\/p> <h2>What Huffman Said About Reddit\u2019s Value To AI<\/h2> <p>Huffman described the position Reddit\u2019s data holds in the AI ecosystem.<\/p> <p>Huffman said:<\/p> <blockquote> <p>\u201cLLMs would not exist as we know them without Reddit. Reddit is one of the single largest sources of training data for the LLMs and Reddit continues to be one of the primary sources of both training data and we\u2019re also the most cited, the most cited platform across all models.\u201d<\/p> <\/blockquote> <p>He attributed the citation claim to Profound, a firm that tracks AI citation data.<\/p> <p>Huffman explained why AI companies depend on the content.<\/p> <blockquote> <p>\u201cThere\u2019s no artificial intelligence without actual intelligence. At the end of the day, these models are quite simple. 
They\u2019re regurgitating on an absolutely massive scale what they\u2019ve consumed elsewhere and a large portion of that consumption is actually just the human conversation on Reddit because it\u2019s natural and it covers basically every topic imaginable.\u201d<\/p> <\/blockquote> <h2>Deals For Some, Lawsuits For Others<\/h2> <p>Reddit announced data licensing agreements with Google and OpenAI in 2024. Huffman referenced those as Reddit\u2019s original two AI data deals and didn\u2019t announce any additional agreements.<\/p> <blockquote> <p>\u201cSince we did the original two deals with Google and OpenAI, that was over two years ago, so we\u2019ve learned a lot. They\u2019ve learned a lot. The whole world\u2019s learned a lot. Specifically how valuable Reddit\u2019s data is and how useful it is. And so we\u2019re being I think very deliberate and selective there. But yeah, we\u2019re open and open for business.\u201d<\/p> <\/blockquote> <p>For companies that haven\u2019t agreed to licensing terms, Reddit has taken legal action. The company sued Anthropic in California Superior Court, alleging unauthorized use of Reddit content and violations of Reddit\u2019s terms. Reddit filed a federal lawsuit against Perplexity in the Southern District of New York, along with three data-scraping firms, alleging DMCA anti-circumvention violations and related claims.<\/p> <p>Huffman drew a line between the two groups.<\/p> <blockquote> <p>\u201cCompanies like Google and OpenAI where we had good relationships, we can actually do a deal and put some guard rails on use and access to our data on behalf of our users but then collaborate on making products for the next generation of the internet.\u201d<\/p> <\/blockquote> <p>He added that \u201cnot every company is willing to be a collaborative partner and so unfortunately we have to go the other way which is lawsuits.\u201d<\/p> <p>Huffman told the audience Reddit\u2019s position on commercial use is simple. 
\u201cCommercial use of our data requires commercial terms,\u201d he said. Reddit began charging for commercial API access in 2023, a move that preceded the current licensing deals.<\/p> <p>Huffman said Reddit still provides free data access to researchers and universities and tries to remain flexible for non-commercial use.<\/p> <h2>What Changed Reddit\u2019s Openness<\/h2> <p>According to Huffman, Reddit\u2019s willingness to share data freely changed when the AI industry moved away from open research. As SEJ previously reported, Reddit limited access for many search engine crawlers while Google remained an exception.<\/p> <blockquote> <p>\u201cHistorically, Reddit has been like we\u2019re born of the open internet and Reddit has been open and very permissive for access to its data. And honestly, I think we would be in a different position today if the AI companies were still basically open and open source and doing open research.\u201d<\/p> <\/blockquote> <p>Huffman said the issue was that Reddit could no longer track how its data was being used. \u201cPeople are using our data and we don\u2019t know what it was being used for,\u201d he told the audience.<\/p> <p>Beyond commercial terms, Huffman said Reddit wants to prevent its data from being used to identify users, target them with ads, or replace or disintermediate the platform.<\/p> <h2>Reddit\u2019s Own AI Efforts<\/h2> <p>Huffman acknowledged what he called a \u201cparadox.\u201d Reddit\u2019s content powers external AI systems, but the company also uses AI across its platform.<\/p> <p>The most visible product is Reddit Answers, an LLM-powered search feature. It reads posts and comments, then organizes them into responses built from verbatim user quotes. Huffman noted it\u2019s designed for questions without definitive answers.<\/p> <blockquote> <p>\u201cWhat Reddit Answers does is a couple of things that are unique to Reddit. One, it basically only answers in verbatim quotes from actual people. 
And then the second thing it does is it tries to present multiple perspectives because the whole point if you\u2019re on Reddit, you want the human perspective.\u201d<\/p> <\/blockquote> <p>Behind the scenes, Reddit uses AI for content moderation and classification. LLMs can evaluate whether a comment crosses into bullying, something Huffman described as previously difficult because of the subjectivity involved.<\/p> <p>Huffman presented AI moderation as a way to reduce exposure to the worst content, not as a replacement for Reddit\u2019s community moderation model.<\/p> <p>\u201cThe worst job on the internet used to be looking at the worst content on the internet and deciding whether it could be online or not,\u201d Huffman said. \u201cThat job just goes away.\u201d<\/p> <h2>The Gray Area Of AI-Written Posts<\/h2> <p>Huffman also addressed the challenge of users writing content with AI tools and pasting it into Reddit. That\u2019s different from automated bot activity, he stressed.<\/p> <blockquote> <p>\u201cThe most annoying thing that I see not just on Reddit, but all over the internet is somebody who wrote their post or comment with ChatGPT and then pasted it into Reddit. Like, is that a bot? Certainly feels like a bot, but there\u2019s a human behind the idea.\u201d<\/p> <\/blockquote> <p>Huffman cast the issue as one of intent. \u201cIt\u2019s very important to us that there\u2019s a human behind the idea, behind the content, behind the prompt,\u201d Huffman said. But he also noted that \u201cthe writing sucks\u201d when users rely on AI to compose their posts.<\/p> <p>Rather than creating a policy to address it, Huffman indicated Reddit will let its community handle the issue. Users are already downvoting AI-written content and calling it out in comments. Huffman said Reddit will \u201cempower the users more and the subreddits more to just reject that sort of content altogether.\u201d<\/p> <p>He compared the broader question to calculators in math class. 
\u201cKids these days are just learning how to write with AI. What are we going to do about it?\u201d he said. \u201cWe kind of have to learn, I think, along with everybody else.\u201d<\/p> <h2>Why This Matters<\/h2> <p>Huffman\u2019s comments reinforce Reddit\u2019s pitch that its user discussions are a core input for AI systems.<\/p> <p>The AI-written content problem Huffman described is one SEJ covered as part of a broader YouTube AI slop investigation. Reddit\u2019s decision to let community voting handle AI-generated posts, rather than building detection tools, is a different path than platforms that have deployed automated labeling.<\/p> <h2>Looking Ahead<\/h2> <p>Huffman told Fast Company that Reddit is \u201cin the market talking to folks all the time\u201d about new data deals, though he didn\u2019t hint at a third agreement.<\/p> <p>Reddit\u2019s lawsuits against Anthropic and Perplexity are both ongoing. The Anthropic case was the subject of a federal court remand hearing in March.<\/p> <\/div> ","protected":false},"excerpt":{"rendered":"<p>Reddit CEO Steve Huffman said large language models \u201cwould not exist as we know them\u201d without Reddit\u2019s content. He called the platform\u2019s user-generated data \u201cmodern oil\u201d for AI. Huffman made the comments during an interview at Fast Company\u2019s Most Innovative Companies Summit. 
What Huffman Said About Reddit\u2019s Value To AI Huffman described the position Reddit\u2019s [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":8783,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16],"tags":[735,450,33841,299,90,699,80],"class_list":["post-8782","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-accessibility","tag-ceo","tag-data","tag-exist","tag-llms","tag-mattgsouthern","tag-reddit","tag-sejournal"],"acf":[],"_links":{"self":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts\/8782","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=8782"}],"version-history":[{"count":0,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/posts\/8782\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=\/wp\/v2\/media\/8783"}],"wp:attachment":[{"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=8782"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=8782"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/longzhuplatform.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=8782"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}