
    EleutherAI Unveils One of the Largest Licensed Datasets for AI Training

    June 6, 2025

    EleutherAI, a well-known nonprofit in the AI research space, has unveiled what it describes as one of the most extensive collections of licensed and public domain text for training artificial intelligence models.

    The newly released dataset, named Common Pile v0.1, is the result of nearly two years of collaborative work. The project brought together AI startups such as Hugging Face and Poolside, along with multiple academic institutions. The dataset totals 8 terabytes and was used to train EleutherAI’s new language models, Comma v0.1-1T and Comma v0.1-2T. The organization claims these models can match the performance of those trained on copyrighted or proprietary content.

    The timing of the release is notable, as many leading AI firms currently face lawsuits over their use of copyrighted material in training data. While some companies have negotiated licenses with content creators, most argue their actions fall under “fair use,” a legal doctrine whose application to AI training remains hotly contested. In the face of these lawsuits, many companies have pulled back on transparency, limiting access to research data and methodology.

    EleutherAI’s executive director, Stella Biderman, emphasized that legal disputes have made open AI research more difficult. In a post on Hugging Face, she noted that some researchers have stopped publishing data-related work altogether due to legal concerns. This, she argues, has negatively impacted the broader AI research community.

    In contrast, Common Pile v0.1 aims to demonstrate that powerful AI models can be built using data that is either licensed or in the public domain. The dataset includes hundreds of thousands of digitized public domain books from sources like the Library of Congress and the Internet Archive. It also incorporates audio transcriptions generated using OpenAI’s Whisper model.

    Both of EleutherAI’s new models have 7 billion parameters and were trained on only a portion of the dataset. Even so, they reportedly rival Meta’s original LLaMA model in areas like coding, mathematical reasoning, and visual understanding.

    Biderman stated that the belief that high-performing models require unlicensed data is becoming outdated. As more open data becomes available, she expects even stronger AI models can be trained while respecting copyright laws.

    EleutherAI has committed to more frequent releases of open, legally vetted datasets with the help of research institutions like the University of Toronto.
