Recently revealed deposition excerpts show Mark Zuckerberg, Meta’s CEO, referencing YouTube’s handling of pirated content to defend Meta’s use of copyrighted material in its AI training datasets. The remarks are part of the ongoing copyright case Kadrey v. Meta, one of many lawsuits in which AI companies face claims of infringing intellectual property rights.
Zuckerberg’s statements stem from accusations that Meta utilized LibGen, a database hosting unauthorized copies of copyrighted books, to train its AI models, known as Llama. LibGen has faced multiple lawsuits and been fined millions for copyright violations. Plaintiffs in the case include prominent authors like Sarah Silverman and Ta-Nehisi Coates, who allege that Meta knowingly used pirated materials.
During his deposition, Zuckerberg likened the situation to YouTube’s efforts to manage pirated content, suggesting that while some infringement occurs, the platform operates largely within legal bounds. He argued that banning datasets outright due to potential copyright issues might not always be practical. “Would I want to have a policy against people using YouTube because some content may be copyrighted? No,” Zuckerberg stated.
However, the deposition also reveals internal concerns at Meta regarding the legal risks of using LibGen. Employees reportedly referred to LibGen as a “pirated dataset” and warned it could weaken the company’s stance with regulators. Despite this, court filings allege that Meta relied on LibGen for training its Llama models and may have even used the dataset to evaluate whether to pursue licensing agreements with publishers.
The plaintiffs further claim that Meta has continued using other sources of pirated materials, including Z-Library, for training its AI models. Z-Library has also been subject to legal actions for copyright infringement, with its operators facing charges in 2022.
Zuckerberg, when questioned about his knowledge of LibGen, claimed he was unfamiliar with it, stating, “I get that you’re trying to get me to give an opinion of LibGen, which I haven’t really heard of.”
Although Zuckerberg asserted that Meta should exercise caution when using copyrighted materials, his deposition highlights a broader debate: how to balance leveraging vast datasets for AI development against respecting intellectual property laws.
As the case unfolds, amended complaints have introduced new allegations, including claims that Meta used LibGen to train its latest Llama models and employed tactics to obscure the use of copyrighted materials. These developments underscore the legal challenges AI companies face over copyright in the era of large-scale AI training.