This is an informal case summary prepared for the purposes of facilitating exchange during the 2024 WIPO IP Judges Forum.
Session 1 of the 2024 IP Judges Forum
Doe 1 v. GitHub, Inc., No. 22-cv-06823-JST, 2024 WL 235217 (N.D. Cal. Jan. 22, 2024)
Date of judgment: January 22, 2024
Issuing authority: United States District Court for the Northern District of California
Level of the issuing authority: First Instance
Type of procedure: Judicial (Civil)
Subject matter: Copyright and Related Rights (Neighboring Rights)
Plaintiffs: Individual GitHub users who are proceeding anonymously under the pseudonym “Doe”
Defendants: GitHub, Inc.; Microsoft Corporation
Keywords: Copyright, Standing, Preemption, Motion to Dismiss, Artificial Intelligence, AI, Source Code, Training Data
Basic facts: Plaintiffs are software developers who use GitHub to host their software projects and to write and manage their source code. They bring several claims against Defendants arising from the development and operation of Copilot and Codex, two artificial intelligence-based coding tools. GitHub permits software developers or programmers to collaborate on projects stored in repositories. GitHub users can choose whether a repository is private or public and what type of license, if any, the repository grants to the public. All code uploaded to GitHub is subject to the GitHub Terms of Service. GitHub’s terms provide that users retain ownership of content they upload to GitHub, subject to GitHub’s “right to store, archive, parse, and display [the content], and make incidental copies, as necessary to provide the Service, including improving the Service over time.” ECF No. 1–2 at 27. GitHub and OpenAI developed and released Copilot and Codex, which are AI tools that use machine learning to produce source code. Machine learning is a method by which a program studies extensive amounts of “training data” and then, upon request, creates an output based on the data it has studied. Defendants developed Codex and Copilot using “billions of lines” of publicly available code as training data, including code from public GitHub repositories. Plaintiffs allege that, in doing so, Defendants violated the Digital Millennium Copyright Act, 17 U.S.C. §§ 1201–05; engaged in unfair competition in violation of the Lanham Act, 15 U.S.C. § 1125; and are liable under a number of state law claims. Defendants filed motions to dismiss Plaintiffs’ claims pursuant to Rules 12(b)(1) and 12(b)(6) of the Federal Rules of Civil Procedure.
Held: Does 1, 2, and 5 have sufficiently pleaded that Defendants’ programs will reproduce Plaintiffs’ licensed code as output, thus causing a concrete, particularized, and actual or imminent injury. Does 3 and 4 have not alleged instances where their code has been an output of Defendants’ programs. Thus, Does 1, 2, and 5 have standing to pursue claims for both injunctive relief and damages, whereas Does 3 and 4 have standing to pursue only claims for injunctive relief. Further, Plaintiffs’ state law claims (including intentional and negligent interference with prospective economic relations, unjust enrichment, unfair competition, and negligence) are preempted by Section 301 of the Copyright Act.
Relevant holdings in relation to Frontier Technologies and Intellectual Property Adjudication: At the pleading stage, plaintiffs alleging that their independent creations have been used as training data must allege more than a mere possibility that the AI tool at issue will output those creations.
Relevant legislation: Digital Millennium Copyright Act, 17 U.S.C. §§ 1201–05; Lanham Act, 15 U.S.C. § 1125; Copyright Act, 17 U.S.C. § 301