AI and Copyright: When AI Training Becomes Direct Infringement

Direct Infringement

Many companies are already using AI to create content, analyze data, and even train their own internal models. But behind that efficiency, there’s one legal risk in the AI and Copyright space that often gets overlooked: using copyrighted data to train AI can be considered direct copyright infringement.

Under the law, copyright holders have the exclusive right to reproduce their works, as provided in:

  • 17 U.S.C. § 106 (Copyright Act of 1976)

  • Article 9(1) of the Berne Convention

The issue is that AI training almost always involves copying and storing digital works within datasets. From a legal standpoint, this process may be viewed as reproduction. So if it’s done without permission or a proper license, the training activity itself could already be treated as direct infringement, even before the AI produces any output.

Why Fair Use May Not Always Apply

Several U.S. federal court decisions since 2023 suggest that the fair use defense under 17 U.S.C. § 107 does not automatically apply in the context of AI training.

Courts are now looking at whether the use of datasets without authorization:

  • harms the potential licensing market for AI training data, or

  • allows AI-generated outputs to replace original works in the marketplace.

Because AI can generate large volumes of content that serves the same economic function as the original works, this creates risks of:

  • market substitution, and

  • market dilution.

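If AI-generated outputs directly compete with original works or reduce licensing opportunities, the use of such data may be seen as causing market harm. Market harm weighs against fair use under the fourth factor of 17 U.S.C. § 107, and without that defense the underlying copying can support a claim of direct infringement.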

This Is Not Just a Risk for AI Developers

Companies that:

  • train models using third-party data,

  • use AI-generated content for marketing purposes, or

  • integrate AI into commercial digital products

may still face the risk of violating reproduction rights if the datasets used do not have a clear licensing basis.

In cases of direct infringement, liability can arise even without intent or deliberate wrongdoing.

Licensing Is Becoming Part of the Solution

That’s why many organizations are now considering licensed datasets as a practical way to:

  • avoid infringing reproduction rights, and

  • reduce the risk of direct infringement claims in the future.

If your business uses AI for commercial purposes, it’s worth making sure that your training processes do not expose you to direct copyright infringement claims. The Intellectual Property team at amr.co.id can assist in assessing AI-related legal exposure and developing appropriate licensing strategies within the AI and Copyright framework.

For more information about AMR Partnership, feel free to contact us.
