• mienshao@lemm.ee
    13 hours ago

    American law has become a literal fucking joke (IAAL). I could’ve guessed the outcome of this case without any facts: the huge corporation wins over authors. American law is no longer capable of holding major corporations to account, so we need a new legal system, one that’s actually functional.

    • Dr. Moose@lemmy.world
      11 hours ago

      But the actual process of an AI system distilling from thousands of written works to be able to produce its own passages of text qualified as “fair use” under U.S. copyright law because it was “quintessentially transformative,” Alsup wrote.

      That’s the actual argument, and the judge is right here. LLMs are transformative in every sense of the word. The technology is even called “transformers”.

        • Dr. Moose@lemmy.world
          3 hours ago

          Nope, I’m literally a data programmer working in this field. Any sufficiently transformed data, even if it comes from hard-copyrighted material, is a transformative work, and LLMs currently meet this criterion and will continue to do so. Wanna bet?

          • LwL@lemmy.world
            1 hour ago

            I think there’s a blurry line here: you can easily train an LLM to just regurgitate the source material by overfitting, so at what point is it “transformative enough”? I think there’s little doubt that current flagship models usually are transformative enough, but that doesn’t apply to everything using the same technology, even though this case will be used as precedent for all of it.

            There’s also another issue: while safeguards are generally in place, without them LLMs would be very capable of quoting entire pages, at least of popular books. And jailbreaking LLMs isn’t exactly unheard of. They also, at least in the past, really liked to repeat news articles on obscure topics verbatim.

            What I’m mainly getting at is that LLMs can be transformative, but they can also plagiarize, much like any human could. The question then is: if training LLMs on copyrighted data is allowed, will the company be held accountable when its LLM does plagiarize, the same way a person would be? Or would the better decision be to prohibit training on copyrighted data, because meaningful transformation cannot be guaranteed and it is very hard for copyright holders to actually find these violations?

            Though idk the case details, if the argument was purely focused on using the material to produce the model, rather than including the ultimate step of outputting text to anyone who asks, it was probably doomed to fail from the start and the decision makes perfect sense. And that doesn’t seem too unlikely to have happened because realizing this would require the lawyer making the case to actually understand what training an LLM does.
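
To make the regurgitation point in the comment above a bit more concrete, here is a rough sketch of one way to check how much of a model's output is copied verbatim from a source text. It is only an illustration: the function names, the word-level n-gram approach, and the example strings are assumptions made here, not anything from the case or from how any real LLM vendor tests for memorization.

```python
def ngrams(tokens, n):
    """Return the set of all consecutive n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(source: str, output: str, n: int = 5) -> float:
    """Fraction of the output's word n-grams that also appear in the source.

    Hypothetical helper: 1.0 means every n-word window of the output was
    copied from the source, 0.0 means none was.
    """
    src = ngrams(source.lower().split(), n)
    out = ngrams(output.lower().split(), n)
    if not out:
        # Output is shorter than n words, so there is nothing to compare.
        return 0.0
    return len(out & src) / len(out)


# Illustrative strings only (assumed for this sketch).
source = "it was the best of times it was the worst of times"
regurgitated = "it was the best of times it was the worst"
paraphrased = "the era was both wonderful and terrible at once"

copied_score = verbatim_overlap(source, regurgitated, n=3)
paraphrase_score = verbatim_overlap(source, paraphrased, n=3)
```

A score near 1 would suggest the output is mostly copied, and a score near 0 would suggest it is not copied word for word. Where the "transformative enough" threshold sits between those is exactly the open question, and a word-overlap number like this cannot answer it by itself.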