• Droggl@lemmy.sdf.org
    11 months ago

    I don't remember the numbers, but iirc it was covered by one of the validation datasets, and GPT-4 did quite well on it.

    • Maestro@kbin.social
      11 months ago

      Yeah, but did it do well on the specific examples from the Winograd paper? Because ChatGPT probably just learned those, since they are well known and oft repeated. Or does it do well on brand-new sentences written according to the Winograd schema?