COBOLEval: How Well Can Large Language Models (LLMs) Write Code?

(4G)

Stream: Virtual Room 4
Time: 09:00 - 10:00


Presentation

LLMs are rapidly changing the way we write software. Over a million developers now pay for GitHub Copilot, and recent breakthroughs in LLM reasoning have brought the dream of a fully autonomous AI software engineer closer to reality. But while it’s not hard to find a demo of an LLM coding a website or a clone of Flappy Bird, not much is known about their ability to write COBOL.

We've developed a new benchmark to evaluate the ability of LLMs to write COBOL. This presentation walks through the rationale, our design decisions, and the results from some popular LLMs.

Speakers


  • Gabriel Gordon-Hall at bloop AI Limited
  • Cofounder & CTO @ bloop


    Email: gabriel@bloop.ai

