Debug and improve a deliberately broken AI system spanning ML and GenAI components. Participants must identify hidden bugs, fix performance and stability issues, and optimize the system under time constraints. Advanced teams may also analyze competing solutions to find and exploit weaknesses. Submissions are evaluated by an LLM-based judge that scans repositories against a predefined set of issues, rewarding both accuracy and depth of fixes.
Rankings will be announced soon.
Work on the provided broken AI system (ML + GenAI components)
Identify and fix as many bugs as possible
Improve performance, reliability, and system stability
Do not remove core functionality; fixes must preserve intended behavior
Optional: analyze and exploit flaws in competing solutions (within the allowed scope)
Ensure all fixes are properly implemented and testable in the repository
Maintain clean, modular, and well-documented code
Scoring is based on the number and difficulty of bugs fixed, as verified by an LLM-based judge (a sketch of how such a judge could work follows below)
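The snippet below is a minimal, hypothetical sketch of how an LLM-based judge might scan a repository against a predefined issue list and weight fixes by difficulty. None of these names (Issue, ask_llm, judge_repository) come from the actual competition infrastructure; the LLM call is stubbed out so the example runs standalone.

```python
# Hypothetical sketch of an LLM-based judge; not the competition's real implementation.
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Issue:
    issue_id: str      # identifier from the predefined issue list
    description: str   # what the bug is and how a correct fix should behave
    difficulty: int    # harder bugs are assumed to be worth more points

def ask_llm(prompt: str) -> bool:
    """Stand-in for an LLM call that answers yes/no.

    A real judge would send the prompt to a model API and parse the reply;
    here we return False so the sketch runs without network access.
    """
    return False

def judge_repository(repo_root: Path, issues: list[Issue]) -> int:
    """Score a submission by asking the LLM whether each known issue is fixed."""
    source = "\n\n".join(
        p.read_text(encoding="utf-8", errors="ignore")
        for p in sorted(repo_root.rglob("*.py"))
    )
    score = 0
    for issue in issues:
        prompt = (
            f"Known issue: {issue.description}\n\n"
            f"Repository source:\n{source}\n\n"
            "Is this issue correctly fixed? Answer yes or no."
        )
        if ask_llm(prompt):
            score += issue.difficulty  # deeper, harder fixes earn more
    return score

if __name__ == "__main__":
    demo_issues = [
        Issue("ML-01", "Data leakage: test rows appear in the training split", 3),
        Issue("GEN-02", "Prompt template drops the user's system instructions", 2),
    ]
    print(judge_repository(Path("."), demo_issues))
```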
Scoring System
Teams earn tournament points based on their final rank in each event. Higher ranks contribute more to the overall leaderboard. Raw scores are used to determine final ranking but are not added directly to the total tournament score.
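As a hypothetical illustration of this rank-to-points conversion: raw event scores only order the teams, and a fixed points table (the values below are invented for the example, not the real tournament's) maps each rank to the points added to the overall leaderboard.

```python
# Minimal sketch of rank-based tournament scoring under an assumed points table.
from collections import defaultdict

# Hypothetical mapping: higher ranks contribute more tournament points.
POINTS_BY_RANK = {1: 100, 2: 70, 3: 50, 4: 30, 5: 20}

def event_ranks(raw_scores: dict[str, float]) -> dict[str, int]:
    """Raw scores only order the teams within one event; they are never summed."""
    ordered = sorted(raw_scores, key=raw_scores.get, reverse=True)
    return {team: rank for rank, team in enumerate(ordered, start=1)}

def update_leaderboard(leaderboard: dict[str, int], raw_scores: dict[str, float]) -> None:
    """Add rank-based points for one event to the overall leaderboard."""
    for team, rank in event_ranks(raw_scores).items():
        leaderboard[team] += POINTS_BY_RANK.get(rank, 10)  # small default for lower ranks

if __name__ == "__main__":
    leaderboard: dict[str, int] = defaultdict(int)
    update_leaderboard(leaderboard, {"Team A": 87.5, "Team B": 92.0, "Team C": 61.0})
    update_leaderboard(leaderboard, {"Team A": 75.0, "Team B": 40.0, "Team C": 88.0})
    print(dict(leaderboard))  # {'Team A': 140, 'Team B': 150, 'Team C': 150}
```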