Legal Challenge 2026 is an international engineering competition focused on building production-grade AI systems for the legal domain. Teams develop Agentic & RAG solutions evaluated for accuracy, faithfulness to legal sources, and real-time performance.
This is not a hackathon but a benchmark-style challenge, designed to test systems under real-world conditions with conference-level rigor.
Reach: Global
Attendees: 3,500+
AWARDS CEREMONY AT MACHINES CAN SEE
Strategic Partners
The winners will be announced during a special online awards event affiliated with Machines Can See, bringing together competition participants, organizers, and members of the global AI community, and showcasing the best solutions developed during the challenge.
The Agentic RAG Legal Challenge is a flagship event of Dubai AI Week and part of the Machines Can See 2026 ecosystem.
Dubai, UAE — April 2026
Benchmark Principles & Guarantees
The pillars defining the most rigorous legal AI evaluation framework ever created.
Open Results
Final datasets, telemetry, and winning approaches are released for transparent and repeatable research.
Anti-Gaming
Private test set, mandatory telemetry, chunk-ID validation, and code audits prevent gaming and ensure fairness.
Real-World Corpus
A corpus of real regulations, case law, and long-form contracts that reflects true legal complexity.
Focus on Faithfulness
Strict grounding checks ensure every answer is verifiably based on retrieved legal sources.
Production Evaluation
Systems are tested as full production pipelines, including ingestion, retrieval, generation, and telemetry.
PRODUCTION-GRADE EVALUATION FRAMEWORK
Designed to mirror real-world deployment conditions, our framework evaluates accuracy, faithfulness, latency, and system integrity as a unified production benchmark.

01. Deterministic Fact Checking
Strict evaluation for exact-answer questions, verifying factual correctness and required data types.
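For illustration, a minimal sketch of what such a deterministic check could look like, assuming a gold answer with a required type; the function name, type-coercion rules, and fields are our assumptions, not the official harness:

```python
# Hypothetical exact-answer check: the submission must parse to the required
# data type AND equal the gold value exactly. Schema is illustrative only.
from datetime import date

def check_exact_answer(submitted: str, expected_value, expected_type: type) -> bool:
    try:
        if expected_type is date:
            parsed = date.fromisoformat(submitted.strip())
        else:
            parsed = expected_type(submitted.strip())
    except (ValueError, TypeError):
        return False  # wrong data type -> no credit
    return parsed == expected_value

# Example: a question whose gold answer is the integer 30
assert check_exact_answer("30", 30, int)
assert not check_exact_answer("thirty", 30, int)
```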
02. LLM-as-Judge Scoring
Generative explanations are evaluated by an LLM judge for correctness, completeness, grounding, clarity, and uncertainty handling.
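A sketch of how a judge prompt covering these five criteria might be phrased; the wording, scale, and output format are illustrative assumptions, since the official rubric is not reproduced here:

```python
# Hypothetical judge prompt template; fill {question}, {sources}, {answer}
# via str.format. The 0-5 scale and JSON output shape are assumptions.
JUDGE_PROMPT = """You are grading a legal RAG answer. Score each criterion 0-5:
1. Correctness: is the legal conclusion right?
2. Completeness: are all required points covered?
3. Grounding: is every claim supported by the retrieved sources?
4. Clarity: is the explanation well structured?
5. Uncertainty handling: are gaps and caveats stated honestly?

Question: {question}
Retrieved sources: {sources}
Answer under review: {answer}

Return JSON: {{"correctness": 0, "completeness": 0, "grounding": 0,
"clarity": 0, "uncertainty": 0}}"""
```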
03. Mandatory Telemetry
Each answer must include full telemetry (TTFT, total latency, token usage, and retrieved chunk IDs), with penalties for missing data.
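A per-answer telemetry record might look like the following; the field names are our guesses at a schema, so consult the starter kit for the real one:

```python
# Hypothetical telemetry record covering the fields named above.
from dataclasses import dataclass, field

@dataclass
class AnswerTelemetry:
    question_id: str
    ttft_ms: float            # time to first token
    total_latency_ms: float   # end-to-end response time
    prompt_tokens: int
    completion_tokens: int
    retrieved_chunk_ids: list[str] = field(default_factory=list)

record = AnswerTelemetry(
    question_id="q-0042",     # illustrative ID
    ttft_ms=310.0,
    total_latency_ms=2450.0,
    prompt_tokens=3120,
    completion_tokens=270,
    retrieved_chunk_ids=["difc-reg-12#c3", "difc-reg-12#c7"],
)
```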
04. Grounding Verification
Retrieval is validated through gold-chunk matching; if the correct sources are not retrieved, the grounding score is zero.
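A minimal sketch of gold-chunk matching: only the zero-on-miss rule comes from the description above; the recall-style partial credit is our assumption.

```python
# Hypothetical grounding check: zero if no gold chunk was retrieved,
# otherwise the fraction of gold chunks recovered (assumed partial credit).
def grounding_score(retrieved_ids: set[str], gold_ids: set[str]) -> float:
    hits = retrieved_ids & gold_ids
    if not hits:
        return 0.0  # correct sources not retrieved -> grounding is zero
    return len(hits) / len(gold_ids)

assert grounding_score({"a", "b"}, {"b", "c"}) == 0.5
assert grounding_score({"a"}, {"b", "c"}) == 0.0
```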
05. Latency as a Metric
Time-to-First-Token directly impacts scoring, rewarding fast, production-grade systems and penalizing slow responses.
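One plausible shape for a TTFT-based score: full credit under a fast threshold, zero past a slow one, linear in between. Both thresholds and the curve are assumptions for illustration:

```python
# Hypothetical latency scoring curve; thresholds are invented for the sketch.
def latency_score(ttft_ms: float, fast_ms: float = 500.0, slow_ms: float = 5000.0) -> float:
    if ttft_ms <= fast_ms:
        return 1.0
    if ttft_ms >= slow_ms:
        return 0.0
    return (slow_ms - ttft_ms) / (slow_ms - fast_ms)
```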
06. Private-Set Evaluation
Final rankings are determined by a 24-hour evaluation run on a private test set, ensuring fairness and resistance to gaming.
Prizes
The prize pool is divided between Main Prizes (overall ranking) and Special Nominations (engineering excellence in specific areas):

Main Prizes (by Total Score)
1st Place: $12,000
2nd Place: $8,000
3rd Place: $4,000
Special Nominations

Best Publication (2x): $1,000
AI popularization: best video/post/etc. about the competition (by the jury's choice)

Retrieval Master: $2,000
Best grounding: highest Grounding Score (Total Score ≥ 70%)

Efficiency Expert: $2,000
Most token-efficient: highest Score/Token ratio (Total Score ≥ 70%; see the worked example below)

Speed Champion: $2,000
Fastest solution: lowest average TTFT (Total Score ≥ 70%)
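For intuition on the Score/Token ratio: if one team scores 82% using 4.0M total tokens and another scores 78% using 1.5M, the ratios are 82 / 4.0 = 20.5 versus 78 / 1.5 = 52.0 points per million tokens, so the leaner system takes the nomination despite the lower total score (both clear the 70% floor). Per-million-token normalization is our assumption; the organizers define the exact formula.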
All top-3 teams receive a fully funded trip to Dubai for the Machines Can See 2026 Summit.
Schedule
Key milestones of the challenge, from registration and onboarding to final evaluation and awards.

Phase 01: Sign-up & Onboarding (11 February – 11 March)
Team registration opens. Community support via Discord.

Phase 02: Competition Start (11 March – 18 March)
Starter kit & documentation. Demo dataset (30 documents). 100 sample questions released. Live leaderboard updates.

Phase 03: Active Competition (18 March – 25 March)
Full corpus unlocked (300+ docs). Full evaluation (1,000+ questions). Final submission: March 22.

Final Event: Awards (6 April – 9 April)
Winners announced at the MCS online event. Winning teams' pitches. Part of Dubai AI Week.
FAQ
Are the documents real or synthetic?
All documents are real, public legal materials. No synthetic or toy documents are used.
Which models and APIs can we use?
Any publicly accessible APIs for LLMs, embeddings, or search. Private, self-hosted, or undisclosed models are not allowed, to ensure reproducibility.
What telemetry must each answer include?
Every answer must include trace data: timing, TTFT, token usage, and retrieved chunk IDs. Missing telemetry causes a 10% penalty for that answer. Telemetry also supports anti-gaming and fairness.
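A minimal sketch of how the stated 10% penalty could be applied per answer; the multiplicative form is our reading of the rule, not a published formula:

```python
# Hypothetical per-answer penalty: 10% off when telemetry is incomplete.
def apply_telemetry_penalty(answer_score: float, telemetry_complete: bool) -> float:
    return answer_score if telemetry_complete else answer_score * 0.9
```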
Will the data and results be released?
Yes. The full private set, results, and winning approaches are released after the awards ceremony for reproducibility and community research.
Who organizes the competition?
The competition is organized in partnership with Machines Can See, as part of Dubai AI Week. This is a pilot edition, with future iterations planned.
Where can I ask questions?
All questions should be submitted via Discord. There are dedicated channels:
#questions: general inquiries
#tech-support: technical issues
#judgment-support: evaluation and scoring questions
Registered participants receive access to all relevant channels.
Do the organizers provide API keys or credits?
No. Participants must independently choose and access the models they wish to use (proprietary or open-source) and cover any associated costs. The competition is model-agnostic, and no API keys or usage credits are provided by the organizers.
Is there a required model or stack?
No. Participants may use:
Any LLM (OpenAI, Anthropic, open-source, etc.)
Any retrieval model
Any additional models integrated into their pipeline
There are no restrictions on architecture or tooling.
Does a bigger model or more tokens guarantee a win?
No, the competition is not won by using more tokens or more powerful models. Evaluation considers multiple factors:
Answer quality
Grounding (clear citations to exact documents or sections)
Retrieval quality
Performance metrics (including Time to First Token)
A highly advanced model without proper grounding or with poor latency will score lower than a well-engineered system with strong retrieval and citations. Success depends on the overall system design, not on model size or token usage (a weighted-score sketch follows below).
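A sketch of how such multi-factor scoring could combine, with invented weights; the real weighting is defined by the organizers and is not published in this FAQ:

```python
# Hypothetical weighted total; all weights are assumptions for illustration.
def total_score(answer_quality: float, grounding: float,
                retrieval: float, latency: float) -> float:
    return (0.40 * answer_quality
            + 0.25 * grounding
            + 0.20 * retrieval
            + 0.15 * latency)
```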
What is in the dataset?
The dataset consists of:
~300 real-world legal and corporate documents (primarily from the Dubai International Financial Centre, DIFC)
1,000 evaluation questions, divided into:
Public Test Set: 100 questions across 30 documents, available from March 11 for local testing
Private Test Set: 900 questions covering the full corpus, released on March 18
Question types include:
Single-document fact extraction
Clause and reference analysis
Multi-document reasoning
Negative and adversarial queries
A preliminary version of the public dataset is available via Discord in the #participant-announcements channel after registration.
Stay in the Loop
Not ready to compete? Get early access to the next challenge, research releases, and Dubai AI Week announcements.