Methodology
How we evaluate AI models on real-world creative work, shaped by Contra’s 1M+ global members.
All categories
Design
Writing
Marketing
Social Media
Engineering
Video & Animation
Music & Audio
Top skills
Graphic Designer
14.84%
Web Developer
10.55%
Web Designer
10.1%
UI Designer
9.04%
Brand Designer
8.46%
Video Editor
8.29%
Top tools
Adobe Suite
48.222%
Figma
27.034%
Canva
18.253%
WordPress
11.748%
React
9.298%
JavaScript
8.22%
Overview
Creative Arena compares image generation models on tasks that mirror real paid projects commissioned on Contra. We convert anonymized deliverables into prompts, run controlled tournaments with four models at a time, and update overall and per-category Elo ratings after every battle.
Real-world grounding
Prompts originate from anonymized client projects
Controlled bracket
Fixed six-battle mini-tournaments yield a full 1st–4th ordering
Continuous ratings
Elo starts at 1500 with K=32; updates occur every battle
Categories
We currently evaluate models across the following practical design categories:
Landing Page
Ad
UI Component
Data Visualization
Moodboard
Logo
Video
Categories reflect typical client deliverables on Contra. Results are reported both overall and per category.
Data sourcing & prompt generation
1
Collect deliverables
We sample deliverables from real, completed Contra projects.
2
Anonymize & sanitize
We remove personally identifiable information (PII), trademarks, and any client-specific terms that would reveal identity or confidential details.
3
Category classification
Deliverables are run through a classifier (LLM-assisted) to map to one of the Arena categories.
4
Prompt drafting
From the anonymized deliverable, we generate a prompt that captures the intent, constraints, and style of the original request while remaining generic and safe.
5
Image generation
An image is generated for the given prompt for each active model.
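The anonymize and sanitize step above can be illustrated with a small sketch. This is only an assumption about what such a pass might look like: the `redact` function, the placeholder tokens, and the regular expressions are hypothetical, and the real pipeline is not described at this level of detail.

```python
import re

# Hypothetical patterns; a production scrubber would cover far more PII types.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text, client_terms=()):
    """Replace URLs, email addresses, and known client terms with placeholders."""
    text = URL_RE.sub("[URL]", text)
    text = EMAIL_RE.sub("[EMAIL]", text)
    for term in client_terms:
        # Case-insensitive literal match on each client-specific term.
        text = re.sub(re.escape(term), "[CLIENT]", text, flags=re.IGNORECASE)
    return text


print(redact("Contact jane@example.com or see https://acme.io/brief"))
print(redact("Acme Corp launch page", client_terms=["Acme Corp"]))
```

A rule-based pass like this would at most be a first filter; trademarks and confidential details generally need review beyond pattern matching.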
Example tournament format (4 models, 6 battles)
Category selection
A user selects a category (or one is randomly selected).
Prompt selection
A pre-generated category prompt is selected at random.
Model sampling
Four distinct models are chosen from the active pool.
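The three selection steps above could be sketched as follows. The function name `draw_matchup` and the shape of the inputs (a dict of prompts keyed by category, a list of model names) are assumptions made for illustration, not Contra's actual data model.

```python
import random


def draw_matchup(prompts_by_category, active_models, category=None, rng=random):
    """Pick a category (unless the user chose one), a prompt, and four distinct models."""
    if category is None:
        # No user choice: select a category at random.
        category = rng.choice(sorted(prompts_by_category))
    prompt = rng.choice(prompts_by_category[category])
    # random.sample draws without replacement, so the four models are distinct.
    models = rng.sample(active_models, 4)
    return category, prompt, models


prompts = {"Logo": ["logo prompt 1", "logo prompt 2"], "Ad": ["ad prompt 1"]}
pool = ["model-a", "model-b", "model-c", "model-d", "model-e"]
print(draw_matchup(prompts, pool, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance makes a draw reproducible, which can help when auditing a specific tournament.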
Initial battles
Battle 1
Model A
vs
Model B
Battle 2
Model C
vs
Model D
Winner & loser brackets
Battle 3
Winners
Model A
vs
Model D
Battle 4
Losers
Model B
vs
Model C
Tiebreaker
Battle 5
1 win each
Model B
vs
Model D
Battle 6
2 wins each
Model A
vs
Model B
Final ranking
1st
Model A

2nd
Model B
3rd
Model C
4th
Model D
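One way to read the six-battle format is sketched below. The placement rule is an assumption inferred from the battle labels (winners' match, losers' match, "1 win each", "2 wins each"); the `run_mini_tournament` function and the `judge` callback are hypothetical names, and the sample judge is a toy stand-in for a human vote, so its pairings are not meant to reproduce the walkthrough above.

```python
def run_mini_tournament(models, judge):
    """Run six battles among four models and return (ranking, battle_log).

    `judge(x, y)` must return whichever of x or y won the battle.
    """
    a, b, c, d = models
    log = []

    def play(x, y):
        winner = judge(x, y)
        if winner not in (x, y):
            raise ValueError("judge must return one of the two contestants")
        loser = y if winner == x else x
        log.append((x, y, winner))
        return winner, loser

    w1, l1 = play(a, b)                        # Battle 1
    w2, l2 = play(c, d)                        # Battle 2
    top, dropped = play(w1, w2)                # Battle 3: winners
    survivor, fourth = play(l1, l2)            # Battle 4: losers
    challenger, third = play(dropped, survivor)  # Battle 5: 1 win each
    first, second = play(top, challenger)      # Battle 6: 2 wins each
    return [first, second, third, fourth], log


# Toy judge: the alphabetically earlier name always wins.
ranking, log = run_mini_tournament(["Model A", "Model B", "Model C", "Model D"], min)
print(ranking)
print(len(log), "battles")
```

Under this reading, every model plays at least two battles, and the fixed schedule always produces a complete 1st to 4th ordering without ties.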
Fairness & bias controls
Left/Right randomization
Each battle randomizes side assignment.
Blind judging
No model names, vendors, prompts, or metadata are shown to judges.
Prompt hygiene
Prompts are anonymized, policy-compliant, and category-consistent.
Balanced exposure
Scheduler ensures broad coverage across models and pairings over time.
Audit sampling
A subset of matches is reviewed by humans for quality control.
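Left/right randomization and blind judging from the list above can be combined in one small step, sketched here. The `present_battle` function and its return shape are assumptions for illustration: the judge-facing view carries only the two outputs, while the answer key that maps sides back to models stays out of the judge's sight.

```python
import random


def present_battle(output_a, output_b, rng=random):
    """Randomize side assignment and separate what judges see from the answer key."""
    entries = [("a", output_a), ("b", output_b)]
    rng.shuffle(entries)  # each ordering is equally likely
    (left_id, left_output), (right_id, right_output) = entries
    judge_view = {"left": left_output, "right": right_output}  # no names or metadata
    answer_key = {"left": left_id, "right": right_id}          # kept server-side
    return judge_view, answer_key


view, key = present_battle("image-from-a.png", "image-from-b.png", rng=random.Random(7))
print(view)
print(key)
```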
Ratings (Elo)
We maintain two Elo ratings per model: an overall Elo and a per-category Elo. All models start at 1500. After every battle, we apply an update with K = 32.
1500
Starting Elo rating
K = 32
Update factor after each battle
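A minimal sketch of the update follows, assuming the standard Elo expected-score formula with a 400-point scale (the page states only the starting rating and K, so the formula is an assumption). The names `expected_score`, `apply_battle`, and `record_battle` are hypothetical.

```python
from collections import defaultdict

START_ELO = 1500.0
K = 32


def expected_score(rating, opponent):
    """Standard Elo expectation (assumed): win probability of `rating` vs `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def apply_battle(ratings, winner, loser, k=K):
    """Shift K * (1 - expected) points from the loser to the winner."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


# Two rating tables per model: one overall, one per category.
overall = defaultdict(lambda: START_ELO)
per_category = defaultdict(lambda: defaultdict(lambda: START_ELO))


def record_battle(category, winner, loser):
    apply_battle(overall, winner, loser)
    apply_battle(per_category[category], winner, loser)


record_battle("Logo", "model-a", "model-b")
print(overall["model-a"], overall["model-b"])  # 1516.0 1484.0
```

With two models at equal ratings the expected score is 0.5, so the winner gains 16 points and the loser drops 16. An upset against a higher-rated model moves more points, and an expected win moves fewer.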
Base prompts
Base prompts are defined for each category and used in combination with the custom or pre-generated prompts.
Landing Page
Ad
UI component
Data visualization
Moodboard
Logo
Video
You are an expert web developer tasked with building a website. Follow these requirements:
Generate a complete and valid HTML document with DOCTYPE and meta tags.
Return raw HTML that can be used directly without any additional processing.
Use inline vanilla CSS and JavaScript where possible.
When an external dependency is needed, use UNPKG.
Use semantic HTML elements (nav, main, section, article, etc.).
Be accessible, with professional design and good contrast.
Generate mobile-first responsive design using modern CSS techniques (e.g., Grid/Flexbox).
Write clean, readable code with proper spacing.
Your only output should be a markdown code block.
Example output:
html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Page Title</title>
<script src="https://unpkg.com/chart.js"></script>
<style>
body {
font-family: sans-serif;
margin: 0;
}
</style>
</head>
<body>
<h1>Hello, World!</h1>
<p>This is an HTML page.</p>
</body>
</html>
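How a base prompt is combined with a task prompt, and how the returned code block is unwrapped, is not specified here. The sketch below assumes simple concatenation and a single fenced block in the model's reply; `build_prompt` and `extract_html` are hypothetical helpers.

```python
import re

FENCE = "`" * 3  # markdown code fence, built this way to keep the example readable

# Matches a fenced block with an optional "html" language tag.
BLOCK_RE = re.compile(
    FENCE + r"(?:html)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def build_prompt(base_prompt, task_prompt):
    """Assumed combination: base prompt first, then the task-specific prompt."""
    return f"{base_prompt.strip()}\n\n{task_prompt.strip()}"


def extract_html(response):
    """Return the contents of the first fenced block, or None if there is none."""
    match = BLOCK_RE.search(response)
    return match.group(1).strip() if match else None


reply = FENCE + "html\n<!DOCTYPE html>\n<html lang=\"en\"></html>\n" + FENCE
print(extract_html(reply))
```

Returning `None` for a reply without a fenced block lets the caller decide whether to retry or discard that generation.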
FAQs
What makes Contra's Creative Arena different?
Prompts start from anonymized deliverables of real, paid client projects on Contra, keeping tasks practical and grounded.
How are models selected for tournaments?
Four distinct models are sampled from the active pool. Side assignment (left/right) is randomized every battle.
Do ratings change after every battle?
Yes. Elo ratings update after each individual battle. We maintain both overall and per-category ratings.