Model card · pressure-v1
PENDING TRAINING
Pressure Score v1
A per-ball expected-value model that quantifies batting pressure in T20 cricket. For each delivery, the model estimates the probability distribution over (runs, wicket) outcomes given the ball context (phase, position, run-rate, wickets fallen). The difference between expected and actual outcomes accumulates into a per-innings Pressure Score — positive = the bowling side has exerted above-expected pressure.
validate-model-card-provenance.mjs).
Purpose
Answers "How much pressure was a batter or bowler under/exerting, accounting for match context?" — a question raw economy or strike-rate cannot answer without context adjustment.
Metric emitted
Pressure Score
Profile pillar
P3 (Form & phase)
Output range
0–100 (higher = more pressure on the batting side)
Training corpus
The training corpus must trace to ball-by-ball deliveries. Any model that cannot produce this chain is not eligible under §3.3-model.
Source
CricketStudio ball-by-ball corpus — IPL historical (Cricsheet CC BY 3.0)
Seasons
2007/08–2025 (18 seasons)
Match count
pending training
Delivery count
pending training
Train/val/test split
Season-based split — most recent 2 seasons held out for validation and test; earlier seasons used for training. Prevents data leakage across season boundaries.
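A minimal sketch of the season-based split, for illustration only. The card says the two most recent seasons are held out for validation and test; assigning the second-most-recent season to validation and the most recent to test is an assumption here, as is the function name `season_split`. Season labels are assumed to sort chronologically as plain strings (e.g. "2007/08" before "2009").

```python
from typing import Dict, Iterable, List


def season_split(seasons: Iterable[str]) -> Dict[str, List[str]]:
    """Partition season labels into train / val / test by recency.

    Assumption (not stated in the card): of the two held-out seasons,
    the older one is used for validation and the newest for test.
    """
    ordered = sorted(set(seasons))  # assumes labels sort chronologically
    if len(ordered) < 3:
        raise ValueError("need at least three seasons to split")
    return {
        "train": ordered[:-2],
        "val": [ordered[-2]],
        "test": [ordered[-1]],
    }


def split_of(season: str, split: Dict[str, List[str]]) -> str:
    """Return which partition a delivery from `season` belongs to."""
    for name, members in split.items():
        if season in members:
            return name
    raise KeyError(f"season {season!r} is not in any partition")
```

Because every delivery inherits the partition of its season, no match contributes deliveries to more than one partition.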
Notes
IPL 2026 ball-by-ball corpus excluded from training (it is the primary inference target). Cricsheet CC BY 3.0 license. No proprietary or undisclosed data sources.
Features (11)
All features are derived from the ball-by-ball corpus — no external or proprietary data.
| Feature | Type | Description |
|---|---|---|
| over | numeric | Over number within the innings (1–20) |
| ball_in_over | numeric | Delivery within the over (1–6) |
| phase | categorical | Innings phase: powerplay (1–6), middle (7–15), death (16–20) |
| balls_remaining | numeric | Balls remaining in the innings at delivery time |
| wickets_fallen | numeric | Wickets lost by batting side at delivery time |
| runs_scored | numeric | Runs on board at delivery time |
| run_rate_required | numeric | Required run rate at delivery time (second-innings only; 0 for first) |
| innings | categorical | First (1) or second (2) innings |
| venue_par | numeric | Historical par score at the venue (from venue-signature data) |
| batter_handedness | categorical | Right-handed (0) or left-handed (1) |
| bowler_type | categorical | Pace / off-spin / leg-spin / left-arm-spin / left-arm-pace |
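Three of the features in the table can be derived from the delivery position and scoreboard alone. The sketch below is illustrative, not the project's pipeline: the function names and the `target` parameter are hypothetical, `balls_remaining` is assumed to count the current delivery as still to be bowled, and extras (wides, no-balls) are ignored.

```python
from typing import Optional

BALLS_PER_OVER = 6
OVERS_PER_INNINGS = 20


def phase_for_over(over: int) -> str:
    """Map an over number (1-20) to the phase boundaries given in the table."""
    if not 1 <= over <= OVERS_PER_INNINGS:
        raise ValueError("over must be in 1..20")
    if over <= 6:
        return "powerplay"
    if over <= 15:
        return "middle"
    return "death"


def balls_remaining(over: int, ball_in_over: int) -> int:
    """Legal balls left, counting the current delivery (an assumption)."""
    if not 1 <= over <= OVERS_PER_INNINGS:
        raise ValueError("over must be in 1..20")
    if not 1 <= ball_in_over <= BALLS_PER_OVER:
        raise ValueError("ball_in_over must be in 1..6")
    already_bowled = (over - 1) * BALLS_PER_OVER + (ball_in_over - 1)
    return OVERS_PER_INNINGS * BALLS_PER_OVER - already_bowled


def run_rate_required(
    innings: int, runs_scored: int, balls_left: int, target: Optional[int] = None
) -> float:
    """Runs per over still needed; 0 for the first innings, as in the table."""
    if innings == 1:
        return 0.0
    if target is None:
        raise ValueError("second innings requires a target")
    if balls_left <= 0:
        raise ValueError("balls_left must be positive")
    runs_needed = max(target - runs_scored, 0)
    return runs_needed * BALLS_PER_OVER / balls_left
```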
Architecture & evaluation
Architecture
XGBoost (gradient-boosted trees) — separate models for runs regression (target: runs off the delivery) and wicket classification (target: binary wicket outcome). Two-headed design avoids conflating run-scoring and dismissal risk.
Targets
runs_off_delivery (regression), wicket_on_delivery (binary classification)
Evaluation metric
R² (runs regression), log-loss + AUC-ROC (wicket classification)
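For reference, the three evaluation metrics named above can be computed as in the following pure-Python sketch. It is illustrative only: the function names are hypothetical, no results are implied, and an actual evaluation would more likely rely on a library implementation.

```python
import math
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def log_loss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of deliveries, so it suits small checks rather than a full corpus.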
Eval results
pending training
Inference method
For each delivery in a captured innings, compute (expected_runs − actual_runs) + (actual_wicket − P(wicket)), so that conceding fewer runs than expected and taking a wicket both count positively for the bowling side. Sum across all deliveries a bowler bowled (or a batter faced) in the innings. In the raw sum, positive = bowling-side advantage and negative = batting-side advantage. The raw sum is then normalised to the 0–100 scale (100 = maximum pressure exerted per the training distribution).
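The accumulation step can be sketched as follows. This is an illustration under stated assumptions, not the production scoring code: the runs term is written as expected minus actual so that the raw sum is positive when the bowling side is ahead, matching the stated sign convention; the card does not specify how the raw sum is mapped to 0–100, so clipped min-max scaling against hypothetical bounds `lo` and `hi` (imagined as taken from the training distribution) stands in for it. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Delivery:
    actual_runs: int
    expected_runs: float  # output of the runs model
    wicket: bool
    p_wicket: float       # output of the wicket model


def delivery_residual(d: Delivery) -> float:
    """Per-ball residual; positive favours the bowling side (assumed sign)."""
    runs_term = d.expected_runs - d.actual_runs
    wicket_term = float(d.wicket) - d.p_wicket
    return runs_term + wicket_term


def raw_pressure(deliveries: Iterable[Delivery]) -> float:
    """Sum residuals over the deliveries a bowler bowled or a batter faced."""
    return sum(delivery_residual(d) for d in deliveries)


def normalise(raw: float, lo: float, hi: float) -> float:
    """Clipped min-max scaling to 0-100 (an assumed normalisation)."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    scaled = (raw - lo) / (hi - lo) * 100.0
    return max(0.0, min(100.0, scaled))


# Made-up numbers for a three-ball spell: dot ball, boundary, wicket.
spell = [
    Delivery(actual_runs=0, expected_runs=1.2, wicket=False, p_wicket=0.05),
    Delivery(actual_runs=4, expected_runs=1.3, wicket=False, p_wicket=0.04),
    Delivery(actual_runs=0, expected_runs=1.1, wicket=True, p_wicket=0.06),
]
score = normalise(raw_pressure(spell), lo=-10.0, hi=10.0)
```

Note that the two terms are on different scales (runs versus probability), so in this unweighted form a wicket moves the sum by at most one unit.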
How claims cite this model
Per §3.3-model, every claim derived from this model carries the model name, version, and a link back to this page in its provenance footer and in its ClaimReview JSON-LD block.
Atomic claim format
[Player] exerted a Pressure Score of [X] (Pressure v1, [N] deliveries, IPL 2026)
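A small sketch of how a claim string in this format might be assembled, together with the `isBasedOn` reference that points back to this card. The function names and the player name "Player A" are placeholders; only the template wording and the `@id` URL come from this page, and the other fields of the ClaimReview block are not shown.

```python
import json

MODEL_LABEL = "Pressure v1"
MODEL_ID = "https://players.cricketstudio.ai/methodology/model/pressure-v1#model"


def format_claim(player: str, score: float, n_deliveries: int, scope: str = "IPL 2026") -> str:
    """Fill the atomic claim template with a player's values."""
    return (
        f"{player} exerted a Pressure Score of {score:g} "
        f"({MODEL_LABEL}, {n_deliveries} deliveries, {scope})"
    )


def provenance_fragment() -> str:
    """Serialise only the isBasedOn reference as JSON."""
    return json.dumps({"isBasedOn": {"@id": MODEL_ID}})


claim = format_claim("Player A", 62.5, 24)
```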
JSON-LD isBasedOn
"isBasedOn": { "@id": "https://players.cricketstudio.ai/methodology/model/pressure-v1#model" }
Governance
Training date
not yet trained
Retrain policy
Retrained each IPL season on the full historical corpus (season-based split updated). Model card version incremented on every retrain.
Deprecation policy
This version remains citable after a new version ships. Claims citing pressure-v1 continue to resolve. Deprecated status is set when a successor version covers the same inference scope.
Contact
hello@cricketstudio.ai
Version history
| Version | Date | Notes |
|---|---|---|
| v1 | pending training | Initial release. Two-headed XGBoost on IPL historical corpus. Features: phase, wickets, RRR, venue par, bowler type. |
Known limitations
- T20-only, IPL-trained. The expected-value baseline is derived from IPL data; applying it to other T20 competitions (MLC, BBL, T20I) requires retraining or domain adaptation, which is not yet implemented.
- Venue par feature uses historical par averages (≥3 captured fixtures floor). Venues with fewer fixtures receive a league-average default.
- No batter-vs-bowler matchup feature in v1 — this creates systematic bias for matchups with large historical divergence from the population mean.
- Expected values reflect historical distributions; unusual batting orders or pitch conditions outside the training distribution will skew scores.
- Pressure Score is a relative metric within the training distribution — comparisons across eras (e.g. 2010 vs 2025) require era-adjustment, not yet implemented.
- Pending training: eval results, feature importances, and calibration plots will be added when training is complete. This card is a pre-disclosure scaffold per §3.3-model.
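The venue-par fallback described in the limitations above (a floor of three captured fixtures, otherwise a league-average default) could look like the sketch below. How par itself is computed is not specified on this card; taking the mean of a venue's recorded totals is an assumption, and the function and parameter names are hypothetical.

```python
from statistics import mean
from typing import Dict, List

MIN_FIXTURES = 3  # floor stated in the limitations list


def venue_par(venue: str, totals_by_venue: Dict[str, List[int]], league_average: float) -> float:
    """Return the venue's par, or the league average below the fixture floor.

    Assumption: par is the mean of the totals recorded at the venue.
    """
    totals = totals_by_venue.get(venue, [])
    if len(totals) >= MIN_FIXTURES:
        return mean(totals)
    return league_average
```

A venue that is missing entirely is treated the same as one with too few fixtures.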