Before we cook
Men’s college basketball is a sprawl – 360-odd Division I teams, a
month of March that decides everything, and a stats culture built as
much on tempo-free efficiency as on raw box scores. hoopR
covers all of it, and the trick to not drowning is the same as in the
NBA cookbook: learn the grammar of the function names and you
can guess your way to almost anything.
If you’ve read the NBA cookbook, you already know most of the
grammar. This one adds one new prefix – kp_ for KenPom –
and swaps the league token from nba to mbb.
That’s nearly the whole diff.
The grammar, refreshed for college
A hoopR college function answers three questions in
order:
- Where’s the data from? – the prefix.
  - espn_ – ESPN’s APIs. The broad, stable backbone.
  - kp_ – KenPom (kenpom.com). Tempo-free efficiency, the lingua franca of college analytics. Requires a KenPom subscription.
  - load_ – pre-built bulk season files.
- Which league? – for ESPN it’s spelled out: espn_mbb_. The mbb token is “men’s basketball.” (KenPom is men’s-college-only, so kp_ functions don’t need a league token – the prefix already implies it.)
- What do you want? – the rest of the name, general to specific.
So “a team’s schedule from ESPN” is espn_ +
mbb_ + team_schedule ->
espn_mbb_team_schedule(). “KenPom’s efficiency ratings” is
kp_ + efficiency ->
kp_efficiency(). You will be right far more often than
you’ll be wrong, and being wrong just means opening the reference
index.
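The three questions compose mechanically. The sketch below makes that concrete: it glues a source prefix, an optional league token and a subject into a name. `hoopr_name()` is a hypothetical helper written for this illustration and is not part of hoopR. It only builds strings; it does not check that the function exists.

```r
# Hypothetical helper (not in hoopR): assemble a function name from the
# three grammar questions -- source prefix, optional league token, subject.
hoopr_name <- function(source, subject, league = NULL) {
  parts <- c(source, league, subject)  # a NULL league simply drops out of c()
  paste(parts, collapse = "_")
}

hoopr_name("espn", "team_schedule", league = "mbb")  # "espn_mbb_team_schedule"
hoopr_name("kp", "efficiency")                       # "kp_efficiency"
hoopr_name("load", "pbp", league = "mbb")            # "load_mbb_pbp"
```

With hoopR attached, `exists(hoopr_name("espn", "team_schedule", league = "mbb"), mode = "function")` is a quick way to test a guess before you open the reference index.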
Recipe 1: A blue-blood program’s season
The story. It’s November and you want to set the table for Duke’s season – schedule, roster, and where they sit.
Type espn_mbb_team, press Tab, and the grammar lays out the menu for
you:
library(hoopR)
team_id <- 150 # Duke
season <- most_recent_mbb_season()
espn_mbb_team(team_id = team_id) # identity + record
#> $Info
#> # A tibble: 1 × 13
#> id uid slug abbreviation display_name short_display_name name nickname
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 150 s:40~… duke… DUKE Duke Blue D… Duke Blue… Duke
#> # ℹ 5 more variables: location <chr>, color <chr>, alternate_color <chr>,
#> # logo <chr>, logo_dark <chr>
#>
#> $Record
#> # A tibble: 3 × 4
#> description type summary stats
#> <chr> <chr> <chr> <list>
#> 1 Overall Record total 35-3 <df [20 × 2]>
#> 2 Home Record home 15-0 <df [20 × 2]>
#> 3 Away Record road 10-1 <df [20 × 2]>
#>
#> $NextEvent
#> # A tibble: 1 × 4
#> id date name short_name
#> <chr> <chr> <chr> <chr>
#> 1 401856577 2026-03-29T21:05Z UConn Huskies at Duke Blue Devils CONN VS DUKE
#>
#> $StandingSummary
#> # A tibble: 1 × 1
#> standing_summary
#> <chr>
#> 1 1st in ACC
#>
#> $Coaches
#> data frame with 0 columns and 0 rows
espn_mbb_team_schedule(team_id = team_id, season = season) # the slate
#> # A tibble: 34 × 21
#> event_id season season_type week date name short_name opponent_id
#> <chr> <int> <int> <int> <chr> <chr> <chr> <chr>
#> 1 401817228 2026 NA 1 2025-11-05T0… Texa… TEX VS DU… NA
#> 2 401817229 2026 NA 1 2025-11-08T1… West… WCU @ DUKE NA
#> 3 401817230 2026 NA 2 2025-11-12T0… Duke… DUKE @ AR… NA
#> 4 401817231 2026 NA 2 2025-11-15T0… Indi… INST @ DU… NA
#> 5 401817232 2026 NA 3 2025-11-19T0… Kans… KU VS DUKE NA
#> 6 401813377 2026 NA 3 2025-11-22T0… Niag… NIA @ DUKE NA
#> 7 401817233 2026 NA 3 2025-11-23T2… Howa… HOW @ DUKE NA
#> 8 401817234 2026 NA 4 2025-11-28T0… Duke… DUKE VS A… NA
#> 9 401806364 2026 NA 5 2025-12-03T0… Flor… FLA @ DUKE NA
#> 10 401817235 2026 NA 5 2025-12-06T1… Duke… DUKE @ MSU NA
#> # ℹ 24 more rows
#> # ℹ 13 more variables: opponent_abbrev <chr>, home_away <chr>,
#> # neutral_site <lgl>, conference_competition <lgl>, venue_id <chr>,
#> # venue_name <chr>, venue_city <chr>, venue_state <chr>, broadcast <chr>,
#> # result <chr>, team_score <chr>, opponent_score <chr>, winner <lgl>
espn_mbb_team_roster(team_id = team_id, season = season) # the players
#> # A tibble: 14 × 15
#> athlete_id full_name jersey position_abbrev position_name height weight age
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 5041935 Cameron … 12 F Forward "6' 9… 250 l… NA
#> 2 5041937 Cayden B… 2 G Guard "6' 4… 205 l… NA
#> 3 5105337 Maliq Br… 6 F Forward "6' 9… 225 l… NA
#> 4 5341547 Brock Da… 50 G Guard "6' 4… 195 l… NA
#> 5 5061585 Isaiah E… 3 G Guard "6' 6… 180 l… NA
#> 6 4711256 Caleb Fo… 1 G Guard "6' 5… 205 l… NA
#> 7 4873107 Darren H… 8 G Guard "6' 5… 195 l… NA
#> 8 5144124 Nikolas … 14 G Guard "6' 8… 215 l… NA
#> 9 4873209 Patrick … 21 C Center "6' 1… 250 l… NA
#> 10 5287474 Dame Sarr 7 G Guard "6' 8… 190 l… NA
#> 11 4888720 Jack Sco… 20 G Guard "6' 6… 220 l… NA
#> 12 4432321 Cameron … 13 G Guard "6' 6… 205 l… NA
#> 13 5107141 Ifeanyi … 15 C Center "6' 1… 240 l… NA
#> 14 5144150 Sebastia… 5 G Guard "6' 8… 220 l… NA
#> # ℹ 7 more variables: birth_date <chr>, birth_place <chr>, headshot <chr>,
#> # link_web <chr>, status <chr>, team_id <chr>, season <int>
espn_mbb_team_season_statistics(team_id = team_id, season = season)
#> # A tibble: 77 × 13
#> league season season_type team_id category_name category_display stat_name
#> <chr> <int> <int> <chr> <chr> <chr> <chr>
#> 1 mens-col… 2026 2 150 defensive Defensive blocks
#> 2 mens-col… 2026 2 150 defensive Defensive defensiv…
#> 3 mens-col… 2026 2 150 defensive Defensive steals
#> 4 mens-col… 2026 2 150 defensive Defensive turnover…
#> 5 mens-col… 2026 2 150 defensive Defensive avgDefen…
#> 6 mens-col… 2026 2 150 defensive Defensive avgBlocks
#> 7 mens-col… 2026 2 150 defensive Defensive avgSteals
#> 8 mens-col… 2026 2 150 general General disquali…
#> 9 mens-col… 2026 2 150 general General flagrant…
#> 10 mens-col… 2026 2 150 general General fouls
#> # ℹ 67 more rows
#> # ℹ 6 more variables: stat_abbrev <chr>, stat_display <chr>, value <dbl>,
#> # display_value <chr>, rank <int>, rank_display_value <chr>
The most_recent_mbb_season() helper is worth a margin
note. Every league in hoopR has a
most_recent_<league>_season() companion, and the new
espn_mbb_* functions use it as their default
season argument. So most of the time you can simply
omit season and get the current year – the
function already knows what “now” means.
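A college season is labeled by the year it ends. The schedule above shows a game dated 2025-11-05 filed under season 2026. The sketch below shows the idea behind a “current season” helper. It is hypothetical and is not hoopR’s implementation. The October cutoff is an assumption chosen for illustration.

```r
# Hypothetical helper, NOT hoopR's most_recent_mbb_season(): label a date
# with the college season it belongs to. Assumes (for illustration) that a
# new season label starts in October, so November 2025 is "2026".
guess_mbb_season <- function(date = Sys.Date(), cutoff_month = 10) {
  date  <- as.Date(date)
  year  <- as.integer(format(date, "%Y"))
  month <- as.integer(format(date, "%m"))
  # Months on or after the cutoff roll forward to next year's label
  ifelse(month >= cutoff_month, year + 1L, year)
}

guess_mbb_season("2025-11-05")  # 2026
guess_mbb_season("2026-03-29")  # 2026
```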
Recipe 2: The freshman phenom
The story. Every college season has a one-and-done freshman everyone argues about. You want that player’s numbers.
College players turn over fast – this year’s phenom is next year’s
pro – so the reliable move is to pull the current roster first and lift
an athlete_id straight off it, then walk the
athlete_ family:
roster <- espn_mbb_team_roster(team_id = 150, season = most_recent_mbb_season())
athlete_id <- roster$athlete_id[1] # whoever is first on the roster
espn_mbb_athlete_info(athlete_id = athlete_id) # bio
#> $Bio
#> # A tibble: 1 × 18
#> id uid guid first_name last_name full_name display_name short_name
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 5041935 s:40~l:4… da73… Cameron Boozer Cameron … Cameron Boo… C. Boozer
#> # ℹ 10 more variables: weight <dbl>, display_weight <chr>, height <dbl>,
#> # display_height <chr>, jersey <chr>, active <lgl>, headshot_href <chr>,
#> # birth_city <chr>, birth_state <chr>, birth_country <chr>
#>
#> $Team
#> # A tibble: 1 × 1
#> x_ref
#> <chr>
#> 1 http://sports.core.api.espn.com/v2/sports/basketball/leagues/mens-college-bas…
#>
#> $Position
#> # A tibble: 1 × 5
#> id name display_name abbreviation leaf
#> <chr> <chr> <chr> <chr> <lgl>
#> 1 2 Forward Forward F FALSE
#>
#> $Status
#> # A tibble: 1 × 4
#> id name type abbreviation
#> <chr> <chr> <chr> <chr>
#> 1 1 Active active Active
#>
#> $College
#> # A tibble: 0 × 0
#>
#> $Draft
#> # A tibble: 0 × 0
espn_mbb_athlete_gamelog(athlete_id = athlete_id, season = most_recent_mbb_season())
#> # A tibble: 38 × 26
#> id at_vs game_date score home_team_id away_team_id home_team_score
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 401856577 vs 2026-03-29T2… 73-72 150 41 72
#> 2 401856570 vs 2026-03-27T2… 80-75 150 2599 80
#> 3 401856530 vs 2026-03-21T2… 81-58 150 2628 81
#> 4 401856478 vs 2026-03-19T1… 71-65 150 2561 71
#> 5 401851183 vs 2026-03-15T0… 74-70 150 258 74
#> 6 401851182 vs 2026-03-14T0… 73-61 150 228 73
#> 7 401851179 vs 2026-03-12T2… 80-79 150 52 80
#> 8 401820788 vs 2026-03-07T2… 76-61 150 153 76
#> 9 401820778 @ 2026-03-03T0… 93-64 152 150 64
#> 10 401820771 vs 2026-02-28T1… 77-51 150 258 77
#> # ℹ 28 more rows
#> # ℹ 19 more variables: away_team_score <chr>, game_result <chr>,
#> # league_name <chr>, league_abbreviation <chr>, league_short_name <chr>,
#> # event_note <chr>, team_id <chr>, team_uid <chr>, team_abbreviation <chr>,
#> # team_logo <chr>, team_is_all_star <chr>, opponent_id <chr>,
#> # opponent_uid <chr>, opponent_display_name <chr>,
#> # opponent_abbreviation <chr>, opponent_logo <chr>, opponent_rank <chr>, …
espn_mbb_athlete_career_stats(athlete_id = athlete_id) # career rollup, long format
#> # A tibble: 87 × 17
#> league athlete_id stat_type_id split_id split_name split_type category_name
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 mens-co… 5041935 0 0 Season season defensive
#> 2 mens-co… 5041935 0 0 Season season defensive
#> 3 mens-co… 5041935 0 0 Season season defensive
#> 4 mens-co… 5041935 0 0 Season season defensive
#> 5 mens-co… 5041935 0 0 Season season defensive
#> 6 mens-co… 5041935 0 0 Season season defensive
#> 7 mens-co… 5041935 0 0 Season season defensive
#> 8 mens-co… 5041935 0 0 Season season defensive
#> 9 mens-co… 5041935 0 0 Season season general
#> 10 mens-co… 5041935 0 0 Season season general
#> # ℹ 77 more rows
#> # ℹ 10 more variables: category_display <chr>, category_short <chr>,
#> # category_abbrev <chr>, stat_name <chr>, stat_abbrev <chr>,
#> # stat_display <chr>, stat_short <chr>, description <chr>, value <dbl>,
#> # display_value <chr>
That “pull the index, lift an ID, drill down” move is the single most
useful habit in the whole package. athlete_gamelog in
particular only has data for players currently on a roster –
last year’s star has moved on – so feeding it an ID you just lifted off
team_roster is the surest way to get rows back.
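Taking `roster$athlete_id[1]` works, but it grabs whoever happens to be first. You usually want a specific player. `pick_athlete_id()` below is a hypothetical helper, not part of hoopR. It matches part of a name against the `full_name` column that espn_mbb_team_roster() returns, and it fails loudly unless exactly one player matches. The demo roster is a stand-in. Its first row mirrors the Boozer record shown above, and the second row is a placeholder.

```r
# Hypothetical helper (not in hoopR): lift one athlete_id off a roster by
# matching part of a player's name instead of taking row 1.
pick_athlete_id <- function(roster, name_pattern) {
  hits <- roster[grepl(name_pattern, roster$full_name, ignore.case = TRUE), ]
  if (nrow(hits) != 1) {
    stop(sprintf("expected exactly one match for '%s', found %d",
                 name_pattern, nrow(hits)))
  }
  hits$athlete_id
}

# Stand-in roster: row 1 mirrors the output above, row 2 is a placeholder.
roster_demo <- data.frame(
  athlete_id = c("5041935", "0000000"),
  full_name  = c("Cameron Boozer", "Placeholder Player"),
  stringsAsFactors = FALSE
)
pick_athlete_id(roster_demo, "boozer")  # "5041935"
```

Against live data, the call would be `pick_athlete_id(espn_mbb_team_roster(team_id = 150), "Boozer")`.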
Margin note: athlete_info, athlete_gamelog,
athlete_career_stats all share the athlete_
stem. When a stem repeats across many functions, that stem is a
namespace – a promise that everything under it is about the
same subject. Browsing by stem (espn_mbb_athlete + Tab) is
often faster than searching.
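The stem idea can be checked mechanically. The sketch below strips the `espn_mbb_` prefix from a few names used in this cookbook and groups them by the first word that remains. It is plain base R on a hand-typed vector. With hoopR attached, `ls("package:hoopR", pattern = "^espn_mbb_athlete")` gives you the live equivalent for a single stem.

```r
# Group function names by stem: drop the source/league prefix, then keep
# the first word of what remains.
fn_names <- c(
  "espn_mbb_athlete_info", "espn_mbb_athlete_gamelog",
  "espn_mbb_athlete_career_stats", "espn_mbb_team_roster",
  "espn_mbb_team_schedule", "espn_mbb_team_season_statistics"
)
rest  <- sub("^espn_mbb_", "", fn_names)  # e.g. "athlete_info"
stems <- sub("_.*$", "", rest)            # e.g. "athlete"
split(fn_names, stems)                    # $athlete (3 names), $team (3 names)
```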
Recipe 3: Tempo-free, the KenPom way
The story. Box scores lie a little in college – a team that plays fast will pile up points without being good. You want efficiency: points per possession, not per game. That’s KenPom’s whole reason to exist.
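To see why per-possession beats per-game, you can estimate possessions from a box score and divide. The sketch below uses the common estimate FGA − OREB + TO + w·FTA. The free-throw weight w = 0.475 is an assumption; it is often quoted for college, while other sources use 0.44, and KenPom’s exact method may differ. The two stat lines are made-up numbers for illustration.

```r
# Estimate possessions from a team box-score line.
# ft_weight = 0.475 is an assumed college value; 0.44 is also common.
estimate_possessions <- function(fga, oreb, tov, fta, ft_weight = 0.475) {
  fga - oreb + tov + ft_weight * fta
}

# Points per 100 possessions: the tempo-free scoring rate.
efficiency <- function(points, possessions) 100 * points / possessions

# Two made-up stat lines (illustrative, not real teams):
fast <- estimate_possessions(fga = 70, oreb = 10, tov = 14, fta = 20)  # 83.5
slow <- estimate_possessions(fga = 55, oreb = 8,  tov = 9,  fta = 16)  # 63.6
efficiency(points = 80, possessions = fast)  # about 95.8
efficiency(points = 72, possessions = slow)  # about 113.2
```

The fast team scores more points, but the slow team is the better offense per trip down the floor. That gap is exactly what per-game numbers hide.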
This is where the new prefix earns its keep. Everything KenPom is
kp_.
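# KenPom needs a subscription. hoopR's kp_ functions look for your login in
# the KP_USER and KP_PW environment variables. Setting them in ~/.Renviron
# keeps the password out of scripts; for a one-off session:
Sys.setenv(KP_USER = "you@example.com", KP_PW = "your-kenpom-password")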
# The famous Pomeroy ratings -- adjusted efficiency for every team.
kp_pomeroy_ratings(min_year = 2024, max_year = 2025)
# The four factors: shooting, turnovers, rebounding, free throws.
kp_fourfactors(min_year = 2025, max_year = 2025)
# Raw adjusted-efficiency table.
kp_efficiency(min_year = 2025, max_year = 2025)
The grammar lesson here is about prefixes as capabilities.
The moment you see kp_, three things are implied without
another word: the data is men’s-college-only, it is tempo-free, and it
needs authentication. A prefix is a compressed paragraph. Learn what
each prefix promises and half the documentation becomes redundant.
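kp_fourfactors() hands you the four factors precomputed. It helps to know what they measure. The sketch below uses the textbook (Dean Oliver) definitions; KenPom’s column-by-column definitions may differ in detail. For the free-throw factor it uses FTA/FGA, one of two common variants (the other is FTM/FGA). The inputs are made-up numbers.

```r
# Textbook four factors for one team in one game (or season).
four_factors <- function(fgm, fg3m, fga, tov, oreb, opp_dreb, fta, poss) {
  c(
    efg_pct = (fgm + 0.5 * fg3m) / fga,   # shooting: threes count 1.5x
    to_pct  = tov / poss,                 # turnovers per possession
    or_pct  = oreb / (oreb + opp_dreb),   # share of own misses rebounded
    ft_rate = fta / fga                   # trips to the line per shot
  )
}

# Made-up inputs for illustration:
ff <- four_factors(fgm = 28, fg3m = 8, fga = 60, tov = 12,
                   oreb = 10, opp_dreb = 25, fta = 20, poss = 70)
round(ff, 3)
```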
KenPom’s own family is deep – and, of course, regular:
kp_team_schedule(team = "Duke", year = 2025)
kp_teamstats(min_year = 2025, max_year = 2025)
kp_playerstats(metric = "eFG", year = 2025)
kp_kpoy(year = 2025) # KenPom Player of the Year race
kp_fanmatch(date = "2025-02-01") # every game on a date, win probs attached
Recipe 4: Game flow without the bracket pressure
The story. A December road game still nags at you. You want the play-by-play and a sense of the swing.
Same three zoom levels as the NBA cookbook – the grammar is league-agnostic:
game_id <- 401256760
espn_mbb_game_all(game_id = game_id) # everything, as a named list
#> $Plays
#> # A tibble: 353 × 48
#> id sequence_number text away_score home_score scoring_play score_value
#> <chr> <chr> <chr> <int> <int> <lgl> <int>
#> 1 4012567… 101805901 M.J.… 0 0 FALSE 3
#> 2 4012567… 101806301 RaiQ… 0 0 FALSE 0
#> 3 4012567… 101806302 RaiQ… 0 2 TRUE 2
#> 4 4012567… 101808801 Scot… 0 2 FALSE 2
#> 5 4012567… 101809001 Bals… 0 2 FALSE 0
#> 6 4012567… 101814501 Bals… 0 2 FALSE 2
#> 7 4012567… 101814502 Flor… 0 2 FALSE 0
#> 8 4012567… 101817001 Tre … 3 2 TRUE 3
#> 9 4012567… 101818201 RaiQ… 3 2 FALSE 0
#> 10 4012567… 101818202 Tre … 3 2 FALSE 0
#> # ℹ 343 more rows
#> # ℹ 41 more variables: shooting_play <lgl>, points_attempted <int>,
#> # short_description <chr>, type_id <int>, type_text <chr>,
#> # period_number <int>, period_display_value <chr>, clock_display_value <chr>,
#> # team_id <int>, play_id <chr>, athlete_id_1 <int>, athlete_id_2 <int>,
#> # home_team_id <int>, home_team_mascot <chr>, home_team_name <chr>,
#> # home_team_abbrev <chr>, home_team_logo <chr>, home_team_logo_dark <chr>, …
#>
#> $Team
#> # A tibble: 2 × 57
#> game_id season season_type game_date game_date_time team_id team_uid
#> <int> <int> <int> <date> <dttm> <int> <chr>
#> 1 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 57 s:40~l:41…
#> 2 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 52 s:40~l:41…
#> # ℹ 50 more variables: team_slug <chr>, team_location <chr>, team_name <chr>,
#> # team_abbreviation <chr>, team_display_name <chr>,
#> # team_short_display_name <chr>, team_color <chr>,
#> # team_alternate_color <chr>, team_logo <chr>, team_home_away <chr>,
#> # team_score <int>, team_winner <lgl>, assists <int>, blocks <int>,
#> # defensive_rebounds <int>, fast_break_points <chr>, field_goal_pct <dbl>,
#> # field_goals_made <int>, field_goals_attempted <int>, …
#>
#> $Player
#> # A tibble: 34 × 54
#> game_id season season_type game_date game_date_time athlete_id
#> <int> <int> <int> <date> <dttm> <int>
#> 1 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4395714
#> 2 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4397204
#> 3 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4432819
#> 4 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4432812
#> 5 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4395713
#> 6 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4702251
#> 7 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4432825
#> 8 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4279008
#> 9 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4592827
#> 10 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4565449
#> # ℹ 24 more rows
#> # ℹ 48 more variables: athlete_display_name <chr>, team_id <int>,
#> # team_name <chr>, team_location <chr>, team_short_display_name <chr>,
#> # minutes <dbl>, field_goals_made <int>, field_goals_attempted <int>,
#> # three_point_field_goals_made <int>,
#> # three_point_field_goals_attempted <int>, free_throws_made <int>,
#> # free_throws_attempted <int>, offensive_rebounds <int>, …
espn_mbb_team_box(game_id = game_id) # team box
#> # A tibble: 2 × 57
#> game_id season season_type game_date game_date_time team_id team_uid
#> <int> <int> <int> <date> <dttm> <int> <chr>
#> 1 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 57 s:40~l:41…
#> 2 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 52 s:40~l:41…
#> # ℹ 50 more variables: team_slug <chr>, team_location <chr>, team_name <chr>,
#> # team_abbreviation <chr>, team_display_name <chr>,
#> # team_short_display_name <chr>, team_color <chr>,
#> # team_alternate_color <chr>, team_logo <chr>, team_home_away <chr>,
#> # team_score <int>, team_winner <lgl>, assists <int>, blocks <int>,
#> # defensive_rebounds <int>, fast_break_points <chr>, field_goal_pct <dbl>,
#> # field_goals_made <int>, field_goals_attempted <int>, …
espn_mbb_player_box(game_id = game_id) # player box
#> # A tibble: 34 × 54
#> game_id season season_type game_date game_date_time athlete_id
#> <int> <int> <int> <date> <dttm> <int>
#> 1 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4395714
#> 2 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4397204
#> 3 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4432819
#> 4 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4432812
#> 5 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4395713
#> 6 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4702251
#> 7 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4432825
#> 8 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4279008
#> 9 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4592827
#> 10 401256760 2021 2 2020-12-12 2020-12-12 11:00:00 4565449
#> # ℹ 24 more rows
#> # ℹ 48 more variables: athlete_display_name <chr>, team_id <int>,
#> # team_name <chr>, team_location <chr>, team_short_display_name <chr>,
#> # minutes <dbl>, field_goals_made <int>, field_goals_attempted <int>,
#> # three_point_field_goals_made <int>,
#> # three_point_field_goals_attempted <int>, free_throws_made <int>,
#> # free_throws_attempted <int>, offensive_rebounds <int>, …
espn_mbb_pbp(game_id = game_id) # play-by-play
#> # A tibble: 353 × 48
#> id sequence_number text away_score home_score scoring_play score_value
#> <chr> <chr> <chr> <int> <int> <lgl> <int>
#> 1 4012567… 101805901 M.J.… 0 0 FALSE 3
#> 2 4012567… 101806301 RaiQ… 0 0 FALSE 0
#> 3 4012567… 101806302 RaiQ… 0 2 TRUE 2
#> 4 4012567… 101808801 Scot… 0 2 FALSE 2
#> 5 4012567… 101809001 Bals… 0 2 FALSE 0
#> 6 4012567… 101814501 Bals… 0 2 FALSE 2
#> 7 4012567… 101814502 Flor… 0 2 FALSE 0
#> 8 4012567… 101817001 Tre … 3 2 TRUE 3
#> 9 4012567… 101818201 RaiQ… 3 2 FALSE 0
#> 10 4012567… 101818202 Tre … 3 2 FALSE 0
#> # ℹ 343 more rows
#> # ℹ 41 more variables: shooting_play <lgl>, points_attempted <int>,
#> # short_description <chr>, type_id <int>, type_text <chr>,
#> # period_number <int>, period_display_value <chr>, clock_display_value <chr>,
#> # team_id <int>, play_id <chr>, athlete_id_1 <int>, athlete_id_2 <int>,
#> # home_team_id <int>, home_team_mascot <chr>, home_team_name <chr>,
#> # home_team_abbrev <chr>, home_team_logo <chr>, home_team_logo_dark <chr>, …
# ESPN's win-probability-per-play for the same game.
espn_mbb_event_probabilities(event_id = game_id)
#> # A tibble: 200 × 10
#> event_id sequence_number play_id period clock home_win_percentage
#> <chr> <chr> <chr> <int> <chr> <dbl>
#> 1 401256760 101805901 NA NA NA 0.413
#> 2 401256760 101806301 NA NA NA 0.431
#> 3 401256760 101806302 NA NA NA 0.458
#> 4 401256760 101808801 NA NA NA 0.482
#> 5 401256760 101809001 NA NA NA 0.489
#> 6 401256760 101814501 NA NA NA 0.467
#> 7 401256760 101814502 NA NA NA 0.459
#> 8 401256760 101817001 NA NA NA 0.409
#> 9 401256760 101818201 NA NA NA 0.387
#> 10 401256760 101818202 NA NA NA 0.387
#> # ℹ 190 more rows
#> # ℹ 4 more variables: away_win_percentage <dbl>, tie_percentage <dbl>,
#> # secs_to_end_of_period <dbl>, secs_to_end_of_game <dbl>
If you read the NBA cookbook, you’ve now seen
espn_*_game_all, espn_*_team_box,
espn_*_player_box, espn_*_pbp in two leagues
with only the league token changing. That’s the payoff of a
regular grammar: a recipe you learned for the NBA transfers to college
by editing one word.
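For the “sense of the swing” part of the story, one simple reading is the largest play-to-play change in win probability. The sketch below finds it. `biggest_swing()` is a hypothetical helper. It assumes the probabilities are already in play order. The sequence_number column prints as character, so sort by `as.numeric(sequence_number)` before relying on that. The sample vector is the first ten home_win_percentage values printed above.

```r
# Find the largest single-play swing in home win probability (0-1 scale).
# Assumes home_wp is already ordered by play.
biggest_swing <- function(home_wp) {
  delta <- diff(home_wp)          # change produced by each play after the first
  i <- which.max(abs(delta))      # position of the biggest move in delta
  list(play_index = i + 1, change = delta[i])
}

# First ten home_win_percentage values printed above for game 401256760:
wp <- c(0.413, 0.431, 0.458, 0.482, 0.489, 0.467, 0.459, 0.409, 0.387, 0.387)
biggest_swing(wp)  # play 8 (the made three), change of about -0.05
```

On the full game, the call is `biggest_swing(espn_mbb_event_probabilities(event_id = 401256760)$home_win_percentage)`, once the rows are sorted.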
Recipe 5: Per-player, per-game box scores
The story. You want a single player’s line from a
single game, tidy and long – not the whole player_box.
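“Long” here means one row per stat, with category_name, stat_name and value columns, as the output below shows. If you later want that line as a single wide row, the sketch below pivots it in base R. It builds column names from category plus stat so that repeated stat names in different categories cannot collide. `long_to_wide()` is a hypothetical helper, and the demo values are placeholders rather than real stats.

```r
# Pivot long stat rows (one row per stat) into a single wide row.
long_to_wide <- function(long) {
  keys <- paste(long$category_name, long$stat_name, sep = "_")
  as.data.frame(as.list(setNames(long$value, keys)))
}

# Placeholder rows shaped like the output below; the values are made up.
long_demo <- data.frame(
  category_name = c("defensive", "defensive", "general"),
  stat_name     = c("blocks", "steals", "fouls"),
  value         = c(1, 2, 3)
)
long_to_wide(long_demo)  # one row: defensive_blocks, defensive_steals, general_fouls
```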
# event + team + athlete -- three IDs, because you're naming one cell
# of one game. (More identifiers = finer grain. Always.)
espn_mbb_event_player_box(
event_id = 401256760,
team_id = 52,
athlete_id = 4277850
)
#> # A tibble: 77 × 12
#> league event_id team_id athlete_id stat_type category_name category_display
#> <chr> <chr> <chr> <chr> <int> <chr> <chr>
#> 1 mens-co… 4012567… 52 4277850 0 defensive Defensive
#> 2 mens-co… 4012567… 52 4277850 0 defensive Defensive
#> 3 mens-co… 4012567… 52 4277850 0 defensive Defensive
#> 4 mens-co… 4012567… 52 4277850 0 defensive Defensive
#> 5 mens-co… 4012567… 52 4277850 0 defensive Defensive
#> 6 mens-co… 4012567… 52 4277850 0 defensive Defensive
#> 7 mens-co… 4012567… 52 4277850 0 defensive Defensive
#> 8 mens-co… 4012567… 52 4277850 0 general General
#> 9 mens-co… 4012567… 52 4277850 0 general General
#> 10 mens-co… 4012567… 52 4277850 0 general General
#> # ℹ 67 more rows
#> # ℹ 5 more variables: stat_name <chr>, stat_abbrev <chr>, stat_display <chr>,
#> # value <dbl>, display_value <chr>
Recipe 6: The conference race
The story. It’s February and your league is a three-team knife fight. You want the standings and the structure underneath them.
College conferences are a hierarchy – divisions,
conferences, the whole D-I grouping – and ESPN models that hierarchy
with the season_group family.
season <- most_recent_mbb_season()
# A grammar gotcha worth flagging: espn_mbb_standings() takes `year`,
# not `season`. Most functions say `season`, a few older ones say
# `year` -- when an argument name surprises you, ?function is the
# two-second fix.
espn_mbb_standings(year = season)
#> # A tibble: 365 × 86
#> team_id team avgpointsagainst avgpointsfor gamesbehind leaguewinpercent
#> * <int> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 193 Miami (OH… 75.5 89.6 0 1
#> 2 130 Michigan … 69.5 87.3 0 0.95
#> 3 12 Arizona W… 69.4 86.2 0 0.889
#> 4 150 Duke Blue… 63.6 81.6 0 0.944
#> 5 2250 Gonzaga B… 66.2 84.3 0 0.889
#> 6 2272 High Poin… 71.2 89.7 0 0.938
#> 7 41 UConn Hus… 65.2 76.6 0 0.85
#> 8 258 Virginia … 68.8 80.4 4 0.833
#> 9 139 Saint Lou… 70.5 87.2 0 0.833
#> 10 2006 Akron Zips 74 87.9 3.5 0.944
#> # ℹ 355 more rows
#> # ℹ 80 more variables: losses <dbl>, playoffseed <dbl>,
#> # pointdifferential <chr>, pointsagainst <dbl>, pointsfor <dbl>,
#> # streak <dbl>, winpercent <dbl>, wins <dbl>, gamesahead <chr>, total <chr>,
#> # home_avgpointsagainst <dbl>, home_avgpointsfor <dbl>,
#> # home_gamesbehind <dbl>, home_leaguewinpercent <dbl>, home_losses <dbl>,
#> # home_playoffseed <dbl>, home_pointdifferential <chr>, …
# The group tree: groups -> one group -> its teams.
espn_mbb_season_groups(season = season, season_type = 2)
#> # A tibble: 2 × 5
#> league season season_type group_id ref
#> <chr> <int> <int> <chr> <chr>
#> 1 mens-college-basketball 2026 2 50 http://sports.core.api.es…
#> 2 mens-college-basketball 2026 2 51 http://sports.core.api.es…
espn_mbb_season_group(group_id = 50, season = season, season_type = 2)
#> # A tibble: 1 × 15
#> league season season_type group_id uid name abbreviation short_name
#> <chr> <int> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 mens-college-… 2026 2 50 s:40… NCAA… NCAA Division I
#> # ℹ 7 more variables: midsize_name <chr>, is_conference <lgl>, slug <chr>,
#> # parent_ref <chr>, children_ref <chr>, teams_ref <chr>, standings_ref <chr>
espn_mbb_season_group_teams(group_id = 50, season = season, season_type = 2)
#> # A tibble: 200 × 6
#> league season season_type group_id team_id ref
#> <chr> <int> <int> <chr> <chr> <chr>
#> 1 mens-college-basketball 2026 2 50 2 http://sports.co…
#> 2 mens-college-basketball 2026 2 50 5 http://sports.co…
#> 3 mens-college-basketball 2026 2 50 6 http://sports.co…
#> 4 mens-college-basketball 2026 2 50 8 http://sports.co…
#> 5 mens-college-basketball 2026 2 50 9 http://sports.co…
#> 6 mens-college-basketball 2026 2 50 12 http://sports.co…
#> 7 mens-college-basketball 2026 2 50 13 http://sports.co…
#> 8 mens-college-basketball 2026 2 50 16 http://sports.co…
#> 9 mens-college-basketball 2026 2 50 21 http://sports.co…
#> 10 mens-college-basketball 2026 2 50 23 http://sports.co…
#> # ℹ 190 more rows
Margin note: season_groups (plural) is the index;
season_group (singular) is one item. That singular/plural
pair shows up all over the package –
franchises/franchise,
tournaments/tournament,
positions/position. Plural = “give me the
list,” singular = “give me this one.” It’s the most reliable two-word
rule in the whole grammar.
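In a three-team knife fight, the standings column you will stare at is gamesbehind. The sketch below uses the usual half-game formula: the leader’s win margin over you plus your loss margin over the leader, halved. Whether ESPN computes its column on overall or conference records is something to check against the output, not something this sketch assumes. The records are made up.

```r
# Games behind the leader, using the standard half-game formula.
games_behind <- function(wins, losses) {
  leader <- which.max(wins - losses)
  ((wins[leader] - wins) + (losses - losses[leader])) / 2
}

# A made-up three-team race (illustrative records only):
race <- data.frame(team = c("A", "B", "C"),
                   wins = c(15, 14, 13), losses = c(3, 4, 4))
race$gb <- games_behind(race$wins, race$losses)
race  # gb: 0, 1, 1.5
```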
Recipe 7: March
The story. It’s the only story in college basketball. You want the tournament.
season <- most_recent_mbb_season()
espn_mbb_tournaments() # which tournaments exist
#> # A tibble: 38 × 3
#> tournament_id ref league
#> <chr> <chr> <chr>
#> 1 3 http://sports.core.api.espn.com/v2/sports/basketball/le… mens-…
#> 2 1 http://sports.core.api.espn.com/v2/sports/basketball/le… mens-…
#> 3 39 http://sports.core.api.espn.com/v2/sports/basketball/le… mens-…
#> 4 2 http://sports.core.api.espn.com/v2/sports/basketball/le… mens-…
#> 5 4 http://sports.core.api.espn.com/v2/sports/basketball/le… mens-…
#> 6 5 http://sports.core.api.espn.com/v2/sports/basketball/le… mens-…
#> 7 6 http://sports.core.api.espn.com/v2/sports/basketball/le… mens-…
#> 8 7 http://sports.core.api.espn.com/v2/sports/basketball/le… mens-…
#> 9 8 http://sports.core.api.espn.com/v2/sports/basketball/le… mens-…
#> 10 9 http://sports.core.api.espn.com/v2/sports/basketball/le… mens-…
#> # ℹ 28 more rows
espn_mbb_tournament(tournament_id = 3) # one tournament (3 is the ACC Tournament)
#> # A tibble: 1 × 4
#> tournament_id display_name seasons_ref league
#> <chr> <chr> <chr> <chr>
#> 1 3 Atlantic Coast Conf Tournament http://sports.core.api.es… mens-…
espn_mbb_tournament_seasons(tournament_id = 3) # its yearly editions
#> # A tibble: 14 × 4
#> league tournament_id season ref
#> <chr> <chr> <int> <chr>
#> 1 mens-college-basketball 3 2009 http://sports.core.api.espn.com…
#> 2 mens-college-basketball 3 2010 http://sports.core.api.espn.com…
#> 3 mens-college-basketball 3 2011 http://sports.core.api.espn.com…
#> 4 mens-college-basketball 3 2012 http://sports.core.api.espn.com…
#> 5 mens-college-basketball 3 2013 http://sports.core.api.espn.com…
#> 6 mens-college-basketball 3 2014 http://sports.core.api.espn.com…
#> 7 mens-college-basketball 3 2015 http://sports.core.api.espn.com…
#> 8 mens-college-basketball 3 2016 http://sports.core.api.espn.com…
#> 9 mens-college-basketball 3 2017 http://sports.core.api.espn.com…
#> 10 mens-college-basketball 3 2018 http://sports.core.api.espn.com…
#> 11 mens-college-basketball 3 2019 http://sports.core.api.espn.com…
#> 12 mens-college-basketball 3 2020 http://sports.core.api.espn.com…
#> 13 mens-college-basketball 3 2022 http://sports.core.api.espn.com…
#> 14 mens-college-basketball 3 2023 http://sports.core.api.espn.com…
espn_mbb_tournament_season(tournament_id = 3, season = season)
#> # A tibble: 1 × 8
#> league tournament_id season display_name short_display_name number_of_rounds
#> <chr> <chr> <int> <chr> <chr> <int>
#> 1 mens-co… 3 2026 ACC Tournam… ACC 0
#> # ℹ 2 more variables: season_ref <chr>, bracketology_ref <chr>
Same singular/plural rule as Recipe 6, same general-to-specific march
from tournaments down to one
tournament_season. By now you should be able to
predict this family before you read it – which is the entire
point of the cookbook.
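Notice that the yearly editions printed above skip 2021. Before you loop over a tournament’s seasons, it is worth checking for gaps like that, so a missing year doesn’t surprise you halfway through. `missing_seasons()` is a hypothetical helper. The input vector is copied from the output above.

```r
# Hypothetical helper: list the seasons absent from an otherwise
# consecutive run of editions.
missing_seasons <- function(seasons) {
  seasons <- sort(unique(as.integer(seasons)))
  setdiff(seq(min(seasons), max(seasons)), seasons)
}

# Seasons printed above for tournament_id = 3:
missing_seasons(c(2009:2020, 2022, 2023))  # 2021
```

With live data, the call is `missing_seasons(espn_mbb_tournament_seasons(tournament_id = 3)$season)`.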
Recipe 8: A whole season, in bulk
The story. You’re done with single games. You want every college game of a season to model on.
# load / mbb / pbp -- one call, one season.
pbp <- load_mbb_pbp(seasons = 2024)
load_mbb_player_box(seasons = 2024)
load_mbb_team_box(seasons = 2024)
load_mbb_schedule(seasons = 2024)
And straight into a database, with the same
(seasons, dbConnection, tablename) plumbing every
load_ function shares:
library(DBI)
con <- dbConnect(RSQLite::SQLite(), "mbb.sqlite")
load_mbb_pbp(seasons = 2020:2024, dbConnection = con, tablename = "mbb_pbp")
dbDisconnect(con)
Working through a proxy
Campus networks love a proxy. hoopR handles them in
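three layers, so reach for the least invasive one that works. For college
basketball only the first two apply; the third, a per-call proxy =
argument, exists only for the NBA Stats functions (see the note at the
end of this section).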
Layer 1 – set it once for the session. Covers everything: ESPN, KenPom, loaders.
options(hoopR.proxy = "http://proxy.university.edu:8080")
# Authenticated proxy? Pass a list instead of a string.
options(hoopR.proxy = list(
url = "http://proxy.university.edu",
port = 8080,
username = "netid",
password = Sys.getenv("PROXY_PASS")
))
# Then just work -- every call inherits it.
espn_mbb_team_roster(team_id = 150, season = most_recent_mbb_season())
kp_pomeroy_ratings(min_year = 2025, max_year = 2025)
Layer 2 – environment variables. For shared scripts and CI, where the proxy belongs to the machine and not the analysis, set the standard variables and let libcurl pick them up:
Sys.setenv(
http_proxy = "http://proxy.university.edu:8080",
https_proxy = "http://proxy.university.edu:8080",
no_proxy = "localhost,127.0.0.1"
)
A grammar-flavored note on what doesn’t take a per-call
proxy = argument: the espn_*,
kp_*, and load_* functions all call the HTTP
layer directly, so they read the proxy from the session option or the
environment – not from an argument. (In the NBA cookbook, the
nba_* Stats API functions do accept per-call
proxy =, because they thread ... through.
College basketball has no equivalent NBA-Stats-style surface, so for
mbb you’ll use Layer 1 or Layer 2 every time.) The prefix,
once again, tells you the capability before you go looking.
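When a call fails behind a campus proxy, the first question is which layer is actually in effect. `active_proxy()` below is a hypothetical diagnostic, not a hoopR function. It reads the hoopR.proxy option described above and the standard proxy environment variables. It assumes that the session option takes priority over the environment, which is an assumption and not documented hoopR behavior.

```r
# Hypothetical diagnostic (not part of hoopR): report which proxy layer is
# configured. Assumes the session option outranks environment variables.
active_proxy <- function() {
  opt <- getOption("hoopR.proxy")
  if (!is.null(opt)) return(list(layer = "option", value = opt))
  env <- Sys.getenv(c("https_proxy", "http_proxy", "HTTPS_PROXY", "HTTP_PROXY"))
  env <- env[nzchar(env)]  # keep only variables that are actually set
  if (length(env) > 0) return(list(layer = "environment", value = env[[1]]))
  list(layer = "none", value = NULL)
}

active_proxy()
```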
Where to go next
The college game has more teams, a deeper conference hierarchy, and
the kp_ prefix – but the grammar is identical to the NBA’s.
You decided where the data lived, you named the league, you named the
thing from general to specific, and you let the singular/plural rule and
the shared stems carry you the rest of the way.
The women’s game runs on wehoop, a sibling package with
the exact same grammar. espn_mbb_team_roster has a mirror
image in espn_wbb_team_roster; everything you just learned
crosses straight over. That’s the WBB cookbook.
