sportsreference.mlb package

The MLB package offers multiple modules which can be used to retrieve information and statistics for Major League Baseball, such as team names, season stats, game schedules, and boxscore metrics.

sportsreference.mlb.boxscore module

The Boxscore module can be used to grab information from a specific game. Metrics range from the number of runs scored to the number of sacrifice flies, the slugging percentage, and much more. A Boxscore is queried by passing the game’s boxscore URI on sports-reference.com, which can be retrieved from the Schedule class (see the Schedule module below for more information on retrieving game-specific information).

from sportsreference.mlb.boxscore import Boxscore

game_data = Boxscore('BOS/BOS201808020')
print(game_data.home_runs)  # Prints 15
print(game_data.away_runs)  # Prints 7
df = game_data.dataframe  # Returns a Pandas DataFrame of game metrics

The Boxscore module also contains a Boxscores class which searches for all games played on a particular day and returns a dictionary of matchups between all teams on the requested day. The dictionary includes the names and abbreviations for each matchup as well as the boxscore link if applicable.

from datetime import datetime
from sportsreference.mlb.boxscore import Boxscores

games_today = Boxscores(datetime.today())
print(games_today.games)  # Prints a dictionary of all matchups for today
class sportsreference.mlb.boxscore.Boxscore(uri)

Bases: object

Detailed information about the final statistics for a game.

Stores all relevant information for a game such as the date, time, location, result, and more advanced metrics such as the number of strikes, a pitcher’s influence on the game, the number of putouts and much more.

Parameters: uri (string) – The relative link to the boxscore HTML page, such as ‘BOS/BOS201806070’.
attendance

Returns an int of the game’s listed attendance.

away_assists

Returns an int of the number of assists the away team registered.

away_at_bats

Returns an int of the number of at bats the away team had.

away_average_leverage_index

Returns a float of the amount of pressure the away team’s pitcher faced during the game. 1.0 denotes average pressure while numbers less than 1.0 denote lighter pressure.

away_base_out_runs_added

Returns a float of the number of base out runs added by the away team.

away_base_out_runs_saved

Returns a float of the number of runs saved by the away pitcher based on the number of players on bases. 0.0 denotes an average value.

away_bases_on_balls

Returns an int of the number of bases on balls (walks) the away team registered.

away_batting_average

Returns a float of the batting average for the away team.

away_earned_runs

Returns a float of the number of runs the away team earned.

away_fly_balls

Returns an int of the number of fly balls the away team allowed.

away_game_score

Returns an int of the starting away pitcher’s score, determined by many factors such as the number of runs scored against, the number of strikes, etc.

away_grounded_balls

Returns an int of the number of grounded balls the away team allowed.

away_hits

Returns an int of the number of hits the away team had.

away_home_runs

Returns an int of the number of times the away team gave up a home run.

away_inherited_runners

Returns an int of the number of runners a pitcher inherited when he entered the game.

away_inherited_score

Returns an int of the number of scorers a pitcher inherited when he entered the game.

away_innings_pitched

Returns a float of the number of innings the away team pitched.

away_line_drives

Returns an int of the number of line drives the away team allowed.

away_on_base_percentage

Returns a float of the percentage of at bats that result in the batter getting on base.

away_on_base_plus

Returns a float of the on base percentage plus the slugging percentage. Because slugging percentage can exceed 1, the sum is not limited to the 0-1 range.

away_pitches

Returns an int of the number of pitches the away team faced.

away_plate_appearances

Returns an int of the number of plate appearances the away team made.

away_putouts

Returns an int of the number of putouts the away team registered.

away_rbi

Returns an int of the number of runs batted in the away team registered.

away_runs

Returns an int of the number of runs the away team scored.

away_slugging_percentage

Returns a float of the slugging percentage for the away team based on the number of bases gained per at-bat with bigger plays getting more weight.

away_strikeouts

Returns an int of the number of times the away team was struck out.

away_strikes

Returns an int of the number of times a strike was called against the away team.

away_strikes_by_contact

Returns an int of the number of times the away team struck out a batter who made contact with the pitch.

away_strikes_looking

Returns an int of the number of times the away team struck out a batter who was looking.

away_strikes_swinging

Returns an int of the number of times the away team struck out a batter who was swinging.

away_unknown_bat_type

Returns an int of the number of away at bats that were not properly tracked and therefore cannot be safely placed in another statistical category.

away_win_probability_added

Returns a float of the total positive influence the away team’s offense had on the outcome of the game.

away_win_probability_by_pitcher

Returns a float of the amount of influence the away pitcher had on the game’s result with 0.0 denoting zero influence and 1.0 denoting he was solely responsible for the team’s win.

away_win_probability_for_offensive_player

Returns a float of the overall influence the away team’s offense had on the outcome of the game where 0.0 denotes no influence and 1.0 denotes the offense was solely responsible for the outcome.

away_win_probability_subtracted

Returns a float of the total negative influence the away team’s offense had on the outcome of the game.

dataframe

Returns a pandas DataFrame containing all other class properties and values. The index for the DataFrame is the string URI that is used to instantiate the class, such as ‘BOS201806070’.

date

Returns a string of the date the game took place.

duration

Returns a string of the game’s duration in the format ‘H – MM’.

home_assists

Returns an int of the number of assists the home team registered.

home_at_bats

Returns an int of the number of at bats the home team had.

home_average_leverage_index

Returns a float of the amount of pressure the home team’s pitcher faced during the game. 1.0 denotes average pressure while numbers less than 1.0 denote lighter pressure.

home_base_out_runs_added

Returns a float of the number of base out runs added by the home team.

home_base_out_runs_saved

Returns a float of the number of runs saved by the home pitcher based on the number of players on bases. 0.0 denotes an average value.

home_bases_on_balls

Returns an int of the number of bases on balls (walks) the home team registered.

home_batting_average

Returns a float of the batting average for the home team.

home_earned_runs

Returns a float of the number of runs the home team earned.

home_fly_balls

Returns an int of the number of fly balls the home team allowed.

home_game_score

Returns an int of the starting home pitcher’s score, determined by many factors such as the number of runs scored against, the number of strikes, etc.

home_grounded_balls

Returns an int of the number of grounded balls the home team allowed.

home_hits

Returns an int of the number of hits the home team had.

home_home_runs

Returns an int of the number of times the home team gave up a home run.

home_inherited_runners

Returns an int of the number of runners a pitcher inherited when he entered the game.

home_inherited_score

Returns an int of the number of scorers a pitcher inherited when he entered the game.

home_innings_pitched

Returns a float of the number of innings the home team pitched.

home_line_drives

Returns an int of the number of line drives the home team allowed.

home_on_base_percentage

Returns a float of the percentage of at bats that result in the batter getting on base.

home_on_base_plus

Returns a float of the on base percentage plus the slugging percentage. Because slugging percentage can exceed 1, the sum is not limited to the 0-1 range.

home_pitches

Returns an int of the number of pitches the home team faced.

home_plate_appearances

Returns an int of the number of plate appearances the home team made.

home_putouts

Returns an int of the number of putouts the home team registered.

home_rbi

Returns an int of the number of runs batted in the home team registered.

home_runs

Returns an int of the number of runs the home team scored.

home_slugging_percentage

Returns a float of the slugging percentage for the home team based on the number of bases gained per at-bat with bigger plays getting more weight.

home_strikeouts

Returns an int of the number of times the home team was struck out.

home_strikes

Returns an int of the number of times a strike was called against the home team.

home_strikes_by_contact

Returns an int of the number of times the home team struck out a batter who made contact with the pitch.

home_strikes_looking

Returns an int of the number of times the home team struck out a batter who was looking.

home_strikes_swinging

Returns an int of the number of times the home team struck out a batter who was swinging.

home_unknown_bat_type

Returns an int of the number of home at bats that were not properly tracked and therefore cannot be safely placed in another statistical category.

home_win_probability_added

Returns a float of the total positive influence the home team’s offense had on the outcome of the game.

home_win_probability_by_pitcher

Returns a float of the amount of influence the home pitcher had on the game’s result with 0.0 denoting zero influence and 1.0 denoting he was solely responsible for the team’s win.

home_win_probability_for_offensive_player

Returns a float of the overall influence the home team’s offense had on the outcome of the game where 0.0 denotes no influence and 1.0 denotes the offense was solely responsible for the outcome.

home_win_probability_subtracted

Returns a float of the total negative influence the home team’s offense had on the outcome of the game.

losing_abbr

Returns a string of the losing team’s abbreviation, such as ‘LAD’ for the Los Angeles Dodgers.

losing_name

Returns a string of the losing team’s name, such as ‘Los Angeles Dodgers’.

time

Returns a string of the time the game started.

time_of_day

Returns a string constant indicating whether the game was played during the day or at night.

venue

Returns a string of the name of the ballpark where the game was played.

winner

Returns a string constant indicating whether the home or away team won.

winning_abbr

Returns a string of the winning team’s abbreviation, such as ‘HOU’ for the Houston Astros.

winning_name

Returns a string of the winning team’s name, such as ‘Houston Astros’.

class sportsreference.mlb.boxscore.Boxscores(date)

Bases: object

Search for MLB games taking place on a particular day.

Retrieve a dictionary which contains a list of all games being played on a particular day. Output includes a link to the boxscore, and the names and abbreviations for both the home and away teams. If no games are played on a particular day, the list will be empty.

Parameters: date (datetime object) – The date to search for any matches. The month, day, and year are required for the search, but time is not factored into the search.
games

Returns a dictionary object representing all of the games played on the requested day. Dictionary is in the following format:

{
    'boxscores': [
        {
            'home_name': Name of the home team, such as 'New York
                         Yankees' (`str`),
            'home_abbr': Abbreviation for the home team, such as 'NYY'
                         (`str`),
            'away_name': Name of the away team, such as 'Houston
                         Astros' (`str`),
            'away_abbr': Abbreviation for the away team, such as 'HOU'
                         (`str`),
            'boxscore': String representing the boxscore URI, such as
                        'SLN/SLN201807280' (`str`)
        },
        { ... },
        ...
    ]
}

If no games were played during the requested day, the list for [‘boxscores’] will be empty.
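
As a minimal sketch of consuming this structure, the helper below collects the boxscore URIs from a dictionary shaped like the one documented above. The function name `boxscore_uris` and the sample dictionary are illustrative assumptions and are not part of the package; the sample reuses the placeholder values from the format description rather than real results.

```python
def boxscore_uris(games):
    """Return the boxscore URI of every game in a Boxscores-style dict.

    Hypothetical helper: `games` is assumed to follow the documented
    layout, {'boxscores': [{'home_name': ..., 'boxscore': ...}, ...]}.
    """
    # .get() keeps the helper safe when the key is missing or the list
    # is empty, and games without a boxscore link are skipped.
    return [game['boxscore'] for game in games.get('boxscores', [])
            if game.get('boxscore')]


# Hand-written placeholder data in the documented shape (not real results).
sample_games = {
    'boxscores': [
        {'home_name': 'New York Yankees', 'home_abbr': 'NYY',
         'away_name': 'Houston Astros', 'away_abbr': 'HOU',
         'boxscore': 'SLN/SLN201807280'},
    ]
}

print(boxscore_uris(sample_games))
```

Each returned string is in the form the Boxscore class documents for its `uri` parameter.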

sportsreference.mlb.schedule module

The Schedule module can be used to iterate over all games in a team’s schedule to get game information such as the date, score, result, and more. Each game also has a link to the Boxscore class which has much more detailed information on the game metrics.

from sportsreference.mlb.schedule import Schedule

houston_schedule = Schedule('HOU')
for game in houston_schedule:
    print(game.date)  # Prints the date the game was played
    print(game.result)  # Prints whether the team won or lost
    # Creates an instance of the Boxscore class for the game.
    boxscore = game.boxscore
class sportsreference.mlb.schedule.Game(game_data, year)

Bases: object

A representation of a matchup between two teams.

Stores all relevant high-level match information for a game in a team’s schedule including date, time, opponent, and result.

Parameters:
  • game_data (string) – The row containing the specified game information.
  • year (string) – The year of the current season.
attendance

Returns an int of the total listed attendance for the game.

boxscore

Returns an instance of the Boxscore class containing more detailed stats on the game.

dataframe

Returns a pandas DataFrame containing all other class properties and values. The index for the DataFrame is the boxscore string.

dataframe_extended

Returns a pandas DataFrame representing the Boxscore class for the game. This property provides much richer context for the selected game, but takes longer to process compared to the lighter ‘dataframe’ property. The index for the DataFrame is the boxscore string.

date

Returns a string of the date the game was played on.

datetime

Returns a datetime object of the month, day, year, and time the game was played.

day_or_night

Returns a string constant to indicate whether the game was played during the day or night.

game

Returns an int of the game in the season, where 1 is the first game of the season.

game_duration

Returns a string of the game’s total duration in the format ‘H – MM’.

game_number_for_day

Returns an int denoting which game is played for the team during the given day. Default value is 1 where a team plays only one game during the day, but can be higher for double headers, etc. For example, if a team has a double header one day, the first game of the day will return 1 while the second game will return 2.

games_behind

Returns a float of the number of games behind the leader the team is. 0.0 indicates the team is tied for first. Negative numbers indicate the number of games a team is ahead of the second place team.
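
For readers unfamiliar with the statistic, the sketch below shows the conventional games-behind formula. It is an illustration only: the function name `games_behind` and its parameters are hypothetical, and this documentation does not state whether the package computes the value or reads it from the source page.

```python
def games_behind(team_wins, team_losses, other_wins, other_losses):
    """Conventional games-behind formula between two records.

    Hypothetical helper: a positive result means the team trails the
    other club, 0.0 means they are level, and a negative result means
    the team is ahead.
    """
    return ((other_wins - team_wins) + (team_losses - other_losses)) / 2


# A 50-40 team compared with a 55-35 leader.
print(games_behind(50, 40, 55, 35))
```

Passing the second-place team's record as the comparison record for a division leader yields the negative values described above.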

innings

Returns an int of the total number of innings that were played.

location

Returns a string constant to indicate whether the game was played at home or away.

loser

Returns a string of the name of the losing pitcher.

opponent_abbr

Returns a string of the opponent’s 3-letter abbreviation, such as ‘NYY’ for the New York Yankees.

rank

Returns an int of the team’s rank in the league with 1 being the best team.

record

Returns a string of the team’s record in the format ‘W-L’.
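
A record string in the ‘W-L’ format can be split into numbers with a few lines of Python. The sketch below is illustrative only; `parse_record` and `record_win_percentage` are hypothetical helpers, not functions provided by the package.

```python
def parse_record(record):
    """Split a 'W-L' record string into a (wins, losses) tuple of ints."""
    wins, losses = (int(part) for part in record.split('-'))
    return wins, losses


def record_win_percentage(record):
    """Return wins divided by games played, or 0.0 for a '0-0' record."""
    wins, losses = parse_record(record)
    games = wins + losses
    return wins / games if games else 0.0


print(parse_record('10-5'))
print(round(record_win_percentage('10-5'), 3))
```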

result

Returns a string constant to indicate whether the team won or lost.

runs_allowed

Returns an int of the total number of runs that the team allowed.

runs_scored

Returns an int of the total number of runs that were scored by the team.

save

Returns a string of the name of the pitcher credited with the save if applicable. If no saves, returns None.

streak

Returns a string of the team’s winning/losing streak at the conclusion of the requested game. A winning streak is denoted by one ‘+’ sign per consecutive win, and a losing streak is denoted by ‘-’ signs.
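
The sign-based streak string can be turned into a signed integer for sorting or plotting. This is a minimal sketch under the assumption that a losing streak repeats the ‘-’ sign once per loss, mirroring how wins repeat ‘+’; the helper name `streak_to_int` is hypothetical.

```python
def streak_to_int(streak):
    """Convert a streak such as '+++' or '--' to a signed length.

    Assumption: the string contains only '+' signs or only '-' signs,
    one per consecutive win or loss. An empty value maps to 0.
    """
    if not streak:
        return 0
    symbols = set(streak)
    if symbols == {'+'}:
        return len(streak)
    if symbols == {'-'}:
        return -len(streak)
    raise ValueError('unexpected streak format: %r' % streak)


print(streak_to_int('+++'))
print(streak_to_int('--'))
```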

winner

Returns a string of the name of the winning pitcher.

class sportsreference.mlb.schedule.Schedule(abbreviation, year=None)

Bases: object

An object of the given team’s schedule.

Generates a team’s schedule for the season including wins, losses, and scores if applicable.

Parameters:
  • abbreviation (string) – A team’s short name, such as ‘HOU’ for the Houston Astros.
  • year (string (optional)) – The requested year to pull stats from.
dataframe

Returns a pandas DataFrame where each row is a representation of the Game class. Rows are indexed by the boxscore string.

dataframe_extended

Returns a pandas DataFrame where each row is a representation of the Boxscore class for every game in the schedule. Rows are indexed by the boxscore string. This property provides much richer context for the selected game, but takes longer to process compared to the lighter ‘dataframe’ property.

sportsreference.mlb.teams module

The Teams module exposes information for all MLB teams including the team name and abbreviation, the number of games they won during the season, the total number of bases they’ve stolen, and much more.

from sportsreference.mlb.teams import Teams

teams = Teams()
for team in teams:
    print(team.name)  # Prints the team's name
    print(team.batting_average)  # Prints the team's season batting average

Each Team instance contains a link to the Schedule class which enables easy iteration over all games for a particular team. A Pandas DataFrame can also be queried to easily grab all stats for all games.

from sportsreference.mlb.teams import Teams

teams = Teams()
for team in teams:
    schedule = team.schedule  # Returns a Schedule instance for each team
    # Returns a Pandas DataFrame of all metrics for all game Boxscores for
    # a season.
    df = team.schedule.dataframe_extended
class sportsreference.mlb.teams.Team(team_data, rank, year=None)

Bases: object

An object containing all of a team’s season information.

Finds and parses all team stat information and identifiers, such as rank, name, and abbreviation, and sets them as properties which can be directly read from for easy reference.

Parameters:
  • team_data (string) – A string containing all of the rows of stats for a given team. If multiple tables are being referenced, this will be comprised of multiple rows in a single string.
  • rank (int) – A team’s position in the league based on its win percentage during the season.
  • year (string (optional)) – The requested year to pull stats from.
abbreviation

Returns a string of the team’s abbreviation, such as ‘HOU’ for the Houston Astros.

at_bats

Returns an int of the total number of at bats for the team.

average_batter_age

Returns a float of the average batter age weighted by their number of at bats plus the number of games participated in.

average_pitcher_age

Returns a float of the average pitcher age weighted by the number of games started, followed by the number of games played and saves.

away_losses

Returns an int of the number of away losses during the season.

away_record

Returns a string of the team’s away record. Record is in the format ‘W-L’.

away_wins

Returns an int of the number of away wins during the season.

balks

Returns an int of the total number of times a pitcher has balked.

bases_on_balls

Returns an int of the number of bases on walks.

bases_on_walks_given

Returns an int of the total number of bases from walks given up by a team during the season.

bases_on_walks_given_per_nine_innings

Returns a float of the average number of walks conceded per nine innings.

batters_faced

Returns an int of the total number of batters all pitchers have faced during a season.

batting_average

Returns a float of the batting average for the team. Percentage ranges from 0-1.

complete_game_shutouts

Returns an int of the total number of complete games where the opponent scored zero runs.

complete_games

Returns an int of the total number of complete games a team has accumulated during the season.

dataframe

Returns a pandas DataFrame containing all other class properties and values. The index for the DataFrame is the string abbreviation of the team, such as ‘HOU’.

doubles

Returns an int of the total number of doubles hit by the team.

earned_runs_against

Returns a float of the average number of earned runs against for a team.

earned_runs_against_plus

Returns an int of the team’s average earned runs against, adjusted for the home ballpark.

extra_inning_losses

Returns an int of the number of losses the team has when the game has gone to extra innings.

extra_inning_record

Returns a string of the team’s record when the game has gone to extra innings. Record is in the format ‘W-L’.

extra_inning_wins

Returns an int of the number of wins the team has when the game has gone to extra innings.

fielding_independent_pitching

Returns a float of the team’s effectiveness at preventing home runs, walks, and batters being hit by pitches while recording strikeouts.
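
For context, the commonly published form of fielding independent pitching can be sketched as below. The function name, its parameters, and the default `constant` of 3.10 are assumptions made for illustration: the league constant varies by season, and this documentation does not state how the package obtains the value.

```python
def fielding_independent_pitching(home_runs, walks, hit_by_pitch,
                                  strikeouts, innings_pitched,
                                  constant=3.10):
    """Commonly cited FIP formula (illustrative, not the package's code).

    `innings_pitched` is assumed to be true innings (for example 6.333
    for six and one third), and `constant` is an assumed league value.
    """
    if innings_pitched <= 0:
        raise ValueError('innings_pitched must be positive')
    numerator = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return numerator / innings_pitched + constant


print(round(fielding_independent_pitching(10, 30, 5, 100, 100.0), 2))
```

Lower values are better: home runs, walks, and hit batters raise the result, while strikeouts lower it.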

games

Returns an int of the number of games the team has played during the season.

games_finished

Returns an int of the number of games finished which is equivalent to the number of games played minus the number of complete games during the season.

grounded_into_double_plays

Returns an int of the total number of double plays grounded into by the team.

hit_pitcher

Returns an int of the total number of times a pitcher has hit an opposing batter.

hits

Returns an int of the total number of hits during the season.

hits_allowed

Returns an int of the total number of hits allowed during the season.

hits_per_nine_innings

Returns a float of the average number of hits per nine innings by the opponent.

home_losses

Returns an int of the number of losses at home during the season.

home_record

Returns a string of the team’s home record. Record is in the format ‘W-L’.

home_runs

Returns an int of the total number of home runs hit by the team.

home_runs_against

Returns an int of the total number of home runs given up during the season.

home_runs_per_nine_innings

Returns a float of the average number of home runs per nine innings by the opponent.

home_wins

Returns an int of the number of wins at home during the season.

innings_pitched

Returns a float of the total number of innings pitched by a team during the season.
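
Box scores conventionally write partial innings with ‘.1’ and ‘.2’ for one and two outs. Assuming the returned float follows that convention (this documentation does not say so explicitly), the sketch below converts it to outs and to true innings before it is used in arithmetic. Both helper names are hypothetical.

```python
def innings_to_outs(innings_pitched):
    """Convert box-score innings notation (e.g. 6.1) to a number of outs.

    Assumption: the digit after the decimal point counts outs (0, 1 or 2),
    not tenths of an inning.
    """
    whole, outs = divmod(round(innings_pitched * 10), 10)
    if outs not in (0, 1, 2):
        raise ValueError('not valid innings notation: %r' % innings_pitched)
    return whole * 3 + outs


def true_innings(innings_pitched):
    """Return innings as a real fraction, so 6.1 becomes 6 and 1/3."""
    return innings_to_outs(innings_pitched) / 3


print(innings_to_outs(6.1))
print(round(true_innings(6.2), 3))
```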

intentional_bases_on_balls

Returns an int of the total number of times a player took a base from an intentional walk.

interleague_record

Returns a string of the team’s interleague record. Record is in the format ‘W-L’.

last_ten_games_record

Returns a string of the team’s record over the last ten games. Record is in the format ‘W-L’.

last_thirty_games_record

Returns a string of the team’s record over the last thirty games. Record is in the format ‘W-L’.

last_twenty_games_record

Returns a string of the team’s record over the last twenty games. Record is in the format ‘W-L’.

league

Returns a string of the two letter abbreviation of the league, such as ‘AL’ for the American League.

losses

Returns an int of the total number of games the team lost during the season.

losses_last_ten_games

Returns an int of the number of losses in the last 10 games.

losses_last_thirty_games

Returns an int of the number of losses in the last 30 games.

losses_last_twenty_games

Returns an int of the number of losses in the last 20 games.

losses_vs_left_handed_pitchers

Returns an int of the number of losses against left-handed pitchers.

losses_vs_right_handed_pitchers

Returns an int of the number of losses against right-handed pitchers.

losses_vs_teams_over_500

Returns an int of the number of losses against teams with a win percentage over .500.

losses_vs_teams_under_500

Returns an int of the number of losses against teams with a win percentage under .500.

luck

Returns an int of the difference between the current wins and losses compared to the pythagorean wins and losses.

name

Returns a string of the team’s full name, such as ‘Houston Astros’.

number_of_pitchers

Returns an int of the total number of pitchers used during a season.

number_players_used

Returns an int of the number of different players used during the season.

on_base_percentage

Returns a float of the percentage of at bats that result in a player taking a base. Percentage ranges from 0-1.

on_base_plus_slugging_percentage

Returns a float of the sum of the on base percentage plus the slugging percentage.
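
The standard definitions behind these rate statistics can be sketched as follows. The function names and parameters are hypothetical and only illustrate the arithmetic; the package exposes the finished values as properties.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Standard OBP: times on base divided by the qualifying appearances."""
    denominator = at_bats + walks + hit_by_pitch + sacrifice_flies
    if denominator == 0:
        return 0.0
    return (hits + walks + hit_by_pitch) / denominator


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """Standard SLG: total bases per at bat, so bigger hits weigh more."""
    if at_bats == 0:
        return 0.0
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def on_base_plus_slugging(obp, slg):
    """OPS is the plain sum of the two rates."""
    return obp + slg


obp = on_base_percentage(hits=2, walks=1, hit_by_pitch=0,
                         at_bats=4, sacrifice_flies=0)
slg = slugging_percentage(singles=1, doubles=1, triples=0,
                          home_runs=0, at_bats=4)
print(round(on_base_plus_slugging(obp, slg), 3))
```

Because a home run counts four bases, slugging percentage can reach 4.0, so the sum can be greater than 1.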

on_base_plus_slugging_percentage_plus

Returns an int of the on base percentage plus the slugging percentage, adjusted to the team’s home ballpark.

opposing_runners_left_on_base

Returns an int of the total number of opponents a team has left on bases at the end of an inning.

plate_appearances

Returns an int of the total number of plate appearances for the team.

pythagorean_win_loss

Returns a string of the team’s expected win-loss record based on the runs scored and allowed. Record is in the format ‘W-L’.
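
As background, a Pythagorean record can be estimated from runs scored and allowed as sketched below. This is an illustration under stated assumptions: the helper name is hypothetical, the default exponent of 2 is the classic form of the formula (other exponents, such as 1.83, are also in common use), and the documentation does not state which variant the source data uses.

```python
def pythagorean_record(runs_scored, runs_allowed, games, exponent=2.0):
    """Estimate an expected 'W-L' record from runs scored and allowed.

    Assumption: expected win percentage is RS^x / (RS^x + RA^x), with
    the exponent x defaulting to 2.
    """
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    if scored + allowed == 0:
        expected_pct = 0.5  # no runs either way: treat as an even team
    else:
        expected_pct = scored / (scored + allowed)
    wins = round(expected_pct * games)
    return '%d-%d' % (wins, games - wins)


print(pythagorean_record(900, 600, 162))
```

Subtracting the expected wins from the actual wins gives a difference of the kind the luck property describes.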

rank

Returns an int of the team’s rank based on their win percentage.

record_vs_left_handed_pitchers

Returns a string of the team’s record against left-handed pitchers. Record is in the format ‘W-L’.

record_vs_right_handed_pitchers

Returns a string of the team’s record against right-handed pitchers. Record is in the format ‘W-L’.

record_vs_teams_over_500

Returns a string of the team’s record against teams with a win percentage over .500. Record is in the format ‘W-L’.

record_vs_teams_under_500

Returns a string of the team’s record against teams with a win percentage under .500. Record is in the format ‘W-L’.

run_difference

Returns a float of the difference between the number of runs scored and the number of runs given up per game. Positive numbers indicate the team scores more per game than they are scored on.

runners_left_on_base

Returns an int of the total number of runners left on base at the end of an inning.

runs

Returns a float of the average number of runs scored per game by the team.

runs_against

Returns a float of the average number of runs scored per game by the opponent.

runs_allowed_per_game

Returns a float of the average number of runs a team has allowed per game.

runs_batted_in

Returns an int of the total number of runs batted in by the team.

sacrifice_flies

Returns an int of the total number of sacrifice flies the team made during the season.

sacrifice_hits

Returns an int of the total number of sacrifice hits the team made during the season.

saves

Returns an int of the total number of saves a team has accumulated during the season.

schedule

Returns an instance of the Schedule class containing the team’s complete schedule for the season.

shutouts

Returns an int of the total number of shutouts a team has accumulated during the season.

simple_rating_system

Returns a float of the average number of runs per game a team scores compared to average.

single_run_losses

Returns an int of the number of losses the team has in games decided by a single run.

single_run_record

Returns a string of the team’s record in games decided by a single run. Record is in the format ‘W-L’.

single_run_wins

Returns an int of the number of wins the team has in games decided by a single run.

slugging_percentage

Returns a float of the ratio of total bases gained per at bat.

stolen_bases

Returns an int of the total number of bases stolen by the team.

streak

Returns a string of the team’s current winning or losing streak, such as ‘W 3’ for a team on a 3-game winning streak.
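
A streak string such as ‘W 3’ can be converted to a signed number as sketched below. Only the winning form is documented here; the losing form ‘L 3’ is an assumption, and `parse_team_streak` is a hypothetical helper.

```python
def parse_team_streak(streak):
    """Convert 'W 3' to 3 and (assumed format) 'L 2' to -2."""
    kind, length = streak.split()
    length = int(length)
    if kind == 'W':
        return length
    if kind == 'L':  # assumed counterpart of the documented 'W n' form
        return -length
    raise ValueError('unexpected streak format: %r' % streak)


print(parse_team_streak('W 3'))
```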

strength_of_schedule

Returns a float denoting a team’s strength of schedule, based on runs scored and conceded. Higher values indicate more challenging schedules while 0.0 is an average schedule.

strikeouts

Returns an int of the total number of times a team has struck out an opponent.

strikeouts_per_base_on_balls

Returns a float of the average number of strikeouts per walk thrown by a team.

strikeouts_per_nine_innings

Returns a float of the average number of strikeouts a team throws per nine innings.
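
The per-nine-innings properties all follow the same scaling, sketched below with a hypothetical helper. It assumes `innings_pitched` is expressed in true innings (for example 6.333 rather than box-score notation such as 6.1).

```python
def per_nine_innings(count, innings_pitched):
    """Scale a counting stat (strikeouts, hits, walks, ...) to nine innings."""
    if innings_pitched <= 0:
        raise ValueError('innings_pitched must be positive')
    return 9 * count / innings_pitched


# 150 strikeouts over 135 innings.
print(per_nine_innings(150, 135.0))
```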

times_caught_stealing

Returns an int of the number of times a player was caught stealing.

times_hit_by_pitch

Returns an int of the total number of times a batter was hit by an opponent’s pitch.

times_struck_out

Returns an int of the total number of times the team struck out.

total_bases

Returns an int of the total number of bases a team has gained during the season.

total_runs

Returns an int of the total number of runs scored during the season.

triples

Returns an int of the total number of triples hit by the team.

whip

Returns a float of the average number of walks plus hits by the opponent per inning.
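
The arithmetic behind this statistic is shown in the sketch below. The helper name and parameters are hypothetical, and `innings_pitched` is assumed to be in true innings.

```python
def walks_plus_hits_per_inning(walks, hits, innings_pitched):
    """WHIP: walks plus hits allowed, divided by innings pitched."""
    if innings_pitched <= 0:
        raise ValueError('innings_pitched must be positive')
    return (walks + hits) / innings_pitched


# 30 walks and 90 hits allowed over 100 innings.
print(walks_plus_hits_per_inning(30, 90, 100.0))
```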

wild_pitches

Returns an int of the total number of wild pitches thrown by a team during a season.

win_percentage

Returns a float of the number of wins divided by the number of games played during the season. Percentage ranges from 0-1.

wins

Returns an int of the total number of games the team won during the season.

wins_last_ten_games

Returns an int of the number of wins in the last 10 games.

wins_last_thirty_games

Returns an int of the number of wins in the last 30 games.

wins_last_twenty_games

Returns an int of the number of wins in the last 20 games.

wins_vs_left_handed_pitchers

Returns an int of the number of wins against left-handed pitchers.

wins_vs_right_handed_pitchers

Returns an int of the number of wins against right-handed pitchers.

wins_vs_teams_over_500

Returns an int of the number of wins against teams with a win percentage over .500.

wins_vs_teams_under_500

Returns an int of the number of wins against teams with a win percentage under .500.

class sportsreference.mlb.teams.Teams(year=None)

Bases: object

A list of all MLB teams and their stats in a given year.

Finds and retrieves a list of all MLB teams from www.baseball-reference.com and creates a Team instance for every team that participated in the league in a given year. The Team class comprises a list of all major stats and a few identifiers for the requested season.

Parameters: year (string (optional)) – The requested year to pull stats from.
dataframes

Returns a pandas DataFrame where each row is a representation of the Team class. Rows are indexed by the team abbreviation.