For The Love of Basketball

Machine Learning the NBA

Let's predict wins and see if the models hold up without fans


Our group wanted to know how fans (and the lack of fans) affect player performance. We were also required to use machine learning, so here we are.

We decided to train two models on pre-COVID player stats to predict the outcome of the game: win or loss. This is our "performance with fans" baseline.

Next we run the fanless player game stats through the trained models to quantify the impact of missing fans on our predictions.

We pulled per-game player stats from 2015 through 2020 via the SportRadar API: first the Schedule for each year, then a loop through each game's Game Summary for the per-player stats.
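
For reference, here's a minimal sketch of that pull. The base URL, endpoint paths, and the config.api_key name are illustrative assumptions; check the SportRadar NBA API docs and your own config module for the exact values.

import requests
import time
import config  # assumed to hold the SportRadar API key as config.api_key

BASE_URL = "https://api.sportradar.us/nba/trial/v7/en"  # illustrative base URL

def pull_player_stats(year, season_type="REG"):
    """Pull the season schedule, then loop the games for per-player stats."""
    schedule_url = f"{BASE_URL}/games/{year}/{season_type}/schedule.json"
    schedule = requests.get(schedule_url, params={"api_key": config.api_key}).json()

    rows = []
    for game in schedule.get("games", []):
        summary_url = f"{BASE_URL}/games/{game['id']}/summary.json"
        summary = requests.get(summary_url, params={"api_key": config.api_key}).json()
        for side in ("home", "away"):
            team = summary.get(side, {})
            for player in team.get("players", []):
                rows.append({"game_id": game["id"],
                             "team": team.get("name"),
                             **player.get("statistics", {})})
        time.sleep(1)  # stay under the API rate limit
    return rows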

The two models are a Random Forest classifier and a Keras Sequential neural network with hidden layers.

On to the code!

In [1]:
# get the latest and greatest scikit-learn for your models (the sklearn package on PyPI just pulls in scikit-learn)
!pip install sklearn --upgrade
Requirement already up-to-date: sklearn in c:\programdata\anaconda3\lib\site-packages (0.0)
Requirement already satisfied, skipping upgrade: scikit-learn in c:\programdata\anaconda3\lib\site-packages (from sklearn) (0.22.1)
Requirement already satisfied, skipping upgrade: numpy>=1.11.0 in c:\programdata\anaconda3\lib\site-packages (from scikit-learn->sklearn) (1.18.1)
Requirement already satisfied, skipping upgrade: scipy>=0.17.0 in c:\programdata\anaconda3\lib\site-packages (from scikit-learn->sklearn) (1.4.1)
Requirement already satisfied, skipping upgrade: joblib>=0.11 in c:\programdata\anaconda3\lib\site-packages (from scikit-learn->sklearn) (0.14.1)
In [2]:
# next import dependencies for data preprocessing
import pandas as pd
from datetime import datetime as dt
import numpy as np
import requests
import config
import json
import time
import datetime
from pprint import pprint

Preprocess the Data!

In [3]:
# pull in the per-game player stats with fans, conveniently saved as a CSV
df = pd.read_csv('data_files/fullplayerstatslist.csv')
In [5]:
# here's a peek at the raw data
df.head()
Out[5]:
Unnamed: 0 Unnamed: 0.1 First_Name Last_Name player_id Position Points Free_Throw_Percent Two_Pt_Percent Three_Pt_Percent ... Turnovers Team Home_Away win Team_points Min_played Crowd Stadium_Cap game_id game_date
0 0 0 LeBron James 0afbe608-940a-4d5d-a1f7-468718c67d91 F 19 50.0 81.818 0.0 ... 4 Cavaliers 1 1 117 32:23 20562 20562 0da78f13-73ac-4465-8e31-ecc3029a5dc6 2016-10-25T23:30:00+00:00
1 1 1 James Jones 09d25155-c3be-4246-a986-55921a1b5e61 G-F 5 100.0 0.000 100.0 ... 0 Cavaliers 1 1 117 5:30 20562 20562 0da78f13-73ac-4465-8e31-ecc3029a5dc6 2016-10-25T23:30:00+00:00
2 2 2 J.R. Smith 5934134d-0d27-42ea-a554-4b0e3e85ce56 G-F 8 0.0 20.000 25.0 ... 0 Cavaliers 1 1 117 25:14 20562 20562 0da78f13-73ac-4465-8e31-ecc3029a5dc6 2016-10-25T23:30:00+00:00
3 3 3 Kay Felder 8d3acdd5-9b5a-4d69-9912-de42d979c31a G 0 0.0 0.000 0.0 ... 0 Cavaliers 1 1 117 00:00 20562 20562 0da78f13-73ac-4465-8e31-ecc3029a5dc6 2016-10-25T23:30:00+00:00
4 4 4 Mike Dunleavy 4ec1bff7-ec1b-488b-8a24-aed83e62b4ce G-F 4 0.0 100.000 0.0 ... 0 Cavaliers 1 1 117 22:32 20562 20562 0da78f13-73ac-4465-8e31-ecc3029a5dc6 2016-10-25T23:30:00+00:00

5 rows × 27 columns

In [6]:
# let's trim the data down to our X factors...
df_dropped = df[df['Min_played'] != "00:00"]
df_dropped = df_dropped[df_dropped['Crowd'] != 'Covid']
df_dropped = df_dropped[df_dropped['Crowd'] != '0']
df_dropped= df_dropped[["Points", "Free_Throw_Percent",
                  "Two_Pt_Percent",
                  "Three_Pt_Percent", "Assists",
                  "Rebounds", "Offensive_Rebounds",
                  "Steals", "Personal_Fouls",
                  "Flagrant_Fouls", "Tech_Fouls",
                  "Turnovers",
                  "Home_Away", "win"
                  ]].reset_index(drop = True)
df_dropped.head()
Out[6]:
Points Free_Throw_Percent Two_Pt_Percent Three_Pt_Percent Assists Rebounds Offensive_Rebounds Steals Personal_Fouls Flagrant_Fouls Tech_Fouls Turnovers Home_Away win
0 19 50.0 81.818 0.0 14 11 3 0 3 0 0 4 1 1
1 5 100.0 0.000 100.0 0 0 0 0 1 0 0 0 1 1
2 8 0.0 20.000 25.0 2 3 0 1 1 0 0 0 1 1
3 4 0.0 100.000 0.0 2 4 0 3 0 0 0 0 1 1
4 23 75.0 44.444 33.3 2 12 2 3 3 0 0 2 1 1
In [8]:
# grab every stat except the 'win' column as your X features
X = df_dropped.drop('win', axis=1)
print(X.shape)
(120384, 13)
In [9]:
# set your y to predict to 'win'
y = df_dropped['win']
print(y.shape)
(120384,)
In [10]:
# now import the tools to train and scale
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from tensorflow.keras.utils import to_categorical
In [11]:
# split X and y into train and test groups
X_train, X_test, y_train, y_test = train_test_split(X,y,random_state=42)
In [12]:
# now scale the X data to keep everything reasonable
X_scaler = MinMaxScaler().fit(X_train)
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)

Now our stats are the X features and 'win' is the y we want to predict. Aside from a little more y preprocessing for the Sequential model, let's start making models!

Into the Random Forest

In [13]:
# we tried a few n_estimators settings (100, 1000) and landed on 200 as the sweet spot through trial and error (a sketch of that sweep follows this cell)
from sklearn.ensemble import RandomForestClassifier

# define the model
modelRF = RandomForestClassifier(n_estimators=200)

# train on training data
modelRF.fit(X_train_scaled, y_train)
Out[13]:
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=200,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)
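
The n_estimators sweep mentioned above looked roughly like this (a sketch, not the exact cell we ran):

from sklearn.ensemble import RandomForestClassifier  # already imported above

# quick sweep to compare a few forest sizes on the held-out test set
for n in [100, 200, 1000]:
    rf = RandomForestClassifier(n_estimators=n)
    rf.fit(X_train_scaled, y_train)
    print(f"n_estimators={n}: {round(rf.score(X_test_scaled, y_test)*100, 2)}%")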
In [15]:
# let's see how our Random Forest did!
print(f"Training Data Score: {round(modelRF.score(X_train_scaled, y_train)*100,2)}%")
print(f"Testing Data Score: {round(modelRF.score(X_test_scaled, y_test)*100,2)}%")
Training Data Score: 95.29%
Testing Data Score: 69.43%
In [17]:
# lastly, since we're running a Random Forest, let's rank the top features by importance
feature_names = X.columns.tolist()
preSelected_features = sorted(zip(modelRF.feature_importances_, feature_names), reverse=True)
ranked_features = pd.DataFrame(preSelected_features, columns=['Score', 'Feature'])
ranked_features = ranked_features.set_index('Feature')
ranked_features
Out[17]:
Score
Feature
Points 0.138204
Rebounds 0.133824
Two_Pt_Percent 0.123946
Assists 0.110833
Personal_Fouls 0.098869
Three_Pt_Percent 0.079086
Turnovers 0.072240
Free_Throw_Percent 0.069879
Offensive_Rebounds 0.067301
Steals 0.063770
Home_Away 0.030521
Tech_Fouls 0.009222
Flagrant_Fouls 0.002305

Random Forest gets us to 69% test accuracy predicting win/loss from per-player game stats. Not bad!

Unsurprisingly, Points, Rebounds, shooting percentages, and Assists carry the most weight.

So the pre-COVID Random Forest model is ready to go. Next, let's build the Sequential model with a few hidden layers.

Create a Sequential Deep Learning Model

In [25]:
# We picked a Sequential deep learning model because it handled the male/female voice classification in our class exercise, which is the same kind of binary decision: win or loss

# also, our Sequential model threw an error the first time out, so we added a LabelEncoder step for y (and it worked!)
label_encoder = LabelEncoder()
label_encoder.fit(y_train)
encoded_y_train = label_encoder.transform(y_train)
encoded_y_test = label_encoder.transform(y_test)

# Then we need to convert y labels to one-hot-encoding
y_train_categorical = to_categorical(encoded_y_train)
y_test_categorical = to_categorical(encoded_y_test)
In [26]:
# import the Sequential model and Dense for the hidden layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
In [27]:
# Create model and add layers
# tried increasing units by 100 per layer (e.g. 100, 200, 300, 400); less accurate
# tried fewer and more hidden layers, but the best accuracy came from three additional layers, 100 units, 100 epochs
# input_dim set to 13 because we have 13 X factors!
# hidden layer activation set to relu, a standard non-linear choice that works well on our scaled numeric features
# the final 2-unit layer uses softmax so the output is a probability over the two classes: win or loss

model = Sequential()
model.add(Dense(units=100, activation='relu', input_dim=13))
model.add(Dense(units=100, activation='relu'))
model.add(Dense(units=100, activation='relu'))
model.add(Dense(units=100, activation='relu'))
model.add(Dense(units=2, activation='softmax'))
In [28]:
# Compile and fit the model
# optimizer, loss, and metrics set the same as in the male/female voice exercise, since this is the same kind of two-class problem

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
In [29]:
# let's summarize and make sure we're ready to train!
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_5 (Dense)              (None, 100)               1400      
_________________________________________________________________
dense_6 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_7 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_8 (Dense)              (None, 100)               10100     
_________________________________________________________________
dense_9 (Dense)              (None, 2)                 202       
=================================================================
Total params: 31,902
Trainable params: 31,902
Non-trainable params: 0
_________________________________________________________________
In [30]:
# ok time to train our Sequential model
# tried fewer epochs but we had computer processing power to spare and the accuracy went up until we hit 100

model.fit(
    X_train_scaled,
    y_train_categorical,
    epochs=100,
    shuffle=True,
    verbose=2
)
Epoch 1/100
2822/2822 - 2s - loss: 0.6715 - accuracy: 0.5880
Epoch 2/100
2822/2822 - 2s - loss: 0.6672 - accuracy: 0.5950
Epoch 3/100
2822/2822 - 2s - loss: 0.6663 - accuracy: 0.5969
Epoch 4/100
2822/2822 - 1s - loss: 0.6654 - accuracy: 0.5986
Epoch 5/100
2822/2822 - 1s - loss: 0.6646 - accuracy: 0.5986
Epoch 6/100
2822/2822 - 2s - loss: 0.6641 - accuracy: 0.5993
Epoch 7/100
2822/2822 - 1s - loss: 0.6637 - accuracy: 0.5999
Epoch 8/100
2822/2822 - 1s - loss: 0.6635 - accuracy: 0.6005
Epoch 9/100
2822/2822 - 1s - loss: 0.6627 - accuracy: 0.6021
Epoch 10/100
2822/2822 - 1s - loss: 0.6627 - accuracy: 0.6015
Epoch 11/100
2822/2822 - 1s - loss: 0.6621 - accuracy: 0.6014
Epoch 12/100
2822/2822 - 1s - loss: 0.6617 - accuracy: 0.6024
Epoch 13/100
2822/2822 - 1s - loss: 0.6615 - accuracy: 0.6019
Epoch 14/100
2822/2822 - 1s - loss: 0.6610 - accuracy: 0.6029
Epoch 15/100
2822/2822 - 1s - loss: 0.6606 - accuracy: 0.6041
Epoch 16/100
2822/2822 - 1s - loss: 0.6600 - accuracy: 0.6034
Epoch 17/100
2822/2822 - 1s - loss: 0.6595 - accuracy: 0.6042
Epoch 18/100
2822/2822 - 1s - loss: 0.6591 - accuracy: 0.6055
Epoch 19/100
2822/2822 - 1s - loss: 0.6583 - accuracy: 0.6060
Epoch 20/100
2822/2822 - 1s - loss: 0.6577 - accuracy: 0.6055
Epoch 21/100
2822/2822 - 1s - loss: 0.6570 - accuracy: 0.6050
Epoch 22/100
2822/2822 - 1s - loss: 0.6569 - accuracy: 0.6070
Epoch 23/100
2822/2822 - 1s - loss: 0.6558 - accuracy: 0.6076
Epoch 24/100
2822/2822 - 1s - loss: 0.6550 - accuracy: 0.6096
Epoch 25/100
2822/2822 - 1s - loss: 0.6545 - accuracy: 0.6091
Epoch 26/100
2822/2822 - 1s - loss: 0.6536 - accuracy: 0.6097
Epoch 27/100
2822/2822 - 1s - loss: 0.6527 - accuracy: 0.6117
Epoch 28/100
2822/2822 - 1s - loss: 0.6522 - accuracy: 0.6124
Epoch 29/100
2822/2822 - 1s - loss: 0.6509 - accuracy: 0.6126
Epoch 30/100
2822/2822 - 1s - loss: 0.6499 - accuracy: 0.6144
Epoch 31/100
2822/2822 - 1s - loss: 0.6487 - accuracy: 0.6152
Epoch 32/100
2822/2822 - 1s - loss: 0.6481 - accuracy: 0.6152
Epoch 33/100
2822/2822 - 1s - loss: 0.6467 - accuracy: 0.6162
Epoch 34/100
2822/2822 - 1s - loss: 0.6461 - accuracy: 0.6175
Epoch 35/100
2822/2822 - 1s - loss: 0.6451 - accuracy: 0.6175
Epoch 36/100
2822/2822 - 1s - loss: 0.6438 - accuracy: 0.6195
Epoch 37/100
2822/2822 - 1s - loss: 0.6436 - accuracy: 0.6196
Epoch 38/100
2822/2822 - 1s - loss: 0.6421 - accuracy: 0.6214
Epoch 39/100
2822/2822 - 1s - loss: 0.6409 - accuracy: 0.6226
Epoch 40/100
2822/2822 - 1s - loss: 0.6404 - accuracy: 0.6248
Epoch 41/100
2822/2822 - 1s - loss: 0.6386 - accuracy: 0.6248
Epoch 42/100
2822/2822 - 1s - loss: 0.6376 - accuracy: 0.6259
Epoch 43/100
2822/2822 - 1s - loss: 0.6365 - accuracy: 0.6274
Epoch 44/100
2822/2822 - 1s - loss: 0.6350 - accuracy: 0.6291
Epoch 45/100
2822/2822 - 1s - loss: 0.6341 - accuracy: 0.6291
Epoch 46/100
2822/2822 - 1s - loss: 0.6323 - accuracy: 0.6320
Epoch 47/100
2822/2822 - 1s - loss: 0.6315 - accuracy: 0.6310
Epoch 48/100
2822/2822 - 1s - loss: 0.6298 - accuracy: 0.6328
Epoch 49/100
2822/2822 - 1s - loss: 0.6284 - accuracy: 0.6345
Epoch 50/100
2822/2822 - 1s - loss: 0.6275 - accuracy: 0.6346
Epoch 51/100
2822/2822 - 1s - loss: 0.6263 - accuracy: 0.6347
Epoch 52/100
2822/2822 - 1s - loss: 0.6241 - accuracy: 0.6383
Epoch 53/100
2822/2822 - 1s - loss: 0.6235 - accuracy: 0.6384
Epoch 54/100
2822/2822 - 1s - loss: 0.6228 - accuracy: 0.6398
Epoch 55/100
2822/2822 - 1s - loss: 0.6221 - accuracy: 0.6392
Epoch 56/100
2822/2822 - 1s - loss: 0.6192 - accuracy: 0.6417
Epoch 57/100
2822/2822 - 1s - loss: 0.6188 - accuracy: 0.6423
Epoch 58/100
2822/2822 - 1s - loss: 0.6183 - accuracy: 0.6443
Epoch 59/100
2822/2822 - 1s - loss: 0.6157 - accuracy: 0.6460
Epoch 60/100
2822/2822 - 1s - loss: 0.6152 - accuracy: 0.6462
Epoch 61/100
2822/2822 - 1s - loss: 0.6120 - accuracy: 0.6488
Epoch 62/100
2822/2822 - 1s - loss: 0.6125 - accuracy: 0.6471
Epoch 63/100
2822/2822 - 1s - loss: 0.6102 - accuracy: 0.6497
Epoch 64/100
2822/2822 - 1s - loss: 0.6091 - accuracy: 0.6501
Epoch 65/100
2822/2822 - 1s - loss: 0.6085 - accuracy: 0.6511
Epoch 66/100
2822/2822 - 1s - loss: 0.6065 - accuracy: 0.6536
Epoch 67/100
2822/2822 - 1s - loss: 0.6054 - accuracy: 0.6527
Epoch 68/100
2822/2822 - 1s - loss: 0.6038 - accuracy: 0.6539
Epoch 69/100
2822/2822 - 1s - loss: 0.6018 - accuracy: 0.6557
Epoch 70/100
2822/2822 - 1s - loss: 0.6022 - accuracy: 0.6570
Epoch 71/100
2822/2822 - 1s - loss: 0.6007 - accuracy: 0.6558
Epoch 72/100
2822/2822 - 1s - loss: 0.5987 - accuracy: 0.6597
Epoch 73/100
2822/2822 - 1s - loss: 0.5976 - accuracy: 0.6597
Epoch 74/100
2822/2822 - 1s - loss: 0.5959 - accuracy: 0.6604
Epoch 75/100
2822/2822 - 1s - loss: 0.5974 - accuracy: 0.6608
Epoch 76/100
2822/2822 - 1s - loss: 0.5939 - accuracy: 0.6636
Epoch 77/100
2822/2822 - 1s - loss: 0.5926 - accuracy: 0.6626
Epoch 78/100
2822/2822 - 1s - loss: 0.5912 - accuracy: 0.6647
Epoch 79/100
2822/2822 - 1s - loss: 0.5912 - accuracy: 0.6652
Epoch 80/100
2822/2822 - 1s - loss: 0.5891 - accuracy: 0.6650
Epoch 81/100
2822/2822 - 1s - loss: 0.5892 - accuracy: 0.6665
Epoch 82/100
2822/2822 - 1s - loss: 0.5888 - accuracy: 0.6671
Epoch 83/100
2822/2822 - 1s - loss: 0.5868 - accuracy: 0.6679
Epoch 84/100
2822/2822 - 1s - loss: 0.5845 - accuracy: 0.6689
Epoch 85/100
2822/2822 - 1s - loss: 0.5834 - accuracy: 0.6696
Epoch 86/100
2822/2822 - 1s - loss: 0.5842 - accuracy: 0.6691
Epoch 87/100
2822/2822 - 1s - loss: 0.5824 - accuracy: 0.6720
Epoch 88/100
2822/2822 - 1s - loss: 0.5820 - accuracy: 0.6713
Epoch 89/100
2822/2822 - 1s - loss: 0.5781 - accuracy: 0.6737
Epoch 90/100
2822/2822 - 1s - loss: 0.5785 - accuracy: 0.6738
Epoch 91/100
2822/2822 - 1s - loss: 0.5768 - accuracy: 0.6745
Epoch 92/100
2822/2822 - 1s - loss: 0.5771 - accuracy: 0.6755
Epoch 93/100
2822/2822 - 1s - loss: 0.5756 - accuracy: 0.6754
Epoch 94/100
2822/2822 - 1s - loss: 0.5746 - accuracy: 0.6753
Epoch 95/100
2822/2822 - 1s - loss: 0.5743 - accuracy: 0.6772
Epoch 96/100
2822/2822 - 1s - loss: 0.5715 - accuracy: 0.6793
Epoch 97/100
2822/2822 - 1s - loss: 0.5712 - accuracy: 0.6799
Epoch 98/100
2822/2822 - 1s - loss: 0.5714 - accuracy: 0.6794
Epoch 99/100
2822/2822 - 1s - loss: 0.5700 - accuracy: 0.6788
Epoch 100/100
2822/2822 - 1s - loss: 0.5698 - accuracy: 0.6812
Out[30]:
<tensorflow.python.keras.callbacks.History at 0x24c026c3248>
In [32]:
# let's see how our Sequential model did!
model_loss, model_accuracy = model.evaluate(
    X_test_scaled, y_test_categorical, verbose=2)
print(
    f"Normal Neural Network - Loss: {model_loss}, Accuracy: {model_accuracy}")
941/941 - 0s - loss: 0.8607 - accuracy: 0.6030
Normal Neural Network - Loss: 0.8607043027877808, Accuracy: 0.6029704809188843
In [33]:
# lastly, since we're running the Sequential model, let's lay the predictions out against the actuals
encoded_predictions = model.predict_classes(X_test_scaled)
prediction_labels = label_encoder.inverse_transform(encoded_predictions)
WARNING:tensorflow:From <ipython-input-33-ee58f50ea8ca>:2: Sequential.predict_classes (from tensorflow.python.keras.engine.sequential) is deprecated and will be removed after 2021-01-01.
Instructions for updating:
Please use instead:
* `np.argmax(model.predict(x), axis=-1)`, if your model does multi-class classification (e.g. if it uses a `softmax` last-layer activation).
* `(model.predict(x) > 0.5).astype("int32")`, if your model does binary classification (e.g. if it uses a `sigmoid` last-layer activation).
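
As the warning suggests, the deprecation-free equivalent on newer TensorFlow versions is to take the argmax of model.predict:

import numpy as np

# same class predictions without the deprecated predict_classes
encoded_predictions = np.argmax(model.predict(X_test_scaled), axis=-1)
prediction_labels = label_encoder.inverse_transform(encoded_predictions)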
In [34]:
print(f"First 10 Predictions:   {prediction_labels[:10]}")
print(f"First 10 Actual labels: {y_test[:10].tolist()}")
First 10 Predictions:   [1 1 1 1 1 1 0 0 1 1]
First 10 Actual labels: [0, 1, 0, 0, 0, 1, 0, 1, 1, 1]
In [35]:
# we can even put them all together into a data frame
pd.DataFrame({"Prediction": prediction_labels, "Actual": y_test}).reset_index(drop=True)
Out[35]:
Prediction Actual
0 1 0
1 1 1
2 1 0
3 1 0
4 1 0
... ... ...
30091 0 1
30092 1 1
30093 0 1
30094 0 0
30095 1 1

30096 rows × 2 columns

The Sequential model comes in at 60% test accuracy (training accuracy climbed to 68% by epoch 100). Again, not bad!

Also, don't forget to save your model with model.save('models/deepLearningSequential.h5').
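
When you need the model again later (say, to score the fanless games without retraining), load it back with load_model:

from tensorflow.keras.models import load_model

# reload the saved Sequential model from disk
model = load_model('models/deepLearningSequential.h5')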

Now it's time to see how our pre-COVID models perform with fanless player game stats.

Preprocess the Fanless Data!

In [36]:
# pull in the post-COVID (fanless) player data
covidDF = pd.read_csv('data_files/player_stats_2019_pst.csv')
In [37]:
# drop unwanted fields to match the pre-COVID columns
# this time we keep the COVID (fanless) games, obviously!
covidDF_dropped = covidDF[covidDF['Min_played'] != "00:00"]
covidDF_dropped= covidDF_dropped[["Points", "Free_Throw_Percent",
                  "Two_Pt_Percent",
                  "Three_Pt_Percent", "Assists",
                  "Rebounds", "Offensive_Rebounds",
                  "Steals", "Personal_Fouls",
                  "Flagrant_Fouls", "Tech_Fouls",
                  "Turnovers",
                  "Home_Away", "win"
                  ]].reset_index(drop = True)
covidDF_dropped.head()
Out[37]:
Points Free_Throw_Percent Two_Pt_Percent Three_Pt_Percent Assists Rebounds Offensive_Rebounds Steals Personal_Fouls Flagrant_Fouls Tech_Fouls Turnovers Home_Away win
0 11 0.0 50.0 60.0 0 4 2 0 6 0 0 1 1 1
1 2 0.0 100.0 0.0 2 6 1 1 3 0 0 1 1 1
2 19 87.5 50.0 40.0 3 0 0 1 2 0 0 0 1 1
3 14 0.0 66.7 66.7 4 3 0 1 1 0 0 1 1 1
4 13 0.0 33.3 42.9 1 8 1 0 3 0 0 1 1 1
In [38]:
# same X features
X = covidDF_dropped.drop('win', axis=1)
print(X.shape)
(1689, 13)
In [39]:
# same y goal
y = covidDF_dropped['win']
print(y.shape)
(1689,)
In [40]:
# scale the X features using the scaler fit on the pre-COVID training data
X_predict_scaled = X_scaler.transform(X)

Random Forest Pre-COVID Trained vs Fanless Data

In [42]:
print(f"COVID Data Score: {round(modelRF.score(X_predict_scaled, y)*100,2)}%")
COVID Data Score: 54.0%

Playing without fans drops our Random Forest accuracy by 15 percentage points (down to 54%, versus 69% on the with-fans test data).

Sequential Pre-COVID Trained vs Fanless Data

In [43]:
# label-encode and one-hot-encode y, just as we did for the training data
label_encoder.fit(y)
encoded_y_actual = label_encoder.transform(y)
y_actual_categorical = to_categorical(encoded_y_actual)
In [45]:
# score the pre-COVID-trained Sequential model on the fanless games
model_loss, model_accuracy = model.evaluate(
    X_predict_scaled, y_actual_categorical, verbose=2)
print(
    f"Normal Neural Network - Loss: {model_loss}, Accuracy: {model_accuracy}")
53/53 - 0s - loss: 1.1353 - accuracy: 0.5240
Normal Neural Network - Loss: 1.1353408098220825, Accuracy: 0.5239787101745605

Playing without fans drops our Sequential accuracy by 8 percentage points (down to 52% from 60%).
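
Pulling the scores printed above into one table makes the comparison easy to eyeball:

# accuracy (%) with fans (test set) vs. without fans (fanless games)
summary = pd.DataFrame({"With fans": [69.43, 60.30],
                        "Without fans": [54.00, 52.40]},
                       index=["Random Forest", "Sequential"])
summary["Drop"] = summary["With fans"] - summary["Without fans"]
summary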

Sources