Logistic Regression

Machine Learning Model - Logistic Regression
Logistic Regression is a statistical method for predicting binary outcomes from data. It uses an equation as the representation, similar to linear regression. Input values (x) are combined linearly using weights or coefficient values to predict an output value (y). The logistic function, which is also known as the sigmoid function, creates an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1.

Implementing the Movie Dataset : After completing the ETL, the clean movie file was used to perform Logistic Regression. In order to create the Logistic Regression model, first the movie titles needed to be dropped from the dataset due to them being strings. Second, the Rating Integer was dropped due to it being utilized in the ETL to create the performance variable, which determined if a movie score was successful or unsuccessful. The Train Test Split was created using the performance variable, which was created during the ETL to determine if a movie’s Rating Integer was greater than 7 (successful) or less than or equal to 7 (unsuccessful/fail).

Results
0 = Fail
1 = Success

Using the Logistic Regression model, 88% of the time the model is successful at predicting a movie’s success.
However, the model tends to over predict the success of the movie meaning it predicts a success when it actually has a score that indicates a failed movie.
Having a model over predict movie scores could lead to someone watching a movie thinking it will be good, but then become disappointed when watching the movie.

Training Data Score: 0.882876468494126
Testing Data Score: 0.880469583778015