Extract
The data set we used is a CSV file from Kaggle
(https://www.kaggle.com/gabrielegalimberti/movies-example-for-machine-learning-activities?select=MACHINE_LEARNING_FINAL.csv).
Transform
To prepare the data for analysis, we needed to make sure the data was consistent across all fields.
We found 2 rows with misaligned data, so we removed every row that had “Karate Kid” in the
title, which removed a total of 4 rows.
We then created a movie performance column with the value success or fail: if the integer rating
score was greater than 7, the movie was considered a success; otherwise it was considered a failure.
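The two transform steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code we actually ran: the column names `title` and `rating`, the new `performance` column name, and the sample rows are all assumptions made for the example.

```python
# Hypothetical column names; the real CSV headers may differ.
TITLE_COL = "title"
RATING_COL = "rating"


def clean_rows(rows):
    """Drop 'Karate Kid' rows and label the remaining rows as success or fail."""
    cleaned = []
    for row in rows:
        # Skip every row whose title contains "Karate Kid".
        if "Karate Kid" in row[TITLE_COL]:
            continue
        labeled = dict(row)
        # Assumes the rating is stored as an integer string.
        # A rating strictly greater than 7 counts as a success.
        labeled["performance"] = "success" if int(row[RATING_COL]) > 7 else "fail"
        cleaned.append(labeled)
    return cleaned


# Made-up sample rows, only to show the behavior.
sample = [
    {"title": "The Karate Kid", "rating": "7"},
    {"title": "Movie A", "rating": "8"},
    {"title": "Movie B", "rating": "7"},
]
result = clean_rows(sample)
```

Note that the comparison is strict, so a movie rated exactly 7 is labeled as a failure.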
Load
The cleaned data was written to a new CSV file named clean_movies.csv.
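A minimal sketch of the load step, assuming the cleaned rows are held as a list of dictionaries that all share the same keys. The helper name `write_clean_csv` is hypothetical; only the output file name comes from the description above.

```python
import csv


def write_clean_csv(rows, path="clean_movies.csv"):
    """Write a list of dict rows to a CSV file with a header line."""
    if not rows:
        raise ValueError("no rows to write")
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_clean_csv(cleaned_rows)` with no second argument would write to clean_movies.csv in the current working directory.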
References/Credits
We drew on the following websites:
https://machinelearningmastery.com/logistic-regression-for-machine-learning/
https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbors-classification
https://www.datacamp.com/community/tutorials/random-forests-classifier
https://www.datacamp.com/community/tutorials/svm-classification-scikit-learn-python
https://towardsdatascience.com/classification-using-neural-networks-b8e98f3a904f