ETL


Extract
We obtained the data set as a CSV file from Kaggle (https://www.kaggle.com/gabrielegalimberti/movies-example-for-machine-learning-activities?select=MACHINE_LEARNING_FINAL.csv).

Transform
To prepare the data for analysis, we needed to make sure the data was consistent across all fields. We found two rows with misaligned data. To remove them, we dropped every row with “Karate Kid” in the title, which removed four rows in total.
We then created a movie performance column with two values, success or fail: if a movie's integer rating score was greater than 7, it was considered a success; otherwise it was considered a failure.
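
The two transform steps can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the column names `title`, `rating`, and `performance`, the helper `transform`, and the sample rows are all assumptions, since the source does not list the real column names.

```python
import numpy as np
import pandas as pd


def transform(movies: pd.DataFrame) -> pd.DataFrame:
    """Drop "Karate Kid" rows and label each movie as success or fail.

    Assumes hypothetical columns "title" (text) and "rating" (integer score).
    """
    # Remove every row whose title contains "Karate Kid".
    is_karate_kid = movies["title"].str.contains("Karate Kid", na=False)
    cleaned = movies.loc[~is_karate_kid].copy()

    # A rating strictly greater than 7 counts as a success, anything else as a fail.
    cleaned["performance"] = np.where(cleaned["rating"] > 7, "success", "fail")
    return cleaned.reset_index(drop=True)


# Made-up sample rows, for illustration only.
sample = pd.DataFrame(
    {
        "title": ["Movie A", "The Karate Kid", "Movie B", "Movie C"],
        "rating": [9, 6, 7, 5],
    }
)
result = transform(sample)
print(result)
```

Note that a rating of exactly 7 is labeled a fail here, because the rule above requires a score greater than 7.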

Load
The cleaned data was saved to a new CSV file titled clean_movies.csv.
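
A minimal sketch of the load step, assuming the cleaned data is held in a pandas DataFrame. The helper `load`, the column names, and the sample rows are hypothetical; only the file name clean_movies.csv comes from the project. The example writes to a temporary directory so it does not overwrite an existing file.

```python
import os
import tempfile

import pandas as pd


def load(cleaned: pd.DataFrame, path: str = "clean_movies.csv") -> str:
    """Write the cleaned data to a CSV file and return the path used."""
    # index=False keeps the pandas row index out of the output file.
    cleaned.to_csv(path, index=False)
    return path


# Made-up cleaned rows, for illustration only.
cleaned = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B"],
        "rating": [9, 7],
        "performance": ["success", "fail"],
    }
)

out_path = load(cleaned, os.path.join(tempfile.mkdtemp(), "clean_movies.csv"))

# Read the file back to confirm that the columns and rows survived the round trip.
reloaded = pd.read_csv(out_path)
print(reloaded)
```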

References/Credits
We drew on the following websites:
https://machinelearningmastery.com/logistic-regression-for-machine-learning/
https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbors-classification
https://www.datacamp.com/community/tutorials/random-forests-classifier
https://www.datacamp.com/community/tutorials/svm-classification-scikit-learn-python
https://towardsdatascience.com/classification-using-neural-networks-b8e98f3a904f