Dataset description, dataset extension and preprocessing
We were provided with the CMU dataset, which can be obtained at the following link. Despite its rich content, there were several aspects we sought to improve by enriching the data with features we deemed relevant to our particular topic, which meant drawing on external sources.
One promising external source is the IMDb dataset, which was available for us to download at the following link. We merged the two datasets on the name and release date columns, which gave us additional features for each movie.
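The merge on name and release date can be sketched roughly as follows with pandas. This is only an illustration under stated assumptions, not our exact pipeline: the column names (`name`, `release_date`, `rating`), the toy rows, and the helper `merge_on_name_and_date` are hypothetical, and matching on a lowercased title plus the release year (rather than the full date) is an assumption made here because the two sources may record dates at different precision.

```python
import pandas as pd


def merge_on_name_and_date(cmu: pd.DataFrame, imdb: pd.DataFrame) -> pd.DataFrame:
    """Left-join IMDb columns onto CMU rows by normalized title and release year.

    Hypothetical helper: assumes both frames have 'name' and 'release_date' columns.
    """
    def add_keys(df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        # Normalize titles so that case and stray whitespace do not block a match.
        out["name_key"] = out["name"].str.strip().str.lower()
        # Take the first 4-digit run as the year; works for "1999" and "1999-03-31".
        out["year_key"] = out["release_date"].astype(str).str.extract(r"(\d{4})")[0]
        return out

    left = add_keys(cmu)
    right = add_keys(imdb).drop(columns=["name", "release_date"])
    # Drop IMDb rows with missing keys (pandas would otherwise match NaN to NaN)
    # and keep one row per key so the join cannot duplicate CMU movies.
    right = right.dropna(subset=["name_key", "year_key"])
    right = right.drop_duplicates(subset=["name_key", "year_key"])

    merged = left.merge(right, on=["name_key", "year_key"], how="left", indicator=True)
    return merged.drop(columns=["name_key", "year_key"])


# Toy rows for illustration only (not taken from either dataset).
cmu = pd.DataFrame({
    "name": ["Movie A", "Movie B ", "Movie C"],
    "release_date": ["1999-03-31", "1995", "2001-01-01"],
})
imdb = pd.DataFrame({
    "name": ["movie a", "Movie B"],
    "release_date": ["1999", "1995-12-15"],
    "rating": [8.7, 8.3],
})

merged = merge_on_name_and_date(cmu, imdb)
print(merged)
```

The `_merge` column produced by `indicator=True` marks each CMU row as `both` or `left_only`, which makes it easy to check how many movies actually received IMDb features.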
In addition, we took advantage of the free trial of IMDb Pro, which allowed us to scrape further relevant data, including budgets, MPAA ratings, box office revenue, directors, writers, producers, composers, cinematographers and editors.