Team 5

Summarize what the team has done using your own words
Using data harvested online, the team built a recommender system that recommends movies to a user based on the user's aggregated reviews of other movies.
The team built a recommender system that suggests five movies to a user according to aggregated movie ratings.
The team scraped two movie websites for movie data and critic ratings, and used that data to build a recommendation system based on an item-based k-NN model. With the model built from all users' movie ratings, the top five movies for a specific reviewer are recommended via item-based k-NN matching (a sketch of this approach follows).
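As a hedged illustration of the item-based k-NN approach described above (the data, function names, and neighborhood size are made up for illustration, not taken from the team's actual code):

    import numpy as np

    # Toy user-item rating matrix: rows = reviewers, columns = movies (0 = unrated).
    # Illustrative data only; the team's real matrix comes from scraped critic ratings.
    ratings = np.array([
        [5.0, 3.0, 0.0, 4.0],
        [4.0, 0.0, 4.0, 5.0],
        [0.0, 2.0, 5.0, 0.0],
    ])

    def item_similarity(r):
        # Cosine similarity between item (column) rating vectors.
        norms = np.linalg.norm(r, axis=0, keepdims=True)
        norms[norms == 0] = 1.0                # avoid division by zero
        unit = r / norms
        return unit.T @ unit

    def recommend(r, user, k=2, top_n=5):
        # Score each unrated item by the similarity-weighted ratings of the
        # user's k most similar rated items, then return the top_n items.
        sim = item_similarity(r)
        rated = np.nonzero(r[user])[0]
        scores = {}
        for item in range(r.shape[1]):
            if r[user, item] > 0:
                continue                       # only score unseen movies
            neighbors = sorted(rated, key=lambda j: sim[item, j], reverse=True)[:k]
            weights = np.array([sim[item, j] for j in neighbors])
            if weights.sum() == 0:
                continue
            scores[item] = (weights @ r[user, neighbors]) / weights.sum()
        return sorted(scores, key=scores.get, reverse=True)[:top_n]

    print(recommend(ratings, user=2))          # e.g. top movies for reviewer 2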
The team did a good job on the recommendation itself, but the presentation was not convincing in terms of the work done. Visualization was missing.
The team created a recommender system using data from Rottentomatoes.com. Because the Rotten Tomatoes website restricts access to some kinds of data, such as user-level data, the team combined data from a couple of other sources. The team used the RapidMiner tool to build the recommender system.
What do you think are the strengths of this team's work?
1. It is a great idea to link the critic reviews from Rotten Tomatoes to the corresponding movies on movielens.org through the common IMDB ID field, which helps join data from different sources.
2. The process of exploring which recommender system to use makes the next step follow naturally.
3. The RapidMiner model is a highlight of this project!
1. Strong, clear goal of building a recommender model
2. Use of multiple technologies/tools in harvesting data
3. Definition of their own universal rating system to normalize ungrouped ratings
4. Research on recommender systems
I like the idea of having a Readme.txt, so I know where to go first and what to do to recreate the team's work.
The recommendation system module is robust.
They used RapidMiner models and did good, deep research on them.
Innovative and creative.
Smart use of data from multiple sources.

What are the major issues or weaknesses in this team's work? If you can, suggest how the team can improve.
1. Some of the descriptions in the report are confusing. (1) You did not explain how reviews given by various critics can substitute for user-level data in recommending movies. (2) On page 4 of the report you mention loading a CSV file, but there are several CSV files in your course portfolio, and it is not clear at first glance which one you use; it would be better to specify in the report which CSV file you are loading. (3) In the body of the report you describe what an ideal ROC curve for your model would look like and how ROC measures the recommendation system's performance, but you do not include the ROC curve itself. It would be better to show your ROC curve alongside the ideal one to make the whole report more appealing and convincing.
2. It is a good idea to convert ratings in A-B-C-D format to numerical ratings, which helps further analysis. But here you translate each letter directly to its corresponding numerical rating. I think it would be better to translate the A-B-C-D scores to numerical ratings at the user level or movie level, since each rating of a movie carries user-level deviation (bias) and movie-level deviation (bias). For example, some critics tend to perceive a B as a high rating while others may perceive a B as a low rating; if you account for this scoring bias in the numerical translation process, the results will be more reasonable. (A sketch of this idea follows.)
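As a hedged sketch of the per-critic normalization suggested above (the letter-to-number scale and the review data are assumptions for illustration, not the team's actual mapping):

    # Assumed letter-to-number scale; the team's actual mapping may differ.
    GRADE = {'A': 4.0, 'B': 3.0, 'C': 2.0, 'D': 1.0}

    # Toy (critic, movie, letter grade) reviews, illustrative only.
    reviews = [
        ('critic1', 'm1', 'A'), ('critic1', 'm2', 'B'),
        ('critic2', 'm1', 'B'), ('critic2', 'm2', 'D'),
    ]

    # Collect raw numeric scores per critic.
    by_critic = {}
    for critic, movie, grade in reviews:
        by_critic.setdefault(critic, []).append((movie, GRADE[grade]))

    # Subtract each critic's mean, so a B from a harsh grader counts for
    # more than a B from a generous one, as the comment proposes.
    normalized = {}
    for critic, scored in by_critic.items():
        mean = sum(score for _, score in scored) / len(scored)
        for movie, score in scored:
            normalized[(critic, movie)] = score - mean

    print(normalized)   # e.g. ('critic1', 'm1') -> 0.5, ('critic2', 'm2') -> -1.0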
3. More comments could be added to the code to make it more intuitive. For example, what are

    imdb = "%7.7i" % imdb
    params = dict(id=imdb, type='imdb', apikey=api_key)

for?
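For what it is worth, a plausible reading of those two lines (an assumption; the surrounding code and the API being called are not shown in the report):

    imdb = 1234                # toy numeric IMDB ID, illustrative only
    api_key = 'YOUR_KEY'       # placeholder

    # Zero-pads the ID to the 7-digit form IMDB identifiers use: 1234 -> '0001234'.
    imdb = "%7.7i" % imdb

    # Builds the query parameters for an API request keyed by that IMDB ID.
    params = dict(id=imdb, type='imdb', apikey=api_key)
    print(params)              # {'id': '0001234', 'type': 'imdb', 'apikey': 'YOUR_KEY'}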
4. As you already indicated in the report, you could investigate a hybrid of recommendation systems as well as introduce meta-modeling.
Despite the small weaknesses mentioned above, I think your project is great overall! Good work!
1. Optimize the model for better performance.
2. Performance evaluation is based solely on AUC values; various other measures could have been explored (see the sketch after this list).
3. Was a content-based recommender explored as an alternative? Which showed better performance? It may have been explored, but I could not find data about it.
4. Though the idea and goal were good, the presentation did not live up to them.
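As a hedged sketch of one such alternative measure, precision@k and recall@k, computed on made-up data (not the team's actual results):

    def precision_recall_at_k(recommended, relevant, k=5):
        # Fraction of the top-k recommendations that are relevant, and the
        # fraction of all relevant items that appear in the top-k.
        top_k = recommended[:k]
        hits = len(set(top_k) & set(relevant))
        precision = hits / k
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    recommended = ['m1', 'm4', 'm7', 'm2', 'm9']   # model's ranked top-5 (toy)
    relevant = ['m4', 'm2', 'm5']                  # movies the user liked (toy)
    print(precision_recall_at_k(recommended, relevant))   # (0.4, 0.666...)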
Better commenting is needed in the IPython file, and it should match the information in the PDF more closely.
The task was to
1. Scrape data
2. Do some analysis
3. Visualize it
The team's presentation contains no visualization of their results. The chosen subject does not easily lend itself to visualization, but the team could still add some interesting visualizations or other ways of presenting their findings: for example, displaying movie preferences by genre, or showing two critics side by side and highlighting their differences and similarities (a sketch of the latter follows).
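One hedged way the suggested side-by-side critic comparison could look, using made-up ratings (not the team's data):

    import matplotlib.pyplot as plt
    import numpy as np

    # Toy ratings for the suggested side-by-side comparison, illustrative only.
    movies = ['Movie A', 'Movie B', 'Movie C', 'Movie D']
    critic1 = [4.0, 2.5, 3.5, 5.0]
    critic2 = [3.0, 3.0, 4.5, 4.0]

    x = np.arange(len(movies))
    width = 0.35
    fig, ax = plt.subplots()
    ax.bar(x - width / 2, critic1, width, label='Critic 1')
    ax.bar(x + width / 2, critic2, width, label='Critic 2')
    ax.set_xticks(x)
    ax.set_xticklabels(movies)
    ax.set_ylabel('Rating')
    ax.set_title('Side-by-side critic ratings')
    ax.legend()
    plt.show()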
Not much weight was given to validating the result; it felt like the result could be wrong, too.
The team could have built the model in Python, which would have given more flexibility to experiment with the model and would have handled a growing number of data points better.
The team based its recommendations on critic data rather than user data, perhaps due to a lack of data; this may not be the best approach.
Also, in general, the presentation could have been improved and better coordinated.
