1. Ideas to be completed for this project

We have 3 tables:

AnimeList.csv contains the list of anime, with title, title synonyms, genre, studio, licensor, producer, duration, rating, score, airing date, number of episodes, source (manga, light novel, etc.) and other attributes of each anime, which is enough to study how these aspects change over time. The rank column is stored as float in the CSV although it holds only integer values, because pandas represents missing values as NaN and NaN forces a float column.
UserList.csv contains information about the users who watch anime: username, registration date (join_date), last online date, birth date, gender, location, and several values aggregated from their anime lists.
UserAnimeList.csv contains the anime lists of all users. Each record holds the username, anime ID, score, status, and the timestamp of the record's last update.
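
The float-typed rank column mentioned above can be converted to the pandas nullable integer type. The following is a minimal sketch, not the project's actual loading code: the inline CSV is made-up sample data, and the column names `anime_id`, `title` and `rank` are assumptions about the file layout.

```python
import io

import pandas as pd

# Made-up sample standing in for AnimeList.csv; column names are assumptions.
sample_csv = io.StringIO(
    "anime_id,title,rank\n"
    "1,Example A,12.0\n"
    "2,Example B,\n"
    "3,Example C,3.0\n"
)

anime = pd.read_csv(sample_csv)
# The missing value makes pandas store the column as a float dtype.
print(anime["rank"].dtype)

# The nullable "Int64" dtype keeps whole numbers and marks gaps as <NA>.
anime["rank"] = anime["rank"].astype("Int64")
print(anime["rank"].tolist())
```

The conversion only succeeds when every non-missing value is a whole number, which matches the description of the rank column.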

Ideas to be completed

Statistics

Users and their locations -> distribution of users across the world (normalized by population)
Compare watching trends by country (or by genre)
Which country loves anime the most (based on the number of completed anime and other measures, possibly weighted)
Average number of episodes watched per user
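
The population normalization in the first idea can be sketched as users per million inhabitants. This is only an illustration under stated assumptions: the country list and the population figures are made-up round numbers, the helper `users_per_million` is hypothetical, and it assumes the free-text location field has already been mapped to country names.

```python
from collections import Counter

# Hypothetical inputs: one country per user, and illustrative round populations.
user_countries = ["France", "France", "Vietnam", "Vietnam", "Vietnam", "Japan"]
population = {"France": 68_000_000, "Vietnam": 100_000_000, "Japan": 125_000_000}


def users_per_million(countries, population):
    """Return users per million inhabitants for countries with a known population."""
    counts = Counter(countries)
    return {
        country: count / population[country] * 1_000_000
        for country, count in counts.items()
        if country in population
    }


rates = users_per_million(user_countries, population)
# Print countries from the highest to the lowest normalized user count.
for country, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country}: {rate:.4f} users per million")
```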

Techniques

Graph (nodes: anime; edge weight: +1 for each user who watched both anime). Note: apply a threshold to the edge weights, discard the edges at or below it, and keep only the edges whose weight is larger than the threshold.
- Community detection algorithm
- Centralities

-> Relationship between anime

Is the threshold reasonable (distribution plot for the threshold)?
How many communities are enough?
Map communities to the properties of their anime -> features shared by the anime in the same community
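
The anime graph described above can be sketched with the standard library alone. This is a hedged illustration, not the notebooks' implementation: the watch lists are made-up, the helpers `build_anime_graph` and `weighted_degree` are hypothetical names, and weighted degree stands in for whichever centralities are finally chosen.

```python
from collections import Counter
from itertools import combinations

# Hypothetical watch lists: username -> set of anime IDs.
watch_lists = {
    "alice": {1, 2, 3},
    "bob": {1, 2},
    "carol": {2, 3},
    "dave": {1, 2, 4},
}


def build_anime_graph(watch_lists, threshold):
    """Count co-watching users per anime pair; keep pairs with weight > threshold."""
    weights = Counter()
    for anime_ids in watch_lists.values():
        # Sorting gives every undirected edge one canonical (low, high) key.
        for pair in combinations(sorted(anime_ids), 2):
            weights[pair] += 1
    return {pair: w for pair, w in weights.items() if w > threshold}


def weighted_degree(edges):
    """Sum the weights of the edges incident to each node."""
    degree = Counter()
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    return degree


edges = build_anime_graph(watch_lists, threshold=1)
print(edges)
print(weighted_degree(edges).most_common())
```

Plotting the distribution of the unfiltered weights is one way to judge whether a given threshold is reasonable. The resulting edge list can then be passed to a graph library or exported for community detection.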

User graph -> the anime-user dataset also needs to be filtered
(2 or 3 anime) => recommend the next anime to watch based on the first one (frequent patterns)
Spearman correlation between two columns of the user/anime tables
Clustering of users (user_watching, user_completed, user_onhold, user_dropped, user_plantowatch, user_days_spent_watching) => compare the clusters with age (age ranges)?
Why is One Piece not a top anime here? (technique not yet known)
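
The frequent-pattern recommendation idea above can be illustrated with pairwise association rules of the form "watched x -> watch y". This is a minimal sketch under stated assumptions: the transactions are made-up, the helpers `pair_rules` and `recommend` and the support and confidence cut-offs are hypothetical, and a full frequent-pattern miner would also handle larger itemsets.

```python
from collections import Counter
from itertools import permutations

# Hypothetical transactions: the set of anime each user has watched.
completed = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]


def pair_rules(transactions, min_support, min_confidence):
    """Return (x, y, support, confidence) rules, sorted by confidence."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    # Ordered pairs, so x -> y and y -> x are counted as separate rules.
    pair_counts = Counter(
        pair for t in transactions for pair in permutations(sorted(t), 2)
    )
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n                # share of users who watched both
        confidence = count / item_counts[x]  # share of x-watchers who watched y
        if support >= min_support and confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return sorted(rules, key=lambda r: r[3], reverse=True)


def recommend(rules, watched):
    """List the anime suggested by the rules whose left side is `watched`."""
    return [y for x, y, _, _ in rules if x == watched]


rules = pair_rules(completed, min_support=0.4, min_confidence=0.7)
print(rules)
print(recommend(rules, "A"))
```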

Useful links

1. The link for dataset:

https://husteduvn-my.sharepoint.com/:f:/g/personal/anh_nv183478_sis_hust_edu_vn/EvcJws5DpdhAs9O8hXTgxfYBC-m63XdKIE2Xu5cqgT843w?e=hOtCuW

2. Kaggle processing link:

https://www.kaggle.com/code/vietanhnguyen1010/clean-data-myanimelist

3. Country information (lon, lat, name):

https://github.com/google/dspl/blob/master/samples/google/canonical/countries.csv
