GitHub - jho44/GetHuluJob: Our COM SCI 267 project. Credit to Nikki Woo for fantastic project name.


Name
raw_data
utils
.DS_Store
.gitignore
README.md
eval_julia.ipynb
ml_netflix.csv
pyro_lda.ipynb
python-lda.ipynb
python_pmf.ipynb
stan-lda.py
stan-pmf.py
turing-lda.ipynb


Dataset

ml_netflix.csv: a join of the MovieLens ml-latest-small dataset and Netflix data

The code for generating the dataset is in utils/gen_data.ipynb.
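As a rough illustration of what such a join might look like, here is a minimal pandas sketch. The column names (title, rating, description), the join key, and the sample rows are all assumptions made for this example; the actual logic lives in utils/gen_data.ipynb and may differ.

```python
import pandas as pd

# Hypothetical stand-ins for the two sources; real column names may differ.
movielens = pd.DataFrame({
    "title": ["Heat", "Jumanji", "Casino"],
    "rating": [4.0, 3.5, 4.5],
})
netflix = pd.DataFrame({
    "title": ["Heat", "Casino", "Okja"],
    "description": ["a heist crew", "a casino boss", "a girl and her pet"],
})

# An inner join keeps only the movies that appear in both sources.
joined = movielens.merge(netflix, on="title", how="inner")
print(joined)
```

An inner join is used here so that every remaining movie has both fields; whether the real dataset was built this way is not stated in this README.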

Experimented on

LDA
- pure Python
  - can run the notebook straight through
- Stan
  - warning: sampling takes ~1.67 hours (about 1 hour 40 minutes) on all 1056 movies
  - run python3 stan-lda.py in terminal with the following available flags:
    - regen_words_df (bool): True if you'd like to regenerate the dataframe mapping each word (ID) to a document/movie (ID)
      - saved to cache/words_df.csv
    - regen_data_lemmatized (bool): True if you'd like to regenerate the lemmatized movie descriptions
      - saved to cached/data_lemmatized.txt
    - num_movies (int): train on only the first num_movies movies in the data set
      - defaults to the number of movies in the data set (1056)
    - just_eval (bool): True if you'd like to just calculate the evaluation metrics. Assumes you already have the trained posterior values in results/theta.npy.
- Pyro
  - can run the notebook straight through
  - modify number of topics, number of epochs run, etc. in cell 4
- Turing
  - can run the notebook using a Julia runtime
  - results are output to CSV (cache/julia_out.csv) for evaluation in Python using eval_julia.ipynb
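The Stan LDA flags listed above could be parsed along the following lines. This is only a sketch using argparse: the `--flag value` syntax, the boolean spellings accepted, and the False defaults are assumptions, and stan-lda.py itself may handle its flags differently.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret common spellings of true/false given on the command line."""
    lowered = value.lower()
    if lowered in {"true", "t", "1", "yes"}:
        return True
    if lowered in {"false", "f", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser(default_num_movies: int = 1056) -> argparse.ArgumentParser:
    # Flag names follow the list above; the defaults are assumptions.
    parser = argparse.ArgumentParser(description="Sketch of the stan-lda.py flags")
    parser.add_argument("--regen_words_df", type=str_to_bool, default=False)
    parser.add_argument("--regen_data_lemmatized", type=str_to_bool, default=False)
    parser.add_argument("--num_movies", type=int, default=default_num_movies)
    parser.add_argument("--just_eval", type=str_to_bool, default=False)
    return parser


# Example: train on the first 100 movies only, then skip straight to evaluation.
args = build_parser().parse_args(["--num_movies", "100", "--just_eval", "True"])
print(args)
```

Parsing booleans through a small helper avoids the common pitfall where `type=bool` treats any non-empty string, including "False", as True.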

PMF
- pure Python
  - can run the notebook straight through
- Stan
  - run python3 stan-pmf.py in terminal with the following available flags:
    - just_eval (bool): True if you'd like to just calculate the evaluation metrics. Assumes you already have the trained posterior values in results/Z.npy and results/W.npy.
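Because just_eval assumes the trained posterior values are already on disk, a script might check for them before evaluating. The helper below is a hypothetical sketch using only the standard library (`find_cached_posteriors` is not a function from this repository); the returned paths could then be passed to numpy.load.

```python
import tempfile
from pathlib import Path


def find_cached_posteriors(results_dir, names):
    """Return {name: path} for each <results_dir>/<name>.npy, failing early if any is missing."""
    results_dir = Path(results_dir)
    paths = {name: results_dir / f"{name}.npy" for name in names}
    missing = [name for name, path in paths.items() if not path.exists()]
    if missing:
        raise FileNotFoundError(
            f"missing cached posteriors {missing} in {results_dir}; run training first"
        )
    return paths


# Demo in a temporary directory so nothing in a real results/ folder is touched.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("Z", "W"):
        (Path(tmp) / f"{name}.npy").touch()  # empty placeholder files
    found = find_cached_posteriors(tmp, ["Z", "W"])
    print(sorted(found))
```

For the LDA script the same check would apply with the single name theta instead of Z and W.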

Eval Metrics
