Inconsistent function_word_frequency features #1

bandrus5 · 2020-11-03T19:00:57Z

Because of the way function word frequencies are counted, each document yields a different set of function words, which makes comparisons between documents less meaningful. This is easy to see in the flattened version of the feature vector, where each feature is labeled individually, but not in the un-flattened version, where the frequency counts are just an unlabeled list. It should either be made clear that the features are inconsistent from document to document or, preferably, the feature should use a consistent set of function words.
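To illustrate the preferred fix, here is a minimal sketch of counting against a fixed vocabulary. It is not the library's actual code: the `FUNCTION_WORDS` tuple, the `function_word_frequencies` helper, and the simple regex tokenizer are all hypothetical and only show how a shared, ordered word list keeps every position in the vector tied to the same word across documents.

```python
import re
from collections import Counter

# Hypothetical fixed vocabulary (an assumption for this sketch);
# a real function word list would be much longer.
FUNCTION_WORDS = ("a", "an", "and", "in", "of", "the", "to")


def function_word_frequencies(text, vocabulary=FUNCTION_WORDS):
    """Return relative frequencies in the fixed order given by `vocabulary`."""
    # Naive tokenizer, assumed here for illustration only.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    # Words absent from the document get 0.0 instead of being dropped,
    # so position i always refers to vocabulary[i].
    return [counts[word] / total for word in vocabulary]


vec_a = function_word_frequencies("The cat sat in the hat.")
vec_b = function_word_frequencies("Dogs bark.")
```

Both vectors have the same length and the same word at each index, so the un-flattened list is comparable between documents even without labels.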
