Skip to content
This repository has been archived by the owner on Dec 9, 2024. It is now read-only.
/ nlp-weka-package Public archive

Weka package containing various natural language processing components.

License

Notifications You must be signed in to change notification settings

fracpete/nlp-weka-package

Repository files navigation

nlp-weka-package

Contains various natural language processing components.

Parsers

Makes use of the Stanford Parser. You can download parser models from Maven Central, unzip them (a .jar file is simply a ZIP file) and point to the correct parser model. A simple parser model is available from:

  • Linux

    $HOME/wekafiles/nlp/models

  • Windows

    %USERPROFILE%\\wekfiles\\nlp\\models

Filters

  • weka.filters.unsupervised.attribute.PartOfSpeechTagging

    Performs part-of-speech tagging.

  • weka.filters.unsupervised.attribute.ChangeCase

    Changes strings to upper or lower case.

Tokenizers

  • weka.core.tokenizers.PTBTokenizer

    Penn Treebank tokenizer

  • weka.core.tokenizers.WhiteSpaceTokenizer

    simple tokenizer, uses String.split("\s")

Explorer

You can test parsers (and associated options) with the Explorer tab NLP Parse trees. You simply select a STRING attribute from the currently loaded dataset that you want to analyze and then you can select a row from the dataset to parse. Once you have selected a parser model (and maybe added some custom options), you can parse the string/document. For each sentence in the string/document, a separate parse tree will get generated and displayed.

Releases

How to use packages

For more information on how to install the package, see:

https://waikato.github.io/weka-wiki/packages/manager/

Maven

Add the following dependency in your pom.xml to include the package:

    <dependency>
      <groupId>com.github.fracpete</groupId>
      <artifactId>nlp-weka-package</artifactId>
      <version>2019.3.29</version>
      <type>jar</type>
      <exclusions>
        <exclusion>
          <groupId>nz.ac.waikato.cms.weka</groupId>
          <artifactId>weka-dev</artifactId>
        </exclusion>
      </exclusions>
    </dependency>