azagniotov authored Jan 16, 2024
1 parent 27773cd commit c53a6ac
Showing 1 changed file with 12 additions and 5 deletions.
17 changes: 12 additions & 5 deletions README.md
@@ -29,6 +29,7 @@ A Lucene plugin based on [Sudachi](https://github.com/WorksApplications/Sudachi)
* [Local Development](#local-development)
* [Prerequisites](#prerequisites)
* [Downloading a Sudachi dictionary](#downloading-a-sudachi-dictionary)
* [Changing local Sudachi dictionary location for runtime](#changing-local-sudachi-dictionary-location-for-runtime)
* [System Requirements](#system-requirements)
* [Build System](#build-system)
* [List of Gradle tasks](#list-of-gradle-tasks)
@@ -131,9 +132,17 @@ The plugin needs a dictionary in order to run the tests. Thus, it needs to be do
```

The above command does the following:
1. Downloads a system dictionary `sudachi-dictionary-20230711-core` ZIP from AWS and unpacks it under the `/tmp/sudachi/`
2. Copies the [user-dictionary/user_lexicon.csv](user-dictionary/user_lexicon.csv) under the `/tmp/sudachi/`. The CSV is used to create a User dictionary. Although user defined dictionary is not really needed here, this sets an example how to add user entries to a dictionary.
3. Builds a Sudachi user dictionary from the CSV under the `/tmp/sudachi/`
1. Downloads a system dictionary ZIP `sudachi-dictionary-<YYYYMMDD>-full.zip` from AWS (`YYYYMMDD` is `20230927` as of Jan 15th, 2024) and unpacks it under `<PROJECT_ROOT>/.sudachi/downloaded/`. If the ZIP has been downloaded earlier, the downloaded file is reused.
2. Unzips the content under the `/tmp/sudachi/system-dict/`
3. Renames the downloaded `system_full.dic` to `system.dict`
4. Copies the [user-dictionary/user_lexicon.csv](user-dictionary/user_lexicon.csv) under `/tmp/sudachi/`. The CSV is used to create a user dictionary. Although the user-defined dictionary contains only two entries, it serves as an example of how to add user dictionary metadata entries.
5. Builds a Sudachi user dictionary `user_lexicon.dict` from the CSV and places it under the `/tmp/sudachi/system-dict`
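
The unzip and rename steps above could be sketched in Java roughly as follows. This is an illustration only, not the project's actual Gradle logic: the class `SystemDictInstaller` and its methods are hypothetical, and the sketch assumes that the ZIP entries can be flattened into the target directory. The download and user dictionary steps are left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of the plugin.
class SystemDictInstaller {

    // Extracts every regular file from the ZIP directly into targetDir.
    // Assumption: directory structure inside the ZIP is not needed, so it is flattened.
    static void unzipFlat(Path zip, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // Keeping only the file name also guards against "../" entries.
                Path fileName = Paths.get(entry.getName()).getFileName();
                Files.copy(in, targetDir.resolve(fileName), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    // Unzips the archive and renames system_full.dic to system.dict.
    static Path install(Path zip, Path targetDir) throws IOException {
        unzipFlat(zip, targetDir);
        return Files.move(
                targetDir.resolve("system_full.dic"),
                targetDir.resolve("system.dict"),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SystemDictInstaller <path-to-zip> <target-dir>
        System.out.println(install(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```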

#### Changing local Sudachi dictionary location for runtime

At runtime, the plugin expects the system and user dictionaries to be located at `/tmp/sudachi/system-dict/system.dict` and `/tmp/sudachi/user_lexicon.dict`, respectively.

These locations in the local file system can be overridden via the environment variables `SUDACHI_SYSTEM_DICT` and `SUDACHI_USER_DICT`, respectively.
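
As a minimal sketch of how such an override might be resolved, the following Java snippet falls back to the default paths when a variable is unset or empty. The class `DictionaryLocations` and its `resolve` method are hypothetical and only illustrate the idea; the plugin's own lookup code may differ.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical helper, not part of the plugin.
class DictionaryLocations {
    static final String DEFAULT_SYSTEM_DICT = "/tmp/sudachi/system-dict/system.dict";
    static final String DEFAULT_USER_DICT = "/tmp/sudachi/user_lexicon.dict";

    // Returns the path from the given environment variable,
    // or the fallback when the variable is missing or empty.
    static Path resolve(Map<String, String> env, String name, String fallback) {
        String value = env.get(name);
        boolean missing = value == null || value.trim().isEmpty();
        return Paths.get(missing ? fallback : value);
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println(resolve(env, "SUDACHI_SYSTEM_DICT", DEFAULT_SYSTEM_DICT));
        System.out.println(resolve(env, "SUDACHI_USER_DICT", DEFAULT_USER_DICT));
    }
}
```

Passing the environment in as a `Map` keeps the lookup easy to exercise in tests without touching the real process environment.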

[`Back to top`](#table-of-contents)

@@ -162,8 +171,6 @@ Building and packaging can be done with the following command:
./gradlew build
```

As per [https://github.com/WorksApplications/Sudachi#dictionaries](https://github.com/WorksApplications/Sudachi#dictionaries), the above command will download a `system_core.dic` and will place it under [src/main/resources/system-dict/](src/main/resources/system-dict)

#### Formatting

The project leverages the [Spotless Gradle plugin](https://github.com/diffplug/spotless/tree/main/plugin-gradle) and follows the [palantir-java-format](https://github.com/palantir/palantir-java-format) style guide.