Improved checkduplicates test and restructure project #14

Merged
merged 25 commits into from
Nov 18, 2024
25 commits
6a68e58
remove extra quote
TabulateJarl8 Jan 22, 2024
7f2d2ef
first iteration of rust checkduplicates test
TabulateJarl8 Nov 16, 2024
e880cf7
second iteration speedup
TabulateJarl8 Nov 16, 2024
26841b4
add fix duplicates functionality
TabulateJarl8 Nov 16, 2024
6f516ff
fix fact typos
TabulateJarl8 Nov 17, 2024
5d76547
use arc for cheap cloning
TabulateJarl8 Nov 17, 2024
67b578c
performance enchancements
TabulateJarl8 Nov 17, 2024
f2b2d6a
even more performance improvements
TabulateJarl8 Nov 17, 2024
da2103b
rename new rust test
TabulateJarl8 Nov 17, 2024
7ec19b9
refactor rust project
TabulateJarl8 Nov 17, 2024
d567a82
add comments
TabulateJarl8 Nov 17, 2024
1968a83
remove new duplicate facts that were found
TabulateJarl8 Nov 17, 2024
de5f66f
refactor project to use poetry and pytest
TabulateJarl8 Nov 17, 2024
d68ee0c
remove extra quote
TabulateJarl8 Jan 22, 2024
c394237
Merge master into improved_tests
TabulateJarl8 Nov 17, 2024
a1d08c8
fix workflow file
TabulateJarl8 Nov 17, 2024
67d909c
fix version conflicts with older versions of python
TabulateJarl8 Nov 17, 2024
ddc45d5
bump version
TabulateJarl8 Nov 17, 2024
ea11d34
remove old checkduplicates
TabulateJarl8 Nov 17, 2024
ba53398
add more ruff lints and convert back to tabs
TabulateJarl8 Nov 17, 2024
ff1ce58
finish fixing all new ruff checks
TabulateJarl8 Nov 17, 2024
28dc242
update copyright dates
TabulateJarl8 Nov 18, 2024
b6e8f57
switch from single to double quotes in ruff
TabulateJarl8 Nov 18, 2024
172731d
add binary caching to checkduplicates CI
TabulateJarl8 Nov 18, 2024
0a2c930
fix module docstring
TabulateJarl8 Nov 18, 2024
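These commits replace the old Python duplicate check (which the previous CI job ran after installing `rapidfuzz` and `tqdm`) with a Rust binary. The Rust matching logic is not part of this page. As a rough illustration of what a fuzzy duplicate-fact check does, here is a minimal Python sketch that flags every pair of facts whose similarity ratio meets a threshold. It uses the standard library's `difflib.SequenceMatcher` in place of rapidfuzz, and the `find_duplicates` name, the lowercase normalization, and the 0.9 default threshold are assumptions for illustration, not the project's actual settings.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_duplicates(facts, threshold=0.9):
    """Return (i, j, score) for every pair of facts scoring at or above `threshold`.

    Hypothetical sketch: a plain O(n^2) pairwise comparison with difflib,
    standing in for whatever scorer the real checkduplicates binary uses.
    """
    # Normalize case and surrounding whitespace so trivial differences don't hide matches.
    normalized = [fact.strip().lower() for fact in facts]
    matches = []
    # combinations() yields each unordered pair once, with i < j.
    for (i, a), (j, b) in combinations(enumerate(normalized), 2):
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            matches.append((i, j, score))
    return matches


if __name__ == "__main__":
    facts = [
        "Honey never spoils.",
        "honey never spoils",
        "Octopuses have three hearts.",
    ]
    for i, j, score in find_duplicates(facts):
        print(f"{i} <-> {j}: {score:.2f}")
```

A pairwise scan is quadratic in the number of facts, which is one plausible reason a large fact list would motivate the performance-focused Rust rewrite described in the commits.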
39 changes: 39 additions & 0 deletions .github/workflows/check_duplicates.yml
@@ -0,0 +1,39 @@
name: Check for Duplicate Facts

# Controls when the action will run.
on:
  # Triggers the workflow on push or pull request events but only for the master branch
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  checkduplicates:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Cache checkduplicates binary
        uses: actions/cache@v4
        id: cache
        with:
          path: |
            tests/checkduplicates/target/release/checkduplicates
          key: ${{ runner.os }}-cargo-${{ hashFiles('tests/checkduplicates/Cargo.lock', 'tests/checkduplicates/Cargo.toml', 'tests/checkduplicates/src/**') }}
          restore-keys: |
            ${{ runner.os }}-cargo-

      - name: Build checkduplicates test
        if: steps.cache.outputs.cache-hit != 'true'
        run: |
          cd tests/checkduplicates
          cargo build --release

      - name: Check for duplicate facts
        run: ./tests/checkduplicates/target/release/checkduplicates
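The build step in this workflow runs only when `steps.cache.outputs.cache-hit` is not `'true'`. According to the GitHub Actions caching documentation, `cache-hit` is `'true'` only for an exact match on `key`; if there is none, the action tries `key` as a prefix and then each `restore-keys` entry in order, restoring the most recently created matching cache while still reporting a miss. The Python sketch below models that decision in simplified form: the `resolve_cache` helper and the in-memory `SavedCache` list are hypothetical, and branch scoping and cache versions are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SavedCache:
    key: str
    created: int  # higher = newer; stands in for the cache's creation time


def resolve_cache(
    key: str, restore_keys: List[str], saved: List[SavedCache]
) -> Tuple[Optional[str], bool]:
    """Return (restored_key, cache_hit) under a simplified actions/cache model."""
    # 1. Only an exact match on `key` reports cache-hit == 'true'.
    for entry in saved:
        if entry.key == key:
            return entry.key, True
    # 2. Otherwise try `key` as a prefix, then each restore key in order;
    #    when several caches match a prefix, the newest one is restored.
    for prefix in [key, *restore_keys]:
        candidates = [e for e in saved if e.key.startswith(prefix)]
        if candidates:
            newest = max(candidates, key=lambda e: e.created)
            return newest.key, False
    # 3. Nothing matched, so nothing is restored.
    return None, False


if __name__ == "__main__":
    saved = [SavedCache("Linux-cargo-aaa", 1), SavedCache("Linux-cargo-bbb", 2)]
    print(resolve_cache("Linux-cargo-bbb", ["Linux-cargo-"], saved))  # ('Linux-cargo-bbb', True)
    print(resolve_cache("Linux-cargo-ccc", ["Linux-cargo-"], saved))  # ('Linux-cargo-bbb', False)
```

In the second case a stale binary is restored but `cache-hit` is false, so `cargo build --release` still runs and overwrites it. Because only the final binary is cached rather than the rest of `target/`, a prefix-restored binary probably saves little or no build time.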
67 changes: 0 additions & 67 deletions .github/workflows/codeql-analysis.yml

This file was deleted.

102 changes: 74 additions & 28 deletions .github/workflows/main.yml
@@ -1,43 +1,89 @@
# This is a basic workflow to help you get started with Actions

name: CI

# Controls when the action will run.
on:
# Triggers the workflow on push or pull request events but only for the master branch
push:
branches: [ master ]
pull_request:
branches: [ master ]

# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
test:
# The type of runner that the job will run on
name: Test code and coverage
runs-on: ubuntu-latest

# Steps represent a sequence of tasks that will be executed as part of the job
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"]
steps:
# Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
- uses: actions/checkout@v2

# Runs a set of commands using the runner's shell
- name: Run a multi-line script
run: |
pip3 install setuptools wheel
python3 setup.py sdist
pip3 install dist/*
python3 tests/test.py
checkduplicates:
runs-on: ubuntu-latest
- uses: actions/checkout@v4

steps:
- uses: actions/checkout@v2
- name: Run a multi-line script
run: |
pip3 install -U setuptools wheel pip
pip3 install rapidfuzz tqdm
python3 tests/checkduplicates.py
# If you wanted to use multiple Python versions, you'd have to specify a matrix in the job and
# reference the matrix python version here.
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
allow-prereleases: true

# Cache the installation of Poetry itself, i.e. the next step. This prevents the workflow
# from installing Poetry every time, which can be slow. Note the use of the Poetry version
# number in the cache key, and the "-0" suffix: this allows you to invalidate the cache
# manually if/when you want to upgrade Poetry, or if something goes wrong. This could be
# mildly cleaner by using an environment variable, but I don't really care.
- name: cache poetry install
uses: actions/cache@v4
with:
path: ~/.local
key: poetry-1.5.1-0

# Install Poetry. You could do this manually, or there are several actions that do this.
# `snok/install-poetry` seems to be minimal yet complete, and really just calls out to
# Poetry's default install script, which feels correct. I pin the Poetry version here
# because Poetry does occasionally change APIs between versions and I don't want my
# actions to break if it does.
#
# The key configuration value here is `virtualenvs-in-project: true`: this creates the
# venv as a `.venv` in your testing directory, which allows the next step to easily
# cache it.
- uses: snok/install-poetry@v1
with:
version: 1.5.1
virtualenvs-create: true
virtualenvs-in-project: true

# Cache your dependencies (i.e. all the stuff in your `pyproject.toml`). Note the cache
# key: this job runs a matrix of Python versions and each version needs its own venv,
# so the key includes the matrix Python version as well as the poetry.lock hash.
- name: cache deps
id: cache-deps
uses: actions/cache@v4
with:
path: .venv
key: pydeps-${{ matrix.python-version }}-${{ hashFiles('**/poetry.lock') }}

# Install dependencies. `--no-root` means "install all dependencies but not the project
# itself", which is what you want in order to avoid caching _your_ code. The `if` statement
# ensures this only runs on a cache miss.
- run: poetry install --no-interaction --no-root
if: steps.cache-deps.outputs.cache-hit != 'true'

# Now install _your_ project. This isn't necessary for many types of projects -- particularly
# things like Django apps don't need this. But it's a good idea since it fully exercises the
# pyproject.toml and makes sure that if you add things like console-scripts at some point,
# they'll be installed and working.
- run: poetry install --no-interaction

# run the tests and check for 100% coverage
- run: poetry run pytest . --cov=randfacts --cov-report=term-missing --cov-report=xml

# check for code style errors
- run: poetry run ruff check
# disable code format checking until docstrings are sorted out
# https://github.com/astral-sh/ruff/issues/8430
# - run: poetry run ruff format --check
- name: Upload coverage reports to Codecov
uses: codecov/[email protected]
with:
token: ${{ secrets.CODECOV_TOKEN }}
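The `cache deps` step keys the project `.venv` on a hash of `poetry.lock`, while the `test` job runs a matrix of Python 3.8 through 3.12. A virtualenv is tied to the interpreter that created it, so a key built from the lockfile alone would let one matrix leg restore a `.venv` built for a different Python version. The sketch below illustrates the difference; the `deps_cache_key` helper is hypothetical and uses a plain SHA-256 of the lockfile contents rather than GitHub's exact `hashFiles` digest.

```python
import hashlib


def deps_cache_key(python_version: str, lock_contents: bytes, include_version: bool = True) -> str:
    """Build a hypothetical `pydeps-...` cache key for one matrix leg."""
    lock_hash = hashlib.sha256(lock_contents).hexdigest()
    if include_version:
        return f"pydeps-{python_version}-{lock_hash}"
    return f"pydeps-{lock_hash}"


if __name__ == "__main__":
    lock = b'[[package]]\nname = "pytest"\n'
    versions = ["3.8", "3.9", "3.10", "3.11", "3.12"]
    # Lockfile-only keys collide: every matrix leg would share one cached .venv.
    print(len({deps_cache_key(v, lock, include_version=False) for v in versions}))  # 1
    # Version-qualified keys give each interpreter its own cache entry.
    print(len({deps_cache_key(v, lock) for v in versions}))  # 5
```

Either way, editing `poetry.lock` changes the hash component, so every cached environment is invalidated when dependencies change.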
38 changes: 32 additions & 6 deletions .gitignore
@@ -1,6 +1,3 @@
# MacOS development (added by PancakesWasTaken)
.DS_Store

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -97,7 +94,22 @@ ipython_config.py
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
# poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
@@ -140,5 +152,19 @@ dmypy.json
# Cython debug symbols
cython_debug/

# cargo
Cargo.lock
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

# Poetry local configuration file - https://python-poetry.org/docs/configuration/#local-configuration
poetry.toml

# ruff
.ruff_cache/

# LSP config files
pyrightconfig.json
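This diff also drops the old `# cargo` / `Cargo.lock` ignore rule, so the lockfile can be committed. That matters for the new `check_duplicates.yml` workflow, whose cache key passes `tests/checkduplicates/Cargo.lock` to `hashFiles(...)`: a file missing from the checkout cannot contribute to the hash. GitHub documents `hashFiles` as computing a SHA-256 for each matched file and then a final SHA-256 over those hashes, returning an empty string when nothing matches. The sketch below approximates that; `approx_hash_files` is a hypothetical helper, Python's `glob` stands in for the runner's pattern matching, and the sorted file order is an assumption, so the output is not guaranteed to equal GitHub's.

```python
import glob
import hashlib
import os


def approx_hash_files(*patterns: str) -> str:
    """Approximate hashFiles(): hash each matched file, then hash the per-file digests."""
    matched = set()
    for pattern in patterns:
        # recursive=True lets '**' descend into subdirectories, like 'src/**' in the workflow.
        for path in glob.glob(pattern, recursive=True):
            if os.path.isfile(path):
                matched.add(path)
    if not matched:
        return ""  # no matches: empty string, as documented for hashFiles
    combined = hashlib.sha256()
    for path in sorted(matched):  # deterministic order (an assumption of this sketch)
        with open(path, "rb") as f:
            combined.update(hashlib.sha256(f.read()).digest())
    return combined.hexdigest()


if __name__ == "__main__":
    # Mirrors the key expression in check_duplicates.yml when run from the repo root.
    key = "Linux-cargo-" + approx_hash_files(
        "tests/checkduplicates/Cargo.lock",
        "tests/checkduplicates/Cargo.toml",
        "tests/checkduplicates/src/**",
    )
    print(key)
```

If `Cargo.lock` were still ignored, the key would depend only on `Cargo.toml` and the sources, so a change in resolved dependency versions could reuse an old cached binary.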

4 changes: 2 additions & 2 deletions LICENSE.txt → LICENSE
@@ -1,5 +1,5 @@
MIT License
Copyright (c) 2020-2021 Connor Sample
Copyright (c) 2020-2024 Connor Sample
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
@@ -14,4 +14,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
SOFTWARE.
1 change: 0 additions & 1 deletion MANIFEST.in

This file was deleted.
