All notable changes to this project will be documented in this file.
- Example GLiNER integration (#1504)
- Docs revamp and docstring bug fixes (#1500)
- Minor updates to the mkdocstrings config (#1503)
- Added logic to handle phone numbers with country code (#1426) (Thanks @kauabh)
- Added UK National Insurance Number Recognizer (#1446) (Thanks @hhobson)
- Fixed regex match_time output (#1488) (Thanks @andrewisplinghoff)
- Added fix to ensure configuration files are closed properly when loading them (#1423) (Thanks @saulbein)
- Closing handles for YAML file (#1424) (Thanks @roeybc)
- Reduce memory usage of Analyzer test suite (#1429) (Thanks @hhobson)
- Added `batch_size` parameter to `BatchAnalyzerEngine` (#1449) (Thanks @roeybc)
- Remove ignored labels from supported entities (#1454) (Thanks @omri374)
- Update US_SSN CONTEXT and unit test (#1455) (Thanks @claesmk)
- Fixed bug with Azure AI language context (#1458) (Thanks @omri374)
- Add support for allow_list, allow_list_match, regex_flags in REST API (#1484) (Thanks @hdw868)
- Add a link to model classes to simplify configuration (#1472) (Thanks @omri374)
- Restricting spacy.cli for version 3.7.0 (#1495) (Thanks @kshitijcode)
- No changes specified for Anonymizer in this release.
- Fix presidio-structured build - lock numpy version (#1465) (Thanks @SharonHart)
- Fix bug with image conversion (#1445) (Thanks @omri374)
- Removed Python 3.8 support (EOL) and added 3.12 (#1479) (Thanks @omri374)
- Update Docker build to use gunicorn for containers (#1497) (Thanks @RKapadia01)
- New Dev containers for analyzer, analyzer+transformers, anonymizer (#1459) (Thanks @roeybc)
- Added dev containers for: analyzer, analyzer+transformers, anonymizer, and image redaction (#1450) (Thanks @roeybc)
- Added support for allow_list, allow_list_match, regex_flags in REST API (#1488) (Thanks @hdw868)
- Typo fix in if condition (#1419) (Thanks @omri374)
- Minor notebook changes (#1420) (Thanks @omri374)
- Do not release `presidio-cli` as part of the release pipeline (#1422) (Thanks @SharonHart)
- (Docs) Use Presidio across Anthropic, Bedrock, VertexAI, Azure OpenAI, etc. with LiteLLM Proxy (#1421) (Thanks @krrishdholakia)
- Update CI due to DockerCompose project name issue (#1428) (Thanks @omri374)
- Update docker-compose installation docs (#1439) (Thanks @MWest2020)
- Fix space typo in docs (#1459) (Thanks @artfuldev)
- Unlock numpy after dropping 3.8 (#1480) (Thanks @SharonHart)
- Add a link to HashiCorp vault operator resource (#1468) (Thanks Akshay Karle)
- Updates to the transformers conf docs and yaml file (#1467)
- docs: clarify the docs on deploying presidio to k8s (#1453) (Thanks Roel Fauconnier)
Note: A new YAML based mechanism has been added to support no-code customization and creation of recognizers. The default recognizers are now automatically loaded from file.
- Recognizer for Spanish Foreigners Identity Code (NIE Numero de Identificacion de Extranjeros).
- Recognizer for Finnish Personal Identity Codes (Henkilötunnus) (#1394) (Thanks honderr).
- New Predefined Recognizer for Indian Passport #1350 (#1351) (Thanks Hiten-98)
- Add new recognizer for IN_VOTER #1344 (#1345) (Thanks kjdeveloper8)
- Spanish NIE (Foreigners ID card) recognizer (#1359) (Thanks areyesfalcon)
- Added regex functionality for allow lists in the analyzer (#1357) (Thanks NarekAra)
- Loading analyzer engine & recognizer registry from configuration file (#1367)
- Align ports with documentation and postman collection. (#1375) (Thanks ungana)
- Analyzer documentation (#1384)
- Fix the entity filtering of the transformer_recognizer.py analyze function (#1403) (Thanks andreas-eberle)
- Update conf files location (#1358)
- Fix OverflowError in crypto_recognizer (#1377)
- Improve url detector (#1398) (Thanks afogel)
- Update Dockerfile.windows (#1413) (Thanks markvantilburg)
- Changing predefined recognizers to use the config file (#1393) (Thanks RoeyBC)
- Update Dockerfile.windows (#1414) (Thanks markvantilburg)
- Add Ruff linter + Apply Ruff fix (#1379)
- Auto-formatting, fix D rules (#1381)
- Fix N818, E721 (#1382)
- Migrate Python Packaging to pyproject.toml (#1383)
- From Pipenv to Poetry (#1391)
- Fix ports in docs (#1408)
- Support 'M' prefix in SG_NRIC_FIN Recognizer and expand tests (#1304) (Thanks @miltonsim)
- Add Bech32 and Bech32m Bitcoin Address Validation in Crypto Recognizer and expand tests (#1307) (Thanks @miltonsim)
- Predefined pattern recognizer: IN_VEHICLE_REGISTRATION (#1288) (Thanks @devopam)
- Addition of leniency parameter in predefined PhoneRecognizer (#1311) (Thanks @VMD7)
- Add Singapore UEN Recognizer (#1315) (Thanks @miltonsim)
- Update spacy_stanza.md (#1325) (Thanks @AndreasThinks)
- Adding Span Marker Recognizer Sample (#1321) (Thanks @VMD7)
- Cache compiled regexes in analyzer (#1335) (Thanks @Edward-Upton)
- Added pseudonymization sample (#1296)
- Added tesseract to installation (#1312)
- Analysis builder improvements (#1295) (Thanks @ebotiab)
- Implement user-defined entity selection strategies in Presidio Structured (#1319) (Thanks @miltonsim)
- Fix for incorrectly referenced recognizer in analysis_explanation using PhoneRecognizer (#1330) (Thanks @egillv021)
- Fix bug where "bank" and "check" wouldn't work (#1333) (Thanks @usr-ein and @Samuel Prevost)
- Bugfix in tutorial (#1310)
- Changed default aggregation_strategy to max (#1342)
- Fixed wrong condition for dicom metadata (#1347)
- Add predefined_recognizer: IN_AADHAAR (#1256)
- Added the option to add custom operators + pseudonymization sample (#1284)
- Fix failing test due to optional package (#1258)
- Update publish-to-pypi.yml (#1259)
- Allow local Spacy Models to be loaded in NLP Engine (#1269)
- Upgrade pip in windows containers (#1272)
- Bugfix in ImageAnalyzerEngine (#1274)
- Added an alpha version of presidio-structured, a library that reuses logic from existing Presidio components to anonymize (semi-)structured data. (#1192)
- Add PL PESEL recognizer (#1209)
- Azure AI language recognizer (#1228)
- Add_conf_to_package_data (#1243)
- Add keep operator as deanonymizer (#1255)
- Update anonymize_list type hints and document that sometimes items will be ignored. (#1252)
- Add Dockerfile for Windows containers (#1194)
- Drop WA driver license number (#1214)
- Change ner_model_configuration from list to map (#1222)
- Bugfix in SpacyRecognizer (#1221)
- Bugfix in NerModelConfiguration (#1230)
- Add_conf_to_package_data (#1243)
- Improved the logic of conflict handling in AnonymizerEngine (#1196)
- Change default score threshold in image redactor (#1210)
- fixes bug #1227 (#1231)
- Added missing dependencies for opencv-python and azure forms recognizer (#1257)
- Remove inclusive-lint step (#1207)
- Updates to demo website with new NLP Engine (#1181)
- Hotfix for NerModelConfiguration not created correctly (#1208)
- Hotfix: default.yaml is not parsed correctly (#1202)
- Put org in ignore as it has many FPs (#1200)
- New Predefined Recognizer: IN_PAN (#1100)
- Anonymizer - Pass bytes key to Encrypt / Decrypt (#1147)
- DICOM redactor improvement: Enabling more photometric interpretations (#1103)
- DICOM redactor improvement: Adding exceptions for when DICOM file does not have pixel data (#1104)
- Small reordering of kwargs as prereq for allow list functionality (#1110)
- DICOM redactor improvement: Preventing distortion when multiple sets of pixels are in one instance (#1109)
- DICOM redactor improvement: Enabling compatibility with compressed images (#1105)
- DICOM redactor improvement: Enable return of redacted bboxes (#1111)
- DICOM redactor improvement: Enable selection of redact approach (#1113)
- Enable toggle of printing output location after redacting from file (#1144)
- Changing test exception type check (#1148)
- Enabling allow list approach with all image redaction (#1145)
- Improve process names method in DICOM image redactor (#1150)
- Adding examples of toggling metadata usage and saving bboxes (#1158)
- Updating verification engines to include latest updates to redactor engines (#1162)
- Improved bbox processor (#1163)
- Updating verification engines and enable plotting of custom bboxes (#1164)
- Added image processing class to preprocess the image before running OCR (#1166)
- Added support for Microsoft's document intelligence OCR
- Refactored the `NlpEngine` and NER recognizers (`SpacyRecognizer`, `TransformersRecognizer`, `StanzaRecognizer`) to allow simpler integration of Hugging Face and transformers models (#1159). This includes:
  - Changes in how NER results flow through Presidio (see docs)
  - NER/model definition is now defined using a conf file or a `NerModelConfiguration` object.
  - Integrated `spacy-huggingface-pipelines` for a more robust integration of Hugging Face models.
- As a result, `SpacyRecognizer` logic has changed; see #1159. Some fields within the class are now deprecated.
- Updated type checks (#1175)
- Enabled regex flags manipulation (#1193)
- Initial logic check for merging 2 entities (#1092)
- Fix Sphinx warning in OperatorConfig (#1143)
- Fix type mismatch in check_label_groups parameter in spacy_recognizer (#1130)
- anonymize_list return type hint fix (#1178)
- We no longer use Pipenv.lock. Locking happens as part of the CI. (#1152)
- Changed the ACR instance (#1089)
- Updated to Cred Scan V3 (#1154)
- Added `keep`, a no-op anonymizer that preserves selected types of PII while keeping track of their position in the anonymized output. (#1062)
- Added `BatchAnonymizerEngine` to complement the `BatchAnalyzerEngine` for lists and dicts (#993)
- Drop support for Python 3.7
- Add support for Python 3.11
- New demo app for Presidio, based on Streamlit (#1054)
- GPT based synthetic data generation (#1051)
- Updated dependencies
- Fixed exception on whitespace in AU recognizers
- Updated API version for Text Analytics in sample
- Fixed merging of entities of the same type
- Modified `ImagePiiVerifyEngine` to allow passing of kwargs
- Updated template for building image redactor yaml
- Updated all image redactor engines and OCR classes to allow passing of an OCR confidence threshold and other OCR parameters
- Moved general bounding box operations to new class `BboxProcessor`
- Updated `presidio-image-redactor` version from 0.0.45 to 0.0.46
- Added revised example for transformer recognizer
- Added evaluation code for the DICOM image redaction capabilities
- REST API to support web applications payload
- Updated documentation to include instructions on using DICOM evaluation code
- Updated documentation to mention OCR thresholding
- Added DICOM image redaction capabilities (`DicomImageRedactorEngine` class and tests)
- Updated `setup.py` to include new required packages for DICOM capabilities
- Updated Pipfile and Pipfile.lock
- Updated `presidio-image-redactor` version from 0.0.44 to 0.0.45
- Updated the `ImagePiiVerifyEngine` class to allow use of custom analyzer engines
- Updated `NOTICE` to include licenses of added packages
- Updated docs with getting started code for new `DicomImageRedactorEngine`
- Added Italian fiscal code recognizer
- Added Italian driver license recognizer
- Added Italian identity card recognizer
- Added Italian passport recognizer
- Added `TransformersNlpEngine` to support transformer based NER models within spaCy pipelines
- Added pattern for next gen US passport in `presidio-analyzer/presidio_analyzer/predefined_recognizers/us_passport_recognizer.py`
- Improved MEDICAL_LICENSE pattern and fixed checksum verification
- Bugfix for context handling by aligning results to recognizers using a unique identifier and not recognizer name
- Updated Pipfile.lock
- Removed constraint on empty texts
- Updated Pipfile.lock
- Updated `pipenv` version
- Updated `black` and `flake8` in pre-commit scripts
- Updated docs for NLP engine
- Added Presidio to OSSF (Open Source Security Foundation)
- Added CodeQL scanning
- Introduced BatchAnalyzerEngine
- Added allow-list functionality to ignore specific strings
- Added notebook on anonymizing known values
- Added sample for using `transformers` models in Presidio
- Bug fix for getting the text before anonymizing (#890)
- Deps update
- Improved deny-list regex and customizability
- Added documentation for existing spaCy models
- Bugfix in analysis explanation scores
- PIL version updated to 9.0.1
- Recognizers can be loaded from YAML
- Improved context mechanisms to support recognizer-level context enhancement and cross-entity context support
Bug fix in context support
- Added a URL recognizer
- Added a new capability for creating custom context detection logic. See ContextAwareEnhancer and LemmaContextAwareEnhancer. Documentation will be added in a future release.
  Furthermore, it is now possible to pass context words through the `analyze` method (or via the API); these are taken into account for context enhancement.
- Bug fix for entities at the end of a sentence.
- Formatted (black/flake8) the Python examples.
- Removed the DOMAIN_NAME recognizer. This change means that the `DOMAIN_NAME` entity is no longer returned by Presidio. `URL` would be returned instead, and would catch full addresses and not just domain names (`https://www.microsoft.com/a/b.html` and not just `www.microsoft.com`)
- Fixed issue when IBAN followed by all caps can't be recognized
- Updated dependencies in Pipfile.lock
- Removed official Python 3.6 support and added support for 3.10
- Added docs for creating a streamlit app
- Added docs for using Flair
- Added multi-regional phone number recognizer.
- Fixed duplicated entities removal.
- Added sample for structured / semi-structured data in batch.
- Dependencies version bumps.
- Added sample for getting an identified entity value using a custom Operator.
- Changed packages/imports.
- Added repr to classes.
- Added encryption and decryption samples.
- Remove AnonymizerResult in favor of OperatorResult, for an easier anonymization-deanonymization.
- Anonymizer and Deanonymizer now return `operator_name` instead of `operator` in OperatorResult.
- Databricks based template in Azure Data Factory docs
- Adding ORGANIZATION recognizer docs
- Bumped pydantic from 1.7.3 to 1.7.4
- Updated call to stanza via spacy-stanza
- Added DATE_TIME recognizer
- Added Medical Licence recognizer
- Bumped spacy from 3.0.5 to 3.0.6
- Create CODE_OF_CONDUCT
- ADF templates docs
- Fix spark sample to run presidio in broadcast
- Ad-hoc recognizers
- Text Analytics Integration Sample
- Documentation update and samples validation
- Adding tagger to the spaCy model pipeline
- Sample notebook for remote recognizer (using Text Analytics)
- Add matplotlib to image-redactor
- Added custom lambda anonymizer
- Added pii_verify_engine to the image-redactor
Upgrade Analyzer spacy version to 3.0.5
- Request entity AnonymizerConfig renamed OperatorConfig
- In OperatorConfig: anonymizer_name -> operator_name
- Response entity AnonymizerResult renamed to EngineResult
- In EngineResult: List[AnonymizedEntity] -> List[OperatorResult]
- In OperatorResult:
  - anonymizer -> operator
  - anonymized_text -> text
- Response entity anonymizer renamed to operator.
- Response entity anonymizer_text renamed to text.
New endpoint for deanonymizing encrypted entities by the anonymizer.

[unreleased]: https://github.com/microsoft/presidio/compare/2.2.357...HEAD
[2.2.357]: https://github.com/microsoft/presidio/compare/2.2.356...2.2.357
[2.2.356]: https://github.com/microsoft/presidio/compare/2.2.355...2.2.356
[2.2.355]: https://github.com/microsoft/presidio/compare/2.2.354...2.2.355
[2.2.354]: https://github.com/microsoft/presidio/compare/2.2.353...2.2.354
[2.2.353]: https://github.com/microsoft/presidio/compare/2.2.352...2.2.353
[2.2.352]: https://github.com/microsoft/presidio/compare/2.2.351...2.2.352
[2.2.351]: https://github.com/microsoft/presidio/compare/2.2.350...2.2.351
[2.2.350]: https://github.com/microsoft/presidio/compare/2.2.35...2.2.350
[2.2.35]: https://github.com/microsoft/presidio/compare/2.2.34...2.2.35
[2.2.34]: https://github.com/microsoft/presidio/compare/2.2.33...2.2.34
[2.2.33]: https://github.com/microsoft/presidio/compare/2.2.32...2.2.33
[2.2.32]: https://github.com/microsoft/presidio/compare/2.2.31...2.2.32
[2.2.31]: https://github.com/microsoft/presidio/compare/2.2.30...2.2.31
[2.2.30]: https://github.com/microsoft/presidio/compare/2.2.29...2.2.30
[2.2.29]: https://github.com/microsoft/presidio/compare/2.2.28...2.2.29
[2.2.28]: https://github.com/microsoft/presidio/compare/2.2.27...2.2.28
[2.2.27]: https://github.com/microsoft/presidio/compare/2.2.26...2.2.27
[2.2.26]: https://github.com/microsoft/presidio/compare/2.2.25...2.2.26
[2.2.25]: https://github.com/microsoft/presidio/compare/2.2.24...2.2.25
[2.2.24]: https://github.com/microsoft/presidio/compare/2.2.23...2.2.24
[2.2.23]: https://github.com/microsoft/presidio/compare/2.2.2...2.2.23
[2.2.2]: https://github.com/microsoft/presidio/compare/2.2.1...2.2.2
[2.2.1]: https://github.com/microsoft/presidio/compare/2.2.0...2.2.1