Skip to content

Latest commit

 

History

History
176 lines (137 loc) · 7.44 KB

README.md

File metadata and controls

176 lines (137 loc) · 7.44 KB
Shows the Penman logo: a slash character between two parentheses.

Penman – a library for PENMAN graph notation

PyPI Version Python Support .github/workflows/checks.yml Documentation Status

This package models graphs encoded in PENMAN notation (e.g., AMR), such as the following for the boy wants to go:

(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go
            :ARG0 b))

The Penman package may be used as a Python library or as a script.

For Javascript, see chanind/penman-js.

Features

  • Read and write PENMAN-serialized graphs or triple conjunctions
  • Read metadata in comments (e.g., # ::id 1234)
  • Read surface alignments (e.g., foo~e.1,2)
  • Inspect and manipulate the graph or tree structures
  • Customize graphs for writing:
    • Adjust indentation and compactness
    • Select a new top node
    • Rearrange edges
    • Restructure the tree shape
    • Relabel node variables
  • Transform the graph
    • Canonicalize roles
    • Reify and dereify edges
    • Reify attributes
    • Embed the tree structure with additional TOP triples
  • AMR model: role inventory and transformations
  • Check graphs for model compliance
  • Tested (but not yet 100% coverage)
  • Documented (see the documentation)

Library Usage

>>> import penman
>>> g = penman.decode('(b / bark-01 :ARG0 (d / dog))')
>>> g.triples
[('b', ':instance', 'bark-01'), ('b', ':ARG0', 'd'), ('d', ':instance', 'dog')]
>>> g.edges()
[Edge(source='b', role=':ARG0', target='d')]
>>> print(penman.encode(g, indent=3))
(b / bark-01
   :ARG0 (d / dog))
>>> print(penman.encode(g, indent=None))
(b / bark-01 :ARG0 (d / dog))

(more information)

Script Usage

$ echo "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go :ARG0 b))" | penman
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go
            :ARG0 b))
$ echo "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go :ARG0 b))" | penman --make-variables="a{i}"
(a0 / want-01
    :ARG0 (a1 / boy)
    :ARG1 (a2 / go
              :ARG0 a1))

(more information)

Demo

For a demonstration of the API usage, see the included Jupyter notebook:

PENMAN Notation

A description of the PENMAN notation can be found in the documentation. This module expands the original notation slightly to allow for untyped nodes (e.g., (x)) and anonymous relations (e.g., (x : (y))). It also accommodates slightly malformed graphs as well as surface alignments.

Citation

If you make use of Penman in your work, please cite Goodman, 2020. The BibTeX is below:

@inproceedings{goodman-2020-penman,
    title = "{P}enman: An Open-Source Library and Tool for {AMR} Graphs",
    author = "Goodman, Michael Wayne",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-demos.35",
    pages = "312--319",
    abstract = "Abstract Meaning Representation (AMR) (Banarescu et al., 2013) is a framework for semantic dependencies that encodes its rooted and directed acyclic graphs in a format called PENMAN notation. The format is simple enough that users of AMR data often write small scripts or libraries for parsing it into an internal graph representation, but there is enough complexity that these users could benefit from a more sophisticated and well-tested solution. The open-source Python library Penman provides a robust parser, functions for graph inspection and manipulation, and functions for formatting graphs into PENMAN notation. Many functions are also available in a command-line tool, thus extending its utility to non-Python setups.",
}

For the graph transformation/normalization work, please cite Goodman, 2019. The BibTeX is below:

@inproceedings{Goodman:2019,
  title     = "{AMR} Normalization for Fairer Evaluation",
  author    = "Goodman, Michael Wayne",
  booktitle = "Proceedings of the 33rd Pacific Asia Conference on Language, Information, and Computation",
  year      = "2019",
  pages     = "47--56",
  address   = "Hakodate",
  publisher = "Japanese Association for the Study of Logic, Language and Information",
  url       = "https://jaslli.org/files/proceedings/05_paclic33_postconf.pdf",
  abstract  = "Abstract Meaning Representation (AMR; Banarescu et al., 2013) encodes the meaning of sentences as a directed graph and Smatch (Cai and Knight, 2013) is the primary metric for evaluating AMR graphs. Smatch, however, is unaware of some meaning-equivalent variations in graph structure allowed by the AMR Specification and gives different scores for AMRs exhibiting these variations. In this paper I propose four normalization methods for helping to ensure that conceptually equivalent AMRs are evaluated as equivalent. Equivalent AMRs with and without normalization can look quite different—comparing a gold corpus to itself with relation reification alone yields a difference of 25 Smatch points, suggesting that the outputs of two systems may not be directly comparable without normalization. The algorithms described in this paper are implemented on top of an existing open-source Python toolkit for AMR and will be released under the same license."
}

Disclaimer

This project is not affiliated with ISI, the PENMAN project, or the AMR project.