-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
incorrect PENMAN encoding oof triple list with duplicate :instance triple #113
Comments
Thanks for the report! I agree that the output is bad. However, the input is not a valid PENMAN string, either. I'm not certain if the reuse of the So I'm not fully convinced that there is a bug here that should be fixed, because the input is already bad. Do you have more context about where this AMR came from? |
The incorrect initial Penman is an annotation error. It should have been
but is the following really invalid? I couldn't find any documentation on this)
|
Ok, thanks for explaining. Yes, I now agree that it would be useful for Penman to be able to detect such bad graphs, and the current methods are insufficient (a warning is currently issued in the above case, and you could cause that warning to become an error, but that is not a good solution). Do you have a proposal of how this check might be implemented? BTW regarding your question of duplicate instantiations, my opinion is that it is incorrect. Information should not be repeated, even if the information is consistent. There is the idea of distributed nodes where new information can be added at a later point, so, e.g., if the sentence were instead the young boy wants to believe, it could be presented as:
However I don't think I've seen any AMR example demonstrating this, so I assume it is invalid (pre-AMR PENMAN may have allowed it, but I can't turn up the reference just now). In any case, for our current purposes we need an idea of how the check is implemented and where it is exposed in the API. |
Hi, I agree that we should consider a second instantiation of the same variable (even with the same concept) is an error. for the duplicate triple, I do not see any warning (with version 1.2.2) , so
outputs the incorrect
However I see a warning when decoding (the incorrect)
which gives So if penman.encode() gave the same warning (and ignoring the duplicate would be fine |
@jheinecke yes, I meant the warning appears when decoding the erroneous AMR, not when encoding it. The epigraph is metadata about the graph to help it be configured into the same tree-like structure that was read in. The metadata is stored as a mapping of triples to epigraphical markers, so duplicate triples would not work with such a mapping. The triples themselves, however, are stored as a list, so it is possible to have duplicate triples. Upon configuring the graph to a tree (or encoding it to a string), the duplicates will be inserted at the same place as it assumes singular instantiation of nodes, leading to the behavior you see. It is somewhat convenient to allow the graph to store the duplicate triples, however, because then we can use that data in scripts that, e.g., detect and/or clean up such errors. There is actually no error or warning when parsing the string into the tree, it's only when interpreting the tree into a graph that we get a warning: >>> import penman
>>> t = penman.parse('(a / alpha :ARG0 (b / beta) :ARG1 (b / beta))')
>>> t
Tree(('a', [('/', 'alpha'), (':ARG0', ('b', [('/', 'beta')])), (':ARG1', ('b', [('/', 'beta')]))]))
>>> g = penman.interpret(t)
ignoring epigraph data for duplicate triple: ('b', ':instance', 'beta') It would be tempting, then, to implement the check for duplicate edges on the tree structure, but the problem is that the tree is agnostic about inverted edges, so it would miss things like the following, where
I think this check would therefore be good to add to penman.model.Model.errors(), which works on the graph (link to code). Something like this: + triple_counts = Counter(graph.triples)
g: Dict[Variable, List[BasicTriple]] = {}
for triple in graph.triples:
+ if triple_counts[triple] > 1:
+ err[triple].append('duplicated') |
I admit not to have delved too much in the code :-), but that looks OK to me if the user can retrieve this error somehow before the encoding the triple list. I also thought of case where the triple list comes from somewhere else, so no epigraph or whatsoever is present |
I found an (possible) issue with doubled instances (e.g. twice
n / name
instead a simplen
in the second case) in a penman notation. Parsing the following graphwith
returns
with
('n', ':instance', 'name')
twiceIf I want to re-encode this triples into penman with
pm = penman.encode(penman.Graph(tr)); print(pm)
I get the incorrect penmanI wonder where the bug is. I agree that the initial penman is at least suboptimal since
n / name
occurs twice ant should have onlyn
in the second place. sincen
's concept isname
in both cases that sould not be an fatal error.Or is it the fact that
('n', ':instance', 'name')
occurs twice which creates an incorrect penman.So if the initial penman graph is acceptable, than there is clearly a problem in the
decode
/encode
methods ?The text was updated successfully, but these errors were encountered: