LO fails to load document after saving with odftoolkit due to invalid UTF-16 entities #137
Comments
How is this library actually used? I can only find the file odfdom/src/main/java/org/odftoolkit/odfdom/IElementWriter.java, which defines an interface, but this interface appears to be unused. Probably I'm missing something.
This library is a replacement for xalan:serializer. The xalan serializer is used to serialize back to XML, and this is what causes my problem.
I also ran into this issue when trying to use the library to export user generated content. User generated content often contains Unicode emojis ("🙂") which trigger this incorrect behavior leading to broken docs.
Apache Xalan-Java published a 2.7.3 release in April: https://xalan.apache.org/xalan-j/readme.html#notes_latest If this problem still exists, I would suggest you raise it with the Apache Xalan developers. Please note that they still seem to use SVN but have a GitHub mirror, which is read-only. Good luck!
Thanks for the reply, @svanteschubert! I've tried overriding the Xalan dependency with 2.7.3, but unfortunately the latest version doesn't fix this issue. For now, I've replaced the dependency with the fork by docx4j, which fixes it. Three related issues already exist in their tracker and are marked as major bugs; the oldest one was reported 15 years ago. Looking at the SVN/Git history, it seems like the project was completely unmaintained for nearly a decade. But since last year, there has been some activity, so I'm slightly hopeful that they will pick up the existing fixes in the near future.
@dgerhardt Hi Daniel, I suggest writing to the Apache Xalan dev list and telling them about the problem and the solution. The more you are able to lower the bar for a release (their work), the likelier they are to fix it. For instance, the docx4j fork has a solution, so you might point to it! Or try to motivate them to take on that task! :-) Godspeed, Daniel!
Replacing xalan with fork to avoid document malformation when Unicode emojis are used in content tdf#137
Xalan contains a nasty bug that produces incorrect XML entities in the output, leading to a corrupt document. E.g. this input
Is changed to this when saving this document with odftoolkit:
More information about the root cause can be found here:
https://issues.apache.org/jira/browse/XALANJ-2419
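To make the failure mode concrete, here is a minimal Java sketch of the difference between escaping per UTF-16 code unit and escaping per Unicode code point. It is not Xalan's actual code. The class and method names are hypothetical, and the assumption (based on the linked report and the issue title) is that the serializer emits one numeric character reference per surrogate instead of one for the whole character. The sketch deliberately ignores escaping of `&` and `<`.

```java
public class SurrogateNcrDemo {

    // Illustrates the faulty behaviour: every UTF-16 code unit is escaped on
    // its own, so a character outside the BMP becomes two references to
    // surrogate values, which XML does not allow.
    public static String escapePerCodeUnit(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append("&#").append((int) c).append(';');
            }
        }
        return out.toString();
    }

    // Illustrates the expected behaviour: a surrogate pair is combined into
    // a single code point before it is written as a character reference.
    public static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp); // advance by 1 or 2 code units
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String smiley = "\uD83D\uDE42"; // U+1F642, the emoji mentioned in this thread
        System.out.println(escapePerCodeUnit(smiley));  // &#55357;&#56898;
        System.out.println(escapePerCodePoint(smiley)); // &#128578;
    }
}
```

The first output is the kind of entity pair that a conforming XML parser rejects, because surrogate values are not legal XML characters; the second is a single valid reference.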
As it seems unlikely that there will ever be a new Xalan release including a fix for this, one option (which is what I am doing now) is to replace the xalan serializer dependency with a known good version, e.g.
I cannot vouch for the integrity of this package but I have verified that it actually fixes the invalid encoding.
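One way to check a saved document for this problem is to scan the serialized XML for numeric character references in the surrogate range (0xD800 to 0xDFFF). The following is only a sketch of such a check, not the verification actually used here. The class name `SurrogateNcrCheck` is hypothetical, and the simple regex scan would also flag matching text inside comments or CDATA sections.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SurrogateNcrCheck {

    // Matches decimal (&#55357;) and hexadecimal (&#xD83D;) character references.
    private static final Pattern NCR =
            Pattern.compile("&#(?:x([0-9A-Fa-f]+)|([0-9]+));");

    // Returns true if the XML text contains a character reference whose value
    // is a UTF-16 surrogate, which is not a legal XML character.
    public static boolean containsSurrogateNcr(String xml) {
        Matcher m = NCR.matcher(xml);
        while (m.find()) {
            long value;
            try {
                value = m.group(1) != null
                        ? Long.parseLong(m.group(1), 16)
                        : Long.parseLong(m.group(2), 10);
            } catch (NumberFormatException e) {
                continue; // number too large to parse; not a surrogate either way
            }
            if (value >= 0xD800 && value <= 0xDFFF) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsSurrogateNcr("<p>&#55357;&#56898;</p>")); // true
        System.out.println(containsSurrogateNcr("<p>&#128578;</p>"));        // false
    }
}
```

Running a check like this on the `content.xml` extracted from a saved document, before and after swapping the serializer, would show whether the replacement changes the output.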