[Data Liberation] Expose experimental Markdown importer in the importWxr step #2080
This PR needs to be split into smaller parts before merging. For sure the new vendor libraries will become a separate PR. Epub and HTML importers probably, too.
Adds a forked version of the markdown parsing libraries required by the upcoming Markdown importer. We need our own fork for PHP 7.2 compatibility. The downgrade process was performed semi-automatically via Rector.

This PR adds the following libraries:

* `league/commonmark`
* `webuni/front-matter`

There are no testing steps here. This PR only adds new code without modifying the existing one.

A part of:

* #2080
* #1894
…Wxr step

🚧 Work in progress, don't merge 🚧

Enables importing markdown files via the `importWxr` step (to be renamed) when the data-liberation importer is enabled.

Here's the Blueprint you can use to import the "data basics" tutorial from the Gutenberg repo:

```json
{
  "$schema": "https://playground.wordpress.net/blueprint-schema.json",
  "landingPage": "/adding-a-delete-button/",
  "features": {
    "networking": true
  },
  "steps": [
    {
      "step": "resetData"
    },
    {
      "step": "importWxr",
      "importer": "data-liberation",
      "phpImporterOptions": {
        "data_source": "markdown_directory",
        "source_site_url": "https://raw.githubusercontent.com/WordPress/gutenberg/HEAD/docs/how-to-guides/data-basics"
      },
      "importData": {
        "resource": "git:directory",
        "url": "https://github.com/WordPress/gutenberg.git",
        "ref": "HEAD",
        "path": "docs/how-to-guides/data-basics"
      }
    }
  ]
}
```

## Remaining work

* Confirm the WXR import still works both for the regular importer and the data liberation one
* Add E2E coverage
* Rewrite relative markdown URLs
* Enable specifying additional URL mappings directly in the Blueprint
* Review the code and make any architectural adjustments necessary
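To try the importer against a different docs directory, the Blueprint in this description can be generated rather than hand-edited. The sketch below is not part of the PR: `markdown_import_blueprint` is a hypothetical helper, and it assumes that `source_site_url` is always the `raw.githubusercontent.com` counterpart of the git URL, which is inferred from the single example here and not confirmed anywhere in the PR.

```python
import json


def markdown_import_blueprint(owner, repo, path, ref="HEAD", landing_page="/"):
    """Build a Blueprint dict shaped like the example in this PR description.

    Hypothetical helper: the step and option names are copied from the PR,
    while the URL derivation is an assumption based on that one example.
    """
    path = path.strip("/")
    return {
        "$schema": "https://playground.wordpress.net/blueprint-schema.json",
        "landingPage": landing_page,
        "features": {"networking": True},
        "steps": [
            {"step": "resetData"},
            {
                "step": "importWxr",
                "importer": "data-liberation",
                "phpImporterOptions": {
                    "data_source": "markdown_directory",
                    # Assumed: raw file URLs mirror the repository layout.
                    "source_site_url": (
                        f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{path}"
                    ),
                },
                "importData": {
                    "resource": "git:directory",
                    "url": f"https://github.com/{owner}/{repo}.git",
                    "ref": ref,
                    "path": path,
                },
            },
        ],
    }


blueprint = markdown_import_blueprint(
    "WordPress",
    "gutenberg",
    "docs/how-to-guides/data-basics",
    landing_page="/adding-a-delete-button/",
)
print(json.dumps(blueprint, indent=2))
```

Called with the arguments shown, this reproduces the Blueprint from the description, so the two can be diffed as a sanity check before pointing it at another repository.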
…zed WP_Markdown_Directory_Tree_Reader
Builds data-liberation-markdown.phar.gz (200KB) to enable downloading the Markdown importer only when needed instead of on every page load.

A part of:

* #2080
* #1894

## Testing instructions

Run `nx build playground-data-liberation-markdown`, confirm it finished without errors. A smoke test of the built phar file is included in the build command.
I'm going to close this PR. I've reorganized it as a series of smaller ones that we can discuss granularly:
After all the API changes, I'm no longer sure setting up the importer in |
Sets the stage for the EPub importer. A part of #2080

Refactors and cleans up the Data Liberation package. This includes renaming, reorganizing file paths, improving class structure, and removing deprecated/unused code.

## Key Changes

**Refactor:**

- Renamed `WP_WXR_Reader` to `WP_WXR_Entity_Reader` for consistency and clarity.
- Adjusted references in related classes, tests, and imports.
- Moved `byte-readers` to the Blueprints library (see WordPress/blueprints-library#121)

**Cleanup:**

- Deleted unused and redundant byte reader classes (`WP_Byte_Reader`, `WP_File_Reader`, etc.).
- Removed legacy files such as `WXR_Import_Info`.

**New Additions:**

- Added `WP_Directory_Tree_Entity_Reader` to improve handling of directory tree imports.
- Introduced `WP_Import_HTML_Processor` for better HTML import functionality.

## Testing instructions

Confirm the CI tests passed
🚧 Work in progress, don't merge 🚧
Enables importing markdown and epub files via the `importWxr` step (to be renamed) when the data-liberation importer is enabled.
Here's the Blueprint you can use to import the "data basics" tutorial from the Gutenberg repo:
Requires WordPress/blueprints-library#121
Other code examples
Combining the new importer APIs is getting ridiculous. Here's two entity readers:
We can mix and match data sources (local filesystem, remote), formats (e.g. md, xhtml, wxr), and containers (plain, .zip, git in the future).
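The mix-and-match idea can be illustrated independently of the PHP classes in this PR. The following is a conceptual Python sketch only: every name in it (`plain_container`, `zip_container`, `markdown_entity`, `read_entities`) is hypothetical and does not correspond to the Data Liberation API. It assumes a container is anything that yields `(path, bytes)` pairs and a format parser is a function keyed by file extension, so either side can be swapped without touching the other.

```python
import io
import zipfile
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical names throughout; this is not the Data Liberation PHP API.
FileEntry = Tuple[str, bytes]  # (path, raw bytes)


def plain_container(files: Dict[str, bytes]) -> Iterator[FileEntry]:
    """Container: an in-memory stand-in for a plain directory of files."""
    yield from sorted(files.items())


def zip_container(archive: bytes) -> Iterator[FileEntry]:
    """Container: the same files, packed into a .zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name in sorted(zf.namelist()):
            if not name.endswith("/"):  # skip directory entries
                yield name, zf.read(name)


def markdown_entity(path: str, data: bytes) -> dict:
    """Format: treat the first non-empty line as the post title."""
    lines = [line for line in data.decode("utf-8").splitlines() if line.strip()]
    title = lines[0].lstrip("# ").strip() if lines else path
    return {"type": "post", "title": title, "source": path}


# Format parsers keyed by extension; xhtml or wxr parsers would plug in here.
PARSERS: Dict[str, Callable[[str, bytes], dict]] = {".md": markdown_entity}


def read_entities(container: Iterable[FileEntry]) -> Iterator[dict]:
    """Entity reader: works with any container and any registered format."""
    for path, data in container:
        for extension, parse in PARSERS.items():
            if path.endswith(extension):
                yield parse(path, data)


# The same document, delivered through two different containers.
docs = {"intro.md": b"# Hello\n\nBody text"}
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name, data in docs.items():
        zf.writestr(name, data)

from_plain = list(read_entities(plain_container(docs)))
from_zip = list(read_entities(zip_container(buffer.getvalue())))
print(from_plain == from_zip)
```

Because the reader only depends on the `(path, bytes)` contract, a remote or git-backed source would be one more generator in this sketch, not a change to the parsers.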
Remaining work