How to remove things like CodeBlocks from ToPlainText rendering #789

Open
kaylumah opened this issue Apr 5, 2024 · 3 comments

@kaylumah

kaylumah commented Apr 5, 2024

Hi Xoofx,

The repo does not have Discussions enabled, so I am submitting this here. I apologise in advance if there is a better place for these kinds of questions.

For my blog I am looking into a clean way to count the number of words present in a specific article.

I came across the ToPlainText method, which turns my Markdown into mostly clean text.
However, it leaves in things like code blocks (my blog is technical, so it has lots of code snippets).
Is there an extension point I missed, in which I can remove code blocks from the PlainText view?

Any pointers would be appreciated.

Thanks for the awesome work you did on both Markdig and Scriban!
Max

@xoofx
Owner

xoofx commented Apr 5, 2024

> Is there an extension point I missed, in which I can remove code blocks from the PlainText view?

Not that I'm aware of, but you can take the Markdown AST, search for and remove the code blocks, and then render the modified document as plain text.

In my own blog engine I do it differently: I convert the Markdown to HTML and then extract the text from it with NUglify here
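For reference, a minimal sketch of the HTML route, assuming the NUglify package and its Uglify.HtmlToText method as shown in NUglify's README. This is not the actual code of xoofx's blog engine, and the class and method names are hypothetical. Code blocks are removed from the AST before rendering, because otherwise their contents would end up in the HTML (inside pre/code elements) and then in the extracted text.

```csharp
using System;
using System.IO;
using System.Linq;
using Markdig;
using Markdig.Syntax;
using NUglify;

public static class HtmlTextExtractor
{
    // Hypothetical helper: Markdown -> HTML (without code blocks) -> text via NUglify.
    public static string ToTextViaHtml(string markdown, MarkdownPipeline? pipeline = null)
    {
        // Use a default pipeline when none is supplied.
        pipeline ??= new MarkdownPipelineBuilder().Build();

        // Drop code blocks from the AST first so they never reach the HTML.
        MarkdownDocument document = Markdown.Parse(markdown, pipeline);
        foreach (CodeBlock block in document.Descendants<CodeBlock>().ToList())
        {
            block.Parent?.Remove(block);
        }

        // Render the remaining document to HTML.
        var writer = new StringWriter();
        document.ToHtml(writer, pipeline);
        string html = writer.ToString();

        // NUglify strips the tags; the extracted text is in result.Code.
        var result = Uglify.HtmlToText(html);
        if (result.HasErrors)
        {
            throw new InvalidOperationException("NUglify could not parse the generated HTML.");
        }
        return result.Code;
    }
}
```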

xoofx added the question label Apr 5, 2024
@kaylumah
Author

kaylumah commented Apr 5, 2024

I don't see an equivalent plain-text extension method for MarkdownDocument, like this one for HTML:

public static void ToHtml(this MarkdownDocument document, TextWriter writer, MarkdownPipeline? pipeline = null)

So, based on

public static MarkdownDocument ToPlainText(string markdown, TextWriter writer, MarkdownPipeline? pipeline = null, MarkdownParserContext? context = null)

I think I need to do something like this:

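    using System.IO;
    using Markdig;
    using Markdig.Renderers;
    using Markdig.Syntax;

    public static class MarkdownText
    {
        public static string ToPlainText(string source, MarkdownPipeline? pipeline = null)
        {
            // Use a default pipeline when none is supplied (Setup below needs a non-null one)
            pipeline ??= new MarkdownPipelineBuilder().Build();

            StringWriter writer = new StringWriter();
            MarkdownDocument document = Markdown.Parse(source, pipeline);
            // todo: remove code blocks found via document.Descendants<CodeBlock>()
            // An HtmlRenderer with HTML tags and escaping disabled writes plain text
            HtmlRenderer renderer = new HtmlRenderer(writer)
            {
                EnableHtmlForBlock = false,
                EnableHtmlForInline = false,
                EnableHtmlEscape = false,
            };
            pipeline.Setup(renderer);

            renderer.Render(document);
            writer.Flush();
            return writer.ToString();
        }
    }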
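Since the end goal is a word count, the plain-text string produced by the snippet above still has to be split into words. A minimal, stdlib-only sketch; the class name is hypothetical, and the definition of a "word" (a run of letters or digits, optionally joined by an apostrophe or hyphen) is an assumption you may want to adjust for your posts:

```csharp
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Assumption: a word is a run of letters/digits, optionally joined by ' or -
    // so that "don't" and "well-known" each count as one word.
    private static readonly Regex WordPattern =
        new Regex(@"[\p{L}\p{N}]+(?:['\-][\p{L}\p{N}]+)*", RegexOptions.Compiled);

    // Counts the words in text that has already been stripped of Markdown syntax.
    public static int CountWords(string plainText) => WordPattern.Matches(plainText).Count;
}
```

For example, WordCounter.CountWords("Hello, world!") returns 2. Matching word characters rather than splitting on whitespace keeps stray punctuation (such as a lone "-" left over from rendering) from being counted as a word.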

@BeneHenke

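You can iterate through the code blocks in your MarkdownDocument and remove them like this (ToList() requires using System.Linq):

    // ToList() snapshots the blocks before the tree is modified; Parent also covers code blocks nested in lists or quotes
    foreach (CodeBlock item in document.Descendants<CodeBlock>().ToList()) { item.Parent?.Remove(item); }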
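Putting the thread together, a sketch of the whole flow: parse, remove every code block from its own parent, then render what is left with the same plain-text HtmlRenderer settings used above. The class and method names are hypothetical; this is not an official Markdig API. Fenced and indented code blocks both derive from CodeBlock, so both are removed.

```csharp
using System.IO;
using System.Linq;
using Markdig;
using Markdig.Renderers;
using Markdig.Syntax;

public static class MarkdownPlainText
{
    // Hypothetical helper: like Markdown.ToPlainText, but code blocks are removed first.
    public static string ToPlainTextWithoutCode(string markdown, MarkdownPipeline? pipeline = null)
    {
        // Use a default pipeline when none is supplied.
        pipeline ??= new MarkdownPipelineBuilder().Build();
        MarkdownDocument document = Markdown.Parse(markdown, pipeline);

        // Snapshot the code blocks, then detach each one from its own parent,
        // so blocks nested inside lists or block quotes are removed as well.
        foreach (CodeBlock block in document.Descendants<CodeBlock>().ToList())
        {
            block.Parent?.Remove(block);
        }

        // An HtmlRenderer with HTML output and escaping disabled writes plain text.
        var writer = new StringWriter();
        var renderer = new HtmlRenderer(writer)
        {
            EnableHtmlForBlock = false,
            EnableHtmlForInline = false,
            EnableHtmlEscape = false,
        };
        pipeline.Setup(renderer);
        renderer.Render(document);
        writer.Flush();
        return writer.ToString();
    }
}
```

Usage would be along the lines of MarkdownPlainText.ToPlainTextWithoutCode(File.ReadAllText(path), pipeline), where path and pipeline are your own. Inline code spans (backticks inside a paragraph) are not CodeBlock instances, so they are still rendered as text.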
