Introduction

I recently created a tool to convert Markdown files into PowerPoint format (*.pptx).
I’ll summarize the technical notes from that process here. This article focuses on Markdown parsing.

The tool is available on GitHub: https://github.com/ayumax/MDToPPTX

The pptx creation tool itself is covered in a separate article.

Development Environment

Considering future integration into a tool using Xamarin, I created the project as a .NET Standard 2.0 C# library.

The overall environment is as follows:

  • Windows 10
  • Visual Studio 2017
  • C#
  • .NET Standard 2.0 Class library

How to Parse Markdown

Initially, I considered parsing it myself using regular expressions or something similar. However, the Markdown format has surprisingly many variations, so I searched for a convenient library and found the following.

markdig

This library is available on GitHub under the BSD 2-Clause license and can be obtained via NuGet. If you just want to convert Markdown to HTML, you can use it as is. It is also designed so that you can write your own output stage for formats other than HTML.

Standard Usage of markdig

To convert Markdown to HTML, only the following code is needed (excerpt from GitHub’s README.md):

var result = Markdown.ToHtml("This is a text with some *emphasis*");
Console.WriteLine(result);   // prints: <p>This is a text with some <em>emphasis</em></p>

Using markdig as a Parser

My goal this time was to parse Markdown and output a PowerPoint file, so I investigated whether I could use only markdig’s parser functionality and create custom output.

How markdig’s Processing Works

To find out how to perform custom output using markdig, I examined how the preset HtmlRenderer is implemented. The diagram below simplifies this.

[Figure: markdig rendering part (class diagram)]

Tracing the inheritance of HtmlRenderer leads to RendererBase, which defines the Render() method used during conversion and the ObjectRenderers property for handling Markdown blocks and inlines.

Renderer

markdig is designed so that one Renderer class is defined per output format. The default implementation includes HtmlRenderer for outputting HTML and NormalizeRenderer for outputting normalized Markdown text.
TextRendererBase, in the middle of the class diagram above, bundles the functionality shared by Renderers that output text.

ObjectRenderers

This part defines how the parsed Markdown objects should be output. In the HtmlRenderer constructor, it’s implemented like this:

// Default block renderers
ObjectRenderers.Add(new CodeBlockRenderer());
ObjectRenderers.Add(new ListRenderer());
ObjectRenderers.Add(new HeadingRenderer());
ObjectRenderers.Add(new HtmlBlockRenderer());
ObjectRenderers.Add(new ParagraphRenderer());
ObjectRenderers.Add(new QuoteBlockRenderer());
ObjectRenderers.Add(new ThematicBreakRenderer());

// Default inline renderers
ObjectRenderers.Add(new AutolinkInlineRenderer());
ObjectRenderers.Add(new CodeInlineRenderer());
ObjectRenderers.Add(new DelimiterInlineRenderer());
ObjectRenderers.Add(new EmphasisInlineRenderer());
ObjectRenderers.Add(new LineBreakInlineRenderer());
ObjectRenderers.Add(new HtmlInlineRenderer());
ObjectRenderers.Add(new HtmlEntityInlineRenderer());
ObjectRenderers.Add(new LinkInlineRenderer());
ObjectRenderers.Add(new LiteralInlineRenderer());

The first half of the definition adds the renderers for block-level objects, and the second half adds the renderers for the inline objects inside those blocks. Since a parsed Block object contains further Inline objects internally, separating the renderers into Block and Inline makes implementation easier.
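To make the Block/Inline nesting concrete, here is a minimal sketch that parses a short string and lists each top-level block together with the inline objects it holds. The class name BlockInlineDemo and the method Describe are made up for illustration; the Markdig types it uses (MarkdownDocument, LeafBlock, Inline) are the library's own.

```csharp
using System;
using System.Collections.Generic;
using Markdig;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

// Hypothetical helper, not part of MDToPPTX or markdig
public static class BlockInlineDemo
{
    // Returns one line per top-level block: the block type followed by
    // the types of the inline objects directly inside it (leaf blocks only).
    public static List<string> Describe(string markdown)
    {
        MarkdownDocument document = Markdown.Parse(markdown);
        var lines = new List<string>();

        foreach (Block block in document)
        {
            var inlineNames = new List<string>();

            // Leaf blocks such as headings and paragraphs keep their inlines in the Inline property
            if (block is LeafBlock leaf && leaf.Inline != null)
            {
                foreach (Inline inline in leaf.Inline)
                {
                    inlineNames.Add(inline.GetType().Name);
                }
            }

            lines.Add(block.GetType().Name + ": " + string.Join(", ", inlineNames));
        }

        return lines;
    }

    public static void Main()
    {
        foreach (var line in Describe("# Title\n\nSome `code` and *emphasis*.\n"))
        {
            Console.WriteLine(line);
        }
    }
}
```

For this input the heading and the paragraph should each appear as one block, with the paragraph listing a CodeInline and an EmphasisInline among its literals.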

When implementing a custom Renderer, you just need to create and add an ObjectRenderer for your target parse object. When the Renderer class’s Render() method is called, the Write() method of the ObjectRenderer corresponding to the Block object is called. Then, recursively, the Write() method of the ObjectRenderer corresponding to the Inline object is called, producing the output.
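The dispatch described above can be tried on a small scale before writing a full PowerPoint renderer. The following is only a sketch under my own assumptions: OutlineRenderer, OutlineHeadingRenderer, OutlineLiteralRenderer and OutlineDemo are hypothetical names, and the renderer prints just the headings as an indented outline. It builds on markdig's TextRendererBase and MarkdownObjectRenderer classes.

```csharp
using System;
using System.IO;
using Markdig;
using Markdig.Renderers;
using Markdig.Syntax;
using Markdig.Syntax.Inlines;

// Hypothetical renderer: only headings and their literal text get an ObjectRenderer
public class OutlineRenderer : TextRendererBase<OutlineRenderer>
{
    public OutlineRenderer(TextWriter writer) : base(writer)
    {
        ObjectRenderers.Add(new OutlineHeadingRenderer());   // block renderer
        ObjectRenderers.Add(new OutlineLiteralRenderer());   // inline renderer
    }
}

public class OutlineHeadingRenderer : MarkdownObjectRenderer<OutlineRenderer, HeadingBlock>
{
    protected override void Write(OutlineRenderer renderer, HeadingBlock obj)
    {
        // Indent by two spaces per heading level below H1
        renderer.Write(new string(' ', (obj.Level - 1) * 2));
        // Hand the heading's inlines back to the renderer, which dispatches to the inline renderers
        renderer.WriteLeafInline(obj);
        renderer.WriteLine();
    }
}

public class OutlineLiteralRenderer : MarkdownObjectRenderer<OutlineRenderer, LiteralInline>
{
    protected override void Write(OutlineRenderer renderer, LiteralInline obj)
    {
        renderer.Write(obj.Content.ToString());
    }
}

public static class OutlineDemo
{
    public static string Render(string markdown)
    {
        var writer = new StringWriter();
        var renderer = new OutlineRenderer(writer);
        renderer.Render(Markdown.Parse(markdown));
        return writer.ToString();
    }

    public static void Main()
    {
        Console.Write(Render("# Title\n\nBody text\n\n## Section\n"));
    }
}
```

Paragraphs have no ObjectRenderer registered in this sketch, so "Body text" should not appear in the output. Only objects with a registered renderer produce anything.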

ObjectRenderer Creation Example

The first argument of the ObjectRenderer’s Write() method receives a reference to the Renderer class, and the second argument receives a reference to the parsed object (a Block or an Inline).

Methods related to file output are grouped in the Renderer class and called from within the Write() method.
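The examples that follow all bracket their output with a Push...Setting() call and a matching Pop...Setting() call. As a rough illustration of why the two must be paired, here is a self-contained sketch of a settings stack. It is not the actual PPTXRenderer implementation: SettingStack is a hypothetical class, and settings are reduced to plain strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the formatting state a renderer might keep.
// Settings are plain strings here to keep the sketch small.
public class SettingStack
{
    private readonly Stack<string> _stack = new Stack<string>();
    private readonly string _fallback;

    public SettingStack(string fallback)
    {
        _fallback = fallback;
    }

    // The setting that applies to whatever is written next
    public string Current => _stack.Count > 0 ? _stack.Peek() : _fallback;

    public void Push(string setting) => _stack.Push(setting);

    // Restore whatever was active before the matching Push
    public void Pop()
    {
        if (_stack.Count > 0) _stack.Pop();
    }
}

public static class SettingStackDemo
{
    public static void Main()
    {
        var settings = new SettingStack("Normal");
        settings.Push("List");                 // entering a list block
        settings.Push("InlineCode");           // entering inline code inside the list
        Console.WriteLine(settings.Current);   // prints: InlineCode
        settings.Pop();                        // leaving the inline code
        Console.WriteLine(settings.Current);   // prints: List
        settings.Pop();                        // leaving the list
        Console.WriteLine(settings.Current);   // prints: Normal
    }
}
```

With a stack, a nested object can change the formatting temporarily and the enclosing block's setting comes back automatically once the nested object pops its own.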

Example 1: HeadingBlock

In the example below, the output options are changed only if the heading level is H1 or H2. All other levels use the normal settings.

public class HeadingRenderer : PPTXObjectRenderer<HeadingBlock>
{
    protected override void Write(PPTXRenderer renderer, HeadingBlock obj)
    {
        // Change output options for each header level
        var _block = renderer.Options.Normal;
        switch (obj.Level)
        {
            case 1:
                _block = renderer.Options.Header1;
                break;
            case 2:
                _block = renderer.Options.Header2;
                break;
        }

        renderer.PushBlockSetting(_block);

        renderer.StartTextArea();

        renderer.WriteLeafInline(obj);
        renderer.PopBlockSetting();

        renderer.EndTextArea();
    }
}

Example 2: ListBlock

A ListBlock holds the items of a bulleted or numbered list, so the renderer loops through the list items and passes the contents of each item on for further processing.

public class ListRenderer : PPTXObjectRenderer<ListBlock>
{
    protected override void Write(PPTXRenderer renderer, ListBlock listBlock)
    {
        // Apply list writing settings
        renderer.PushBlockSetting(renderer.Options.List);

        renderer.StartTextArea();

        // Loop through list items
        for (var i = 0; i < listBlock.Count; i++)
        {
            var item = listBlock[i];
            var listItem = (ListItemBlock)item;

            renderer.AddTextRow(new PPTXText()
            {
                // Set whether it's a numbered or symbol bullet point
                Bullet = listBlock.IsOrdered ? PPTXBullet.Number : PPTXBullet.Circle
            });

            // Process the content of one list item
            renderer.WriteChildren(listItem);

            renderer.WriteReturn();
        }

        renderer.EndTextArea();

        // Restore list writing settings
        renderer.PopBlockSetting();
    }
}

Example 3: CodeInline

Unlike the previous two examples, CodeInline is an Inline, not a Block. The key point is to remember that you are writing part of a sentence, so, as far as I can tell, the Write() method of an inline renderer should avoid emitting line breaks.

public class CodeInlineRenderer : PPTXObjectRenderer<CodeInline>
{
    protected override void Write(PPTXRenderer renderer, CodeInline obj)
    {
        // Change font settings only for the inline code part
        renderer.PushInlineSetting(renderer.Options.InlineCode);

        // Write the inline code part
        renderer.Write(obj.Content);

        // Restore font settings
        renderer.PopInlineSetting();
    }
}

Extensions

When I first tested the tool, table notation and strikethrough notation were not parsed. I thought they might not be supported, but it turned out that markdig handles them as extensions that have to be enabled explicitly.

With the table extension and the text decoration (emphasis extras) extension enabled as follows, both notations were parsed:

var pipeline = new MarkdownPipelineBuilder()
                // Enable table extension
                .UsePipeTables()
                // Enable text decoration symbol extension
                .UseEmphasisExtras()
                .Build();

var document = Markdig.Markdown.Parse(markdown, pipeline);

  • Calling UseAdvancedExtensions() instead enables most of the extensions at once, including pipe tables and emphasis extras.
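A quick way to confirm that an extension is actually active is to parse the same input with and without it. The sketch below is only an illustration: ExtensionDemo and its two helper methods are names I made up, while MarkdownPipelineBuilder, UsePipeTables(), UseEmphasisExtras() and the Table block type come from markdig.

```csharp
using System;
using System.Linq;
using Markdig;
using Markdig.Extensions.Tables;
using Markdig.Syntax;

// Hypothetical helper for checking which extensions a pipeline has enabled
public static class ExtensionDemo
{
    // Counts the Table blocks found when parsing with the given pipeline
    public static int CountTables(string markdown, MarkdownPipeline pipeline)
    {
        MarkdownDocument document = Markdown.Parse(markdown, pipeline);
        return document.Descendants().OfType<Table>().Count();
    }

    // True if the HTML output contains a <del> element, i.e. ~~text~~ was parsed as strikethrough
    public static bool RendersStrikethrough(string markdown, MarkdownPipeline pipeline)
    {
        return Markdown.ToHtml(markdown, pipeline).Contains("<del>");
    }

    public static void Main()
    {
        const string table = "| a | b |\n|---|---|\n| 1 | 2 |\n";
        const string strike = "~~removed~~";

        var plain = new MarkdownPipelineBuilder().Build();
        var extended = new MarkdownPipelineBuilder()
            .UsePipeTables()
            .UseEmphasisExtras()
            .Build();

        Console.WriteLine(CountTables(table, plain));
        Console.WriteLine(CountTables(table, extended));
        Console.WriteLine(RendersStrikethrough(strike, plain));
        Console.WriteLine(RendersStrikethrough(strike, extended));
    }
}
```

With the plain pipeline the table text should come back as an ordinary paragraph (zero Table blocks) and the tildes should stay literal. With the extended pipeline one Table block and a strikethrough element are expected.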

Summary

By building on markdig, I could concentrate on how to format the output for each Block and Inline, without worrying about parsing the Markdown string or handling complex nested object structures. This significantly reduced implementation time.

If I need to support more Blocks in the future, it seems I can do so with minimal impact on the current implementation.

Although my implementation this time was to output a PowerPoint file, I believe the content of this article can be applied in various other ways.