Markdown’ing and open sourcing our documentation

We will be converting all our documentation source text to markdown files, version controlled with Git, and published as open source on GitHub. And we will build custom compilers to convert the markdown to .chm (help files included with software) and to HTML (for publishing on our web-sites).

Others have certainly done this before us. For example, Microsoft has an “Edit” link on each article hosted on http://docs.microsoft.com - linking to the corresponding GitHub repository and file. And a number of popular tools like Sphinx, Read the Docs and GitBook exist for similar purposes (see References section at bottom).

However, as I was researching this, I didn’t find anyone explaining WHY this is a good idea. The benefits might seem obvious. But they weren’t to me at first, which is why I set out to explore this.

Background

The documentation for our products is currently written in different tools and originates from different source formats:

Why markdown?

The HTML format (as currently used by some of our documentation) is pretty universal of course, but unfortunately various WYSIWYG editors, converters, etc. tend to insert a lot of extra formatting and other unwanted markup. This makes it difficult to change systems later - for example because site CSS styles are overridden by inline style information, and because different editors encode line breaks and paragraphs with different HTML tags behind the scenes.

Markdown (https://en.wikipedia.org/wiki/Markdown) doesn’t have these issues simply because it has very limited formatting options.

Markdown is also easier to write directly as source, and many great tools exist for this. A few of my favorites are Visual Studio Code, Dropbox Paper (basically a WYSIWYG editor for markdown, with a markdown export function), and StackEdit (https://stackedit.io).

Because markdown is so simple, it is easy to convert into other things - future proofing the solution.

And since markdown is plain text, it works very well with Git (see below), and it is the default text format on GitHub (see further below).

Why files?

It might seem old fashioned to store documents in local files rather than some cloud content management system, but there are some nice benefits to this:

Why Git?

We already use Git (https://en.wikipedia.org/wiki/Git) for source code control. It will be nice to have the same benefits (branching, go back to previous commit, version comparison, etc.) for our documentation.

And of course Git makes it easy for multiple authors to collaborate and work on the same stuff at the same time.

Why open source?

Publishing the documentation as open source (https://en.wikipedia.org/wiki/Open-source_model) on GitHub (or similar), allows anyone to easily contribute in a well established manner (fork, branch, pull request, etc.).

I have no illusion that strangers will suddenly re-write all our documentation just because we open source it, but on the other hand, I see no reason not to do this. And if, for whatever reason, someone does decide to lend us a hand - that would just be fantastic.

Why GitHub?

GitHub (https://github.com) just happens to be where we currently host our source code (open and closed). It is also where most open source (code and docs) lives, and therefore might give us a bit more exposure than alternatives (not sure about this).

GitHub provides a nice interface for editing markdown files directly, so contributors could do everything right there on the GitHub web-site.

However, Bitbucket, GitLab, etc. would probably fit the bill too.

Why a proprietary compiler?

I know that tools already exist for this (see References section below), but I want something very simple that will do exactly what I want - nothing more, nothing less. For me, this means writing my own little program.

Another reason is that we will need to support a mix of markdown and HTML source files for a while (until all HTML is converted to markdown). So the compiler needs to handle both source formats. Output to .chm help files (or files that can be compiled to this) is not standard either.

The hard part is converting markdown to HTML, but luckily for me (being a VB / C# developer), very good .NET components already exist for that.

I have used CommonMark.NET (https://github.com/Knagis/CommonMark.NET) with great success in the past, but unfortunately this does not support tables. So this time I will be using Markdig (https://github.com/lunet-io/markdig).

Markdig supports a number of markdown extensions (including two types of tables), and lets me turn each extension on/off as needed. For example, I don’t want auto links from bare URLs, so I will leave that extension turned off.

I don’t need any fancy templating since the resulting HTML will be integrated into our existing web-site design.

The rest is mostly trivial file handling and URL rewriting.
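To give an idea of what the URL rewriting step might involve, here is a minimal sketch. It is written in Python purely for illustration (the actual compiler is a .NET program), and the rule it applies is an assumption: relative links pointing at `.md` source files are changed to point at the generated `.html` files, while external links are left alone. The function name `rewrite_links` is hypothetical.

```python
import re

# Matches the href attribute of generated HTML links (double-quoted only).
_HREF = re.compile(r'href="([^"]+)"')

# Matches URLs that start with a scheme such as "https:" or "mailto:".
_SCHEME = re.compile(r'^[a-zA-Z][a-zA-Z0-9+.-]*:')


def rewrite_links(html):
    """Point relative links to .md sources at the generated .html files."""

    def fix(match):
        url = match.group(1)
        # Assumption: anything with a scheme or a leading "//" is external.
        if _SCHEME.match(url) or url.startswith("//"):
            return match.group(0)
        # Keep any "#fragment" part intact while swapping the extension.
        path, sep, fragment = url.partition("#")
        if path.endswith(".md"):
            path = path[:-3] + ".html"
        return f'href="{path}{sep}{fragment}"'

    return _HREF.sub(fix, html)


if __name__ == "__main__":
    print(rewrite_links('<a href="install.md#setup">Setup</a>'))
```

A regular expression is enough here only because the HTML comes from a known converter; input from arbitrary sources would call for a real HTML parser.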

Moving web content from SQL to static files

As part of this effort, some content (Simple DNS Plus plug-in documentation, news articles, and release-notes) will be moved from SQL database records to static content files (HTML pages and JSON based indexes) - all of which will be rebuilt whenever something is updated.
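A JSON based index of this kind could be produced by a small build step along the following lines. This is only a sketch in Python under assumed conventions, not the real implementation: the index file name `index.json`, the entry fields `file` and `title`, and the idea of taking the title from the first `<h1>` are all hypothetical.

```python
import json
import re
from pathlib import Path

# Assumption: each page's title is the text of its first <h1> element.
_H1 = re.compile(r"<h1[^>]*>(.*?)</h1>", re.S | re.I)


def build_index(content_dir):
    """Return one index entry per HTML page in content_dir, sorted by file name."""
    entries = []
    for page in sorted(Path(content_dir).glob("*.html")):
        html = page.read_text(encoding="utf-8")
        match = _H1.search(html)
        # Fall back to the file name (without extension) if there is no <h1>.
        title = match.group(1).strip() if match else page.stem
        entries.append({"file": page.name, "title": title})
    return entries


def write_index(content_dir, index_name="index.json"):
    """Rebuild the index file; meant to run whenever content is updated."""
    entries = build_index(content_dir)
    target = Path(content_dir, index_name)
    target.write_text(json.dumps(entries, indent=2), encoding="utf-8")
    return entries
```

The web-site can then read the index file directly instead of querying a database to list the available pages.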

This actually makes perfect sense. Most of this data never changes - so why waste SQL queries on every web request? Static content really should be stored as such.

For example - our RSS/Atom feeds are rebuilt from SQL on every web request. They really only need to be rebuilt whenever a new story is added.
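The feed case could look roughly like this: a function that is called once when a story is published and writes the feed to a static file, which the web server then serves as-is. The sketch is in Python for illustration only, the function names and the story fields (`title`, `url`, `updated`) are hypothetical, and it covers only a minimal subset of Atom elements (for instance, no author information), so it should not be read as a complete feed.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialize Atom elements without a prefix (as the default namespace).
ET.register_namespace("", ATOM_NS)


def _atom(tag):
    """Return the namespace-qualified name for an Atom element."""
    return f"{{{ATOM_NS}}}{tag}"


def build_atom_feed(title, feed_url, stories):
    """Build a minimal Atom document from a list of story dicts.

    Each story needs 'title', 'url' and 'updated'. Assumption: 'updated'
    is an RFC 3339 UTC timestamp such as '2020-01-02T00:00:00Z', so that
    plain string comparison finds the newest one.
    """
    feed = ET.Element(_atom("feed"))
    ET.SubElement(feed, _atom("title")).text = title
    ET.SubElement(feed, _atom("id")).text = feed_url
    ET.SubElement(feed, _atom("link"), href=feed_url, rel="self")
    newest = max((s["updated"] for s in stories), default="1970-01-01T00:00:00Z")
    ET.SubElement(feed, _atom("updated")).text = newest
    for story in stories:
        entry = ET.SubElement(feed, _atom("entry"))
        ET.SubElement(entry, _atom("title")).text = story["title"]
        ET.SubElement(entry, _atom("id")).text = story["url"]
        ET.SubElement(entry, _atom("link"), href=story["url"])
        ET.SubElement(entry, _atom("updated")).text = story["updated"]
    return ET.tostring(feed, encoding="unicode")


def write_feed(path, title, feed_url, stories):
    """Write the feed to a static file; call this when a story is added."""
    xml = build_atom_feed(title, feed_url, stories)
    Path(path).write_text(xml, encoding="utf-8")
```

With this arrangement the cost of building the feed is paid once per published story rather than once per request.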

Freeing this content from SQL server also makes the web-sites easier to migrate to whichever new technologies and hosting platforms we might consider in the future.

Other benefits

Alternatives considered

Many software companies settle on one of the big support portal providers - like Zendesk, Freshdesk, SmarterTrack, etc. - putting all their documentation into the KB that comes with their chosen system.

This is indeed very appealing in the context of “let’s focus on developing our own product and pay someone else for this stuff”. We have been using SmarterTrack ourselves for many years for exactly this reason.

However, those systems all have some significant hidden costs:

I did have (another) look at both Zendesk and Freshdesk, but they both suffer from the issues mentioned above.

I also considered DocFX, Sphinx, Jekyll, Read the Docs, and GitBook (see References at bottom), but they all seem to be overkill and/or still need a lot of customization.

The good thing is that if we do decide on another solution at some point, it will be super easy to migrate.

Why now?

I have been wanting to overhaul the knowledge bases and the help files for some time.

And as I have been working more and more with Git and markdown lately (in other contexts) it just suddenly made sense to do this.

References
