Markdown’ing and open sourcing our documentation

1 November 2017

We will be converting all our documentation source text to markdown files, version controlled with Git, and published as open source on GitHub. And we will build custom compilers to convert the markdown to .chm (help files included with software) and to HTML (for publishing on our web-sites).

Others have certainly done this before us. For example, Microsoft has an “Edit” link on each article hosted on http://docs.microsoft.com - linking to the corresponding GitHub repository and file. And a number of popular tools like Sphinx, Read the Docs and GitBook exist for similar purposes (see References section at bottom).

However, as I was researching this, I didn’t find anyone explaining WHY this is a good idea. The benefits might seem obvious. But they weren’t to me at first, which is why I set out to explore this.

Background

The documentation for our products is currently written in different tools and originates from different source formats:

Help files (.chm) included with software: written in and built using Help & Manual (http://www.helpandmanual.com). Source format is a proprietary XML format.
Knowledge base (web): written in and published through a web interface in SmarterTrack (http://smartertrack.com). Source format is HTML.
Simple DNS Plus plug-in documentation (web), news articles (including RSS feeds), and release-notes: written in a simple home-grown web-forms system backed by SQL Server. Source format is HTML.

Why markdown?

The HTML format (as currently used by some of our documentation) is pretty universal of course, but unfortunately various WYSIWYG editors, converters, etc. tend to insert a lot of extra formatting and other unwanted stuff. This makes it difficult to later change systems - for example because site CSS styles are overridden by inline style information, and line feeds and paragraphs use different HTML codes behind the scenes, etc.

Markdown (https://en.wikipedia.org/wiki/Markdown) doesn’t have these issues simply because it has very limited formatting options.

Markdown is also easier to write directly as source, and many great tools exist for this. A few of my favorites are Visual Studio Code, Dropbox Paper (basically a WYSIWYG editor for markdown, with a markdown export function), and StackEdit (https://stackedit.io).

Because markdown is so simple, it is easy to convert into other formats, which future-proofs the solution.

And since markdown is plain text, it works very well with Git (see below), and it is the default text format on GitHub (see further below).

Why files?

It might seem old fashioned to store documents in local files rather than some cloud content management system, but there are some nice benefits to this:

Local files can be accessed and edited off-line. So you can work on a plane or anywhere with poor connectivity.
Easy to search / replace across all articles (using Visual Studio and other tools).
Easy to write scripts to do maintenance across all articles - for example check / update links etc.
Write and maintain both program code and documentation using the same tools (Visual Studio, SourceTree, etc.).
Works with Git (see below).
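
As an illustration of the maintenance scripts mentioned above, here is a minimal sketch of a link checker that walks a folder of markdown files and reports relative links pointing at files that do not exist. It is written in Python purely for illustration; the function name, the folder layout, and the simplified link pattern are all assumptions, not part of our actual tooling.

```python
import re
from pathlib import Path

# Matches inline markdown links such as [text](target) or [text](target "title").
# Simplified on purpose: reference-style links and nested brackets are not handled.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")

# Anything starting with a URL scheme (http:, https:, mailto:, ...) is external.
SCHEME_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_broken_local_links(docs_root):
    """Return (file, target) pairs for relative links whose target file is missing."""
    root = Path(docs_root)
    broken = []
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_PATTERN.findall(text):
            # Skip external URLs and in-page anchors; only local files are checked.
            if SCHEME_PATTERN.match(target) or target.startswith("#"):
                continue
            # Drop any #anchor before looking for the file on disk.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file.relative_to(root).as_posix(), target))
    return broken
```

Calling find_broken_local_links("docs") on a checkout of the repository would return a list that could be printed or used to fail a build. Checking external URLs would require network requests and is left out of this sketch.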

Why Git?

We already use Git (https://en.wikipedia.org/wiki/Git) for source code control. It will be nice to have the same benefits (branching, go back to previous commit, version comparison, etc.) for our documentation.

And of course Git makes it easy for multiple authors to collaborate and work on the same stuff at the same time.

Why open source?

Publishing the documentation as open source (https://en.wikipedia.org/wiki/Open-source_model) on GitHub (or similar), allows anyone to easily contribute in a well established manner (fork, branch, pull request, etc.).

I have no illusion that strangers will suddenly re-write all our documentation just because we open source it, but on the other hand, I see no reason not to do this. And if, for whatever reason, someone does decide to lend us a hand - that would just be fantastic.

Why GitHub?

GitHub (https://github.com) just happens to be where we currently host our source code (open and closed). It is also where most open source (code and docs) lives, and therefore might give us a bit more exposure than alternatives (not sure about this).

GitHub provides a nice interface for editing markdown files directly, so contributors could do everything right there on the GitHub web-site.

However, Bitbucket, GitLab, etc. would probably fit the bill too.

Why a custom compiler?

I know that tools already exist for this (see References section below), but I want something very simple that will do exactly what I want - nothing more, nothing less. For me, this means writing my own little program.

Another reason is that we will need to support a mix of markdown and HTML source files for a while (until all HTML is converted to markdown). So the compiler needs to handle both source formats. Output to .chm help files (or files that can be compiled to this) is not standard either.

The hard part is converting markdown to HTML, but luckily for me as a VB / C# developer, very good .NET components already exist for that.

I have used CommonMark.NET (https://github.com/Knagis/CommonMark.NET) with great success in the past, but unfortunately this does not support tables. So this time I will be using Markdig (https://github.com/lunet-io/markdig).

Markdig supports a number of markdown extensions (including two types of tables), and lets me turn each extension on/off as needed. For example, I don’t want bare URLs turned into links automatically, so I will leave that extension off.

I don’t need any fancy templating since the resulting HTML will be integrated into our existing web-site design.

The rest is mostly trivial file handling and URL rewriting.
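
To make the "file handling and URL rewriting" part more concrete, here is a rough sketch of what such a build step could look like. It is written in Python for illustration only (the real compiler will be a .NET program), and the function names, the folder layout, and the rule that links to .md files become links to .html files are assumptions. The markdown converter is passed in as a function; in the setup described above that role would be played by Markdig.

```python
import re
import shutil
from pathlib import Path

# Matches href="page.md" or href="page.md#anchor", but not absolute http(s) URLs.
MD_HREF = re.compile(r'href="(?!https?://)([^"#]+)\.md(#[^"]*)?"')


def rewrite_links(html):
    """Point links between source .md files at their compiled .html pages."""
    return MD_HREF.sub(
        lambda m: f'href="{m.group(1)}.html{m.group(2) or ""}"', html
    )


def build(source_dir, output_dir, markdown_to_html):
    """Compile a tree of mixed .md and .html sources into HTML files.

    markdown_to_html is a caller-supplied function (hypothetical here)
    that takes markdown text and returns an HTML fragment.
    """
    src, out = Path(source_dir), Path(output_dir)
    for path in sorted(src.rglob("*")):
        if path.is_dir():
            continue
        target = out / path.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        if path.suffix == ".md":
            html = markdown_to_html(path.read_text(encoding="utf-8"))
            target.with_suffix(".html").write_text(rewrite_links(html), encoding="utf-8")
        elif path.suffix == ".html":
            # Legacy HTML sources pass through; only their links are rewritten.
            html = path.read_text(encoding="utf-8")
            target.write_text(rewrite_links(html), encoding="utf-8")
        else:
            # Images and other assets are copied alongside the pages that use them.
            shutil.copy2(path, target)
```

Handling both source formats in one loop is what lets the HTML articles be converted to markdown gradually, one file at a time, without changing the build.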

Moving web content from SQL to static files

As part of this effort, some content (Simple DNS Plus plug-in documentation, news articles, and release-notes) will be moved from SQL database records to static content files (HTML pages and JSON based indexes) - all of which will be rebuilt whenever something is updated.

This actually makes perfect sense. Most of this data never changes, so why waste SQL queries on every web request? Static content really should be stored as such.

For example - our RSS/Atom feeds are currently rebuilt from SQL on every web request. They really only need to be rebuilt whenever a new story is added.
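
A minimal sketch of the "rebuild only when something changes" idea, again in Python for illustration: the feed is generated once and written to a static file, and the web server then serves that file without touching a database. The function names and the shape of the story records (title, url, published) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(channel_title, site_url, stories):
    """Return RSS 2.0 XML for a list of story dicts (title, url, published)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = channel_title
    # Newest stories first.
    for story in sorted(stories, key=lambda s: s["published"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = story["title"]
        ET.SubElement(item, "link").text = story["url"]
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(story["published"])
    return ET.tostring(rss, encoding="unicode")


def publish_feed(path, channel_title, site_url, stories):
    """Write the feed to a static file; call this when a story is added."""
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="utf-8"?>\n')
        f.write(build_rss(channel_title, site_url, stories))
```

The publish step would call publish_feed once per update, for example with a story dated datetime(2017, 11, 1, tzinfo=timezone.utc), and every later request for the feed is just a static file download.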

Freeing this content from SQL server also makes the web-sites easier to migrate to whichever new technologies and hosting platforms we might consider in the future.

Other benefits

All documentation source texts will be in the same format, in the same place, and accessible with the same tools.
Independence of technologies (SQL, proprietary formats, writing tools, etc.) and platforms (can easily edit the markdown files on any device).
Image files follow source text files (in Git repository). In other systems, images are often stored separately from the text.
Search - by moving the KB content from SmarterTrack into our own web-site, we can set up a central search function that covers all our documentation.
Commenting - by moving the KB content from SmarterTrack into our own web-site, we can use a single commenting system (Disqus) - so our users won’t need multiple logins for that.
The whole setup is just much simpler and therefore easier to adapt to future requirements.

Alternatives considered

Many software companies settle on one of the big support portal providers - like Zendesk, Freshdesk, SmarterTrack, etc. - putting all their documentation into the KB that comes with their chosen system.

This is indeed very appealing in the context of “let’s focus on developing our own product and pay someone else for this stuff”. We have been using SmarterTrack ourselves for many years for exactly this reason.

However, those systems all have some significant hidden costs:

Vendor lock-in:
- The more information you put into the KB, the harder it will be to move somewhere else. The portal providers have no interest in making this easy. I had to resort to screen scraping to get our KB articles out of SmarterTrack.
- User posts and comments (which over time become an important part of your documentation) are even harder to move.
- User logins (your users) belong to the portal. No way to move users since passwords are (and should be) hashed.
Although they present you with an impressive list of overall features, the KB part of these systems is actually very basic and generic, and allows very little customization for special needs.
No search / replace across all articles.
Your KB content is disconnected from any other systems you might have.
Close to impossible to synchronize online and offline documentation. An included help file is still important, at least for installable PC software.
The article editing interface typically consists of some WYSIWYG HTML editor control. Ever had a look at the resulting HTML code - especially after you copy/paste something into one of those?
Forget markdown, versioning, and source control of any kind.

I did have (another) look at both Zendesk and Freshdesk, but they both suffer from the issues mentioned above.

I also considered DocFX, Sphinx, Jekyll, Read the Docs, and GitBook (see References at bottom), but they all seem to be overkill and/or still need a lot of customization.

The good thing is that if we do decide on another solution at some point, it will be super easy to migrate.

Why now?

I have been wanting to overhaul the knowledge bases and the help files for some time.

And as I have been working more and more with Git and markdown lately (in other contexts) it just suddenly made sense to do this.

References

Sphinx - http://www.sphinx-doc.org
Read the Docs - https://readthedocs.org
GitBook - https://www.gitbook.com
DocFX - http://dotnet.github.io/docfx
Jekyll - https://jekyllrb.com
CommonMark.NET - https://github.com/Knagis/CommonMark.NET
Markdig - https://github.com/lunet-io/markdig
https://opensource.com/article/16/12/yearbook-5-trends-open-source-documentation
https://docs.microsoft.com/en-us/teamblog/introducing-docs-microsoft-com
