The superfluous CodeDown manual

CodeDown is a simple but very universal converter between code and documentation texts, based on Markdown as the central format. codedown is the application and command that performs all these conversions on demand.

Code and Documentation

Any non-trivial, serious program nowadays comprises both code and its documentation. And because both kinds of text are so closely related, they are often put into the same source text. This is usually done in one of two fashions:

codedown is a universal document generator and code extractor

The choice in each case is achieved by a specification of the source and target format. For example,

where in all cases "..." stands for more specifications, such as the input and output files and other options.

CoreCodeDown and Pandoc

The back-and-forth conversions between code and Markdown are done by the CoreCodeDown module; all conversions between Markdown and any other document format are performed by the Pandoc module.

Pandoc is a truly universal document converter, implemented by John MacFarlane as an independent Haskell package. We strongly recommend installing and using it along with the CodeDown package. Pandoc has its own command line application pandoc with a very rich set of options. The syntax of the codedown command is designed to follow the pandoc syntax very closely¹, so that both programs may support each other.²

A more detailed example

Suppose you have a C source file Sample.c, enriched with CodeDown documentation, and you want to turn that into a nice HTML document Sample.html. Besides, you want this document to be a full HTML document (with <head> and everything), so you use the Pandoc option -s (which is short for --standalone). Also, there should be a comfortable table of contents (Pandoc option: --toc or --table-of-contents) and a nice CSS stylesheet integration (Pandoc option: --css=CodeDown.css or -c CodeDown.css).

(Note the biggest difference between codedown and pandoc calls: codedown has an --input option, but for pandoc the input files are attached to the other options.)
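The relation between the two call styles can be sketched in a few lines of Python. This is only an illustration, not part of either program: the function name codedown_to_pandoc_args is hypothetical, and it assumes the comma-separated --input=File1,...,FileN form that the footnotes of this manual describe.

```python
def codedown_to_pandoc_args(args):
    """Rearrange codedown-style arguments into pandoc-style order.

    Hypothetical helper: every --input=File1,...,FileN option is removed
    and its files are appended after all remaining options.
    """
    options, files = [], []
    for arg in args:
        if arg.startswith("--input="):
            # The option value is a comma-separated list of file names.
            files.extend(f for f in arg[len("--input="):].split(",") if f)
        else:
            options.append(arg)
    return options + files


if __name__ == "__main__":
    call = ["--from=markdown", "--to=html",
            "--input=CodeDownManual.markdown",
            "--output=CodeDownManual.html"]
    # Prints the options first and the input file last.
    print(codedown_to_pandoc_args(call))
```

All other options are passed through unchanged, which mirrors the statement that the two command syntaxes are otherwise very similar.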

Overview

Markdown

Markdown was originally designed as a way to ease the generation and comprehension of HTML source code. But meanwhile, there are a couple of Markdown extensions and implementations (including Pandoc) that suggest Markdown as a default authoring format for documents in general.

Markdown syntax, part 1

A typical example

Suppose you want to write a little document that, nicely rendered, looks as follows (note that this is only an image; the links are not functional):

If you create this document in HTML file Sample.html, the content would then be something like this:

But you may as well create the following file Sample.markdown, which is much more conveniently written and easier to read:

and then generate the HTML file from the Markdown file with the original Perl executable

In fact, the style for the document image above was achieved by inserting our default CSS stylesheet CodeDown.css with an additional option --css=CodeDown.css to either the pandoc or the codedown call above.

Markdown syntax, part 2

There are three more Markdown syntax rules that will be particularly important for the CodeDown conversions later on:

Markdown for program documentations

And in particular: Markdown is also a great format for the documentation of programming source code!

If you ever have to write a manual for some program or application, this is a very convenient format. It is very easy to read and write, and the syntax for inline code and code blocks just mentioned is particularly efficient and intuitive. The large number of Markdown converter implementations, including some online tools, makes it ubiquitously available. And they convert not only to HTML, but to any documentation format you could possibly wish for: groff man pages, PDF, RTF, LaTeX, DocBook XML, you name it. Besides, it is even very readable in its own text style.

By the way, this very document CodeDownManual.html was originally written in Markdown and then converted to HTML.⁵ The source text CodeDownManual.markdown should thus be a good example for the ease and beauty of the Markdown syntax (in the extended Pandoc version).

CoreCodeDown conversions

CoreCodeDown describes the conversions between code of an arbitrary programming language on one hand and Markdown on the other hand.

The rules for these conversions are very universal. CodeDown is a document generator and code extractor for virtually any programming language. In this sense, it is not only a document generator, but a true "document generator generator". And yet, these rules are very simple and just a variation of the same three principles, for every type of code.

Comments

All mainstream programming languages (and this even includes other specific formal languages like SQL, HTML, XML and TeX/LaTeX) allow the insertion of comments into the source code. These are text parts that are ignored by the machine, but provide information for human readers and users. The syntax for comments always works according to at least one of the following two conventions:

Every modern programming language provides at least one of the following kinds of comments. Some only have line comments, such as Scheme, Bash scripts or Perl.⁶ Others only know block comments, such as SML and SQL. And languages like C and Haskell have both. ⁷
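The distinction can be written down as a small lookup. The following Python sketch is illustrative only: the table COMMENT_KINDS and both function names are hypothetical, the language entries are the ones named in this section, and the numbering 1, 2, 3 follows the type convention described in the footnote on CoreCodeDown.hs.

```python
# (has line comments, has block comments), as stated in this section.
# Perl is listed as line-only, following the footnote on its block comments.
COMMENT_KINDS = {
    "scheme": (True, False),
    "bash": (True, False),
    "perl": (True, False),
    "sml": (False, True),
    "sql": (False, True),
    "c": (True, True),
    "haskell": (True, True),
}


def comment_type(has_line, has_block):
    """Return 1 (line only), 2 (block only) or 3 (both)."""
    if has_line and has_block:
        return 3
    if has_line:
        return 1
    if has_block:
        return 2
    raise ValueError("a code language needs at least one kind of comment")


def language_type(name):
    """Look up the type of a language from the illustrative table above."""
    return comment_type(*COMMENT_KINDS[name.lower()])


if __name__ == "__main__":
    for language in ("Scheme", "SML", "C"):
        print(language, language_type(language))
```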

The core conversion rules

CodeDown modifies the native comments of a given code language, so that there are special dedicated parts defined in the source code:

Note that all special CodeDown comment symbols (e.g. in the case of C, these are: "// //", "/***", "***/", "///BEGIN///" and "///END///") have to be placed at the beginning of a line.

And again, you should not write anything after the block delimiters (here: "/***", "***/", "///BEGIN///" and "///END///"). ⁹
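To make the rules concrete, here is a minimal Python sketch of a C-to-Markdown conversion. It is not the CoreCodeDown implementation, and the function name c_to_markdown is hypothetical. It uses the C symbols listed above and the block-quoted code block form for literal code described in the footnotes. Two points are assumptions of this sketch: ordinary code outside any CodeDown part is left out of the document, and a blank line is emitted after each block.

```python
def c_to_markdown(source):
    """Convert C source with CodeDown comments to Markdown (sketch)."""
    out = []
    mode = "code"  # "code", "doc" (inside /*** ... ***/) or "literal"
    for line in source.splitlines():
        stripped = line.rstrip()
        if mode == "doc":
            if stripped == "***/":
                mode = "code"
                out.append("")  # assumption: separate blocks by a blank line
            else:
                out.append(line)  # Markdown document block: copied as is
        elif mode == "literal":
            if stripped.startswith("///END///"):
                mode = "code"
                out.append("")
            else:
                # Literal code becomes a code block inside a block quote.
                out.append((">     " + line).rstrip())
        elif stripped == "/***":
            mode = "doc"
        elif stripped.startswith("///BEGIN///"):
            mode = "literal"
        elif line.startswith("// //"):
            text = line[len("// //"):]
            # Markdown document line: drop the marker and one space after it.
            out.append(text[1:] if text.startswith(" ") else text)
        # Assumption: any other code line is omitted from the document.
    return "\n".join(out)


if __name__ == "__main__":
    example = "\n".join([
        "// // # Hello",
        "///BEGIN///",
        "int main(void) { return 0; }",
        "///END///",
    ])
    print(c_to_markdown(example))
```

All symbols are only recognized at the very beginning of a line, in keeping with the note above.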

An example in C code

Suppose, we have a C source code file HelloWorld.c with the following content:

A C-to-Markdown conversion into a Markdown file HelloWorld.c.markdown would generate the following content:

Digression: a further conversion of the example into HTML

Although this is not part of the CoreCodeDown conversion, let us show how the Markdown further converts into HTML.

A Markdown-to-HTML conversion into the corresponding HTML file HelloWorld.c.html is performed with the Pandoc command

(Note the only real difference between a pandoc and a codedown call: pandoc has no --input option, the input file is the last argument of the call.)

Of course, we can also generate the HTML file from the C source file right away with a call of

Anyway, the HTML file HelloWorld.c.html has the following content (the HTML text is indented in many places, in order to make the structure a little more readable):

In a standard browser (and when the HTML document was generated with the additional --css=CodeDown.css option), this looks as follows:

The --help=CODE option

We just explained the core CodeDown syntax rules for the C programming language. You can recall these rules any time with either one of the following commands:

This is the entire summary of all the CodeDown rules for the C programming language.

will display a similar overview for any CODE, such as c, php, etc. (The CODE value is case-insensitive, i.e. you can also write C, PHP, etc.)

Another example: JavaScript

Suppose you would like to enrich JavaScript source code with CodeDown comments to generate a nice document. To see the CodeDown rules for JavaScript, just call

JavaScript (like Java, PHP and others) has C-like syntax constructions, including the line and block comments. Therefore, JavaScript has Markdown document lines, Markdown document blocks, and literal code blocks similar to the ones in C.

Yet another example: Scheme

Scheme only has line comments (after the semicolon ";"). It therefore only has Markdown document lines, but no Markdown document blocks. To find out about the CodeDown rules, type

A final example: SML

Standard ML (SML) has block, but no line comments. Accordingly, we only have Markdown document blocks and literal code blocks available. A call of

Document formats (LaTeX, HTML and XML) as code languages

Some textual document formats like LaTeX, HTML or XML also provide comments. So from the CodeDown point of view, these formats can be treated like any other code language. However, codedown always treats them as document formats in --from and --to options. So, in case you want one of them to be treated as code instead, you add a (case-insensitive) _code suffix to its name and write LaTeX_Code, HTML_CODE or xml_code, respectively.

we generate the following Markdown text (on the standard output, since no other --output is specified)

Code extraction with CodeDown

The same CoreCodeDown rules that explain the conversion from code to Markdown also work the other way round. We have both, document generation and code extraction

where CODE is, say, C (or JavaScript etc.), and the result is code in that language. All Markdown now appears in comments, and the literal code blocks are recovered as C code of the form

This makes source code and documentation formats equal and interchangeable environments for programmers at any point.
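The reverse direction can be sketched in the same spirit. The following Python function is a hypothetical illustration, not the CoreCodeDown implementation. It assumes that literal code appears in the Markdown as lines prefixed with ">" and five spaces (the block-quoted code block form described in the footnotes), and it assumes that every other line is written as a "// //" Markdown document line rather than inside a document block.

```python
LITERAL_PREFIX = ">     "  # block quote marker plus code block indentation


def markdown_to_c(markdown):
    """Turn Markdown into C code with CodeDown comments (sketch)."""
    out = []
    in_literal = False
    for line in markdown.splitlines():
        is_literal = line.startswith(LITERAL_PREFIX)
        if is_literal and not in_literal:
            out.append("///BEGIN///")
        elif in_literal and not is_literal:
            out.append("///END///")
        in_literal = is_literal
        if is_literal:
            # Recover the code line by dropping the prefix.
            out.append(line[len(LITERAL_PREFIX):])
        else:
            # Assumption: all other text becomes Markdown document lines.
            out.append(("// // " + line).rstrip())
    if in_literal:
        out.append("///END///")
    return "\n".join(out)


if __name__ == "__main__":
    print(markdown_to_c("# Hello\n\n>     int x = 1;\n\nDone."))
```

A blank line inside a literal block would end the block in this sketch; a real converter would have to treat that case more carefully.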

Example code extraction from Markdown to C code

Plain code extraction and plain document generation

There is also a plain code extractor, which only extracts the literal code blocks from a document and omits all other text. This is done with the code value for the --to option.

Similarly, there is also a plain document generator, which takes any code file and returns this code in one big literal Markdown code block, and this is done with the --from option set to the value code.

This plain document generator can e.g. be useful if an entire source file needs to be printed and a nice CodeDown layout is preferred over a plain monospace font.¹⁰
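Both plain conversions are simple enough to sketch in Python. The function names are hypothetical, and the sketch assumes that "literal Markdown code block" means the block-quoted form with a ">" and five spaces in front of each line, as described in the footnotes.

```python
LITERAL_PREFIX = ">     "  # block quote marker plus code block indentation


def extract_plain_code(markdown):
    """Keep only the lines of literal code blocks and drop all other text."""
    return "\n".join(
        line[len(LITERAL_PREFIX):]
        for line in markdown.splitlines()
        if line.startswith(LITERAL_PREFIX)
    )


def plain_document(code):
    """Wrap an entire code file into one big literal code block."""
    return "\n".join(LITERAL_PREFIX + line for line in code.splitlines())


if __name__ == "__main__":
    code = "int a;\n\nint b;"
    document = plain_document(code)
    print(document)
    # The two functions are inverse to each other on such input.
    print(extract_plain_code(document) == code)
```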

The codedown user guide

We explain the syntax and options of the codedown executable in some detail. A short summary can be obtained at any time by calling the --help option without a value, i.e.

Formats

CodeDown is a universal converter between different text formats, and there are three types of formats:

The names of all these formats need to be specified in the source (--from or --read) and target (--to or --write) options of the codedown command. These name values are case-insensitive. For example, LaTeX, latex and LATEX are equally possible.
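How such name values might be normalized can be sketched as follows. This Python function is a hypothetical illustration, not the codedown implementation: it lowercases the name and detects the _code suffix that an earlier section introduces for treating LaTeX, HTML or XML as code languages.

```python
def parse_format_name(name):
    """Return (normalized name, whether a _code suffix was given)."""
    lowered = name.lower()
    suffix = "_code"
    if lowered.endswith(suffix) and len(lowered) > len(suffix):
        return lowered[:-len(suffix)], True
    return lowered, False


if __name__ == "__main__":
    for value in ("LaTeX", "latex", "LATEX", "HTML_CODE"):
        print(value, parse_format_name(value))
```

With this normalization, LaTeX, latex and LATEX all yield the same name, and LaTeX_Code differs from them only in the second component.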

Syntax of the codedown call

Syntax for options

Each OPTION is a combination of a key and possible values. Each OPTION has a long form

and often also an equivalent short version, where the key K stands for just one letter

where each of these options has a short version and can thus be replaced by ¹³

The codedown options

Appendix: Installation

This is an easy and convenient installation of a whole Haskell infrastructure on your system. You don't need to understand Haskell at all in order to use pandoc and codedown, but your system does. The Platform also comes with a cabal (Common Architecture for Building Applications and Libraries) command that enables easy installation of any of the thousands of Haskell packages in the HackageDB.

That simple call fetches the latest release from the huge HackageDB repository. You can check if that worked, for example by calling

The main difference between the two is that codedown has an --input=File1,...,FileN option, which makes its overall syntax (codedown Option1 ... OptionM) a bit more elegant, whereas in pandoc the input files are attached (syntax: pandoc Option1 ... OptionM File1 ... FileN). ↩
In a certain sense, codedown is more powerful than pandoc, because it can also be called as a document-to-document converter (e.g. codedown --from=HTML --to=LaTeX ...), in which case the pandoc command is invoked behind the scenes with the same options. But we don't recommend the use of codedown as a document-to-document converter and advise using pandoc instead. ↩
The syntax of the codedown command is very similar to the pandoc command syntax. There is one big difference, however, namely the --input option, which does not exist for pandoc. There, the input files are added at the end of the call, as the example shows. ↩
There is yet another version for code blocks in Markdown, but only in the extended Markdown version of Pandoc, namely delimited code blocks between tilde-lines, with an option to use syntax highlighting for many types of code. You can use that, too, but the official version of CodeDown does not mention this explicitly. ↩

The conversion was done with the command

codedown --from=markdown --to=html --input CodeDownManual.markdown --output=CodeDownManual.html \
         --table-of-contents --standalone --css=CodeDown.css

and that has the same effect as

pandoc --from=markdown --to=html --output=CodeDownManual.html --table-of-contents --standalone \
       --css=CodeDown.css CodeDownManual.markdown

↩

In this context, Perl, Python and Ruby are considered languages that only have line comments, because their block comments use a special markup for their own document converters. ↩
In the implementation of the general document generators in the CoreCodeDown.hs module we say that a code language is of type 1, if it has a line, but no block comment. If it is the other way round, we call it a type 2 code language. If it has both, line and block comments, it is of type 3. For example, Scheme and Bash are type 1, SML and SQL are type 2, and C and (Common) Lisp are type 3. ↩
You may wonder, why a literal code block, say in C
```
///BEGIN///
... a line of C code ...
... another line of C code ...
///END///
```
turns into a code block inside of a block quote
```
>     ... a line of C code ...
>     ... another line of C code ...
```
instead of the simpler code block
```
    ... a line of C code ...
    ... another line of C code ...
```
In terms of HTML this means that the result is
```
<blockquote><pre><code>
... a line of C code ...
... another line of C code ...
</code></pre></blockquote>
```
instead of
```
<pre><code>
... a line of C code ...
... another line of C code ...
</code></pre>
```
So, why does CodeDown choose the more complicated version?

Well, in earlier versions of CodeDown (e.g. ElephantMark), the simpler version was indeed the rule for literal code blocks. But when source code is annotated with documentation, one often uses standard Markdown code blocks for examples or input-output dialogs, and then it is nice to have a custom layout distinction between these annotations and literal code blocks. For the document generation of HTML, this can be done very nicely with a CSS stylesheet that defines custom layouts for the corresponding combinations of the <code>, <pre> and <blockquote> tags.

In the browser view of the example of C code the literal block appears with a border around the grey block, and ordinary code blocks have only a grey background.

By the way, in Haskell, this choice has another advantage, namely that these literal code blocks comply with the "Bird tracks" syntax for literate comments. ↩
As mentioned in the core conversion rules, we recommend not to add anything after the delimiters for CodeDown blocks, and use an entire line instead. The first reason for this rule is its simplicity. Another reason is that writing code after literal block delimiters has different effects for different code languages.

In case the code language has line comments, the literal block delimiters are by default chosen to be line comments, too, which makes it possible to add comments after them. For example, the C code of the form
```
...
///BEGIN///
... first line of C code ...
... second line of C code ...
///END///
...
```
could be modified to
```
...
///BEGIN/// Now comes my precious piece of code:
... first line of C code ...
... second line of C code ...
///END/// So far for my precious piece of code.
...
```
and the C-to-Markdown conversion would still produce exactly the same result.

But if the code language doesn't have line comments, the literal block delimiters have to be block comments, and anything after the delimiters must be proper code. For example, SML only has block comments (between "(*" and "*)") and its literal block delimiters are "(***BEGIN***)" and "(***END***)". If we add any text after the "(***BEGIN***)" on the same line, this text would have to be code, unlike in, say, C.

So, in order to use general conventions for all code languages alike:

Don't write anything after CodeDown block delimiters!

↩

The corresponding call is, e.g.,

codedown --from=code --to=html --input=any.code --output=any.code.html --css=CodeDown.css

↩

PDF output is generated via LaTeX and is supported with the markdown2pdf wrapper, included in the Pandoc installation. By using codedown, all this is done automatically. For example, calling codedown -f markdown -t pdf -i example.markdown -o example.pdf should work just fine. ↩
To be precise, the order of the options in a codedown call is not entirely arbitrary, namely in case you specify the same option several times. But this is never intended and average users will avoid doing that, anyway. ↩
As it is common for one-letter UNIX command options without values, these one-letter flags can be condensed into a single one. For example, in UNIX, a call of ls -A -l -r -R -S is equivalent to ls -AlrRS. This works in CodeDown and Pandoc, too, but the time and space to mention this is probably not worth the time that can be saved when using these abbreviations. ↩