This is an implementation of John Gruber's markdown in C. It uses a parsing expression grammar (PEG) to define the syntax. This should allow easy modification and extension. It currently supports output in HTML, LaTeX, ODF, or groff_mm formats, and adding new formats is relatively easy.
It is pretty fast. A 179K text file that takes 5.7 seconds for Markdown.pl (v. 1.0.1) to parse takes less than 0.2 seconds for this markdown. It does, however, use a lot of memory (up to 4M of heap space while parsing the 179K file, and up to 80K for a 4K file). (Note that the memory leaks in earlier versions of this program have now been plugged.)
Both a library and a standalone program are provided.
peg-markdown is written and maintained by John MacFarlane (jgm on github), with significant contributions by Ryan Tomayko (rtomayko). It is released under both the GPL and the MIT license; see LICENSE for details.
This program is written in portable ANSI C. It requires glib2. Most *nix systems will have this installed already. The build system requires GNU make.
The other required dependency, Ian Piumarta's peg/leg PEG parser generator, is included in the source directory. It will be built automatically. (However, it is not as portable as peg-markdown itself, and seems to require gcc.)
To make the 'markdown' executable:
make
(Or, on some systems, gmake.) Then, for usage instructions:
./markdown --help
To run John Gruber's Markdown 1.0.3 test suite:
make test
The test suite will fail on one of the list tests. Here's why.
Markdown.pl encloses "item one" in the following list in <p> tags:
1. item one
    * subitem
    * subitem
2. item two
3. item three
peg-markdown does not enclose "item one" in <p> tags unless it has a following blank line. This is consistent with the official markdown syntax description, and lets the author of the document choose whether <p> tags are desired.
Prerequisites:
- Linux system with MinGW cross compiler. For Ubuntu:
sudo apt-get install mingw32
- Windows glib-2.0 binary & development files. Unzip files into cross-compiler directory tree (e.g., /usr/i586-mingw32msvc).
Steps:
- Create the markdown parser using Linux-compiled leg from peg-0.1.4:
./peg-0.1.4/leg markdown_parser.leg >markdown_parser.c
(Note: The same thing could be accomplished by cross-compiling leg, executing it on Windows, and copying the resulting C file to the Linux cross-compiler host.)
- Run the cross compiler with include flags for the Windows glib-2.0 headers: for example,
/usr/bin/i586-mingw32msvc-cc -c \
    -I/usr/i586-mingw32msvc/include/glib-2.0 \
    -I/usr/i586-mingw32msvc/lib/glib-2.0/include -Wall -O3 -ansi markdown*.c
- Link against the Windows glib-2.0 library: for example,
/usr/bin/i586-mingw32msvc-cc markdown*.o \
    -Wl,-L/usr/i586-mingw32msvc/lib/glib,--dy,--warn-unresolved-symbols,-lglib-2.0 \
    -o markdown.exe
The resulting executable depends on the glib dll file, so be sure to load the glib binary on the Windows host.
These directions assume that MinGW is installed in c:\MinGW and glib-2.0 is installed in the MinGW directory hierarchy (with the mingw bin directory in the system path).
Unzip peg-markdown in a temp directory. From the directory with the peg-markdown source, execute:
cd peg-0.1.4
make PKG_CONFIG=c:/path/to/glib/bin/pkg-config.exe
peg-markdown supports extensions to standard markdown syntax. These can be turned on using the command line flag -x or --extensions. -x by itself turns on all extensions. Extensions can also be turned on selectively, using individual command-line options. To see the available extensions:
./markdown --help-extensions
The --smart extension provides "smart quotes", dashes, and ellipses.
The --notes extension provides a footnote syntax like that of Pandoc or PHP Markdown Extra.
The --autolink extension converts bare URLs into links.
The --no_images extension treats image links as plain links.
The --strike extension provides a strike-through syntax like that of Redcarpet. For strike-through support in LaTeX documents the sout macro from the ulem package is used. Add \usepackage[normalem]{ulem} to your document's preamble to load it.
With the --codeblock extension, blocks delimited with ~~~ will be considered as code, without the need to be indented.
The library exports two functions:
GString * markdown_to_g_string(char *text, int extensions, int output_format);
char * markdown_to_string(char *text, int extensions, int output_format);
The only difference between these is that markdown_to_g_string returns a GString (glib's automatically resizable string), while markdown_to_string returns a regular character pointer. The memory allocated for these must be freed by the calling program, using g_string_free() or free(), respectively.
text is the markdown-formatted text to be converted. Note that tabs will be converted to spaces, using a four-space tab stop. Character encodings are ignored.
extensions is a bit-field specifying which syntax extensions should be used. If extensions is 0, no extensions will be used. If it is 0xFFFFFF, all extensions will be used. To set extensions selectively, combine the following constants with the bitwise | operator:
- EXT_SMART turns on smart quotes, dashes, and ellipses.
- EXT_NOTES turns on footnote syntax. Pandoc's footnote syntax is used here.
- EXT_FILTER_HTML filters out raw HTML (except for styles).
- EXT_FILTER_STYLES filters out styles in HTML.
- EXT_STRIKE turns on strike-through syntax.
- EXT_AUTOLINK turns bare URLs into links.
output_format is either HTML_FORMAT, LATEX_FORMAT, ODF_FORMAT, or GROFF_MM_FORMAT.
To use the library, include markdown_lib.h. See markdown.c for an example.
It should be pretty easy to modify the program to produce other formats, and to parse syntax extensions. A quick guide:
- markdown_parser.leg contains the grammar itself.
- markdown_output.c contains functions for printing the Element structure in various output formats.
- To add an output format, add the format to markdown_formats in markdown_lib.h. Then modify print_element in markdown_output.c, and add functions print_XXXX_string, print_XXXX_element, and print_XXXX_element_list. Also add an option in the main program that selects the new format. Don't forget to add it to the list of formats in the usage message.
- To add syntax extensions, define them in the PEG grammar (markdown_parser.leg), using existing extensions as a guide. New inline elements will need to be added to Inline =; new block elements will need to be added to Block =. (Note: the order of the alternatives does matter in PEG grammars.)
- If you need to add new types of elements, modify the keys enum in markdown_peg.h.
- By using &{ } rules one can selectively disable extensions depending on command-line options. For example, &{ extension(EXT_SMART) } succeeds only if the EXT_SMART bit of the global syntax_extensions is set. Add your option to markdown_extensions in markdown_lib.h, and add an option in markdown.c to turn on your extension.
- Note: Avoid using [^abc] character classes in the grammar, because they cause problems with non-ascii input. Instead, use: ( !'a' !'b' !'c' . )
Support for ODF output was added by Fletcher T. Penney.