hxml is a simple, convenient way to write HTML and XML. Mainly, it reduces the clutter and burden of closing tags.
Writing HTML directly can be unpleasant, and tools exist to generate it at different levels of abstraction. Ultimately, however, most developers will need to work with HTML source directly to get what they want done.
hxml source compiles to regular HTML or XML and is compatible with it, but with a few added features.
This syntax was developed as part of a larger ecosystem written in Haskell (called gxml). However, this part was useful on its own, so I extracted it into a single standalone Haskell source file.
You do not need to know any Haskell to use it; see Installation and use below.
hxml checks your HTML for unclosed or wrongly closed tags. All tags must be closed, or self-closed such as <img src="picture.png"/>.
This follows XML/XHTML style.
The design goals for XML regarded verbosity as a non-issue.
One example is that tags must be closed with the full name of the opening tag. This can sometimes be an aid in reading source, but often it is not helpful at all. For example, in HTML documents with numerous levels of <div>s, seeing </div> tells you very little.
And for short snippets of enclosed text, it just adds work and noise.
hxml allows the name of the closing tag to be omitted, so that
This is a <span class="purple">purple chunk</> of text.
compiles to
This is a <span class="purple">purple chunk</span> of text.
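One way to picture how an omitted closing name gets filled in is a stack of open tags. The sketch below is an illustration under simplifying assumptions, not hxml's implementation: it works on a hypothetical pre-tokenized input (the Token type and resolveClosers are invented names) and ignores attributes.

```haskell
-- A sketch, not hxml's parser: Token and resolveClosers are
-- hypothetical names, and attributes are left out for brevity.
data Token
  = Open String   -- <name ...>
  | Close String  -- </name>
  | AnonClose     -- </>
  | Text String   -- character data between tags
  deriving (Show, Eq)

-- Walk the tokens with a stack of open tag names. Each </> is given
-- the name on top of the stack; a named closer must match that name.
resolveClosers :: [Token] -> Either String [Token]
resolveClosers = go []
  where
    go [] [] = Right []
    go (t : _) [] = Left ("unclosed tag <" ++ t ++ ">")
    go stack (tok : rest) = case tok of
      Open n -> (tok :) <$> go (n : stack) rest
      Text _ -> (tok :) <$> go stack rest
      AnonClose -> case stack of
        (n : ns) -> (Close n :) <$> go ns rest
        []       -> Left "</> with no open tag"
      Close n -> case stack of
        (m : ms)
          | m == n    -> (tok :) <$> go ms rest
          | otherwise -> Left ("</" ++ n ++ "> closes <" ++ m ++ ">")
        [] -> Left ("</" ++ n ++ "> with no open tag")
```

Loaded in GHCi, an anonymous </> comes back as a Close carrying the name of the innermost open tag, and a wrongly named closer produces an error instead of output.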
Going the other way, you can supply extra information in the closing tag:
<div id="menubar" class="offset">
<div>Home</><div>Dashboard</>
</div id="menubar">
Attributes on the closing tag are checked by hxml to see if they match the opening tag, and are then removed. This can catch mismatches that an HTML checker would miss.
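A sketch of the closing-tag check, assuming attributes are kept as (name, raw value) pairs with the value's original quoting intact. The Attr type and checkCloseAttrs are hypothetical names. Comparing raw text means class="foo" and class=foo count as different, which agrees with hxml's rule that quoting on the closing tag must match the opening tag.

```haskell
-- Hypothetical representation: attribute name paired with its raw
-- value text exactly as written (quotes included), or Nothing if absent.
type Attr = (String, Maybe String)

-- Every attribute on the closing tag must appear on the opening tag
-- with the same raw text. On success the closing tag is emitted
-- without any attributes.
checkCloseAttrs :: String -> [Attr] -> [Attr] -> Either String String
checkCloseAttrs name openAttrs closeAttrs =
  case filter (`notElem` openAttrs) closeAttrs of
    []           -> Right ("</" ++ name ++ ">")
    ((k, _) : _) ->
      Left ("attribute " ++ k ++ " on </" ++ name ++ "> does not match the opening tag")
```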
The syntax <tag:> with a colon will scope the tag over an indented block. For example,
<div:>
  <h1:>This is a header
  <p class="big xyz":>This is a test paragraph with
    multiple lines of
    content.
will become
<div>
  <h1>This is a header</h1>
  <p class="big xyz">This is a test paragraph with
    multiple lines of
    content.</p></div>
You can include blank lines within the indented block.
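The block rule can be sketched as two small functions: one finds where an indented block ends, the other attaches the closing tag to its last line. This is an illustration with hypothetical names (splitBlock, attachClose), assuming space-only indentation and treating trailing blank lines as outside the block; it is not taken from hxml's source.

```haskell
-- Leading spaces only; tabs are ignored in this sketch.
indentOf :: String -> Int
indentOf = length . takeWhile (== ' ')

isBlank :: String -> Bool
isBlank = all (== ' ')

-- Given the indentation of the line holding <tag:> and the lines after
-- it, split them into the block body and whatever follows. Body lines
-- are blank or indented more deeply than the opening line; blank lines
-- at the very end are handed back to the remainder.
splitBlock :: Int -> [String] -> ([String], [String])
splitBlock base ls =
  let (body, rest) = span (\l -> isBlank l || indentOf l > base) ls
      trailing     = reverse (takeWhile isBlank (reverse body))
      body'        = take (length body - length trailing) body
  in (body', trailing ++ rest)

-- Attach the closing tag to the end of the block's last line. The
-- second argument is the text following <tag:> on the opening line.
attachClose :: String -> String -> [String] -> [String]
attachClose name firstLine body =
  let ls = firstLine : body
  in init ls ++ [last ls ++ "</" ++ name ++ ">"]
```

With an empty body, as for the <h1:> line in the example, attachClose simply closes the tag at the end of the opening line.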
XML requires that all attributes be quoted. This can be inconvenient and pointless for simple values, so hxml will quote these for you:
<table cellpadding=0 cellspacing=1>
turns into
<table cellpadding="0" cellspacing="1">
Of course, if there are special characters such as spaces or slashes in the attribute value, you'll still need to quote them.
An attribute with no value is expanded to have itself as its value, so that <input disabled> becomes <input disabled="disabled">.
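A minimal rendering sketch for the two attribute rules above, assuming the parser has already recorded how each value was written. AttrValue, renderAttr, and renderAttrs are hypothetical names; single-quoted values and escaping of embedded quotes are not handled.

```haskell
-- How an attribute value was written in the source (hypothetical type).
data AttrValue
  = Quoted String  -- name="value", stored without the quotes
  | Bare String    -- name=value
  | NoValue        -- name
  deriving (Show, Eq)

-- Render one attribute in XML style: bare values gain double quotes,
-- and a valueless attribute takes its own name as its value.
renderAttr :: (String, AttrValue) -> String
renderAttr (name, v) = name ++ "=\"" ++ value ++ "\""
  where
    value = case v of
      Quoted s -> s
      Bare s   -> s
      NoValue  -> name

-- Render an attribute list, each attribute preceded by one space.
renderAttrs :: [(String, AttrValue)] -> String
renderAttrs = concatMap (\a -> ' ' : renderAttr a)
```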
Attributes starting with an underscore (_) are removed. These can be used for documentation and/or matching:
<div _pricelist>
content
</div _pricelist>
To my knowledge, no application uses attributes that start with an underscore. If one turns up, I may revisit this syntax.
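Dropping documentation attributes amounts to a filter on the attribute name. A sketch over hypothetical (name, raw value) pairs; dropDocAttrs is an invented name:

```haskell
-- Drop documentation attributes: any whose name starts with '_'.
-- Attributes are (name, raw value) pairs in this hypothetical sketch.
dropDocAttrs :: [(String, Maybe String)] -> [(String, Maybe String)]
dropDocAttrs = filter (not . isDoc . fst)
  where
    isDoc ('_' : _) = True
    isDoc _         = False
```

Applied to both tags in the example, <div _pricelist> and </div _pricelist> lose their only attribute and come out as a plain <div> ... </div> pair.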
A self-closing tag such as <hr/> has no content. We generalize this by allowing short content after the closing slash. That is,
This is a <b/bold> word.
expands to
This is a <b>bold</b> word.
This should only be used for short text with no potential syntactic ambiguity. For anything more, just use the closing tag syntax above:
This is a <b>bold/strong/emphasized</> phrase.
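A sketch of the short-content expansion for a single tag with no attributes. expandShort is a hypothetical name; it returns Nothing for anything not of the form <name/text> with non-empty text, so an ordinary <hr/> is left for the self-closing rules.

```haskell
-- Expand <name/text> into <name>text</name>. The input is assumed to
-- be exactly one such tag; attributes are not handled in this sketch.
expandShort :: String -> Maybe String
expandShort ('<' : rest)
  | not (null name)
  , ('/' : afterSlash) <- slashRest
  , not (null afterSlash)
  , last afterSlash == '>'
  , let content = init afterSlash
  , not (null content)   -- empty content means a plain self-closed tag
  = Just ("<" ++ name ++ ">" ++ content ++ "</" ++ name ++ ">")
  where
    (name, slashRest) = span (`notElem` "/> ") rest
expandShort _ = Nothing
```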
Content enclosed with the <#> tag is considered an hxml comment and is removed. I dub these chomments. They pattern like any other tag:
<#>This will be removed.</#>
<#>This uses the simpler closing tag.</>
<#:>
  A potentially longer block
  chomment.
<#/a short chomment>
These are distinct from <!-- comments -->. The latter are preserved by hxml, and should be used for comments intended to be in the final HTML.
Inside a chommented block, tags are not parsed or matched, since the indentation alone is sufficient to delimit the block.
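Because a block chomment is delimited by indentation alone, removing one never requires parsing its contents. The sketch below (dropChomments is a hypothetical name) handles only chomments that start a line, assumes space-only indentation, and deliberately ignores hxml's finer rules about what whitespace the removed block leaves behind.

```haskell
indentOf :: String -> Int
indentOf = length . takeWhile (== ' ')

isBlank :: String -> Bool
isBlank = all (== ' ')

-- Remove every block chomment that starts a line: the <#:> line itself
-- and all following lines indented more deeply. Their contents are
-- skipped unparsed, so unbalanced tags inside do no harm.
dropChomments :: [String] -> [String]
dropChomments [] = []
dropChomments (l : ls)
  | take 4 (dropWhile (== ' ') l) == "<#:>" = dropChomments (afterBlock ls)
  | otherwise = l : dropChomments ls
  where
    deeper x = not (isBlank x) && indentOf x > indentOf l
    -- Skip deeper lines and blank lines sandwiched between them;
    -- blank lines that end the block are kept.
    afterBlock xs =
      let (blanks, rest) = span isBlank xs
      in case rest of
           (y : ys) | deeper y -> afterBlock ys
           _                   -> blanks ++ rest
```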
hxml is fairly encoding-agnostic. It should work with UTF-8, ISO-8859-1, and any other 8-bit extension of ASCII. It will not work with UTF-16 and the like.
I have tried to address various side and corner cases reasonably, although I may try different approaches in the future.
In XML a self-closed tag is equivalent to a tag with no content,
but in HTML5 tags are never self-closed, and instead are either
inherently contentful or contentless (“void”). Closing a void tag
is an error, and a contentful one must have a separate closing tag.
Final slashes, as in <hr/>, may be present but are ignored.
hxml has a list of non-void HTML5 tags, and always expands them to include a distinct closing tag. So, <div/> and <i class=fa/> compile to <div></div> and <i class="fa"></i>.
For other tags, hxml simply preserves the input. So, <br/> stays the same, and <br></br> would not be compressed (but no one should write that). If a new tag is not on hxml's list and needs to be closed, write out <new></> to ensure a closing tag.
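The self-closing rule can be sketched as a lookup in a list of non-void elements. The list here is a small hypothetical sample, not hxml's actual list, and expandSelfClosed assumes the attribute text has already been rendered.

```haskell
-- A small, hypothetical sample of HTML5 non-void elements; hxml keeps
-- its own list.
nonVoid :: [String]
nonVoid = ["div", "span", "i", "b", "p", "a", "script", "textarea"]

-- Expand a self-closed tag: non-void elements get a separate closing
-- tag; anything else (void or unknown) is left exactly as written.
expandSelfClosed :: String -> String -> String
expandSelfClosed name attrs
  | name `elem` nonVoid = "<" ++ name ++ attrs ++ "></" ++ name ++ ">"
  | otherwise           = "<" ++ name ++ attrs ++ "/>"
```

Keeping unknown tags unchanged is the conservative choice: guessing that an unfamiliar tag is non-void could produce a closing tag that HTML5 forbids.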
HTML has an odd relationship to whitespace. Sometimes it matters, sometimes it's ignored. hxml has careful rules for handling it.
As illustrated, block tags attach the closing tag to the end of the last line of the block. If space is desired before the closing tag, you'll have to fit it in somehow, say with a space at the end of the last line. Chomments can help visually here, such as the empty <#/>.
If you require a newline, a line with only the indent can be added. It might be simpler to use </> in some cases.
Controlling space after the opening tag is easier. hxml also provides a mechanism for suppressing the newline immediately after the tag, in case it looks better to have the whole block indented uniformly: simply put a space before the colon. Thus,
<span :>
  One line
becomes <span>One line</span>. This is perhaps a strange choice of syntax, but it has served well enough.
Chomments pattern just like block tags in that a chommented block will turn into a newline. However, there is one exception: a one-line chommented block disappears completely, so that
Foo<#:>chomment
bar.
becomes Foobar.
This is halfway between a bug and a feature. I set out to fix this defect in the parser, but decided this exception may be useful in practice, and so for now I am leaving it.
Error reporting should be pretty good. I initially wrote hxml using Megaparsec, but rewrote it to use Parsec, since it was more standard. I then rewrote it back to Megaparsec. As a result, the code may be a bit rough around the edges as of this writing.
Attributes on a closing tag must be quoted (or not) the same way as on the opening tag. E.g., <i class="foo">text</i class=foo> yields an error.
Tags are compacted to minimal spacing. For instance, a tag split across several lines will be compressed to one line, so
<div
_updated=2017-07-19
id=main_menu
class=myclass
ng-if=shown
_todo=replace
>
becomes <div id="main_menu" class="myclass" ng-if="shown">.
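Compaction can be sketched as splitting the inside of the tag on whitespace, taking care not to split inside a double-quoted value, and rejoining with single spaces. tagWords and compactTag are hypothetical names; dropping underscore attributes and quoting bare values are separate steps not shown here.

```haskell
import Data.Char (isSpace)

-- Split the inside of a tag on whitespace, but never inside a
-- double-quoted value.
tagWords :: String -> [String]
tagWords s = case dropWhile isSpace s of
  "" -> []
  s' -> let (w, rest) = word s' in w : tagWords rest
  where
    word "" = ("", "")
    word (c : cs)
      | isSpace c = ("", c : cs)
      | c == '"'  =
          let (q, afterQ) = break (== '"') cs
              (w, rest)   = word (drop 1 afterQ)
          in ('"' : q ++ take 1 afterQ ++ w, rest)
      | otherwise = let (w, rest) = word cs in (c : w, rest)

-- Rejoin with single spaces, so a tag spread over several lines ends
-- up on one line.
compactTag :: String -> String
compactTag = unwords . tagWords
```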
Tabs in the source are preserved and are treated as 4 spaces wide when comparing indentation. This value can be set in code as tabWidth.
As usual, beware when mixing tabs and spaces.
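A sketch of measuring indentation with tabs. The name tabWidth mirrors the setting mentioned above, but this definition is an assumption: it counts every tab as a flat tabWidth columns rather than advancing to the next tab stop, since the text does not say which hxml does.

```haskell
-- Assumed flat width of one tab, in columns.
tabWidth :: Int
tabWidth = 4

-- Width of a line's leading whitespace: spaces count 1, tabs count
-- tabWidth (a flat count, not tab stops; this is an assumption).
indentWidth :: String -> Int
indentWidth = sum . map width . takeWhile (`elem` " \t")
  where
    width '\t' = tabWidth
    width _    = 1
```

Under this assumption, "  \t" measures 6 columns, whereas tab-stop semantics would give 4, which is one concrete way mixed tabs and spaces can surprise you.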
You will need to install GHC or Stack, and then use cabal install or stack install to build the library and executable.
The executable is simple:
hxml < main.hxml > main.html
If your application is written in Haskell, you can also use the parseHxml function directly. As an example, I write Heist templates in hxml.
Feedback is welcome!