Release 0.2.0
Alternative version: https://github.com/YorickPeterse/oga/blob/b8f9d04b17b2f56eed6e8a5d1ab77a6123fd985e/doc/changelog.md#020---2014-11-17
CSS Selector Support
Probably the biggest feature of this release: support for querying documents
using CSS selectors. Oga supports a subset of the CSS3 selector specification,
in particular the following selectors are supported:
- Element, class and ID selectors
- Attribute selectors (e.g. foo[x ~= "y"])
The following pseudo classes are supported:
- :root
- :nth-child(n)
- :nth-last-child(n)
- :nth-of-type(n)
- :nth-last-of-type(n)
- :first-child
- :last-child
- :first-of-type
- :last-of-type
- :only-child
- :only-of-type
- :empty
You can query with CSS selectors using the css and at_css methods, available on
instances of Oga::XML::Document and Oga::XML::Element. For example:
document = Oga.parse_xml('<people><person>Alice</person></people>')
document.css('people person') # => NodeSet(Element(name: "person" ...))
The architecture behind this is quite similar to parsing XPath. There's a lexer
(Oga::CSS::Lexer) and a parser (Oga::CSS::Parser). Unlike Nokogiri (and
perhaps other libraries) the parser does not output XPath expressions as a
String or a CSS specific AST. Instead it directly emits an XPath AST. This
allows the resulting AST to be directly evaluated by Oga::XPath::Evaluator.
See #11 for more information.
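To make the idea of emitting an XPath AST directly more concrete, here is a minimal sketch that is not Oga's actual parser. It handles only descendant selectors made of element names. Both the css_to_xpath_ast method and the nested-Array node layout are hypothetical stand-ins for the real AST.

```ruby
# Hypothetical sketch: turn a descendant-only CSS selector such as
# "people person" into a nested-Array stand-in for an XPath AST.
# Node layout (an assumption, not Oga's real AST): [type, *children].
def css_to_xpath_ast(selector)
  steps = selector.split.map do |name|
    # Each CSS element name becomes a "descendant" axis step with a name test.
    [:axis, 'descendant', [:test, name]]
  end

  [:path, *steps]
end

# The result describes the XPath "descendant::people/descendant::person".
ast = css_to_xpath_ast('people person')
```

Because the output is already shaped like an XPath tree, an evaluator can walk it without a String round trip or a second parse.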
Multi-line Attribute Support
Oga can now lex/parse elements that have attributes with newlines in them.
Previously this would trigger memory allocation errors.
See #58 for more information.
SAX after_element
The after_element method in the SAX parsing API now always takes two
arguments: the namespace name and element name. Previously this method would
always receive a single nil value as its argument, which is rather pointless.
See #54 for more information.
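As an illustration, a SAX handler that tracks nesting depth could define after_element with the two-argument signature. The DepthHandler class is hypothetical, and the callbacks are driven by hand here so the snippet runs without Oga installed. With Oga, the handler would be handed to the SAX parser instead (for example via Oga.sax_parse_xml).

```ruby
# Hypothetical handler: records the maximum element nesting depth.
class DepthHandler
  attr_reader :max_depth

  def initialize
    @depth     = 0
    @max_depth = 0
  end

  # Called when an element opens.
  def on_element(namespace, name, attrs = {})
    @depth += 1
    @max_depth = @depth if @depth > @max_depth
  end

  # Called when an element closes; receives the namespace name and the
  # element name.
  def after_element(namespace, name)
    @depth -= 1
  end
end

# Simulate the callbacks for <people><person /></people> by hand.
handler = DepthHandler.new
handler.on_element(nil, 'people')
handler.on_element(nil, 'person')
handler.after_element(nil, 'person')
handler.after_element(nil, 'people')
```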
XPath Grouping
XPath expressions can now be grouped together using parentheses. This allows one
to specify a custom operator precedence.
Enumerator Parsing Input
Enumerator instances can now be used as input for Oga.parse_xml and friends.
This can be used to download and parse XML files on the fly. For example:
enum = Enumerator.new do |yielder|
  HTTPClient.get('http://some-website.com/some-big-file.xml') do |chunk|
    yielder << chunk
  end
end
document = Oga.parse_xml(enum)
See #48 for more information.
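The same pattern works for any chunked source, not only HTTP. The sketch below wraps an IO object in an Enumerator using only Ruby's standard library. The chunk_enumerator helper and the chunk size are illustrative assumptions, not part of Oga.

```ruby
require 'stringio'

# Hypothetical helper: yields the contents of an IO in fixed-size chunks.
def chunk_enumerator(io, chunk_size = 1024)
  Enumerator.new do |yielder|
    # IO#read(length) returns nil once the end of the input is reached.
    while (chunk = io.read(chunk_size))
      yielder << chunk
    end
  end
end

io   = StringIO.new('<people><person>Alice</person></people>')
enum = chunk_enumerator(io, 8)
```

An Enumerator built this way could be passed to Oga.parse_xml just like the HTTP example above, so the input never has to be held in memory as a single String.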
Removing Attributes
Element attributes can now be removed using Oga::XML::Element#unset:
element = Oga::XML::Element.new(:name => 'foo')
element.set('class', 'foo')
element.unset('class')
XPath Attributes
XPath predicates are now evaluated for every context node, as opposed to being
evaluated once for the entire context. This ensures that expressions such as
descendant-or-self::node()/foo[1] are evaluated correctly.
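The difference between the two strategies can be shown with a small, self-contained model. This is not Oga's evaluator: the TreeNode struct and both helper methods are hypothetical, and they only mimic what foo[1] selects under each strategy.

```ruby
# Hypothetical miniature tree, not Oga's node classes.
TreeNode = Struct.new(:id, :name, :children)

# Mimics the descendant-or-self::node() step.
def descendant_or_self(node)
  [node] + node.children.flat_map { |child| descendant_or_self(child) }
end

# Predicate applied per context node: every node contributes its own
# first "foo" child.
def first_foo_per_context(root)
  descendant_or_self(root).flat_map do |context|
    context.children.select { |child| child.name == 'foo' }.first(1)
  end
end

# Predicate applied once to the combined set: only one node survives.
def first_foo_whole_set(root)
  descendant_or_self(root)
    .flat_map { |context| context.children.select { |child| child.name == 'foo' } }
    .first(1)
end

tree = TreeNode.new(1, 'root', [
  TreeNode.new(2, 'foo', []),
  TreeNode.new(3, 'bar', [TreeNode.new(4, 'foo', [])])
])

first_foo_per_context(tree).map(&:id) # => [2, 4]
first_foo_whole_set(tree).map(&:id)   # => [2]
```

The per-context result keeps the first foo child of every parent, which is what the expression asks for. Applying the predicate once drops the nested foo.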
Available Namespaces
When calling Oga::XML::Element#available_namespaces, the Hash returned by
Oga::XML::Element#namespaces would be modified in place. This was a bug that
has been fixed in this release.
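This kind of bug usually comes down to merging into a Hash instead of into a copy of it. The sketch below shows the copying approach with a hypothetical ElementSketch class. It is an illustration of the pattern, not the code used in Oga.

```ruby
# Hypothetical stand-in for an element with namespaces and a parent.
class ElementSketch
  attr_reader :namespaces, :parent

  def initialize(namespaces, parent = nil)
    @namespaces = namespaces
    @parent     = parent
  end

  # Collects namespaces from this element and its ancestors. Working on a
  # copy (dup) keeps the element's own Hash untouched.
  def available_namespaces
    merged = namespaces.dup
    node   = parent

    while node
      # Nearer declarations win over those of more distant ancestors.
      node.namespaces.each do |name, uri|
        merged[name] = uri unless merged.key?(name)
      end

      node = node.parent
    end

    merged
  end
end

root  = ElementSketch.new({ 'x' => 'http://example.com/x' })
child = ElementSketch.new({ 'y' => 'http://example.com/y' }, root)

child.available_namespaces.keys # => ["y", "x"]
child.namespaces.keys           # => ["y"]
```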
NodeSets
NodeSet instances can now be compared with each other using ==. Previously
this would always consider two instances to be different from each other due to
the usage of the default Object#== method.
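The default Object#== only returns true when both sides are the same object, so a collection class has to define its own comparison. A minimal sketch of such a content-based comparison follows. NodeSetSketch is a hypothetical class and does not reproduce Oga's NodeSet.

```ruby
# Hypothetical collection class, not Oga's NodeSet implementation.
class NodeSetSketch
  attr_reader :nodes

  def initialize(nodes = [])
    @nodes = nodes
  end

  # Two sets are equal when they wrap equal nodes in the same order.
  def ==(other)
    other.is_a?(self.class) && nodes == other.nodes
  end
end

a = NodeSetSketch.new(%w[foo bar])
b = NodeSetSketch.new(%w[foo bar])

a == b      # => true, compared by content
a.equal?(b) # => false, still two distinct objects
```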
XML Entities
XML entities such as &amp; and &lt; are now encoded/decoded by the lexer,
string and text nodes.
See #49 for more information.
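For readers unfamiliar with entity handling, the round trip looks roughly like the sketch below. The encode_entities and decode_entities methods are hypothetical helpers covering only three entities. They are not Oga's implementation or API.

```ruby
# Hypothetical lookup tables covering a small subset of XML entities.
ENTITY_ENCODE = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze
ENTITY_DECODE = ENTITY_ENCODE.invert.freeze

# Replaces special characters with their entity form.
def encode_entities(text)
  text.gsub(/[&<>]/, ENTITY_ENCODE)
end

# Replaces known entities with the characters they stand for. A single
# pass avoids decoding "&amp;lt;" twice.
def decode_entities(text)
  text.gsub(/&(?:amp|lt|gt);/, ENTITY_DECODE)
end

encoded = encode_entities('a < b & c') # => "a &lt; b &amp; c"
decoded = decode_entities(encoded)     # => "a < b & c"
```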
General
Source lines are no longer included in error messages generated by the XML
parser. This simplifies the code and removes the need to re-read the input
(in case of IO/Enumerable inputs).
XML Lexer Newlines
Newlines in the XML lexer are now counted in native code (C/Java). On MRI and
JRuby the improvement is quite small, but on Rubinius it's a massive
improvement. See commit 8db77c0a09bf6c996dd2856a6dbe1ad076b1d30a for more
information.
HTML Void Element Performance
Performance for detecting HTML void elements (e.g. <br> and <link>) has been
improved by removing String allocations that were not needed.