NOTE: This fork of htmlcleaner has been merged back into the official project (http://htmlcleaner.sourceforge.net/) as of version 2.4.
Version 2.4 has been officially released.
This fork is kept only to help with submitting patches to the official version.
==========================================================================
* omitHtmlEnvelope behavior changes:
  * Output all of the HTML contained in the body, not just the contents of the first TagNode (useful for cleaning HTML fragments). A new blank TagNode is created to hold the nodes to be output.
  * omitHtmlEnvelope also triggers omitDoctype.
* TagNodes can be reopened after their parent is closed (e.g. <b><i></b> would result in <b><i></i></b><i>). If the reopened tag (<i> in this example) is immediately closed, the reopened tag is pruned. This is done by checking the autoGenerated boolean on TagNode.
* Refactored template methods from Utils to TagTransformer.
* CleanerTransformations changes:
  * Utils.updateTagTransformations is now a member function.
  * Handles the transformation work so that multiple TagTransformations can be applied to a given tag (this sets up for regular expression matching).
  * Now owns responsibility for determining the transformed tag name.
  * Added the concept of global AttributeTransformations, used for example to strip all attributes that start with "on" (e.g. "onclick", "onblur").
    * Also added regular expression matching on attribute names and values.
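
To illustrate the idea of a global attribute rule that strips every attribute whose name starts with "on", here is a minimal, self-contained sketch. It does not use the htmlcleaner API: the class GlobalAttributeFilterSketch and its strip method are hypothetical names chosen for this example, and a plain Map stands in for a tag's attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch only: this class is hypothetical and is not part of htmlcleaner.
final class GlobalAttributeFilterSketch {
    // Matches any attribute name that starts with "on", ignoring case (onclick, onBlur, ...).
    static final Pattern EVENT_HANDLER = Pattern.compile("^on.*", Pattern.CASE_INSENSITIVE);

    // Returns a copy of the attributes without the entries whose name matches the pattern.
    // The input map is left unmodified and insertion order is preserved.
    static Map<String, String> strip(Map<String, String> attributes, Pattern namePattern) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            if (!namePattern.matcher(entry.getKey()).matches()) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

Calling GlobalAttributeFilterSketch.strip(attrs, GlobalAttributeFilterSketch.EVENT_HANDLER) on the attributes of any tag drops "onclick", "onblur" and similar handlers while keeping everything else, which is the effect a single global rule is meant to have.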
* XmlSerializer/HtmlCleaner: removed the IOException that was thrown when reading from strings.
* Work on spotting "tricky" encoding: decode normal ASCII characters.
* Get the default output charset from CleanerProperties.
* Handle badly encoded numbers better; for example, &x0fx and &0A; were parsed badly before.
* Added a bunch of HTML special entities.
* Convert &apos; in HTML context to '.
* Added regex attribute/value matching.
* Various spelling corrections.
* Additional documentation.
* Added Greek and math symbols.
* Cleanup change: if a tag was closed because of an improperly placed child, it is reopened after the child.
  See ClosedTagReopenTest.java for examples.
* Added audit code: it is now possible to hook in code that is notified about the changes htmlcleaner makes.
  See CleanerProperties#addHtmlModificationListener.
* Added unit tests for the escapeXml function from Utils.
* JDom generation updated so that it does not fail on attributes starting with 'xml'.
* Added TODOs for unit tests.
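
The audit hook mentioned above (CleanerProperties#addHtmlModificationListener) presumably follows the usual listener pattern: code registers a callback and is told about each change the cleaner makes. The sketch below shows that pattern in isolation. It is not the htmlcleaner API: AuditHookSketch, ModificationListener, addListener and fireModification are hypothetical names, and the real listener interface may expose different methods and arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these types are hypothetical stand-ins, not htmlcleaner classes.
final class AuditHookSketch {
    // Callback invoked once for each change made to the document.
    interface ModificationListener {
        void onModification(String tagName, String description);
    }

    private final List<ModificationListener> listeners = new ArrayList<>();

    // Registers a listener; listeners are notified in registration order.
    void addListener(ModificationListener listener) {
        listeners.add(listener);
    }

    // Would be called by the (hypothetical) cleaner whenever it alters the document.
    void fireModification(String tagName, String description) {
        for (ModificationListener listener : listeners) {
            listener.onModification(tagName, description);
        }
    }
}
```

A caller could register a lambda such as (tag, what) -> log.add(tag + ": " + what) to collect an audit trail of everything that was changed during a cleaning run.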