Dtd clone #62

Closed
wants to merge 39 commits into from
39 commits
ef7006c
Split off and wrap cloning API
nielsdos Aug 25, 2023
52bc3e0
Update ext/libxml APIs so that non-libxml users can hook into the err…
nielsdos Aug 25, 2023
12fd2b0
Add is_html5_class field to document data in libxml
nielsdos Aug 25, 2023
5d163d0
Implement HTML5Document
nielsdos Aug 25, 2023
71bf853
Create aliases for DOM constants
nielsdos Sep 17, 2023
88b294e
Alias dom_import_simplexml
nielsdos Sep 17, 2023
84d9ce3
Create class aliases
nielsdos Sep 17, 2023
46dc030
Introduce common base class
nielsdos Sep 17, 2023
104938c
Register prop handlers in common base class
nielsdos Sep 17, 2023
f6807c3
Implement the new class hierarchy instead of HTML5Document extends DO…
nielsdos Sep 17, 2023
06b6b85
Add libxml2 bug workaround
nielsdos Sep 22, 2023
d7d7165
Update tree error reporting
nielsdos Sep 23, 2023
d0c7089
Amends
nielsdos Sep 24, 2023
142dd95
DOMException name
nielsdos Sep 26, 2023
1146894
Add interaction test with getElementsByTagName(NS)
nielsdos Sep 28, 2023
51135cc
Wire up documentURI
nielsdos Sep 28, 2023
079bd4c
Adjustment for namespace reconciliation revert
nielsdos Sep 28, 2023
2f8849a
Test with noscript
nielsdos Oct 4, 2023
85c9b7f
Implement override_encoding
nielsdos Oct 4, 2023
90d75c6
Make libxml stream context externally visible and use it in html_docu…
nielsdos Oct 4, 2023
fdc81dc
Nope: BC
nielsdos Oct 7, 2023
92ee791
Implement MIME sniff
nielsdos Oct 7, 2023
54f9050
Fix crash if document is uninitialized
nielsdos Oct 12, 2023
2d5ac24
Fix test output due to class changes in this RFC
nielsdos Oct 18, 2023
dbd0679
rename tests
nielsdos Oct 19, 2023
715c76f
Cleanup: encoding is always set for the new HTMLDocument class
nielsdos Oct 19, 2023
0c47536
More and improved tests
nielsdos Oct 19, 2023
0317967
Comment and indent cleanup
nielsdos Oct 20, 2023
cd42f84
Use libxml context for saveHTMLFile
nielsdos Oct 20, 2023
d6919af
[ci skip] UPGRADING
nielsdos Oct 20, 2023
290c01d
Propagate last error back into libxml
nielsdos Oct 20, 2023
c67d089
Propagate file name in libxml error
nielsdos Oct 20, 2023
2905344
Resolve LSAN spurious crashes by upgrading LLVM
nielsdos Oct 21, 2023
8ca4566
Update doctype hint in ext/xsl
nielsdos Oct 22, 2023
5791616
Update error message wording of abstract class
nielsdos Oct 22, 2023
53f4e82
Add test for incompatible override_encoding and charset
nielsdos Oct 22, 2023
8447ace
Test behaviour of XML-style namespaces in HTMLDocument
nielsdos Oct 22, 2023
b69513e
Process review feedback
nielsdos Oct 22, 2023
7aee868
wip: must fix dtd appending edge cases first smh
nielsdos Oct 25, 2023
10 changes: 7 additions & 3 deletions .github/workflows/push.yml
@@ -73,13 +73,17 @@ jobs:
asan: true
name: "LINUX_X64_${{ matrix.debug && 'DEBUG' || 'RELEASE' }}_${{ matrix.zts && 'ZTS' || 'NTS' }}${{ matrix.asan && '_ASAN' || '' }}"
runs-on: ubuntu-22.04
container:
image: ${{ matrix.asan && 'ubuntu:23.04' || null }}
steps:
- name: git checkout
uses: actions/checkout@v4
- name: apt
uses: ./.github/actions/apt-x64
- name: LLVM 16 (ASAN-only)
if: ${{ matrix.asan }}
run: |
wget https://apt.llvm.org/llvm.sh
chmod u+x llvm.sh
sudo ./llvm.sh 16
- name: System info
run: |
echo "::group::Show host CPU info"
@@ -110,7 +114,7 @@ jobs:
configurationParameters: >-
--${{ matrix.debug && 'enable' || 'disable' }}-debug
--${{ matrix.zts && 'enable' || 'disable' }}-zts
${{ matrix.asan && 'CFLAGS="-fsanitize=undefined,address -DZEND_TRACK_ARENA_ALLOC" LDFLAGS="-fsanitize=undefined,address" CC=clang CXX=clang++ --disable-opcache-jit' || '' }}
${{ matrix.asan && 'CFLAGS="-fsanitize=undefined,address -DZEND_TRACK_ARENA_ALLOC" LDFLAGS="-fsanitize=undefined,address" CC=clang-16 CXX=clang++-16 --disable-opcache-jit' || '' }}
skipSlow: ${{ matrix.asan }}
- name: make
run: make -j$(/usr/bin/nproc) >/dev/null
8 changes: 8 additions & 0 deletions UPGRADING
@@ -78,6 +78,14 @@ PHP 8.4 UPGRADE NOTES
. Added constant DOMNode::DOCUMENT_POSITION_CONTAINS.
. Added constant DOMNode::DOCUMENT_POSITION_CONTAINED_BY.
. Added constant DOMNode::DOCUMENT_POSITION_IMPLEMENTATION_SPECIFIC.
. Implemented DOM HTML5 parsing and serialization.
RFC: https://wiki.php.net/rfc/domdocument_html5_parser.
This RFC adds the new DOM namespace along with class and constant aliases.
There are two new classes to handle HTML and XML documents:
DOM\HTMLDocument and DOM\XMLDocument.
These classes provide a cleaner API to handle HTML and XML documents.
Furthermore, the DOM\HTMLDocument class implements spec-compliant HTML5
parsing and serialization.

- XSL:
. It is now possible to use parameters that contain both single and double
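The UPGRADING entry describes the two new document classes but does not show them in use. The following is a minimal sketch, assuming the static `createFromString()` factories and the `saveHTML()`/`saveXML()` serializers proposed in the linked RFC; those method names do not appear in this diff, so treat them as assumptions that may differ on this branch. PHP namespace names are case-insensitive, so the `DOM\` prefix also resolves on released PHP 8.4, where the namespace is spelled `Dom\`.

```php
<?php
// Sketch only: assumes the RFC's static factories and serializers on the new classes.
// Namespace lookup is case-insensitive in PHP, so DOM\ and Dom\ name the same namespace.

// Parse an HTML5 string. Per the HTML5 tree-construction rules, the parser
// creates the implied <html>, <head> and <body> elements around the <p>.
$html = DOM\HTMLDocument::createFromString('<!DOCTYPE html><p>Hello</p>');
$htmlOut = $html->saveHTML();
echo $htmlOut, "\n";

// Parse an XML string with the XML counterpart and serialize it back.
$xml = DOM\XMLDocument::createFromString('<root><item>1</item></root>');
$xmlOut = $xml->saveXML();
echo $xmlOut, "\n";
```

Using separate factory methods instead of `loadHTML()`/`loadXML()` on a shared class means each document object is tied to one parser and serializer from the moment it is created.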
4 changes: 4 additions & 0 deletions UPGRADING.INTERNALS
@@ -52,6 +52,10 @@ PHP 8.4 INTERNALS UPGRADE NOTES
- The function php_xsl_create_object() was removed as it was not used
nor exported.

d. ext/libxml
- Added php_libxml_pretend_ctx_error_ex() to emit errors as if they had come
from libxml.

========================
4. OpCode changes
========================
22 changes: 22 additions & 0 deletions dom.php
@@ -0,0 +1,22 @@
<?php

$original = new DOMDocument();
$original->loadHTML('<!DOCTYPE html>');

$dt = $original->implementation->createDocumentType('html', '', '');
$original->appendChild($dt);
/*
$doctype = $original->doctype->cloneNode();
var_dump($doctype);

$other = new DOMDocument();
$doctype = $other->importNode($original->doctype);
$other->append($doctype);

echo $original->saveXML();
// Deallocating the original document should not affect the imported node
unset($original);
echo $other->saveXML();

echo "-----------------------------\n";
*/
19 changes: 17 additions & 2 deletions ext/dom/config.m4
@@ -12,7 +12,21 @@ if test "$PHP_DOM" != "no"; then

PHP_SETUP_LIBXML(DOM_SHARED_LIBADD, [
AC_DEFINE(HAVE_DOM,1,[ ])
PHP_LEXBOR_CFLAGS="-I@ext_srcdir@/lexbor -DLEXBOR_STATIC"
LEXBOR_DIR="lexbor/lexbor"
LEXBOR_SOURCES="$LEXBOR_DIR/ports/posix/lexbor/core/memory.c \
$LEXBOR_DIR/core/array_obj.c $LEXBOR_DIR/core/array.c $LEXBOR_DIR/core/avl.c $LEXBOR_DIR/core/bst.c $LEXBOR_DIR/core/diyfp.c $LEXBOR_DIR/core/conv.c $LEXBOR_DIR/core/dobject.c $LEXBOR_DIR/core/dtoa.c $LEXBOR_DIR/core/hash.c $LEXBOR_DIR/core/mem.c $LEXBOR_DIR/core/mraw.c $LEXBOR_DIR/core/print.c $LEXBOR_DIR/core/serialize.c $LEXBOR_DIR/core/shs.c $LEXBOR_DIR/core/str.c $LEXBOR_DIR/core/strtod.c \
$LEXBOR_DIR/dom/interface.c $LEXBOR_DIR/dom/interfaces/attr.c $LEXBOR_DIR/dom/interfaces/cdata_section.c $LEXBOR_DIR/dom/interfaces/character_data.c $LEXBOR_DIR/dom/interfaces/comment.c $LEXBOR_DIR/dom/interfaces/document.c $LEXBOR_DIR/dom/interfaces/document_fragment.c $LEXBOR_DIR/dom/interfaces/document_type.c $LEXBOR_DIR/dom/interfaces/element.c $LEXBOR_DIR/dom/interfaces/node.c $LEXBOR_DIR/dom/interfaces/processing_instruction.c $LEXBOR_DIR/dom/interfaces/shadow_root.c $LEXBOR_DIR/dom/interfaces/text.c \
$LEXBOR_DIR/html/tokenizer/error.c $LEXBOR_DIR/html/tokenizer/state_comment.c $LEXBOR_DIR/html/tokenizer/state_doctype.c $LEXBOR_DIR/html/tokenizer/state_rawtext.c $LEXBOR_DIR/html/tokenizer/state_rcdata.c $LEXBOR_DIR/html/tokenizer/state_script.c $LEXBOR_DIR/html/tokenizer/state.c \
$LEXBOR_DIR/html/tree/active_formatting.c $LEXBOR_DIR/html/tree/error.c $LEXBOR_DIR/html/tree/insertion_mode/after_after_body.c $LEXBOR_DIR/html/tree/insertion_mode/after_after_frameset.c $LEXBOR_DIR/html/tree/insertion_mode/after_body.c $LEXBOR_DIR/html/tree/insertion_mode/after_frameset.c $LEXBOR_DIR/html/tree/insertion_mode/after_head.c $LEXBOR_DIR/html/tree/insertion_mode/before_head.c $LEXBOR_DIR/html/tree/insertion_mode/before_html.c $LEXBOR_DIR/html/tree/insertion_mode/foreign_content.c $LEXBOR_DIR/html/tree/insertion_mode/in_body.c $LEXBOR_DIR/html/tree/insertion_mode/in_caption.c $LEXBOR_DIR/html/tree/insertion_mode/in_cell.c $LEXBOR_DIR/html/tree/insertion_mode/in_column_group.c $LEXBOR_DIR/html/tree/insertion_mode/in_frameset.c $LEXBOR_DIR/html/tree/insertion_mode/in_head.c $LEXBOR_DIR/html/tree/insertion_mode/in_head_noscript.c $LEXBOR_DIR/html/tree/insertion_mode/initial.c $LEXBOR_DIR/html/tree/insertion_mode/in_row.c $LEXBOR_DIR/html/tree/insertion_mode/in_select.c $LEXBOR_DIR/html/tree/insertion_mode/in_select_in_table.c $LEXBOR_DIR/html/tree/insertion_mode/in_table_body.c $LEXBOR_DIR/html/tree/insertion_mode/in_table.c $LEXBOR_DIR/html/tree/insertion_mode/in_table_text.c $LEXBOR_DIR/html/tree/insertion_mode/in_template.c $LEXBOR_DIR/html/tree/insertion_mode/text.c $LEXBOR_DIR/html/tree/open_elements.c \
$LEXBOR_DIR/encoding/big5.c $LEXBOR_DIR/encoding/decode.c $LEXBOR_DIR/encoding/encode.c $LEXBOR_DIR/encoding/encoding.c $LEXBOR_DIR/encoding/euc_kr.c $LEXBOR_DIR/encoding/gb18030.c $LEXBOR_DIR/encoding/iso_2022_jp_katakana.c $LEXBOR_DIR/encoding/jis0208.c $LEXBOR_DIR/encoding/jis0212.c $LEXBOR_DIR/encoding/range.c $LEXBOR_DIR/encoding/res.c $LEXBOR_DIR/encoding/single.c \
$LEXBOR_DIR/html/encoding.c $LEXBOR_DIR/html/interface.c $LEXBOR_DIR/html/parser.c $LEXBOR_DIR/html/token.c $LEXBOR_DIR/html/token_attr.c $LEXBOR_DIR/html/tokenizer.c $LEXBOR_DIR/html/tree.c \
$LEXBOR_DIR/html/interfaces/anchor_element.c $LEXBOR_DIR/html/interfaces/area_element.c $LEXBOR_DIR/html/interfaces/audio_element.c $LEXBOR_DIR/html/interfaces/base_element.c $LEXBOR_DIR/html/interfaces/body_element.c $LEXBOR_DIR/html/interfaces/br_element.c $LEXBOR_DIR/html/interfaces/button_element.c $LEXBOR_DIR/html/interfaces/canvas_element.c $LEXBOR_DIR/html/interfaces/data_element.c $LEXBOR_DIR/html/interfaces/data_list_element.c $LEXBOR_DIR/html/interfaces/details_element.c $LEXBOR_DIR/html/interfaces/dialog_element.c $LEXBOR_DIR/html/interfaces/directory_element.c $LEXBOR_DIR/html/interfaces/div_element.c $LEXBOR_DIR/html/interfaces/d_list_element.c $LEXBOR_DIR/html/interfaces/document.c $LEXBOR_DIR/html/interfaces/element.c $LEXBOR_DIR/html/interfaces/embed_element.c $LEXBOR_DIR/html/interfaces/field_set_element.c $LEXBOR_DIR/html/interfaces/font_element.c $LEXBOR_DIR/html/interfaces/form_element.c $LEXBOR_DIR/html/interfaces/frame_element.c $LEXBOR_DIR/html/interfaces/frame_set_element.c $LEXBOR_DIR/html/interfaces/head_element.c $LEXBOR_DIR/html/interfaces/heading_element.c $LEXBOR_DIR/html/interfaces/hr_element.c $LEXBOR_DIR/html/interfaces/html_element.c $LEXBOR_DIR/html/interfaces/iframe_element.c $LEXBOR_DIR/html/interfaces/image_element.c $LEXBOR_DIR/html/interfaces/input_element.c $LEXBOR_DIR/html/interfaces/label_element.c $LEXBOR_DIR/html/interfaces/legend_element.c $LEXBOR_DIR/html/interfaces/li_element.c $LEXBOR_DIR/html/interfaces/link_element.c $LEXBOR_DIR/html/interfaces/map_element.c $LEXBOR_DIR/html/interfaces/marquee_element.c $LEXBOR_DIR/html/interfaces/media_element.c $LEXBOR_DIR/html/interfaces/menu_element.c $LEXBOR_DIR/html/interfaces/meta_element.c $LEXBOR_DIR/html/interfaces/meter_element.c $LEXBOR_DIR/html/interfaces/mod_element.c $LEXBOR_DIR/html/interfaces/object_element.c $LEXBOR_DIR/html/interfaces/o_list_element.c $LEXBOR_DIR/html/interfaces/opt_group_element.c $LEXBOR_DIR/html/interfaces/option_element.c 
$LEXBOR_DIR/html/interfaces/output_element.c $LEXBOR_DIR/html/interfaces/paragraph_element.c $LEXBOR_DIR/html/interfaces/param_element.c $LEXBOR_DIR/html/interfaces/picture_element.c $LEXBOR_DIR/html/interfaces/pre_element.c $LEXBOR_DIR/html/interfaces/progress_element.c $LEXBOR_DIR/html/interfaces/quote_element.c $LEXBOR_DIR/html/interfaces/script_element.c $LEXBOR_DIR/html/interfaces/select_element.c $LEXBOR_DIR/html/interfaces/slot_element.c $LEXBOR_DIR/html/interfaces/source_element.c $LEXBOR_DIR/html/interfaces/span_element.c $LEXBOR_DIR/html/interfaces/style_element.c $LEXBOR_DIR/html/interfaces/table_caption_element.c $LEXBOR_DIR/html/interfaces/table_cell_element.c $LEXBOR_DIR/html/interfaces/table_col_element.c $LEXBOR_DIR/html/interfaces/table_element.c $LEXBOR_DIR/html/interfaces/table_row_element.c $LEXBOR_DIR/html/interfaces/table_section_element.c $LEXBOR_DIR/html/interfaces/template_element.c $LEXBOR_DIR/html/interfaces/text_area_element.c $LEXBOR_DIR/html/interfaces/time_element.c $LEXBOR_DIR/html/interfaces/title_element.c $LEXBOR_DIR/html/interfaces/track_element.c $LEXBOR_DIR/html/interfaces/u_list_element.c $LEXBOR_DIR/html/interfaces/unknown_element.c $LEXBOR_DIR/html/interfaces/video_element.c $LEXBOR_DIR/html/interfaces/window.c \
$LEXBOR_DIR/selectors/selectors.c \
$LEXBOR_DIR/ns/ns.c \
$LEXBOR_DIR/tag/tag.c"
PHP_NEW_EXTENSION(dom, [php_dom.c attr.c document.c \
xml_document.c html_document.c html5_serializer.c html5_parser.c namespace_compat.c \
domexception.c parentnode.c \
processinginstruction.c cdatasection.c \
documentfragment.c domimplementation.c \
@@ -21,8 +35,9 @@ if test "$PHP_DOM" != "no"; then
nodelist.c text.c comment.c \
entityreference.c \
notation.c xpath.c dom_iterators.c \
namednodemap.c],
$ext_shared)
namednodemap.c \
$LEXBOR_SOURCES],
$ext_shared,,$PHP_LEXBOR_CFLAGS)
PHP_SUBST(DOM_SHARED_LIBADD)
PHP_INSTALL_HEADERS([ext/dom/xml_common.h])
PHP_ADD_EXTENSION_DEP(dom, libxml)
18 changes: 17 additions & 1 deletion ext/dom/config.w32
@@ -8,13 +8,29 @@ if (PHP_DOM == "yes") {
CHECK_HEADER_ADD_INCLUDE("libxml/parser.h", "CFLAGS_DOM", PHP_PHP_BUILD + "\\include\\libxml2")
) {
EXTENSION("dom", "php_dom.c attr.c document.c \
xml_document.c html_document.c html5_serializer.c html5_parser.c namespace_compat.c \
domexception.c parentnode.c processinginstruction.c \
cdatasection.c documentfragment.c domimplementation.c element.c \
node.c characterdata.c documenttype.c \
entity.c nodelist.c text.c comment.c \
entityreference.c \
notation.c xpath.c dom_iterators.c \
namednodemap.c");
namednodemap.c", null, "-Iext/dom/lexbor");

ADD_SOURCES("ext/dom/lexbor/lexbor/ports/windows_nt/lexbor/core", "memory.c", "dom");
ADD_SOURCES("ext/dom/lexbor/lexbor/core", "array_obj.c array.c avl.c bst.c diyfp.c conv.c dobject.c dtoa.c hash.c mem.c mraw.c print.c serialize.c shs.c str.c strtod.c", "dom");
ADD_SOURCES("ext/dom/lexbor/lexbor/dom", "interface.c", "dom");
ADD_SOURCES("ext/dom/lexbor/lexbor/dom/interfaces", "attr.c cdata_section.c character_data.c comment.c document.c document_fragment.c document_type.c element.c node.c processing_instruction.c shadow_root.c text.c", "dom");
ADD_SOURCES("ext/dom/lexbor/lexbor/html/tokenizer", "error.c state_comment.c state_doctype.c state_rawtext.c state_rcdata.c state_script.c state.c", "dom");
ADD_SOURCES("ext/dom/lexbor/lexbor/html/tree", "active_formatting.c open_elements.c error.c", "dom");
ADD_SOURCES("ext/dom/lexbor/lexbor/html/tree/insertion_mode", "after_after_body.c after_after_frameset.c after_body.c after_frameset.c after_head.c before_head.c before_html.c foreign_content.c in_body.c in_caption.c in_cell.c in_column_group.c in_frameset.c in_head.c in_head_noscript.c initial.c in_row.c in_select.c in_select_in_table.c in_table_body.c in_table.c in_table_text.c in_template.c text.c", "dom");
ADD_SOURCES("ext/dom/lexbor/lexbor/html", "encoding.c interface.c parser.c token.c token_attr.c tokenizer.c tree.c", "dom");
ADD_SOURCES("ext/dom/lexbor/lexbor/encoding", "big5.c decode.c encode.c encoding.c euc_kr.c gb18030.c iso_2022_jp_katakana.c jis0208.c jis0212.c range.c res.c single.c", "dom");
ADD_SOURCES("ext/dom/lexbor/lexbor/html/interfaces", "anchor_element.c area_element.c audio_element.c base_element.c body_element.c br_element.c button_element.c canvas_element.c data_element.c data_list_element.c details_element.c dialog_element.c directory_element.c div_element.c d_list_element.c document.c element.c embed_element.c field_set_element.c font_element.c form_element.c frame_element.c frame_set_element.c head_element.c heading_element.c hr_element.c html_element.c iframe_element.c image_element.c input_element.c label_element.c legend_element.c li_element.c link_element.c map_element.c marquee_element.c media_element.c menu_element.c meta_element.c meter_element.c mod_element.c object_element.c o_list_element.c opt_group_element.c option_element.c output_element.c paragraph_element.c param_element.c picture_element.c pre_element.c progress_element.c quote_element.c script_element.c select_element.c slot_element.c source_element.c span_element.c style_element.c table_caption_element.c table_cell_element.c table_col_element.c table_element.c table_row_element.c table_section_element.c template_element.c text_area_element.c time_element.c title_element.c track_element.c u_list_element.c unknown_element.c video_element.c window.c", "dom");
ADD_SOURCES("ext/dom/lexbor/lexbor/selectors", "selectors.c", "dom");
ADD_SOURCES("ext/dom/lexbor/lexbor/ns", "ns.c", "dom");
ADD_SOURCES("ext/dom/lexbor/lexbor/tag", "tag.c", "dom");
ADD_FLAG("CFLAGS_DOM", "/D LEXBOR_STATIC ");

AC_DEFINE("HAVE_DOM", 1, "DOM support");
