how to get text 'CZ/KHN' #461

Closed
windzhu0514 opened this issue Nov 2, 2023 · 3 comments

<table class="data-table1">
    <colgroup>
        <col width="45px">
        <col width="83px">
        <col width="116px">
        <col width="88px">
        <col width="58px">
        <col width="57px">
        <col width="64px">
        <col width="70px">
        <col width="67px">
        <col width="64px">
        <col width="63px">
    </colgroup>
    <tbody>
        <tr>
            <th rowspan="2" class="first"><input type="checkbox" onclick="allCheck(this);" /></th>
            <th class="dotted none1 redText"><span>PNR</span></th>
            <th class="dotted redText">Airline/Arrival<br>airport</th>
            <th rowspan="2" class="space redText"><span class="sub-space">Ticket Number</span></th>
            <th class="dotted redText">Flight fare</th>
            <th rowspan="2" class="space redText">Commission</th>
            <th class="dotted redText">Total price</th>
            <th rowspan="2" class="space redText"><span class="sub-space">Remaining Balance</span></th>
            <th rowspan="2" class="space redText">INVOICE</th>
            <th rowspan="2" class="space redText">REQUEST</th>
            <th rowspan="2" class="none1 redText">VOID</th>
        </tr>
        <tr>
            <th class="redText"><span>Ticketing Date</span></th>
            <th class="redText none1"><span>Passenger<br>Name</span></th>
            <th class="redText none1"><span>TAX</span></th>
            <th class="redText none1"><span>Service fee</span></th>
        </tr>
        <!--loop -->

        <tr>
            <input type="hidden" name="PTR_PNR_ADDR" value="5XR127" />
            <script>
                var priRefundType = '';
                if (priRefundType == '傈咀券阂' || priRefundType == '老何券阂') {
                    document.write("<td rowspan=\"2\" class=\"first hover\">");
                } else {
                    document.write("<td rowspan=\"2\" class=\"first\">");
                }
            </script>
            <input type="checkbox" name='TKTCHECKBOX' value='7843449802431|5XR127' />
            </td>
            <script>
                var priRefundType = '';
                if (priRefundType == '傈咀券阂' || priRefundType == '老何券阂') {
                    document.write("<td class=\"dotted hover\">");
                } else {
                    document.write("<td class=\"dotted\">");
                }
            </script>
            <a href="#" onclick="pnrWindow('/lts/GC_wholesale/WS_TktRequestDetail.lts?PCD_ID=25319832',850,800);"
                class="href1">5XR127</a>
            </td>
            <script>
                var priRefundType = '';
                if (priRefundType == '傈咀券阂' || priRefundType == '老何券阂') {
                    document.write("<td class=\"dotted hover\">");
                } else {
                    document.write("<td class=\"dotted\">");
                }
            </script>
            CZ/KHN
            </td>
            <script>
                var priRefundType = '';
                if (priRefundType == '傈咀券阂' || priRefundType == '老何券阂') {
                    document.write("<td rowspan=\"2\" class=\"data1 hover\">");
                } else {
                    document.write("<td rowspan=\"2\" class=\"data1\">");
                }
            </script>
            7843449802431
            </td>
        </tr>
    </tbody>
</table>

mna commented Nov 2, 2023

Hello, see #287 (comment) for how to get text that is not wrapped in an html element.

mna closed this as completed Nov 2, 2023
windzhu0514 (Author) replied, quoting mna ("Hello, see #287 (comment) for how to get text that is not wrapped in an html element."):

It does not work for this HTML.


mna commented Nov 2, 2023

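It does work. The problem is that this HTML is broken/invalid: most strikingly, there are no opening <td> tags in the markup itself (they only appear inside document.write calls, which the parser does not execute), so all content sits directly inside the <tr>, which is invalid. Once parsed with the html5 parser, this is the actual HTML that goquery works with: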

<html><head></head><body><input type="checkbox" name="TKTCHECKBOX" value="7843449802431|5XR127"/><a href="#" onclick="pnrWindow(&#39;/lts/GC_wholesale/WS_TktRequestDetail.lts?PCD_ID=25319832&#39;,850,800);" class="href1">5XR127</a>
            CZ/KHN
            
            7843449802431
            <table class="data-table1">
    <colgroup>
        <col width="45px"/>
        <col width="83px"/>
        <col width="116px"/>
        <col width="88px"/>
        <col width="58px"/>
        <col width="57px"/>
        <col width="64px"/>
        <col width="70px"/>
        <col width="67px"/>
        <col width="64px"/>
        <col width="63px"/>
    </colgroup>
    <tbody>
        <tr>
            <th rowspan="2" class="first"><input type="checkbox" onclick="allCheck(this);"/></th>
            <th class="dotted none1 redText"><span>PNR</span></th>
            <th class="dotted redText">Airline/Arrival<br/>airport</th>
            <th rowspan="2" class="space redText"><span class="sub-space">Ticket Number</span></th>
            <th class="dotted redText">Flight fare</th>
            <th rowspan="2" class="space redText">Commission</th>
            <th class="dotted redText">Total price</th>
            <th rowspan="2" class="space redText"><span class="sub-space">Remaining Balance</span></th>
            <th rowspan="2" class="space redText">INVOICE</th>
            <th rowspan="2" class="space redText">REQUEST</th>
            <th rowspan="2" class="none1 redText">VOID</th>
        </tr>
        <tr>
            <th class="redText"><span>Ticketing Date</span></th>
            <th class="redText none1"><span>Passenger<br/>Name</span></th>
            <th class="redText none1"><span>TAX</span></th>
            <th class="redText none1"><span>Service fee</span></th>
        </tr>
        <!--loop -->

        <tr>
            <input type="hidden" name="PTR_PNR_ADDR" value="5XR127"/>
            <script>
                var priRefundType = '';
                if (priRefundType == '傈咀券阂' || priRefundType == '老何券阂') {
                    document.write("<td rowspan=\"2\" class=\"first hover\">");
                } else {
                    document.write("<td rowspan=\"2\" class=\"first\">");
                }
            </script>
            
            
            <script>
                var priRefundType = '';
                if (priRefundType == '傈咀券阂' || priRefundType == '老何券阂') {
                    document.write("<td class=\"dotted hover\">");
                } else {
                    document.write("<td class=\"dotted\">");
                }
            </script>
            
            
            <script>
                var priRefundType = '';
                if (priRefundType == '傈咀券阂' || priRefundType == '老何券阂') {
                    document.write("<td class=\"dotted hover\">");
                } else {
                    document.write("<td class=\"dotted\">");
                }
            </script>
            <script>
                var priRefundType = '';
                if (priRefundType == '傈咀券阂' || priRefundType == '老何券阂') {
                    document.write("<td rowspan=\"2\" class=\"data1 hover\">");
                } else {
                    document.write("<td rowspan=\"2\" class=\"data1\">");
                }
            </script>
        </tr>
    </tbody>
</table>
</body></html>

As you can see, the result is quite different from the original markup. Here's an example program that can help you get started:

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

// data holds the HTML from the original post (omitted here).
var data string

func main() {
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(data))
	if err != nil {
		log.Fatal(err)
	}

	// Print the full HTML after it has been parsed and fixed by the html5 parser.
	html, err := goquery.OuterHtml(doc.Selection)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(html)

	// There are many ways to get to the text segment that contains what you want.
	// Here I just started at the <table> element, selected its parent, then iterated
	// over the parent's contents to find the text segments.
	doc.Find("body table").Parent().Contents().Each(func(i int, s *goquery.Selection) {
		if goquery.NodeName(s) == "#text" {
			fmt.Printf(">>> (%d) >>> %s\n", i, s.Text())
		}
	})
	// With data set to the HTML from the original post, the loop prints:
	// >>> (2) >>>
	//             CZ/KHN
	//
	//             7843449802431
	//
	// >>> (4) >>>
	//
}

Hope this helps,
Martin
