diff --git a/docs/formats/HTML.md b/docs/formats/HTML.md index 73efa2e2..f08f4f28 100644 --- a/docs/formats/HTML.md +++ b/docs/formats/HTML.md @@ -85,7 +85,7 @@ WHERE whatwg:innerText "Hello world!" ] ; whatwg:innerHTML "Hello world!" ; - whatwg:innerText "Hello world! Hello world!" + whatwg:innerText "Hello world!" ] ; rdf:_2 [ rdf:type xhtml:body ; rdf:_1 [ rdf:type xhtml:p ; @@ -95,10 +95,10 @@ WHERE whatwg:innerText "Hello world" ] ; whatwg:innerHTML "

Hello world

" ; - whatwg:innerText "Hello world Hello world" + whatwg:innerText "Hello world" ] ; whatwg:innerHTML "\n Hello world!\n\n\n

Hello world

\n" ; - whatwg:innerText "Hello world! Hello world Hello world! Hello world! Hello world Hello world" + whatwg:innerText "Hello world! Hello world" ] . ``` @@ -111,6 +111,7 @@ WHERE | [html.selector](#htmlselector) | A CSS selector that restricts the HTML tags to consider for the triplification. | Any valid CSS selector. | `:root` | | [html.metadata](#htmlmetadata) | It tells the triplifier to extract inline RDF from HTML pages. The triples extracted will be included in the default graph. -- See [#164](https://github.com/SPARQL-Anything/sparql.anything/issues/164) | true/false | `false` | | [html.browser](#htmlbrowser) | It tells the triplifier to use the specified browser to navigate to the page to obtain HTML. By default a browser is not used. The use of a browser has some dependencies -- see [BROWSER](https://github.com/SPARQL-Anything/sparql.anything/blob/v1.0-DEV/BROWSER.md) and [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | chromium|webkit|firefox | Not set | +| [html.parser](#htmlparser) | It tells the triplifier to use the specified JSoup parser (default: html). | xml html | Not set | | [html.browser.wait](#htmlbrowserwait) | When using a browser to navigate, it tells the triplifier to wait for the specified number of seconds (after telling the browser to navigate to the page) before attempting to obtain HTML. -- See See [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any integer | Not set | | [html.browser.screenshot](#htmlbrowserscreenshot) | When using a browser to navigate, take a screenshot of the webpage (perhaps for troubleshooting) and save it here. See [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any valid URL | Not set | | [html.browser.timeout](#htmlbrowsertimeout) | When using a browser to navigate, it tells the browser if it spends longer than this amount of time (in milliseconds) until a load event is emitted then the operation will timeout -- See [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any integer | `30000` | @@ -283,14 +284,14 @@ WHERE ] ; xhtml:itemscope "" ; xhtml:itemtype "https://schema.org/Movie" ; - whatwg:innerHTML "

Avatar

Director: James Cameron (born August 16, 1954)" ; - whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)" + whatwg:innerHTML "

Avatar

Director: James Cameron (born August 16, 1954)" ; + whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954)" ] ; - whatwg:innerHTML "
\n

Avatar

Director: James Cameron (born August 16, 1954)\n
" ; - whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)" + whatwg:innerHTML "
\n

Avatar

Director: James Cameron (born August 16, 1954)\n
" ; + whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954)" ] ; - whatwg:innerHTML "\n\n
\n

Avatar

Director: James Cameron (born August 16, 1954)\n
\n" ; - whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)" + whatwg:innerHTML "\n\n
\n

Avatar

Director: James Cameron (born August 16, 1954)\n
\n" ; + whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954)" ] . @@ -317,6 +318,22 @@ chromium|webkit|firefox Not set +--- +### `html.parser` + +#### Description + +It tells the triplifier to use the specified JSoup parser (default: html). + +#### Valid Values + +xml html + +#### Default Value + +Not set + + --- ### `html.browser.wait` diff --git a/docs/formats/Metadata.md b/docs/formats/Metadata.md index 937fb5c0..7b4cb756 100644 --- a/docs/formats/Metadata.md +++ b/docs/formats/Metadata.md @@ -122,7 +122,7 @@ WHERE "f/7.1" ; - "Tue Jan 09 14:01:48 +01:00 2024" ; + "Fri Feb 23 15:30:21 +00:00 2024" ; "Canon_40D.jpg" ; diff --git a/docs/formats/RDF.md b/docs/formats/RDF.md index 2119ac87..b45e7adb 100644 --- a/docs/formats/RDF.md +++ b/docs/formats/RDF.md @@ -2,16 +2,18 @@ # RDF -RDF files can be targeted by the option `location`, the content is loaded as-is (no facade-x interpretation, obviously). In addition, the SPARQL Anything Command Line Interface can load static RDF files. +RDF files can be targeted like any other format by the option `location`. The content is queried as-is (no facade-x interpretation needed, obviously). -The query does not need to include a SERVICE clause, so you can use the tool to just query some RDF file of your choice. -This is useful when you want to break down the process so that RDF files produced by previous SPARQL Anything processes are joined with data coming from additional transformatioons. + +In addition, the [Command Line Interface (CLI)](../CLI.md) can load static RDF files. + +This is useful when you want to break down the task so that RDF files produced by previous SPARQL Anything executions are joined with additional transformations. Examples of this can be found in the [tutorials](../TUTORIALS.md). This feature is enabled with the command line argument `-l|--load` that accepts a file or a directory. -The files are loaded in a Dataset which becomes the target for the query execution. +The files are loaded in a Dataset, which becomes the target for the query execution. A single file will be loaded in the default Graph. -In the second case, all RDF files in the folder are loaded, each one on a Named Graph. +If pointing to a folder, all RDF files in the folder are loaded, each one on a Named Graph. See also the documentation of the [Command Line Interface (CLI)](../CLI.md). diff --git a/formats/HTML.md b/formats/HTML.md index 73efa2e2..f08f4f28 100644 --- a/formats/HTML.md +++ b/formats/HTML.md @@ -85,7 +85,7 @@ WHERE whatwg:innerText "Hello world!" ] ; whatwg:innerHTML "Hello world!" ; - whatwg:innerText "Hello world! Hello world!" + whatwg:innerText "Hello world!" ] ; rdf:_2 [ rdf:type xhtml:body ; rdf:_1 [ rdf:type xhtml:p ; @@ -95,10 +95,10 @@ WHERE whatwg:innerText "Hello world" ] ; whatwg:innerHTML "

Hello world

" ; - whatwg:innerText "Hello world Hello world" + whatwg:innerText "Hello world" ] ; whatwg:innerHTML "\n Hello world!\n\n\n

Hello world

\n" ; - whatwg:innerText "Hello world! Hello world Hello world! Hello world! Hello world Hello world" + whatwg:innerText "Hello world! Hello world" ] . ``` @@ -111,6 +111,7 @@ WHERE | [html.selector](#htmlselector) | A CSS selector that restricts the HTML tags to consider for the triplification. | Any valid CSS selector. | `:root` | | [html.metadata](#htmlmetadata) | It tells the triplifier to extract inline RDF from HTML pages. The triples extracted will be included in the default graph. -- See [#164](https://github.com/SPARQL-Anything/sparql.anything/issues/164) | true/false | `false` | | [html.browser](#htmlbrowser) | It tells the triplifier to use the specified browser to navigate to the page to obtain HTML. By default a browser is not used. The use of a browser has some dependencies -- see [BROWSER](https://github.com/SPARQL-Anything/sparql.anything/blob/v1.0-DEV/BROWSER.md) and [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | chromium|webkit|firefox | Not set | +| [html.parser](#htmlparser) | It tells the triplifier to use the specified JSoup parser (default: html). | xml html | Not set | | [html.browser.wait](#htmlbrowserwait) | When using a browser to navigate, it tells the triplifier to wait for the specified number of seconds (after telling the browser to navigate to the page) before attempting to obtain HTML. -- See See [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any integer | Not set | | [html.browser.screenshot](#htmlbrowserscreenshot) | When using a browser to navigate, take a screenshot of the webpage (perhaps for troubleshooting) and save it here. See [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any valid URL | Not set | | [html.browser.timeout](#htmlbrowsertimeout) | When using a browser to navigate, it tells the browser if it spends longer than this amount of time (in milliseconds) until a load event is emitted then the operation will timeout -- See [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any integer | `30000` | @@ -283,14 +284,14 @@ WHERE ] ; xhtml:itemscope "" ; xhtml:itemtype "https://schema.org/Movie" ; - whatwg:innerHTML "

Avatar

Director: James Cameron (born August 16, 1954)" ; - whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)" + whatwg:innerHTML "

Avatar

Director: James Cameron (born August 16, 1954)" ; + whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954)" ] ; - whatwg:innerHTML "
\n

Avatar

Director: James Cameron (born August 16, 1954)\n
" ; - whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)" + whatwg:innerHTML "
\n

Avatar

Director: James Cameron (born August 16, 1954)\n
" ; + whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954)" ] ; - whatwg:innerHTML "\n\n
\n

Avatar

Director: James Cameron (born August 16, 1954)\n
\n" ; - whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)" + whatwg:innerHTML "\n\n
\n

Avatar

Director: James Cameron (born August 16, 1954)\n
\n" ; + whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954)" ] . @@ -317,6 +318,22 @@ chromium|webkit|firefox Not set +--- +### `html.parser` + +#### Description + +It tells the triplifier to use the specified JSoup parser (default: html). + +#### Valid Values + +xml html + +#### Default Value + +Not set + + --- ### `html.browser.wait` diff --git a/formats/Metadata.md b/formats/Metadata.md index 937fb5c0..7b4cb756 100644 --- a/formats/Metadata.md +++ b/formats/Metadata.md @@ -122,7 +122,7 @@ WHERE "f/7.1" ; - "Tue Jan 09 14:01:48 +01:00 2024" ; + "Fri Feb 23 15:30:21 +00:00 2024" ; "Canon_40D.jpg" ; diff --git a/formats/RDF.md b/formats/RDF.md index 2119ac87..b45e7adb 100644 --- a/formats/RDF.md +++ b/formats/RDF.md @@ -2,16 +2,18 @@ # RDF -RDF files can be targeted by the option `location`, the content is loaded as-is (no facade-x interpretation, obviously). In addition, the SPARQL Anything Command Line Interface can load static RDF files. +RDF files can be targeted like any other format by the option `location`. The content is queried as-is (no facade-x interpretation needed, obviously). -The query does not need to include a SERVICE clause, so you can use the tool to just query some RDF file of your choice. -This is useful when you want to break down the process so that RDF files produced by previous SPARQL Anything processes are joined with data coming from additional transformatioons. + +In addition, the [Command Line Interface (CLI)](../CLI.md) can load static RDF files. + +This is useful when you want to break down the task so that RDF files produced by previous SPARQL Anything executions are joined with additional transformations. Examples of this can be found in the [tutorials](../TUTORIALS.md). This feature is enabled with the command line argument `-l|--load` that accepts a file or a directory. -The files are loaded in a Dataset which becomes the target for the query execution. +The files are loaded in a Dataset, which becomes the target for the query execution. A single file will be loaded in the default Graph. -In the second case, all RDF files in the folder are loaded, each one on a Named Graph. +If pointing to a folder, all RDF files in the folder are loaded, each one on a Named Graph. See also the documentation of the [Command Line Interface (CLI)](../CLI.md).