#201 Update documentation

SPARQL-Anything · Feb 23, 2024 · 9aaae1a
1 parent 99cdc16
commit 9aaae1a

Showing 6 changed files with 68 additions and 30 deletions.
diff --git a/docs/formats/HTML.md b/docs/formats/HTML.md
@@ -85,7 +85,7 @@ WHERE
                                           whatwg:innerText  "Hello world!"
                                         ] ;
                       whatwg:innerHTML  "<title>Hello world!</title>" ;
-                      whatwg:innerText  "Hello world! Hello world!"
+                      whatwg:innerText  "Hello world!"
                     ] ;
   rdf:_2            [ rdf:type          xhtml:body ;
                       rdf:_1            [ rdf:type          xhtml:p ;
@@ -95,10 +95,10 @@ WHERE
                                           whatwg:innerText  "Hello world"
                                         ] ;
                       whatwg:innerHTML  "<p class=\"paragraph\">Hello world</p>" ;
-                      whatwg:innerText  "Hello world Hello world"
+                      whatwg:innerText  "Hello world"
                     ] ;
   whatwg:innerHTML  "<head>\n <title>Hello world!</title>\n</head>\n<body>\n <p class=\"paragraph\">Hello world</p>\n</body>" ;
-  whatwg:innerText  "Hello world! Hello world Hello world! Hello world! Hello world Hello world"
+  whatwg:innerText  "Hello world! Hello world"
 ] .
 
 ```
@@ -111,6 +111,7 @@ WHERE
 | [html.selector](#htmlselector) | A CSS selector that restricts the HTML tags to consider for the triplification. | Any valid CSS selector. | `:root` |
 | [html.metadata](#htmlmetadata) | It tells the triplifier to extract inline RDF from HTML pages. The triples extracted will be included in the default graph. -- See [#164](https://github.com/SPARQL-Anything/sparql.anything/issues/164) | true/false | `false` |
 | [html.browser](#htmlbrowser) | It tells the triplifier to use the specified browser to navigate to the page to obtain HTML. By default a browser is not used. The use of a browser has some dependencies -- see [BROWSER](https://github.com/SPARQL-Anything/sparql.anything/blob/v1.0-DEV/BROWSER.md) and [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | chromium|webkit|firefox | Not set |
+| [html.parser](#htmlparser) | It tells the triplifier to use the specified JSoup parser (default: html). | xml html | Not set |
 | [html.browser.wait](#htmlbrowserwait) | When using a browser to navigate, it tells the triplifier to wait for the specified number of seconds (after telling the browser to navigate to the page) before attempting to obtain HTML. -- See See [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any integer | Not set |
 | [html.browser.screenshot](#htmlbrowserscreenshot) | When using a browser to navigate, take a screenshot of the webpage (perhaps for troubleshooting) and save it here. See [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any valid URL | Not set |
 | [html.browser.timeout](#htmlbrowsertimeout) | When using a browser to navigate, it tells the browser if it spends longer than this amount of time (in milliseconds) until a load event is emitted then the operation will timeout -- See [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any integer | `30000` |
@@ -283,14 +284,14 @@ WHERE
                                                             ] ;
                                           xhtml:itemscope   "" ;
                                           xhtml:itemtype    "https://schema.org/Movie" ;
-                                          whatwg:innerHTML  "<h1 itemprop=\"name\">Avatar</h1> <span>Director: James Cameron (born August 16, 1954)</span>" ;
-                                          whatwg:innerText  "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)"
+                                          whatwg:innerHTML  "<h1 itemprop=\"name\">Avatar</h1><span>Director: James Cameron (born August 16, 1954)</span>" ;
+                                          whatwg:innerText  "Avatar Director: James Cameron (born August 16, 1954)"
                                         ] ;
-                      whatwg:innerHTML  "<div itemscope itemtype=\"https://schema.org/Movie\">\n <h1 itemprop=\"name\">Avatar</h1> <span>Director: James Cameron (born August 16, 1954)</span>\n</div>" ;
-                      whatwg:innerText  "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)"
+                      whatwg:innerHTML  "<div itemscope itemtype=\"https://schema.org/Movie\">\n <h1 itemprop=\"name\">Avatar</h1><span>Director: James Cameron (born August 16, 1954)</span>\n</div>" ;
+                      whatwg:innerText  "Avatar Director: James Cameron (born August 16, 1954)"
                     ] ;
-  whatwg:innerHTML  "<head></head>\n<body>\n <div itemscope itemtype=\"https://schema.org/Movie\">\n  <h1 itemprop=\"name\">Avatar</h1> <span>Director: James Cameron (born August 16, 1954)</span>\n </div>\n</body>" ;
-  whatwg:innerText  "Avatar Director: James Cameron (born August 16, 1954)  Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)"
+  whatwg:innerHTML  "<head></head>\n<body>\n <div itemscope itemtype=\"https://schema.org/Movie\">\n  <h1 itemprop=\"name\">Avatar</h1><span>Director: James Cameron (born August 16, 1954)</span>\n </div>\n</body>" ;
+  whatwg:innerText  "Avatar Director: James Cameron (born August 16, 1954)"
 ] .
 
 <https://sparql-anything.cc/examples/Microdata1.html>
@@ -317,6 +318,22 @@ chromium|webkit|firefox
 Not set
 
 
+---
+### `html.parser`
+
+#### Description
+
+It tells the triplifier to use the specified JSoup parser (default: html).
+
+#### Valid Values
+
+xml html
+
+#### Default Value
+
+Not set
+
+
 ---
 ### `html.browser.wait`
 

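Note on the `html.parser` option added in the hunks above: it can presumably be passed like the other triplifier options, as a `key=value` pair in the SERVICE IRI. The following is a minimal Python sketch, not code from the project. The helper name `build_query` is hypothetical, and the sketch assumes options are supplied in the `x-sparql-anything:key=value,key=value` IRI form; the location is the example page referenced in this file.

```python
# Hypothetical helper (not part of SPARQL Anything): builds a SPARQL query
# string whose SERVICE IRI selects the JSoup parser via `html.parser`.
VALID_PARSERS = ("html", "xml")  # valid values as listed in the documentation


def build_query(location: str, parser: str = "html") -> str:
    """Return a query that triplifies `location` with the given parser."""
    if parser not in VALID_PARSERS:
        raise ValueError(f"html.parser must be one of {VALID_PARSERS}, got {parser!r}")
    # Assumption: options are comma-separated key=value pairs after the scheme.
    service_iri = f"x-sparql-anything:location={location},html.parser={parser}"
    return (
        "SELECT * WHERE {\n"
        f"  SERVICE <{service_iri}> {{\n"
        "    ?s ?p ?o\n"
        "  }\n"
        "}"
    )


if __name__ == "__main__":
    print(build_query("https://sparql-anything.cc/examples/Microdata1.html", "xml"))
```

The resulting string would be handed to whatever runs the query (CLI or server); a location containing commas or equals signs would likely need escaping, which this sketch does not attempt.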
diff --git a/docs/formats/Metadata.md b/docs/formats/Metadata.md
@@ -122,7 +122,7 @@ WHERE
             <http://sparql.xyz/facade-x/data/F-Number>
                     "f/7.1" ;
             <http://sparql.xyz/facade-x/data/File%20Modified%20Date>
-                    "Tue Jan 09 14:01:48 +01:00 2024" ;
+                    "Fri Feb 23 15:30:21 +00:00 2024" ;
             <http://sparql.xyz/facade-x/data/File%20Name>
                     "Canon_40D.jpg" ;
             <http://sparql.xyz/facade-x/data/File%20Size>

diff --git a/docs/formats/RDF.md b/docs/formats/RDF.md
@@ -2,16 +2,18 @@
 
 # RDF
 
-RDF files can be targeted by the option `location`, the content is loaded as-is (no facade-x interpretation, obviously). In addition, the SPARQL Anything Command Line Interface can load static RDF files.
+RDF files can be targeted like any other format by the option `location`. The content is queried as-is (no facade-x interpretation needed, obviously). 
 
-The query does not need to include a SERVICE clause, so you can use the tool to just query some RDF file of your choice.
-This is useful when you want to break down the process so that RDF files produced by previous SPARQL Anything processes are joined with data coming from additional transformatioons.
+
+In addition, the [Command Line Interface (CLI)](../CLI.md) can load static RDF files.
+
+This is useful when you want to break down the task so that RDF files produced by previous SPARQL Anything executions are joined with additional transformations.
 Examples of this can be found in the [tutorials](../TUTORIALS.md).
 
 This feature is enabled with the command line argument `-l|--load` that accepts a file or a directory.
-The files are loaded in a Dataset which becomes the target for the query execution.
+The files are loaded in a Dataset, which becomes the target for the query execution.
 A single file will be loaded in the default Graph. 
-In the second case, all RDF files in the folder are loaded, each one on a Named Graph.
+If pointing to a folder, all RDF files in the folder are loaded, each one on a Named Graph.
 
 See also the documentation of the [Command Line Interface (CLI)](../CLI.md).
 

diff --git a/formats/HTML.md b/formats/HTML.md
@@ -85,7 +85,7 @@ WHERE
                                           whatwg:innerText  "Hello world!"
                                         ] ;
                       whatwg:innerHTML  "<title>Hello world!</title>" ;
-                      whatwg:innerText  "Hello world! Hello world!"
+                      whatwg:innerText  "Hello world!"
                     ] ;
   rdf:_2            [ rdf:type          xhtml:body ;
                       rdf:_1            [ rdf:type          xhtml:p ;
@@ -95,10 +95,10 @@ WHERE
                                           whatwg:innerText  "Hello world"
                                         ] ;
                       whatwg:innerHTML  "<p class=\"paragraph\">Hello world</p>" ;
-                      whatwg:innerText  "Hello world Hello world"
+                      whatwg:innerText  "Hello world"
                     ] ;
   whatwg:innerHTML  "<head>\n <title>Hello world!</title>\n</head>\n<body>\n <p class=\"paragraph\">Hello world</p>\n</body>" ;
-  whatwg:innerText  "Hello world! Hello world Hello world! Hello world! Hello world Hello world"
+  whatwg:innerText  "Hello world! Hello world"
 ] .
 
 ```
@@ -111,6 +111,7 @@ WHERE
 | [html.selector](#htmlselector) | A CSS selector that restricts the HTML tags to consider for the triplification. | Any valid CSS selector. | `:root` |
 | [html.metadata](#htmlmetadata) | It tells the triplifier to extract inline RDF from HTML pages. The triples extracted will be included in the default graph. -- See [#164](https://github.com/SPARQL-Anything/sparql.anything/issues/164) | true/false | `false` |
 | [html.browser](#htmlbrowser) | It tells the triplifier to use the specified browser to navigate to the page to obtain HTML. By default a browser is not used. The use of a browser has some dependencies -- see [BROWSER](https://github.com/SPARQL-Anything/sparql.anything/blob/v1.0-DEV/BROWSER.md) and [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | chromium|webkit|firefox | Not set |
+| [html.parser](#htmlparser) | It tells the triplifier to use the specified JSoup parser (default: html). | xml html | Not set |
 | [html.browser.wait](#htmlbrowserwait) | When using a browser to navigate, it tells the triplifier to wait for the specified number of seconds (after telling the browser to navigate to the page) before attempting to obtain HTML. -- See See [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any integer | Not set |
 | [html.browser.screenshot](#htmlbrowserscreenshot) | When using a browser to navigate, take a screenshot of the webpage (perhaps for troubleshooting) and save it here. See [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any valid URL | Not set |
 | [html.browser.timeout](#htmlbrowsertimeout) | When using a browser to navigate, it tells the browser if it spends longer than this amount of time (in milliseconds) until a load event is emitted then the operation will timeout -- See [justin2004's blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any integer | `30000` |
@@ -283,14 +284,14 @@ WHERE
                                                             ] ;
                                           xhtml:itemscope   "" ;
                                           xhtml:itemtype    "https://schema.org/Movie" ;
-                                          whatwg:innerHTML  "<h1 itemprop=\"name\">Avatar</h1> <span>Director: James Cameron (born August 16, 1954)</span>" ;
-                                          whatwg:innerText  "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)"
+                                          whatwg:innerHTML  "<h1 itemprop=\"name\">Avatar</h1><span>Director: James Cameron (born August 16, 1954)</span>" ;
+                                          whatwg:innerText  "Avatar Director: James Cameron (born August 16, 1954)"
                                         ] ;
-                      whatwg:innerHTML  "<div itemscope itemtype=\"https://schema.org/Movie\">\n <h1 itemprop=\"name\">Avatar</h1> <span>Director: James Cameron (born August 16, 1954)</span>\n</div>" ;
-                      whatwg:innerText  "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)"
+                      whatwg:innerHTML  "<div itemscope itemtype=\"https://schema.org/Movie\">\n <h1 itemprop=\"name\">Avatar</h1><span>Director: James Cameron (born August 16, 1954)</span>\n</div>" ;
+                      whatwg:innerText  "Avatar Director: James Cameron (born August 16, 1954)"
                     ] ;
-  whatwg:innerHTML  "<head></head>\n<body>\n <div itemscope itemtype=\"https://schema.org/Movie\">\n  <h1 itemprop=\"name\">Avatar</h1> <span>Director: James Cameron (born August 16, 1954)</span>\n </div>\n</body>" ;
-  whatwg:innerText  "Avatar Director: James Cameron (born August 16, 1954)  Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)"
+  whatwg:innerHTML  "<head></head>\n<body>\n <div itemscope itemtype=\"https://schema.org/Movie\">\n  <h1 itemprop=\"name\">Avatar</h1><span>Director: James Cameron (born August 16, 1954)</span>\n </div>\n</body>" ;
+  whatwg:innerText  "Avatar Director: James Cameron (born August 16, 1954)"
 ] .
 
 <https://sparql-anything.cc/examples/Microdata1.html>
@@ -317,6 +318,22 @@ chromium|webkit|firefox
 Not set
 
 
+---
+### `html.parser`
+
+#### Description
+
+It tells the triplifier to use the specified JSoup parser (default: html).
+
+#### Valid Values
+
+xml html
+
+#### Default Value
+
+Not set
+
+
 ---
 ### `html.browser.wait`
 

diff --git a/formats/Metadata.md b/formats/Metadata.md
@@ -122,7 +122,7 @@ WHERE
             <http://sparql.xyz/facade-x/data/F-Number>
                     "f/7.1" ;
             <http://sparql.xyz/facade-x/data/File%20Modified%20Date>
-                    "Tue Jan 09 14:01:48 +01:00 2024" ;
+                    "Fri Feb 23 15:30:21 +00:00 2024" ;
             <http://sparql.xyz/facade-x/data/File%20Name>
                     "Canon_40D.jpg" ;
             <http://sparql.xyz/facade-x/data/File%20Size>

diff --git a/formats/RDF.md b/formats/RDF.md
@@ -2,16 +2,18 @@
 
 # RDF
 
-RDF files can be targeted by the option `location`, the content is loaded as-is (no facade-x interpretation, obviously). In addition, the SPARQL Anything Command Line Interface can load static RDF files.
+RDF files can be targeted like any other format by the option `location`. The content is queried as-is (no facade-x interpretation needed, obviously). 
 
-The query does not need to include a SERVICE clause, so you can use the tool to just query some RDF file of your choice.
-This is useful when you want to break down the process so that RDF files produced by previous SPARQL Anything processes are joined with data coming from additional transformatioons.
+
+In addition, the [Command Line Interface (CLI)](../CLI.md) can load static RDF files.
+
+This is useful when you want to break down the task so that RDF files produced by previous SPARQL Anything executions are joined with additional transformations.
 Examples of this can be found in the [tutorials](../TUTORIALS.md).
 
 This feature is enabled with the command line argument `-l|--load` that accepts a file or a directory.
-The files are loaded in a Dataset which becomes the target for the query execution.
+The files are loaded in a Dataset, which becomes the target for the query execution.
 A single file will be loaded in the default Graph. 
-In the second case, all RDF files in the folder are loaded, each one on a Named Graph.
+If pointing to a folder, all RDF files in the folder are loaded, each one on a Named Graph.
 
 See also the documentation of the [Command Line Interface (CLI)](../CLI.md).
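
Note on the `-l|--load` argument described in RDF.md: a CLI run that preloads RDF could be assembled roughly as below. This is a sketch under stated assumptions, not a verified invocation. The jar name, query file, and folder are hypothetical, and the `-q` flag for the query file is assumed from the CLI documentation; only `-l|--load` is taken from the text above.

```python
import shlex


def build_cli_command(jar_path: str, query_file: str, load_path: str) -> list[str]:
    """Assemble the argument list for a CLI run that preloads RDF.

    `load_path` may be a single file (loaded into the default graph) or a
    folder (each RDF file loaded into its own named graph), per RDF.md.
    """
    return [
        "java", "-jar", jar_path,
        "-q", query_file,      # assumption: -q names the query file
        "--load", load_path,   # documented: -l|--load accepts a file or a directory
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_cli_command("sparql-anything.jar", "query.sparql", "rdf-output/")
    print(shlex.join(cmd))  # shell-quoted form, ready to copy into a terminal
```

Returning a list rather than a single string keeps paths with spaces intact if the command is later passed to `subprocess.run`.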