#201 Update documentation ''
enridaga committed Feb 23, 2024
1 parent 99cdc16 commit 9aaae1a
Showing 6 changed files with 68 additions and 30 deletions.
35 changes: 26 additions & 9 deletions docs/formats/HTML.md
@@ -85,7 +85,7 @@ WHERE
whatwg:innerText "Hello world!"
] ;
whatwg:innerHTML "<title>Hello world!</title>" ;
whatwg:innerText "Hello world! Hello world!"
whatwg:innerText "Hello world!"
] ;
rdf:_2 [ rdf:type xhtml:body ;
rdf:_1 [ rdf:type xhtml:p ;
@@ -95,10 +95,10 @@ WHERE
whatwg:innerText "Hello world"
] ;
whatwg:innerHTML "<p class=\"paragraph\">Hello world</p>" ;
whatwg:innerText "Hello world Hello world"
whatwg:innerText "Hello world"
] ;
whatwg:innerHTML "<head>\n <title>Hello world!</title>\n</head>\n<body>\n <p class=\"paragraph\">Hello world</p>\n</body>" ;
whatwg:innerText "Hello world! Hello world Hello world! Hello world! Hello world Hello world"
whatwg:innerText "Hello world! Hello world"
] .
```
@@ -111,6 +111,7 @@ WHERE
| [html.selector](#htmlselector) | A CSS selector that restricts the HTML tags to consider for the triplification. | Any valid CSS selector. | `:root` |
| [html.metadata](#htmlmetadata) | It tells the triplifier to extract inline RDF from HTML pages. The triples extracted will be included in the default graph. -- See [#164](https://github.com/SPARQL-Anything/sparql.anything/issues/164) | true/false | `false` |
| [html.browser](#htmlbrowser) | It tells the triplifier to use the specified browser to navigate to the page to obtain HTML. By default a browser is not used. The use of a browser has some dependencies -- see [BROWSER](https://github.com/SPARQL-Anything/sparql.anything/blob/v1.0-DEV/BROWSER.md) and [justin2004&#39;s blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | chromium|webkit|firefox | Not set |
| [html.parser](#htmlparser) | It tells the triplifier to use the specified JSoup parser (default: html). | xml html | Not set |
| [html.browser.wait](#htmlbrowserwait) | When using a browser to navigate, it tells the triplifier to wait for the specified number of seconds (after telling the browser to navigate to the page) before attempting to obtain HTML. -- See See [justin2004&#39;s blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any integer | Not set |
| [html.browser.screenshot](#htmlbrowserscreenshot) | When using a browser to navigate, take a screenshot of the webpage (perhaps for troubleshooting) and save it here. See [justin2004&#39;s blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any valid URL | Not set |
| [html.browser.timeout](#htmlbrowsertimeout) | When using a browser to navigate, it tells the browser if it spends longer than this amount of time (in milliseconds) until a load event is emitted then the operation will timeout -- See [justin2004&#39;s blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any integer | `30000` |
@@ -283,14 +284,14 @@ WHERE
] ;
xhtml:itemscope "" ;
xhtml:itemtype "https://schema.org/Movie" ;
whatwg:innerHTML "<h1 itemprop=\"name\">Avatar</h1> <span>Director: James Cameron (born August 16, 1954)</span>" ;
whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)"
whatwg:innerHTML "<h1 itemprop=\"name\">Avatar</h1><span>Director: James Cameron (born August 16, 1954)</span>" ;
whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954)"
] ;
whatwg:innerHTML "<div itemscope itemtype=\"https://schema.org/Movie\">\n <h1 itemprop=\"name\">Avatar</h1> <span>Director: James Cameron (born August 16, 1954)</span>\n</div>" ;
whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)"
whatwg:innerHTML "<div itemscope itemtype=\"https://schema.org/Movie\">\n <h1 itemprop=\"name\">Avatar</h1><span>Director: James Cameron (born August 16, 1954)</span>\n</div>" ;
whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954)"
] ;
whatwg:innerHTML "<head></head>\n<body>\n <div itemscope itemtype=\"https://schema.org/Movie\">\n <h1 itemprop=\"name\">Avatar</h1> <span>Director: James Cameron (born August 16, 1954)</span>\n </div>\n</body>" ;
whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)"
whatwg:innerHTML "<head></head>\n<body>\n <div itemscope itemtype=\"https://schema.org/Movie\">\n <h1 itemprop=\"name\">Avatar</h1><span>Director: James Cameron (born August 16, 1954)</span>\n </div>\n</body>" ;
whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954)"
] .
<https://sparql-anything.cc/examples/Microdata1.html>
@@ -317,6 +318,22 @@ chromium|webkit|firefox
Not set


---
### `html.parser`

#### Description

It tells the triplifier to use the specified JSoup parser (default: html).

#### Valid Values

xml html

#### Default Value

Not set
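
A minimal sketch of how this option might be passed in a query, assuming it follows the same `key=value` convention in the SERVICE IRI as the other `html.*` options. The location is the Microdata example page referenced in this document; no output is shown because it depends on how the chosen parser treats the markup.

```sparql
# Assumption: html.parser is set like any other option in the SERVICE IRI.
SELECT ?s ?p ?o
WHERE {
  SERVICE <x-sparql-anything:location=https://sparql-anything.cc/examples/Microdata1.html,html.parser=xml> {
    ?s ?p ?o
  }
}
```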


---
### `html.browser.wait`

2 changes: 1 addition & 1 deletion docs/formats/Metadata.md
@@ -122,7 +122,7 @@ WHERE
<http://sparql.xyz/facade-x/data/F-Number>
"f/7.1" ;
<http://sparql.xyz/facade-x/data/File%20Modified%20Date>
"Tue Jan 09 14:01:48 +01:00 2024" ;
"Fri Feb 23 15:30:21 +00:00 2024" ;
<http://sparql.xyz/facade-x/data/File%20Name>
"Canon_40D.jpg" ;
<http://sparql.xyz/facade-x/data/File%20Size>
12 changes: 7 additions & 5 deletions docs/formats/RDF.md
@@ -2,16 +2,18 @@

# RDF

RDF files can be targeted by the option `location`, the content is loaded as-is (no facade-x interpretation, obviously). In addition, the SPARQL Anything Command Line Interface can load static RDF files.
RDF files can be targeted like any other format by the option `location`. The content is queried as-is (no facade-x interpretation needed, obviously).

The query does not need to include a SERVICE clause, so you can use the tool to just query some RDF file of your choice.
This is useful when you want to break down the process so that RDF files produced by previous SPARQL Anything processes are joined with data coming from additional transformatioons.

In addition, the [Command Line Interface (CLI)](../CLI.md) can load static RDF files.

This is useful when you want to break down the task so that RDF files produced by previous SPARQL Anything executions are joined with additional transformations.
Examples of this can be found in the [tutorials](../TUTORIALS.md).

This feature is enabled with the command line argument `-l|--load` that accepts a file or a directory.
The files are loaded in a Dataset which becomes the target for the query execution.
The files are loaded in a Dataset, which becomes the target for the query execution.
A single file will be loaded in the default Graph.
In the second case, all RDF files in the folder are loaded, each one on a Named Graph.
If pointing to a folder, all RDF files in the folder are loaded, each one on a Named Graph.

See also the documentation of the [Command Line Interface (CLI)](../CLI.md).
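
As a rough illustration of the behaviour described above, a query run against files loaded with `-l|--load` could inspect both the default graph and any named graphs without a SERVICE clause. This is plain SPARQL and assumes nothing beyond the loading rules stated here; the variable names are arbitrary.

```sparql
# No SERVICE clause: the loaded Dataset is the query target.
SELECT ?g ?s ?p ?o
WHERE {
  { ?s ?p ?o }                 # triples from a single loaded file (default graph)
  UNION
  { GRAPH ?g { ?s ?p ?o } }    # triples from a loaded folder (one named graph per file)
}
LIMIT 20
```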

35 changes: 26 additions & 9 deletions formats/HTML.md
@@ -85,7 +85,7 @@ WHERE
whatwg:innerText "Hello world!"
] ;
whatwg:innerHTML "<title>Hello world!</title>" ;
whatwg:innerText "Hello world! Hello world!"
whatwg:innerText "Hello world!"
] ;
rdf:_2 [ rdf:type xhtml:body ;
rdf:_1 [ rdf:type xhtml:p ;
@@ -95,10 +95,10 @@ WHERE
whatwg:innerText "Hello world"
] ;
whatwg:innerHTML "<p class=\"paragraph\">Hello world</p>" ;
whatwg:innerText "Hello world Hello world"
whatwg:innerText "Hello world"
] ;
whatwg:innerHTML "<head>\n <title>Hello world!</title>\n</head>\n<body>\n <p class=\"paragraph\">Hello world</p>\n</body>" ;
whatwg:innerText "Hello world! Hello world Hello world! Hello world! Hello world Hello world"
whatwg:innerText "Hello world! Hello world"
] .
```
@@ -111,6 +111,7 @@ WHERE
| [html.selector](#htmlselector) | A CSS selector that restricts the HTML tags to consider for the triplification. | Any valid CSS selector. | `:root` |
| [html.metadata](#htmlmetadata) | It tells the triplifier to extract inline RDF from HTML pages. The triples extracted will be included in the default graph. -- See [#164](https://github.com/SPARQL-Anything/sparql.anything/issues/164) | true/false | `false` |
| [html.browser](#htmlbrowser) | It tells the triplifier to use the specified browser to navigate to the page to obtain HTML. By default a browser is not used. The use of a browser has some dependencies -- see [BROWSER](https://github.com/SPARQL-Anything/sparql.anything/blob/v1.0-DEV/BROWSER.md) and [justin2004&#39;s blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | chromium|webkit|firefox | Not set |
| [html.parser](#htmlparser) | It tells the triplifier to use the specified JSoup parser (default: html). | xml html | Not set |
| [html.browser.wait](#htmlbrowserwait) | When using a browser to navigate, it tells the triplifier to wait for the specified number of seconds (after telling the browser to navigate to the page) before attempting to obtain HTML. -- See See [justin2004&#39;s blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any integer | Not set |
| [html.browser.screenshot](#htmlbrowserscreenshot) | When using a browser to navigate, take a screenshot of the webpage (perhaps for troubleshooting) and save it here. See [justin2004&#39;s blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any valid URL | Not set |
| [html.browser.timeout](#htmlbrowsertimeout) | When using a browser to navigate, it tells the browser if it spends longer than this amount of time (in milliseconds) until a load event is emitted then the operation will timeout -- See [justin2004&#39;s blogpost](https://github.com/justin2004/weblog/tree/master/scraping_with_sparql). | Any integer | `30000` |
@@ -283,14 +284,14 @@ WHERE
] ;
xhtml:itemscope "" ;
xhtml:itemtype "https://schema.org/Movie" ;
whatwg:innerHTML "<h1 itemprop=\"name\">Avatar</h1> <span>Director: James Cameron (born August 16, 1954)</span>" ;
whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)"
whatwg:innerHTML "<h1 itemprop=\"name\">Avatar</h1><span>Director: James Cameron (born August 16, 1954)</span>" ;
whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954)"
] ;
whatwg:innerHTML "<div itemscope itemtype=\"https://schema.org/Movie\">\n <h1 itemprop=\"name\">Avatar</h1> <span>Director: James Cameron (born August 16, 1954)</span>\n</div>" ;
whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)"
whatwg:innerHTML "<div itemscope itemtype=\"https://schema.org/Movie\">\n <h1 itemprop=\"name\">Avatar</h1><span>Director: James Cameron (born August 16, 1954)</span>\n</div>" ;
whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954)"
] ;
whatwg:innerHTML "<head></head>\n<body>\n <div itemscope itemtype=\"https://schema.org/Movie\">\n <h1 itemprop=\"name\">Avatar</h1> <span>Director: James Cameron (born August 16, 1954)</span>\n </div>\n</body>" ;
whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954) Avatar Director: James Cameron (born August 16, 1954)"
whatwg:innerHTML "<head></head>\n<body>\n <div itemscope itemtype=\"https://schema.org/Movie\">\n <h1 itemprop=\"name\">Avatar</h1><span>Director: James Cameron (born August 16, 1954)</span>\n </div>\n</body>" ;
whatwg:innerText "Avatar Director: James Cameron (born August 16, 1954)"
] .
<https://sparql-anything.cc/examples/Microdata1.html>
@@ -317,6 +318,22 @@ chromium|webkit|firefox
Not set


---
### `html.parser`

#### Description

It tells the triplifier to use the specified JSoup parser (default: html).

#### Valid Values

xml html

#### Default Value

Not set


---
### `html.browser.wait`

2 changes: 1 addition & 1 deletion formats/Metadata.md
@@ -122,7 +122,7 @@ WHERE
<http://sparql.xyz/facade-x/data/F-Number>
"f/7.1" ;
<http://sparql.xyz/facade-x/data/File%20Modified%20Date>
"Tue Jan 09 14:01:48 +01:00 2024" ;
"Fri Feb 23 15:30:21 +00:00 2024" ;
<http://sparql.xyz/facade-x/data/File%20Name>
"Canon_40D.jpg" ;
<http://sparql.xyz/facade-x/data/File%20Size>
12 changes: 7 additions & 5 deletions formats/RDF.md
@@ -2,16 +2,18 @@

# RDF

RDF files can be targeted by the option `location`, the content is loaded as-is (no facade-x interpretation, obviously). In addition, the SPARQL Anything Command Line Interface can load static RDF files.
RDF files can be targeted like any other format by the option `location`. The content is queried as-is (no facade-x interpretation needed, obviously).

The query does not need to include a SERVICE clause, so you can use the tool to just query some RDF file of your choice.
This is useful when you want to break down the process so that RDF files produced by previous SPARQL Anything processes are joined with data coming from additional transformatioons.

In addition, the [Command Line Interface (CLI)](../CLI.md) can load static RDF files.

This is useful when you want to break down the task so that RDF files produced by previous SPARQL Anything executions are joined with additional transformations.
Examples of this can be found in the [tutorials](../TUTORIALS.md).

This feature is enabled with the command line argument `-l|--load` that accepts a file or a directory.
The files are loaded in a Dataset which becomes the target for the query execution.
The files are loaded in a Dataset, which becomes the target for the query execution.
A single file will be loaded in the default Graph.
In the second case, all RDF files in the folder are loaded, each one on a Named Graph.
If pointing to a folder, all RDF files in the folder are loaded, each one on a Named Graph.

See also the documentation of the [Command Line Interface (CLI)](../CLI.md).
