diff --git a/404.html b/404.html index aaca30b2..aa19fc14 100644 --- a/404.html +++ b/404.html @@ -16,7 +16,7 @@
Skip to content

404

PAGE NOT FOUND

But if you don't change your direction, and if you keep looking, you may end up where you are heading.
- + \ No newline at end of file diff --git a/about/contact.html b/about/contact.html index ad46c645..7c33668b 100644 --- a/about/contact.html +++ b/about/contact.html @@ -19,7 +19,7 @@
Contact

Please feel free to join our Slack channel using the invitation link.

- + \ No newline at end of file diff --git a/about/contributing.html b/about/contributing.html index 93d4b034..1d48cffb 100644 --- a/about/contributing.html +++ b/about/contributing.html @@ -19,7 +19,7 @@
Contributing

Submission guidelines

Report a bug

Before creating an issue, please make sure you have checked the docs; you might also want to try searching GitHub. It is quite likely that someone has already asked a similar question.

Issues can be reported in the issue tracker.

Pull Requests

We love pull requests and we're continually working to make it as easy as possible for people to contribute.

We prefer small pull requests with minimal code changes. The smaller they are the easier they are to review and merge. A core team member will pick up your PR and review it as soon as they can. They may ask for changes or reject your pull request. This is not a reflection of you as an engineer or a person. Please accept feedback graciously as we will also try to be sensitive when providing it.

Although we generally accept many PRs they can be rejected for many reasons. We will be as transparent as possible but it may simply be that you do not have the same context or information regarding the roadmap that the core team members have. We value the time you take to put together any contributions so we pledge to always be respectful of that time and will try to be as open as possible so that you don't waste it.

Commit message guidelines

We follow the Conventional Commits specification, which provides a set of rules to make commit messages more readable when looking through the project history. We also use the git commit messages to generate the change log.

Commit message format

The commit message should be structured as follows:

<type>: <subject> [optional `breaking`]

Where type must be one of the following:

  • build: changes that affect the build system (external dependencies)
  • ci: changes to our CI configuration files and scripts
  • chore: changes that affect the project structure
  • docs: changes that affect the documentation only
  • feat: a new feature
  • fix: a bug fix
  • perf: a code change that improves performance
  • refactor: a code change that neither fixes a bug nor adds a feature
  • revert: revert changes
  • style: changes that do not affect the meaning of the code (lint issues)
  • test: adding missing tests or correcting existing tests

Use the optional [ breaking ] keyword to declare a BREAKING CHANGE.

Examples

  • Commit message with description and breaking change in body
feat: allow provided config object to extend other configs [ breaking ]
  • Commit message with no body
docs: correct spelling in the contributing.md file
  • Commit message for a fix using an issue number.
fix: fix minor issue in code (#12)
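
The format above lends itself to automated checking. The following is a minimal sketch only: `parseCommitMessage` and its return shape are hypothetical and are not part of the project tooling. It validates the first line of a message against the documented types and the optional `[ breaking ]` keyword.

```javascript
// Hypothetical helper, not part of krawler: validates a commit message
// against the "<type>: <subject> [optional breaking]" format.
const COMMIT_TYPES = [
  'build', 'ci', 'chore', 'docs', 'feat', 'fix',
  'perf', 'refactor', 'revert', 'style', 'test'
]

// The subject is matched lazily so that a trailing "[ breaking ]" keyword
// (spaces inside the brackets are tolerated) is captured separately.
const COMMIT_PATTERN = new RegExp(
  `^(${COMMIT_TYPES.join('|')}): (.+?)(\\s*\\[\\s*breaking\\s*\\])?$`
)

function parseCommitMessage (message) {
  // Only the first line carries the type and subject
  const firstLine = message.split('\n')[0].trim()
  const match = COMMIT_PATTERN.exec(firstLine)
  if (!match) return null
  return {
    type: match[1],
    subject: match[2].trim(),
    breaking: Boolean(match[3])
  }
}
```

A message with an unknown type, such as `feature: add something`, yields `null`, which a commit hook could treat as a rejection.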

Versioning guidelines

We rely on Semantic Versioning for versioning a release. Given a version number MAJOR.MINOR.PATCH, increment the:

  • MAJOR version when you make a major evolution leading to breaking changes,
  • MINOR version when you add functionality in a backwards-compatible manner,
  • PATCH version when you make backwards-compatible bug fixes.
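
To make the rule concrete, here is a minimal sketch of how each release type changes the version number. `bumpVersion` is a hypothetical helper written for illustration; it is not the code behind the release command and it ignores pre-release or build suffixes.

```javascript
// Hypothetical helper illustrating Semantic Versioning increments.
// Assumes a plain "MAJOR.MINOR.PATCH" string without suffixes.
function bumpVersion (version, type) {
  const [major, minor, patch] = version.split('.').map(Number)
  switch (type) {
    // A breaking change resets both lower components
    case 'major': return `${major + 1}.0.0`
    // A backwards-compatible feature resets the patch component
    case 'minor': return `${major}.${minor + 1}.0`
    // A backwards-compatible bug fix only touches the patch component
    case 'patch': return `${major}.${minor}.${patch + 1}`
    default: throw new Error(`Unknown release type: ${type}`)
  }
}

console.log(bumpVersion('1.4.2', 'minor')) // prints "1.5.0"
```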

The command npm run release:<type>, where <type> is either patch, minor or major, helps you to do the release.

It performs the following tasks for you:

  • increase the package version number in the package.json file
  • generate the change log
  • create a tag accordingly in the git repository and push it

Contributor Code of Conduct

As contributors and maintainers of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.

We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.

Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.

Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed from the project team.

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.

This Code of Conduct is adapted from the Contributor Covenant, version 1.0.0, available at http://contributor-covenant.org/version/1/0/0/

- + \ No newline at end of file diff --git a/about/introduction.html b/about/introduction.html index 504bd74e..a0c1071c 100644 --- a/about/introduction.html +++ b/about/introduction.html @@ -19,7 +19,7 @@
Introduction

krawler aims to make the automated extraction and processing of (geographic) data from heterogeneous sources easy. It can be viewed as a minimalist Extract, Transform, Load (ETL) tool. ETL refers to a process where data is

  1. extracted from heterogeneous data sources (e.g. databases or web services);
  2. transformed into a target format or structure for the purposes of querying and analysis (e.g. JSON or CSV);
  3. loaded into a final target data store (e.g. a file system or a database).

ETL

ETL naturally leads to the concept of a pipeline: a set of processing functions (called hooks in krawler) connected in series, often executed in parallel, where the output of one function is the input of the next one. The execution of a given pipeline on an input dataset to produce the associated output is a job performed by krawler.
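
The idea of functions connected in series can be sketched in a few lines. This is an illustration of the concept only: the functions below are plain async functions invented for the example, and they do not follow krawler's actual hook signature.

```javascript
// Conceptual sketch: run processing functions in series, feeding the
// output of each one into the next. Not krawler's hook API.
async function runPipeline (hooks, item) {
  let current = item
  for (const hook of hooks) {
    current = await hook(current)
  }
  return current
}

// Two illustrative steps: parse raw CSV text, then count the rows
const parseCsv = async (item) => ({
  ...item,
  rows: item.raw.trim().split('\n').map(line => line.split(','))
})
const countRows = async (item) => ({ ...item, count: item.rows.length })

runPipeline([parseCsv, countRows], { raw: 'a,b\nc,d' })
  .then(result => console.log(result.count)) // prints 2
```

Running the same pipeline on several independent input items at once, for instance with `Promise.all`, corresponds to the parallel execution mentioned above.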

A set of introductory articles about krawler has been written; they detail:

- + \ No newline at end of file diff --git a/about/license.html b/about/license.html index ce8332c6..5acc841f 100644 --- a/about/license.html +++ b/about/license.html @@ -19,7 +19,7 @@
License

The MIT License (MIT)

Copyright (c) 2017-2020 Kalisio

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

- + \ No newline at end of file diff --git a/about/roadmap.html b/about/roadmap.html index 8397a952..e696ae27 100644 --- a/about/roadmap.html +++ b/about/roadmap.html @@ -19,7 +19,7 @@
Roadmap

The roadmap is available on GitHub.

Milestones

The milestones are available on GitHub.

Release Notes

The changelog is available on GitHub.

- + \ No newline at end of file diff --git a/examples/index.html b/examples/index.html index 83b8ef5b..aba34eea 100644 --- a/examples/index.html +++ b/examples/index.html @@ -67,7 +67,7 @@ "link": "https://s3.eu-central-1.amazonaws.com/krawler/bc185cb0-7983-11e8-883f-a333a7402f4a.png" } ]

Seeder

The seeder takes advantage of Kargo to seed a dataset. It relies on the seeding capabilities of MapProxy. The global approach is to subdivide the job into multiple tasks and run the mapproxy-seed utility for each task. To subdivide the job, we use a spatial grid, and each cell is used as a coverage entry to limit the extent of the corresponding task. All the tasks, i.e. the mapproxy-seed runs, share the same MapProxy configuration file and use a generated seed file.
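
The grid subdivision described above can be sketched as follows. This is not the seeder's actual code: `gridCells`, the `[minX, minY, maxX, maxY]` bounding box layout and the task shape are assumptions made for illustration.

```javascript
// Illustrative sketch: split a bounding box into nx * ny cells, each of
// which could serve as the coverage of one seeding task.
// bbox layout assumed here: [minX, minY, maxX, maxY]
function gridCells (bbox, nx, ny) {
  const [minX, minY, maxX, maxY] = bbox
  const width = (maxX - minX) / nx
  const height = (maxY - minY) / ny
  const cells = []
  for (let j = 0; j < ny; j++) {
    for (let i = 0; i < nx; i++) {
      cells.push([
        minX + i * width,
        minY + j * height,
        minX + (i + 1) * width,
        minY + (j + 1) * height
      ])
    }
  }
  return cells
}

// One task per cell; the { id, bbox } shape is hypothetical
const tasks = gridCells([-180, -90, 180, 90], 4, 2)
  .map((bbox, index) => ({ id: `seed-${index}`, bbox }))

console.log(tasks.length) // prints 8
```

A finer grid produces more, smaller tasks, which trades scheduling overhead for shorter individual runs.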

We use the same MapProxy image as the one used in Kargo, but for now we do not take advantage of a Swarm infrastructure to deploy the tasks. Meanwhile, if you plan to seed a layer with a source exposed by TileserverGL, you can easily scale the number of TileserverGL instances to handle the required load.

- + \ No newline at end of file diff --git a/guides/extending-krawler.html b/guides/extending-krawler.html index 3a12770c..d056e160 100644 --- a/guides/extending-krawler.html +++ b/guides/extending-krawler.html @@ -57,7 +57,7 @@ } } hooks.registerHook('custom', hook)

After that you can use your custom hook like the built-in ones with the CLI.

TIP

For this to work, you need to add krawler as a dependency in the package.json of your job module or link to it in development mode. Please refer to our installation section.

Complete example

A complete example of extension is available in the samples.

- + \ No newline at end of file diff --git a/guides/index.html b/guides/index.html index 9f02f246..300bc9fa 100644 --- a/guides/index.html +++ b/guides/index.html @@ -19,7 +19,7 @@
- + \ No newline at end of file diff --git a/guides/installing-krawler.html b/guides/installing-krawler.html index be5bc826..051ac25a 100644 --- a/guides/installing-krawler.html +++ b/guides/installing-krawler.html @@ -43,7 +43,7 @@ yarn link @kalisio/krawler

Please refer to the KDK documentation to setup your development environment.

A native command-line executable can be generated using pkg, e.g. for Windows:

bash
pkg . --target node8-win-x86

Because it relies on the GDAL native bindings, you will need to deploy the gdal.node file (usually found in node_modules\gdal\lib\binding) to the same directory as the executable. Take care to generate the executable with the same architecture as your Node.js version.

As a Docker container

When using krawler as a Docker container, the arguments to the CLI have to be provided through the ARGS environment variable, along with any other required variables and the data volume that makes inputs accessible within the container and lets you get output files back:

bash
docker pull kalisio/krawler
 docker run --name krawler --rm -v /mnt/data:/opt/krawler/data -e "ARGS=/opt/krawler/data/jobfile.js" -e S3_BUCKET=krawler kalisio/krawler
- + \ No newline at end of file diff --git a/guides/understanding-krawler.html b/guides/understanding-krawler.html index b96b6929..30b813dd 100644 --- a/guides/understanding-krawler.html +++ b/guides/understanding-krawler.html @@ -19,7 +19,7 @@
Understanding Krawler

krawler is powered by Feathers and relies on two of its main abstractions: services and hooks. We assume you are familiar with this technology.

Main concepts

krawler manipulates three kinds of entities:

  • a store defines where the extracted/processed data will reside,
  • a task defines what data to extract and how to query it,
  • a job defines which tasks to run to fulfill a request (i.e. sequencing).

On top of this, hooks provide a set of functions that can typically be run before/after a task/job, such as a conversion after a download or task generation before a job run. In essence, this makes it possible to create a processing pipeline.

Regarding store management, we rely on abstract-blob-store, which abstracts many different storage backends (local file system, AWS S3, Google Drive, etc.) and is already used by feathers-blob.

Global overview

The following figure depicts the global architecture and all concepts at play:

Architecture

What is inside?

krawler is made possible and mainly powered by the following stack:

- + \ No newline at end of file diff --git a/guides/using-krawler.html b/guides/using-krawler.html index 3df4d84a..2aaa78c2 100644 --- a/guides/using-krawler.html +++ b/guides/using-krawler.html @@ -119,7 +119,7 @@ ... ] }

TIP

When running krawler as a web API, note that only the hooks pipeline is mandatory in the job file. Indeed, job and task objects will then be sent by requesting the exposed web services.

Healthcheck

Healthcheck endpoint

When running krawler as a cron job, note that it provides a healthcheck endpoint, e.g. on localhost:3030/api/healthcheck. The following JSON structure is returned:

The returned HTTP code is 500 whenever an error has occurred in the last run, 200 otherwise.
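
As a minimal sketch of how a monitoring script could rely on this status code, the snippet below polls the endpoint and maps the code to a boolean. It assumes Node.js 18 or later (for the global `fetch`), reuses the example URL from the text, and makes no assumption about the fields of the returned JSON; `isHealthy` and `checkHealth` are hypothetical names.

```javascript
// Hypothetical helpers, not part of krawler.
// 200 means the last run succeeded; 500 means an error occurred.
function isHealthy (statusCode) {
  return statusCode === 200
}

// Assumes Node.js 18+ where fetch is available globally.
// The default URL is the example endpoint mentioned in the text.
async function checkHealth (url = 'http://localhost:3030/api/healthcheck') {
  const response = await fetch(url)
  // The body is returned as-is; fall back to null if it is not valid JSON
  const body = await response.json().catch(() => null)
  return { healthy: isHealthy(response.status), body }
}
```

Calling `checkHealth()` against a running instance resolves to an object whose `healthy` flag can drive an alert or a process exit code.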

TIP

You can add your custom data in the healthcheck structure using the healthcheck hook.

Healthcheck command

For convenience, krawler also includes a built-in healthcheck script that can be used e.g. by Docker. This script uses options similar to those of the CLI, plus some specific options:

TIP

Templates are generated with the healthcheck structure and environment variables as context; learn more about templating.

- + \ No newline at end of file diff --git a/hashmap.json b/hashmap.json index b1d81917..576e6823 100644 --- a/hashmap.json +++ b/hashmap.json @@ -1 +1 @@ -{"guides_understanding-krawler.md":"56d783c6","index.md":"ee3add83","reference_index.md":"76bdc228","guides_index.md":"8bac4ba9","guides_extending-krawler.md":"ccca42a7","about_contributing.md":"30a97c28","about_license.md":"51fe04ca","reference_services.md":"4fa3200f","about_introduction.md":"b4a8b11a","guides_installing-krawler.md":"60d1a08e","about_roadmap.md":"2f902165","reference_known-issues.md":"29bbda91","examples_index.md":"2db99c6c","guides_using-krawler.md":"e66b4f7b","about_contact.md":"8fe8c179","reference_hooks.md":"2874e9f8"} +{"about_contact.md":"8fe8c179","about_license.md":"51fe04ca","reference_index.md":"76bdc228","guides_index.md":"8bac4ba9","guides_installing-krawler.md":"60d1a08e","guides_understanding-krawler.md":"56d783c6","about_roadmap.md":"2f902165","index.md":"ee3add83","guides_using-krawler.md":"e66b4f7b","reference_known-issues.md":"29bbda91","about_introduction.md":"b4a8b11a","guides_extending-krawler.md":"ccca42a7","reference_services.md":"4fa3200f","examples_index.md":"2db99c6c","about_contributing.md":"30a97c28","reference_hooks.md":"2874e9f8"} diff --git a/index.html b/index.html index 7ff88516..c790399a 100644 --- a/index.html +++ b/index.html @@ -19,7 +19,7 @@
Krawler

A minimalist Geospatial ETL

krawler
- + \ No newline at end of file diff --git a/reference/hooks.html b/reference/hooks.html index cb6a7537..2b86f9fc 100644 --- a/reference/hooks.html +++ b/reference/hooks.html @@ -147,7 +147,7 @@ port: 5432, clientPath: 'taskTemplate.client' }

disconnectPG(options)

Disconnect from a PostgreSQL database. Hook options are the following:

dropPGTable(options)

Drop a table in a PostgreSQL database if it exists. Hook options are the following:

createPGTable(options)

Create a table in a PostgreSQL database with the following structure:

For now, the structure has been defined to store a GeoJSON collection. Hook options are the following:

writePGTable(options)

Insert a GeoJSON collection or an array of features into an existing table. The table must have the same structure as a table created using the createPGTable hook. Hook options are the following:

Raster

source

readGeoTiff(options)

Read a GeoTiff from an input stream/store and convert it to in-memory JSON values, hook options are the following:

computeStatistics(options)

Computes minimum and maximum values on a GeoTiff file, hook options are the following:

Store

source

createStores(options)

Create (a set of) store(s), hook options are (an array of) the following:

removeStores(options)

Remove (a set of) store(s), hook options are (an array of) the following:

TIP

As a shortcut, the options provided can be store IDs only when storePath is not used.

discardIfExistsInStore(options)

Discard the task if a target file already exists in an output store, hook options are the following:

copyToStore(options)

Copy the item(s) from an input store to an output store, hook options are the following:

gzipToStore(options)

Gzip the item(s) from an input store to an output store, hook options are the following:

gunzipFromStore(options)

Gunzip the item(s) from an input store to an output store, hook options are the following:

unzipFromStore(options)

Unzip the item(s) from an input store to an output store, hook options are the following:

System

source

tar(options)

Tar files or directories using node-tar, hook options are the following:

TIP

file, files and cwd options can be templates, learn more about templating

untar(options)

Untar files or directories using node-tar, hook options are the following:

TIP

file, files and cwd options can be templates, learn more about templating

runCommand(options)

Run a system command. Hook options are the following:

TIP

Learn more about templating

envsubst(options)

Provides file-level environment variable substitution. Hook options are the following:

TXT

source

readTXT(options)

Read a TXT from an input stream/store and convert it to in-memory JSON values, hook options are the following:

Utils

source

generateId(options)

Generate a UUID (V1) for the item using node-uuid.

template(options)

Perform templating of the options using the item as context and merge it with item.
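
The behaviour described above can be approximated with the following sketch. It is not the hook's implementation: the `<%= name %>` placeholder syntax is an assumption made for this example, and `renderTemplate` and `templateOptions` are hypothetical helpers.

```javascript
// Illustrative sketch only. The "<%= path %>" placeholder syntax is an
// assumption, not a confirmed detail of the template hook.
function renderTemplate (template, context) {
  return template.replace(/<%=\s*([\w.]+)\s*%>/g, (match, path) => {
    // Resolve dotted paths such as "feature.id" against the context
    const value = path.split('.').reduce(
      (obj, key) => (obj == null ? undefined : obj[key]),
      context
    )
    // Leave unknown placeholders untouched
    return value === undefined ? match : String(value)
  })
}

// Render every string option with the item as context, then merge the
// rendered options into a copy of the item
function templateOptions (options, item) {
  const rendered = {}
  for (const [key, value] of Object.entries(options)) {
    rendered[key] = typeof value === 'string' ? renderTemplate(value, item) : value
  }
  return { ...item, ...rendered }
}

console.log(templateOptions({ file: '<%= id %>.json' }, { id: 'abc' }).file) // prints "abc.json"
```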

discardIf(options)

Discard all subsequent hooks and the task if the input data passes the given match filter. The filter options are similar to the match filter described in common options.

apply(options)

Apply a given function to the hook item(s), hook options are the following:

healthcheck(options)

Apply a given function to the hook item(s) and healthcheck structure, hook options are the following:

addOutputs(outputs)

Declare a new output for the job/task, the hook options are an array of objects with the following properties:

Tasks and write hooks automatically track generated outputs, but sometimes outputs are generated by an external process (e.g. the command hook), so you need to declare them in order to properly clean them with the clearOutputs hook.

runTask(options)

Run a given task, hook options are those of a task.

emitEvent(options)

Emit a 'krawler' event on the underlying service, hook options are the following:

XML

source

readXML(options)

Read an XML file from a store and convert it to in-memory JSON values, hook options are the following:

YAML

source

readYAML(options)

Read a YAML file from a store and convert it to in-memory JSON values, hook options are the following:

writeYAML(options)

Generate a YAML file from in-memory JSON values, hook options are the following:

- + \ No newline at end of file diff --git a/reference/index.html b/reference/index.html index 0bbcf186..6468ebc4 100644 --- a/reference/index.html +++ b/reference/index.html @@ -19,7 +19,7 @@
- + \ No newline at end of file diff --git a/reference/known-issues.html b/reference/known-issues.html index 7402f0ab..f37497da 100644 --- a/reference/known-issues.html +++ b/reference/known-issues.html @@ -89,7 +89,7 @@ clearOutputs: {} } } - + \ No newline at end of file diff --git a/reference/services.html b/reference/services.html index d3a9feed..9dd317e2 100644 --- a/reference/services.html +++ b/reference/services.html @@ -131,7 +131,7 @@ .catch(error => { console.log(error.message) }) - + \ No newline at end of file