Web to Gemini proxy

Pāpiliō levior est ave (The butterfly is lighter than the bird)

levior (a Latin word meaning lighter) is a web (HTTP/HTTPS) to Gemini proxy. It converts web pages (as well as Atom/RSS feeds) on-the-fly to the gemtext format, allowing you to browse the web with any Gemini browser.

  • Builds an (RDF) graph from the visited pages using linked data.

  • Supports Javascript rendering and can therefore be used to browse dynamic websites.

  • Supports serving other types of content, like ZIM files (the archive format used by Wikipedia), making it possible to browse complete wikis through Gemini (see the config file).

Supporting this project

If you want to support this project, you can make a donation here (or here).

You can get in touch via misfin at the following misfin address: cipres AT hashnix.club.

Installation

AppImage

You can get the latest AppImage (for the x86_64 platform) here. The following commands install levior in ~/.local/bin:

curl -L -o ~/.local/bin/levior https://gitlab.com/cipres/levior/-/releases/continuous-master/downloads/levior-latest-x86_64.AppImage
chmod +x ~/.local/bin/levior

Manual install

Clone the repo and create a virtualenv:

git clone https://gitlab.com/cipres/levior && cd levior
python3 -m venv venv; source venv/bin/activate

Upgrade pip and install:

pip install -U pip
pip install .

For zim or uvloop support, install the extra requirements:

pip install '.[zim]'
pip install '.[uvloop]'

For Javascript rendering, install the js extra:

pip install '.[js]'

Manual install (arm, aarch64, Raspberry Pi, and others)

One of the dependencies, aiogemini, requires the cryptography package. Since version 35.0, cryptography needs Rust to build, and Rust might not be available on your system. If you don't have Rust, you can install an older version of the cryptography package that does not require it by running:

pip install -U pip

CRYPTOGRAPHY_DONT_BUILD_RUST=1 pip install 'cryptography==3.4.8'

pip install .

Usage

levior can be configured from the command line or via a YAML config file. If a config file is provided, settings from both sources are merged into a single configuration, with the config file settings taking precedence. See the example config file. URL rules can only be configured with a config file.

levior uses the OmegaConf library to parse the YAML config files, so any syntax supported by OmegaConf can be used in your configuration files. levior also provides several resolvers that you can use inside your config file.

levior
levior -d --mode=proxy
levior -c config.yaml

Once levior is running, open your gemini browser and go to gemini://localhost.

Proxies (HTTP, Socks4 and Socks5) are supported.

Generating a new configuration file

levior --config-gen levior.yaml
levior -c levior.yaml

Daemonization

Use --daemon or -d to run levior as a daemon, or set the daemonize setting in the config file:

daemonize: true
pid_file_path: levior.pid

Custom SSL certificate

By default, levior will use a built-in SSL certificate and key that is appropriate when the service is listening on localhost.

If you are configuring levior to listen on a non-local interface, you will first need to generate your own SSL keypair. Then, in your config file, set the cert and key attributes to point to the file paths of your SSL certificate and key.

hostname: 'mydomain.com'
cert: 'mydomain.crt'
key: 'mydomain.key'

You can also use the --cert and --key command-line parameters.

Logging

Access log

Requests are logged as gemtext links. Use --log-file if you want the access log to be written to a file.

  • If you are not running levior as a daemon and you don't specify an access log file path, requests are logged to the console.
  • If you are running levior as a daemon, requests are logged to the specified log file (or the default: levior-log.gmi).

Access log server endpoint

Set access_log_endpoint to true in your config file to enable the access log endpoint /access_log on the server. This endpoint shows the proxy's access log in the gemtext format.

access_log_endpoint: true

Restricting access by IP address or network

You can restrict access to the proxy by declaring a list of allowed IP addresses or networks in your config file.

client_ip_allow:
  - 127.0.0.1
  - 10.0.1.0/24
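
The allow list above mixes single addresses and CIDR networks. As a rough illustration of how such a check can work (this is a minimal sketch using Python's standard ipaddress module, not levior's actual code; the function name is_client_allowed is hypothetical):

```python
import ipaddress


def is_client_allowed(client_ip, allowed):
    """Return True if client_ip matches any address or network in allowed.

    Hypothetical helper for illustration; not part of levior's API.
    """
    addr = ipaddress.ip_address(client_ip)
    for entry in allowed:
        # strict=False accepts both single hosts ("127.0.0.1")
        # and CIDR networks ("10.0.1.0/24")
        network = ipaddress.ip_network(entry, strict=False)
        if addr in network:
            return True
    return False


# Same entries as in the config example above
ALLOWED = ["127.0.0.1", "10.0.1.0/24"]
```

With this list, is_client_allowed("10.0.1.42", ALLOWED) is true, while an address such as 10.0.2.1 falls outside both entries and is rejected.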

RDF

levior uses an RDF graph to store various attributes about the pages that are accessed via the proxy:

  • The page's title
  • Every link contained in the page is defined as being referenced by the source page
  • The gemtext headings in the page, which form its table of contents

If you want to disable the automatic graphing of pages, disable graph_visited_pages in the config:

graph_visited_pages: false

URL mapping

Define urlmap in your config file to map specific paths (on levior's gemini server) to certain URLs.

urlmap:
  # When /searx is requested without a gemini query, it will send
  # an input response. When the input is sent back, it will redirect the
  # user to "https://searx.be/search?q={input}"

  /searx:
    input_for: https://searx.be/search?q=
    route_name: Search with SearX

  /liteduck:
    input_for: https://lite.duckduckgo.com/lite/?q=
    route_name: DuckDuckGo Lite search

  # Mapping with variables in the path
  # /z/test => https://searx.be/search?q=test
  /z/{query}:
    url: https://searx.be/search?q={query}

If you set route_name, the route will appear on levior's homepage.
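
To make the variable mapping concrete, here is a minimal sketch of how a path template such as /z/{query} could be matched and expanded into the target URL. It assumes each variable matches a single path segment and that values are percent-encoded before substitution; both are assumptions made for the example, and compile_route and resolve are hypothetical names, not levior functions.

```python
import re
from urllib.parse import quote


def compile_route(path_template):
    """Turn '/z/{query}' into a regex with one named group per variable."""
    # re.split with a capturing group alternates literal text (even
    # indexes) and variable names (odd indexes)
    parts = re.split(r"\{(\w+)\}", path_template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)
        else:
            pattern += f"(?P<{part}>[^/]+)"
    return re.compile(f"^{pattern}$")


def resolve(path, path_template, url_template):
    """Return the mapped URL for path, or None if the template does not match."""
    match = compile_route(path_template).match(path)
    if match is None:
        return None
    # Percent-encode the captured values (an assumption of this sketch)
    values = {name: quote(value, safe="")
              for name, value in match.groupdict().items()}
    return url_template.format(**values)
```

For the mapping shown earlier, resolve("/z/test", "/z/{query}", "https://searx.be/search?q={query}") gives https://searx.be/search?q=test.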

URL rules

You can define your own rules in order to apply some processing on the gemtext that will be sent to the browser, or return a specific gemini response.

A rule must define which URL(s) to match with the url attribute, which can be a regular expression or a list of regular expressions. If the response attribute is defined, the status attribute must be set as an aiogemini Status code. Here are some basic examples of custom rules:

rules:
  - url: '^https?://[\w.-]*google'
    response:
      status: 'PROXY_REQUEST_REFUSED'

rules:
  - url: '^https?://www.example.org'
    response:
      status: 'SUCCESS'
      text: |-
        Gemtext content
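
As an illustration of the matching step, the sketch below checks a URL against rules whose url attribute is either one regular expression or a list of them. The first-match-wins order and the find_rule helper are assumptions made for the example; the source does not say how levior orders or combines rules.

```python
import re

# Rules written as plain dicts, mirroring the YAML examples above
RULES = [
    {
        "url": r"^https?://[\w.-]*google",
        "response": {"status": "PROXY_REQUEST_REFUSED"},
    },
    {
        "url": [r"^https?://www\.example\.org"],
        "response": {"status": "SUCCESS", "text": "Gemtext content"},
    },
]


def find_rule(url, rules):
    """Return the first rule whose pattern(s) match url, or None."""
    for rule in rules:
        patterns = rule["url"]
        # The url attribute may be a single regex or a list of regexes
        if isinstance(patterns, str):
            patterns = [patterns]
        if any(re.search(pattern, url) for pattern in patterns):
            return rule
    return None
```

Note that in a single-quoted YAML string a backslash is taken literally, so a character class such as \w is written with one backslash there, just as in the Python raw strings used here.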

Set js_render in the rule to enable JS rendering.

rules:
  - url: '^https?://www.requires-js.org'
    js_render: true

Caching

The raw content of the web resources fetched by the proxy can be cached. The result of the geminification of the pages (the gemtext document) is never cached.

Set the cache attribute in your rule to cache the data. The ttl (time-to-live) attribute determines how long (in seconds) the resource's content stays valid in the cache. The data is served from the cache until the ttl expires; after that, the next request triggers a refetch.

rules:
  - url: '^https?://www.thingstokeep.org'
    cache: true
    ttl: 86400
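
The ttl behaviour can be pictured with a small in-memory cache. This is only a sketch of the general technique; levior's real cache backend is not described here, and the TTLCache class and its methods are hypothetical.

```python
import time


class TTLCache:
    """Tiny in-memory cache where each entry expires after its ttl."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so that expiry can be tested
        self._clock = clock
        self._entries = {}  # url -> (expires_at, content)

    def put(self, url, content, ttl):
        """Store content for url, valid for ttl seconds."""
        self._entries[url] = (self._clock() + ttl, content)

    def get(self, url):
        """Return the cached content, or None if missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        expires_at, content = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller refetches
            del self._entries[url]
            return None
        return content
```

A caller would try get() first and, on None, fetch the resource again and put() it back with the rule's ttl (86400 seconds, one day, in the example above).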

Caching the access log

The access log can be persisted in the cache via the persist_access_log setting (or with --persist-access-log). This is disabled by default.

persist_access_log: true

Caching links on pages

Specific links to cache the page for a few days (or forever) can be shown at the top of the page, with the page_cachelinks setting. This makes it easy to cache a page that you've just browsed without having to define custom rules.

page_cachelinks: true

Includes

It is also possible to load predefined rules by using the include keyword in your config file. If you prefix the path with levior:, it will be loaded from the builtin rules library (please open a PR to submit new rules), otherwise it is assumed to be a local file.

include:
  - levior:sites/francetvinfo.yaml
  - my_rules.yaml

When you use the levior: prefix, you can pass a glob-style pattern, allowing you to source multiple files in a single include.

include:
  - src: levior:sites/*.yaml
    with:
      ...

Rules can receive parameters, allowing the creation of more generic rules that can be applied to any URL.

rules:
  - url: ${URL}
    gemtext_filters:
      - filter: levior.filters:uppercased
        words: ${uwords}

To pass params to the rule from the config file, set the rule path by setting the src attribute, and set the params via the with attribute.

include:
  - src: words_upper.yaml
    with:
      URL:
        - https://example.org/.*.html
        - https://domain.io
      uwords:
        - coffee
        - milk

The puretext rule keeps only the text content:

include:
  - src: levior:puretext.yaml
    with:
      URL:
        - https://example.org
        - https://example2.org

Proxies

Default proxy

You can set the default proxy URL with the proxy attribute, whose value must be a proxy URL or a list of proxy URLs, to establish a proxy chain. HTTP, Socks4 and Socks5 proxies are supported.

Defining a single proxy:

proxy: socks5://user:password@localhost:9050
proxy: http://127.0.0.1:8090

To use a proxy chain (routing requests through several proxies in sequence, which can help with anonymity and with bypassing geo-restrictions), declare your proxies as a list (the order matters):

proxy:
  - socks4://127.0.0.1:1081
  - socks5://localhost:9050
  - http://10.0.1.2:8090

Random proxies

You can use the OmegaConf resolver called random to choose a random proxy from a predefined list. The resolver will be called on every request, so this means that a proxy URL will be randomly chosen from the list for every request:

my_proxies:
  - http://10.0.1.2:8090
  - http://10.0.4.2:8092
  - http://10.0.8.4:8094

proxy: ${random:${my_proxies}}

Setting a proxy for a rule

rules:
  - regexp: "https://freebsd.org"
    proxy: socks5://localhost:9050

Setting a proxy when including another config file

When including one or more config files, you can set the proxy that will be used for the included rules:

include:
  - src: levior:sites/*.yaml
    proxy: http://127.0.0.1:8090

HTTP headers

In a rule or at the top of the config file, you can set specific HTTP headers that will be used when making HTTP requests:

http_headers:
  Accept-Language: en-US
  Accept-Charset: utf-8

Feeds aggregator

It is possible to aggregate multiple Atom/RSS web feeds into a single tinylog, by setting the rule type to feeds_aggregator and defining the list of feeds. Example:

rules:
  - url: '^gemini://localhost/francetv'
    type: 'feeds_aggregator'

    # "feeds" is a dictionary, the key must be the feed's URL, the
    # dict value is for the feed's options
    feeds:
      https://www.francetvinfo.fr/titres.rss: {}
      https://www.francetvinfo.fr/monde.rss: {}
      https://www.francetvinfo.fr/culture.rss:
        enabled: false

When you are sourcing a config file that includes aggregation rules, you can enable or disable certain feeds using the parameters:

  - src: levior:sites/francetvinfo.yaml
    with:
      ftvinfo_feeds:
        culture: true
        sports: true

Gemtext filters

It's possible to run filters on the gemtext content that will be sent to the browser. In your config file, set the gemtext_filters property for the rule. For example, this will remove any email address link by running the strip_emailaddrs function found in the levior.filters.links python module (if you don't specify a function name, it will call the gemtext_filter function/coroutine in that module by default):

urules:
  - url:
    - "https://searx.be/search"
    - "https://lite.duckduckgo.com/lite/search"

    gemtext_filters:
      - levior.filters.links:strip_emailaddrs
      - filter: levior.filters:get_out
        re:
          - 'google'
          - 'stop'

You can also pass params to your filter. This rule removes all (English) Wikipedia URLs and PNG image URLs in the final gemtext:

urules:
  - url: ".*"
    gemtext_filters:
      - filter: levior.filters.links:url_remove
        urls:
          - ^https://en.wikipedia.org
          - \.png$

Your filter (which can be a function or a coroutine) can return different value types:

  • boolean: If your filter returns True, that gemtext line will be removed (filtered out).
  • Line (trimgmi class): If you return a Line object, it will be used to replace the original gemtext line.
  • list: If you return a list of Line objects, they will be inserted in place.
  • str: If you return a string, it will replace the original gemtext line with this raw string value.
  • int: If your filter returns a negative integer, everything after that in the document (including that line) will be removed.

Any other return value type will be ignored.
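
The sketch below shows one way the return types listed above could be interpreted when a filter runs over a document. It is an illustration only: the Line dataclass is a stand-in for the trimgmi class, the one-argument filter signature is hypothetical (check the builtin filters for the real one), and the treatment of False and of non-negative integers (line kept unchanged) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Stand-in for the gemtext Line class (the real one is in trimgmi)."""
    text: str


def apply_filter(lines, gemtext_filter):
    """Run gemtext_filter on every line and interpret its return value."""
    output = []
    for line in lines:
        result = gemtext_filter(line)
        # bool is tested before int because bool is a subclass of int
        if isinstance(result, bool):
            if not result:
                output.append(line)  # False keeps the line (assumption)
        elif isinstance(result, int):
            if result < 0:
                break  # drop this line and everything after it
            output.append(line)  # non-negative int: ignored (assumption)
        elif isinstance(result, Line):
            output.append(result)  # replace the line
        elif isinstance(result, list):
            output.extend(result)  # insert several lines in place
        elif isinstance(result, str):
            output.append(Line(result))  # replace with a raw string
        else:
            output.append(line)  # any other type is ignored
    return output


def demo_filter(line):
    """Hypothetical filter exercising several return types."""
    if "@" in line.text:
        return True  # remove lines that contain an email address
    if line.text.startswith("# "):
        return line.text.upper()  # replace headings with uppercase text
    if line.text == "---":
        return -1  # cut the document from here on
    return None
```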

Check out the filters package to see all the available builtin filters.

OmegaConf resolvers

levior provides a few OmegaConf resolvers (which are like functions called when the YAML element is accessed).

random

Returns a random item from a list.

my_proxies:
  - http://10.0.1.2:8090
  - http://10.0.4.2:8092
  - http://10.0.8.4:8094

proxy: ${random:${my_proxies}}

ua_roulette

User Agent roulette.

Returns a random browser user agent string. Takes no argument.

http_user_agent: ${ua_roulette:}

custom_ua_roulette

Custom User Agent roulette.

Returns a random browser user agent string for specific operating systems, browsers and browser engines. The parameters are, in this order:

  • Operating system list. e.g: [linux, freebsd]
  • Software list (optional). e.g: [firefox, chromium]
  • Software engine list (optional). e.g: [webkit,blink]
  • Hardware type list (optional). e.g: [mobile]

http_user_agent: ${custom_ua_roulette:[linux]}
http_user_agent: ${custom_ua_roulette:[linux,mac,freebsd],[firefox]}
http_user_agent: ${custom_ua_roulette:[linux,freebsd],[],[webkit]}
http_user_agent: ${custom_ua_roulette:[linux,windows,mac_os_x],[],[],[mobile]}

See the random_user_agent documentation for a list of params.

Note: passing invalid parameters will raise a ValueError exception.

Javascript rendering

Experimental feature.

levior (through the use of requests-html which uses the pyppeteer headless automation library) can render webpages that contain Javascript code.

Pass --js on the command-line to enable Javascript rendering. Use js-force to always run JS rendering even if no JS scripts were detected on the page.

Note: when you run levior with JS rendering for the first time, pyppeteer will download a copy of the browser binary that it requires to run (about 300 MB of free disk space is required).

Service modes

  • server: serves web content as gemtext, via gemini URLs. When you visit levior's gemini URL (gemini://localhost by default) you'll be asked for a web domain to browse via a gemini input request. You can also simply go to gemini://localhost/{domain} in your Gemini browser, for example gemini://localhost/sr.ht to browse https://sr.ht. The URLs in the HTML pages are rewritten to be routed through the levior server. This mode is compatible with any Gemini browser.

  • proxy: in this mode, levior acts as a proxy for http and https URLs and serves pages without rewriting URLs. To use this mode, you need a Gemini browser that supports http proxies. Here's a list of browsers supporting proxies: Gemalaya (bundles and uses levior in proxy mode by default), Lagrange, Amfora, diohsc and Telescope.

The allowed modes can be set with the --mode (or -m) command-line argument or with the mode setting in the config file. Use --mode=proxy to run only as a transparent http proxy, or --mode=server to only serve requests made with gemini URLs.

Use --mode=proxy,server to handle both request types (this is the default).

Configuring your Gemini browser to use levior as a proxy

Lagrange

In the File menu, select Preferences, and go to the Network section. Set the HTTP proxy text field to 127.0.0.1:1965. If you're not running levior on localhost, set it to levior's listening IP and port.

Telescope

As explained in the docs, edit ~/.config/telescope/config and add the following:

proxy http via "gemini://127.0.0.1:1965"
proxy https via "gemini://127.0.0.1:1965"

Links

The --links option controls the Gemini links generation mode (this is an md2gemini option):

  • paragraph (this is the default): This will result in footnotes being added to the document, and the links for each footnote being added at the end of each paragraph
  • copy: Like paragraph, but without footnotes
  • at-end: The links are added at the very end of the document
  • off: Remove all links

levior --links=at-end
levior --links=off

Open your Gemini browser and go to gemini://localhost or //localhost.

Mounting ZIM images

You can also mount ZIM files to be served via the gemini protocol. Once you've configured a ZIM mountpoint, go to gemini://localhost/{mountpoint} (for example: gemini://localhost/wiki_en). A great source of ZIM archives is the kiwix library.

It's possible to run searches on the ZIM archive's contents. Go to gemini://localhost/{mountpoint}/search (for example: gemini://localhost/wiki_en/search), where you'll be prompted for a search query. By default there's a limit of 4096 results; this can be changed via the search_results_max option. The search_path option sets the URL path of the search API:

mount:
  /wiki_en:
    type: zim
    path: ./wikipedia_en_all_mini_2022-03.zim
    search_path: /
    search_results_max: 8192

See the example config file here.

Server endpoints

/

The homepage lists the links for the main endpoints, the mountpoints and the links to access the aggregated RSS/Atom feeds.

/goto

When accessing /goto, or /go, you'll be prompted for a domain name or a full URL to browse.

/{domain}

When accessing /{domain}, levior will proxy https://{domain} to the Gemini browser. Examples:

gemini://localhost/searx.be
gemini://localhost/gitlab.com/cipres/levior
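
The mapping from a gemini URL to the proxied web URL can be sketched as follows. This is a simplified illustration, not levior's routing code: it ignores the reserved endpoints (such as /goto or /cache) and any configured mountpoints, and the web_url_for name is hypothetical.

```python
from urllib.parse import urlsplit


def web_url_for(gemini_url):
    """Map gemini://host/{domain}/path to https://{domain}/path."""
    parts = urlsplit(gemini_url)
    # Everything after the leading slash is the domain plus optional path
    target = parts.path.lstrip("/")
    if not target:
        return None  # the homepage, nothing to proxy
    url = f"https://{target}"
    if parts.query:
        url += f"?{parts.query}"
    return url
```

For the two examples above, this yields https://searx.be and https://gitlab.com/cipres/levior.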

/access_log

Shows the proxy's access log.

/cache

Lists the objects stored in the cache.

/graph

RDF graph index

/graph/search

RDF graph search endpoint

/search

When accessing /search, you'll be prompted for a search query. Your search will be performed via the searx search engine.