# CVMFS server scraper and Prometheus exporter

This tool scrapes the public metadata sources from a set of stratum0 and stratum1 servers. It grabs:

- cvmfs/info/v1/repositories.json

Then, for every repo it finds and has not been told to ignore, it grabs:

- cvmfs/<repo>/.cvmfs_status.json
- cvmfs/<repo>/.cvmfspublished
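To make the list of fetched files concrete, here is a minimal sketch that builds the URLs for one server. The helper `metadata_urls`, the `http://` scheme, and the repository name `repo.example.org` are assumptions made for illustration; they are not part of the cvmfsscraper API.

```python
def metadata_urls(server: str, repos: list[str]) -> list[str]:
    """Return the metadata URLs for one server and the given repositories.

    Hypothetical helper: it only mirrors the paths listed above.
    """
    base = f"http://{server}/cvmfs"
    # The server-wide repository index comes first.
    urls = [f"{base}/info/v1/repositories.json"]
    # Then the two per-repository files.
    for repo in repos:
        urls.append(f"{base}/{repo}/.cvmfs_status.json")
        urls.append(f"{base}/{repo}/.cvmfspublished")
    return urls


for url in metadata_urls("stratum1-no.tld", ["repo.example.org"]):
    print(url)
```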

# Installation

pip install cvmfs-server-scraper

# Usage

#!/usr/bin/env python3

import logging
from cvmfsscraper import scrape, scrape_server, set_log_level

# server = scrape_server("aws-eu-west1.stratum1.cvmfs.eessi-infra.org")

set_log_level(logging.DEBUG)

servers = scrape(
    stratum0_servers=[
        "stratum0.tld",
    ],
    stratum1_servers=[
        "stratum1-no.tld",
        "stratum1-au.tld",
    ],
    repos=[],
    ignore_repos=[],
)

# Note that the order of servers is undefined.
print(servers[0])

for repo in servers[0].repositories:
    print("Repo: " + repo.name)
    # str() keeps the concatenation valid even when a value is not a string.
    print("Root size: " + str(repo.root_size))
    print("Revision: " + str(repo.revision))
    print("Revision timestamp: " + str(repo.revision_timestamp))
    print("Last snapshot: " + str(repo.last_snapshot))

Note that if you are using a Stratum1 server with S3 as its backend, you need to set repos explicitly. This is because the S3 backend does not have a cvmfs/info/v1/repositories.json file. Also, the GeoAPI status will be NOT_FOUND for these servers.
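For an S3-backed Stratum1, a call might look like the sketch below. The wrapper `scrape_s3_stratum1`, its `ValueError` guard, and the server and repository names are hypothetical. Only `scrape` and its keyword arguments come from the usage example above, and passing an empty `stratum0_servers` list is an assumption.

```python
def scrape_s3_stratum1(server: str, repos: list[str]):
    """Scrape one S3-backed Stratum1, naming its repositories explicitly.

    Hypothetical wrapper around scrape(); not part of cvmfsscraper itself.
    """
    if not repos:
        # There is no repositories.json to discover repos from, so an
        # empty list would leave nothing to scrape.
        raise ValueError("S3-backed servers need an explicit list of repos")

    # Imported here so the guard above works without the package installed.
    from cvmfsscraper import scrape

    return scrape(
        stratum0_servers=[],
        stratum1_servers=[server],
        repos=repos,
        ignore_repos=[],
    )
```

Calling `scrape_s3_stratum1("stratum1-s3.tld", ["repo.example.org"])` would then return the same kind of server list as `scrape` does.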

# Data structure

## Server

A server object, representing a specific server that has been scraped.

````python
servers = scrape(...)
server_one = servers[0]
````

Name

Type: Attribute

server.name

Returns

The name of the server, usually its fully qualified domain name.

GeoApi status

Type: Attribute

server.geoapi_status

Returns

A GeoAPIstatus enum object, defined in constants.py. The possible values are:

- OK (0: OK)
- LOCATION_ERROR (1: GeoApi gives wrong location)
- NO_RESPONSE (2: No response)
- NOT_FOUND (9: The server has no repository available so the GeoApi cannot be tested)
- NOT_YET_TESTED (99: The server has not yet been tested)
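The values above can be modelled as a plain Python enum. The sketch below uses a local stand-in class, `GeoAPIStatusSketch`, with the numbers copied from the list; the real enum lives in constants.py and may differ in detail. The helper `needs_attention` is hypothetical.

```python
from enum import Enum


class GeoAPIStatusSketch(Enum):
    """Local stand-in for the enum described above (not the library class)."""

    OK = 0
    LOCATION_ERROR = 1
    NO_RESPONSE = 2
    NOT_FOUND = 9
    NOT_YET_TESTED = 99


def needs_attention(status: GeoAPIStatusSketch) -> bool:
    """Return True when the GeoAPI was tested and gave a wrong or no answer."""
    return status in (
        GeoAPIStatusSketch.LOCATION_ERROR,
        GeoAPIStatusSketch.NO_RESPONSE,
    )


for status in GeoAPIStatusSketch:
    print(status.name, status.value, needs_attention(status))
```

Treating NOT_FOUND and NOT_YET_TESTED as "no verdict" rather than as failures is a design choice of this sketch, not documented behaviour.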

Repositories

Type: Attribute

server.repositories

Returns

A list of repository objects, sorted by name. Empty if no repositories are scraped on the server.

Ignored repositories

Type: Attribute

server.ignored_repositories

Returns

A list of repository names that the scraper is to ignore.

Forced repositories

Type: Attribute

server.forced_repositories

Returns

A list of repository names that the server is forced to scrape. If a repo name exists in both ignored_repositories and forced_repositories, it will be scraped.
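The precedence rule can be sketched as a small set operation. The function `select_repos` and its parameter names are hypothetical, and the sketch assumes that forced repositories are simply added to whatever was discovered and not ignored. The library's actual selection logic may differ.

```python
def select_repos(discovered, ignored, forced):
    """Return the repository names to scrape, sorted by name.

    Hypothetical helper: forced names win over ignored names.
    """
    # Drop ignored names first, then add forced names back in.
    selected = (set(discovered) - set(ignored)) | set(forced)
    return sorted(selected)


print(
    select_repos(
        discovered=["a.example.org", "b.example.org"],
        ignored=["b.example.org", "c.example.org"],
        forced=["c.example.org"],
    )
)
```

Here `c.example.org` appears in both lists and is still selected, while `b.example.org` is only ignored and is dropped.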

## Repository

A repository object, representing a single repository on a scraped server.

servers = scrape(...)
repo_one = servers[0].repositories[0]

Name

Type: Attribute

repo_one.name

Returns

The fully qualified name of the repository.

Server

Type: Attribute

repo_one.server

Returns

The server object to which the repository belongs.

Path

Type: Attribute

repo_one.path

Returns

The path for the repository on the server. May differ from the name. To get a complete URL, one can do:

url = "http://" + repo_one.server.name + repo_one.path

Status attributes

These attributes are populated from .cvmfs_status.json:

| Attribute | Value |
| --- | --- |
| last_gc | Timestamp of last garbage collection |
| last_snapshot | Timestamp of the last snapshot |

Information from .cvmfspublished is also provided. For explanations for these keys, please see CVMFS' official documentation. The field value in the table is the field key from .cvmfspublished.

| Attribute | Field |
| --- | --- |
| alternative_name | A |
| full_name | N |
| is_garbage_collectable | G |
| metadata_cryptographic_hash | M |
| micro_cataogues | L |
| reflog_checksum_cryptographic_hash | Y |
| revision_timestamp | T |
| root_catalogue_ttl | D |
| root_cryptographic_hash | C |
| root_size | B |
| root_path_hash | R |
| signature | The end signature blob |
| signing_certificate_cryptographic_hash | X |
| tag_history_cryptographic_hash | H |
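As a rough illustration of how field keys map to attributes, the sketch below parses a made-up .cvmfspublished payload. It assumes one field per line with the key as the first character, and a `--` line separating the fields from the signature part. The function `parse_published`, the subset of fields in `FIELD_TO_ATTRIBUTE`, and the sample data are all hypothetical, and values are kept as raw strings.

```python
# Subset of the table above: field key -> attribute name.
FIELD_TO_ATTRIBUTE = {
    "N": "full_name",
    "T": "revision_timestamp",
    "B": "root_size",
    "C": "root_cryptographic_hash",
    "G": "is_garbage_collectable",
    "D": "root_catalogue_ttl",
}


def parse_published(raw: bytes) -> dict[str, str]:
    """Map the key/value lines before the '--' separator to attribute names."""
    # Everything after the separator is signature data and is ignored here.
    header, _, _signature = raw.partition(b"\n--\n")
    attributes = {}
    for line in header.decode("utf-8", errors="replace").splitlines():
        if not line:
            continue
        key, value = line[0], line[1:]
        name = FIELD_TO_ATTRIBUTE.get(key)
        if name is not None:  # skip keys outside the chosen subset
            attributes[name] = value
    return attributes


# Made-up sample; real files carry real hashes and a binary signature.
sample = (
    b"Cabc123\nB1024\nNrepo.example.org\nT1700000000\nGyes\nD240\n"
    b"--\ndeadbeef\n<signature>"
)
print(parse_published(sample))
```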
