RSSAC047 Initial Implementation

This repository is an initial implementation of RSSAC047, prepared by ICANN org. It is published so that the community can see how RSSAC047 might be implemented, suggest improvements and contribute code, and help the RSSAC Caucus evaluate RSSAC047 for possible future changes.

The repo has a Markdown version of excerpts of RSSAC047v2 as the file "rssac-047.md". In this file, every requirement is marked with a unique three-letter code in square brackets, and that same code appears in the source code as well as in this document. The purpose of doing this is to verify that all requirements from RSSAC047 are implemented, and to let readers more easily find where each requirement is reflected in the implementation.

Deployment

  • Deployed with Ansible
    • In Ansible directory in the repo
    • Files that are not part of the distribution are in the Local directory in the repo
    • Create VPs first with vps_building.yml, then create collector with collector_building.yml

Logging and alerts

  • Logs are text files kept on VPs and collector
  • Alerts are text files, may be monitored by Prometheus/Zabbix/etc. on collector
    • ~/Logs/nnn-alerts.txt on every machine
  • All Python scripts have a die function that prints to the alert log
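
As an illustration of the alert mechanism described above, here is a minimal sketch of what a die function could look like. The signature, the timestamp format, and the exit behavior are assumptions made for this example; the actual function in the repo's scripts may differ.

```python
import sys
import time
from pathlib import Path


def die(message, alert_path):
    """Append a timestamped message to the alert log, then exit with an error.

    Hypothetical sketch: alert_path would be something like
    ~/Logs/nnn-alerts.txt, expanded by the caller.
    """
    # Timestamps are in UTC, matching the "all systems use UTC" rule.
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
    alert_path = Path(alert_path)
    alert_path.parent.mkdir(parents=True, exist_ok=True)
    # Append so that earlier alerts stay available to the monitoring system.
    with alert_path.open("a", encoding="utf-8") as f:
        f.write(f"{stamp} {message}\n")
    sys.exit(1)
```

Because alerts are appended as plain text lines, a monitoring system on the collector only needs to watch the file for growth.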

Vantage points

  • Each VP should have more than one core if possible

  • All are running latest Debian or similar

    • Thus automatically running NTP [ugt]
  • All programs run as "metrics" user

  • vantage_point_metrics.py

    • Is run from cron job every 5 minutes on 0, 5, ... [wyn] [mba] [wca]
    • All systems use UTC [nms]
    • Checks for new root zone every 15 minutes
    • Runs scamper after the queries to each source, for both IPv4 and IPv6, and stores the output in ~/Routing
    • Results of each run are saved as .pickle.gz to ~/Output for later pulling
    • Logs to ~/Logs/nnn-log.txt
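
The step of saving each run as a .pickle.gz file could be sketched as follows. The function names, the file naming scheme (a UTC timestamp rounded down to the 5-minute slot), and the shape of the results object are assumptions for illustration, not taken from vantage_point_metrics.py.

```python
import gzip
import pickle
from datetime import datetime, timezone
from pathlib import Path


def save_run(results, output_dir, now=None):
    """Write one run's results to <output_dir>/<slot>.pickle.gz (hypothetical naming)."""
    now = now or datetime.now(timezone.utc)
    # Round down to the 5-minute boundary on which cron started the run.
    slot = now.replace(minute=now.minute - now.minute % 5, second=0, microsecond=0)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / f"{slot:%Y%m%d%H%M}.pickle.gz"
    with gzip.open(out_path, "wb") as f:
        pickle.dump(results, f)
    return out_path


def load_run(path):
    """Read back a file written by save_run."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

Naming files by time slot keeps them unique per run, which suits a collector that later pulls only new files.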

Collector

  • Run on a VM with lots of cores and memory

  • Running latest Debian or similar

    • Thus automatically running NTP [ugt]
  • All programs run as "metrics" user

  • get_root_zone.py

    • Run from cron job every 15 minutes
    • Stores zones in ~/Output/RootZones for every SOA seen
  • copy_files_from_vps.py

    • Run from cron job every 15 minutes, 1 minute after get_root_zone.py
    • Uses rsync to copy new files in the Output, Routing and Logs directories on the vantage points to ~/Incoming/vvv directory on the collector machine
  • collector_processing.py

    • Run from cron job every hour
    • For each vantage point directory vvv in ~/Incoming
      • For each pickle.gz file in ~/Incoming/vvv/Output
        • Open file, store results in the database
        • Store all correctness responses in ~/Output/Responses/<soa-serial>-nnn.pickle
    • TODO:
      • Find records in the correctness table that have not been checked, and check them
    • Reports the reason for any failure
  • report_creator.py

    • Run from cron job every week, and on the first of each month
    • --debug to add debugging info to the report
    • --force to recreate a report that already exists
    • --test_date to pretend that it is a different date in order to make earlier reports
    • Aggregates results from the database to produce the metrics in the report
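
The directory walk that collector_processing.py performs over ~/Incoming can be sketched as a generator. This is a simplified illustration under assumptions: the function name is hypothetical, storing results in the database is left to the caller, and failures are reported on stderr rather than through the repo's logging.

```python
import gzip
import pickle
import sys
from pathlib import Path


def iter_vp_results(incoming_dir):
    """Yield (vp_name, path, results) for each readable .pickle.gz file."""
    for vp_dir in sorted(Path(incoming_dir).iterdir()):
        output_dir = vp_dir / "Output"
        if not output_dir.is_dir():
            continue  # Vantage point directory with nothing copied yet
        for path in sorted(output_dir.glob("*.pickle.gz")):
            try:
                with gzip.open(path, "rb") as f:
                    results = pickle.load(f)
            except (OSError, EOFError, pickle.UnpicklingError) as e:
                # Report why the file failed, then carry on with the rest.
                print(f"Could not read {path}: {e}", file=sys.stderr)
                continue
            yield vp_dir.name, path, results
```

A caller would loop over `iter_vp_results(Path.home() / "Incoming")` and insert each result into the database. Since unpickling can execute arbitrary code, this pattern is only appropriate when the files come from trusted vantage points.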

Correctness testing

Important note: in the current version of the testbed, correctness is not being checked. The data that could be used for correctness testing is being collected, but the steps from RSSAC047 used to check that data are not being performed.

  • collector_processing.py contains a twisty maze of code to check the correctness of queries to the root servers from Section 5.3 of RSSAC047
  • Clearly, this part needs test cases
  • In Tests/, make_tests.py makes the set of positive and negative test responses for correctness
  • Tests are run manually to check whether the correctness tests in collector_processing.py are correct
  • In a local setup (not on the root metrics system):
    • Use make_tests.py --addr to get test vectors from a server under test
    • Use make_tests.py --bin_prefix to indicate where "dig" is
  • After setting up the test cases, run collector_processing.py --test to execute the tests
    • This uses the normal logging
    • See the full output in Tests/results.txt
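
To show how an address and a binary prefix might be combined into a query for a test vector, here is a small sketch. The function name, the default query, and the choice of dig options are assumptions for illustration; make_tests.py may build its commands differently.

```python
import os


def build_dig_command(addr, bin_prefix, qname=".", qtype="SOA"):
    """Return an argument list for one dig query (hypothetical helper).

    addr is the server under test and bin_prefix is the directory
    that contains the dig binary.
    """
    dig_path = os.path.join(bin_prefix, "dig")
    # +norecurse clears the RD bit; +dnssec asks for DNSSEC records.
    return [dig_path, f"@{addr}", qname, qtype, "+norecurse", "+dnssec"]
```

The resulting list can be passed directly to `subprocess.run(..., capture_output=True)` without going through a shell.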