
API-Server

The API-Server is responsible for collecting data from public and private chains, generating statistics from the data and distributing the data to the clients.

API

For detailed API information, take a look at the API Documentation.

The documentation was created via apidoc and apidoc-markdown by running

apidoc -i ./components && apidoc-markdown -p doc -o apidoc.md

or

npm run apidoc

in the api-server root directory after installing all node modules. To inspect the results, take a look at apidoc.md or open doc/index.html

Docker architecture

We want the API-server to run independently of the host operating system. To achieve this, we decided on a Docker-based architecture: the API-server and the DBMS run in separate Docker containers connected via a Docker network.
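
A minimal sketch of such a setup as a docker-compose file; the service names, port, and network name are illustrative, not taken from the actual configuration:

    version: "3"
    services:
      api-server:
        build: ./api-server
        ports:
          - "3000:3000"       # HTTP API exposed to clients
        networks:
          - chainboard-net
      mongodb:
        image: mongo:3.6
        networks:
          - chainboard-net    # reachable from api-server under the hostname "mongodb"
    networks:
      chainboard-net: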

Collect data

Our server collects the data that is relevant for our statistics by requesting it from online providers and from the private blockchain we set up.

Public chain data

To gather public chain data, the API-Server accesses the public APIs of several websites that offer free statistics on the most important and relevant blockchains. We minimize the traffic to those websites by caching their results; this does not distort the statistics, because the values change infrequently.
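
As an illustration, such a cached request could look like the following sketch. The HTTP client (axios), the cache lifetime, and the function name are assumptions, not taken from the actual code:

    const axios = require('axios'); // assuming axios as the HTTP client

    const cache = new Map();        // URL -> { data, fetchedAt }
    const TTL_MS = 5 * 60 * 1000;   // cache lifetime is illustrative

    async function getPublicChainStats(url) {
      const entry = cache.get(url);
      if (entry && Date.now() - entry.fetchedAt < TTL_MS) {
        return entry.data; // serve from cache, no extra traffic to the provider
      }
      const response = await axios.get(url);
      cache.set(url, { data: response.data, fetchedAt: Date.now() });
      return response.data;
    }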

Private chain data

To gather information about our private blockchain, the API-Server provides an interface for all Docker nodes involved in the current blockchain setting. The nodes push their information to the server via a socket, and the server selects and stores the relevant data.
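
A node-side push might look roughly like this sketch, assuming socket.io as the socket library and inventing the server address, event name, and payload fields for illustration:

    const io = require('socket.io-client');

    // Hypothetical server address; the real setup may differ.
    const socket = io('http://api-server:3000');

    socket.emit('blockMined', {
      chain: 'private-ethereum', // which blockchain setting this node belongs to
      node: 'miner-1',
      blockNumber: 1234,
      timestamp: Date.now(),
    });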

Data Aggregation

Private chain data can come in at a rate we might not be able to handle, so incoming data could get dropped while it is processed from a single buffer. To prevent inaccurate aggregations, we use multiple buffering. A given buffer A is active and collects private chain data. After a certain time, another buffer B becomes active and collects the private data instead of A, while buffer A is aggregated and the aggregated data is stored in the database. Afterwards, A is flushed and becomes active again, while B is aggregated. This prevents data from being dropped and also keeps the number of items that need to be stored in the database low (thereby keeping I/O low), while remaining accurate enough for our use case.
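
A condensed sketch of this buffer rotation; the aggregation step, database handle, and interval are illustrative:

    let active = [];    // buffer currently collecting incoming items
    let draining = [];  // buffer being aggregated and flushed

    function onPrivateChainData(item) {
      active.push(item); // cheap append, never blocks on the database
    }

    setInterval(async () => {
      // Swap roles: the previously active buffer drains while the other collects.
      [active, draining] = [draining, active];
      if (draining.length === 0) return;
      const aggregated = aggregate(draining);                  // hypothetical aggregation step
      await db.collection('aggregates').insertOne(aggregated); // assumed MongoDB handle
      draining.length = 0; // flush so this buffer can become active again
    }, 10000); // rotation interval is illustrative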

Make data accessible

Our API-Server provides an interface via an Express HTTP server. It responds to data requests with a JSON string.
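
For illustration, one such route could look like this sketch; the route path and the data-access helper are assumptions, not from the actual code:

    const express = require('express');
    const app = express();

    // Hypothetical data endpoint.
    app.get('/api/chains/:chain/stats', async (req, res) => {
      const stats = await loadStats(req.params.chain); // assumed data-access helper
      res.json(stats); // serialized to a JSON string for the client
    });

    app.listen(3000);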

Store gathered information

Storing the data we receive allows us to provide further statistics and to detect fluctuations and anomalies through long-term analysis.

Database Management System

Our storage technology has to handle several different analytical tasks and, at the same time, a large amount of newly collected data. We therefore considered a row-based database management system to minimize the time needed to write new data. Another major factor is the format of the data we gather: we chose JSON objects as our main data format for all data transfers, and the information from the Docker nodes is pushed to the server as JSON objects. With a NoSQL DBMS we can easily adapt our database layout to these objects, so there is little to no need to parse the nodes' information; a SQL database would work as well, though. We selected MongoDB as our DBMS.

Later we realised that we store a lot of data with the same structure and therefore considered switching to a SQL database, where we could write our aggregations more easily. This might also have provided a small performance increase for our logs and let us get rid of mongoose's syntax, which can be difficult to understand at times. Postgres was our SQL database of choice, but while trying to switch we concluded that sticking with our initial choice of MongoDB would be better for our project.

Changing to Postgres would not have added much value. It might have improved performance, but we only had small performance issues with our logs, which needed refactoring anyway, and the aggregator query could be improved without changing the database technology. A new database would also make it harder to add new metrics or chains with additional values to store, and since we use JSON objects throughout, a lot of parsing would have been necessary. This is why we decided to stick with MongoDB and refactor the queries, not the technology.
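
Because the incoming JSON objects map directly onto documents, a mongoose model for them can stay close to the wire format. A sketch with invented field names:

    const mongoose = require('mongoose');

    // Illustrative schema; the real models may store different fields.
    const chainDatumSchema = new mongoose.Schema({
      chain: String,
      target: String,
      payload: mongoose.Schema.Types.Mixed, // keeps the node's JSON object as-is
      receivedAt: { type: Date, default: Date.now },
    });

    module.exports = mongoose.model('ChainDatum', chainDatumSchema);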

Security

To secure the database, we enabled authentication. Every time a client wants to connect to the database, it has to authenticate with a username and a password against the admin database on MongoDB. The database has three users:

  • root
  • admin
  • chainboarddbuser

The root user is created within the entrypoint script of the MongoDB Docker container, using environment variables provided by the .env file. The admin and chainboarddbuser users are created by the createDbUser script, which uses the same environment variables to authenticate as root; chainboarddbuser only has read and write permissions for the chainboarddb database. After the users have been created, we can start the API-server, which reads the login credentials from the .env file and authenticates itself as chainboarddbuser. Afterwards we delete the .env file, and the database is secured.
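
The user creation in such a script boils down to mongo shell calls like the following; the passwords are shown as placeholders for the .env-provided variables, and the exact script may differ:

    // Run while authenticated as root against the admin database.
    db.getSiblingDB('admin').createUser({
      user: 'admin',
      pwd: 'ADMIN_PASSWORD_FROM_ENV',
      roles: [{ role: 'userAdminAnyDatabase', db: 'admin' }],
    });

    db.getSiblingDB('chainboarddb').createUser({
      user: 'chainboarddbuser',
      pwd: 'DB_USER_PASSWORD_FROM_ENV',
      roles: [{ role: 'readWrite', db: 'chainboarddb' }], // read/write on chainboarddb only
    });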

Filter the amount of incoming data

Another approach to storing the data correctly with good performance is to provide two separate database tables. All traffic is stored in one of the tables for a certain amount of time. When the time is up, the traffic is redirected into the second table; the first table can then be aggregated, and the prepared data is stored in the main table. Finally we drop the first table and create a new one, to which the traffic can be redirected again once the time is up.
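
Sketched with MongoDB collections in place of tables; all names and helpers here are invented for illustration:

    let currentIngest = 'ingest_a'; // collection currently receiving traffic

    async function rotate(db) {
      const drained = db.collection(currentIngest);
      // Redirect incoming traffic to the other collection.
      currentIngest = currentIngest === 'ingest_a' ? 'ingest_b' : 'ingest_a';
      const prepared = await aggregateCollection(drained); // assumed aggregation helper
      await db.collection('main').insertMany(prepared);    // prepared data into the main table
      await drained.drop(); // MongoDB recreates the collection on the next insert
    }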

Provide the data to the Frontend

The Frontend can use our API calls to access the data. The number of items returned to the Frontend is limited to 10,000.
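
With mongoose, that cap can be a simple query limit, reusing the illustrative ChainDatum model sketched above:

    // Inside an async request handler: never return more than 10,000 items.
    const items = await ChainDatum.find(query).limit(10000).exec();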

Enacting Scenarios

Scenarios can either be provided in the form of Scylla logs or defined by an interval, a payload size, and a required number of miners. Uploaded Scylla logs are parsed and stored, while manually defined scenarios are translated into a format similar to that of a parsed Scylla log. A scenario is identified by a name and can be started on various chains via the frontend: when a chain's parameters are changed and a scenario name is provided, the scenario is sent to the specified chain on the specified target and executed there.
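
A manually defined scenario, translated into the parsed-Scylla-like format, might look roughly like this; the field names are invented for illustration:

    // Hypothetical shape of a stored scenario.
    const scenario = {
      name: 'steady-load',
      requiredMiners: 4,
      transactions: [
        // interval and payload size expanded into entries,
        // similar to transactions parsed from a Scylla log
        { offsetMs: 0, payloadBytes: 512 },
        { offsetMs: 1000, payloadBytes: 512 },
      ],
    };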

Recording and Replaying of Execution Data

The incoming data might be of interest to other viewers at a later point in time. Endpoints to start, stop, and cancel a recording are in place. A recording is started with its name provided. When a recording is stopped, an entry with the recording name, its start and end time, and all recorded chains and their target systems is stored in the database. The frontend can then get a list of all recordings and trigger a normal chain data request for the stored chains and targets, with the recording's start and end time as the timeframe. Thus, we don't have to store the same chain data twice (once as it comes in and once for replaying) and can even reuse the same endpoint and table.
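
The stored recording entry therefore only needs metadata, since replaying reuses the regular chain-data endpoint with the recorded timeframe. An illustrative shape with invented values:

    // Hypothetical recording document; no chain data is duplicated.
    const recording = {
      name: 'demo-run-1',
      startTime: new Date('2018-05-29T10:00:00Z'),
      endTime: new Date('2018-05-29T10:30:00Z'),
      chains: [
        { chain: 'ethereum', target: 'private-testbed' }, // recorded chain and its target system
      ],
    };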

Deployment on the BPT-Server

Before deploying our server on the BPT-Server, we secured our database by enabling authentication. The server is now accessible via the BPT-Server.

Testing

Testing of the API Server is done via Mocha (https://mochajs.org), and Istanbul (https://istanbul.js.org/) is used to generate coverage statistics. The tests themselves don't have to be called explicitly; all methods in the test folder are executed recursively.
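
A test in the test folder follows the usual Mocha pattern; for example, with a hypothetical module under test:

    const assert = require('assert');
    const { aggregate } = require('../components/aggregator'); // hypothetical module path

    describe('aggregator', () => {
      it('averages buffered items into one datapoint', () => {
        const result = aggregate([{ value: 1 }, { value: 3 }]);
        assert.strictEqual(result.value, 2);
      });
    });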