
ESGF_download_statistics

Matthew Harris edited this page Oct 7, 2013 · 3 revisions

Site level data download statistics

Site-level statistics are stored in the esgf_dashboard.finaldw table.
The information provider derives the local statistics automatically through a reconciliation process, which runs once every hour and picks up the new entries in the access_logging table.
At the end of each run, the information provider stores the id of the last access_logging entry it processed, together with the current timestamp.

    esgcet=# select * from esgf_dashboard.reconciliation_process;
     lastprocessed_id |         time_stamp
    ------------------+----------------------------
                   48 | 2012-08-08 01:19:23.735661
    (1 row)
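The bookkeeping described above can be illustrated with a minimal Python sketch. It is not the information provider's code (that is a C module); the function name, the in-memory log, and the assumption that "new entries" are those with an id greater than lastprocessed_id are all hypothetical and serve only to show how the two stored values drive an incremental update.

```python
from datetime import datetime


def reconcile(access_log, state):
    """Hypothetical sketch of one reconciliation run.

    access_log: list of dicts, each with an integer "id" (stand-in for
                rows of the access_logging table).
    state:      dict with "lastprocessed_id" and "time_stamp" (stand-in
                for the single row of reconciliation_process).
    Returns the entries that this run would fold into the statistics.
    """
    # Assumption: entries are new when their id exceeds the stored id.
    new_entries = [row for row in access_log
                   if row["id"] > state["lastprocessed_id"]]
    if new_entries:
        state["lastprocessed_id"] = max(row["id"] for row in new_entries)
    # The timestamp is refreshed at the end of every run.
    state["time_stamp"] = datetime.now()
    return new_entries


log = [{"id": i} for i in range(1, 51)]
state = {"lastprocessed_id": 48, "time_stamp": None}
processed = reconcile(log, state)
print([row["id"] for row in processed], state["lastprocessed_id"])
```

Under these assumptions, resetting the stored id to 0 makes the next run see every entry again, which is why that value is used to trigger a full rebuild.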

To rebuild the esgf_dashboard.finaldw table from scratch, set lastprocessed_id=0 in the reconciliation_process table and wait for the next hourly reconciliation run.

    esgcet=# update esgf_dashboard.reconciliation_process set lastprocessed_id=0;

The two attributes of the esgf_dashboard.reconciliation_process table have the following meaning:

  • lastprocessed_id = the id of the last access_logging entry analyzed during the most recent reconciliation run;
  • time_stamp = the timestamp of the most recent reconciliation run.

Note that this table must contain exactly one row.

To analyze the local statistics in esgf-desktop, open the Download Statistics window and choose one of the following in the "Source" combo box:

  • "Local (All projects)", or
  • "Local (CMIP5)" to see only the local statistics for the CMIP5 project.

Federation-level data download statistics (beta)

[This functionality is still under test]

Federation-level statistics are stored in the esgf_dashboard.federationdw table.
To collect federated statistics, configure the esgf_dashboard.aggregation_process table by adding one row for each data node you want to include.

    esgcet=# insert into esgf_dashboard.aggregation_process(hostname,lastprocessed_id) values('adm08.cmcc.it',0);

The three attributes of the esgf_dashboard.aggregation_process table have the following meaning:

  • hostname = the remote host to query for its statistics;
  • lastprocessed_id = the id of the last entry in the remote host's access_logging table processed during the most recent aggregation step;
  • time_stamp = the timestamp of the most recent aggregation run for that hostname.

Here is an example:

    esgcet=# select * from esgf_dashboard.aggregation_process;
       hostname    | lastprocessed_id |         time_stamp
    ---------------+------------------+----------------------------
     adm08.cmcc.it |               48 | 2012-08-08 15:37:07.515519
     anotherhost   |              140 | 2012-08-08 15:40:07.515519
     …

Each hostname may appear in at most one row.

The information provider aggregates the remote statistics and updates them once every hour, based on the new entries in each remote access_logging table.
As with the site-level statistics, the updates are incremental.

To rebuild the federated statistics for a specific hostname from scratch, set the corresponding lastprocessed_id to 0 and wait for the next hourly aggregation run.

    esgcet=# update esgf_dashboard.aggregation_process set lastprocessed_id=0 where hostname='adm08.cmcc.it';

To exclude a specific hostname from the federated statistics, set the corresponding lastprocessed_id to -1. Its statistics are then removed from the federated totals.

    esgcet=# update esgf_dashboard.aggregation_process set lastprocessed_id=-1 where hostname='adm08.cmcc.it';
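The three cases for lastprocessed_id in the aggregation_process table (0, -1, anything else) can be summarized in a short Python sketch. The function name and the returned labels are hypothetical and only restate the rules given above; this is not how the C information provider is implemented.

```python
def aggregation_action(lastprocessed_id):
    """Hypothetical helper: what the next aggregation run does for a host,
    given the lastprocessed_id stored in its aggregation_process row."""
    if lastprocessed_id == -1:
        # Host is excluded from the federated statistics.
        return "exclude"
    if lastprocessed_id == 0:
        # Statistics for this host are rebuilt from scratch.
        return "rebuild"
    # Otherwise only entries newer than the stored id are aggregated.
    return "incremental"


# Example rows as (hostname, lastprocessed_id); hostnames taken from the
# examples on this page.
rows = [("adm08.cmcc.it", 48), ("anotherhost", 0)]
for hostname, last_id in rows:
    print(hostname, aggregation_action(last_id))
```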

The time_stamp is written ONLY by the aggregation process.

To analyze the federated statistics in esgf-desktop, open the Download Statistics window and choose one of the following in the "Source" combo box:

  • "Federation (All projects)", or
  • "Federation (CMIP5)" to see only the federated statistics for the CMIP5 project.

Notes

  1. The back-end of the site-level and federation-level statistics ships with esgf-dashboard v1.3.1.
    It is the C module (the information provider), installed by default under /usr/local/esgf-dashboard-ip.
    The front-end is part of the esgf-desktop module and is written in Java/JS.
    By default it is installed under /usr/local/tomcat/webapp/esgf-desktop.

  2. By default, the reconciliation and aggregation processes are triggered once every hour.
    To change this interval, set the following property in esgf.properties to a different number of hours:

      esgf.ip.downdatarefresh.hour

    For instance, to trigger both processes once a day, set the property as follows:

      esgf.ip.downdatarefresh.hour = 24

  3. A restart of the node (esg-node restart) automatically triggers a new reconciliation process followed by an aggregation process.
    This avoids waiting for the next scheduled run (hourly by default) after you change the esgf_dashboard.reconciliation_process or esgf_dashboard.aggregation_process table.

  4. The first time a new machine is added to the esgf_dashboard.aggregation_process table, the aggregation process can take anywhere from a few seconds or minutes to several hours, depending on the total number of entries in the esgf_node_manager.access_logging table that have to be aggregated and on the resources available on the machines involved.
    After that, incremental updates process only the new entries and generally take just a few seconds.

  5. For performance reasons, future versions of esgf-dashboard will build several data marts from the two data warehouses (local and federated), to manage the different dimensions separately and give the front-end more responsive online analytics.

  6. The federated statistics are still under test.
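As a rough illustration of the refresh-interval property described in the notes above, the following Python sketch reads esgf.ip.downdatarefresh.hour from properties-style text and falls back to the documented default of one hour. The function name and the simplified key = value parsing are assumptions made for this example; the information provider's own parsing may differ.

```python
def refresh_hours(properties_text, default=1):
    """Hypothetical helper: return the refresh interval in hours.

    Looks for the esgf.ip.downdatarefresh.hour key in simple
    "key = value" lines and returns the default (1 hour) if it is absent.
    """
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "esgf.ip.downdatarefresh.hour":
            return int(value.strip())
    return default


print(refresh_hours("esgf.ip.downdatarefresh.hour = 24"))
```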
