-
Notifications
You must be signed in to change notification settings - Fork 5
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #1230 from Zarquan/20231029-zrq-rsap-request
Updating RSAP requests
- Loading branch information
Showing
1 changed file
with
245 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,245 @@ | ||
# | ||
# <meta:header> | ||
# <meta:licence> | ||
# Copyright (c) 2023, ROE (http://www.roe.ac.uk/) | ||
# | ||
# This information is free software: you can redistribute it and/or modify | ||
# it under the terms of the GNU General Public License as published by | ||
# the Free Software Foundation, either version 3 of the License, or | ||
# (at your option) any later version. | ||
# | ||
# This information is distributed in the hope that it will be useful, | ||
# but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
# GNU General Public License for more details. | ||
# | ||
# You should have received a copy of the GNU General Public License | ||
# along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
# </meta:licence> | ||
# </meta:header> | ||
# | ||
#zrq-notes-time | ||
#zrq-notes-indent | ||
#zrq-notes-crypto | ||
#zrq-notes-ansible | ||
#zrq-notes-osformat | ||
#zrq-notes-zeppelin | ||
# | ||
# AIMetrics: [] | ||
# | ||
|
||
Target: | ||
|
||
2023/24/25 RSAP request | ||
|
||
Result: | ||
|
||
Work in progress ... | ||
|
||
# ----------------------------------------------------- | ||
|
||
Cambridge | ||
|
||
Live DR4 service | ||
700 cores | ||
1 TiB RAM | ||
Full DR4 dataset, 500 TiB | ||
50 TiB Object Swift | ||
500 TiB CephFS Manila | ||
5 TiB Block Cinder | ||
16 TiB Direct attached SSD Ephemeral | ||
|
||
Dev DR4 service | ||
700 cores | ||
1 TiB RAM | ||
Full DR4 dataset, 500 TiB (needed to re-generate table indexing) | ||
50 TiB Object Swift | ||
500 TiB CephFS Manila | ||
5 TiB Block Cinder | ||
16 TiB Direct attached SSD Ephemeral | ||
|
||
Total for Cambridge | ||
1400 cores | ||
2 TiB RAM | ||
100 TiB Object Swift | ||
1000 TiB CephFS Manila | ||
10 TiB Block Cinder | ||
32 TiB Direct attached SSD Ephemeral | ||
|
||
Somerville | ||
|
||
Live/dev DR4 service | ||
700 cores | ||
1 TiB RAM | ||
Full DR4 dataset, 500 TiB | ||
50 TiB Object Swift | ||
500 TiB CephFS Manila | ||
5 TiB Block Cinder | ||
16 TiB Direct attached SSD Ephemeral | ||
|
||
STFC cloud RAL | ||
|
||
Live/dev DR4 service | ||
700 cores | ||
1 TiB RAM | ||
Full DR4 dataset, 500 TiB | ||
50 TiB Object Swift | ||
500 TiB CephFS Manila | ||
5 TiB Block Cinder | ||
16 TiB Direct attached SSD Ephemeral | ||
|
||
# ----------------------------------------------------- | ||
|
||
RSAP total | ||
|
||
2800 cores | ||
4 TiB RAM | ||
200 TiB Object Swift | ||
2000 TiB CephFS Manila | ||
20 TiB Block Cinder | ||
64 TiB Direct attached SSD Ephemeral | ||
|
||
# ----------------------------------------------------- | ||
|
||
Peak estimates: | ||
|
||
This resource request is based on estimates for the peak load | ||
expected when Gaia DR4 is released at the end of 2025. | ||
|
||
Our current Gaia DR3 dataset is around 8TiB. | ||
The Gaia DR4 dataset is expected be be in the order of 500TiB. | ||
|
||
The current timeline for Gaia is to release Gaia DR4 in Q4 2025, | ||
although we may start to receive pre-release copies of the data | ||
for internal development and testing during 2024/25. | ||
|
||
When DR4 is released in Q4 2025 we are expecting to have to curate multiple | ||
copies of the 500TiB dataset, and handle a peak of interest from users in | ||
response to the publicity surrounding the event. | ||
|
||
This means that our resource requirements for this 2024/2025 RSAP round | ||
includes an estimate of the resources that will need to be deployed by | ||
the end of 2024, ready to handle the peak load leading up to the DR4 | ||
release at the end of 2025. | ||
|
||
Resources requested in the 2025/2026 RSAP round will not be in deployed, | ||
tested and ready for use in time. | ||
|
||
Once our Gaia DR4 platform has been deployed and the expected peak load | ||
has passed; we may be able to release some of the requested resources. | ||
|
||
Multiple sites: | ||
|
||
We are planning for our main live deployment to be at Cambridge, along | ||
with a second system on the same platform for development and data curation. | ||
|
||
The secondary sites at RAL and Somerville will provide scale-out capacity to | ||
handle peak load expected during the DR4 release. | ||
|
||
IF secondary sites at RAL and Somerville come online with no problems, | ||
then we may be able to release some the second system at Cambridge. | ||
|
||
Our development plan for 2024 includes migrating from the current monolithic | ||
deployment to a more flexible system consisting of an initial website | ||
that handles user accounts and login, backed by a number of notebook services. | ||
|
||
This will enable us to scale out to use multiple physical sites for the | ||
notebook services while presenting the user with what appears to be a single | ||
integrated platform. | ||
|
||
Multiple data copies: | ||
|
||
Our development plan for 2024 includes working with mock copies of the DR4 | ||
dataset to test our data injest, indexing and partitioning processes, | ||
and testing new deployment methods that can scale out to cope with this | ||
amount of data. | ||
|
||
During 2025 we expect to be injesting pre-release versions of the DR4 data | ||
set and using them to run a beta-test version of the system for a restricted | ||
set of users, while at the same time running one or more development sites, | ||
and keeping the existing live DR3 site running. | ||
|
||
Booking system: | ||
|
||
Our development plan for 2024 includes a booking system that enables users | ||
to book sessions ahead of time, smoothing out the peaks in demand and enabling | ||
us to predict the resources needed in advance. | ||
|
||
With this in place, we may be able to release resources at some of the physical | ||
sites in response to changes in load. | ||
|
||
It is hoped that the booking system can be integrated with Blazar, the Openstack | ||
resource reservation system, if it becomes available on the Openstack platforms. | ||
|
||
Resource pinning: | ||
|
||
Our current allocation at Cambridge is pinned to specific hardware. | ||
This is to needed guarantee that we can release and create sets of resources without | ||
running into problems. | ||
|
||
In the past we found that if the Openstack system is configured to make maximal | ||
use all the available resources, leaving few idle resources available for new | ||
allocations, then releasing and creating a block of 10+ VMs may fail because some | ||
of the resources get assigned to other processes during the gap between our | ||
release and create steps. | ||
|
||
Whether this will be needed at RAL or Somerville will depend on the configuration | ||
of their Openstack systems. | ||
|
||
This issue may be solved by the deployment of Blazar, the Openstack resource | ||
reservation system. | ||
|
||
High memory hypervisors: | ||
|
||
A number of the hypervisors at Cambridge are high memory nodes with a higher | ||
memory:vcpu ratio. | ||
|
||
These host the high memory VMs required by the Zeppelin head nodes to support | ||
the in-memory processing used by libraries such as HDBSCAN. | ||
(*) cite Dennis's paper ? | ||
|
||
Project specific flavors: | ||
|
||
We have a set of project specific flavors at Cambridge, designed to optimise | ||
the packing of VMs into the pinned hypervisors. | ||
|
||
This includes a set of 'himem' flavors designed to make use of the high-memory | ||
hypervisors. | ||
|
||
We will also need to create a set of 'high storage' flavors to make use of the | ||
direct attached SSD storage when it becomes available. | ||
|
||
Direct attached storage: | ||
|
||
This request is based on the assumption that direct attached SSD storage wil | ||
be able to solve the problems with high IO wait that we have seen with our | ||
current deployment. | ||
We planned to use the SSD resources requested in our 2023 allocation to test | ||
this theory, but the resources are not available yet. | ||
|
||
We hope to be able to resolve this question during work planned for 2024. | ||
|
||
CephFS vs object storage: | ||
|
||
The current system uses the CephFS storage system at Cambridge, which is is based | ||
on spinning disc hard drives. | ||
|
||
In theory it should be possible to serve the Gaia dataset via the S3 interface | ||
of Ceph object storage using the Openstack Swift interface. | ||
However, we have encountered some issues with the Java S3 client libraries | ||
when using this to serve the Gaia DR3 dataset. | ||
Which is why this request specifies the majority of the storage, 500TiB, on | ||
CephFS via Manila and only 50 TiB of object via Swift at each site. | ||
|
||
50 TiB Object Swift | ||
500 TiB CephFS Manila | ||
|
||
If these issues can be solved, and if the resulting S3 interface is fast enough | ||
to provide the bandwidth needed, then we may be able to swap the bulk of the | ||
storage, 500TiB, from CephFS via Manila to object storage via Swift. | ||
|
||
500 TiB Object Swift | ||
50 TiB CephFS Manila | ||
|
||
We hope to be able to resolve this question during work planned for 2024. | ||
|