From b8872247012ea7afce8e2824ed0402421678cf43 Mon Sep 17 00:00:00 2001 From: Annika Lauber Date: Thu, 20 Jun 2024 13:29:56 +0200 Subject: [PATCH 01/13] Add file for minutes --- events/icon_meetings/2024-2.md | 46 ++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) create mode 100644 events/icon_meetings/2024-2.md diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md new file mode 100644 index 00000000..fa9926db --- /dev/null +++ b/events/icon_meetings/2024-2.md @@ -0,0 +1,46 @@ +--- +title: 2024/2 +layout: default +parent: Quarterly ICON Meeting +grand_parent: Technical Events +--- + +# ICON Meeting 2024/2 (20 June 2024) + +## Venue +Hybrid: ETH Zurich (L 17.1) and via Zoom + +## Participants (on-site) +Michael Jähn (MJ), +Matthieu Leclair (ML), +Annika Lauber (AL), +Jonas Jucker (JJ), +Fulden Batibeniz (FB), +Emmanuele Russo (ER), +Athena Nghiem (AN), +Andrea Stenke (AS), +Sylvaine Ferrachat (SF) + + +## Participants (Zoom) +Marco Arpagaus (MA), +Will Sawyer (WS), +Alina Yapparova (AY), +Arash Hamzehloo (AH), +Mikael Stellio (MS), +David Leutwyler (DL), +Dominik Brunner (DB), +Corina Keller (CK), +Michael Steiner (MicS), +Stefan Rüdisühli (SR), +Christian Steger (CS), +Kutay Dönmez (KD) + + + +_Minutes by Annika Lauber_ + +## Reports + +### C2SM (Michael Jähn, Annika Lauber, Matthieu Leclair, Jonas Jucker) +MJ welcomes everyone and shares the [latest news about ICON from C2SM](https://polybox.ethz.ch/index.php/s/4fXixLycOwOy41K). 
From 4c1be3f1e2c342aa7e4a2501f94ee367423c199c Mon Sep 17 00:00:00 2001 From: Annika Lauber Date: Thu, 20 Jun 2024 13:36:37 +0200 Subject: [PATCH 02/13] update participants --- events/icon_meetings/2024-2.md | 25 ++++++++++--------------- 1 file changed, 10 insertions(+), 15 deletions(-) diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md index fa9926db..8912c3e8 100644 --- a/events/icon_meetings/2024-2.md +++ b/events/icon_meetings/2024-2.md @@ -17,25 +17,20 @@ Annika Lauber (AL), Jonas Jucker (JJ), Fulden Batibeniz (FB), Emmanuele Russo (ER), -Athena Nghiem (AN), -Andrea Stenke (AS), -Sylvaine Ferrachat (SF) - +Guillaume Bertolli (GB), +Brigitta Goger (BG), +Clarissa Kroll (CK), +Doris Folini (DF) ## Participants (Zoom) -Marco Arpagaus (MA), -Will Sawyer (WS), -Alina Yapparova (AY), -Arash Hamzehloo (AH), +Sylvain Ferrachat (SF), +Fabian Gessler (FG), +Lukas Jansing (LJ), Mikael Stellio (MS), -David Leutwyler (DL), -Dominik Brunner (DB), -Corina Keller (CK), -Michael Steiner (MicS), -Stefan Rüdisühli (SR), +Nander Wever (NW), +Jacopo Canton (JC), Christian Steger (CS), -Kutay Dönmez (KD) - +Alina Yapparova (AY) _Minutes by Annika Lauber_ From de416b5385d4af793bf6a1076fbebc88b1d723c8 Mon Sep 17 00:00:00 2001 From: Annika Lauber Date: Thu, 20 Jun 2024 13:40:35 +0200 Subject: [PATCH 03/13] prepare --- events/icon_meetings/2024-2.md | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md index 8912c3e8..30f4cf5d 100644 --- a/events/icon_meetings/2024-2.md +++ b/events/icon_meetings/2024-2.md @@ -23,14 +23,16 @@ Clarissa Kroll (CK), Doris Folini (DF) ## Participants (Zoom) -Sylvain Ferrachat (SF), +Sylvaine Ferrachat (SF), Fabian Gessler (FG), Lukas Jansing (LJ), Mikael Stellio (MS), Nander Wever (NW), Jacopo Canton (JC), Christian Steger (CS), -Alina Yapparova (AY) +Alina Yapparova (AY), +David Leutwyler (DL), +Will Sawyer (WS) _Minutes by 
Annika Lauber_ @@ -38,4 +40,20 @@ _Minutes by Annika Lauber_ ## Reports ### C2SM (Michael Jähn, Annika Lauber, Matthieu Leclair, Jonas Jucker) -MJ welcomes everyone and shares the [latest news about ICON from C2SM](https://polybox.ethz.ch/index.php/s/4fXixLycOwOy41K). +MJ welcomes everyone and shares the [latest news about ICON from C2SM](https://polybox.ethz.ch/index.php/s/4fXixLycOwOy41K) (update link). + +### Fulden Baibeniz +### Emmanuele Russo +### Guillaume Bertolli +### Brigitta Goger +### Clarissa Kroll +### Doris Folini +### Sylvaine Ferrachat +### Lukas Jansing +### Mikael Stellio +### Nander Wever +### Jacopo Canton +### Christian Steger +### Alina Yapparova +### David Leutwyler +### Will Saywer From 55eeaeefac6888edf921e876c53a2914df76537d Mon Sep 17 00:00:00 2001 From: Annika Lauber Date: Thu, 20 Jun 2024 14:18:30 +0200 Subject: [PATCH 04/13] first notes --- events/icon_meetings/2024-2.md | 63 ++++++++++++++++++++++++++++++---- 1 file changed, 57 insertions(+), 6 deletions(-) diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md index 30f4cf5d..d9912556 100644 --- a/events/icon_meetings/2024-2.md +++ b/events/icon_meetings/2024-2.md @@ -32,6 +32,9 @@ Jacopo Canton (JC), Christian Steger (CS), Alina Yapparova (AY), David Leutwyler (DL), +Sven Kotlarski (SK), +Dominik Brunner (DB), +Arash Hamzehloo (AH), Will Sawyer (WS) @@ -42,18 +45,66 @@ _Minutes by Annika Lauber_ ### C2SM (Michael Jähn, Annika Lauber, Matthieu Leclair, Jonas Jucker) MJ welcomes everyone and shares the [latest news about ICON from C2SM](https://polybox.ethz.ch/index.php/s/4fXixLycOwOy41K) (update link). -### Fulden Baibeniz -### Emmanuele Russo +ML reporting on Alps. +CK asks how to do benchmarking. +Nobody really knows. (ML should write something here) + ### Guillaume Bertolli +Last 3 years build ML implementation for ecrad. +Next: instead of only predicting contrib of radiation on temp, the sum of all (ask JJ for more details). 
Still havent constructed dataset. +### Emmanuele Russo +icon-clm: finally found optimal calibration for euro-cordex domain. model perf pretty good, particulary compared to cosmo-clm. +applied several model developments, including routine for transient aerosol. +AL asks if it will be on CPU +ER: will be on CPU, the Germans are not interested in GPUs. + ### Brigitta Goger +uses icon below 1km. limited area setup in Austria +has accepted paper about it. +hecometric range: perhaps there is issue with smagorinsky model (clearly bias at least) +side project with NA: explored representation of clouds at 1km and 65m (1-moment and 2-moment): rather represent on resolution(?) than on microphysics scheme + + ### Clarissa Kroll +getting icon xtp setup, merge it on icon-exlcaim. +working on radi dumping option + ### Doris Folini -### Sylvaine Ferrachat -### Lukas Jansing -### Mikael Stellio +just here to inform herself on icon + +### Fulden Batibeniz +goal to use icon-clm configuration (hopefully on gpu) +needs to do benchmarking (not possible yet) +project about seasonal predictions. ### Nander Wever +works at slf. main task snow pack development. financed by mch. want to make it opr for next winter season. + +### Sylvaine Ferrachat +icon-ham: no new development since last meeting +going to do soon: set up icon-ham as global simulation. Climatology r2b8 ### Jacopo Canton +mostly working with icon4py. using icon mostly as validation. just here to be kept up to date +### Sven Kotlarski +climate department of mch. trys o link icon-clm develpment to climate scenarios. heard that it will take some more time but luckily the scenarios does not have real dependency but it would be nice to have somehing available. does not do any coding of icon (doesnt know about details) +### David Leutwyler +supports team X campaig in austria, writing proposal for cscs ### Christian Steger +numerical weather forecasting team. looking topographic setting for surface radiation. 
which parametrisations make sense to port +if you run on complex terrain on high resolution please let me know +### Mikael Stellio +works on icon-ham on gpu side. now working on tracer advection and trys to optimize it ### Alina Yapparova -### David Leutwyler +part of glory project. data simulation of 3d radar simulations. running icon with icon and emvorado (fixed bug in emvorado). +### Lukas Jansing +started working in send last september. was getting model ready for simulations. still need to finish product imigration. trying to find better configuration for production. still bias of temp in valleys. sarted working in glory-A. first set of 500m simulations. no significatn results so far. +### Fabiag Gessler +most of time getting icon operational on gpus. next thing is to get further perfomance improvement. using more nodes than they would like. will probably also contribute to glory project in future +### Arash Hamzehloo +porting components of ART to GPUs. testing new code based on co2 over zurich. +project was originally financed (?) by PASC + +### Dominik Brunner +do a lot of greenhouase gas modeling. GPU ports have been very essential. they are moving over now to GPU version +also do air pollution modelling. icon-art also does chemistry. couldnt do all the processes needed yet. has postdoc to do that. look at impact of net-zero scenarios: how would that ipact air polluution. had 2 day training course for icon-art: set up exercise to do air pollution setup on levante. 
+ ### Will Saywer From e7a71775b9731b3caa4f1b92e14d6611e30b5b69 Mon Sep 17 00:00:00 2001 From: Annika Lauber Date: Thu, 20 Jun 2024 15:16:05 +0200 Subject: [PATCH 05/13] notes Wills presentation --- events/icon_meetings/2024-2.md | 45 ++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md index d9912556..5db8ded9 100644 --- a/events/icon_meetings/2024-2.md +++ b/events/icon_meetings/2024-2.md @@ -108,3 +108,48 @@ do a lot of greenhouase gas modeling. GPU ports have been very essential. they a also do air pollution modelling. icon-art also does chemistry. couldnt do all the processes needed yet. has postdoc to do that. look at impact of net-zero scenarios: how would that ipact air polluution. had 2 day training course for icon-art: set up exercise to do air pollution setup on levante. ### Will Saywer +ML asks how WSs benchmarking will be able to be used +WL can be used but not exactly the same. + +ICON on Alps: How is it performaing? + +ML asks about higher resolution reaching less of the power cabinet (ask ML) +WS thinks they got the issue and will rerun the benchmarks. R2b9 is better but still not as good as 12b7 +DL: can you correlate the different steps. There are different steps. r2b7 +WS: initialisation but also in first timestep it reads boundary conditions, takes really large amount of time. This is much shorter for lower resolutions + + +After presentation +DL: would be valuable to have these results somewhere on wiki. +Also I guess you have a lot of tricks for doing benchmarks. Maybe you could share that? +WS: will definitely share that. No decision yet if they will enable low-noise mode. Sure that there will be at least a partiion availbale with low-noise mode. +DL: working on recommended configuration. Now you say use just-fit mode. Does it instantly drop below 50% +WS: no. WS personally does not agree with it but thinks you should run on best-fit configuration. 
Machine is so powerful that you will still get good time to solution. + +FB: also wriing proposla using icon-clm, still developing phase for gpu porting. Tested on daint because it doesnt run on ALPS. Maria Grazia said that scaling on daint will be rejected. +WS: Maria Grazia said that you can apply for accoutn to do early testing +FB: icon-clm not final yet and not tested on pre-alps. +MJ: they only have close setup. Could potentially do it on ALPS. Basic question if FB can use your benchmarking +WS: yes but it's definitely different configuration. Doesnt know if it could be a reason to be rejected. WS can ask Maria Grazia if that would help. +ML: thinks that benchmarking for icon-nwp could be useful for icon-clm. Maybe you could share namelist. +WS will share namelist. Can also give access to weak scalability test (to certain people) + +DB: showed ng savings. How much faster it is for given config to run on Alps compared to daint. Can we estimate from that comparison +WS: if you run gpu to gpu configuration, Alps is roughly a factor of 9 faster. But you wouldn't run it that way because it wuldnt fill memory, so you would use fewer nodes. + +JC: power cap per cabinet +WS actually it's per GPU. in terms of energy to solution, you can get it to run faster but not in a linear way. Grace CPU very very powerful, grace shares memory with GPU. people may want to run components on CPU. Problem is that it gives priority to CPU and then GPU is slowed down. We have to warn people because it will not necessarly perfomr well if you run large components on cpu. +ocean-atmosphere coupling is such a component. CSCS is pushsing for a GPU port. + +ML: what if you reduce cap? wouldnt you be able to have a good scaling +WS probably yes. its a politacl decision where to put the power cap +ML could it be different on differnt clusters +WS: doesnt know. He can ask + +DF: all tests were done within one cabinet. 
Is it expected that individula jobs run iwthin a single cabinet +WS: run other tests (r2b11) which does not fit into one cabinet. Looks like it has minimal impact. + +ML virtual cluste cap may not work because its virtual +WS doesnt know how elastic it is. Will go back to it + +WS will discuss what he can put on wiki and will ask MG about situation if daint benchmarks are sufficient From 0aaf6ae0ffd417756dc6d73efaedb0210a96b110 Mon Sep 17 00:00:00 2001 From: Annika Lauber Date: Fri, 21 Jun 2024 11:46:46 +0200 Subject: [PATCH 06/13] Change notes into sentences --- events/icon_meetings/2024-2.md | 176 ++++++++++++++++++++------------- 1 file changed, 105 insertions(+), 71 deletions(-) diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md index 5db8ded9..59cd268d 100644 --- a/events/icon_meetings/2024-2.md +++ b/events/icon_meetings/2024-2.md @@ -43,113 +43,147 @@ _Minutes by Annika Lauber_ ## Reports ### C2SM (Michael Jähn, Annika Lauber, Matthieu Leclair, Jonas Jucker) -MJ welcomes everyone and shares the [latest news about ICON from C2SM](https://polybox.ethz.ch/index.php/s/4fXixLycOwOy41K) (update link). +MJ welcomes everyone and shares the [latest news about ICON from C2SM](https://polybox.ethz.ch/index.php/s/4fXixLycOwOy41K) ( **TODO MJ**: update link). ML reporting on Alps. CK asks how to do benchmarking. -Nobody really knows. (ML should write something here) +Nobody really knows. ( **TODO ML**: ML should write something here) ### Guillaume Bertolli -Last 3 years build ML implementation for ecrad. -Next: instead of only predicting contrib of radiation on temp, the sum of all (ask JJ for more details). Still havent constructed dataset. +Over the last three years, GB has been focused on building a machine learning implementation for ECRAD. Next, instead of just predicting the contribution of radiation on temperature, the plan is to predict the sum of all contributing factors ( **TODO JJ**: JJ is that correct?). 
The dataset needed for this next phase has not yet been constructed. + ### Emmanuele Russo -icon-clm: finally found optimal calibration for euro-cordex domain. model perf pretty good, particulary compared to cosmo-clm. -applied several model developments, including routine for transient aerosol. -AL asks if it will be on CPU -ER: will be on CPU, the Germans are not interested in GPUs. +ER shared that, for ICON-CLM, they have finally found the optimal calibration for the EURO-CORDEX domain. The model performs well, especially compared to COSMO-CLM. They applied several model developments, including a routine for transient aerosol. + +AL asked whether it will run on CPU, since support for transient aerosol datasets has not yet been ported to GPU. + +ER confirmed that it will run on CPU for now and asked when the port is expected to be ready. + +AL answered that this is hard to predict, as the port has to be done from scratch. ### Brigitta Goger -uses icon below 1km. limited area setup in Austria -has accepted paper about it. -hecometric range: perhaps there is issue with smagorinsky model (clearly bias at least) -side project with NA: explored representation of clouds at 1km and 65m (1-moment and 2-moment): rather represent on resolution(?) than on microphysics scheme +BG reported that she is using ICON at resolutions below 1 km. A paper using the limited-area setup over Austria has been accepted. Regarding the hectometric range, BG mentioned that there might be an issue with the Smagorinsky model; at the very least, there is a clear bias. +As a side project with Nadja Omanovic, they explored the representation of clouds at 1 km and 65 m resolution, using both 1-moment and 2-moment schemes. They concluded that, for representing clouds, resolution matters more than the choice of microphysics scheme. ### Clarissa Kroll -getting icon xtp setup, merge it on icon-exlcaim. -working on radi dumping option +CK reported that they are getting the ICON XTP ( **TODO ML**: ML can you correct?
XTP is somehow what I understood but I guess I misunderstood something) setup and plan to merge it with ICON-EXCLAIM. Additionally, they are working on a radiative dumping option. ### Doris Folini -just here to inform herself on icon +DF is just here to inform herself on ICON. She was part of Christoph Schär's group, which no longer exists. + +### Fulden Batıbeniz +FB reported that her goal is to use the ICON-CLM configuration, hopefully on GPU. However, they need to do benchmarking, which is not possible yet. FB's project focuses on seasonal predictions. -### Fulden Batibeniz -goal to use icon-clm configuration (hopefully on gpu) -needs to do benchmarking (not possible yet) -project about seasonal predictions. ### Nander Wever -works at slf. main task snow pack development. financed by mch. want to make it opr for next winter season. +NW reported that he works at SLF, with his main task being snowpack development. They aim to make it operational for the next winter season. ### Sylvaine Ferrachat -icon-ham: no new development since last meeting -going to do soon: set up icon-ham as global simulation. Climatology r2b8 +SF reported that there has been no new development on ICON-HAM since the last meeting. However, they plan to soon set up ICON-HAM as a global simulation with climatology at the R2B8 resolution. + ### Jacopo Canton -mostly working with icon4py. using icon mostly as validation. just here to be kept up to date +JC reported that he is mostly working with ICON4py and using ICON primarily for validation. He is attending the meeting to stay up to date. + ### Sven Kotlarski -climate department of mch. trys o link icon-clm develpment to climate scenarios. heard that it will take some more time but luckily the scenarios does not have real dependency but it would be nice to have somehing available. 
does not do any coding of icon (doesnt know about details) +SK, from the climate department of MCH, mentioned their efforts to link ICON-CLM development to climate scenarios. They understand that this integration will take more time, but fortunately, the scenarios do not have a real dependency. They expressed the desire to have something available soon. SK clarified that he does not engage in coding for ICON and is not familiar with the technical details. + ### David Leutwyler -supports team X campaig in austria, writing proposal for cscs +DL supports Team X's campaign in Austria and is working on writing proposals for CSCS. + ### Christian Steger -numerical weather forecasting team. looking topographic setting for surface radiation. which parametrisations make sense to port -if you run on complex terrain on high resolution please let me know +CS, from the numerical weather forecasting team, is currently investigating the topographic setting for surface radiation. They are exploring which parameterizations would be appropriate to port. CS asks people to let him know if they are working with complex terrain at high resolutions. + ### Mikael Stellio -works on icon-ham on gpu side. now working on tracer advection and trys to optimize it +MS is currently focused on the GPU side of ICON-HAM and is working on optimizing tracer advection. + ### Alina Yapparova -part of glory project. data simulation of 3d radar simulations. running icon with icon and emvorado (fixed bug in emvorado). +AY is part of the GLORI-A project and is involved in simulating 3D radar data. They are currently running ICON with Emvorado, and recently fixed a bug in Emvorado. + ### Lukas Jansing -started working in send last september. was getting model ready for simulations. still need to finish product imigration. trying to find better configuration for production. still bias of temp in valleys. sarted working in glory-A. first set of 500m simulations. no significatn results so far. 
-### Fabiag Gessler -most of time getting icon operational on gpus. next thing is to get further perfomance improvement. using more nodes than they would like. will probably also contribute to glory project in future +LJ started working in MCH-SEN last September, focusing on preparing the model for simulations. They are still in the process of finishing product migration and are experimenting with finding a better configuration for production. LJ noted there is still a temperature bias in valleys that needs addressing. Additionally, they have begun working on GLORI-A, conducting the first set of 500m simulations, though no significant results have emerged thus far. + +### Fabian Gessler +FG spends most of his time getting ICON operational on GPUs and is currently focused on achieving further performance improvements. They are still using more nodes than preferred. FG will probably be contributing to the GLORI-A project in the future. + ### Arash Hamzehloo -porting components of ART to GPUs. testing new code based on co2 over zurich. -project was originally financed (?) by PASC +AH is currently porting components of ART to GPUs and testing new code using CO2 over Zurich. ### Dominik Brunner -do a lot of greenhouase gas modeling. GPU ports have been very essential. they are moving over now to GPU version -also do air pollution modelling. icon-art also does chemistry. couldnt do all the processes needed yet. has postdoc to do that. look at impact of net-zero scenarios: how would that ipact air polluution. had 2 day training course for icon-art: set up exercise to do air pollution setup on levante. +DB mentioned that his group is working a lot on greenhouse gas modeling and has found GPU ports to be crucial. They are currently transitioning to the GPU version and also engage in air pollution modeling. ICON-ART includes chemistry capabilities, although they haven't yet implemented all required processes; a postdoc is dedicated to this task. 
+ +They are also examining the impact of net-zero scenarios on air pollution. Recently, they conducted a two-day training course for ICON-ART, including a setup exercise for air pollution modeling on Levante. + +### Will Sawyer presenting "ICON on Alps: How is it performing?" +ML asks how WS's benchmarking can be used. + +WS responds: It can be used, but not exactly the same. + +
+ +ML asks about achieving higher resolution with less use of the power cabinet. + +WS believes they have identified the issue and will rerun the benchmarks. R2B9 shows improvement but still doesn't match R2B7. + +DL asks if WS can correlate the different steps. There are various steps in R2B7. + +WS explains that initialization is a key step, and in the first timestep, reading boundary conditions takes a significant amount of time. This process is much quicker for lower resolutions. + +
+ +DL remarks that it would be valuable to have these results documented somewhere on the wiki. He also speculates that WS has many tricks for conducting benchmarks and asks if WS could share them. + +WS responds that he will definitely share those tricks. He mentions there's no decision yet on enabling low-noise mode but assures that there will be at least one partition available with low-noise mode. + +DL then mentions he's working on a recommended configuration. He asks if not using just-fit mode instantly drops below 50%. + +WS clarifies that it doesn't. Personally, WS disagrees with using just-fit mode and suggests running on the best-fit configuration instead. He points out that the machine is so powerful that it will still yield a good time-to-solutionnot . + +
+ +FB mentions she is writing a proposal using ICON-CLM, which is still in the development phase for GPU porting. They tested it on Daint because it doesn't run on ALPS. Maria Grazia mentioned that scaling on Daint will likely be rejected. + +WS responds that Maria Grazia mentioned they can apply for an account to conduct early testing. + +FB adds that ICON-CLM is not finalized yet and hasn't been tested on pre-ALPS systems. + +MJ notes they have a similar setup and could potentially run it on ALPS. They ask if FB can use WS's benchmarking. + +WS confirms that FB can use his benchmarking, but notes that it uses a different configuration. WS is not sure whether this could be a reason for rejection and offers to ask Maria Grazia whether it would help. + +ML suggests that benchmarking for ICON-NWP could also be useful for ICON-CLM. They ask if WS could share the namelist. + +WS agrees to share the namelist and mentions that he can also provide access to weak scalability tests for certain people. + +
+ +DB asks how much faster a given configuration runs on Alps compared to Daint and if one can estimate from that comparison. + +WS responds that If you run GPU-to-GPU configuration, Alps is roughly a factor of 9 faster. However, this approach wouldn't fully utilize memory, so you would use fewer nodes. + +
+ +JC asks about the power cap per cabinet. -### Will Saywer -ML asks how WSs benchmarking will be able to be used -WL can be used but not exactly the same. +WS corrects that it's actually per GPU. In terms of energy-to-solution, you can make it run faster but not in a linear way. Grace CPU is very powerful and shares memory with GPU. Some people may want to run components on CPU, but this gives priority to the CPU, potentially slowing down the GPU. WS notes they need to caution users because running large components on the CPU might not perform well. Ocean-atmosphere coupling is an example of such a component. CSCS is advocating for a GPU port to address these issues. -ICON on Alps: How is it performaing? +
-ML asks about higher resolution reaching less of the power cabinet (ask ML) -WS thinks they got the issue and will rerun the benchmarks. R2b9 is better but still not as good as 12b7 -DL: can you correlate the different steps. There are different steps. r2b7 -WS: initialisation but also in first timestep it reads boundary conditions, takes really large amount of time. This is much shorter for lower resolutions +ML asks how reducing the cap would affect scaling. +WS responds that it would likely improve scaling, but emphasizes that setting the power cap involves political decisions. -After presentation -DL: would be valuable to have these results somewhere on wiki. -Also I guess you have a lot of tricks for doing benchmarks. Maybe you could share that? -WS: will definitely share that. No decision yet if they will enable low-noise mode. Sure that there will be at least a partiion availbale with low-noise mode. -DL: working on recommended configuration. Now you say use just-fit mode. Does it instantly drop below 50% -WS: no. WS personally does not agree with it but thinks you should run on best-fit configuration. Machine is so powerful that you will still get good time to solution. +ML queries if the power cap could vary across different clusters. -FB: also wriing proposla using icon-clm, still developing phase for gpu porting. Tested on daint because it doesnt run on ALPS. Maria Grazia said that scaling on daint will be rejected. -WS: Maria Grazia said that you can apply for accoutn to do early testing -FB: icon-clm not final yet and not tested on pre-alps. -MJ: they only have close setup. Could potentially do it on ALPS. Basic question if FB can use your benchmarking -WS: yes but it's definitely different configuration. Doesnt know if it could be a reason to be rejected. WS can ask Maria Grazia if that would help. -ML: thinks that benchmarking for icon-nwp could be useful for icon-clm. Maybe you could share namelist. -WS will share namelist. 
Can also give access to weak scalability test (to certain people) +WS replies that he is unsure but can inquire about it. -DB: showed ng savings. How much faster it is for given config to run on Alps compared to daint. Can we estimate from that comparison -WS: if you run gpu to gpu configuration, Alps is roughly a factor of 9 faster. But you wouldn't run it that way because it wuldnt fill memory, so you would use fewer nodes. +
-JC: power cap per cabinet -WS actually it's per GPU. in terms of energy to solution, you can get it to run faster but not in a linear way. Grace CPU very very powerful, grace shares memory with GPU. people may want to run components on CPU. Problem is that it gives priority to CPU and then GPU is slowed down. We have to warn people because it will not necessarly perfomr well if you run large components on cpu. -ocean-atmosphere coupling is such a component. CSCS is pushsing for a GPU port. +DF mentions that all tests were conducted within one cabinet and asks if it's expected for individual jobs to run within a single cabinet. -ML: what if you reduce cap? wouldnt you be able to have a good scaling -WS probably yes. its a politacl decision where to put the power cap -ML could it be different on differnt clusters -WS: doesnt know. He can ask +WS responds by mentioning they ran other tests (R2B11) that did not fit into one cabinet, noting it had minimal impact. -DF: all tests were done within one cabinet. Is it expected that individula jobs run iwthin a single cabinet -WS: run other tests (r2b11) which does not fit into one cabinet. Looks like it has minimal impact. +ML comments that a virtual cluster cap may not work because it's virtual. -ML virtual cluste cap may not work because its virtual -WS doesnt know how elastic it is. Will go back to it +WS admits he's uncertain about how elastic it is and plans to revisit the issue. -WS will discuss what he can put on wiki and will ask MG about situation if daint benchmarks are sufficient +WS states he will discuss what information can be shared on the wiki and will inquire with Maria Grazia about whether benchmarks from Daint are sufficient. 
\ No newline at end of file From 0b57dfd60cb8d0c499232a4994e345799929710b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Fri, 21 Jun 2024 14:46:54 +0200 Subject: [PATCH 07/13] Update polybox link --- events/icon_meetings/2024-2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md index 59cd268d..8ea7ea6e 100644 --- a/events/icon_meetings/2024-2.md +++ b/events/icon_meetings/2024-2.md @@ -43,7 +43,7 @@ _Minutes by Annika Lauber_ ## Reports ### C2SM (Michael Jähn, Annika Lauber, Matthieu Leclair, Jonas Jucker) -MJ welcomes everyone and shares the [latest news about ICON from C2SM](https://polybox.ethz.ch/index.php/s/4fXixLycOwOy41K) ( **TODO MJ**: update link). +MJ welcomes everyone and shares the [latest news about ICON from C2SM](https://polybox.ethz.ch/index.php/s/n65Nnj4L48Y4WEe). ML reporting on Alps. CK asks how to do benchmarking. From ac2fc2b09f0e7bf8012e0dd51c3532ee0dd7bd36 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Fri, 21 Jun 2024 14:50:29 +0200 Subject: [PATCH 08/13] Add dividing line instead of
--- events/icon_meetings/2024-2.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md index 8ea7ea6e..2e8397ce 100644 --- a/events/icon_meetings/2024-2.md +++ b/events/icon_meetings/2024-2.md @@ -118,7 +118,7 @@ ML asks how WS's benchmarking will be able to be used. WS responds: It can be used, but not exactly the same. -
+--- ML asks about achieving higher resolution with less use of the power cabinet. @@ -128,7 +128,7 @@ DL asks if WS can correlate the different steps. There are various steps in R2B7 WS explains that initialization is a key step, and in the first timestep, reading boundary conditions takes a significant amount of time. This process is much quicker for lower resolutions. -
+--- DL remarks that it would be valuable to have these results documented somewhere on the wiki. He also speculates that WS has many tricks for conducting benchmarks and asks if WS could share them. @@ -138,7 +138,7 @@ DL then mentions he's working on a recommended configuration. He asks if not usi WS clarifies that it doesn't. Personally, WS disagrees with using just-fit mode and suggests running on the best-fit configuration instead. He points out that the machine is so powerful that it will still yield a good time-to-solutionnot . -
+--- FB mentions she is writing a proposal using ICON-CLM, which is still in the development phase for GPU porting. They tested it on Daint because it doesn't run on ALPS. Maria Grazia mentioned that scaling on Daint will likely be rejected. @@ -154,19 +154,19 @@ ML suggests that benchmarking for ICON-NWP could also be useful for ICON-CLM. Th WS agrees to share the namelist and mentions that he can also provide access to weak scalability tests for certain people. -
+--- DB asks how much faster a given configuration runs on Alps compared to Daint and if one can estimate from that comparison. WS responds that If you run GPU-to-GPU configuration, Alps is roughly a factor of 9 faster. However, this approach wouldn't fully utilize memory, so you would use fewer nodes. -
+--- JC asks about the power cap per cabinet. WS corrects that it's actually per GPU. In terms of energy-to-solution, you can make it run faster but not in a linear way. Grace CPU is very powerful and shares memory with GPU. Some people may want to run components on CPU, but this gives priority to the CPU, potentially slowing down the GPU. WS notes they need to caution users because running large components on the CPU might not perform well. Ocean-atmosphere coupling is an example of such a component. CSCS is advocating for a GPU port to address these issues. -
+--- ML asks how reducing the cap would affect scaling. @@ -176,7 +176,7 @@ ML queries if the power cap could vary across different clusters. WS replies that he is unsure but can inquire about it. -
+--- DF mentions that all tests were conducted within one cabinet and asks if it's expected for individual jobs to run within a single cabinet. From b1707296e673f48dded6b08c11b329cfb081f7f2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Fri, 21 Jun 2024 15:04:15 +0200 Subject: [PATCH 09/13] Clarify Clarissa's report --- events/icon_meetings/2024-2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md index 2e8397ce..d8fbb038 100644 --- a/events/icon_meetings/2024-2.md +++ b/events/icon_meetings/2024-2.md @@ -67,7 +67,7 @@ BG reported that she is using ICON below 1km. There has been a paper accepted us As a side project with Nadja Omanovic, they explored the representation of clouds at 1km and 65m resolutions, using both 1-moment and 2-moment schemes. They concluded that to represent clouds resolution is more important than on the microphysics scheme. ### Clarissa Kroll -CK reported that they are getting the ICON XTP ( **TODO ML**: ML can you correct? XTP is somehow what I understood but I guess I misunderstood something) setup and plan to merge it with ICON-EXCLAIM. Additionally, they are working on a radiative dumping option. +CK reported that they are using the icon-seamless/icon-xpp setup and plan to merge it with ICON-EXCLAIM. Additionally, they are working on a radiative dumping option. ### Doris Folini DF is just here to inform herself on ICON. She was part of Christoph Schär's group, which no longer exists. 
From 7b3f200ad55c42e6b30d74e11b7b3e90ae55ded1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Fri, 21 Jun 2024 15:14:35 +0200 Subject: [PATCH 10/13] Even more clarification --- events/icon_meetings/2024-2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md index d8fbb038..d7371715 100644 --- a/events/icon_meetings/2024-2.md +++ b/events/icon_meetings/2024-2.md @@ -67,7 +67,7 @@ BG reported that she is using ICON below 1km. There has been a paper accepted us As a side project with Nadja Omanovic, they explored the representation of clouds at 1km and 65m resolutions, using both 1-moment and 2-moment schemes. They concluded that to represent clouds resolution is more important than on the microphysics scheme. ### Clarissa Kroll -CK reported that they are using the icon-seamless/icon-xpp setup and plan to merge it with ICON-EXCLAIM. Additionally, they are working on a radiative dumping option. +CK reported that they are using the ICON-Seamless/ICON-xpp (extended predictions and projections) setup and plan to merge it with ICON-EXCLAIM. Additionally, they are working on a radiative dumping option. ### Doris Folini DF is just here to inform herself on ICON. She was part of Christoph Schär's group, which no longer exists. From 53c8149e6a21aff8f96f97cc9f79fe78ad20b4da Mon Sep 17 00:00:00 2001 From: Matthieu Date: Mon, 24 Jun 2024 09:07:16 +0200 Subject: [PATCH 11/13] add(icon meetings): ML answer to CK --- events/icon_meetings/2024-2.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md index d7371715..c0dd049f 100644 --- a/events/icon_meetings/2024-2.md +++ b/events/icon_meetings/2024-2.md @@ -47,7 +47,8 @@ MJ welcomes everyone and shares the [latest news about ICON from C2SM](https://p ML reporting on Alps. CK asks how to do benchmarking. -Nobody really knows. 
( **TODO ML**: ML should write something here) +Nobody really knows. +ML answers so far the only piece of information available is WS's benchmark (presentation to come). ### Guillaume Bertolli Over the last three years, GB has been focused on building a machine learning implementation for ECRAD. Next, instead of just predicting the contribution of radiation on temperature, the plan is to predict the sum of all contributing factors ( **TODO JJ**: JJ is that correct?). The dataset needed for this next phase has not yet been constructed. @@ -186,4 +187,4 @@ ML comments that a virtual cluster cap may not work because it's virtual. WS admits he's uncertain about how elastic it is and plans to revisit the issue. -WS states he will discuss what information can be shared on the wiki and will inquire with Maria Grazia about whether benchmarks from Daint are sufficient. \ No newline at end of file +WS states he will discuss what information can be shared on the wiki and will inquire with Maria Grazia about whether benchmarks from Daint are sufficient. From fd08e631d2bfc6b2fb2952449492939e3f81debf Mon Sep 17 00:00:00 2001 From: juckerj <39263956+jonasjucker@users.noreply.github.com> Date: Mon, 24 Jun 2024 09:58:02 +0200 Subject: [PATCH 12/13] Update 2024-2.md --- events/icon_meetings/2024-2.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md index c0dd049f..3c96f054 100644 --- a/events/icon_meetings/2024-2.md +++ b/events/icon_meetings/2024-2.md @@ -51,7 +51,7 @@ Nobody really knows. ML answers so far the only piece of information available is WS's benchmark (presentation to come). ### Guillaume Bertolli -Over the last three years, GB has been focused on building a machine learning implementation for ECRAD. Next, instead of just predicting the contribution of radiation on temperature, the plan is to predict the sum of all contributing factors ( **TODO JJ**: JJ is that correct?). 
The dataset needed for this next phase has not yet been constructed. +Over the last three years, GB has been focused on building a machine learning implementation for ECRAD. Next, instead of just predicting the contribution of radiation, the plan is to predict the sum of all physics contributing to tendencies. The dataset needed for this next phase has not yet been constructed. ### Emmanuele Russo ER shared that for ICON-CLM, they finally found the optimal calibration for the EURO-CORDEX domain. The model performance is pretty good, especially compared to COSMO-CLM. They applied several model developments, including a routine for transient aerosol. From d530fdd030c421e116f91560a7d1261dfeb4ee5c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michael=20J=C3=A4hn?= Date: Mon, 24 Jun 2024 16:01:52 +0200 Subject: [PATCH 13/13] Add presentation and repo links --- events/icon_meetings/2024-2.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/events/icon_meetings/2024-2.md b/events/icon_meetings/2024-2.md index 3c96f054..9bb677fe 100644 --- a/events/icon_meetings/2024-2.md +++ b/events/icon_meetings/2024-2.md @@ -115,6 +115,12 @@ DB mentioned that his group is working a lot on greenhouse gas modeling and has They are also examining the impact of net-zero scenarios on air pollution. Recently, they conducted a two-day training course for ICON-ART, including a setup exercise for air pollution modeling on Levante. ### Will Saywer presenting "ICON on Alps: How is it performing?" + +- [Presentation (pdf)](https://polybox.ethz.ch/index.php/s/04cZMyZjIheyxWc) +- [ICON benchmarks](https://github.com/C2SM/icon-alps-weak) + +#### Discussion + ML asks how WS's benchmarking will be able to be used. WS responds: It can be used, but not exactly the same. @@ -137,7 +143,7 @@ WS responds that he will definitely share those tricks. He mentions there's no d DL then mentions he's working on a recommended configuration. 
He asks if not using just-fit mode instantly drops below 50%. -WS clarifies that it doesn't. Personally, WS disagrees with using just-fit mode and suggests running on the best-fit configuration instead. He points out that the machine is so powerful that it will still yield a good time-to-solutionnot . +WS clarifies that it doesn't. Personally, WS disagrees with using just-fit mode and suggests running on the best-fit configuration instead. He points out that the machine is so powerful that it will still yield a good time-to-solution. ---