STOP-ALL ERROR: Error in netcdf routine #108
Hi Alexander,
("Input/output error" is the error message from the NetCDF library, and "STOP-ALL ERROR: Error in netcdf routine" is the error message from the EMEP model.)
Hi,
Ah, this shows another error. Edit: actually, the error is "too many words", which is something else. It comes from the colons in the meteo file name. It is a weakness in the code that it fails for such names, because the sites/sondes use the colon as a separator for some internal name handling. It should still work as intended, though (I think!), so this is not what is causing the main error.
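To illustrate why a colon in a file name can trip a colon-separated parser, here is a minimal shell sketch. It assumes, as the comment describes, that the name is split on `:`; `colon_fields` is a hypothetical helper (not part of EMEP), and `wrfout_d01_2015-01-17_00:00:00` is a hypothetical WRF-style file name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many fields a name yields when split on ':'.
# A parser that expects a fixed number of colon-separated words would report
# "too many words" when this count exceeds what it expects.
colon_fields() {
  local name="$1"
  # awk -F: splits the line on colons; NF is the resulting number of fields.
  printf '%s\n' "$name" | awk -F: '{ print NF }'
}

# Hypothetical WRF-style name: the HH:MM:SS timestamp adds two extra colons.
colon_fields "wrfout_d01_2015-01-17_00:00:00"   # prints 3
colon_fields "wrfout_d01_2015-01-17"            # prints 1
```

A name without colons yields a single field, so only names carrying a time stamp with colons are affected.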
It happens randomly. The log file I sent is from one of the many tests I did; this one is based on the meteo files provided by Massimo. With my own meteo files it stopped, for example, on January 17th; the next time, after relaunching, somewhere in June; and another time in October.
Actually, I do not think the colons make a difference; otherwise it would not work at all.
As I wrote in my email, I have seen this kind of behaviour when there is not enough disk quota for the outputs of the full run. Can you confirm that you have enough disk quota for the full run? Another cause of this erratic behaviour is problems with linking in the runtime environment. That is why I asked for the
Some of the libraries are not found at runtime, which puzzles me.
Did you build your Docker image in two stages? If so, you might not have copied all the runtime dependencies from the first-stage (build) image to the second-stage (runtime) image.
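The two runtime checks mentioned above (unresolved shared libraries and available disk space) can be sketched in shell as follows. This is a minimal sketch, not an EMEP tool: `missing_libs` and `check_runtime` are hypothetical helpers, and the executable and output-directory arguments are assumptions. Note that `df` reports free space on the filesystem, which is not the same as a per-user quota; on quota-managed systems the site's own quota tool is the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of shared libraries that ldd reports
# as "not found". It reads ldd output on stdin, so it can be tested without
# a real binary.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Hypothetical helper: run both checks for an executable and an output dir.
check_runtime() {
  local exe="$1" outdir="$2"
  local missing
  missing=$(ldd "$exe" | missing_libs)
  if [ -n "$missing" ]; then
    echo "Missing shared libraries:" >&2
    echo "$missing" >&2
  fi
  # Free space (1K blocks) on the filesystem holding the output directory;
  # -P forces the single-line POSIX format, so field 4 is "Available".
  df -Pk "$outdir" | awk 'NR == 2 { print "Available KiB:", $4 }'
}
```

Usage, with a hypothetical executable name and path: `check_runtime ./emepctm /path/to/output`. Running it inside the runtime container, rather than the build container, is what reveals libraries that were not copied across.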
Hi Alvaro, here is the Dockerfile we use. It's very old and we have not updated it until now because it was working well. We compiled the new EMEP executable inside a container from this image, and I thought this was safe because it would link against the libraries present in the image.

FROM debian:jessie
LABEL project="IAM-SUL" \
author="Dario Rodriguez" \
image_name="" \
version="1.0" \
released="2018-12-12" \
software_versions="OpenMPI 3.0 NCDF 4 Fortran 95" \
description="EMEP air quality model, for training and validation of the SHERPA simplified model"
ENV DEBIAN_FRONTEND=noninteractive
ENV TERM xterm
ENV DISPLAY :1.0
ENV LC_ALL C.UTF-8
RUN apt-get update && apt-get -yq install gcc gfortran g++ \
build-essential \
tar \
bzip2 \
m4 \
zlib1g-dev \
libopenmpi-dev \
curl \
wget
RUN apt-get install -y apt-utils
RUN apt-get install -y libnetcdf-dev
COPY packages/hdf5-1.10.3.tar.bz2 hdf5-1.10.3.tar.bz2
COPY packages/netcdf-c-4.6.2.tar.gz netcdf-c-4.6.2.tar.gz
#COPY packages/netcdf-4.3.3.1.tar.gz netcdf-4.3.3.1.tar.gz
#COPY packages/netcdf-4.3.2.tar.gz netcdf-4.3.2.tar.gz
#COPY packages/netcdf-cxx4-4.2.1.tar.gz netcdf-cxx4-4.2.1.tar.gz
COPY packages/netcdf-fortran-4.4.4.tar.gz netcdf-fortran-4.4.4.tar.gz
#Build HDF5
RUN tar xjvf hdf5-1.10.3.tar.bz2 && \
cd hdf5-1.10.3 && \
CC=mpicc ./configure --enable-parallel --prefix=/usr/local && \
make -j4 && \
make install && \
cd .. && \
rm -rf /hdf5-1.10.3 /hdf5-1.10.3.tar.bz2
RUN apt-get install -y libcurl3 libcurl4-gnutls-dev
#Build netcdf
RUN tar xzvf netcdf-c-4.6.2.tar.gz && \
cd netcdf-c-4.6.2 && \
./configure --prefix=/usr \
CC=mpicc \
LDFLAGS=-L/usr/local/lib \
CFLAGS=-I/usr/local/include && \
make -j4 && \
make install && \
cd .. && \
rm -rf netcdf-c-4.6.2 netcdf-c-4.6.2.tar.gz
#RUN tar xzvf netcdf-cxx4-4.2.1.tar.gz && \
# cd netcdf-cxx4-4.2.1 && \
# ./configure --prefix=/usr/local \
# CC=mpicc \
# LDFLAGS=-L/usr/local/lib \
# CFLAGS=-I/usr/local/include && \
# make check && make -j4 && \
# make install && \
# cd .. && \
#rm -rf netcdf-cxx4-4.2.1 netcdf-cxx4-4.2.1.tar.gz
ENV LD_LIBRARY_PATH /usr/local/lib
RUN tar xzvf netcdf-fortran-4.4.4.tar.gz && \
cd netcdf-fortran-4.4.4 && \
./configure --prefix=/usr/local CC=/usr/bin/mpicc FC=/usr/bin/gfortran LDFLAGS=-L/usr/local/lib CFLAGS=-I/usr/local/include && \
make && make install && \
cd .. && \
rm -rf netcdf-fortran-4.4.4 netcdf-fortran-4.4.4.tar.gz
##install apt utils and sudo
RUN apt-get install -y sudo
RUN apt-get install -y openmpi-bin openmpi-common openssh-client openssh-server libopenmpi1.6 libopenmpi1.6-dbg
ENV MY_HOME=/home/iamsulproc
RUN export uid=35727 gid=41068 \
&& mkdir -p ${MY_HOME} \
&& echo "iamsulproc:x:${uid}:${gid}:iamsulproc,,:${MY_HOME}:/bin/bash" >> /etc/passwd \
&& echo "iamsulproc:x:${uid}:" >> /etc/group \
&& chown ${uid}:${gid} -R ${MY_HOME} \
&& echo "iamsulproc ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/iamsulproc \
&& chmod 0440 /etc/sudoers.d/iamsulproc
RUN apt-get install -y --no-install-recommends \
vim \
python-crypto \
python-dateutil \
python-dev \
python-lxml \
python-numpy \
python-openssl \
python-pip \
python-psycopg2 \
python-scipy \
python-urllib3 \
python-colorama \
python-distlib \
python-html5lib \
python-pkg-resources \
python-requests \
python-scipy \
python-setuptools \
python-six \
python-wheel \
python-pip-whl \
swig
RUN apt-get install -y unzip
USER iamsulproc
ENV HOME /home/iamsulproc
CMD /bin/bash

Which procedure do you suggest?
This is the base image for EMEP model compilation and runtime.
The last time, with this unstable behaviour, we compiled the model inside a container from this image, with the EMEP code in a mounted external folder. So the libraries should have been taken from the image itself.
If I understand your answer correctly, you're compiling the model from within a Docker container The
Sorry, I explained myself badly. We compiled the model within a container and also run it there. We just keep the code physically outside the container, in a mounted folder, so it is inside the container at compile time. But what you point out from the ldd result puzzles me; I don't understand this result.
OK, now it is much clearer. Can I see the Dockerfile for the model compilation and runtime? The attached file is an example multi-stage build for the EMEP MSC-W model from two years ago.
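In a multi-stage build, one common way to make sure the runtime stage gets every shared library the executable needs is to list the paths that `ldd` resolves and copy those files across. The sketch below assumes that approach; `resolved_libs` and `stage_libs` are hypothetical helpers, and the staging directory is an assumption. The same list also shows which netCDF/HDF5 installation the executable actually picks up (for example `/usr/lib` versus `/usr/local/lib`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the absolute paths ldd resolves libraries to.
# ldd lines look like "<tab>libfoo.so.1 => /path/libfoo.so.1 (0x...)".
# Lines without "=>" (vdso, dynamic loader) and "=> not found" are skipped.
resolved_libs() {
  awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
}

# Hypothetical helper: copy each resolved library into a staging directory,
# preserving its directory layout, e.g. for a later "COPY --from=build" step.
stage_libs() {
  local exe="$1" dest="$2"
  ldd "$exe" | resolved_libs | while read -r lib; do
    mkdir -p "$dest$(dirname "$lib")"
    # -L dereferences symlinks so the real file lands under the soname path.
    cp -L "$lib" "$dest$lib"
  done
}
```

For a quick look at the netCDF side only: `ldd ./emepctm | resolved_libs | grep -E 'netcdf|hdf5'` (the executable name is hypothetical). Since the dynamic loader is skipped, the runtime base image must already provide it.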
Dear,
I am contacting you about a problem I am experiencing with the EMEP model using WRF meteorology. I previously contacted you via email, but I agree it's better to contact you via GitHub.
I am running the model (rv4_34) on a Docker system here at the JRC (Ispra); see issue #76.
With the IFS meteo the EMEP model works fine, but when I use WRF meteo it gives me the following error:
"Input/output error
STOP-ALL ERROR: Error in netcdf routine".
This happens on a random day each time. When I relaunch the model, it stops, let's say, in February. When I relaunch it again, it stops in June (and the February problem does not show up anymore), with the same error message.
I've created new WRF meteo files (with different physics) at a different resolution to reduce the file size. Currently I am trying to run the model with WRF meteo files of ~2.0 GB.
I have no idea what the problem is. I am also in contact with Massimo Vieno, and he told me to add this line in NetCDF_mod.f90:
I still get the same error message.
As you suggested, I've tried the command ldd name_executable. This is what I get:
Also, this netCDF 4.6.2 has been built with the following features:
Are we missing something when we compile the EMEP code?
Best regards,
Alexander de Meij
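The feature list mentioned in the report appears to be the output of `nc-config --all`, which starts with "This netCDF 4.6.2 has been built with the following features:". A minimal sketch for pulling one `--has-...` flag out of that output, assuming each feature line has the form `--has-nc4 -> yes`; `has_feature` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value ("yes"/"no") of one --has-<name> flag
# from `nc-config --all` output on stdin, or "unknown" if it is not listed.
has_feature() {
  local name="$1"
  awk -v key="--has-$name" '
    $1 == key && $2 == "->" { print $3; found = 1 }
    END { if (!found) print "unknown" }'
}

# Example (requires nc-config on PATH):
#   nc-config --all | has_feature nc4
#   nc-config --all | has_feature hdf5
```

Comparing this output between the build and runtime containers is a quick way to confirm that both see the same netCDF build.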