ESGF Node FAQ
Wiki Reorganisation: This page has been classified for reorganisation and given the category REVISE. It contains useful content but needs revision; it may contain out-of-date or inaccurate content.
The release names follow an alphabetical list of Brooklyn (NYC) neighborhoods.
Currently we test and build on CentOS/RedHat 5. There are installations that have been done successfully on Ubuntu and SuSE. Most Linux distributions are probably supported with little or no modification. The current installation does NOT support Windows or Mac. (For the Mac, the only real issue is how Mac users are created, i.e. not via useradd. This will likely be addressed once the Linux installs are more settled and/or there is sufficient demand for Mac support.)
The system should have gcc, gcc-c++, openssl-devel, and the X11 development headers installed before you begin the Node installation. You may also need to install zlib-devel, gettext-devel, and expat-devel. The CDAT installation would actually install those last three libraries itself, but Git is needed to pull down the CDAT distribution, so they are really prerequisites for building Git. To support the GridFTP installation you will also need Flex and Bison (see below for details).
NOTE: the node will not work with OpenSSL 1.0. See Bug 123
== CentOS 6.3 (64bit) ==
autoconf
- autoconf-archive.noarch: The Autoconf Macro Archive
- autoconf.noarch: A GNU tool for automatically configuring source code
automake
- automake.noarch: A GNU tool for automatically creating Makefiles
bison
- bison-devel: -ly library for development using Bison-generated parsers
- bison-runtime: Runtime support files used by Bison-generated parsers
file
- file-libs: Libraries for applications using libmagic
- file-roller: Tool for viewing and creating archives
flex
- flexiport-devel: Header files and libraries for flexiport
- jflex-javadoc.noarch: Javadoc for jflex
gcc
- libgcc: GCC version 4.4 shared support library
gcc-c++
- gcc-c++: C++ support for GCC
gettext-devel
- gettext-devel: Development files for gettext
libtool
- libtool-ltdl-devel: Tools needed for development using the GNU Libtool Dynamic Module Loader
- libtool: The GNU Portable Library Tool
libuuid
- libuuid-devel: Universally unique ID library
libxml2
- libxml2: Library providing XML and HTML support
- libxml2-devel: Libraries, includes, etc. to develop XML and HTML applications
libxslt
- libxslt: Library providing the Gnome XSLT engine
- libxslt-devel: Libraries, includes, etc. to embed the Gnome XSLT engine
lsof
- lsof: A utility which lists open files on a Linux/UNIX system
make
- make: A GNU tool which simplifies the build process for users
openssl
- openssl-devel: Files for development of applications which will use OpenSSL
pam
- pam-devel: Files needed for developing PAM-aware applications and modules for PAM
pax
- pax: POSIX File System Archiver
- pax-utils: PaX aware and related utilities for ELF binaries
readline
- readline-devel: Files needed to develop programs which use the readline library
tk
- tk-devel: Tk graphical toolkit development files
wget
- wget: A utility for retrieving files using the HTTP or FTP protocols
zlib-devel
- zlib-devel: Header files and libraries for Zlib development
ExtUtils
- perl-ExtUtils*
perl-Archive-Tar
- perl-Archive-Tar: A module for Perl manipulation of .tar files
perl-XML-Parser
- perl-XML-Parser: Perl module for parsing XML files
x11
- xorg-x11*
A more copy-and-paste-friendly version of that (using yum on CentOS):
yum install autoconf automake bison file flex gcc gcc-c++ gettext-devel libtool libuuid-devel libxml2 libxml2-devel libxslt libxslt-devel lsof make openssl-devel pam-devel pax readline-devel tk-devel wget zlib-devel *ExtUtils* perl-Archive-Tar perl-XML-Parser
NOTE: There are additional prerequisites from the UV-CDAT tool, which is installed as part of the DATA configuration of the stack. Please see them here: https://github.com/UV-CDAT/uvcdat/wiki/System-Requirements, most notably the need for gfortran. (In newer versions of UV-CDAT, gfortran is part of the installation procedure.)
localhost.localdomain?)
mmm... probably not; there are a lot of difficulties you run into when attempting such a thing.
Just give yourself a name (for testing) by calling hostname myname.mydomain
and updating the /etc/hosts file to include it. For example:
...
my.ip.goes.here myname.mydomain
127.0.0.1 myname.mydomain localhost.localdomain
...
This will solve at least some of the issues that arise with the new security infrastructure.
CDAT will not work"
This most likely indicates that your machine does not have the X11 headers installed. This is a vestigial dependency: the publisher has a Tk/Tcl graphical user interface that gets built during this process. (Prerequisite: X11 headers)
Solution: Install all the X11 headers
ERROR:
----------------------------
Thredds Data Server Test... (publisher catalog gen)
----------------------------
Tomcat (jsvc) process is running...
Postgres process is running...
/usr/local/cdat/bin/esgpublish --use-existing pcmdi.ichec.ie.test.mytest
--noscan --thredds
INFO 2011-01-14 11:34:22,437 Writing THREDDS catalog
/esg/content/thredds/esgcet/1/pcmdi.ichec.ie.test.mytest.v1.xml
WARNING 2011-01-14 11:34:22,468 No dataset_id option found for project test
INFO 2011-01-14 11:34:22,494 Writing THREDDS ESG master catalog
/esg/content/thredds/esgcet/catalog.xml
INFO 2011-01-14 11:34:22,497 Reinitializing THREDDS server
ERROR 2011-01-14 11:34:22,499 Error reading url
https://localhost:443/thredds/admin/debug?catalogs/reinit: URLError('unknown
url type: https',)
Traceback (most recent call last):
File "/usr/local/cdat/bin/esgpublish", line 5, in <module>
pkg_resources.run_script('esgcet==2.7.4', 'esgpublish')
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 489, in run_script
File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1207, in
run_script
File
"/usr/local/cdat/lib/python2.6/site-packages/esgcet-2.7.4-py2.6.egg/EGG-INFO/scripts/esgpublish",
line 434, in <module>
...
1 - Check Tomcat's server.xml file, in the connector section for port 443, that the paths to the truststore and keystore are correct and that the passwords are correct. To verify the passwords, use Java's keytool to do a simple listing; this requires you to enter the password. If the listing succeeds with the password from the connector, then the password and the path to the keystore/truststore are good.
%> keytool -list -v -keystore <keystore-file> -storepass <password>
2 - Make sure you have installed the openssl-devel package on your machine. It is a prerequisite that is not installed by the installation script. OpenSSL is needed when building Python (during the CDAT portion of the install, Python is built and needs OpenSSL support). (see: bug report)
Solution: install openssl-devel
The ESGF Node makes use of environment variables to set different parameters used during installation and operation. If you are changing an environment variable at the command line or in your environment and it doesn't seem to be taking effect (i.e. the value is not being changed accordingly), the first thing to do is to check whether the environment variable is already being set in /etc/esg.env. Key variables are set in the /etc/esg.env file. This file is sourced last in the environment sourcing sequence, which means it supersedes variables set at the command line or in the shell environment. The file is chmod'ed 644 so that no one except the node administrator can set these values.
Solution: Check the /etc/esg.env file. If the value you are attempting to set is already present in /etc/esg.env, you will have to either 1) remove the entry from the file or 2) edit the file to change the variable to the desired value. To do either of these things you have to be the node admin.
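As a quick diagnostic, you can check whether a variable is already pinned in /etc/esg.env before fighting your shell environment. A minimal sketch (the variable value and the temp file standing in for /etc/esg.env are illustrative):

```shell
# Stand-in for /etc/esg.env in this sketch (the real file is root-owned, mode 644)
envfile=$(mktemp)
echo 'export ESGF_HOME=/esg' > "$envfile"

var=ESGF_HOME
if grep -q "^export ${var}=" "$envfile"; then
    # The file is sourced last, so its value wins over your shell environment
    status="pinned"
    echo "${var} is pinned in esg.env; edit the file as node admin to change it"
else
    status="free"
    echo "${var} not pinned; a shell export will take effect"
fi
rm -f "$envfile"
```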
Occasionally after a fresh install or upgrade when visiting the main web page you get a strange 500 error page saying the following...
HTTP Status 500 -
type Exception report
message
description The server encountered an internal error () that prevented it from fulfilling this request.
exception
org.apache.jasper.JasperException: org.apache.jasper.JasperException: Unable to load class for JSP
org.apache.jasper.servlet.JspServletWrapper.getServlet(JspServletWrapper.java:161)
org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:340)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
...
To fix this, simply restart the node:
%> esg-node restart
This issue may be related to a page-caching chicken-and-egg problem, much like having to run LaTeX multiple times to compile a document. The cause is still under investigation, but the solution is relatively straightforward.
ERROR - esg.node.filters.AccessLoggingDAO - java.sql.SQLException:
>> ERROR: relation "seq_access_logging" does not exist
>> Query: select nextval('seq_access_logging') Parameters: []
This will be the case if you are upgrading the esgf node manager from an installed version older than 1.0.4.0. The database has to be updated manually. As the database admin, run the following script: [esgf node manager update database script](http://rainbow.llnl.gov/dist/esg-node/db_upgrade/create_access_logging.sql)
Scenario: The build and installation of postgres seemed to have gone fine, however when it is time to start postgres you get:
su: /dev/null: Permission denied
Starting Postgress...
su postgres -c "/usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data start"
su: /dev/null: Permission denied
ERROR: Could not start database!
This is because there was already a postgres user on your system whose shell is set to /dev/null in /etc/passwd. To use the command-line postgres tools, the postgres user needs a real shell. Edit the /etc/passwd file and change the entry for postgres to use the shell /bin/bash,
or run this sed command:
sed -i 's#^\(postgres.*:\).*$#\1/bin/bash#' /etc/passwd
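To see what the sed expression does without touching /etc/passwd, you can run it against a sample line (a sketch; the uid/gid and home directory shown are made up):

```shell
# A hypothetical /etc/passwd entry with the broken /dev/null shell
line='postgres:x:26:26:PostgreSQL Server:/var/lib/pgsql:/dev/null'

# The greedy group captures everything through the last colon;
# whatever follows (the shell field) is replaced with /bin/bash
fixed=$(echo "$line" | sed 's#^\(postgres.*:\).*$#\1/bin/bash#')
echo "$fixed"
```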
If you also have "COMPUTE" installed, make sure the file /esg/content/las/conf/server/las_servers_static.xml exists (or the respective $ESGF_HOME location):
# touch /esg/content/las/conf/server/las_servers_static.xml
Then restart the node.
This happens when the installer breaks at certain points: the created/downloaded files in the Tomcat webapp directory are still owned by root and cannot be accessed by the Tomcat server. Just make sure everything is owned by tomcat:
# chown -R tomcat:tomcat /usr/local/tomcat
The default limit for open files is 1024, which might be too low (it shouldn't be, but there is a leak that leaves files open until they get garbage collected). Check the number of open files allowed for tomcat:
#as tomcat run ulimit
# su -c "ulimit -n" tomcat
1024
#if it's that low, try increasing it to 4096 by adding this line to /etc/security/limits.conf
tomcat - nofile 4096
#Check it has been changed
# su -c "ulimit -n" tomcat
4096
This is probably a bug in the security library
threads (200) created..."
Something in the software stack (or the clients) is leaving dangling connections. Those connections are garbage collected at some point, and the only resources they hold are the ports they leave open. A workaround is to increase the number of threads in the connector (port 80) to something above the default of 200:
<Connector port="80" protocol="HTTP/1.1"
...
maxThreads="400"/>
If you run into problems again, try doubling it again.
You'll need to restart the node after that.
The best approach I know of is using "jconsole". Here's a description of what needs to be done: http://download.oracle.com/javase/1.5.0/docs/guide/management/agent.html
Basically you need to start Tomcat with some extra parameters. The simplest setup (apparently not viable for a production environment because of the memory consumption of the jconsole thread) is to add just this:
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=[your jmx port]
-Dcom.sun.management.jmxremote.ssl=true
-Dcom.sun.management.jmxremote.authenticate=true
-Djavax.net.ssl.keyStorePassword=[your password]
-Djavax.net.ssl.keyStore=[full path to keystore file]
Then start jconsole from the same machine and attach to the (probably unnamed) process. You can check the PID externally to be sure.
There is good documentation on how to do this using openssl in the official Globus Toolkit documentation:
(client perspective)
Please be sure to check out this page as well if your question is not answered here:
IDP Node type FAQ on MyProxy (server-side)
Try to connect manually and see what is causing the error:
globus/bin/myproxy-logon -v -s <gateway_host.domain> -l <user> -p <port> -o <X509_CERT_DIR> -T
Where:

| variable | default (for tests, this will change) | meaning |
|---|---|---|
| gateway_host.domain | pcmdi3.llnl.gov | The gateway where the MyProxy server is running |
| user | (no default) | The user name of the gateway account. This is the one used for publishing |
| port | 2119 | The default for the MyProxy service is really 7512, but pcmdi3.llnl.gov uses this port instead |
| X509_CERT_DIR | /root/.globus/certificate-file | Where the certificate will get stored |
The -T parameter tells the MyProxy client to retrieve the root certificate of the server. You will probably need to delete the certificates directory (X509_CERT_DIR) if you connect to a second server; otherwise you will probably get an error like:
OpenSSL Error: s3_clnt.c:897: in library: SSL routines, function SSL3_GET_SERVER_CERTIFICATE: certificate verify failed
globus_gsi_callback_module: Could not verify credential
globus_gsi_callback_module: Can't get the local trusted CA certificate: Untrusted self-signed certificate in chain with hash acdc777a
The error above appears to indicate that whatever's in your $X509_CERT_DIR is not compatible with the MyProxy server that you're trying to get credentials from. For example, if for some reason the MyProxy server is no longer trusted (i.e. trustroots have changed on the server side), you have little choice but to clear out or remove the existing X509_CERT_DIR and try again. An example of this is shown below:
export X509_CERT_DIR=/some/dir
rm -rf $X509_CERT_DIR
[ re-run myproxy logon here using the -T option ]
The X509_CERT_DIR directory on the client side, while not useless, is disposable, so you can rm -rf it before every MyProxy logon if you like (inefficient, but harmless). In most cases, if you run into trouble, that will solve the issue.
Symptom:
Please provide a Globus username []: <hidden>
Globus password []: Creating directory: /var/lib/globus-connect-server
ENTER: IO.setup()
ENTER: IO.configure_credential()
ENTER: GCMU.configure_credential()
EXIT: GCMU.configure_credential()
Writing GridFTP credential configuration
EXIT: IO.configure_credential()
ENTER: configure_server()
Creating gridftp configuration
EXIT: IO.configure_server()
ENTER: IO.configure_sharing()
GridFTP Sharing Disabled
ENTER: IO.configure_trust_roots()
ENTER: GCMU.configure_trust_roots()
Fetching MyProxy CA trust roots
ENTER: get_myproxy_dn_from_server()
fetching myproxy dn from server
MyProxy DN is None
EXIT: get_myproxy_dn_from_server()
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/globus/connect/server/io/setup.py", line 137, in <module>
ioobj.setup(reset=reset)
File "/usr/lib/python2.6/site-packages/globus/connect/server/io/__init__.py", line 68, in setup
self.configure_trust_roots(**kwargs)
File "/usr/lib/python2.6/site-packages/globus/connect/server/io/__init__.py", line 285, in configure_trust_roots
super(IO, self).configure_trust_roots(**kwargs)
File "/usr/lib/python2.6/site-packages/globus/connect/server/__init__.py", line 475, in configure_trust_roots
self.get_myproxy_dn_from_server()
File "/usr/lib64/python2.6/os.py", line 471, in __setitem__
putenv(key, item)
TypeError: putenv() argument 2 must be string, not None
- Underlying problem: it is trying to do myproxy-logon -b -s <myproxy_endpoint> and failing.
- Possible cause: a firewall. Check which host is set as myproxy.endpoint in /esg/config/esgf.properties, and if it is your host, check that incoming port 7512/tcp is open from the data nodes.
- Testing: as non-root (e.g. user globus), try myproxy-logon -b -s <myproxy_endpoint>. Do not try it as root; if you do, it will fail for some other reason even if it is in fact working fine as non-root.
The next two errors are different from the one above, and are somewhat related. The errors are shown below, but the solution in both cases is generally the same: don't run myproxy-logon as root.
Error authenticating: GSS Major Status: Authentication Failed
GSS Minor Status Error Chain:
globus_gss_assist: Error during context initialization
globus_gsi_gssapi: Unable to verify remote side's credentials
globus_gsi_gssapi: Unable to verify remote side's credentials: Couldn't verify the remote certificate
OpenSSL Error: s3_pkt.c:1053: in library: SSL routines, function SSL3_READ_BYTES: sslv3 alert bad certificate SSL alert number 42
------------------------
Error authenticating: GSS Major Status: Authentication Failed
GSS Minor Status Error Chain:
globus_gss_assist: Error during context initialization
globus_gsi_gssapi: Unable to verify remote side's credentials
globus_gsi_gssapi: SSLv3 handshake problems: Couldn't do ssl handshake
OpenSSL Error: s3_pkt.c:1086: in library: SSL routines, function SSL3_READ_BYTES: sslv3 alert unsupported certificate SSL alert number 43
As the root user, myproxy-logon attempts to authenticate with the server using host certificates if they are present, rather than user certificates. It's highly unlikely that this will succeed. As a non-root user, user certificates are looked for instead; unlike the host certificates on the datanode, these do not exist, so anonymous authentication is attempted (which succeeds).
If for some reason it's absolutely imperative that you run myproxy-logon as root, you can force myproxy-logon to think that no host certificates are present on the datanode by explicitly setting the following environment variables to files that don't exist:
export X509_USER_CERT=foo
export X509_USER_KEY=bar
For more information on MyProxy trustroots, check out the official documentation here:
Be sure to install these before running the installation script...
Flex v2.5.35
http://downloads.sourceforge.net/project/flex/flex/flex-2.5.35/flex-2.5.35.tar.gz
Bison v2.4
http://ftp.gnu.org/gnu/bison/bison-2.4.tar.gz
There are a couple of environment variables that will turn the debugging more verbose for both client and server:
export GLOBUS_ERROR_OUTPUT=1
export GLOBUS_ERROR_VERBOSE=1
export GLOBUS_GSI_AUTHZ_DEBUG_LEVEL=2
You may then start the server in debug mode (you can add -l /tmp/gridftplog to save the output to a log file):
globus-gridftp-server -debug -d all -p <port>
In this mode the server exits after the transfer is done. In any case it is best to add the -debug -d ALL parameters to the invocation already used to start it (in case you _can_ start gridFTP). This will display the complete invocation:
ps -wwo args= -C "globus-gridftp-server"
If this is a problem with the security, the client might present output like this:
error: globus_ftp_client: the server responded with an error
500 500-Command failed. : globus_i_gfs_data.c:globus_l_gfs_authorize_cb:911:
500-authorization failed.
500-globus_gsi_authz.c:globus_gsi_authorize:507:
500-Callout returned an error
500-globus_callout.c:globus_callout_handle_call_type:749:
500-The callout returned an error
500-globus_gfork_lib.c:gfork_l_get_env_fd:460:
500-GFork error: Env not set
500 End.
The causes might be many, so you'll have to debug the server. Mimic the starting command as closely as possible in order to debug it properly (see debugging gridFTP above). For example, a complete debug command (in our case) looks like this:
GLOBUS_GSI_AUTHZ_DEBUG_LEVEL=1 GLOBUS_ERROR_OUTPUT=1 GLOBUS_ERROR_VERBOSE=1 GLOBUS_TCP_PORT_RANGE=60000,64000 GLOBUS_TCP_SOURCE_RANGE=60000,64000 GSI_AUTHZ_CONF=/etc/grid-security/gsi-authz.conf /usr/local/globus/sbin/globus-gridftp-server -disable-command-list APPE,DELE,ESTO,MKD,RMD,RNFR,RNTO,RDEL,STOR,STOU,XMKD,XRMD,CHMOD -p 2811 -chroot-path /esg/gridftp_root -usage-stats-id 2811 -usage-stats-target localhost:0\!all -acl customgsiauthzinterface -no-cas -debug -d ALL
The following are some examples of what might go wrong:
globus_error_put(): globus_gsi_system_config.c:globus_i_gsi_sysconfig_create_cert_dir_string:411:
Could not find a valid trusted CA certificates directory
globus_gsi_system_config.c:globus_gsi_sysconfig_dir_exists_unix:4694:
File does not exist: /root/.globus/certificates is not a valid directory
This means the X509_CERT_DIR environment variable was not set. Set it to point to the certificates directory, e.g. /etc/grid-security/certificates, before starting the server.
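For example (assuming the standard certificates location mentioned above):

```shell
# Point the GridFTP server at the trusted CA certificates before starting it
export X509_CERT_DIR=/etc/grid-security/certificates
echo "$X509_CERT_DIR"
```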
Be sure the globus-gridftp-server command was started with the -no-cas parameter, which tells the gridFTP instance to use ESG's security procedure.
Calling out to auth service https://albedo2.dkrz.de/esgcet/saml/soap/secure/authorizationService.htm to retrieve SAML Assertion
for identity https://albedo2.dkrz.de/esgcet/myopenid/user, file ftp://cmip2.dkrz.de/somefile.nc, and action read
SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"SSL_ERROR_SSL
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed"
Detail: SSL_connect error in tcp_connect()
This means that the SSL connection to the gateway for authorization purposes failed. The SSL connection requires the server to have a trustworthy certificate, which is the case if either the certificate itself or the CA chain certifying its validity is in the CA certificate directory.
So you'll have to:
- Check the certificate is valid:
printf "GET /\n\n" | openssl s_client -connect ipcc-ar5.dkrz.de:443 -CApath $X509_CERT_DIR -verify 999 -quiet
- Check for the missing certificates directory problem.
- If you are using chroot, make sure ${chroot}$X509_CERT_DIR exists. For example, this should do:
cp -r $X509_CERT_DIR ${chroot}$X509_CERT_DIR
(normally this implies /esg/gridftp_root/etc/grid-security/certificates == /etc/grid-security/certificates)
- Check the file exists and you have permission to download it (see Permission missing).
Well, this is almost impossible to spot because at this time there's no hint in any of the mentioned outputs. You'll have to check the gateway logs, making sure you are logging esg.saml.authz.service.impl.SAMLAuthorizationServiceSoapImpl in debug mode.
If this is the case, you'll see something like this in the logs:
[DEBUG] esg.saml.authz.service.impl.SAMLAuthorizationServiceSoapImpl: SOAP response:
<?xml version="1.0" encoding="UTF-8"?>
<soap11:Envelope xmlns:soap11="http://schemas.xmlsoap.org/soap/envelope/">
<soap11:Body>
<samlp:Response xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" ID="c3434258-f4e5-4b8c-9593-f72cefa00519" InResponseTo="140734728698544" IssueInstant="2010-11-02T13:49:59.366Z" Version="2.0">
[...]
<saml:AuthzDecisionStatement Decision="Indeterminate" Resource="gsiftp://cmip1.dkrz.de:2811/test.nc">
<saml:Action>read</saml:Action>
</saml:AuthzDecisionStatement>
[...]
</samlp:Response>
</soap11:Body>
</soap11:Envelope>
There are two things you should note here:
- The _saml:AuthzDecisionStatement Decision="Indeterminate"_ means the access request was not granted (you would see _Decision="Permit"_ if it were).
- The requested resource was _Resource="gsiftp://cmip1.dkrz.de:2811/test.nc"_. This entry must be verbatim identical to the one in the gateway DB in the metadata.file_accesess_point table. At the time of writing the publisher wrongly publishes files as (in our example) _gsiftp://cmip1.dkrz.de:2811//test.nc_, so you have to access the file exactly as published (i.e. doubling the slash before the file name).
The GridFTP server doesn't send the gsiftp URL string "as is" to the gateway for authorization; it replaces the hostname in the URL with that of the host where it is running. So if hostname -f does not report the expected name (i.e. the one from gsiftp://host_name/), you should define it in the GLOBUS_HOSTNAME environment variable (which you would probably want to add to the /etc/esg.env file).
To clarify this a little more with an example:
- The datanode is called cmip.dkrz.de
- The alias bmbf-ipcc-ar5.dkrz is used for accessing the file:
globus-url-copy gsiftp://bmbf-ipcc-ar5.dkrz/mydata/myfile.nc file:///dev/null
- This will trigger a query of (gsiftp://cmip.dkrz.de/mydata/myfile.nc, read) to the gateway attribute service, which probably won't work.
In order for this to work you'll have to:
echo "export GLOBUS_HOSTNAME=bmbf-ipcc-ar5.dkrz" >> /etc/esg.env
#restart the node as usual
reason
There are some issues regarding how the server caches existing files and their locations. If a file or directory is not present while it is being accessed, the TDS marks it as "non-existent" and doesn't try it again afterwards. Every subsequent access attempt is then reported as trying to access a non-existent file, even though it might now be accessible.
This happens especially when mounting a remote file system after the TDS has been started. DKRZ has experienced this a couple of times, because the GPFS system might take longer than expected to become available and thus cause this inconsistency. The only known solution is to restart the server.
1) Be sure that your node or institution is NOT performing web caching for this node!!!
2) Be sure that your node is visible from the outside, i.e. can accept inbound external connections!!!
/usr/local/cdat/lib/python2.6/site-packages/cdtime.so:"
It looks like a problem with the SELinux security extension, which (apparently) affects shared library loading. E.g., from
http://www.archiware.com/support/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=58 :
In case you run a Linux host and get the following error message in the logfile when starting PresSTORE: Error: modload: /usr/local/aw/bin/libarchdev.so:
- couldn't load file "/usr/local/aw/bin/libarchdev.so": /usr/local/aw/bin/libarchdev.so: cannot restore segment prot after
reloc:
- Permission denied
Fatal: modload: failed to load module '/usr/local/aw/bin/libarchdev.so'
This problem is most probably caused by the security extension SELinux. SELinux is active in newer Linux distributions with 2.6 kernels. SELinux changes some system default behaviour, including shared library loading.
This can be checked by disabling SELinux: just add the line
- SELINUX=disabled
to the file
- /etc/sysconfig/selinux
and restart the host.
In case the shared library can be loaded this way but SELinux shall be kept active, you need to adapt the security context for the shared library by using the chcon program.
without parent THREDDS catalog"
This is a fairly common error. It means the gateway could not reach the node's THREDDS server to upload the catalog. Things to check:
- In esg.ini, thredds_url should be the address of the ESGF "data" node THREDDS server.
- Port 443 should be accessible from outside.
- Tomcat & THREDDS should be running.
!ServiceException: Access is denied"
This is probably caused by the gateway account not having the required publishing role.
- Find out which group membership is required: go to the parent project and select the "Administration" tab.
- Log in to the gateway and go to Account -> "List current Membership" (see picture above).
- Check that the account is a member of the proper group and has the special role of _Data Publisher_.
or derived from BaseException, not str
This is an improperly handled Java error thrown by the index node, so it could be anything. Most likely it is a security issue that should be fixed on the index node side (if you have access, check the logs; they'll tell you exactly what happened).
If not, at least verify that you are really publishing what you want, and that the catalog exists and is accessible from the web server (anything that prevents this will cause such an exception).
In particular, check the esg.ini file for these properties (check for typos!):
thredds_url = http://<data_node_fqdn>/thredds/esgcet
This appears to be a problem with Tomcat at the gateway. Contact the gateway admin, or see the gateway FAQ if you are one.
known')
This is probably caused by a port number in the hessian_service_url entry. Although it says URL, only a subset of URL syntax is supported (not the port number part). Use the hessian_service_port variable for that instead.
For example change this:
hessian_service_url=https://myserver.com:8443/...path...
into:
hessian_service_url=https://myserver.com/...path...
hessian_service_port=8443
The problem is that _myserver.com:8443_ is understood as the server name and is resolved through DNS to find its IP number, which fails.
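To illustrate the split the two properties should carry, here is a small shell sketch of separating host and port from such a URL (the URL and path are made up for the example):

```shell
url="https://myserver.com:8443/esg-node/publishing-service"

# Strip the scheme, then everything from the first slash on:
# what remains is what gets treated as the "server name"
hostport="${url#*://}"
hostport="${hostport%%/*}"
echo "$hostport"          # → myserver.com:8443

# What hessian_service_url and hessian_service_port should carry separately
host="${hostport%%:*}"
port="${hostport##*:}"
echo "$host $port"        # → myserver.com 8443
```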
esgcet.publish.hessianlib.RemoteCallException: Java ServiceException: Parent THREDDS catalog is null, cannot start algorithm without parent THREDDS catalog.
Check that the _thredds_url_ variable in the esg.ini file is properly set; a typo there might be preventing the gateway from finding the TDS. For example:
thredds_url = http://cmip2.dkrz.de/thredds/esgcet
If you are not publishing to the default esgcet collection, be sure the thredds_url also points to the thredds_root directory, e.g.:
thredds_root = /esg/content/thredds/lucid
thredds_url = http://cmip2.dkrz.de/thredds/lucid
Of course you could use the map files, if you still have them, and issue:
esgpublish --map mapfile.map --noscan --thredds
But if you don't, you can get a list of the datasets (e.g. of project cmip5) via:
esglist_datasets --select name --no-header cmip5 > datasets.txt
Then trim that list down (or pipe grep in the middle) to select the interesting datasets, and publish them using the --use-list flag:
esgpublish --use-list datasets.txt --project cmip5 --noscan --thredds
Basically, to recreate the complete catalog you could issue:
esglist_datasets --select name --no-header cmip5 | esgpublish --use-list - --project cmip5 --noscan --thredds
This will recreate the catalogs, but if they were already published to a gateway, they must be republished, or the new URL must somehow get in there.
In trying to publish I get this error: :SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure'),)
We're forcing TLS to overcome a security vulnerability in SSLv3. Unfortunately, for now this requires a manual patch to the Python installed with UV-CDAT for use with the publisher:
- Edit /usr/local/uvcdat/1.5.0/lib/python2.7/httplib.py
- Modify line 1176:
self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file, ssl_version=ssl.PROTOCOL_TLSv1)
If you have access to the gateway look at the Gateway's publishing FAQ .
Well, at the current time all federation certificates must be present in the following places:
- $X509_CERT_DIR
- ${chroot}$X509_CERT_DIR
- ${chroot} + a distro-dependent location (see this to know more)
_Notes_:
- $X509_CERT_DIR is set before starting gridFTP and normally points to /etc/grid-security/certificates.
- ${chroot} points to the chroot directory used by gridFTP (if you are not using it, assume chroot="").
And the Java truststore files:
- $CATALINA_HOME/conf/esg-truststore.ts
- $JAVA_HOME[/jre]/lib/security/jssecacerts
_Notes_:
- I _think_ [/jre] is optional and depends on whether $JAVA_HOME points to a JDK or a JRE (no '/jre' then).
authentication took place
The probable cause for this is that the Tomcat SSL certificate for the node is missing from the truststore. This is required for the time being.
The installer should have already done this, but check that it is there, and insert it if it's not.
Check that you can download the file from the gateway (you must publish the file to a gateway in order to be able to retrieve it). If you can, the problem is probably at the SAML layer, as the attribute server is not properly identifying either:
- the transaction: check you are not accessing it in https mode at the node (if so, change to http and try again). Check that the gateway link to the file works.
- the data node: most specifically this might be a certificate issue; see the previous entry.
See the Security subsection and especially its FAQ.
In the ESGF P2P Node these values are bit values in a bit vector. They can be combined in any permutation to give you the node configuration you desire. They correspond exactly to the installation "--type" value you set, specifically:
- DATA_BIT=4 -> "data" type
- INDEX_BIT=8 -> "index" type
- IDP_BIT=16 -> "idp" type
- COMPUTE_BIT=32 -> "compute" type
There is also the type "all", which is the sum of these values:
- ALL_BIT=$((DATA_BIT+INDEX_BIT+IDP_BIT+COMPUTE_BIT)), which gives a sum of 60 -> "all" type
FYI the other bits are used to determine the setting of the script execution, specifically:
- INSTALL_BIT=1
- TEST_BIT=2
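The bit arithmetic above can be sketched in shell (the variable names are taken from the list above; the example "data + idp" configuration is made up):

```shell
INSTALL_BIT=1; TEST_BIT=2
DATA_BIT=4; INDEX_BIT=8; IDP_BIT=16; COMPUTE_BIT=32

# "all" is the sum of the four type bits
ALL_BIT=$((DATA_BIT + INDEX_BIT + IDP_BIT + COMPUTE_BIT))
echo "$ALL_BIT"                          # → 60

# A hypothetical "data" + "idp" node; test membership with a bitwise AND
config=$((DATA_BIT + IDP_BIT))
if [ $(( config & IDP_BIT )) -ne 0 ]; then echo "idp enabled"; fi
if [ $(( config & INDEX_BIT )) -eq 0 ]; then echo "index disabled"; fi
```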
commit?
This happens because of an error while handling file permissions. See this workaround: http://superuser.com/questions/204757/git-chmod-problem-checkout-screws-exec-bit