Some component repeatedly causing an exception, and this is preventing instance startup #339

Open
flyn-org opened this issue Mar 5, 2022 · 2 comments

flyn-org commented Mar 5, 2022

I am running Eucalyptus 5.1 on CentOS 7.9.2009. Something is causing Eucalyptus to enter a state where it will no longer start instances. Trying to start an instance places the instance into the "pending" state for about 20 minutes after which the instance stops. While this happens the message below is repeatedly written to the logs.

Restarting all of the Eucalyptus services on the computer does not seem to help. Rebooting does seem to restore Eucalyptus to a usable state, but eventually these symptoms reappear.

Fri Mar 4 21:26:07 2022 ERROR [NioServerHandler:web-services-worker-pool-7] [com.eucalyptus.ws.server.NioServerHandler.exceptionCaught(NioServerHandler.java):174] Internal Error: Invalid request : com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
Fri Mar 4 21:26:07 2022 ERROR [BroadcastNetworkInfoCallback:eucalyptus-bootstrap-callbacks-basiccallbackprocessor-worker-2329] [com.eucalyptus.cluster.callback.BroadcastNetworkInfoCallback.fireException(BroadcastNetworkInfoCallback.java):74] Error in network information broadcast: Action:ProblemAction Code:soapenv:Client Id:RelatesTo Error: Invalid request : com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]Invalid request : com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
com.eucalyptus.ws.EucalyptusRemoteFault: Action:ProblemAction Code:soapenv:Client Id:RelatesTo Error: Invalid request : com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]Invalid request : com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
 at [row,col {unknown-source}]: [1,0]Invalid request : com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
	at com.eucalyptus.ws.handlers.IoSoapHandler.perhapsFault(IoSoapHandler.java:175)
	at com.eucalyptus.ws.handlers.IoSoapHandler.channelRead(IoSoapHandler.java:81)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at com.eucalyptus.ws.handlers.IoSoapMarshallingHandler.channelRead(IoSoapMarshallingHandler.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at com.eucalyptus.ws.handlers.IoMessageWrapperHandler.channelRead(IoMessageWrapperHandler.java:58)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1070)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:904)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1280)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:890)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:564)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:505)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:419)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:391)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
	at java.base/java.lang.Thread.run(Thread.java:829)
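
For anyone trying to quantify how often this message appears, here is a minimal Python sketch that counts the timestamped ERROR entries mentioning WstxEOFException. It assumes the timestamp layout shown in the excerpt above ("Fri Mar 4 21:26:07 2022 ERROR ..."). The function name `count_eof_errors` and its parameters are hypothetical. Because each failure in the excerpt produces one NioServerHandler entry and one BroadcastNetworkInfoCallback entry, the sketch filters on a single component to avoid counting the same failure twice.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the excerpt, e.g. "Fri Mar 4 21:26:07 2022 ERROR ".
# ASSUMPTION: every new log entry starts with this prefix; continuation lines do not.
_STAMP = re.compile(r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) ERROR ")


def count_eof_errors(lines, start=None, end=None,
                     component="BroadcastNetworkInfoCallback"):
    """Count timestamped ERROR entries that mention WstxEOFException.

    lines     -- any iterable of log lines (an open file works)
    start/end -- optional datetime bounds, inclusive
    component -- substring that must appear in the entry (hypothetical filter)
    """
    count = 0
    for line in lines:
        match = _STAMP.match(line)
        if match is None:
            continue  # stack-trace or continuation line
        if "WstxEOFException" not in line or component not in line:
            continue
        # Collapse repeated spaces so single-digit days parse cleanly.
        stamp = " ".join(match.group(1).split())
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        if start is not None and when < start:
            continue
        if end is not None and when > end:
            continue
        count += 1
    return count
```

Passing an open log file as `lines`, together with `start` and `end` bounds, gives the number of failures in a given window, which makes it easier to tell whether a configuration change coincided with the errors stopping.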

nc.log contains entries like this, which we think might be related:

version (b3f3c403) and applied version ((null)) do not match (yet), waiting

obino commented Mar 6, 2022

@flyn-org could you try the workaround in #307 and report back? Thanks.

flyn-org commented Mar 7, 2022

@obino, thank you, this looks promising. Here is what I did:

  1. Ran euctl bootstrap.webservices.http_max_chunk_bytes=153600. I was not sure if I needed to restart the Eucalyptus services, so I did not restart them.
  2. Began starting instances using euca-start-instances. I started one at a time, waiting for each to enter a running state.
  3. I made it to the last instance (number 51) before seeing the EOF-related exception above.
  4. I then ran euctl bootstrap.webservices.http_max_chunk_bytes=307200.
  5. The EOF-related exception stopped. In total, it fired seven times between steps 3 and 4. I am not sure whether step 4 caused the EOF-related exceptions to stop or whether it was a coincidence.
  6. I was then able to start the final instance.
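
The steps above could be scripted roughly as follows, to make the experiment easier to repeat. This is only a sketch under assumptions: `euctl` and `euca-start-instances` are the commands named in the steps, while `euca-describe-instances` and its tab-separated `INSTANCE` output line are assumed from euca2ools and should be checked against the installed version. The helper names (`run`, `instance_state`, `start_one_at_a_time`), the polling interval, and the 30-minute timeout are all hypothetical.

```python
import subprocess
import time

STATES = ("pending", "running", "stopping", "stopped",
          "shutting-down", "terminated")


def run(cmd):
    """Run a command and return its standard output as text."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def instance_state(instance_id, runner=run):
    """Return the instance state, or None if it cannot be determined.

    ASSUMPTION: euca-describe-instances prints a tab-separated line that
    starts with INSTANCE and contains the state as one of its fields.
    """
    out = runner(["euca-describe-instances", instance_id])
    for line in out.splitlines():
        fields = line.split("\t")
        if fields[0] == "INSTANCE":
            for state in STATES:
                if state in fields:
                    return state
    return None


def start_one_at_a_time(instance_ids, runner=run, poll_seconds=15,
                        timeout_seconds=1800, sleep=time.sleep):
    """Start instances sequentially, waiting for each to reach 'running'.

    Returns (started_ids, stuck_id); stuck_id is None if all started.
    """
    started = []
    for instance_id in instance_ids:
        runner(["euca-start-instances", instance_id])
        waited = 0
        while instance_state(instance_id, runner) != "running":
            if waited >= timeout_seconds:
                return started, instance_id  # give up on this instance
            sleep(poll_seconds)
            waited += poll_seconds
        started.append(instance_id)
    return started, None
```

Step 1 would then correspond to `run(["euctl", "bootstrap.webservices.http_max_chunk_bytes=153600"])` before calling `start_one_at_a_time` with the list of instance IDs. The returned `stuck_id` marks the point at which the property could be raised again, as in step 4.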

There has been what seems to be some non-determinism mucking up our experiments, so I am hesitant to state conclusively whether this solved our problem. That said, it seems to help, and I am very thankful for the pointer. I will report back in a few days if things remain stable.
