Civilization II on B2 systematically crashes on a Linux host, works well on a Windows host #206

Open
NovHak-Linux opened this issue Sep 2, 2024 · 9 comments

Comments

@NovHak-Linux

NovHak-Linux commented Sep 2, 2024

I already posted the problem on the E-Maculation forums, but I’ll explain it again here, hoping this wall of text won’t put too many people off!

After some difficulties, I finally managed to run Civilization II on B2, on my Linux laptop (Ubuntu 24.04). Everything went well until a system update that occurred between the 22nd and the 23rd of August. Since the update, every time I launch the program, OS 7.6.1 immediately crashes with a CHK Error.

I tried the latest AppImage, the Ubuntu stock basilisk2 package, and compiled the program myself with various memory options : direct, banks and real, all with the same result, CHK Error upon program execution.

Having serious reasons to suspect a host-specific problem, I switched to Windows and tried the Windows build… and it worked perfectly.

In both cases, I tried to emulate the same guest : same ROM (performa 500, modelid 74 i.e. gestalt 80), same RAM (128 MB), same CPU (68030 w/o FPU), same OS (Mac OS 7.6.1 French), same disk size and format (2 GiB HFS… what else than HFS on OS 7.6.1 anyway ?).

It may be worth noting that I first wanted to install the Gold edition of Civ II, which is PPC only, so I tried Sheepshaver with OS 8.6, but that didn’t work: the game started, but the guest froze upon unit movement, which happens very early in the game. I had the same freeze upon unit movement on B2 with the FPU enabled and the vanilla edition of the game, and with the CPU set to 68040, the OS would crash with the CHK Error mentioned before. That’s how I ended up with a 68030 w/o FPU, which worked fine for some weeks, until that recent system update made the game effectively unplayable on Linux.

But now, on a Windows host, even the gold edition works well on Sheepshaver.

I tried to blame the Linux update first. The update was indeed quite large, including a glibc update (no kernel update, though)… but it turns out, or so they say, that the update was a “no-change build”, the description being “Fix framepointer flags for s390x and ppc64el”, hence targeting architectures other than my amd64.

My theory is that there’s something wrong with the Linux-specific parts of B2 and Sheepshaver that makes them susceptible to subtle changes, in time management maybe. I’m not picking that out of the blue, since I noticed similar warning messages when launching B2 and SS on Linux:

  • On B2 : WARNING: RmvTime(000f4b3e): Descriptor not found
  • On SS : WARNING: RmvTime(00247414): Descriptor not found

So that’s where I am. Your insights are welcome !

@kanjitalk755
Owner

I tried the macOS, Linux, and Windows versions, but only the Linux version froze.

The warning about RmvTime appears on all types of hosts, so it's probably not a problem.

Regarding the timer, the difference is that the Windows version does not implement a high-precision timer.
I disabled it by commenting out the following lines, but it still froze.

#define PRECISE_TIMING 1
#define PRECISE_TIMING_POSIX 1

I also tried building the macOS version using the same method as for Linux, and it worked fine.
So, the generated Makefile etc. seem to be OK.

At this point, I have no idea what the problem is with the Linux version, and it would be difficult for me to fix it.

@NovHak-Linux
Author

NovHak-Linux commented Sep 4, 2024

Thanks for your reply, and your involvement in trying to solve the problem !

Despite being reproducible, the problem seems complex. I was pointed to a thread (starts at post 40) with people complaining about being unable to run Civ II, with the same symptoms (CHK error)… but it’s with the Shapeshifter emulator on an Amiga host, so I’m not sure it really helps.

The solutions to the Shapeshifter problem vary and seem very random, some changing the memory requirement (⌘I), some changing the date. I’ve tried all this anyway, without any more luck.

I’ve suspected an AppArmor change, but the last one dates back to 2024-07-16, so that’s not it.

Here’s a list of the external libraries and files (including the ELF interpreter) that the B2 AppImage depends on, i.e. files outside the AppImage, hence taken from the host system:

linux-vdso.so.1
libutil.so.1
libpthread.so.0
librt.so.1
libX11.so.6
libstdc++.so.6
libm.so.6
libgcc_s.so.1
libc.so.6
libuuid.so.1
libasound.so.2
libdl.so.2
libxcb.so.1
libfontconfig.so.1
/lib64/ld-linux-x86-64.so.2
libfreetype.so.6
libz.so.1
libresolv.so.2
libharfbuzz.so.0
libexpat.so.1
libbz2.so.1.0
libbrotlidec.so.1
libbrotlicommon.so.1
libgpg-error.so.0

I’m assuming it’s a change in one of those files. I’m speaking of the B2 problem here, because on SS it was already crashing before the update.

EDIT: I checked all those files. The only ones modified recently are part of that “Fix framepointer flags for s390x and ppc64el” update, hence are supposed to have been part of that no-change rebuild.
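A list like the one above can also be derived from a running process by reading `/proc/<pid>/maps`. The following is a minimal sketch, not the method used to produce the list: the function names are hypothetical, and the assumption that the AppImage contents are mounted under a `/tmp/.mount_` prefix should be checked on the host.

```c
#include <stdio.h>
#include <string.h>

/* Copy the pathname field of one /proc/<pid>/maps line into path.
   Returns 1 for file-backed mappings, 0 for anonymous or special ones
   such as [vdso] or [heap], or when the buffer is too small. */
int maps_line_path(const char *line, char *path, size_t size)
{
  const char *p = strchr(line, '/');   /* first '/' starts the pathname */
  size_t n;

  if (p == NULL)
    return 0;
  n = strcspn(p, "\n");                /* length without the newline */
  if (n >= size)
    return 0;
  memcpy(path, p, n);
  path[n] = '\0';
  return 1;
}

/* A "host" library is a shared object that does not live under the
   AppImage mount point (mount_prefix is an assumption of this sketch). */
int is_host_library(const char *path, const char *mount_prefix)
{
  if (strstr(path, ".so") == NULL)
    return 0;
  return strncmp(path, mount_prefix, strlen(mount_prefix)) != 0;
}
```

Feeding each line of the emulator's `maps` file through `maps_line_path()` and keeping the paths for which `is_host_library()` returns 1 gives the set of files whose modification dates are worth checking after an update.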

@NovHak-Linux
Author

NovHak-Linux commented Sep 4, 2024

I went to the Apple discussion forum with little hope, but it turned out better than I expected !

System_Error.pdf

Correct me if I’m wrong, but that means either there’s indeed a value that suddenly gets out of bounds with a Linux host, or the exception is being thrown erroneously…

@NovHak-Linux
Author

NovHak-Linux commented Sep 9, 2024

I still haven’t found the origin of the problem. I tried with the original dynamic libraries & ELF interpreter (i.e. those from before the system update), and still got the same CHK error.

Now I’m trying to prevent that CHK exception from being raised, whatever the consequences (possibly some “harder” crash), just to see what happens next.

To that effect, I’m assuming CHK error is in fact a CPU-raised exception, following execution of either the CHK or CHK2 instruction, as table 8-1 (68030 user’s manual, part two) suggests.

One can see indeed in ../uae_cpu/gencpu.c what looks very much like a raise of exception no. 6 under some conditions upon execution of CHK or CHK2.
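As a reading aid, the conditions under which the generated code raises exception 6 can be restated as two small predicates. This is only a sketch derived from the `printf` templates in `gencpu.c` quoted in this comment; the function names are hypothetical and the real emulator also updates condition flags, which is left out here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Vector number passed to Exception() by the CHK and CHK2 templates. */
enum { CHK_VECTOR = 6 };

/* CHK: the register value must lie in 0..upper, otherwise the CPU traps. */
bool chk_traps(int32_t value, int32_t upper)
{
  return value < 0 || value > upper;
}

/* CHK2: the bounds pair is read from memory. Bit 11 of the extension
   word selects CHK2 (trap when out of bounds) over CMP2 (flags only). */
bool chk2_traps(uint16_t extra, int32_t reg, int32_t lower, int32_t upper)
{
  bool out_of_bounds = reg < lower || reg > upper;

  return (extra & 0x800) != 0 && out_of_bounds;
}
```

Replacing both conditions with `false`, as in the patch that follows, amounts to making these predicates always return `false`.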

So I made the following changes to ../uae_cpu/gencpu.c, commenting out the original and replacing with the line below :

     case i_CHK:
	printf ("\tuaecptr oldpc = m68k_getpc();\n");
	genamode (curi->smode, "srcreg", curi->size, "src", 1, 0);
	genamode (curi->dmode, "dstreg", curi->size, "dst", 1, 0);
//	printf ("\tif ((uae_s32)dst < 0) { SET_NFLG (1); Exception(6,oldpc); goto %s; }\n", endlabelstr);
	printf ("\tif (false) { SET_NFLG (1); Exception(6,oldpc); goto %s; }\n", endlabelstr);
//	printf ("\telse if (dst > src) { SET_NFLG (0); Exception(6,oldpc); goto %s; }\n", endlabelstr);
	printf ("\telse if (false) { SET_NFLG (0); Exception(6,oldpc); goto %s; }\n", endlabelstr);
	need_endlabel = 1;
	break;

     case i_CHK2:
	printf ("\tuaecptr oldpc = m68k_getpc();\n");
	genamode (curi->smode, "srcreg", curi->size, "extra", 1, 0);
	genamode (curi->dmode, "dstreg", curi->size, "dst", 2, 0);
	printf ("\t{uae_s32 upper,lower,reg = regs.regs[(extra >> 12) & 15];\n");
	switch (curi->size) {
	 case sz_byte:
	    printf ("\tlower=(uae_s32)(uae_s8)get_byte(dsta); upper = (uae_s32)(uae_s8)get_byte(dsta+1);\n");
	    printf ("\tif ((extra & 0x8000) == 0) reg = (uae_s32)(uae_s8)reg;\n");
	    break;
	 case sz_word:
	    printf ("\tlower=(uae_s32)(uae_s16)get_word(dsta); upper = (uae_s32)(uae_s16)get_word(dsta+2);\n");
	    printf ("\tif ((extra & 0x8000) == 0) reg = (uae_s32)(uae_s16)reg;\n");
	    break;
	 case sz_long:
	    printf ("\tlower=get_long(dsta); upper = get_long(dsta+4);\n");
	    break;
	 default:
	    abort ();
	}
	printf ("\tSET_ZFLG (upper == reg || lower == reg);\n");
	printf ("\tSET_CFLG_ALWAYS (lower <= upper ? reg < lower || reg > upper : reg > upper || reg < lower);\n");
//	printf ("\tif ((extra & 0x800) && GET_CFLG) { Exception(6,oldpc); goto %s; }\n}\n", endlabelstr);
	printf ("\tif (false) { Exception(6,oldpc); goto %s; }\n}\n", endlabelstr);
	need_endlabel = 1;
	break;

I recompiled and tested… but it doesn’t work (as if nothing had changed)! I suppose either I did something wrong, or the CHK error is not a CPU-raised exception, or something else in the code raises that exception. The only instructions that raise exception 6 explicitly are CHK and CHK2, but I see some other places where the exception number is variable.

Is there any chance I’m on the right track, or am I completely wrong here?

@kanjitalk755
Owner

A CHK error is eventually detected, but the problem occurs even earlier.

If you run it under gdb, you will see that a SIGSEGV occurs after starting Civ II, and if you continue several times, a CHK error alert appears.

@NovHak-Linux
Author

I hear you, but I suppose the error does come from an exception thrown somewhere in the 68030 emulation, especially considering Mac OS mentions “CHK error”, not “addressing error”, “error 2” or something like that. And I’m curious about disabling the exception and seeing how it goes…

That being said, I suspect everything will be back in order after some future system update, and I would not be surprised if a system upgrade to 24.10 “solved” it, if not before. Despite being on another platform and with another emulator, I can’t help thinking it’s related to what Shapeshifter users experienced on Amiga…

@NovHak-Linux
Author

NovHak-Linux commented Sep 24, 2024

For some reason yet to be clarified, it works again. I’m only posting today, but I tried last Sunday (2024-09-22) around 14:00 UTC and it already worked. It still works today. I’m curious to know if it works on your side, which would point towards either a time-related problem or a system update. I will have to check what updates have been installed recently, but apart from that, the only thing I can think of is that I was away from home for three days, and my computer was off during that period.

Concerning a time-related problem, by the way, I tested that hypothesis too, by preloading a shared object that replaces time-related glibc calls (hence intercepting system + VDSO calls). Here’s the code I wrote (it didn’t change anything in the end, but I’m providing it anyway):

#include <stdbool.h>
#include <dlfcn.h>
#include <time.h>
#include <sys/time.h>

bool libload=false,cgtload=false,getdload=false,tload=false;
int (*getd)(struct timeval *restrict,struct timezone *restrict);
int (*cgt)(clockid_t,struct timespec*);
time_t (*tt)(time_t *);
void *lib;

time_t time(time_t *tloc)
{
  if (!tload)
    {
      if (!libload)
	{
	  lib=dlopen("libc.so.6",RTLD_LAZY);
	  libload=true;
	}
      *(void **)(&tt)=dlsym(lib,"time");
      tload=true;
    }
  return (*tt)(tloc)-1726041546;
}

/*
int clock_gettime(clockid_t clockid,struct timespec *tp)
{
  int ret;

  if (!cgtload)
    {
      if (!libload)
	{
	  lib=dlopen("libc.so.6",RTLD_LAZY);
	  libload=true;
	}
      *(void **)(&cgt)=dlsym(lib,"clock_gettime");
      cgtload=true;
    }
  ret=(*cgt)(clockid,tp);
  //  shift the seconds returned by the real call in place;
  //  tv_nsec needs no change for a whole-second offset
  tp->tv_sec-=1726041546;
  return ret;
}

int gettimeofday(struct timeval *restrict tv,void *restrict tz)
{
  int ret;

  if (!getdload)
    {
      if (!libload)
	{
	  lib=dlopen("libc.so.6",RTLD_LAZY);
	  libload=true;
	}
      *(void **)(&getd)=dlsym(lib,"gettimeofday");
      getdload=true;
    }
  ret=(*getd)(tv,tz);
  tv->tv_sec-=1726041546;
  //  tv->tv_usec-=2592000000000;
  return ret;
}
*/

To be compiled with :

gcc -shared -fPIC -o time.so time.c

And being used this way :

LD_PRELOAD=./time.so ./BasiliskII-x86_64.AppImage

Functions to be replaced can be uncommented as needed. In the case of B2, time() turned out to be the one to replace. This is a rather lazy piece of code, since it offsets time by a fixed, hardcoded amount; it would obviously be better to provide a more dynamic way, e.g. with an environment variable, as well as an option to specify an absolute date.

Everyone is free to use and modify, without necessarily mentioning me, provided it’s not being used to end the world.
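The environment-variable idea mentioned above could look roughly like the following. This is a minimal sketch, not part of the original shim: the variable name `TIME_SHIM_OFFSET` and the helper `parse_offset()` are hypothetical, and only `time()` is covered.

```c
#include <dlfcn.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical variable name, chosen for this sketch only. */
#define OFFSET_ENV "TIME_SHIM_OFFSET"

/* Parse a signed number of seconds; 0 when unset or malformed. */
long long parse_offset(const char *s)
{
  char *end;
  long long v;

  if (s == NULL || *s == '\0')
    return 0;
  v = strtoll(s, &end, 10);
  return (*end == '\0') ? v : 0;
}

/* Replacement for time(): real time plus the offset from the environment. */
time_t time(time_t *tloc)
{
  static time_t (*real_time)(time_t *);
  time_t now;

  if (real_time == NULL)
    {
      /* Same lookup as in the shim above: fetch libc's own time(). */
      void *lib = dlopen("libc.so.6", RTLD_LAZY);

      if (lib != NULL)
        *(void **)(&real_time) = dlsym(lib, "time");
      if (real_time == NULL)
        return (time_t)-1;
    }
  now = real_time(NULL) + (time_t)parse_offset(getenv(OFFSET_ENV));
  if (tloc != NULL)
    *tloc = now;
  return now;
}
```

Built with the same `gcc -shared -fPIC` command, it would be used as `TIME_SHIM_OFFSET=-1726041546 LD_PRELOAD=./time.so ./BasiliskII-x86_64.AppImage`; the negative value reproduces the subtraction hardcoded in the shim above. The offset is re-read on every call, which keeps the code short at the cost of a `getenv()` lookup each time.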

@kanjitalk755
Owner

kanjitalk755 commented Sep 25, 2024

I tried Civ II today and it worked fine.
The system has been updated since the last time I tried it.
(Debian 12.7)

@NovHak-Linux
Author

OK, so that’s not Ubuntu-specific! It would be interesting to know if people on a non-Debian distro, e.g. Arch, would be similarly affected.

I’m trying to apply the rule of testing Civ II after each system update, but it’s possible that I forgot after the last one, so I can’t rule out something related to the updates… but if it’s not update related, apart from time, what could it be ?


2 participants