the upgrade has aborted

your system could be in an unusable state

2023-03-09

tl;dr: Things never improved, so I gave up and rebuilt my server from the latest Ubuntu ISO. I was hoping this could be an amazing heroic effort to save a dying server, but not every patient lives. Some you win, some you lose.
/bin/bash: Permission denied

Well. That happened.

I’ve screwed up a few systems in my time, and this one more than once, but wow was this a big one.

I have no real idea how I did this, but I guess a cavalier attitude to running do-release-upgrade on my personal server, along with an “I’ve always been able to fix it no matter how badly I broke it” approach, finally bit me.

So, I jump onto the oob console, and after failing multiple times to type the complex root password, I arrive at the root prompt.

start of the misery

First off, I check the normal things you’d suspect:

# su bob
su: failed to execute /bin/bash: Permission denied

I’m using bob as the username to protect the innocent stupid

So it’s not just ssh, but also just su-ing to that user.

Has the bash binary somehow lost exec perms?

# ls -la /bin/bash
-rwxr-xr-x 1 root root 1396520 Jan 7 2022 /bin/bash
# stat /bin/bash
File: /bin/bash
Size: 1396520       Blocks: 2728       IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 131618      Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)

Nope, looks fine to me.

Is the filesystem mounted noexec?

# mount | grep root
/dev/mapper/vg00-lv_root on / type ext4 (rw,relatime,errors=remount-ro)

Nope, not that either. Also if that was the case I wouldn’t be able to get in as the root user.

Oh, should have checked the home dirs, maybe that’s it?

root@sol:~# ls -la /home
total 44
drwxr-xr-x  6 root root  4096 Mar  8 20:55 .
drwxr-xr-x 22 root root  4096 Mar  8 12:49 ..
drwx------ 83 bob  bom  12288 Mar  8 12:10 bob

I’d be surprised if that was it, the error message would be different.

is there more than bash affected?

I try changing the shell to various other binaries thinking that maybe the bash binary was corrupted in some way:

chsh -s /usr/bin/bash bob
chsh -s /usr/bin/sh bob
chsh -s /usr/bin/dash bob
chsh -s /usr/bin/uptime bob
chsh -s /bin/true bob
chsh -s /bin/false bob
chsh -s /bin/true bob

All fail with the exact same “Permission denied” message.

going back to basics

To make the simplest binary I could think of to use for testing, I grabbed the traditional hello world c program:

#include <stdio.h>

int
main (void)
{
  printf ("Hello, world!\n");
  return 0;
}

and then compiled and copied it to /tmp with:

# gcc -Wall hello.c -o hello
# cp hello /tmp

Right. Time to give it a go and see what happens. As root:

# /tmp/hello
Hello, world!

So running it as root worked. I also copied it to a known noexec mount just to check my sanity:

# cp hello /mnt/tmp
# /mnt/tmp/hello
-bash: /mnt/tmp/hello: Permission denied

Once again, that error message matches the one I get when trying to log in as bob, and makes me think maybe I’m missing something obvious.

Next, I set the hello world program as bob’s login shell:

# chsh -s /tmp/hello bob

and tried to su to bob:

# su bob
su: failed to execute /tmp/hello: Permission denied

and ssh just for double checking:

/tmp/hello: Permission denied

Out of desperation, I also try the simplest C program that will actually compile without error:

int
main(void)
{
}

While it compiles without error and can be run as root, even it just gives me the “Permission denied” message if I try to execute it as a login shell.

deep in the swamp

Damn. At this point I’m at a loss, and decided to write some of this down in the hopes of rubber duck debugging myself.

  1. The system appears mostly healthy, disks are mounted and data is there
  2. The root user has no issues logging in
  3. ssh is working, it only breaks once the non-privileged user tries to run anything
  4. The binaries have correct perms and that’s not the issue
  5. The filesystem isn’t mounted as noexec
  6. The problem isn’t limited to just bash, it’s any binary

After much much googling, all of which kept telling me to chmod 755, I come across this old post from 2000:

“Cannot execute /bin/bash: Permission denied” - solved! by Ben Okopnik

This is exactly what I’m seeing, which makes me glad there might be a solution, but also despondent, as it means digging around the lib files and it’s not an easy fix. Also, the post is 23 years old at this point, and things get more complex as time goes on.

Time to run strace, much as Ben did in that article:

strace -s 10000 -vfo login.bob login bob

125954 execve("/bin/sh", ["sh", "-", "/tmp/hello"],
["TERM=ansi", "LANG=C", "HOME=/home/bob",
"SHELL=/tmp/hello", "USER=bob", "LOGNAME=bob",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin",
"MAIL=/var/mail/bob",
"HUSHLOGIN=TRUE"]) = -1 EACCES (Permission denied)

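With -f the log file gets long quickly, and the interesting line is easy to miss. A small sketch for pulling out only the execve calls that failed, assuming the log keeps one syscall per line (as strace writes it, unlike the wrapped excerpt above) with an optional PID prefix. The function name failed_execs is just an illustration.

```shell
# failed_execs LOGFILE
# Print only the execve calls that returned an error.
# Assumes one syscall per line, optionally prefixed with a PID
# (the layout strace produces with -f and -o).
failed_execs() {
    grep -E '^([0-9]+ +)?execve\(.*\) = -1 E[A-Z]+' "$1"
}
```

Running `failed_execs login.bob` against the log written by the strace command above would list each failed exec with its errno. ENOENT entries are often just PATH lookups missing; EACCES is the one that matters here.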
A random Stack Overflow post I saw in passing mentioned execve. From the execve man page:

  1. EACCES Search permission is denied on a component of the path prefix of pathname or the name of a script interpreter. (See also path_resolution(7).)
  2. EACCES The file or a script interpreter is not a regular file.
  3. EACCES Execute permission is denied for the file or a script or ELF interpreter.
  4. EACCES The filesystem is mounted noexec.

Checking off each of those items:

  1. The path looks fine on everything I’ve checked
  2. stat says it’s a regular file
  3. The file is 755, but I don’t know what the ELF interpreter part means
  4. mount confirms it’s not noexec
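Items 1 and 3 can be checked a little more mechanically. A rough sketch, assuming a POSIX shell plus readelf from binutils (probably present given gcc is installed, but that is an assumption): show_path_perms lists the mode and owner of every component on the way to a file, and elf_interp prints the ELF interpreter, which is the dynamic loader named in a dynamically linked binary's program headers that the kernel has to load and run in order to start the binary. All the function names are hypothetical helpers, not standard tools.

```shell
# path_prefixes PATH
# Print every leading component of an absolute PATH, one per line.
path_prefixes() {
    rest=${1#/}
    prefix=""
    printf '/\n'
    while [ -n "$rest" ]; do
        part=${rest%%/*}
        if [ -n "$part" ]; then        # skip empty parts from doubled slashes
            prefix="$prefix/$part"
            printf '%s\n' "$prefix"
        fi
        case "$rest" in
            */*) rest=${rest#*/} ;;
            *)   rest="" ;;
        esac
    done
}

# show_path_perms PATH
# ls -ld each component so a missing execute (search) bit stands out.
show_path_perms() {
    path_prefixes "$1" | while IFS= read -r p; do
        ls -ld -- "$p"
    done
}

# parse_interp
# Pull the interpreter path out of `readelf -l` output on stdin.
parse_interp() {
    sed -n 's/.*\[Requesting program interpreter: \(.*\)]/\1/p'
}

# elf_interp FILE
# Print the interpreter FILE asks for; empty for static or non-ELF files.
elf_interp() {
    readelf -l "$1" 2>/dev/null | parse_interp
}
```

Run as root, `show_path_perms /usr/bin/bash` followed by `show_path_perms "$(readlink -f "$(elf_interp /usr/bin/bash)")"` would show whether any directory, or the loader itself, has lost its execute bit for other users. The readlink -f step resolves symlinks first, since ls -ld on a symlink only shows the link's own permissions. util-linux's namei -l prints a similar per-component listing.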

Is all this pointing to a cause, or is it an effect of something else that I can’t see?

I think I tried running debsums -cs about now, but can’t remember what the result was. Probably nothing that helped find the issue.

Getting desperate, I give this a try, reinstalling everything from the archive directory:

# cd /var/cache/apt/archives 
# for i in *.deb; do sudo dpkg -Gi "$i"; done

wait … reboot … wait

Log back in as root. Good, at least it hasn’t gotten worse. Let’s try to su to bob again:

# su bob
su: failed to execute /bin/bash: Permission denied
**FUCK**

It’s bedtime. I’m broken. I shut down the server and leave it for another day. I have a feeling this will need a reinstall, and as I’ve already ordered a new SSD there’s no point installing over these files. I’ll use the new SSD as the main OS drive, and keep the old one for holding backup files and other stuff.

The advantage of getting a replacement SSD is I can keep the files on the original SSD and just copy them across when needed. Things like bob’s home dir and other config files that I always forget to back up in the heat of a rebuild.

morning arrives

So, I completely removed apparmor and still no joy.

the evening

Still more flailing about with absolutely no progress towards saving the server.

fin

Things never improved. I tried for a week but in the end I just gave up and rebuilt the system. I could never find what actually went wrong after spending so many hours troubleshooting.

Moral of the story? I don’t know. Que Sera, Sera, I guess.