Discussion:
The dreaded stale file handle with aufs, this time with ext4 and ubuntu 16.04?
Dan Kegel
2016-05-23 15:36:38 UTC
Hi all!
I recently started trying to mount ephemeral lxc containers on tmpfs,
and fairly often, a script inside the container launched by
lxc-start-ephemeral will fail early with

rm: failed to get attributes of '/': Stale file handle

I tried to reproduce the problem with the following script, but no luck so far:
the test script ran for a minute with no problems.
I'll post again if I come up with a working repro script.

Any suggestions for what to do on a stock ubuntu machine to
provide more clues?

uname -a says
Linux ubu16-bb-02 4.4.0-22-generic #40-Ubuntu SMP Thu May 12 22:03:46 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

#!/bin/sh
set -e
set -x
# lxc-create -t ubuntu is kind of fragile, might need to retry on network error
lxc-create -t ubuntu -n orig -- -r xenial

# ok, now try to reproduce the problem
mkdir -p /data/tmpfs
mount -t tmpfs none /data/tmpfs
ln -sf /var/lib/lxc/orig /data/tmpfs/orig
# echo "Warning: to see the new container, you'll need to use the --lxcpath /data/tmpfs option"
while lxc-start-ephemeral \
    --orig orig \
    --name foo \
    --lxcpath /data/tmpfs \
    --storage-type dir \
    --union-type aufs \
    -- \
    rm -rf /foobar
do
    sleep 1
done
Dan Kegel
2016-05-23 15:47:31 UTC
Correction: this was without tmpfs, so the repro script I'm trying
now is shorter.
I'll post again if it manages to reproduce the problem:

#!/bin/sh
set -e
set -x
# lxc-create -t ubuntu is kind of fragile, might need to retry on network error
lxc-create -t ubuntu -n orig -- -r xenial

# ok, now try to reproduce the problem
while lxc-start-ephemeral \
    --orig orig \
    --name foo \
    --storage-type dir \
    --union-type aufs \
    -- \
    rm -rf /foobar
do
    sleep 1
done
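
The comment in the script notes that lxc-create may need a retry on network errors. A minimal sketch of one way to do that follows. The `retry` function, its argument order, and the attempt count and delay in the usage line are illustrative assumptions, not part of LXC. Whether a failed lxc-create leaves a partial container that must be removed before retrying is not covered here.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# Returns 0 on success, 1 if every attempt failed.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    # The command runs as a loop condition, so "set -e" does not abort on failure.
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Possible use with the command from the script above (values are arbitrary):
# retry 3 10 lxc-create -t ubuntu -n orig -- -r xenial
```

Because the command is evaluated inside the `while` condition, the helper can be used in a script that runs under `set -e` without the first failure ending the script.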
s***@users.sourceforge.net
2016-05-23 15:56:24 UTC
Hello Dan,
Post by Dan Kegel
I recently started trying to mount ephemeral lxc containers on tmpfs,
and fairly often, a script inside the container launched by
lxc-start-ephemeral will fail early with
rm: failed to get attributes of '/': Stale file handle
Unfortunately I don't know much about LXC, particularly how LXC uses
aufs. Instead of rm, please run "cat /proc/mounts" and the other items
listed below to collect this information.

(from aufs README)
----------------------------------------------------------------------
When you have any problems or strange behaviour in aufs, please let me
know with:
- /proc/mounts (instead of the output of mount(8))
- /sys/module/aufs/*
- /sys/fs/aufs/* (if you have them)
- /debug/aufs/* (if you have them)
- linux kernel version
if your kernel is not plain, for example modified by distributor,
the url where i can download its source is necessary too.
- aufs version which was printed at loading the module or booting the
system, instead of the date you downloaded.
- configuration (define/undefine CONFIG_AUFS_xxx)
- kernel configuration or /proc/config.gz (if you have it)
- behaviour which you think to be incorrect
- actual operation, reproducible one is better
- mailto: aufs-users at lists.sourceforge.net
----------------------------------------------------------------------


J. R. Okajima
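
The checklist quoted from the aufs README can be gathered in one step. The following is a rough sketch, not something posted in this thread: the function name `collect_aufs_info`, the output file names, and the `/boot/config-*` fallback (where Ubuntu normally keeps the kernel configuration) are assumptions. Items that do not exist on the machine are skipped.

```shell
#!/bin/sh
# collect_aufs_info OUTDIR
# Hypothetical helper: saves the items named in the aufs README checklist
# into OUTDIR. Anything missing on this machine is skipped.
collect_aufs_info() {
    outdir=$1
    mkdir -p "$outdir" || return 1

    # Linux kernel version.
    uname -a > "$outdir/uname.txt"

    # /proc/mounts rather than the output of mount(8).
    if [ -r /proc/mounts ]; then
        cat /proc/mounts > "$outdir/mounts.txt"
    fi

    # aufs version as printed when the module was loaded, if still in the log.
    dmesg 2>/dev/null | grep -i aufs > "$outdir/dmesg-aufs.txt" || true

    # Module, sysfs and debug entries, "if you have them".
    for dir in /sys/module/aufs /sys/fs/aufs /debug/aufs; do
        if [ -d "$dir" ]; then
            name=$(echo "$dir" | tr '/' '_')
            cp -r "$dir" "$outdir/$name" 2>/dev/null || true
        fi
    done

    # Kernel configuration: /proc/config.gz if present, otherwise the
    # /boot copy (assumed location on Ubuntu).
    if [ -r /proc/config.gz ]; then
        cp /proc/config.gz "$outdir/config.gz"
    elif [ -r "/boot/config-$(uname -r)" ]; then
        cp "/boot/config-$(uname -r)" "$outdir/kernel-config.txt"
    fi
    return 0
}

# Possible use (the path is arbitrary):
# collect_aufs_info /tmp/aufs-report && tar -czf /tmp/aufs-report.tgz -C /tmp aufs-report
```

The behaviour description, the reproducing operation, and the URL of a distributor's kernel source still have to be written by hand.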
Dan Kegel
2016-05-23 16:30:39 UTC
OK, trying this overnight:

#!/bin/sh
set -e
set -x
# lxc-create -t ubuntu is kind of fragile, might need to retry on network error
#lxc-create -t ubuntu -n orig -- -r xenial

# Grant self sudo
#echo "ubuntu ALL=(ALL:ALL) NOPASSWD: ALL" | sudo tee -a /var/lib/lxc/orig/rootfs/etc/sudoers

mkdir -p /shared
chmod 777 /shared

# ok, now try to reproduce the problem
n=0
while sleep 1
do
    n=$((n + 1))
    # $n is expanded by this outer shell before the command runs in the container
    lxc-start-ephemeral \
        --orig orig \
        --name foo \
        --storage-type dir \
        --union-type aufs \
        --bdir /shared \
        --user ubuntu \
        -- \
        sudo sh -c "id; if ! rm -rf /foobar; then \
            cat /proc/mounts > mounts.txt; \
            tar -czf /shared/bug$n.tgz mounts.txt /sys/module/aufs; fi"
done
Dan Kegel
2016-05-24 16:15:09 UTC
heh. Failed after 4700 loops because lxc-start-ephemeral tripped over
its own shoelaces (old container didn't disappear within one second).

So forget I mentioned the problem. I'll post a new thread with better
info if it recurs in production.
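
The failure described here came from restarting after a fixed one-second pause while the old container was still present. One possible alternative, sketched under assumptions, is to poll until the old container is gone before starting the next one. The helper name `wait_until_fails` and the 30-second limit are hypothetical, and the usage line assumes that `lxc-info -n foo` exits non-zero once container foo no longer exists.

```shell
#!/bin/sh
# wait_until_fails TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: polls COMMAND once per second until it exits non-zero.
# Returns 0 once COMMAND fails, 1 if it still succeeds after TIMEOUT_SECONDS.
wait_until_fails() {
    timeout=$1
    shift
    waited=0
    while "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible use in place of the fixed "sleep 1" between iterations,
# assuming lxc-info fails for a container that has been destroyed:
# wait_until_fails 30 lxc-info -n foo || echo "foo still present" >&2
```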