Discussion:
git 'Bad file descriptor' fatal error error caused by loaded libau
OmegaPhil
2017-01-22 20:57:53 UTC
Permalink
sfjiro, please can you test the following git commands for me on a
non-aufs filesystem with libau LD_PRELOAD'd:

=========================================================================

git clone https://git.devuan.org/OmegaPhil/udisks2.git
git remote add alioth-git
https://anonscm.debian.org/cgit/pkg-utopia/udisks2.git
git fetch alioth-git

=========================================================================

I get the following error:

=========================================================================

fatal: failed to read object 0000000000000000000000000000000000000000:
Bad file descriptor

=========================================================================

It took me a very long time to realise it could be aufs' doing - as soon
as I killed off LD_PRELOAD, the fetch started working...

The actual problem is happening in git's
sha1_file.c:prepare_packed_git_one, the closedir call right at the
bottom is setting errno to 'Bad file descriptor', git later on notices
that errno is bad and panics (ironically the fetch itself looks good).

Any idea what libau is doing here? Have confirmed the problem on a VM.

aufs v4.8 20161219 running on uptodate Devuan Testing (equivalent to
Debian Testing), kernel 4.8.11.

Thanks
s***@users.sourceforge.net
2017-01-22 22:09:03 UTC
Permalink
Hello OmegaPhil,
Post by OmegaPhil
The actual problem is happening in git's
sha1_file.c:prepare_packed_git_one, the closedir call right at the
bottom is setting errno to 'Bad file descriptor', git later on notices
that errno is bad and panics (ironically the fetch itself looks good).
Any idea what libau is doing here? Have confirmed the problem on a VM.
I could reproduce the problem, and added some debug print into
aufs-util.git/libau/rdu_lib.c:closedir(). See the attachment.

It must be an application's problem instead of libau.
Generally applications should not use errno after a systemcall/library
succeeded. Actually the manual of errno(3) says


DESCRIPTION
The <errno.h> header file defines the integer variable errno, which is set by system calls
and some library functions in the event of an error to indicate what went wrong. Its
value is significant only when the return value of the call indicated an error (i.e., -1
from most system calls; -1 or NULL from most library functions); a function that succeeds
is allowed to change errno.
:::


I didn't check the source files of git-fetch, but this symptom indicates
that git-fetch incorrectly tests errno after libau:closedir() returned a
success.
libau can fix this problem (see the attachment again), but it is
definitly better to fix git-fetch.
Post by OmegaPhil
It took me a very long time to realise it could be aufs' doing - as soon
as I killed off LD_PRELOAD, the fetch started working...
I can understand it very hard. Good job!


J. R. Okajima
OmegaPhil
2017-01-23 20:20:16 UTC
Permalink
Post by s***@users.sourceforge.net
I didn't check the source files of git-fetch, but this symptom indicates
that git-fetch incorrectly tests errno after libau:closedir() returned a
success.
libau can fix this problem (see the attachment again), but it is
definitly better to fix git-fetch.
Right, I see what is happening now. Your patch fixes it, because you now
ensure errno is set to 0 before exiting rdu_lib.c:closedir.

Previously, because no error happening in closedir normally, but 'errno
= EBADF' is ran unconditionally, any use of closedir will result in EBADF.


So, looking at the git code, here is the stack trace of the failure:

=======================================================================

#0 die_errno (fmt=***@entry=0x5555556eabe5 "failed to read object %s")
at usage.c:151
#1 0x00005555556829fb in read_sha1_file_extended
(sha1=***@entry=0x5555559aa948 "",
type=***@entry=0x7fffffffdbe4, size=***@entry=0x7fffffffdbe8,
flag=***@entry=1)
at sha1_file.c:2997
#2 0x0000555555641e49 in read_sha1_file (size=0x7fffffffdbe8,
type=0x7fffffffdbe4,
sha1=0x5555559aa948 "") at cache.h:1094
#3 parse_object (sha1=0x5555559aa948 "") at object.c:269
#4 0x00005555555fb402 in lookup_commit_reference_gently
(sha1=<optimized out>, quiet=1)
at commit.c:24
#5 0x000055555558e606 in update_local_ref (ref=0x5555559aa940,
remote=0x5555559a9d0b "experimental",
remote_ref=0x5555559a9c90, display=0x7fffffffdd10, summary_width=17)
at builtin/fetch.c:640
#6 0x000055555558f649 in store_updated_refs (raw_url=<optimized out>,
remote_name=0x5555559a1590 "alioth-git", ref_map=0x5555559a9c90) at
builtin/fetch.c:846
#7 0x000055555558f7ed in fetch_refs (transport=0x5555559a1a90,
ref_map=0x5555559a9c90)
at builtin/fetch.c:907
#8 0x000055555558fa9f in do_fetch (transport=0x5555559a1a90,
refs=0x5555559a0d20, ref_count=0)
at builtin/fetch.c:1119
#9 0x000055555558fd49 in fetch_one (remote=<optimized out>, argc=0,
argv=0x7fffffffe0f8)
at builtin/fetch.c:1293
#10 0x000055555559022b in cmd_fetch (argc=1, argv=0x7fffffffe0f0,
prefix=<optimized out>)
at builtin/fetch.c:1379
#11 0x000055555556549d in run_builtin (p=0x5555559419c0 <commands+768>,
argc=***@entry=2,
argv=***@entry=0x7fffffffe0f0) at git.c:373
#12 0x0000555555565623 in handle_builtin (argc=2, argv=0x7fffffffe0f0)
at git.c:572
#13 0x0000555555565b84 in run_argv (argcp=***@entry=0x7fffffffdfcc,
argv=***@entry=0x7fffffffdfc0)
at git.c:630
#14 0x0000555555565ca7 in cmd_main (argc=<optimized out>,
argv=<optimized out>) at git.c:707
#15 0x00005555555e15e2 in main (argc=3, argv=0x7fffffffe0e8) at
common-main.c:40

=======================================================================

sha1_file.c:read_sha1_file_extended is where the errno checking happens,
the function is described as:

=======================================================================

This function dies on corrupt objects; the callers who want to
deal with them should arrange to call read_object() and give error
messages themselves.

=======================================================================

the errno code:

=======================================================================

errno = 0;
data = read_object(repl, type, size);
if (data)
return data;

if (errno && errno != ENOENT)
die_errno("failed to read object %s", sha1_to_hex(sha1));

=======================================================================

In this example, there is no data returned since the remote refs don't
exist locally yet, hence the errno check triggers (and its expecting
'file not found').

Inside read_object at the end calls reprepare_packed_git ->
prepare_packed_git -> prepare_packed_git_one: at the end of this
function, closedir is called.

So, read_sha1_file_extended is trying to summarise any failures that
happen beneath it by checking the errno at that point.

I understand what you are saying from the manpage, however it does seem
reasonable to say that any unusual errno state means a failure has
occurred somewhere (ignoring the fact that released libau currently sets
EBADF even when a failure has not happened).

If I report this to git, I bet they'll just say that libau shouldn't
have set errno without an actual failure to report :/
s***@users.sourceforge.net
2017-01-23 21:26:00 UTC
Permalink
Post by OmegaPhil
I understand what you are saying from the manpage, however it does seem
reasonable to say that any unusual errno state means a failure has
occurred somewhere (ignoring the fact that released libau currently sets
EBADF even when a failure has not happened).
If I report this to git, I bet they'll just say that libau shouldn't
have set errno without an actual failure to report :/
As you understood my point, I can understand yours. Also I understand
that people never expect errno is set when they pass a valid DIR*.
Ok, I will fix libau.


J. R. Okajima
OmegaPhil
2017-01-23 21:30:17 UTC
Permalink
Post by s***@users.sourceforge.net
Post by OmegaPhil
I understand what you are saying from the manpage, however it does seem
reasonable to say that any unusual errno state means a failure has
occurred somewhere (ignoring the fact that released libau currently sets
EBADF even when a failure has not happened).
If I report this to git, I bet they'll just say that libau shouldn't
have set errno without an actual failure to report :/
As you understood my point, I can understand yours. Also I understand
that people never expect errno is set when they pass a valid DIR*.
Ok, I will fix libau.
J. R. Okajima
Cheers - just out of interest, do you know how to set a watchpoint on
errno in gdb? That wouldve saved a lot of time. It seems gdb is only
capable of using errno in an expression when it suits it...
s***@users.sourceforge.net
2017-01-24 09:14:47 UTC
Permalink
Post by OmegaPhil
Cheers - just out of interest, do you know how to set a watchpoint on
errno in gdb? That wouldve saved a lot of time. It seems gdb is only
capable of using errno in an expression when it suits it...
In our ancient age, the global errno was just a intger variable. In
these days, especially in the multi-threaded application, errno is a
macro which is expanded to a local address (per thread).
If this is your case,
- obtain the address
- "watch *(int*)0x123abc" (the address you got)
may help.


J. R. Okajima
OmegaPhil
2017-01-24 16:30:03 UTC
Permalink
Post by s***@users.sourceforge.net
Post by OmegaPhil
Cheers - just out of interest, do you know how to set a watchpoint on
errno in gdb? That wouldve saved a lot of time. It seems gdb is only
capable of using errno in an expression when it suits it...
In our ancient age, the global errno was just a intger variable. In
these days, especially in the multi-threaded application, errno is a
macro which is expanded to a local address (per thread).
If this is your case,
- obtain the address
- "watch *(int*)0x123abc" (the address you got)
may help.
J. R. Okajima
Thanks - this actually works now ('watch -l errno'), but when I've tried
before it failed completely - even 'ptype errno' failed - I'll wait for
the next screwup (confirmed it works now with git and SpaceFM, the
latter being multithreaded).

Loading...