Discussion:
AUFS and PREEMPT_RT boot issue
Demmel Nikolaus (BOSP/PAR)
2017-01-16 10:30:18 UTC
Hi,

we are using AUFS for our root filesystem with a tmpfs overlay and recently wanted to switch to a kernel with the PREEMPT_RT patch. Unfortunately, mounting aufs hangs during about 40% of boots, with no specific kernel or log message. When the system does come up, it seems to work correctly.

Are there known issues with AUFS and PREEMPT_RT? Are people using it?

We are on Kernel 4.1.30 with ltsi patches and aufs4.1.13+.

In order to get AUFS to compile, we needed to apply the following additional patch: https://github.com/beagleboard/linux/pull/108/commits/ad740435f28e3ffb5d73a419086b0b6fd3bc7240

Any pointers or suggestions are welcome.

Kind regards,
Nikolaus
s***@users.sourceforge.net
2017-01-16 13:49:56 UTC
Hello Nikolaus,
we are using AUFS for our root filesystem with an tmpfs overlay and recently wanted to switch to a kernel with PREEMPT_RT patch. Unfortunately, it seems that mounting aufs seems to hang about 40% of the time during boot (there is no specific kernel or log message). When it comes up, it seems to work correctly.
Do you mean that mounting aufs took a very long time and that your system
runs flawlessly after that?
Interesting...

My first suggestion to see what is going on is "strace -T -f mount -t aufs ..."
which will give us a hint.
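
As a rough illustration of that suggestion, the mount call in an init script could be wrapped as below. The function name trace_mount and the TRACER override are hypothetical and exist only for this sketch; -T (time spent in each system call) and -f (follow child processes) are regular strace options. strace writes to stderr by default, which is assumed here to be the console during early boot.

```shell
# Run a command under strace so the last system call printed before a
# hang is visible on the console.
# TRACER is a hypothetical override for dry runs; it defaults to strace.
trace_mount() {
    ${TRACER:-strace} -T -f "$@"
}

# Intended use in the init script (ROOT_MOUNT as in the original report):
#   trace_mount mount -t aufs -o "dirs=/rw=rw:/ro=ro" aufs "$ROOT_MOUNT"
```

Setting TRACER=echo prints the command line that would be traced instead of running it, which is handy for checking the quoting of the mount options.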

Sometimes people set the aufs FHSM feature incorrectly. That
setting/configuration should match in kernel-space and user-space. If it
doesn't match, mounting aufs will unexpectedly take a long time.
When FHSM is enabled in both kernel-space and user-space, mounting will
take a long time too, but that is expected behaviour.
In order to get AUFS to compile, we needed to apply the following additional patch: https://github.com/beagleboard/linux/pull/108/commits/ad740435f28e3ffb5d73a419086b0b6fd3bc7240
Hmm, this patch reminds me of a post from Daniel Vidal in May 2013 (the
patch looks different from what I suggested, though). Refer to
http://www.mail-archive.com/aufs-***@lists.sourceforge.net/msg04210.html
and its thread.

Next time you post, tell me the versions of the patches you are using
and the URLs where I can get all of them.


J. R. Okajima
Demmel Nikolaus (BOSP/PAR)
2017-01-16 15:57:03 UTC
Hi J. R. Okajima,

thank you for your prompt response.

I'm assuming from your response that in general you expect AUFS to work with PREEMPT_RT, or is this not the case?
Post by s***@users.sourceforge.net
we are using AUFS for our root filesystem with an tmpfs overlay and recently wanted to switch to a kernel with PREEMPT_RT patch. Unfortunately, it seems that mounting aufs seems to hang about 40% of the time during boot (there is no specific kernel or log message). When it comes up, it seems to work correctly.
Do you mean that mounting aufs took a very long time and that your system
runs flawlessly after that?
Interesting...
My first suggestion to see what is going on is "strace -T -f mount -t aufs ..."
which will give us a hint.
Sometimes people set the aufs FHSM feature incorrectly. That
setting/configuration should match in kernel-space and user-space. If it
doesn't match, mounting aufs will unexpectedly take a long time.
When FHSM is enabled in both kernel-space and user-space, mounting will
take a long time too, but that is expected behaviour.
What I actually mean is that about 40% of the time when booting into a kernel with the RT patch, the boot hangs at

mount -t aufs -o "dirs=/rw=rw:/ro=ro" aufs $ROOT_MOUNT

in our init script and does not appear to return at all. The other 60% of the time it works as expected, without delay.

The exact same configuration, just without the PREEMPT_RT patch, appears to work 100% of the time.

Does your answer still apply? Should we try the strace?
Post by s***@users.sourceforge.net
In order to get AUFS to compile, we needed to apply the following additional patch: https://github.com/beagleboard/linux/pull/108/commits/ad740435f28e3ffb5d73a419086b0b6fd3bc7240
Hmm, this patch reminds me of a post from Daniel Vidal in May 2013 (the
patch looks different from what I suggested, though). Refer to
http://www.mail-archive.com/aufs-***@lists.sourceforge.net/msg04210.html
and its thread.
Hm... interesting. To me (being unfamiliar with the innards of both PREEMPT_RT and AUFS) this
looks like potential for a locking issue. The patch / discussion you linked seems to suggest that
the owner should be assigned to lock.owner, which is not done in the patch we are using. Mind that
the patch I linked is also just something I stumbled across while googling the initial compile error.
In particular, the commit message doesn't sound too confident that the change is actually the correct
fix and not just a way to make it compile.

Do you suggest that we should try to change it to the patch you linked?
Post by s***@users.sourceforge.net
Next time you post, tell me the versions of the patches you are using
and the URLs where I can get all of them.
Sorry, here are the versions; I hope this helps. I'm copying this from a Yocto build config and I'm not sure in what format it would be most useful for you. I'm also not sure what the best tool would be to get from this list of kernel sources and patches to a compiled kernel without using Yocto.

https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.1.30.tar.xz
http://git.linuxfoundation.org/?p=ltsi-kernel.git;a=commit;h=079621627b2d96cf0d85a0413ce5670056a70751
https://github.com/sfjro/aufs4-standalone.git
https://www.kernel.org/pub/linux/kernel/projects/rt/4.1/older/patches-4.1.30-rt34.tar.xz

Best,
Nikolaus
Daniel Vidal
2017-01-16 19:16:03 UTC
Hi

I am still applying my patch to compile the kernel with RT and AUFS.

I tried to use rt_mutex_set_owner(), without success. I do not
remember what the problem was.

I am now running a 4.4.38-rt49 kernel with RT and AUFS:

Linux x64-v3 4.4.38-rt49 #1 SMP PREEMPT RT Mon Dec 19 10:28:46 CET 2016 x86_64 GNU/Linux

I use the kernel to run a Debian-based live system.

If you want to test my kernel, I can share it on Google Drive.

Sorry for my English.


Demmel Nikolaus (BOSP/PAR)
2017-01-17 17:03:21 UTC
Hi,

thanks for the datapoint! If debugging doesn't lead to any result, I will try with your version to see if that fixes it for us.

I should also try our version with your patch to fix the compilation instead of the one we are using currently. Maybe that just happens to fix the issue.

Thanks for offering to share the kernel. However, we also have a bunch of (hopefully) unrelated changes we need on our system, so it's probably better if we build it ourselves.

Don't worry about your English.

Best,
Nikolaus


Daniel Vidal
2017-01-29 10:48:36 UTC
Hi

I have a question...

Demmel, do you have a Radeon graphics card?



Demmel Nikolaus (BOSP/PAR)
2017-01-30 09:51:52 UTC
Hi,

no, this is an embedded board with an Intel Atom E3845 processor and no dedicated graphics card.

Btw, preliminary testing suggests that using your patch instead (the one that sets the owner of the lock) might avoid the freezes during boot. We have to test more thoroughly, though.
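
To give a feel for how many boots such testing might take: if the hang rate were still the roughly 40% reported earlier in this thread, and boots are assumed to be independent, the chance of seeing n clean boots in a row is 0.6^n. The sketch below finds the smallest n for which that chance drops below a chosen threshold. The helper name boots_needed is hypothetical, and the independence assumption is a simplification.

```shell
# Print the smallest number of consecutive clean boots n such that
# (1 - p_hang)^n < alpha, i.e. such a clean run would be unlikely
# if the hang rate were still p_hang.
# Usage: boots_needed <p_hang> <alpha>
boots_needed() {
    awk -v p="$1" -v alpha="$2" 'BEGIN {
        # Reject inputs for which the loop below would not make sense.
        if (p <= 0 || p > 1 || alpha <= 0 || alpha >= 1) exit 1
        n = 0; q = 1
        while (q >= alpha) { q *= (1 - p); n++ }
        print n
    }'
}
```

With these numbers, boots_needed 0.4 0.05 prints 6 and boots_needed 0.4 0.01 prints 10, so around ten clean boots in a row would already be fairly strong evidence that the hang rate has changed.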

Best,
Nikolaus


s***@users.sourceforge.net
2017-01-17 03:30:47 UTC
I'm assuming from your response that in general you expect AUFS to work with PREEMPT_RT, or is this not the case?
Although I myself don't use the RT patch, yes, it should work. Of course,
some workarounds may be necessary. That won't be clear without a lot of
testing and diving into the patches.
What I actually mean is that in about 40% of the time when booting into a kernel with the RT patch, the boot hangs at
mount -t aufs -o "dirs=/rw=rw:/ro=ro" aufs $ROOT_MOUNT
in our init script and does not appear to return at all. The other 60% it works as expected without delay.
I misunderstood. Now it is clear that
- mounting aufs sometimes hangs, and you can do nothing but reboot.
- sometimes it doesn't hang.

Often such a problem is caused by uninitialized data such as lock
objects. But of course we cannot be sure yet. The cause may be
something like that, or something totally different. Additionally, there
may be multiple causes.
Then exact same configuration just without PREEMPT_RT patch appears to work 100% of the time.
Does your answer still apply? Should we try the strace?
Yes. strace will show us which system call hangs. The most suspicious one
is mount(2), but it is better to confirm.
After finding the system call, we can dive into kernel space. Usually
embedding printk calls or using Magic SysRq is a good way to see what is
going on and identify the root cause. These days, ftrace and other
tracing features are good choices too, though I don't have much
experience with them.
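
Since no shell is available while the init script is stuck in mount, one way to use Magic SysRq here is to arm a background timer before the mount that asks the kernel to dump blocked tasks. The sketch below assumes a kernel built with CONFIG_MAGIC_SYSRQ, /proc already mounted, and that other tasks still get scheduled during the hang. The helper name arm_hang_watchdog, the WATCHDOG_PID variable and the 30-second delay are hypothetical.

```shell
# Start a background timer that writes 'w' (dump tasks in blocked state)
# to the SysRq trigger after <delay> seconds.
# Usage: arm_hang_watchdog <delay> [trigger-file]
arm_hang_watchdog() {
    delay=$1
    trigger=${2:-/proc/sysrq-trigger}
    ( sleep "$delay"; echo w > "$trigger" ) &
    WATCHDOG_PID=$!   # remembered so the caller can cancel the timer
}

# Intended use in the init script:
#   arm_hang_watchdog 30
#   mount -t aufs -o "dirs=/rw=rw:/ro=ro" aufs "$ROOT_MOUNT"
#   kill "$WATCHDOG_PID" 2>/dev/null   # mount returned, cancel the dump
```

If the mount hangs, the resulting stack traces in the kernel log should show where the mount task is blocked, which narrows down where to add printk calls.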

For debugging the RT patch, git bisect may be a good choice, for example:
- prepare a linux-4.1.30 git tree.
- apply and git-commit all patches except RT.
- apply the RT patch series and git-commit the patches one by one.
- run 'git bisect start HEAD "just before RT"'
  + HEAD is the last patch/commit in the RT series
  + "just before RT" is the commit of 'apply all patches except RT'
  + repeat the rebuild and test based on the bisection.
  + git bisect will tell you the suspicious patch, if everything goes
    well.

The RT patch series may not be bisectable. In that case, git bisect
won't help.
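
The steps above can be rehearsed on a throwaway repository before spending time on real kernel builds. The sketch below is only a stand-in: the eight "rt patch" commits, the HANG marker and the function name demo_bisect are invented for illustration, and the automated check replaces what would really be a rebuild and a series of test boots.

```shell
# Rehearse the bisect workflow in a temporary repository.
# Prints the subject of the first "bad" commit that git bisect finds.
demo_bisect() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        git config user.name "demo"
        git config user.email "demo@example.invalid"

        # Stand-in for "apply and commit all patches except RT".
        echo "base" > file
        git add file
        git commit -q -m "all patches except RT"
        git tag before-rt

        # Stand-in for the RT series: one commit per patch.
        # Patch 5 introduces the marker that plays the role of the hang.
        for i in 1 2 3 4 5 6 7 8; do
            echo "change from patch $i" >> file
            if [ "$i" -eq 5 ]; then echo "HANG" >> file; fi
            git commit -q -a -m "rt patch $i"
        done

        # bad = HEAD (end of series), good = tag before the series.
        git bisect start HEAD before-rt >/dev/null
        # Exit status 0 marks a commit good, 1 marks it bad.
        git bisect run sh -c '! grep -q HANG file' >/dev/null
        git log -1 --format=%s refs/bisect/bad
    )
    status=$?
    rm -rf "$repo"
    return "$status"
}
```

Because the real hang is intermittent, a commit should only be marked good after several clean boots, whereas a single hang is enough to mark it bad.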

Choose any debugging way you like, try harder, and you will find the
root cause and fix it.
Do you suggest that we should try to change it to the patch you linked?
No.
Because I don't know what is correct currently.


J. R. Okajima
Demmel Nikolaus (BOSP/PAR)
2017-01-17 16:58:53 UTC
Thanks for the tips on how to approach debugging. I will try that. It might take some time, but I will try to report back here with the result or when I get stuck.

An alternative would of course also be to try a newer Kernel, e.g. the one Daniel Vidal is apparently using successfully.

As such, in case someone has additional experiences or versions that "work for me", this information is still highly valuable.

Best,
Nikolaus



