author      Jiri Slaby <jslaby@suse.cz>  2019-02-12 22:18:26 +0100
committer   Jiri Slaby <jslaby@suse.cz>  2019-02-12 22:21:22 +0100
commit      729af27a22df48bfe7fb9d92b5e27b001dd56db5 (patch)
tree        3a888c948c89797e5ba67d4893b6a5439e5a6724
parent      3a0a1ca67860639c423910f5bcd16f0747bdab3a (diff)
futex: Handle early deadlock return correctly (bnc#1012628).
-rw-r--r--  patches.kernel.org/4.20.8-319-futex-Handle-early-deadlock-return-correctly.patch  233
-rw-r--r--  series.conf  1
2 files changed, 234 insertions, 0 deletions
diff --git a/patches.kernel.org/4.20.8-319-futex-Handle-early-deadlock-return-correctly.patch b/patches.kernel.org/4.20.8-319-futex-Handle-early-deadlock-return-correctly.patch
new file mode 100644
index 0000000000..6d5cfe5506
--- /dev/null
+++ b/patches.kernel.org/4.20.8-319-futex-Handle-early-deadlock-return-correctly.patch
@@ -0,0 +1,233 @@
+From: Thomas Gleixner <tglx@linutronix.de>
+Date: Tue, 29 Jan 2019 23:15:12 +0100
+Subject: [PATCH] futex: Handle early deadlock return correctly
+References: bnc#1012628
+Patch-mainline: 4.20.8
+Git-commit: 1a1fb985f2e2b85ec0d3dc2e519ee48389ec2434
+
+commit 1a1fb985f2e2b85ec0d3dc2e519ee48389ec2434 upstream.
+
+commit 56222b212e8e ("futex: Drop hb->lock before enqueueing on the
+rtmutex") changed the locking rules in the futex code so that the hash
+bucket lock is no longer held while the waiter is enqueued into the
+rtmutex wait list. This made the lock and the unlock path symmetric, but
+unfortunately the possible early exit from __rt_mutex_proxy_start() due to
+a detected deadlock was not updated accordingly. That allows a concurrent
+unlocker to observe inconsistent state which triggers the warning in the
+unlock path.
+
+futex_lock_pi() futex_unlock_pi()
+ lock(hb->lock)
+ queue(hb_waiter) lock(hb->lock)
+ lock(rtmutex->wait_lock)
+ unlock(hb->lock)
+ // acquired hb->lock
+ hb_waiter = futex_top_waiter()
+ lock(rtmutex->wait_lock)
+ __rt_mutex_proxy_start()
+ ---> fail
+ remove(rtmutex_waiter);
+ ---> returns -EDEADLOCK
+ unlock(rtmutex->wait_lock)
+ // acquired wait_lock
+ wake_futex_pi()
+ rt_mutex_next_owner()
+ --> returns NULL
+ --> WARN
+
+ lock(hb->lock)
+ unqueue(hb_waiter)
+
+The problem is caused by the remove(rtmutex_waiter) in the failure case of
+__rt_mutex_proxy_start() as this lets the unlocker observe a waiter in the
+hash bucket but no waiter on the rtmutex, i.e. inconsistent state.
+
+The original commit handles this correctly for the other early return cases
+(timeout, signal) by delaying the removal of the rtmutex waiter until the
+returning task has reacquired the hash bucket lock.
+
+Treat the failure case of __rt_mutex_proxy_start() in the same way and let
+the existing cleanup code handle the eventual handover of the rtmutex
+gracefully. The regular rt_mutex_proxy_start() gains the rtmutex waiter
+removal for the failure case, so that the other callsites are still
+operating correctly.
+
+Add proper comments to the code so all these details are fully documented.
+
+Thanks to Peter for helping with the analysis and writing the really
+valuable code comments.
+
+Fixes: 56222b212e8e ("futex: Drop hb->lock before enqueueing on the rtmutex")
+Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
+Co-developed-by: Peter Zijlstra <peterz@infradead.org>
+Signed-off-by: Peter Zijlstra <peterz@infradead.org>
+Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
+Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
+Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
+Cc: linux-s390@vger.kernel.org
+Cc: Stefan Liebler <stli@linux.ibm.com>
+Cc: Sebastian Sewior <bigeasy@linutronix.de>
+Cc: stable@vger.kernel.org
+Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1901292311410.1950@nanos.tec.linutronix.de
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+Signed-off-by: Jiri Slaby <jslaby@suse.cz>
+---
+ kernel/futex.c | 28 ++++++++++++++++++----------
+ kernel/locking/rtmutex.c | 37 ++++++++++++++++++++++++++++++++-----
+ 2 files changed, 50 insertions(+), 15 deletions(-)
+
+diff --git a/kernel/futex.c b/kernel/futex.c
+index 5cc8083a4c89..4d1b7db04e10 100644
+--- a/kernel/futex.c
++++ b/kernel/futex.c
+@@ -2850,35 +2850,39 @@ static int futex_lock_pi(u32 __user *uaddr, unsigned int flags,
+ * and BUG when futex_unlock_pi() interleaves with this.
+ *
+ * Therefore acquire wait_lock while holding hb->lock, but drop the
+- * latter before calling rt_mutex_start_proxy_lock(). This still fully
+- * serializes against futex_unlock_pi() as that does the exact same
+- * lock handoff sequence.
++ * latter before calling __rt_mutex_start_proxy_lock(). This
++ * interleaves with futex_unlock_pi() -- which does a similar lock
++ * handoff -- such that the latter can observe the futex_q::pi_state
++ * before __rt_mutex_start_proxy_lock() is done.
+ */
+ raw_spin_lock_irq(&q.pi_state->pi_mutex.wait_lock);
+ spin_unlock(q.lock_ptr);
++ /*
++ * __rt_mutex_start_proxy_lock() unconditionally enqueues the @rt_waiter
++ * such that futex_unlock_pi() is guaranteed to observe the waiter when
++ * it sees the futex_q::pi_state.
++ */
+ ret = __rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
+ raw_spin_unlock_irq(&q.pi_state->pi_mutex.wait_lock);
+
+ if (ret) {
+ if (ret == 1)
+ ret = 0;
+-
+- spin_lock(q.lock_ptr);
+- goto no_block;
++ goto cleanup;
+ }
+
+-
+ if (unlikely(to))
+ hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS);
+
+ ret = rt_mutex_wait_proxy_lock(&q.pi_state->pi_mutex, to, &rt_waiter);
+
++cleanup:
+ spin_lock(q.lock_ptr);
+ /*
+- * If we failed to acquire the lock (signal/timeout), we must
++ * If we failed to acquire the lock (deadlock/signal/timeout), we must
+ * first acquire the hb->lock before removing the lock from the
+- * rt_mutex waitqueue, such that we can keep the hb and rt_mutex
+- * wait lists consistent.
++ * rt_mutex waitqueue, such that we can keep the hb and rt_mutex wait
++ * lists consistent.
+ *
+ * In particular; it is important that futex_unlock_pi() can not
+ * observe this inconsistency.
+@@ -3002,6 +3006,10 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
+ * there is no point where we hold neither; and therefore
+ * wake_futex_pi() must observe a state consistent with what we
+ * observed.
++ *
+	 * In particular; this forces __rt_mutex_start_proxy_lock() to
++ * complete such that we're guaranteed to observe the
++ * rt_waiter. Also see the WARN in wake_futex_pi().
+ */
+ raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+ spin_unlock(&hb->lock);
+diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
+index 581edcc63c26..978d63a8261c 100644
+--- a/kernel/locking/rtmutex.c
++++ b/kernel/locking/rtmutex.c
+@@ -1726,12 +1726,33 @@ void rt_mutex_proxy_unlock(struct rt_mutex *lock,
+ rt_mutex_set_owner(lock, NULL);
+ }
+
++/**
++ * __rt_mutex_start_proxy_lock() - Start lock acquisition for another task
++ * @lock: the rt_mutex to take
++ * @waiter: the pre-initialized rt_mutex_waiter
++ * @task: the task to prepare
++ *
++ * Starts the rt_mutex acquire; it enqueues the @waiter and does deadlock
++ * detection. It does not wait, see rt_mutex_wait_proxy_lock() for that.
++ *
++ * NOTE: does _NOT_ remove the @waiter on failure; must either call
++ * rt_mutex_wait_proxy_lock() or rt_mutex_cleanup_proxy_lock() after this.
++ *
++ * Returns:
++ * 0 - task blocked on lock
++ * 1 - acquired the lock for task, caller should wake it up
++ * <0 - error
++ *
++ * Special API call for PI-futex support.
++ */
+ int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+ struct rt_mutex_waiter *waiter,
+ struct task_struct *task)
+ {
+ int ret;
+
++ lockdep_assert_held(&lock->wait_lock);
++
+ if (try_to_take_rt_mutex(lock, task, NULL))
+ return 1;
+
+@@ -1749,9 +1770,6 @@ int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+ ret = 0;
+ }
+
+- if (unlikely(ret))
+- remove_waiter(lock, waiter);
+-
+ debug_rt_mutex_print_deadlock(waiter);
+
+ return ret;
+@@ -1763,12 +1781,18 @@ int __rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+ * @waiter: the pre-initialized rt_mutex_waiter
+ * @task: the task to prepare
+ *
++ * Starts the rt_mutex acquire; it enqueues the @waiter and does deadlock
++ * detection. It does not wait, see rt_mutex_wait_proxy_lock() for that.
++ *
++ * NOTE: unlike __rt_mutex_start_proxy_lock this _DOES_ remove the @waiter
++ * on failure.
++ *
+ * Returns:
+ * 0 - task blocked on lock
+ * 1 - acquired the lock for task, caller should wake it up
+ * <0 - error
+ *
+- * Special API call for FUTEX_REQUEUE_PI support.
++ * Special API call for PI-futex support.
+ */
+ int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+ struct rt_mutex_waiter *waiter,
+@@ -1778,6 +1802,8 @@ int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+
+ raw_spin_lock_irq(&lock->wait_lock);
+ ret = __rt_mutex_start_proxy_lock(lock, waiter, task);
++ if (unlikely(ret))
++ remove_waiter(lock, waiter);
+ raw_spin_unlock_irq(&lock->wait_lock);
+
+ return ret;
+@@ -1845,7 +1871,8 @@ int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
+ * @lock: the rt_mutex we were woken on
+ * @waiter: the pre-initialized rt_mutex_waiter
+ *
+- * Attempt to clean up after a failed rt_mutex_wait_proxy_lock().
++ * Attempt to clean up after a failed __rt_mutex_start_proxy_lock() or
++ * rt_mutex_wait_proxy_lock().
+ *
+ * Unless we acquired the lock; we're still enqueued on the wait-list and can
+ * in fact still be granted ownership until we're removed. Therefore we can
+--
+2.20.1
+
diff --git a/series.conf b/series.conf
index eb46de5d66..6f6757c406 100644
--- a/series.conf
+++ b/series.conf
@@ -1055,6 +1055,7 @@
patches.kernel.org/4.20.8-316-dmaengine-bcm2835-Fix-interrupt-race-on-RT.patch
patches.kernel.org/4.20.8-317-dmaengine-bcm2835-Fix-abort-of-transactions.patch
patches.kernel.org/4.20.8-318-dmaengine-imx-dma-fix-wrong-callback-invoke.patch
+ patches.kernel.org/4.20.8-319-futex-Handle-early-deadlock-return-correctly.patch
########################################################
# Build fixes that apply to the vanilla kernel too.