author Mike Galbraith <umgwanakikbuti@gmail.com> 2016-02-17 04:02:59 +0100
committer Ben Hutchings <ben@decadent.org.uk> 2016-02-27 14:28:49 +0000
commit 34bce1998711e65b8a7ef37eb92641cdc368587a
parent feae3ca2e5e1a8f44aa6290255d3d9709985d0b2
sched: fix __sched_setscheduler() vs load balancing race
__sched_setscheduler() may release rq->lock in pull_rt_task() as a task is
being changed rt -> fair class. load balancing may sneak in, move the task
behind __sched_setscheduler()'s back, which explodes in switched_to_fair()
when the passed but no longer valid rq is used. Tell can_migrate_task() to
say no if ->pi_lock is held.

@stable: Kernels that predate SCHED_DEADLINE can use this simple (and tested)
check in lieu of backport of the full 18 patch mainline treatment.

Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
[bwh: Backported to 3.2:
 - Adjust numbering in the comment
 - Adjust filename]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Willy Tarreau <w@1wt.eu>
1 files changed, 9 insertions, 0 deletions
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 4c6dae17cd1e..98e103988aad 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -2791,6 +2791,7 @@ int can_migrate_task(struct task_struct *p, struct rq *rq, int this_cpu,
 	 * 1) running (obviously), or
 	 * 2) cannot be migrated to this CPU due to cpus_allowed, or
 	 * 3) are cache-hot on their current CPU.
+	 * 4) p->pi_lock is held.
 	 */
 	if (!cpumask_test_cpu(this_cpu, tsk_cpus_allowed(p))) {
 		schedstat_inc(p, se.statistics.nr_failed_migrations_affine);
@@ -2804,6 +2805,14 @@ int can_migrate_task(struct task_struct *p, struct rq *rq, int this_cpu,
 	/*
+	 * rt -> fair class change may be in progress.  If we sneak in should
+	 * double_lock_balance() release rq->lock, and move the task, we will
+	 * cause switched_to_fair() to meet a passed but no longer valid rq.
+	 */
+	if (raw_spin_is_locked(&p->pi_lock))
+		return 0;
+
+	/*
 	 * Aggressive migration if:
 	 * 1) task is cache cold, or
 	 * 2) too many balance attempts have failed.
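
The check added by this patch can be modelled outside the kernel. The following is a minimal user-space sketch, not kernel code: `toy_spinlock`, `toy_task`, and `toy_can_migrate_task()` are hypothetical stand-ins for `raw_spinlock_t`, `struct task_struct`, and `can_migrate_task()`, and the real function's affinity and cache-hotness tests are omitted. It assumes, following the commit message, that the scheduling-class change runs with `p->pi_lock` held, so a balancer that sees the lock taken should leave the task alone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for raw_spinlock_t; not the kernel type. */
struct toy_spinlock {
	atomic_bool locked;
};

static void toy_spin_lock(struct toy_spinlock *l)
{
	bool expected = false;

	/* Spin until we are the one who flips the flag false -> true. */
	while (!atomic_compare_exchange_weak(&l->locked, &expected, true))
		expected = false;
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
	atomic_store(&l->locked, false);
}

/* Models raw_spin_is_locked(): a snapshot only, it does not take the lock. */
static bool toy_spin_is_locked(struct toy_spinlock *l)
{
	return atomic_load(&l->locked);
}

/* Hypothetical, heavily reduced stand-in for struct task_struct. */
struct toy_task {
	struct toy_spinlock pi_lock;
	bool running;
};

/* Returns 1 if the balancer may move the task, 0 if it must not. */
static int toy_can_migrate_task(struct toy_task *p)
{
	if (p->running)
		return 0;

	/* A class change may be in progress: refuse to move the task. */
	if (toy_spin_is_locked(&p->pi_lock))
		return 0;

	return 1;
}
```

Note that the test is only a snapshot: it refuses migration while the lock is observed held, without serialising against the lock holder. The patch appears to rely on the same property, which is why the commit message offers it as a simple alternative to the full mainline rework rather than an equivalent of it.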