author Mel Gorman <mgorman@suse.de> 2019-03-19 15:55:24 +0000
committer Mel Gorman <mgorman@suse.de> 2019-03-19 15:55:24 +0000
commit 50744d40b04ca2da3f971009f563fa0e6323398a
tree 8107262e62c49c3fd2f2c97841866d39ff3346fa
parent 839f28fd9396411219308ed76e37532e340d382c
sched: Do not re-read h_load_next during hierarchical load
calculation (bnc#1120909).
-rw-r--r-- patches.suse/sched-do-not-re-read-h_load_next-during-hierarchical-load-calculation.patch 77
-rw-r--r-- series.conf 2
2 files changed, 79 insertions, 0 deletions
diff --git a/patches.suse/sched-do-not-re-read-h_load_next-during-hierarchical-load-calculation.patch b/patches.suse/sched-do-not-re-read-h_load_next-during-hierarchical-load-calculation.patch
new file mode 100644
index 0000000000..0fe5e22a26
--- /dev/null
+++ b/patches.suse/sched-do-not-re-read-h_load_next-during-hierarchical-load-calculation.patch
@@ -0,0 +1,77 @@
+From 3d78851e1fb18f96eb2247b641f930ce81021942 Mon Sep 17 00:00:00 2001
+From: Mel Gorman <mgorman@techsingularity.net>
+Date: Tue, 19 Mar 2019 08:51:58 +0000
+Subject: [PATCH] sched: Do not re-read h_load_next during hierarchical load
+ calculation
+
+References: bnc#1120909
+Patch-mainline: No, under review, expected in 5.1
+
+A NULL pointer dereference bug was reported on a distribution kernel but
+the same issue should be present on the mainline kernel. It occurred on s390
+but should not be arch-specific. A partial oops looks like:
+
+[775277.408564] Unable to handle kernel pointer dereference in virtual kernel address space
+...
+[775277.408759] Call Trace:
+[775277.408763] ([<0002c11c56899c61>] 0x2c11c56899c61)
+[775277.408766] [<0000000000177bb4>] try_to_wake_up+0xfc/0x450
+[775277.408773] [<000003ff81ede872>] vhost_poll_wakeup+0x3a/0x50 [vhost]
+[775277.408777] [<0000000000194ae4>] __wake_up_common+0xbc/0x178
+[775277.408779] [<0000000000194f86>] __wake_up_common_lock+0x9e/0x160
+[775277.408780] [<00000000001950de>] __wake_up_sync_key+0x4e/0x60
+[775277.408785] [<00000000005d911e>] sock_def_readable+0x5e/0x98
+
+The bug hits any time between 1 hour and 3 days. The dereference occurs
+in update_cfs_rq_h_load when accumulating h_load. The problem is that
+cfs_rq->h_load_next is not protected by any locking and can be updated
+by parallel calls to task_h_load. Depending on the compiler, code may be
+generated that re-reads cfs_rq->h_load_next after the check for NULL and
+then oopses when reading se->avg.load_avg. The disassembly showed that it
+was possible to re-read h_load_next after the check for NULL.
+
+While this does not appear to be an issue for later compilers, it's still
+an accident if the correct code is generated. Full locking in this path
+would have high overhead so this patch uses READ_ONCE to read h_load_next
+only once and check for NULL before dereferencing. It was confirmed that
+there were no further oops after 10 days of testing.
+
+As Peter pointed out, it is also necessary to use WRITE_ONCE to avoid any
+potential problems with store tearing.
+
+Fixes: 685207963be9 ("sched: Move h_load calculation to task_h_load()")
+[peterz@infradead.org: Use WRITE_ONCE to protect against store tearing]
+Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
+Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
+Signed-off-by: Mel Gorman <mgorman@suse.com>
+Cc: stable@vger.kernel.org
+---
+ kernel/sched/fair.c | 6 +++---
+ 1 file changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
+index 310d0637fe4b..5e61a1a99e38 100644
+--- a/kernel/sched/fair.c
++++ b/kernel/sched/fair.c
+@@ -7713,10 +7713,10 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
+ 	if (cfs_rq->last_h_load_update == now)
+ 		return;
+
+-	cfs_rq->h_load_next = NULL;
++	WRITE_ONCE(cfs_rq->h_load_next, NULL);
+ 	for_each_sched_entity(se) {
+ 		cfs_rq = cfs_rq_of(se);
+-		cfs_rq->h_load_next = se;
++		WRITE_ONCE(cfs_rq->h_load_next, se);
+ 		if (cfs_rq->last_h_load_update == now)
+ 			break;
+ 	}
+@@ -7726,7 +7726,7 @@ static void update_cfs_rq_h_load(struct cfs_rq *cfs_rq)
+ 		cfs_rq->last_h_load_update = now;
+ 	}
+
+-	while ((se = cfs_rq->h_load_next) != NULL) {
++	while ((se = READ_ONCE(cfs_rq->h_load_next)) != NULL) {
+ 		load = cfs_rq->h_load;
+ 		load = div64_ul(load * se->avg.load_avg,
+ 			cfs_rq_load_avg(cfs_rq) + 1);
diff --git a/series.conf b/series.conf
index 6cd7b77599..1632ce9dab 100644
--- a/series.conf
+++ b/series.conf
@@ -20993,6 +20993,8 @@
patches.kabi/posix-timers-overrun-change-kABI-fix.patch
+ patches.suse/sched-do-not-re-read-h_load_next-during-hierarchical-load-calculation.patch
+
########################################################
# locking/core
########################################################
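
The pattern the patch relies on can be illustrated outside the kernel. What follows is a minimal userspace sketch under stated assumptions, not the kernel code: the READ_ONCE and WRITE_ONCE macros are simplified stand-ins built on a volatile cast (using the GCC/Clang __typeof__ extension), and struct entity, struct queue, scaled_load and set_next are hypothetical names invented for this illustration. It shows the shared pointer being loaded once into a local, after which only the local is tested and dereferenced.

```c
#include <stddef.h>

/*
 * Simplified userspace stand-ins for the kernel macros (assumption: the
 * real kernel definitions are more elaborate). The volatile cast asks the
 * compiler to perform exactly one access and not to repeat or split it.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical, cut-down types; these are not the kernel's structures. */
struct entity {
	unsigned long load_avg;
};

struct queue {
	struct entity *h_load_next;	/* may be updated by another thread */
	unsigned long h_load;
};

/* Publish a new pointer value with a single store. */
static void set_next(struct queue *q, struct entity *se)
{
	WRITE_ONCE(q->h_load_next, se);
}

/*
 * Take one snapshot of the shared pointer. The NULL check and the
 * dereference both use the local copy, so a concurrent set_next(q, NULL)
 * cannot slip in between them.
 */
static unsigned long scaled_load(struct queue *q)
{
	struct entity *se = READ_ONCE(q->h_load_next);

	if (se == NULL)
		return 0;
	return q->h_load * se->load_avg;
}
```

Written as a plain `q->h_load_next` access, the compiler would be free to load the field once for the NULL test and again for the dereference, which is the window the patch description reports seeing in the disassembly. In portable userspace code, C11 atomics would be the usual tool; the volatile cast is used here only to mirror the kernel idiom.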