author		Takashi Iwai <tiwai@suse.de>	2019-07-15 21:49:49 +0200
committer	Takashi Iwai <tiwai@suse.de>	2019-07-15 21:49:49 +0200
commit		fd08286c4b5c97d30b9d3133e31e3207b492c5a2 (patch)
tree		522860983ee13de87ec63864552f0d25f3c764cc
parent		708cf06fa3f59e43da6edfa199a928932ac2a56a (diff)
parent		a5cd556b9a836ff234e260539b2770c7ca1aa78e (diff)
Merge branch 'users/fdmanana/SLE15/for-next' into SLE15
Pull btrfs fix from Filipe Manana
-rw-r--r--	patches.suse/Btrfs-kill-btrfs_clear_path_blocking.patch	314
-rw-r--r--	series.conf	1
2 files changed, 315 insertions(+), 0 deletions(-)
diff --git a/patches.suse/Btrfs-kill-btrfs_clear_path_blocking.patch b/patches.suse/Btrfs-kill-btrfs_clear_path_blocking.patch
new file mode 100644
index 0000000000..4d509af7a5
--- /dev/null
+++ b/patches.suse/Btrfs-kill-btrfs_clear_path_blocking.patch
@@ -0,0 +1,314 @@
+From: Liu Bo <bo.liu@linux.alibaba.com>
+Date: Wed, 22 Aug 2018 05:54:37 +0800
+Git-commit: 523983401644ebeb331c923c28c9591c07430a7d
+Patch-mainline: v4.20
+References: bsc#1140139
+Subject: [PATCH] Btrfs: kill btrfs_clear_path_blocking
+
+Btrfs's btree locking has two modes, spinning mode and blocking mode.
+While searching the btree, locks are always acquired in spinning mode
+and then converted to blocking mode if necessary; in some hot paths we
+may switch the locking back to spinning mode via
+btrfs_clear_path_blocking().
+
+When acquiring locks, both readers and writers need to wait for any
+blocking readers and writers to complete before doing
+read_lock()/write_lock().
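+
+As a rough sketch (heavily simplified from the real locking code; the
+field names follow the pre-4.20 extent_buffer layout and the helper
+name is made up for illustration), the read-lock acquire side behaves
+like this:
+
+  static void example_tree_read_lock(struct extent_buffer *eb)
+  {
+  again:
+          read_lock(&eb->lock);
+          if (atomic_read(&eb->blocking_writers)) {
+                  /* A writer holds the lock in blocking mode. */
+                  read_unlock(&eb->lock);
+                  wait_event(eb->write_lock_wq,
+                             atomic_read(&eb->blocking_writers) == 0);
+                  goto again;
+          }
+          /* The lock is now held in spinning mode. */
+  }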
+
+The problem is that btrfs_clear_path_blocking() first needs to switch
+the nodes in the path to blocking mode (via btrfs_set_path_blocking())
+to keep lockdep happy, before doing its actual job of clearing the
+blocking state.
+
+Switching from spinning mode to blocking mode consists of
+
+step 1) bumping up the blocking readers/writers counter, and
+step 2) doing read_unlock()/write_unlock(),
+
+which causes a serious ping-pong effect when there is a great number
+of concurrent readers/writers, as waiters are woken up only to go back
+to sleep immediately.
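+
+In code, the two steps look roughly like the following (simplified from
+btrfs_set_lock_blocking_rw(); the helper name is made up, and the real
+function also maintains the spinning-readers bookkeeping):
+
+  static void example_set_lock_blocking_read(struct extent_buffer *eb)
+  {
+          atomic_inc(&eb->blocking_readers);  /* step 1 */
+          /*
+           * step 2: dropping the spinlock wakes spinning waiters, who
+           * immediately see blocking_readers > 0 and go back to sleep.
+           */
+          read_unlock(&eb->lock);
+  }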
+
+1) Killing this kind of ping-pong results in a big improvement in my
+1600k file creation script:
+
+MNT=/mnt/btrfs
+mkfs.btrfs -f /dev/sdf
+mount /dev/sdf $MNT
+time fsmark -D 10000 -S0 -n 100000 -s 0 -L 1 -l /tmp/fs_log.txt \
+ -d $MNT/0 -d $MNT/1 \
+ -d $MNT/2 -d $MNT/3 \
+ -d $MNT/4 -d $MNT/5 \
+ -d $MNT/6 -d $MNT/7 \
+ -d $MNT/8 -d $MNT/9 \
+ -d $MNT/10 -d $MNT/11 \
+ -d $MNT/12 -d $MNT/13 \
+ -d $MNT/14 -d $MNT/15
+
+w/o patch:
+real 2m27.307s
+user 0m12.839s
+sys 13m42.831s
+
+w/ patch:
+real 1m2.273s
+user 0m15.802s
+sys 8m16.495s
+
+1.1) latency histogram from funclatency[1]
+
+Overall, with the patch there are ~50% fewer write lock acquisitions,
+and the 95% max latency of the write lock drops to ~100ms from >500ms.
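+
+For reference, such histograms can be collected with something along
+the lines of the following (the exact path and flags depend on the
+installed bcc version):
+
+  # /usr/share/bcc/tools/funclatency -m btrfs_tree_lock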
+
+--------------------------------------------
+w/o patch:
+--------------------------------------------
+Function = btrfs_tree_lock
+ msecs : count distribution
+ 0 -> 1 : 2385222 |****************************************|
+ 2 -> 3 : 37147 | |
+ 4 -> 7 : 20452 | |
+ 8 -> 15 : 13131 | |
+ 16 -> 31 : 3877 | |
+ 32 -> 63 : 3900 | |
+ 64 -> 127 : 2612 | |
+ 128 -> 255 : 974 | |
+ 256 -> 511 : 165 | |
+ 512 -> 1023 : 13 | |
+
+Function = btrfs_tree_read_lock
+ msecs : count distribution
+ 0 -> 1 : 6743860 |****************************************|
+ 2 -> 3 : 2146 | |
+ 4 -> 7 : 190 | |
+ 8 -> 15 : 38 | |
+ 16 -> 31 : 4 | |
+
+--------------------------------------------
+w/ patch:
+--------------------------------------------
+Function = btrfs_tree_lock
+ msecs : count distribution
+ 0 -> 1 : 1318454 |****************************************|
+ 2 -> 3 : 6800 | |
+ 4 -> 7 : 3664 | |
+ 8 -> 15 : 2145 | |
+ 16 -> 31 : 809 | |
+ 32 -> 63 : 219 | |
+ 64 -> 127 : 10 | |
+
+Function = btrfs_tree_read_lock
+ msecs : count distribution
+ 0 -> 1 : 6854317 |****************************************|
+ 2 -> 3 : 2383 | |
+ 4 -> 7 : 601 | |
+ 8 -> 15 : 92 | |
+
+2) dbench also confirms the improvement:
+dbench -t 120 -D /mnt/btrfs 16
+
+w/o patch:
+Throughput 158.363 MB/sec
+
+w/ patch:
+Throughput 449.52 MB/sec
+
+3) xfstests didn't show any additional failures.
+
+One thing to note is that callers may set path->leave_spinning to have
+all nodes in the path stay in spinning mode, which means such callers
+are prepared not to sleep before releasing the path; it causes no
+problems if they end up not sleeping while holding blocking locks.
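+
+An illustrative read-only caller (the helper is hypothetical, not an
+actual call site) would look like:
+
+  static int example_lookup(struct btrfs_root *root,
+                            const struct btrfs_key *key)
+  {
+          struct btrfs_path *path = btrfs_alloc_path();
+          int ret;
+
+          if (!path)
+                  return -ENOMEM;
+          /* Keep every node in the path on spinning locks. */
+          path->leave_spinning = 1;
+          ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
+          btrfs_free_path(path);
+          return ret;
+  }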
+
+[1]: https://github.com/iovisor/bcc/blob/master/tools/funclatency.py
+
+Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+---
+ fs/btrfs/ctree.c | 57 ++++--------------------------------------------
+ fs/btrfs/ctree.h | 2 --
+ fs/btrfs/delayed-inode.c | 3 ---
+ 3 files changed, 4 insertions(+), 58 deletions(-)
+
+diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
+index 0512723d2fc6..be26e2583ce7 100644
+--- a/fs/btrfs/ctree.c
++++ b/fs/btrfs/ctree.c
+@@ -68,42 +68,6 @@ noinline void btrfs_set_path_blocking(struct btrfs_path *p)
+ }
+ }
+
+-/*
+- * reset all the locked nodes in the patch to spinning locks.
+- *
+- * held is used to keep lockdep happy, when lockdep is enabled
+- * we set held to a blocking lock before we go around and
+- * retake all the spinlocks in the path. You can safely use NULL
+- * for held
+- */
+-noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
+- struct extent_buffer *held, int held_rw)
+-{
+- int i;
+-
+- if (held) {
+- btrfs_set_lock_blocking_rw(held, held_rw);
+- if (held_rw == BTRFS_WRITE_LOCK)
+- held_rw = BTRFS_WRITE_LOCK_BLOCKING;
+- else if (held_rw == BTRFS_READ_LOCK)
+- held_rw = BTRFS_READ_LOCK_BLOCKING;
+- }
+- btrfs_set_path_blocking(p);
+-
+- for (i = BTRFS_MAX_LEVEL - 1; i >= 0; i--) {
+- if (p->nodes[i] && p->locks[i]) {
+- btrfs_clear_lock_blocking_rw(p->nodes[i], p->locks[i]);
+- if (p->locks[i] == BTRFS_WRITE_LOCK_BLOCKING)
+- p->locks[i] = BTRFS_WRITE_LOCK;
+- else if (p->locks[i] == BTRFS_READ_LOCK_BLOCKING)
+- p->locks[i] = BTRFS_READ_LOCK;
+- }
+- }
+-
+- if (held)
+- btrfs_clear_lock_blocking_rw(held, held_rw);
+-}
+-
+ /* this also releases the path */
+ void btrfs_free_path(struct btrfs_path *p)
+ {
+@@ -1429,7 +1393,6 @@ tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct btrfs_path *path,
+ }
+ }
+
+- btrfs_clear_path_blocking(path, NULL, BTRFS_READ_LOCK);
+ btrfs_tree_read_unlock_blocking(eb);
+ free_extent_buffer(eb);
+
+@@ -2611,7 +2574,6 @@ setup_nodes_for_search(struct btrfs_trans_handle *trans,
+ btrfs_set_path_blocking(p);
+ reada_for_balance(fs_info, p, level);
+ sret = split_node(trans, root, p, level);
+- btrfs_clear_path_blocking(p, NULL, 0);
+
+ BUG_ON(sret > 0);
+ if (sret) {
+@@ -2632,7 +2594,6 @@ setup_nodes_for_search(struct btrfs_trans_handle *trans,
+ btrfs_set_path_blocking(p);
+ reada_for_balance(fs_info, p, level);
+ sret = balance_level(trans, root, p, level);
+- btrfs_clear_path_blocking(p, NULL, 0);
+
+ if (sret) {
+ ret = sret;
+@@ -2928,7 +2889,10 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+ }
+ cow_done:
+ p->nodes[level] = b;
+- btrfs_clear_path_blocking(p, NULL, 0);
++ /*
++ * Leave path with blocking locks to avoid massive
++ * lock context switch, this is made on purpose.
++ */
+
+ /*
+ * we have a lock on b and as long as we aren't changing
+@@ -3010,8 +2974,6 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+ if (!err) {
+ btrfs_set_path_blocking(p);
+ btrfs_tree_lock(b);
+- btrfs_clear_path_blocking(p, b,
+- BTRFS_WRITE_LOCK);
+ }
+ p->locks[level] = BTRFS_WRITE_LOCK;
+ } else {
+@@ -3019,8 +2981,6 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+ if (!err) {
+ btrfs_set_path_blocking(p);
+ btrfs_tree_read_lock(b);
+- btrfs_clear_path_blocking(p, b,
+- BTRFS_READ_LOCK);
+ }
+ p->locks[level] = BTRFS_READ_LOCK;
+ }
+@@ -3039,7 +2999,6 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+ btrfs_set_path_blocking(p);
+ err = split_leaf(trans, root, key,
+ p, ins_len, ret == 0);
+- btrfs_clear_path_blocking(p, NULL, 0);
+
+ BUG_ON(err > 0);
+ if (err) {
+@@ -3106,7 +3065,6 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
+ while (b) {
+ level = btrfs_header_level(b);
+ p->nodes[level] = b;
+- btrfs_clear_path_blocking(p, NULL, 0);
+
+ /*
+ * we have a lock on b and as long as we aren't changing
+@@ -3152,8 +3110,6 @@ int btrfs_search_old_slot(struct btrfs_root *root, const struct btrfs_key *key,
+ if (!err) {
+ btrfs_set_path_blocking(p);
+ btrfs_tree_read_lock(b);
+- btrfs_clear_path_blocking(p, b,
+- BTRFS_READ_LOCK);
+ }
+ b = tree_mod_log_rewind(fs_info, p, b, time_seq);
+ if (!b) {
+@@ -5358,7 +5314,6 @@ int btrfs_search_forward(struct btrfs_root *root, struct btrfs_key *min_key,
+ path->locks[level - 1] = BTRFS_READ_LOCK;
+ path->nodes[level - 1] = cur;
+ unlock_up(path, level, 1, 0, NULL);
+- btrfs_clear_path_blocking(path, NULL, 0);
+ }
+ out:
+ path->keep_locks = keep_locks;
+@@ -5948,8 +5903,6 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
+ if (!ret) {
+ btrfs_set_path_blocking(path);
+ btrfs_tree_read_lock(next);
+- btrfs_clear_path_blocking(path, next,
+- BTRFS_READ_LOCK);
+ }
+ next_rw_lock = BTRFS_READ_LOCK;
+ }
+@@ -5985,8 +5938,6 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
+ if (!ret) {
+ btrfs_set_path_blocking(path);
+ btrfs_tree_read_lock(next);
+- btrfs_clear_path_blocking(path, next,
+- BTRFS_READ_LOCK);
+ }
+ next_rw_lock = BTRFS_READ_LOCK;
+ }
+diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
+index ad071b77c04a..cabd87614a39 100644
+--- a/fs/btrfs/ctree.h
++++ b/fs/btrfs/ctree.h
+@@ -2938,8 +2938,6 @@ void btrfs_release_path(struct btrfs_path *p);
+ struct btrfs_path *btrfs_alloc_path(void);
+ void btrfs_free_path(struct btrfs_path *p);
+ void btrfs_set_path_blocking(struct btrfs_path *p);
+-void btrfs_clear_path_blocking(struct btrfs_path *p,
+- struct extent_buffer *held, int held_rw);
+ void btrfs_unlock_up_safe(struct btrfs_path *p, int level);
+
+ int btrfs_del_items(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
+index 9b1df3912260..baccc5215fa4 100644
+--- a/fs/btrfs/delayed-inode.c
++++ b/fs/btrfs/delayed-inode.c
+@@ -755,9 +755,6 @@ static int btrfs_batch_insert_items(struct btrfs_root *root,
+ i++;
+ }
+
+- /* reset all the locked nodes in the patch to spinning locks. */
+- btrfs_clear_path_blocking(path, NULL, 0);
+-
+ /* insert the keys of the items */
+ setup_items_for_insert(root, path, keys, data_size,
+ total_data_size, total_size, nitems);
+--
+2.16.4
+
diff --git a/series.conf b/series.conf
index 44697a77e5..c78c218c36 100644
--- a/series.conf
+++ b/series.conf
@@ -19726,6 +19726,7 @@
patches.suse/0004-btrfs-qgroup-Use-generation-aware-subtree-swap-to-ma.patch
patches.suse/0005-btrfs-qgroup-Don-t-trace-subtree-if-we-re-dropping-r.patch
patches.suse/0006-btrfs-qgroup-Only-trace-data-extents-in-leaves-if-we.patch
+ patches.suse/Btrfs-kill-btrfs_clear_path_blocking.patch
patches.suse/0008-btrfs-protect-space-cache-inode-alloc-with-GFP_NOFS.patch
patches.suse/btrfs-make-sure-we-create-all-new-block-groups.patch
patches.suse/btrfs-fix-warning-when-replaying-log-after-fsync-of-.patch