author Takashi Iwai <tiwai@suse.de> 2018-10-15 19:56:59 +0200
committer Takashi Iwai <tiwai@suse.de> 2018-10-15 19:56:59 +0200
commit 73c344c298033ded83771d310e3dcbd7dceba409
tree 4b61c00e3baba68f897062dccb67da6648889f69
parent 50956854442caffdd65c520f61bb39244e67d28b
parent a1ef375fea3740081a51101b5dc8a2a140cc190a
Merge branch 'users/fdmanana/SLE15/for-next' into SLE15
Pull btrfs fixes from Filipe Manana
-rw-r--r-- patches.suse/btrfs-fix-file-data-corruption-after-cloning-a-range.patch 107
-rw-r--r-- patches.suse/btrfs-fix-mount-failure-after-fsync-due-to-hard-link.patch 143
-rw-r--r-- patches.suse/btrfs-send-fix-invalid-access-to-commit-roots-due-to.patch 139
-rw-r--r-- series.conf 3
4 files changed, 392 insertions, 0 deletions
diff --git a/patches.suse/btrfs-fix-file-data-corruption-after-cloning-a-range.patch b/patches.suse/btrfs-fix-file-data-corruption-after-cloning-a-range.patch
new file mode 100644
index 0000000000..bf0b7e8c62
--- /dev/null
+++ b/patches.suse/btrfs-fix-file-data-corruption-after-cloning-a-range.patch
@@ -0,0 +1,107 @@
+From: Filipe Manana <fdmanana@suse.com>
+Date: Thu, 12 Jul 2018 01:36:43 +0100
+Patch-mainline: 4.18
+Git-commit: bd3599a0e142cd73edd3b6801068ac3f48ac771a
+Subject: [PATCH] Btrfs: fix file data corruption after cloning a range and
+ fsync
+References: bsc#1111901
+
+When we clone a range into a file we can end up dropping existing
+extent maps (or trimming them) and replacing them with new ones if the
+range to be cloned overlaps with a range in the destination inode.
+When that happens we add the new extent maps to the list of modified
+extents in the inode's extent map tree, so that a "fast" fsync (the flag
+BTRFS_INODE_NEEDS_FULL_SYNC not set in the inode) will see the extent maps
+and log corresponding extent items. However, at the end of range cloning
+operation we do truncate all the pages in the affected range (in order to
+ensure future reads will not get stale data). Sometimes this truncation
+will release the corresponding extent maps besides the pages from the page
+cache. If this happens, then a "fast" fsync operation will miss logging
+some extent items, because it relies exclusively on the extent maps being
+present in the inode's extent tree, leading to data loss/corruption if
+the fsync ends up using the same transaction used by the clone operation
+(that transaction was not committed in the meanwhile). An extent map is
+released through the callback btrfs_invalidatepage(), which gets called by
+truncate_inode_pages_range(), and it calls __btrfs_releasepage(). The
+latter ends up calling try_release_extent_mapping(), which releases the
+extent map if certain conditions are met: the file size is greater than
+16Mb, the gfp flags allow blocking, the range is not locked (which is
+the case during the clone operation) and the extent map is not flagged
+as pinned (also the case for cloning).
+
+The following example, turned into a test for fstests, reproduces the
+issue:
+
+ $ mkfs.btrfs -f /dev/sdb
+ $ mount /dev/sdb /mnt
+
+ $ xfs_io -f -c "pwrite -S 0x18 9000K 6908K" /mnt/foo
+ $ xfs_io -f -c "pwrite -S 0x20 2572K 156K" /mnt/bar
+
+ $ xfs_io -c "fsync" /mnt/bar
+ # reflink destination offset corresponds to the size of file bar,
+ # 2728Kb minus 4Kb.
+ $ xfs_io -c "reflink /mnt/foo 0 2724K 15908K" /mnt/bar
+ $ xfs_io -c "fsync" /mnt/bar
+
+ $ md5sum /mnt/bar
+ 95a95813a8c2abc9aa75a6c2914a077e /mnt/bar
+
+ <power fail>
+
+ $ mount /dev/sdb /mnt
+ $ md5sum /mnt/bar
+ 207fd8d0b161be8a84b945f0df8d5f8d /mnt/bar
+ # digest should be 95a95813a8c2abc9aa75a6c2914a077e like before the
+ # power failure
+
+In the above example, the destination offset of the clone operation
+corresponds to the size of the "bar" file minus 4Kb. So during the clone
+operation, the extent map covering the range from 2572Kb to 2728Kb gets
+trimmed so that it ends at offset 2724Kb, and a new extent map covering
+the range from 2724Kb to 11724Kb is created. So at the end of the clone
+operation when we ask to truncate the pages in the range from 2724Kb to
+2724Kb + 15908Kb, the page invalidation callback ends up removing the new
+extent map (through try_release_extent_mapping()) when the page at offset
+2724Kb is passed to that callback.
+
+Fix this by setting the bit BTRFS_INODE_NEEDS_FULL_SYNC whenever an extent
+map is removed at try_release_extent_mapping(), forcing the next fsync to
+search for modified extents in the fs/subvolume tree instead of relying on
+the presence of extent maps in memory. This way we can continue doing a
+"fast" fsync if the destination range of a clone operation does not
+overlap with an existing range or if any of the criteria necessary to
+remove an extent map at try_release_extent_mapping() is not met (file
+size not bigger than 16Mb or gfp flags do not allow blocking).
+
+CC: stable@vger.kernel.org # 3.16+
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+---
+ fs/btrfs/extent_io.c | 3 +++
+ 1 file changed, 3 insertions(+)
+
+diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
+index 062b27621e67..1e46d50373ae 100644
+--- a/fs/btrfs/extent_io.c
++++ b/fs/btrfs/extent_io.c
+@@ -4324,6 +4324,7 @@ int try_release_extent_mapping(struct extent_map_tree *map,
+ 	struct extent_map *em;
+ 	u64 start = page_offset(page);
+ 	u64 end = start + PAGE_SIZE - 1;
++	struct btrfs_inode *btrfs_inode = BTRFS_I(page->mapping->host);
+
+ 	if (gfpflags_allow_blocking(mask) &&
+ 	    page->mapping->host->i_size > SZ_16M) {
+@@ -4346,6 +4347,8 @@ int try_release_extent_mapping(struct extent_map_tree *map,
+ 					    extent_map_end(em) - 1,
+ 					    EXTENT_LOCKED | EXTENT_WRITEBACK,
+ 					    0, NULL)) {
++				set_bit(BTRFS_INODE_NEEDS_FULL_SYNC,
++					&btrfs_inode->runtime_flags);
+ 				remove_extent_mapping(map, em);
+ 				/* once for the rb tree */
+ 				free_extent_map(em);
+--
+2.19.0
+
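The effect of the fix in btrfs-fix-file-data-corruption-after-cloning-a-range.patch can be sketched with a small userspace model. This is not kernel code: struct toy_inode, toy_clone(), toy_release_extent_map(), toy_fsync() and toy_extents_logged_after_clone() are hypothetical names invented for illustration, and the model assumes a single cloned extent whose in-memory extent map is dropped by page invalidation before the fsync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an inode; NOT the kernel's struct btrfs_inode. */
struct toy_inode {
	bool needs_full_sync;      /* models BTRFS_INODE_NEEDS_FULL_SYNC */
	size_t cached_extent_maps; /* modified extent maps still in memory */
	size_t extents_in_tree;    /* modified extents in the fs/subvolume tree */
};

/* A clone adds one extent to the tree and one extent map in memory. */
static void toy_clone(struct toy_inode *inode)
{
	inode->cached_extent_maps++;
	inode->extents_in_tree++;
}

/*
 * Models an extent map being released during page invalidation.
 * With the fix, the release also requests a full sync.
 */
static void toy_release_extent_map(struct toy_inode *inode, bool with_fix)
{
	assert(inode->cached_extent_maps > 0);
	if (with_fix)
		inode->needs_full_sync = true;
	inode->cached_extent_maps--;
}

/* Returns how many modified extents the fsync would log. */
static size_t toy_fsync(const struct toy_inode *inode)
{
	if (inode->needs_full_sync)
		return inode->extents_in_tree;   /* full: search the tree */
	return inode->cached_extent_maps;        /* fast: trust memory only */
}

/* Clone, lose the extent map, then fsync. */
size_t toy_extents_logged_after_clone(bool with_fix)
{
	struct toy_inode inode = { false, 0, 0 };

	toy_clone(&inode);
	toy_release_extent_map(&inode, with_fix);
	return toy_fsync(&inode);
}
```

In this model, toy_extents_logged_after_clone(false) logs nothing because the fast path only sees in-memory extent maps, while toy_extents_logged_after_clone(true) logs the cloned extent because the release forces the full path.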
diff --git a/patches.suse/btrfs-fix-mount-failure-after-fsync-due-to-hard-link.patch b/patches.suse/btrfs-fix-mount-failure-after-fsync-due-to-hard-link.patch
new file mode 100644
index 0000000000..23923766e7
--- /dev/null
+++ b/patches.suse/btrfs-fix-mount-failure-after-fsync-due-to-hard-link.patch
@@ -0,0 +1,143 @@
+From: Filipe Manana <fdmanana@suse.com>
+Date: Fri, 20 Jul 2018 10:59:06 +0100
+Patch-mainline: 4.19-rc1
+Git-commit: 0d836392cadd5535f4184d46d901a82eb276ed62
+References: bsc#1103543
+Subject: [PATCH] Btrfs: fix mount failure after fsync due to hard link
+ recreation
+
+If we end up logging an inode reference item which has the same name
+but different index from the one we have persisted, we end up failing when
+replaying the log with an errno value of -EEXIST. The error comes from
+btrfs_add_link(), which is called from add_inode_ref(), when we are
+replaying an inode reference item.
+
+Example scenario where this happens:
+
+ $ mkfs.btrfs -f /dev/sdb
+ $ mount /dev/sdb /mnt
+
+ $ touch /mnt/foo
+ $ ln /mnt/foo /mnt/bar
+
+ $ sync
+
+ # Rename the first hard link (foo) to a new name and rename the second
+ # hard link (bar) to the old name of the first hard link (foo).
+ $ mv /mnt/foo /mnt/qwerty
+ $ mv /mnt/bar /mnt/foo
+
+ # Create a new file, in the same parent directory, with the old name of
+ # the second hard link (bar) and fsync this new file.
+ # We do this instead of calling fsync on foo/qwerty because if we did
+ # that, the fsync would result in a full transaction commit and would
+ # not trigger the problem.
+ $ touch /mnt/bar
+ $ xfs_io -c "fsync" /mnt/bar
+
+ <power fail>
+
+ $ mount /dev/sdb /mnt
+ mount: mount /dev/sdb on /mnt failed: File exists
+
+So fix this by checking if a conflicting inode reference exists (same
+name, same parent but different index), removing it (and the associated
+dir index entries from the parent inode) if it exists, before attempting
+to add the new reference.
+
+A test case for fstests follows soon.
+
+CC: stable@vger.kernel.org # 4.4+
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+---
+ fs/btrfs/tree-log.c | 66 +++++++++++++++++++++++++++++++++++++++++++++
+ 1 file changed, 66 insertions(+)
+
+diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
+index 686d566a607c..0dabcb8bab71 100644
+--- a/fs/btrfs/tree-log.c
++++ b/fs/btrfs/tree-log.c
+@@ -1312,6 +1312,46 @@ static int unlink_old_inode_refs(struct btrfs_trans_handle *trans,
+ 	return ret;
+ }
+
++static int btrfs_inode_ref_exists(struct inode *inode, struct inode *dir,
++				  const u8 ref_type, const char *name,
++				  const int namelen)
++{
++	struct btrfs_key key;
++	struct btrfs_path *path;
++	const u64 parent_id = btrfs_ino(BTRFS_I(dir));
++	int ret;
++
++	path = btrfs_alloc_path();
++	if (!path)
++		return -ENOMEM;
++
++	key.objectid = btrfs_ino(BTRFS_I(inode));
++	key.type = ref_type;
++	if (key.type == BTRFS_INODE_REF_KEY)
++		key.offset = parent_id;
++	else
++		key.offset = btrfs_extref_hash(parent_id, name, namelen);
++
++	ret = btrfs_search_slot(NULL, BTRFS_I(inode)->root, &key, path, 0, 0);
++	if (ret < 0)
++		goto out;
++	if (ret > 0) {
++		ret = 0;
++		goto out;
++	}
++	if (key.type == BTRFS_INODE_EXTREF_KEY)
++		ret = btrfs_find_name_in_ext_backref(path->nodes[0],
++						     path->slots[0], parent_id,
++						     name, namelen, NULL);
++	else
++		ret = btrfs_find_name_in_backref(path->nodes[0], path->slots[0],
++						 name, namelen, NULL);
++
++out:
++	btrfs_free_path(path);
++	return ret;
++}
++
+ /*
+  * replay one inode back reference item found in the log tree.
+  * eb, slot and key refer to the buffer and key found in the log tree.
+@@ -1421,6 +1461,32 @@ static noinline int add_inode_ref(struct btrfs_trans_handle *trans,
+ 				}
+ 			}
+
++			/*
++			 * If a reference item already exists for this inode
++			 * with the same parent and name, but different index,
++			 * drop it and the corresponding directory index entries
++			 * from the parent before adding the new reference item
++			 * and dir index entries, otherwise we would fail with
++			 * -EEXIST returned from btrfs_add_link() below.
++			 */
++			ret = btrfs_inode_ref_exists(inode, dir, key->type,
++						     name, namelen);
++			if (ret > 0) {
++				ret = btrfs_unlink_inode(trans, root,
++							 BTRFS_I(dir),
++							 BTRFS_I(inode),
++							 name, namelen);
++				/*
++				 * If we dropped the link count to 0, bump it so
++				 * that later the iput() on the inode will not
++				 * free it. We will fixup the link count later.
++				 */
++				if (!ret && inode->i_nlink == 0)
++					inc_nlink(inode);
++			}
++			if (ret < 0)
++				goto out;
++
+ 			/* insert our name */
+ 			ret = btrfs_add_link(trans, BTRFS_I(dir),
+ 					BTRFS_I(inode),
+--
+2.19.0
+
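The replay failure described in btrfs-fix-mount-failure-after-fsync-due-to-hard-link.patch, and the check-then-unlink approach of the fix, can be sketched with a toy directory in userspace. Nothing here is the btrfs API: struct toy_dir, toy_add_link(), toy_unlink(), toy_ref_exists() and toy_replay_foo() are hypothetical, the inode number 257 and the index values 2, 3 and 5 are made up, and the index comparison is folded into toy_ref_exists() as a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 4

struct toy_dirent {
	const char *name;
	unsigned long ino;
	unsigned long index;
};

/* Toy parent directory: a small array of (name, inode, index) entries. */
struct toy_dir {
	struct toy_dirent entries[TOY_MAX_ENTRIES];
	size_t count;
};

static struct toy_dirent *toy_lookup(struct toy_dir *dir, const char *name)
{
	for (size_t i = 0; i < dir->count; i++)
		if (strcmp(dir->entries[i].name, name) == 0)
			return &dir->entries[i];
	return NULL;
}

/* Fails with -EEXIST when the name is already present in the directory. */
static int toy_add_link(struct toy_dir *dir, const char *name,
			unsigned long ino, unsigned long index)
{
	if (toy_lookup(dir, name))
		return -EEXIST;
	assert(dir->count < TOY_MAX_ENTRIES);
	dir->entries[dir->count++] = (struct toy_dirent){ name, ino, index };
	return 0;
}

static void toy_unlink(struct toy_dir *dir, const char *name)
{
	struct toy_dirent *ent = toy_lookup(dir, name);

	if (ent)
		*ent = dir->entries[--dir->count]; /* move last entry into the hole */
}

/* Returns 1 if the inode already has this name with a different index. */
static int toy_ref_exists(struct toy_dir *dir, const char *name,
			  unsigned long ino, unsigned long index)
{
	struct toy_dirent *ent = toy_lookup(dir, name);

	return ent && ent->ino == ino && ent->index != index;
}

/* Replays a logged reference ("foo", index 5) over the persisted state. */
int toy_replay_foo(int with_fix)
{
	struct toy_dir dir = { .count = 0 };
	int ret;

	/* Persisted state at the last sync: two hard links to inode 257. */
	ret = toy_add_link(&dir, "foo", 257, 2);
	assert(ret == 0);
	ret = toy_add_link(&dir, "bar", 257, 3);
	assert(ret == 0);

	if (with_fix && toy_ref_exists(&dir, "foo", 257, 5) > 0)
		toy_unlink(&dir, "foo");
	return toy_add_link(&dir, "foo", 257, 5);
}
```

toy_replay_foo(0) returns -EEXIST, which corresponds to the "File exists" mount failure in the example, while toy_replay_foo(1) returns 0 because the stale entry is dropped before the new one is added.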
diff --git a/patches.suse/btrfs-send-fix-invalid-access-to-commit-roots-due-to.patch b/patches.suse/btrfs-send-fix-invalid-access-to-commit-roots-due-to.patch
new file mode 100644
index 0000000000..2fb834e01f
--- /dev/null
+++ b/patches.suse/btrfs-send-fix-invalid-access-to-commit-roots-due-to.patch
@@ -0,0 +1,139 @@
+From: Robbie Ko <robbieko@synology.com>
+Date: Mon, 14 May 2018 10:51:34 +0800
+Patch-mainline: 4.17
+Git-commit: 6f2f0b394b54e2b159ef969a0b5274e9bbf82ff2
+Subject: [PATCH] Btrfs: send, fix invalid access to commit roots due to
+ concurrent snapshotting
+References: bsc#1111904
+
+[BUG]
+An incremental send hits a kernel BUG when a snapshot is created of a
+snapshot that is being used by send.
+
+[REASON]
+The problem can happen if, while we are doing a send, one of the snapshots
+used (parent or send) is snapshotted, because snapshotting implies COWing
+the root of the source subvolume/snapshot.
+
+1. When doing an incremental send, the send process will get the commit
+ roots from the parent and send snapshots, and add references to them
+ through extent_buffer_get().
+
+2. When a snapshot/subvolume is snapshotted, its root node is COWed
+ (transaction.c:create_pending_snapshot()).
+
+3. COWing releases the space used by the node immediately, through:
+
+ __btrfs_cow_block()
+ --btrfs_free_tree_block()
+ ----btrfs_add_free_space(bytenr of node)
+
+4. Because send doesn't hold a transaction open, it's possible that
+ the transaction used to create the snapshot commits, switches the
+ commit root and the old space used by the previous root node gets
+ assigned to some other node allocation. Allocation of a new node will
+ use the existing extent buffer found in memory, to which we previously
+ got a reference through extent_buffer_get(), and allow the extent
+ buffer's content (pages) to be modified:
+
+ btrfs_alloc_tree_block
+ --btrfs_reserve_extent
+ ----find_free_extent (get bytenr of old node)
+ --btrfs_init_new_buffer (use bytenr of old node)
+ ----btrfs_find_create_tree_block
+ ------alloc_extent_buffer
+ --------find_extent_buffer (get old node)
+
+5. So send can access invalid memory content and have unpredictable
+ behaviour.
+
+[FIX]
+So we fix the problem by copying the commit roots of the send and
+parent snapshots and using those copies.
+
+The call trace looks like this:
+ ------------[ cut here ]------------
+ kernel BUG at fs/btrfs/ctree.c:1861!
+ invalid opcode: 0000 [#1] SMP
+ CPU: 6 PID: 24235 Comm: btrfs Tainted: P O 3.10.105 #23721
+ task: ffff88046652d680 ti: ffff88041b720000 task.ti: ffff88041b720000
+ RIP: 0010:[<ffffffffa08dd0e8>] read_node_slot+0x108/0x110 [btrfs]
+ RSP: 0018:ffff88041b723b68 EFLAGS: 00010246
+ RAX: ffff88043ca6b000 RBX: ffff88041b723c50 RCX: ffff880000000000
+ RDX: 000000000000004c RSI: ffff880314b133f8 RDI: ffff880458b24000
+ RBP: 0000000000000000 R08: 0000000000000001 R09: ffff88041b723c66
+ R10: 0000000000000001 R11: 0000000000001000 R12: ffff8803f3e48890
+ R13: ffff8803f3e48880 R14: ffff880466351800 R15: 0000000000000001
+ FS: 00007f8c321dc8c0(0000) GS:ffff88047fcc0000(0000)
+ CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
+ CR2: 00007efd1006d000 CR3: 0000000213a24000 CR4: 00000000003407e0
+ DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
+ DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
+ Stack:
+ ffff88041b723c50 ffff8803f3e48880 ffff8803f3e48890 ffff8803f3e48880
+ ffff880466351800 0000000000000001 ffffffffa08dd9d7 ffff88041b723c50
+ ffff8803f3e48880 ffff88041b723c66 ffffffffa08dde85 a9ff88042d2c4400
+ Call Trace:
+ [<ffffffffa08dd9d7>] ? tree_move_down.isra.33+0x27/0x50 [btrfs]
+ [<ffffffffa08dde85>] ? tree_advance+0xb5/0xc0 [btrfs]
+ [<ffffffffa08e83d4>] ? btrfs_compare_trees+0x2d4/0x760 [btrfs]
+ [<ffffffffa0982050>] ? finish_inode_if_needed+0x870/0x870 [btrfs]
+ [<ffffffffa09841ea>] ? btrfs_ioctl_send+0xeda/0x1050 [btrfs]
+ [<ffffffffa094bd3d>] ? btrfs_ioctl+0x1e3d/0x33f0 [btrfs]
+ [<ffffffff81111133>] ? handle_pte_fault+0x373/0x990
+ [<ffffffff8153a096>] ? atomic_notifier_call_chain+0x16/0x20
+ [<ffffffff81063256>] ? set_task_cpu+0xb6/0x1d0
+ [<ffffffff811122c3>] ? handle_mm_fault+0x143/0x2a0
+ [<ffffffff81539cc0>] ? __do_page_fault+0x1d0/0x500
+ [<ffffffff81062f07>] ? check_preempt_curr+0x57/0x90
+ [<ffffffff8115075a>] ? do_vfs_ioctl+0x4aa/0x990
+ [<ffffffff81034f83>] ? do_fork+0x113/0x3b0
+ [<ffffffff812dd7d7>] ? trace_hardirqs_off_thunk+0x3a/0x6c
+ [<ffffffff81150cc8>] ? SyS_ioctl+0x88/0xa0
+ [<ffffffff8153e422>] ? system_call_fastpath+0x16/0x1b
+ ---[ end trace 29576629ee80b2e1 ]---
+
+Fixes: 7069830a9e38 ("Btrfs: add btrfs_compare_trees function")
+CC: stable@vger.kernel.org # 3.6+
+Signed-off-by: Robbie Ko <robbieko@synology.com>
+Reviewed-by: Filipe Manana <fdmanana@suse.com>
+Signed-off-by: David Sterba <dsterba@suse.com>
+Signed-off-by: Filipe Manana <fdmanana@suse.com>
+---
+ fs/btrfs/ctree.c | 16 ++++++++++++++--
+ 1 file changed, 14 insertions(+), 2 deletions(-)
+
+diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
+index cb917fa6d944..5f7cd9f0005a 100644
+--- a/fs/btrfs/ctree.c
++++ b/fs/btrfs/ctree.c
+@@ -5458,12 +5458,24 @@ int btrfs_compare_trees(struct btrfs_root *left_root,
+ 	down_read(&fs_info->commit_root_sem);
+ 	left_level = btrfs_header_level(left_root->commit_root);
+ 	left_root_level = left_level;
+-	left_path->nodes[left_level] = left_root->commit_root;
++	left_path->nodes[left_level] =
++			btrfs_clone_extent_buffer(left_root->commit_root);
++	if (!left_path->nodes[left_level]) {
++		up_read(&fs_info->commit_root_sem);
++		ret = -ENOMEM;
++		goto out;
++	}
+ 	extent_buffer_get(left_path->nodes[left_level]);
+
+ 	right_level = btrfs_header_level(right_root->commit_root);
+ 	right_root_level = right_level;
+-	right_path->nodes[right_level] = right_root->commit_root;
++	right_path->nodes[right_level] =
++			btrfs_clone_extent_buffer(right_root->commit_root);
++	if (!right_path->nodes[right_level]) {
++		up_read(&fs_info->commit_root_sem);
++		ret = -ENOMEM;
++		goto out;
++	}
+ 	extent_buffer_get(right_path->nodes[right_level]);
+ 	up_read(&fs_info->commit_root_sem);
+
+
+--
+2.19.0
+
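The idea behind btrfs-send-fix-invalid-access-to-commit-roots-due-to.patch, reading from a private copy instead of a shared buffer whose block may be reused, can be sketched in userspace. This is only an analogy under stated assumptions: struct toy_buffer, toy_clone_buffer() and toy_reader_sees() are hypothetical names, the byte patterns 0xAA and 0x55 are arbitrary, and the "concurrent" overwrite is simulated sequentially in one thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NODE_SIZE 16

/* Toy stand-in for a tree node's contents; NOT struct extent_buffer. */
struct toy_buffer {
	unsigned char data[TOY_NODE_SIZE];
};

/* Returns a private heap copy of src, or NULL if allocation fails. */
static struct toy_buffer *toy_clone_buffer(const struct toy_buffer *src)
{
	struct toy_buffer *copy;

	assert(src != NULL);
	copy = malloc(sizeof(*copy));
	if (!copy)
		return NULL;
	memcpy(copy, src, sizeof(*copy));
	return copy;
}

/*
 * Returns the first byte the reader sees after the root's block has
 * been reused for another node, or -1 if the clone could not be made.
 */
int toy_reader_sees(int use_clone)
{
	struct toy_buffer root;
	struct toy_buffer *view;
	int seen;

	memset(root.data, 0xAA, sizeof(root.data)); /* original root content */

	view = use_clone ? toy_clone_buffer(&root) : &root;
	if (!view)
		return -1;

	/* The old block is handed to a new allocation and overwritten. */
	memset(root.data, 0x55, sizeof(root.data));

	seen = view->data[0];
	if (use_clone)
		free(view);
	return seen;
}
```

With the shared buffer, toy_reader_sees(0) observes the overwritten content (0x55); with the private copy, toy_reader_sees(1) still observes the original content (0xAA). As in the patch, the cloning path has to handle allocation failure.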
diff --git a/series.conf b/series.conf
index 60750a43e1..058c7878ef 100644
--- a/series.conf
+++ b/series.conf
@@ -15542,6 +15542,7 @@
patches.fixes/ARM-8770-1-kprobes-Prohibit-probing-on-optimized_cal.patch
patches.fixes/ARM-8771-1-kprobes-Prohibit-kprobes-on-do_undefinstr.patch
patches.fixes/ARM-8772-1-kprobes-Prohibit-kprobes-on-get_user-func.patch
+ patches.suse/btrfs-send-fix-invalid-access-to-commit-roots-due-to.patch
patches.suse/btrfs-fix-xattr-loss-after-power-failure.patch
patches.suse/btrfs-fix-duplicate-extents-after-fsync-of-file-with.patch
patches.drivers/qede-Fix-ref-cnt-usage-count.patch
@@ -16895,6 +16896,7 @@
patches.fixes/fat-fix-memory-allocation-failure-handling-of-match_.patch
patches.fixes/mm-huge_memory.c-fix-data-loss-when-splitting-a-file.patch
patches.fixes/mm-memcg-fix-use-after-free-in-mem_cgroup_iter.patch
+ patches.suse/btrfs-fix-file-data-corruption-after-cloning-a-range.patch
patches.arch/vfio-spapr-Use-IOMMU-pageshift-rather-than-pagesize.patch
patches.arch/KVM-PPC-Check-if-IOMMU-page-is-contained-in-the-pinn.patch
patches.fixes/lib-iov_iter-Fix-pipe-handling-in-_copy_to_iter_mcsa.patch
@@ -17081,6 +17083,7 @@
patches.arch/s390-fix-br_r1_trampoline-for-machines-without-exrl.patch
patches.fixes/binfmt_elf-Respect-error-return-from-regset-active.patch
patches.suse/0001-btrfs-Don-t-remove-block-group-still-has-pinned-down.patch
+ patches.suse/btrfs-fix-mount-failure-after-fsync-due-to-hard-link.patch
patches.suse/btrfs-fix-send-failure-when-root-has-deleted-files-s.patch
patches.fixes/ext4-sysfs-print-ext4_super_block-fields-as-little-e.patch
patches.suse/xfs-fix-a-null-pointer-dereference-in-xfs_bmap_exten.patch