We are currently seeing poor read performance with ZFS 0.7.12 on our Samsung PM1725a devices in a Dell PowerEdge R7425.

Describe how to reproduce the problem

Briefly describing our setup: we currently have four PM1725a devices attached to the PCIe root complex in NUMA domain 1 on an AMD EPYC 7401 processor. To measure the read throughput of ZFS, XFS, and the raw devices, the XDD tool was used. In all cases I am presenting, kernel 4.18.20-100 was used and I disabled all CPUs not on socket 0 within the kernel. I also issued asynchronous sequential reads to the file systems/devices while pinning all XDD threads to NUMA domain 1 and socket 0's memory banks.

I conducted four tests, measuring throughput for the raw devices, XFS, and ZFS 0.7.12 (a single ZPool and multiple ZPools). For the raw device tests, I had 6 I/O threads per device with a request size of 1 MB and a total of 32 GB read from each device using Direct I/O. In the XFS case, I created a single XFS file system on each of the 4 devices; in each of the XFS file systems, I read a 32 GB file of random data using 6 I/O threads per file with request sizes of 1 MB using Direct I/O. In the ZFS Single ZPool case, I created a single ZPool composed of 4 VDEVs and read a 128 GB file of random data using 24 I/O threads with request sizes of 1 MB. In the ZFS Multiple ZPool case, I created 4 separate ZPools, each consisting of a single VDEV; in each of the ZPools, I read a 32 GB file of random data using 6 I/O threads per file with request sizes of 1 MB. In both the Single ZPool and Multiple ZPool cases I set the record size for all pools to 1 MB and set primarycache=none. We decided to disable the ARC in all cases because we were reading 128 GB of data, which was exactly 2x the available memory on socket 0, and even with the ARC enabled we were seeing no performance benefit. Below are the throughput measurements I collected for each of these cases.
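For anyone trying to approximate the read tests without XDD, here is a minimal Python stand-in. This is an illustrative sketch, not the tool I used; the thread count and request size are simply the values described above, and the file path is taken from the command line.

```python
#!/usr/bin/env python3
# Minimal stand-in for the XDD read test described above -- NOT the actual
# tool used; thread count and request size are the values from the test
# description, and the target file is whatever path is passed on the command line.
import mmap
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor

PATH = [a for a in sys.argv[1:] if not a.startswith("--")][0]  # file under test
THREADS = 6                     # 6 I/O threads per file, as in the tests above
REQ = 1 << 20                   # 1 MiB request size
USE_DIRECT = "--direct" in sys.argv and hasattr(os, "O_DIRECT")

def reader(tid: int, per: int) -> int:
    """Sequentially read this thread's 1 MiB-aligned slice of the file."""
    fd = os.open(PATH, os.O_RDONLY | (os.O_DIRECT if USE_DIRECT else 0))
    buf = mmap.mmap(-1, REQ)    # anonymous mmap is page aligned, as O_DIRECT requires
    offset, end, done = per * tid, per * (tid + 1), 0
    try:
        while offset + REQ <= end:
            n = os.preadv(fd, [buf], offset)
            if n <= 0:
                break
            done += n
            offset += n
    finally:
        os.close(fd)
    return done

size = os.stat(PATH).st_size
per_thread = (size // (THREADS * REQ)) * REQ   # keep slices 1 MiB aligned for O_DIRECT

start = time.monotonic()
with ThreadPoolExecutor(max_workers=THREADS) as pool:
    total = sum(pool.map(reader, range(THREADS), [per_thread] * THREADS))
elapsed = time.monotonic() - start
print(f"read {total / 2**30:.1f} GiB in {elapsed:.1f} s -> {total / 2**20 / elapsed:.0f} MiB/s")
```

Invoked as, e.g., `python3 read_bench.py /path/to/testfile --direct` (script name and path are hypothetical); the `--direct` flag corresponds to the XFS and raw-device cases, which used Direct I/O.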
In order to track down what was cutting ZFS read performance in half, I generated flame graphs. In general, I found most of the perf samples were occurring in zio_wait; it is in this call that io_schedule is called. Comparing the ZFS flame graphs to the XFS flame graphs, I found that the number of samples between submit_bio and io_schedule was significantly larger in the ZFS case.

I then decided to take timestamps of each call to io_schedule for both ZFS and XFS to measure the latency between the calls (a rough ftrace-based sketch of this kind of measurement is included at the end of this report). Below is a link to histograms as well as the total elapsed time in microseconds between io_schedule calls for the tests described above. In the plotted data, the first 10,000 timestamps were ignored to allow the file systems to reach a steady state. In general, ZFS has a significant latency between io_schedule calls. I have also verified that the output from iostat shows a larger r_await value for ZFS than for XFS in these tests. In general, it seems ZFS is letting requests sit in the hardware queues longer than XFS and the raw devices, causing a huge performance penalty in ZFS reads (effectively cutting the available device bandwidth in half).

Also, I have tried to duplicate the XFS results using ZFS 0.8.0-rc2 with Direct I/O, but the 0.8.0 read performance almost exactly matched the ZFS 0.7.12 read performance without Direct I/O.

In general, has this issue been noticed with NVMe SSDs and ZFS, and is there a current fix? If there is no current fix, is this issue being worked on?

Include any warning/errors/backtraces from the system logs

In the data I have presented, all the NVMe drive schedulers were set to none under /sys/block/nvme#n1/queue/scheduler. Also, the ZFS module parameter zfs_vdev_scheduler was set to noop.

I think only achieving half of the available bandwidth for synchronous reads is more than just under-performance at this point. I could see only getting 75-80% being under-performance; however, 50% is just way too low. I have been continuing to try and track this issue down.
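For reference, here is a rough sketch of one way to collect the kind of io_schedule timestamps mentioned above from user space, using the kernel's function tracer. This is not necessarily how the measurements above were taken; it assumes root privileges, tracefs mounted at /sys/kernel/debug/tracing, and that io_schedule appears in available_filter_functions on the running kernel.

```python
#!/usr/bin/env python3
# Rough sketch of timing the gaps between io_schedule() calls with ftrace.
# Assumptions (not from the original report): root privileges, tracefs mounted
# at /sys/kernel/debug/tracing, io_schedule listed in available_filter_functions.
import re

TRACEFS = "/sys/kernel/debug/tracing"

def write(name: str, value: str) -> None:
    with open(f"{TRACEFS}/{name}", "w") as f:
        f.write(value)

# Trace only io_schedule so the tracing overhead stays small.
write("set_ftrace_filter", "io_schedule")
write("current_tracer", "function")
write("tracing_on", "1")

# Function-tracer lines look like:
#   <task>-<pid> [cpu] d... 12345.678901: io_schedule <-caller
stamp = re.compile(r"\s(\d+\.\d+):\s+io_schedule")
last = None
try:
    with open(f"{TRACEFS}/trace_pipe") as pipe:
        for line in pipe:
            m = stamp.search(line)
            if not m:
                continue
            t = float(m.group(1))
            if last is not None:
                print(f"{(t - last) * 1e6:.1f} us since previous io_schedule")
            last = t
except KeyboardInterrupt:
    pass
finally:
    write("tracing_on", "0")
    write("current_tracer", "nop")
    open(f"{TRACEFS}/set_ftrace_filter", "w").close()  # O_TRUNC clears the filter
```

The deltas it prints mix all CPUs and threads together, so it only gives an overall picture of how long the system goes between io_schedule calls rather than a per-thread latency.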