ZFS Heavy Write Amplification due to Free Space Fragmentation

I have set up a ZFS RAID0 (striped) pool for a PostgreSQL database. The instances run on AWS EC2 and the storage is on EBS volumes.

NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
pgpool   479G   289G   190G         -    70%    60%  1.00x  ONLINE  -
  xvdf  59.9G  36.6G  23.3G         -    71%    61%
  xvdg  59.9G  34.7G  25.2G         -    70%    57%
  xvdh  59.9G  35.7G  24.2G         -    71%    59%
  xvdi  59.9G  35.7G  24.2G         -    71%    59%
  xvdj  59.9G  36.3G  23.6G         -    71%    60%
  xvdk  59.9G  36.5G  23.4G         -    71%    60%
  xvdl  59.9G  36.6G  23.3G         -    71%    61%
  xvdm  59.9G  36.6G  23.2G         -    71%    61%

Previously, FRAG was at 80% on most of the devices and we suffered from heavy write IOPS. Because the pool was at 75% utilization (of 400GB), we provisioned an additional 10GB for each device (400GB + 80GB). FRAG has now dropped to 70%. One important observation is that write IOPS are much lower for the same workload.
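As a sanity check on the sizes quoted above, here is a minimal Python sketch that recomputes the pool utilization before and after the resize. The function name `utilization` is only illustrative, and the "allocated before" figure is an assumption derived from the stated 75% of 400GB, not a measured value.

```python
def utilization(allocated_gb: float, size_gb: float) -> float:
    """Return pool utilization as a percentage of total size."""
    return 100.0 * allocated_gb / size_gb


devices = 8                    # xvdf .. xvdm
old_size_gb = 400.0            # pool size before the resize
added_per_device_gb = 10.0     # extra space provisioned per EBS volume
new_size_gb = old_size_gb + devices * added_per_device_gb

# Assumption: 75% utilization of the old 400GB pool.
allocated_before_gb = 0.75 * old_size_gb

print(f"new pool size: {new_size_gb:.0f} GB")
print(f"same data on the larger pool: {utilization(allocated_before_gb, new_size_gb):.1f}%")
# ALLOC and SIZE as reported by the pool listing (289G of 479G).
print(f"reported now: {utilization(289.0, 479.0):.1f}%")
```

The growth only moves utilization from 75% to roughly 60%, so the drop in write IOPS coincides with a fairly modest change in free space.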

According to the CloudWatch metrics, write IOPS dropped drastically after the EBS size increase: from 4000 IOPS to 1200 to 1400 IOPS on the master PG, and from 3000 IOPS to 600 IOPS on the slave PG. I suspect this is due to how FRAG affects I/O, as explained in this answer.
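To put those CloudWatch figures in relative terms, a short Python sketch (the helper `reduction_pct` is a made-up name for illustration) converts the before and after IOPS into percentage reductions:

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


# Master PG: 4000 IOPS before, 1200 to 1400 IOPS after.
master_low = reduction_pct(4000, 1400)
master_high = reduction_pct(4000, 1200)
# Slave PG: 3000 IOPS before, 600 IOPS after.
slave = reduction_pct(3000, 600)

print(f"master: {master_low:.0f}% to {master_high:.0f}% fewer write IOPS")
print(f"slave:  {slave:.0f}% fewer write IOPS")
```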

We use recordsize=128K because the compressratio is much better than with recordsize=8K. I think that, because of the larger recordsize, FRAG increases quickly and results in write amplification and heavy write IOPS. Would decreasing the record size prevent the write amplification, or is there some other problem that I am missing?
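To make the suspected mechanism concrete: PostgreSQL writes 8K pages by default, and because ZFS is copy-on-write, changing a single 8K page inside a 128K record means the whole record is rewritten. The Python sketch below estimates that worst-case logical amplification for a few record sizes. It is a rough upper bound under stated assumptions (one dirty page per record, default PostgreSQL block size, no compression, no metadata overhead), not a measurement of this pool; the function name is illustrative.

```python
PG_PAGE_BYTES = 8 * 1024  # PostgreSQL default block size (assumed unchanged)


def worst_case_amplification(recordsize_bytes: int,
                             page_bytes: int = PG_PAGE_BYTES) -> float:
    """Logical bytes rewritten per page updated, assuming only one
    page in each record is dirty when the record is written out."""
    return max(recordsize_bytes, page_bytes) / page_bytes


for recordsize_kib in (8, 16, 32, 128):
    factor = worst_case_amplification(recordsize_kib * 1024)
    print(f"recordsize={recordsize_kib}K -> up to {factor:.0f}x "
          f"logical bytes rewritten per 8K page update")
```

With lz4 compression the physical bytes written per record are smaller than the logical size, and several dirty pages in the same record are written together, so the real factor is usually lower than this bound.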


ubuntu@ip-10-0-1-59:~$ sudo zpool get all
NAME    PROPERTY                       VALUE                          SOURCE
pgpool  size                           479G                           -
pgpool  capacity                       60%                            -
pgpool  altroot                        -                              default
pgpool  health                         ONLINE                         -
pgpool  guid                           1565875598252756833            -
pgpool  version                        -                              default
pgpool  bootfs                         -                              default
pgpool  delegation                     on                             default
pgpool  autoreplace                    off                            default
pgpool  cachefile                      -                              default
pgpool  failmode                       wait                           default
pgpool  listsnapshots                  off                            default
pgpool  autoexpand                     on                             local
pgpool  dedupditto                     0                              default
pgpool  dedupratio                     1.00x                          -
pgpool  free                           190G                           -
pgpool  allocated                      289G                           -
pgpool  readonly                       off                            -
pgpool  ashift                         0                              default
pgpool  comment                        -                              default
pgpool  expandsize                     -                              -
pgpool  freeing                        0                              -
pgpool  fragmentation                  71%                            -
pgpool  leaked                         0                              -
pgpool  multihost                      off                            default
pgpool  feature@async_destroy          enabled                        local
pgpool  feature@empty_bpobj            enabled                        local
pgpool  feature@lz4_compress           active                         local
pgpool  feature@multi_vdev_crash_dump  enabled                        local
pgpool  feature@spacemap_histogram     active                         local
pgpool  feature@enabled_txg            active                         local
pgpool  feature@hole_birth             active                         local
pgpool  feature@extensible_dataset     active                         local
pgpool  feature@embedded_data          active                         local
pgpool  feature@bookmarks              enabled                        local
pgpool  feature@filesystem_limits      enabled                        local
pgpool  feature@large_blocks           enabled                        local
pgpool  feature@large_dnode            enabled                        local
pgpool  feature@sha512                 enabled                        local
pgpool  feature@skein                  enabled                        local
pgpool  feature@edonr                  enabled                        local
pgpool  feature@userobj_accounting     active                         local

ZFS Props

ubuntu@ip-10-0-1-59:~$ sudo zfs get all
NAME    PROPERTY              VALUE                  SOURCE
pgpool  type                  filesystem             -
pgpool  creation              Mon Oct  8 18:45 2018  -
pgpool  used                  289G                   -
pgpool  available             175G                   -
pgpool  referenced            288G                   -
pgpool  compressratio         5.06x                  -
pgpool  mounted               yes                    -
pgpool  quota                 none                   default
pgpool  reservation           none                   default
pgpool  recordsize            128K                   default
pgpool  mountpoint            /mnt/PGPOOL            local
pgpool  sharenfs              off                    default
pgpool  checksum              on                     default
pgpool  compression           lz4                    local
pgpool  atime                 off                    local
pgpool  devices               on                     default
pgpool  exec                  on                     default
pgpool  setuid                on                     default
pgpool  readonly              off                    default
pgpool  zoned                 off                    default
pgpool  snapdir               hidden                 default
pgpool  aclinherit            restricted             default
pgpool  createtxg             1                      -
pgpool  canmount              on                     default
pgpool  xattr                 sa                     local
pgpool  copies                1                      default
pgpool  version               5                      -
pgpool  utf8only              off                    -
pgpool  normalization         none                   -
pgpool  casesensitivity       sensitive              -
pgpool  vscan                 off                    default
pgpool  nbmand                off                    default
pgpool  sharesmb              off                    default
pgpool  refquota              none                   default
pgpool  refreservation        none                   default
pgpool  guid                  571000568545391306     -
pgpool  primarycache          all                    default
pgpool  secondarycache        all                    default
pgpool  usedbysnapshots       0B                     -
pgpool  usedbydataset         288G                   -
pgpool  usedbychildren        364M                   -
pgpool  usedbyrefreservation  0B                     -
pgpool  logbias               throughput             local
pgpool  dedup                 off                    default
pgpool  mlslabel              none                   default
pgpool  sync                  standard               default
pgpool  dnodesize             legacy                 default
pgpool  refcompressratio      5.07x                  -
pgpool  written               288G                   -
pgpool  logicalused           1.42T                  -
pgpool  logicalreferenced     1.42T                  -
pgpool  volmode               default                default
pgpool  filesystem_limit      none                   default
pgpool  snapshot_limit        none                   default
pgpool  filesystem_count      none                   default
pgpool  snapshot_count        none                   default
pgpool  snapdev               hidden                 default
pgpool  acltype               off                    default
pgpool  context               none                   default
pgpool  fscontext             none                   default
pgpool  defcontext            none                   default
pgpool  rootcontext           none                   default
pgpool  relatime              off                    default
pgpool  redundant_metadata    most                   local
pgpool  overlay               off                    default