Why is disk IO higher on Debian 10 (MariaDB 10.3) with MySQL replication?

I have a MySQL/MariaDB master-master replication setup that has been working well for several years, the db and tables are not very large (under 200MB for 18 tables). These were on 2 servers running Debian 9 and MariaDB 10.1.44. Now I’ve spun up 2 new servers running Debian 10 and I’m in the process of moving things over to them, but stopped half-way because I’m seeing much higher disk IO usage on the new servers (about 6x more).

So currently, one of the Debian 9 servers and one of the Debian 10 servers are in master-master relationship, with one Debian 9 still being a slave of the master Debian 9 server, and same on the Debian 10 side of things.

I didn’t notice the increased disk IO until after all read/write operations were moved to the Debian 10 master. I was trying to browse tables and saw how slow it was outputting the query results, and it felt like I was on a dial-up connection watching the rows scroll across. It turned out there was some disk contention with the virtual host that was partly responsible, and that problem is now mostly gone.

Now, as you can imagine, none of this is crashing the server with such a "small" set of tables, but as things continue to grow, I’m concerned that there is some underlying mis-configuration which will rear its ugly head at an inopportune time. On the Debian 9 servers, iotop shows steady write IO at around 300-600Kb/s, but on Debian 10 it spikes as high as 6MB/s, and averages around 3MB/s.

Here is the standard config on all 4 servers, everything else is default Debian settings (or MariaDB, as the case may be), full config for Debian 10 at https://pastebin.com/Lk2FR4e3:

max_connections = 1000 query_cache_limit       = 4M query_cache_size        = 0 query_cache_type        = 0 server-id               = 1 # different for each server log_bin                 = /var/log/mysql/mysql-bin.log binlog_do_db            = optimizer replicate-do-db         = optimizer report-host             = xyz.example.com #changed obviously log-slave-updates       = true innodb_log_file_size    = 32M innodb_buffer_pool_size = 256M 

Here are some other settings I’ve tried that don’t seem to make any difference (checked each one by one):

binlog_annotate_row_events = OFF binlog_checksum = NONE binlog_format = STATEMENT innodb_flush_method = O_DIRECT_NO_FSYNC innodb_log_checksums = OFF log_slow_slave_statements = OFF replicate_annotate_row_events = OFF 

I’ve gone through all the settings here that have changed from MariaDB 10.1 to 10.3, and can’t seem to find any that make a difference: https://mariadb.com/kb/en/replication-and-binary-log-system-variables/

I also did a full listing of the server variables and compared the configs on 10.1 to the 10.3 configuration and didn’t find anything obvious. But either I’m missing something, or the problem lies with Debian 10 itself.

Results of SHOW ENGINE INNODB STATUS are here: https://pastebin.com/mJdLQv8k

Now, how about that disk IO, what is it actually doing? I include 3 screenshots here to show what I mean by increased disk IO: Resource graphs on the Debian 10 master

That is from the Debian 10 master, and you can see where I moved operations back to the Debian 9 server (more on that in a second). Notice the disk IO does go down slightly at that point, but not to the levels that we’ll see on the Debian 9 master. Also note that the public bandwidth chart is pretty much only replication traffic, and that the disk IO far outstrips the replication traffic. The private traffic is all the reads/writes from our application servers.

Resource graphs on Debian 9 master

This is the Debian 9 master server, and you can see where I moved all operations back to this server, the private traffic shoots up, but the write IO hovers around 500kB/s. I didn’t have resource graphs being recorded on the old servers, thus the missing bits on the left.

Debian 10 slave server resource graphs

And lastly, for reference, here is the Debian 10 slave server (that will eventually be half of the master<–>master replication). There are no direct reads/writes on this server, all disk IO is from replication.

Just to see what would happen (as I alluded to above), I reverted all direct read/write operations to the Debian 9 master server. While disk IO did fall somewhat on the Debian 10 server, it did not grow on the Debian 9 server to any noticeable extent.

Also, on the Debian 10 slave server, I did STOP SLAVE once to see what happened, and the disk IO went to almost nothing. Doing the same on the Debian 10 master server barely did not have the same drastic effect, though it’s possible there WAS some change that wasn’t obvious; the disk IO numbers on iostat fluctuate much more wildly on the Debian 10 servers than they do on the Debian 9 servers.

So, what is going on here? How can I figure out why MariaDB is writing so much data to disk apparently and/or how can I stop it?

Thanks in advance!

MariaDB 10.3 InnoDB random crash due to memory exhaust on VPS

after migrating to new VPS my server occassionally experiences database crashes. This may happen 1-2x in 2-3 days. Server runs MariaDB 10.3 on CentOS 7 with nginx 1.16. Virtual server has 1 GB memory, no swap. During normal day-peak traffic I see cca 600 – 700 MB of memory occupied, 300 – 400 MB free.

I found logs in /var/log/messages (bellow) from which I understood that InnoDB crashes due to memory exhaust (pls correct me if I misread). But I did not find any DDoS attack or other suspicious activity at the time of crash.

Is there any way to figure out, what’s the reason of memory exhaust? Should I tune up some InnoDB buffer/setting .. ? What should I do to prevent from crashes?

On previous VPS was server running with this configuration for 4-5 years without crash (Maria 10.1 / nginx / CentOS 6).

I am not an database expert, so I will appreciate any advice. Thank you.

Here is /var/log/messages: (crash started at 22:20)

May 27 22:20:01 mysite systemd: Created slice User Slice of root. May 27 22:20:01 mysite systemd: Started Session 28800 of user root. May 27 22:20:01 mysite systemd: Created slice User Slice of nginx. May 27 22:20:01 mysite systemd: Started Session 28801 of user nginx. May 27 22:20:11 mysite kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0 May 27 22:20:11 mysite kernel: mysqld cpuset=/ mems_allowed=0 May 27 22:20:11 mysite kernel: CPU: 0 PID: 26756 Comm: mysqld Not tainted 3.10.0-957.12.1.el7.x86_64 #1 May 27 22:20:11 mysite kernel: Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.10.1-1ubuntu1~cloud0 04/01/2014 May 27 22:20:11 mysite kernel: Call Trace: May 27 22:20:11 mysite kernel: [<ffffffff96363021>] dump_stack+0x19/0x1b May 27 22:20:11 mysite kernel: [<ffffffff9635da4a>] dump_header+0x90/0x229 May 27 22:20:11 mysite kernel: [<ffffffff95d01092>] ? ktime_get_ts64+0x52/0xf0 May 27 22:20:11 mysite kernel: [<ffffffff95d582df>] ? delayacct_end+0x8f/0xb0 May 27 22:20:11 mysite kernel: [<ffffffffc033f71a>] ? virtballoon_oom_notify+0x2a/0x70 [virtio_balloon] May 27 22:20:11 mysite kernel: [<ffffffff95dba634>] oom_kill_process+0x254/0x3d0 May 27 22:20:11 mysite kernel: [<ffffffff95dba0dd>] ? oom_unkillable_task+0xcd/0x120 May 27 22:20:11 mysite kernel: [<ffffffff95dba186>] ? find_lock_task_mm+0x56/0xc0 May 27 22:20:11 mysite kernel: [<ffffffff95dbae76>] out_of_memory+0x4b6/0x4f0 May 27 22:20:11 mysite kernel: [<ffffffff9635e54e>] __alloc_pages_slowpath+0x5d6/0x724 May 27 22:20:11 mysite kernel: [<ffffffff95dc1254>] __alloc_pages_nodemask+0x404/0x420 May 27 22:20:11 mysite kernel: [<ffffffff95e0e148>] alloc_pages_current+0x98/0x110 May 27 22:20:11 mysite kernel: [<ffffffff95db6497>] __page_cache_alloc+0x97/0xb0 May 27 22:20:11 mysite kernel: [<ffffffff95db90f8>] filemap_fault+0x298/0x490 May 27 22:20:11 mysite kernel: [<ffffffffc01fdd0e>] __xfs_filemap_fault+0x7e/0x1d0 [xfs] May 27 22:20:11 mysite kernel: [<ffffffff95cc2e00>] ? wake_bit_function+0x40/0x40 May 27 22:20:11 mysite kernel: [<ffffffffc01fdf0c>] xfs_filemap_fault+0x2c/0x30 [xfs] May 27 22:20:11 mysite kernel: [<ffffffff95de462a>] __do_fault.isra.59+0x8a/0x100 May 27 22:20:11 mysite kernel: [<ffffffff95de4bdc>] do_read_fault.isra.61+0x4c/0x1b0 May 27 22:20:11 mysite kernel: [<ffffffff95de9584>] handle_pte_fault+0x2f4/0xd10 May 27 22:20:11 mysite kernel: [<ffffffff95dec0bd>] handle_mm_fault+0x39d/0x9b0 May 27 22:20:11 mysite kernel: [<ffffffff963705e3>] __do_page_fault+0x203/0x4f0 May 27 22:20:11 mysite kernel: [<ffffffff963709b6>] trace_do_page_fault+0x56/0x150 May 27 22:20:11 mysite kernel: [<ffffffff9636ff42>] do_async_page_fault+0x22/0xf0 May 27 22:20:11 mysite kernel: [<ffffffff9636c788>] async_page_fault+0x28/0x30 May 27 22:20:11 mysite kernel: Mem-Info: May 27 22:20:11 mysite kernel: active_anon:187693 inactive_anon:17396 isolated_anon:0#012 active_file:1781 inactive_file:3687 isolated_file:0#012 unevictable:0 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:3639 slab_unreclaimable:7716#012 mapped:10218 shmem:27944 pagetables:13399 bounce:0#012 free:12232 free_pcp:30 free_cma:0 May 27 22:20:11 mysite kernel: Node 0 DMA free:4592kB min:708kB low:884kB high:1060kB active_anon:8412kB inactive_anon:996kB active_file:0kB inactive_file:216kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:472kB shmem:1420kB slab_reclaimable:60kB slab_unreclaimable:380kB kernel_stack:16kB pagetables:728kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:48 all_unreclaimable? yes May 27 22:20:11 mysite kernel: lowmem_reserve[]: 0 972 972 972 May 27 22:20:11 mysite kernel: Node 0 DMA32 free:44212kB min:44344kB low:55428kB high:66516kB active_anon:742360kB inactive_anon:68588kB active_file:7124kB inactive_file:14532kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1032048kB managed:998980kB mlocked:0kB dirty:0kB writeback:0kB mapped:40400kB shmem:110356kB slab_reclaimable:14496kB slab_unreclaimable:30484kB kernel_stack:2704kB pagetables:52868kB unstable:0kB bounce:0kB free_pcp:240kB local_pcp:240kB free_cma:0kB writeback_tmp:0kB pages_scanned:15611 all_unreclaimable? yes May 27 22:20:11 mysite kernel: lowmem_reserve[]: 0 0 0 0 May 27 22:20:11 mysite kernel: Node 0 DMA: 2*4kB (EM) 3*8kB (EM) 5*16kB (UE) 4*32kB (UE) 2*64kB (UM) 1*128kB (E) 2*256kB (UE) 3*512kB (UEM) 2*1024kB (UE) 0*2048kB 0*4096kB = 4592kB May 27 22:20:11 mysite kernel: Node 0 DMA32: 403*4kB (UEM) 597*8kB (UEM) 288*16kB (UEM) 116*32kB (UEM) 41*64kB (UEM) 30*128kB (UEM) 18*256kB (UEM) 18*512kB (UEM) 7*1024kB (UEM) 1*2048kB (U) 0*4096kB = 44212kB May 27 22:20:11 mysite kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB May 27 22:20:11 mysite kernel: 33433 total pagecache pages May 27 22:20:11 mysite kernel: 0 pages in swap cache May 27 22:20:11 mysite kernel: Swap cache stats: add 0, delete 0, find 0/0 May 27 22:20:11 mysite kernel: Free swap  = 0kB May 27 22:20:11 mysite kernel: Total swap = 0kB May 27 22:20:11 mysite kernel: 262010 pages RAM May 27 22:20:11 mysite kernel: 0 pages HighMem/MovableOnly May 27 22:20:11 mysite kernel: 8288 pages reserved May 27 22:20:11 mysite kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name May 27 22:20:11 mysite kernel: [ 1321]     0  1321     9404      357      23        0             0 systemd-journal May 27 22:20:11 mysite kernel: [ 1349]     0  1349    11119      141      24        0         -1000 systemd-udevd May 27 22:20:11 mysite kernel: [ 1573]     0  1573    13880      114      28        0         -1000 auditd May 27 22:20:11 mysite kernel: [ 1933]    81  1933    14524      138      34        0          -900 dbus-daemon May 27 22:20:11 mysite kernel: [ 1961]     0  1961    48774      118      35        0             0 gssproxy May 27 22:20:11 mysite kernel: [ 2001]     0  2001     6594       83      17        0             0 systemd-logind May 27 22:20:11 mysite kernel: [ 2027]   999  2027   153158     1754      59        0             0 polkitd May 27 22:20:11 mysite kernel: [ 2116]   998  2116    29446      115      30        0             0 chronyd May 27 22:20:11 mysite kernel: [ 2385]     0  2385    89521     5530      94        0             0 firewalld May 27 22:20:11 mysite kernel: [ 2961]     0  2961    26866      500      49        0             0 dhclient May 27 22:20:11 mysite kernel: [ 3022]     0  3022   143481     2781      98        0             0 tuned May 27 22:20:11 mysite kernel: [ 3189]     0  3189    22577      280      44        0             0 master May 27 22:20:11 mysite kernel: [ 3191]    89  3191    22647      286      43        0             0 qmgr May 27 22:20:11 mysite kernel: [ 3228]     0  3228    28216      255      58        0         -1000 sshd May 27 22:20:11 mysite kernel: [ 3229]     0  3229    66328     1182      59        0             0 rsyslogd May 27 22:20:11 mysite kernel: [ 3235]     0  3235    31579      170      18        0             0 crond May 27 22:20:11 mysite kernel: [ 3236]     0  3236    27526       33      10        0             0 agetty May 27 22:20:11 mysite kernel: [ 3237]     0  3237    27526       33      11        0             0 agetty May 27 22:20:11 mysite kernel: [15958]     0 15958   130428     6710     147        0             0 php-fpm May 27 22:20:11 mysite kernel: [15982]     0 15982    14721      288      25        0             0 nginx May 27 22:20:11 mysite kernel: [15983]   997 15983    15403      975      31        0             0 nginx May 27 22:20:11 mysite kernel: [21323]   997 21323   245378    11185     347        0             0 php-fpm May 27 22:20:11 mysite kernel: [21327]   997 21327   246079    12329     348        0             0 php-fpm May 27 22:20:11 mysite kernel: [21331]   997 21331   244508     9902     345        0             0 php-fpm May 27 22:20:11 mysite kernel: [21582]   997 21582   225663    12179     340        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 4073]   997  4073   224018     8808     337        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 4076]   997  4076   223517     8433     335        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5510]   997  5510   244374     7954     342        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5517]   997  5517   244782     8575     344        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5519]   997  5519   244994     9463     345        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5520]   997  5520   244763     8907     344        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5521]   997  5521   223316     8418     335        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5522]   997  5522   244493     8212     343        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5523]   997  5523   223509     9244     335        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5524]   997  5524   223996     8921     336        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5527]   997  5527   224017     8661     335        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5536]   997  5536   244945     9254     346        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5546]   997  5546   223528     8214     334        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5551]   997  5551   244849     8735     343        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5557]   997  5557   223438     8154     333        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5558]   997  5558   244862     8565     344        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5559]   997  5559   244877     8601     343        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5561]   997  5561   223924     8706     338        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5562]   997  5562   224018     8866     336        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5563]   997  5563   224008     9187     335        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5564]   997  5564   244965     8904     344        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5566]   997  5566   244403     8290     348        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5567]   997  5567   244920     9447     344        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5568]   997  5568   244514     8736     343        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5569]   997  5569   223477     7876     334        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5570]   997  5570   224004     9054     335        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5574]   997  5574   223319     7945     333        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5575]   997  5575   244729     8709     343        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5576]   997  5576   223435     8296     336        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5577]   997  5577   244711     8679     343        0             0 php-fpm May 27 22:20:11 mysite kernel: [ 5578]   997  5578   244467     8119     344        0             0 php-fpm May 27 22:20:11 mysite kernel: [26393]   996 26393   302816    41380     170        0             0 mysqld May 27 22:20:11 mysite kernel: [10107]    89 10107    22603      271      45        0             0 pickup May 27 22:20:11 mysite kernel: [11273]     0 11273    45598      242      45        0             0 crond May 27 22:20:11 mysite kernel: [11274]     0 11274    45598      242      45        0             0 crond May 27 22:20:11 mysite kernel: [11275]     0 11275    28294       45      11        0             0 sh May 27 22:20:11 mysite kernel: [11276]   997 11276    28294       43      11        0             0 sh May 27 22:20:11 mysite kernel: [11277]   997 11277    96384     2413     139        0             0 php May 27 22:20:11 mysite kernel: [11278]     0 11278    50559     1613      54        0             0 python May 27 22:20:11 mysite kernel: Out of memory: Kill process 26393 (mysqld) score 163 or sacrifice child May 27 22:20:11 mysite kernel: Killed process 26393 (mysqld) total-vm:1211264kB, anon-rss:165520kB, file-rss:0kB, shmem-rss:0kB May 27 22:20:11 mysite systemd: mariadb.service: main process exited, code=killed, status=9/KILL May 27 22:20:11 mysite systemd: Unit mariadb.service entered failed state. May 27 22:20:11 mysite systemd: mariadb.service failed. May 27 22:20:11 mysite systemd: Removed slice User Slice of nginx. May 27 22:20:16 mysite systemd: mariadb.service holdoff time over, scheduling restart. May 27 22:20:16 mysite systemd: Stopped MariaDB 10.3.14 database server. May 27 22:20:16 mysite systemd: Starting MariaDB 10.3.14 database server... May 27 22:20:16 mysite mysqld: 2019-05-27 22:20:16 0 [Note] /usr/sbin/mysqld (mysqld 10.3.14-MariaDB-log) starting as process 11329 ... May 27 22:20:17 mysite systemd: Started MariaDB 10.3.14 database server. May 27 22:20:27 mysite systemd: Removed slice User Slice of root. May 27 22:21:02 mysite systemd: Created slice User Slice of nginx. May 27 22:21:02 mysite systemd: Started Session 28802 of user nginx. 

And here is corresponding /var/log/mysql/errors.log from that crash:

Version: '10.3.14-MariaDB-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server 2019-05-27 22:20:16 0 [Note] InnoDB: Using Linux native AIO 2019-05-27 22:20:16 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2019-05-27 22:20:16 0 [Note] InnoDB: Uses event mutexes 2019-05-27 22:20:16 0 [Note] InnoDB: Compressed tables use zlib 1.2.7 2019-05-27 22:20:16 0 [Note] InnoDB: Number of pools: 1 2019-05-27 22:20:16 0 [Note] InnoDB: Using SSE2 crc32 instructions 2019-05-27 22:20:16 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M 2019-05-27 22:20:16 0 [Note] InnoDB: Completed initialization of buffer pool 2019-05-27 22:20:16 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2019-05-27 22:20:16 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=699238373 2019-05-27 22:20:17 0 [Note] InnoDB: 128 out of 128 rollback segments are active. 2019-05-27 22:20:17 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1" 2019-05-27 22:20:17 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2019-05-27 22:20:17 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2019-05-27 22:20:17 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2019-05-27 22:20:17 0 [Note] InnoDB: Waiting for purge to start 2019-05-27 22:20:17 0 [Note] InnoDB: 10.3.14 started; log sequence number 699238382; transaction id 395866 2019-05-27 22:20:17 0 [Note] Plugin 'FEEDBACK' is disabled. 2019-05-27 22:20:17 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2019-05-27 22:20:17 0 [Note] Recovering after a crash using tc.log 2019-05-27 22:20:17 0 [Note] InnoDB: Buffer pool(s) load completed at 190527 22:20:17 2019-05-27 22:20:17 0 [Note] Starting crash recovery... 2019-05-27 22:20:17 0 [Note] Crash recovery finished. 2019-05-27 22:20:17 0 [Note] Server socket created on IP: '::'. 2019-05-27 22:20:17 0 [Note] Reading of all Master_info entries succeded 2019-05-27 22:20:17 0 [Note] Added new Master_info '' to hash table 2019-05-27 22:20:17 0 [Note] /usr/sbin/mysqld: ready for connections. Version: '10.3.14-MariaDB-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server 

Maria DB 10.3 or My SQL 5.7

Hello Everyone,

This topic may have been discussed somewhere else, and if it is please feel free to move this to the proper location. How… | Read the rest of http://www.webhostingtalk.com/showthread.php?t=1765672&goto=newpost