First of all, I'm not a database expert at all, but I have managed to set up a system that has fit my monitoring needs for more than a year now. Unfortunately, a problem has now come up. Short summary of my system:
- Raspberry Pi 3 (Raspbian buster) is running InfluxDB 1.8.5
- A Python script running as a service writes ~20 data points every 5 s, plus several hundred additional points per month on demand. I would say that is not too much in total.
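For scale, the regular write rate can be converted into points per day and per year. This is a rough sketch that assumes exactly 20 points every 5 s and ignores the several hundred on-demand points per month:

```python
# Back-of-the-envelope estimate of the regular write load described above.
# Assumption: exactly 20 points every 5 seconds; on-demand writes are ignored.
POINTS_PER_BATCH = 20
BATCH_INTERVAL_S = 5

points_per_second = POINTS_PER_BATCH / BATCH_INTERVAL_S  # 4 points per second
points_per_day = points_per_second * 24 * 60 * 60        # seconds per day
points_per_year = points_per_day * 365                   # non-leap year

print(f"{points_per_second:.1f} points/s")      # 4.0 points/s
print(f"{points_per_day:,.0f} points/day")      # 345,600 points/day
print(f"{points_per_year:,.0f} points/year")    # 126,144,000 points/year
```

So after a bit more than a year of operation the database holds on the order of 100+ million points from the regular writes alone.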
My database is now >1 GB:
/var/lib/influxdb $ sudo du -hd1
12K ./meta
39M ./wal
1.1G ./data
1.2G .
Several days or weeks ago I started to notice that my system had become really laggy: uptime reported load averages above 4. I suspected the rather old SD card and switched to a proper new one, first by just writing the old image to the new card, and later by reinstalling everything from scratch and restoring the InfluxDB data. It got better, but not really good: uptime now reported load averages around 2. I added a monitor for the uptime load averages to my Python script, and it looks like this:
(At ~22:30 I restarted InfluxDB.)
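A load-average monitor like the one mentioned above could look roughly like this. It is a minimal sketch, not the script actually used: it assumes a Unix-like OS (os.getloadavg() is not available on Windows), and the measurement name "system_load" and the field names are hypothetical choices. The dict has the shape accepted by InfluxDBClient.write_points() from the influxdb 1.x Python client, assuming that client is the one in use.

```python
import os
import socket


def load_average_point(measurement="system_load"):
    """Build one data point holding the 1/5/15-minute load averages.

    os.getloadavg() returns the same three numbers that `uptime` prints.
    The measurement and field names are hypothetical choices.
    """
    load1, load5, load15 = os.getloadavg()  # Unix only
    return {
        "measurement": measurement,
        "tags": {"host": socket.gethostname()},
        "fields": {"load1": load1, "load5": load5, "load15": load15},
    }


print(load_average_point())
```

With a configured client this would be written with something like client.write_points([load_average_point()]) on each 5 s cycle.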
I did some more analysis and found that I can read the InfluxDB log with the command
sudo journalctl -u influxdb.service
In its output I find lots of lines similar to this one:
Apr 26 21:04:22 xxx influxd: ts=2021-04-26T19:04:22.133912Z lvl=info msg="Error replacing new TSM files" log_id=0TksnBt0000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0TktwGCG000 op_name=tsm1_compact_group db_shard_id=534 error="cannot allocate memory"
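To see whether these errors hit only a few shards or all of them, the key=value fields of each log line can be parsed and the failures counted per db_shard_id. This is a hypothetical helper sketch, assuming the lines look like the one above (logfmt fields after the journald prefix); it uses only the standard library:

```python
import shlex
from collections import Counter


def parse_influx_log_line(line):
    """Extract the key=value (logfmt) fields from one influxd journal line.

    shlex.split keeps quoted values such as msg="Error replacing new TSM files"
    together; tokens without '=' (date, host, 'influxd:') are skipped.
    """
    try:
        tokens = shlex.split(line)
    except ValueError:  # unbalanced quotes: treat the line as unparseable
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_failing_shards(lines):
    """Count 'cannot allocate memory' compaction errors per db_shard_id."""
    counts = Counter()
    for line in lines:
        fields = parse_influx_log_line(line)
        if fields.get("error") == "cannot allocate memory":
            counts[fields.get("db_shard_id")] += 1
    return counts


sample = ('Apr 26 21:04:22 xxx influxd: ts=2021-04-26T19:04:22.133912Z lvl=info '
          'msg="Error replacing new TSM files" log_id=0TksnBt0000 engine=tsm1 '
          'tsm1_strategy=full tsm1_optimize=false trace_id=0TktwGCG000 '
          'op_name=tsm1_compact_group db_shard_id=534 error="cannot allocate memory"')

print(count_failing_shards([sample]))  # Counter({'534': 1})
```

To run it on the real log, it could be fed the lines of sudo journalctl -u influxdb.service --no-pager, for example read from sys.stdin.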
And here my knowledge really stops. Does anyone know what the issue is and what I can do to "repair" my database without losing more data?