InfluxDB retention policy and data downsampling

By Atomstar on Saturday 15 June 2019 12:52 - Comments (1)
Category: Linux, Views: 1.261

Goal: set up data downsampling scheme on existing collectd data.

This article elaborates the official InfluxDB guide here (https://docs.influxdata.c...wnsampling_and_retention/)
where we already have a database with existing measurements that we want to
keep.

I'd like to downsample my collectd data on ping, memory, load, etc. data from
two machines. I only want high (minute) time resolution for a few days, then
downsample to save space.

Define downsampling queries

It helps to first finetune your downsampling queries manually, such that you can check if you're happy with the result

code:
1
2
3
4
5
6
SELECT mean(*) FROM df GROUP BY time(1h),*
SELECT mean(*) FROM interface GROUP BY time(1h),*
SELECT mean(*) FROM iwinfo GROUP BY time(1h),*
SELECT mean(*) FROM load GROUP BY time(1h),*
SELECT mean(*) FROM memory GROUP BY time(1h),*
SELECT mean(*) FROM ping GROUP BY time(1h),*

Create long retention policies

First create long retention policy to save the existing data to. If we would
create the short one first, the retention policy could prune our data if we're
not fast enough. Note that this is NOT the default policy.

code:
1
CREATE RETENTION POLICY "always" ON "collectd" DURATION INF REPLICATION 1

Manually aggregate old data

Now we manually save our old data into the newly created retention policy.
Note that this is done in a different measurement. To retain tags, we use ",*"
in the GROUP BY clause. If omitted, this would drop all tags.

code:
1
2
3
4
5
6
SELECT mean(*) INTO "always"."df_hourly" FROM "df" GROUP BY time(1h),*
SELECT mean(*) INTO "always"."interface_hourly" FROM "interface" GROUP BY time(1h),*
SELECT mean(*) INTO "always"."iwinfo_hourly" FROM "iwinfo" GROUP BY time(1h),*
SELECT mean(*) INTO "always"."load_hourly" FROM "load" GROUP BY time(1h),*
SELECT mean(*) INTO "always"."memory_hourly" FROM "memory" GROUP BY time(1h),*
SELECT mean(*) INTO "always"."ping_hourly" FROM "ping" GROUP BY time(1h),*

Create 5-day short DEFAULT retention policy

Now that we've safely stored our aggregated data, we can create the short
and default retention policy, 5 days in this case. This means that influxdb
regularly prunes data older than 5 days from this database.

code:
1
CREATE RETENTION POLICY "five_days" ON "collectd" DURATION 5d REPLICATION 1 DEFAULT

Set up continuous queries

Now we're ready to automate the data downsampling using continuous queries. By
default these continuous queries run every 30 minutes, which can be altered in
the configuration file of influxdb.

code:
1
2
3
4
5
6
CREATE CONTINUOUS QUERY "cq_60m_df" ON "collectd" BEGIN SELECT mean(*) INTO "always"."df_hourly" FROM "df" GROUP BY time(1h),* END
CREATE CONTINUOUS QUERY "cq_60m_interface" ON "collectd" BEGIN SELECT mean(*) INTO "always"."interface_hourly" FROM "interface" GROUP BY time(1h),* END
CREATE CONTINUOUS QUERY "cq_60m_iwinfo" ON "collectd" BEGIN SELECT mean(*) INTO "always"."iwinfo_hourly" FROM "iwinfo" GROUP BY time(1h),* END
CREATE CONTINUOUS QUERY "cq_60m_load" ON "collectd" BEGIN SELECT mean(*) INTO "always"."load_hourly" FROM "load" GROUP BY time(1h),* END
CREATE CONTINUOUS QUERY "cq_60m_memory" ON "collectd" BEGIN SELECT mean(*) INTO "always"."memory_hourly" FROM "memory" GROUP BY time(1h),* END
CREATE CONTINUOUS QUERY "cq_60m_ping" ON "collectd" BEGIN SELECT mean(*) INTO "always"."ping_hourly" FROM "ping" GROUP BY time(1h),* END

Check setup

This should show the above commands on the database:

code:
1
SHOW CONTINUOUS QUERIES

The below command should show data filling up:

code:
1
2
3
SELECT * FROM "always"."ping_hourly" ORDER BY DESC LIMIT 10
2019-06-14T16:00:00Z rpi3b 40.960490196078446 ping          dataix.ru
2019-06-14T16:00:00Z rpi3b root     4249448094.8965516 df_complex used

then wait until the hour is over and check again for updates.

Sources

Volgende: How many houses are not sold publicly/funda? 01-06 How many houses are not sold publicly/funda?

Comments


By Tweakers user RobIII, Saturday 15 June 2019 18:51

Also don't forget to enable the retention policies and continuous queries in the configuration.

Comment form
(required)
(required, but will not be displayed)
(optional)