Collecting Nextcloud User Quota Information With Telegraf

Collecting system level stats from Nextcloud with Telegraf is well documented, and very well supported.

However, I wanted to extract some additional information: current storage quota allocations and usage. Nextcloud allows you to apply a storage quota to each individual user, so I thought it'd be useful to be able to monitor for accounts that are getting close to their quota.

The information is a bit more buried within Nextcloud's APIs than the system level stats, so it can't be consumed as easily using inputs.http.

This post gives details of an exec plugin which can fetch quota usage, per user, and pass it into Telegraf in InfluxDB line protocol.

This post assumes you've already got Telegraf set up and running somewhere that can reach your Nextcloud instance (it might even be running on the same box - mine is).

Nextcloud Setup

The script will need credentials for an admin account within Nextcloud - whilst having admin creds knocking about isn't particularly palatable, Nextcloud expects you to have them to be able to view other users - which isn't an entirely unreasonable position.

It'd be prudent to create a new admin account for this, though, rather than using your existing one.
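If you'd rather not leave the admin credentials sitting in the script file, one option is to read them from the environment instead. This is only a minimal sketch and isn't part of the plugin itself: the variable name NEXTCLOUD_QUOTA_CREDS and the load_credentials helper are both made up for illustration, and how you get the variable into the environment that Telegraf runs the script in will depend on your setup.

```python
import os

# Hypothetical variable name, not something Telegraf or Nextcloud defines
CREDS_ENV_VAR = "NEXTCLOUD_QUOTA_CREDS"


def load_credentials(env=os.environ):
    ''' Return "user:password" from the environment, or raise if it's unusable

    '''
    creds = env.get(CREDS_ENV_VAR, "")
    if ":" not in creds:
        # Fail loudly rather than sending a malformed Authorization header
        raise ValueError(f"{CREDS_ENV_VAR} must be set to user:password")
    return creds
```

The result could then be assigned to NEXTCLOUD_PASS in place of the hard-coded string.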

Exec plugin

Save the following as /usr/local/src/telegraf_plugins/nextcloud_user_quota.py (you can change this if needed, but the examples below refer to this path).

The script is also available in my telegraf-plugins GitHub repo.

#!/usr/bin/env python3
#
# Telegraf Exec plugin to monitor nextcloud user quota usage
#
# Copyright (c) 2021 B Tasker
#
import base64
import requests
import sys
import time

# Config
NEXTCLOUD_DOMAIN=""
NEXTCLOUD_PROTO=""
NEXTCLOUD_PASS=""
MEASUREMENT=""



def makeRequest(path, params=False):
    ''' Place a request to the Nextcloud API


    '''
    if not params:
        params = {}

    params['format'] = "json"
    headers = {
        "Content-Type" : "application/x-www-form-urlencoded",
        "OCS-APIRequest" : "true",
        "Authorization" : f"Basic {ENCODED_AUTH}"

        }


    # Strip any leading slash from path so the URL doesn't contain "cloud//users"
    r = SESSION.get(f"{NEXTCLOUD_PROTO}://{NEXTCLOUD_DOMAIN}/ocs/v2.php/cloud/{path.lstrip('/')}", params=params, headers=headers)

    if r.status_code == 200:
        return r.json(), 200
    else:
        return False, r.status_code



def getUserList():
    ''' Get a list of users

    '''
    resp_json, stat_code = makeRequest('/users')

    if not resp_json:
        return False, stat_code

    return resp_json['ocs']['data']['users'], stat_code



def getUserInfo(user):
    ''' Fetch info from the API for a username

    '''
    userinfo, stat_code = makeRequest(f'/users/{user}')

    if not userinfo:
        return False, stat_code

    if userinfo['ocs']['data']['quota']['quota'] < 0:
        # Unlimited
        userinfo['ocs']['data']['quota']['quota'] = 0
        userinfo['ocs']['data']['quota']['relative'] = 0.00

    return userinfo['ocs']['data']['quota'], stat_code



def quota_to_lp(user, quota_obj):
    ''' Take a quota object and output Influx line protocol

    '''
    return f"{MEASUREMENT},user={user},hostname={NEXTCLOUD_DOMAIN} quota={quota_obj['quota']}i,free={quota_obj['free']}i,used={quota_obj['used']}i,percent_used={quota_obj['relative']} {TIMESTAMP}"



def status_to_lp(stat_code, user = False):
    ''' Accept a status code and an optional user and create a line of LP

    '''
    if user:
        s = f"{MEASUREMENT},user={user},hostname={NEXTCLOUD_DOMAIN} api_status_code={stat_code} {TIMESTAMP}"    
    else:
        s = f"{MEASUREMENT},user=none,hostname={NEXTCLOUD_DOMAIN} api_status_code={stat_code} {TIMESTAMP}"

    return s



def main():
    ''' Main entrypoint

    '''
    users, stat_code = getUserList()

    print(status_to_lp(stat_code))    
    if not users:
        # API returned an error
        sys.exit(1)

    # Otherwise
    for user in users:
        quota_obj, stat_code = getUserInfo(user)

        print(status_to_lp(stat_code, user))        
        if not quota_obj:
            # API returned an error
            # Other users might work though
            continue

        lp = quota_to_lp(user, quota_obj)
        print(lp)



# Work starts
SESSION=requests.session()
TIMESTAMP=int(time.time()*1000000000) # we use int to prevent an exponent from being used
ENCODED_AUTH=base64.b64encode(bytes(NEXTCLOUD_PASS,'utf-8')).decode()

# Trigger the app
main()

There's a config section at the top that you'll need to edit

NEXTCLOUD_DOMAIN="[domain]" # Insert your nextcloud domain (e.g. nextcloud.example.com)
NEXTCLOUD_PROTO="https" # Should be http or https
NEXTCLOUD_PASS="[user]:[password]" # Username and password for your admin user
MEASUREMENT="nextcloud_quotas" # What should we call the measurement in InfluxDB

Make sure you make the script executable

chmod +x /usr/local/src/telegraf_plugins/nextcloud_user_quota.py

Configuring in Telegraf

Next, we need to tell Telegraf to use the plugin. In your Telegraf config (probably /etc/telegraf/telegraf.conf), add the following:

[[inputs.exec]]
    commands = [
        "/usr/local/src/telegraf_plugins/nextcloud_user_quota.py",
    ]
    timeout = "60s"
    interval = "30m"
    name_suffix = ""
    data_format = "influx"

This'll tell Telegraf to run the script every half hour.

You can set it more frequently, but keep in mind that the quota calculations are quite expensive for Nextcloud: when I had it set at 1m intervals, the MySQL instance backing Nextcloud got quite CPU happy (I'm not sure, but I assume it calculates usage by summing all the file records in the database).

When the script runs, Telegraf will receive line protocol like the following

nextcloud_quotas,user=none,hostname=nextcloud.example.com api_status_code=200 1638040580343771904
nextcloud_quotas,user=adminacct,hostname=nextcloud.example.com api_status_code=200 1638040580343771904
nextcloud_quotas,user=adminacct,hostname=nextcloud.example.com quota=0i,free=470842998784i,used=15209524i,percent_used=0.0 1638040580343771904
nextcloud_quotas,user=btasker,hostname=nextcloud.example.com api_status_code=200 1638040580343771904
nextcloud_quotas,user=btasker,hostname=nextcloud.example.com quota=64424509440i,free=24384307555i,used=40040201885i,percent_used=62.15 1638040580343771904
nextcloud_quotas,user=telegraf_api_adm_poller,hostname=nextcloud.example.com api_status_code=200 1638040580343771904
nextcloud_quotas,user=telegraf_api_adm_poller,hostname=nextcloud.example.com quota=0i,free=470842998784i,used=22868401i,percent_used=0.0 1638040580343771904

There are essentially two groups of data here: API response code tracking (the api_status_code field) and the actual quota usage fields:

  • quota : 0 for infinite, measured in bytes
  • free : bytes of quota free
  • used : bytes of quota used
  • percent_used : percentage quota used (0.0 if infinite quota)
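One caveat with the output above: the script drops usernames straight into the user tag. Line protocol needs commas, equals signs and spaces in tag values to be backslash-escaped, so a username containing any of those would produce a line that Telegraf can't parse. A minimal sketch of an escaping helper (escape_tag is a name I've made up, it isn't in the script) might look like this:

```python
def escape_tag(value):
    ''' Escape a string for use as a line protocol tag value

    '''
    # Commas, equals signs and spaces are the characters that need escaping
    for char in (",", "=", " "):
        value = value.replace(char, "\\" + char)
    return value


print(escape_tag("john smith"))  # john\ smith
```

It could be applied to user inside quota_to_lp and status_to_lp before the f-strings are built.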

Graphing

Graphing these is pretty trivial. For example, to graph percentage usage while excluding the adminacct and telegraf_api_adm_poller accounts, we can use the following Flux:

from(bucket: "telegraf/autogen")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "nextcloud_quotas" and r._field == "percent_used")
|> filter(fn: (r) => r.user != "adminacct" and r.user != "telegraf_api_adm_poller")
|> aggregateWindow(every: 5m, fn: mean)
|> keep(columns: ["_time", "user", "_value"])

Which gives us a graph something like

Graphing Percentage Quota used in Nextcloud

If we want to track the rate at which users are consuming their quota at any given time, we can run a query like

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "nextcloud_quotas" and r._field == "used")
  |> derivative(unit: 1s, nonNegative: true)
  |> map(fn: (r) => ({r with _value: r._value * 8.0}))
  |> keep(columns: ["_time", "user", "_value"])

This gives us a graph showing the rate, in bits per second, at which each user has been writing into storage (used is in bytes, hence the multiplication by 8).

Per user write rate
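To sanity check what the write rate query is calculating, here's the same arithmetic for a single pair of samples in Python. It's only a sketch: write_rate_bits is a made-up helper, the numbers are invented, and it doesn't try to reproduce how Flux treats a drop in usage when nonNegative is set.

```python
def write_rate_bits(prev_used, curr_used, elapsed_seconds):
    ''' Change in bytes used between two samples, as bits per second

    '''
    bytes_per_second = (curr_used - prev_used) / elapsed_seconds
    # Same conversion as the map() step in the query: bytes to bits
    return bytes_per_second * 8.0


# Two samples taken 30 minutes (1800s) apart, 900000 bytes written in between
print(write_rate_bits(1_000_000, 1_900_000, 1800))  # 4000.0
```

With the 30m collection interval used earlier, each point on the graph is effectively an average over the preceding half hour rather than an instantaneous rate.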

It's also helpful to be able to see, at a glance, how much quota has been allocated in total, as well as how much is actually used.

Total Quota allocated:

field="quota"

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart)
  |> filter(fn: (r) => r._measurement == "nextcloud_quotas" and r._field == field)
  |> last()
  |> group()
  |> sum()
  |> map(fn: (r) => ({r with _value: r._value /1024/1024/1024}))

Total Quota Used:

field="used"

from(bucket: "telegraf/autogen")
  |> range(start: v.timeRangeStart)
  |> filter(fn: (r) => r._measurement == "nextcloud_quotas" and r._field == field)
  |> last()
  |> group()
  |> sum()
  |> map(fn: (r) => ({r with _value: r._value /1024/1024/1024}))

(Same query just looking at a different field).
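The final map() in both queries just converts bytes into GiB by dividing by 1024 three times. As a quick worked check in Python, using the quota value from the sample line protocol earlier (bytes_to_gib is just an illustrative helper):

```python
GIB = 1024 ** 3  # bytes in one GiB


def bytes_to_gib(num_bytes):
    ''' Convert a byte count into GiB

    '''
    return num_bytes / GIB


# btasker's quota from the sample output
print(bytes_to_gib(64424509440))  # 60.0
```

So the 64424509440 byte quota in the example is a 60 GiB allocation.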
