Monitoring Solar Generation Stats with InfluxDB, Telegraf and Soliscloud

Solar has been on our wish-list for quite some time, but never quite got beyond the "we should probably look at doing that next year" stage.

Last year, though, things changed: we saw huge energy price rises as the result of Russia's invasion of Ukraine, followed by interest rates rocketing in response to the abject ineptitude of Liz Truss's government. The result was that we decided it was time to bite the bullet and get onto an installer's waiting list.

Solar installations tend to consist of three main components: photovoltaic (PV) panels, at least one inverter and a meter. Some (us included) also add a battery for storage.

The inverter converts DC from the panels (and battery) to AC, but also acts as a router, communicating with each of the other components in order to decide whether to send power to the battery, house or grid.

There are a wide range of Solar Inverters on the market, each with their own pros and cons. In practice though, consumers don't always get much choice over the inverter that they get (at least not unless they're willing to switch between installation companies).

The inverter that came with our installation was manufactured by Ginlong under their Solis brand.

Monitoring

Most modern solar inverters report generation and usage statistics back to infrastructure managed by the manufacturer. Solis, like many others, exposes these metrics to consumers via an online UI, offering monitoring of current and historic inverter and panel output, as well as this funky diagram:

Screenshot of part of the Soliscloud interface, an animated image showing panel, battery and grid output along with usage

Solis's interface, Soliscloud, has an accompanying Android app, which can also be used to see usage as well as to receive alarms/notifications on your phone.

Building My Own

The navigation is a little arcane, but there's nothing inherently wrong with the Soliscloud interface - it does what it needs to do just fine.

The problem, for me, is simply that the information is locked away in one (proprietary) system, meaning that it isn't possible to factor other sources into any analysis I want to do of the system's performance.

I also prefer, where at all possible, that all my dashboards are in a single place (which is currently Grafana).

Soliscloud has an API though, so I set about writing a Telegraf exec plugin to pull metrics from it so that they can be written into InfluxDB for later analysis and visualisation in Grafana.

This post talks about how I set that up, as well as a few issues I ran into along the way.


API Access

If you're looking to set the plugin up, the first thing you're going to need is API access. For whatever reason, it's not enabled by default and needs to be requested via a support ticket (docs on doing that are here).

Once your support ticket has been actioned, you should be able to log into the Soliscloud interface and retrieve your API credentials, which consist of three things:

  • The API URL (consisting of scheme, domain and port)
  • An API Key ID
  • A shared secret

If necessary, the credentials can be re-retrieved from the UI at a later date.
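The plugin takes care of actually using these, but it's perhaps worth understanding what each credential is for. The API expects requests to be signed: roughly speaking, the shared secret is used to calculate an HMAC-SHA1 signature over each request, and the Key ID is sent alongside that signature so that Soliscloud knows which secret to verify it against. The sketch below is a simplified illustration based on my reading of the API docs (rather than a copy of the plugin's code), so treat the details - including the example endpoint - as indicative only.

import base64
import hashlib
import hmac
import json
from datetime import datetime, timezone

import requests

API_URL = "https://www.soliscloud.com:13333"
API_ID = "<your api key id>"
API_SECRET = "<your shared secret>"


def call_api(endpoint, body):
    ''' POST a signed request to the Soliscloud API.

    Simplified illustration only - no error handling, pagination etc.
    '''
    payload = json.dumps(body)

    # Base64 encoded MD5 of the request body
    content_md5 = base64.b64encode(hashlib.md5(payload.encode()).digest()).decode()

    # HTTP-date style timestamp, in GMT
    date = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")

    # The signature is an HMAC-SHA1 (keyed with the shared secret) of the
    # verb, body hash, content type, date and endpoint path
    string_to_sign = "\n".join(["POST", content_md5, "application/json", date, endpoint])
    signature = base64.b64encode(
        hmac.new(API_SECRET.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    ).decode()

    headers = {
        "Content-MD5": content_md5,
        "Content-Type": "application/json",
        "Date": date,
        # The Key ID tells Soliscloud which secret was used to sign
        "Authorization": f"API {API_ID}:{signature}",
    }
    return requests.post(API_URL + endpoint, headers=headers, data=payload).json()


# For example, list the inverters associated with the account
print(call_api("/v1/api/inverterList", {"pageNo": 1, "pageSize": 10}))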


Installing and Configuring the Plugin

Once you've got the credentials, it's just a case of hooking the plugin up (installing Telegraf itself is easy, so I'll skip over that).

The first thing to do is fetch a copy of my Soliscloud plugin and save it somewhere that Telegraf can access. The easiest way to do this is to clone my plugins repo down (that way, any future updates are just a git pull away):

git clone https://github.com/bentasker/telegraf-plugins.git -o bentasker-telegraf-plugins
sudo mv bentasker-telegraf-plugins /usr/local/src/

If you want to test the plugin before adding it to Telegraf, you can provide credentials via environment variables and invoke it manually:

export API_ID="<your api id>"
export API_SECRET="<your api secret>"
export API_URL="<your api url>"

/usr/local/src/bentasker-telegraf-plugins/soliscloud/soliscloud.py

After a few seconds, you should see some line protocol being returned:

solar_inverter,type=device,device_type=battery,inverter_id=123456,inverter_sn=78901112,station=13141516178,userId=1920212223,batteryType=1.0,influxdb_database=Systemstats,batteryState=charging batteryPowerUnit="kW",batteryPowerPerc=46.0,batteryHealthPerc=100.0,batteryCurrentStr="A",batteryTodayChargeEnergy=2.5,batteryTodayChargeEnergyStr="kWh",batteryTodayDischargeEnergy=0.3,batteryTodayDischargeEnergyStr="kWh",readingAge=6i,batteryVoltage=50.6,batteryChargeRate=0.612,batteryDischargeRate=0.0,batteryCurrent=12.1
solar_inverter,type=device,device_type=inverter,inverter_id=123456,inverter_sn=78901112,station=13141516178,userId=1920212223,inverter_model=3101,influxdb_database=Systemstats state=3,todayYield=6.4,todayYieldStr="kWh",power_ac=1.097,power_ac_str="kW",temperature=36.8,gridBuyToday=5.0,gridSellToday=0.0,batterySupplyToday=0.3,batteryChargeToday=2.5,readingAge=6i,stationCapacity=3.28,stationCapacityUsedPerc=38.0,consumptionToday=10.0,panel_1=558.0,panel_2=540.0,panel_3=0.0,panel_4=0.0,panel_5=0.0,panel_6=0.0,panel_7=0.0,panel_8=0.0,panel_9=0.0,panel_10=0.0,panel_11=0.0,panel_12=0.0,panel_13=0.0,panel_14=0.0,panel_15=0.0,panel_16=0.0,panel_17=0.0,panel_18=0.0,panel_19=0.0,panel_20=0.0,panel_21=0.0,panel_22=0.0,panel_23=0.0,panel_24=0.0,panel_25=0.0,panel_26=0.0,panel_27=0.0,panel_28=0.0,panel_29=0.0,panel_30=0.0,panel_31=0.0

Once you're happy that the plugin is able to fetch data, Telegraf can be configured to run it by adding an inputs.exec block to its config (remember to update the environment setting with your API credentials):

[[inputs.exec]]
    commands = [
        "/usr/local/src/bentasker-telegraf-plugins/soliscloud/soliscloud.py",
    ]
    timeout = "60s"

    # The inverter sends stats every 5m, so there's no
    # point checking more regularly
    interval = "5m"

    name_suffix = ""
    data_format = "influx"

    # update the values here with your API credentials
    environment = [
    "API_ID=",
    "API_SECRET=",
    "API_URL=https://www.soliscloud.com:13333"    
    ]

If this is a new Telegraf setup, remember to add your InfluxDB output to the configuration too:

[[outputs.influxdb_v2]]
  ## The URLs of the InfluxDB cluster nodes.
  urls = ["https://eu-central-1-1.aws.cloud2.influxdata.com"]

  ## Token for authentication.
  token = "<token>"

  ## Organization is the name of the organization you wish to write to; must exist.
  organization = "<org name>"

  ## Destination bucket to write into.
  bucket = "telegraf"

All that's needed after that is a quick restart of Telegraf:

systemctl restart telegraf

Metrics should start appearing in InfluxDB every 5 minutes or so.
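If you want to confirm that data is arriving, a quick InfluxQL query should return the most recent reading (the database name below matches my setup, so adjust it to wherever your output is actually writing):

-- fetch the most recent reading
SELECT *
FROM "Systemstats"."autogen"."solar_inverter"
ORDER BY time DESC
LIMIT 1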


Metrics

Most of the metrics exposed by the API are incrementing counters (for example gridBuyToday and gridBuyTotal) and so increase as time passes (although the Today counters obviously reset daily).

The plugin passes these through unchanged, so in order to get point-in-time usage rather than a cumulative total, an aggregate such as difference needs to be used (it's also possible to achieve this using delta in a Grafana transform).
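For example, to see how much energy was bought from the grid in each hour (rather than the day's running total), something like the following can be used:

-- hourly grid purchases, derived from the cumulative counter.
-- The daily counter reset will show up as a negative spike around
-- midnight; non_negative_difference() can be used to suppress that
SELECT
   difference(max("gridBuyToday")) AS "Grid Purchased"
FROM "Systemstats"."autogen"."solar_inverter"
WHERE
   $timeFilter
GROUP BY time(1h) FILL(null)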

The current version of the plugin also calculates and exposes a couple of fields that are not directly provided by the API:

  • localSupplyToday: Value (in kWh) of energy supplied locally today
  • localSupplyPercToday: Percentage of consumed energy supplied locally today
  • localSupplyTotal: Total kWh supplied by local sources
  • localSupplyPercTotal: Percentage of total consumed energy supplied by local sources

More information on what each of these fields means can be found in the plugin's README.


Cumulative Readings & Day 1 Pain

It's probably fair to say that I got a little over-excited when I found out that Soliscloud had an API, because I wrote the plugin about 2 weeks before there was actually any data to fetch. Not having any API responses to examine, I tested against mocked responses built from the API doc's description of the response structure - testing my understanding of the doc against my understanding of the doc. What could possibly go wrong?

As it turned out, though, I wasn't too far off the mark and, on install day, I was pleased to find that the plugin worked with very few changes needed.

However, as metrics came in, I spotted something odd in one of the fields:

Screenshot of the comment linked above - readings don't line up with the unit the API is claiming

To explain this a little further: for many of the values that the API response provides, there are two attributes: <fieldname> and <fieldname>Str. One gives the reading, whilst the other gives the unit for that reading.

So for gridPurchasedTodayEnergy the API was returning the following

{
   "gridPurchaseTodayEnergy" : "650",
   "gridPurchaseTodayEnergyStr" : "kWh"
}

The API was claiming that we'd purchased 650 kWh of energy so far that day. Which is.... uh.... a lot.

I wondered whether, rather than 650 kilowatt-hours, we'd actually used 650 watt-hours and the unit was simply wrong, but I couldn't find anything to support even that level of usage. So, I decided to wait and see whether it corrected itself once usage ticked over 1000.

When I checked stats the next day, the original readings were still wrong, but the current day's stats were reporting correctly (with sub-kWh values correctly reported as decimals):

Screenshot of Chronograf showing usage increasing throughout the second day

Whatever had happened on that first day hadn't corrected itself, but it also hadn't repeated on the second.

So, I figured that - rather than being a switch between units - it must be something to do with the inverter having only just been brought online, and therefore wouldn't happen again.

However, a few days later, I experienced something similar with a different cumulative counter - gridSellTodayEnergy - it too was reporting a ridiculously high figure. Somehow, near midday, we'd allegedly gone from exporting nothing to having exported 35 kWh.

It clearly wasn't a bug in the plugin causing this because the Soliscloud UI was reporting exactly the same value

Screenshot of Soliscloud UI showing us doing an impossible amount of export

Further checking confirmed that this was the first day that we'd exported any electricity at all. Based on my experience with gridPurchasedTodayEnergy, I theorised that the counters would start to report correctly on subsequent days. Sure enough, we also exported the next day and readings switched to being reported correctly.

So, whilst I can't quite rationalise what logic might be at play on the Soliscloud side, it does look like there is an issue with the way that counters are initialised the very first time that they're used. To confuse things further, this only seems to be the case with the Today fields and not the Total ones.

Unfortunately, there isn't much that the plugin can do to mitigate this without running the risk of introducing serious errors into the data in future.

So, if you're reading this post because you're going to have a Solis inverter installed, it's worth being aware of this issue: you'll probably want to discard/ignore the initial readings for the affected fields (so far, it only seems to affect those two).

I ignored the false readings in my graphs by simply adding an additional lower timebound into the underlying InfluxQL queries

SELECT
   mean("gridSellToday") AS "Exported to Grid"
FROM "Systemstats"."autogen"."solar_inverter" 
WHERE 
   $timeFilter 
   -- Whatever the main timebounds
   -- constrain to only include readings
   -- starting the day after the spike
   AND time > '2023-06-04T00:00:00Z' 
GROUP BY time($__interval) FILL(null)

It's not the tidiest of solutions, but it's quick and easy and does what's needed.


Dashboarding

With metrics now available in the database, I built a Grafana dashboard to show the current state of the system.

Earlier, I mentioned that part of the reason for doing this at all was so that I could factor in data from other sources when visualising the data. So, it seems worth quickly summarising what those sources are.

  • Weather: Solar panel efficiency is, unsurprisingly, quite heavily driven by the weather. I wanted to include information on sunlight levels captured from my weather station.
  • Energy cost: I wanted to be able to show the monetary value of the energy we were generating/saving. Although Soliscloud's API can export a per-unit value, because of my existing energy usage monitoring, I already keep that updated in OWL Intuition and didn't want to have to keep two sources of truth in sync.

The result is a Grafana dashboard giving an at-a-glance view of Solar generation, household usage and money saved.

Screenshot of my grafana dashboard, it has cells at the top presenting current stats as well as current battery state and charge level. Below that are graphs indicating solar yield and savings

The cost savings graph is generated by running two InfluxQL queries and then applying a Grafana transform.

The first query calculates the amount of locally supplied energy

-- Query A
SELECT
   (mean("batterySupplyToday") + 
   mean("todayYield")) - 
   mean("batteryChargeToday") AS "nongrid" 
FROM "Systemstats"."autogen"."solar_inverter" 
WHERE 
   $timeFilter 
GROUP BY time($__interval) 
FILL(previous)

(I made the graph before I added localSupplyToday to the plugin)
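If you're building this from scratch against the current version of the plugin, it should be possible to simplify the query down to the newer field (untested on my side, because my panel predates it):

-- simpler equivalent of Query A, using the newer field
SELECT
   mean("localSupplyToday") AS "nongrid"
FROM "Systemstats"."autogen"."solar_inverter"
WHERE
   $timeFilter
GROUP BY time($__interval)
FILL(previous)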

The second fetches pricing information from the measurement used to store stats from OWL Intuition:

-- Query B
SELECT 
   mean("unit_cost") AS "mean_unit_cost" 
FROM "Systemstats"."autogen"."power_watts"  
WHERE 
    $timeFilter 
GROUP BY time($__interval)
fill(previous)

Both queries use fill(previous) because the underlying datasources report stats at completely different intervals. fill() writes values into empty windows so that the two sources can be joined for further processing.

Grafana transforms then perform the join before multiplying the fields together

Screenshot of Grafana transforms, applying an outer join on time and then multiplying nongrid by mean_unit_cost

The result is what the local energy used would have cost if we'd bought it in from the grid.

We're not currently receiving payment for exported energy (it takes a little while to get registered), but once we are, a logical improvement would be to add an additional series showing the value of any exports (which are normally paid at a lower per-unit price).
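When that happens, the new series will probably look a lot like Query A, just using gridSellToday and an export rate. Something along these lines - note that the 0.15 here is purely a placeholder rather than a real tariff, since I don't yet have a source for the export rate:

-- possible future query: value of exported energy
-- (0.15 is a placeholder export rate in £/kWh, not our actual tariff)
SELECT
   mean("gridSellToday") * 0.15 AS "export_value"
FROM "Systemstats"."autogen"."solar_inverter"
WHERE
   $timeFilter
GROUP BY time($__interval)
FILL(previous)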


Timezone Fun

I ran into another issue whilst trying to display daily statistics.

For example, the following query should extract daily efficiency levels

SELECT 
  last("localSupplyPercToday") AS "LocalPower" 
FROM "Systemstats"."autogen"."solar_inverter" 
WHERE 
   $timeFilter 
GROUP BY time(1d)

However, a few of the days returned claimed that 100% of energy was locally supplied. I'd love for that to be true, but it just isn't.

The cause of this issue is to do with timezones.

InfluxDB uses UTC and the timestamps applied in our write pipeline (Telegraf) are also UTC, so there's no issue there.

The problem is, Soliscloud isn't using UTC and resets the daily cumulative counters at 00:00 BST (23:00 UTC). Because InfluxDB's GROUP BY time(1d) windows are aligned to UTC, last() picks up a reading taken just after that reset - so those 100% readings are actually saying that 100% of energy provided between 00:00 and 01:00 BST (the final hour of the UTC day) came from the battery.

The issue is easily addressed, though, because InfluxQL allows timezones to be specified at query time:

SELECT 
  last("localSupplyPercToday") AS "LocalPower" 
FROM "Systemstats"."autogen"."solar_inverter" 
WHERE 
   $timeFilter 
GROUP BY time(1d)
tz('Europe/London')

The result is a more realistic (and really no less pleasing) range of values, between 73 and 80%:

Grafana chart showing percentage of power supplied locally per day, values range from 73-79.5%


Downsampling

The API's 5 minute granularity means that graphing quite long time periods is already relatively inexpensive. However, in a few years' time, it's quite likely that I'm going to want to graph years' worth of data to help assess how successful the install has been.

It's also quite unlikely that, by then, I'm going to want to see what's happening now at a 5 minute granularity anyway.

So, I've also set up some downsampling to reduce the granularity of data to 30 minutes (a slightly arbitrary figure, although the rationale is described here).

It's not uncommon to see mean as the aggregate of choice when downsampling. However, where cumulative counters are involved, I tend to prefer to use max, so that the downsampled value still reflects where the counter stood at the end of each window. For metrics reflecting rate (for example, panel output stats) I also wanted to record bounds and percentiles.

To effect this, I created 3 new hourly downsampling jobs.

One collecting mean:

downsample_soliscloud_stats_mean:
    # Name for the task
    name: "Downsample Soliscloud Solar Stats (Mean)"
    influx: home1x

    # Query the last n mins
    period: 120

    # Window into n minute blocks
    window: 30

    # taken from in_bucket
    bucket: Systemstats
    measurement:
        - solar_inverter

    fields:
        - batteryHealthPerc
        - batteryPowerPerc
        - power_ac
        - state
        - temperature

    aggregates: 
        mean:

    output_influx: 
        - influx: home2xreal
    output_bucket: Systemstats/rp_720d

One collecting the max of cumulative counters:

downsample_soliscloud_stats_max:
    # Name for the task
    name: "Downsample Soliscloud Solar Stats (Max)"
    influx: home1x

    # Query the last n mins
    period: 120

    # Window into n minute blocks
    window: 30

    # taken from in_bucket
    bucket: Systemstats
    measurement:
        - solar_inverter

    fields:
        - batteryChargeToday
        - batterySupplyToday
        - batteryTodayChargeEnergy
        - batteryTodayDischargeEnergy
        - consumptionToday
        - consumptionTotal
        - gridBuyToday
        - gridBuyTotal
        - gridSellToday
        - gridSellTotal
        - localSupplyPercToday
        - localSupplyPercTotal
        - localSupplyToday
        - localSupplyTotal
        - readingAge
        - stationCapacity
        - stationCapacityUsedPerc
        - todayUsage
        - todayYield
        - totalYield

    aggregates: 
        max:

    output_influx: 
        - influx: home2xreal
    output_bucket: Systemstats/rp_720d

And one collecting mean, min, max, p95 and p50 for rate-based stats:

# This job applies multiple aggregates
downsample_soliscloud_stats_multistats:
    # Name for the task
    name: "Downsample Soliscloud Solar Stats (Multiple Outputs)"
    influx: home1x

    # Query the last n mins
    period: 120

    # Window into n minute blocks
    window: 30

    # taken from in_bucket
    bucket: Systemstats
    measurement:
        - solar_inverter

    fields:
        # I only have 2 strings connected
        # so only need panel_1 and panel_2    
        - panel_1
        - panel_2
        - batteryVoltage
        - batteryCurrent
        - batteryChargeRate
        - batteryDischargeRate        
        - consumption

    aggregates: 
        min:
            field_suffix: "_min"
        max:
            field_suffix: "_max"
        mean:
        percentile:
            - 95
            - 50

    output_influx: 
        - influx: home2xreal
    output_bucket: Systemstats/rp_720d
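The job definitions above are in the format used by my existing downsampling tooling, but there's nothing special about them. If you're doing something different, the max job is roughly equivalent to scheduling something like the following InfluxQL to run hourly (field list trimmed for brevity - note, too, that my jobs write into a second InfluxDB instance, which a simple SELECT ... INTO can't do):

-- roughly equivalent to the "max" job above
SELECT
   max("gridBuyToday") AS "gridBuyToday",
   max("gridSellToday") AS "gridSellToday",
   max("todayYield") AS "todayYield"
INTO "Systemstats"."rp_720d"."solar_inverter"
FROM "Systemstats"."autogen"."solar_inverter"
WHERE
   time > now() - 2h
GROUP BY time(30m), *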

As a result, a copy of my existing dashboard can be used to efficiently graph the same statistics over a much longer time period, as well as to show median (p50) and 95th percentile values for battery and panel output.

Screenshot of Grafana showing the median/p50 output of my panels over the course of a couple of days


Alerting

The Soliscloud app will - in theory - display notifications if the inverter enters an alarm state (whether because there's an issue with the battery, or a power cut means that grid power has been lost etc).

However, I didn't want to rely on that alone - I've found that apps that only occasionally alert tend to fail silently on Android (in particular, overly-aggressive memory management at the OS level can cause issues).

The API exposes a numeric indication of the inverter's state:

Screenshot of the Soliscloud API doc showing attribute state - values are 1: online, 2: offline, 3: alarm

So, it should have been quite straightforward to build a simple alert.

However, there was a complication - the values noted in the API doc don't align with the responses given by the API: my inverter consistently reports a state of 3, which should mean it's in an Alarm state (uh-oh).

However, the inverter's working, and neither the Soliscloud UI nor the app reports any alarms:

Screenshot of the Soliscloud UI, showing inverter online with 0 alarms

So, the most likely explanation is that the values listed in the doc are wrong (and that 3 is Online).

Rather than relying on assumptions (or going off and deliberately causing alarms), it seemed prudent to instead build an alert based on steady-state: if the value reported for state changes, we want an alert to fire.

I used the following InfluxQL query in Grafana's alerting to implement this

SELECT
    sum("state") 
FROM 
   (
    SELECT 
        difference("state") AS "state" 
    FROM "Systemstats"."autogen"."solar_inverter" 
    WHERE 
        time > now() - 1h
    ) 
GROUP BY time(10m)

The alert is then configured to take the last 10 minute group and fire if the value of that group is not between 0 and 0.1 (any change will be a whole number, so will fall outside this range):

Screenshot of the Grafana alert config, applying Reduce/Last to the query output and then testing whether the result falls between 0 and 0.1

To summarise, the alert:

  • Queries the last hour of state readings
  • Uses difference to calculate the change in value (which should normally be 0)
  • Aggregates into six 10 minute windows, summing the differences in each group
  • Takes the most recent 10 minute window
  • Alerts if the total difference in that window is not between 0 and 0.1

This means that the alert will fire for any state change, whether that's moving to an alarm state or recovering from one.

When an alert does fire, Grafana is able to send me notifications via PagerDuty, as well as by email

Screenshot of the Grafana alert email showing a (forced) alert for inverter state

(I adjusted the query to add 10 onto the result to force an alarm, hence the high value)
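For reference, forcing it was just a case of skewing the outer query so that the result could never fall within the alerting range:

SELECT
    sum("state") + 10
FROM
   (
    SELECT
        difference("state") AS "state"
    FROM "Systemstats"."autogen"."solar_inverter"
    WHERE
        time > now() - 1h
    )
GROUP BY time(10m)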

The Soliscloud API does also have an alarmList endpoint, so at some point I may extend the plugin to fetch alarm details, but for now the alert notification should serve as a prompt to open the app or log into the UI to see what the issue is.


Conclusion

Although the Soliscloud API has a few oddities, writing a plugin to get metric collection up and running was relatively straightforward. Feeding Solar generation and energy usage metrics into InfluxDB allows me to quite trivially track the efficiency of the system.

Having this data in a common location also means that it's possible to do more in-depth (and interesting) analysis, because it allows easy comparison between data sources. The ability to do that has already proven useful when looking at the counter initialisation issues with gridPurchasedTodayEnergy, as I was able to correlate the energy flow values recorded by my clamp meter with those being reported by Soliscloud's API.

Although an important step, getting the plugin up and running was really just the start of a wider range of solar-related projects:

Solar panels lose some efficiency as temperatures rise, so I thought it might also be interesting to set up a dashboard or (more likely) a Jupyter notebook correlating panel output with sunlight levels and outdoor temperature, in order to chart the impact of changes in ambient temperature.

Because we track usage of our appliances, it should also be possible to write a notebook that can consume weather forecasts along with historic generation and weather data in order to (roughly) predict when best to turn appliances like the dishwasher on.

Although, for now, actually turning them on will have to be a manual action:

I tooted: It's funny, since they became a thing, I've always had the mindset "WHY would I want my *kitchen appliances* to be smart?". But, now that we've had our solar install, I'm actually seeing a use for it. What I'd love, is for things like my dishwasher to be on the local network so that (after it's been loaded), something like HomeAssistant can say "We've started exporting energy to the grid, GO GO magic dishwasher". But, it'd have to be local only, and I still don't think a fridge needs smarts

In the medium term though, my intention is to try and build automation to help with this load-shifting (particularly in winter), so that we use any excess energy rather than exporting it to the grid only to have to buy it back at a higher price later in the day.

And, of course, it should also be possible to build something that takes the initial install cost as well as energy savings/export income into account to track our path towards return-on-investment.

Suffice to say, the metrics that we're now collecting have the potential to help keep me out of trouble for quite some time.