Monitoring Solar Generation stats with InfluxDB, Telegraf and Soliscloud
Solar has been on our wish-list for quite some time, but never quite got beyond the "we should probably look at doing that next year" stage.
Last year, though, things changed: we saw huge energy price rises as the result of Russia's invasion of Ukraine, followed by interest rates rocketing in response to the abject ineptitude of Liz Truss's government. The result was that we decided it was time to bite the bullet and get onto an installer's waiting list.
Solar installations tend to consist of 3 main components - Photovoltaic (PV) Panels, at least one Inverter and a Meter. Some (us included) also add a battery for storage.
The inverter converts DC from the panels (and battery) to AC, but also acts as a router, communicating with each of the other components in order to decide whether to send power to the battery, house or grid.
There are a wide range of Solar Inverters on the market, each with their own pros and cons. In practice though, consumers don't always get much choice over the inverter that they get (at least not unless they're willing to switch between installation companies).
The inverter that came with our installation was manufactured by Ginlong's Solis.
Monitoring
Most modern solar inverters report generation and usage statistics back into infrastructure managed by the manufacturer. Solis, like many others, exposes these metrics to consumers via an online UI offering monitoring of current and historic inverter and panel output as well as this funky diagram
Solis's interface, Soliscloud, has an accompanying android app which can also be used to see usage as well as to receive alarms/notifications on your phone.
Building My Own
The navigation is a little arcane, but there's nothing inherently wrong with the Soliscloud interface - it does what it needs to do just fine.
The problem, for me, is simply that the information is locked away in one (proprietary) system, meaning that it isn't possible to factor other sources into any analysis I want to do of the system's performance.
I also prefer, where at all possible, that all my dashboards are in a single place (which is currently Grafana).
Soliscloud has an API though, so I set about writing a Telegraf exec plugin to pull metrics from Soliscloud so that they can be written into InfluxDB for later analysis and visualisation in Grafana.
This post talks about how I set that up, as well as a few issues I ran into along the way.
API Access
If you're looking to set the plugin up, the first thing you're going to need is API access. For whatever reason, it's not enabled by default and needs to be requested via a support ticket (docs on doing that are here).
Once your support ticket has been actioned, you should be able to log into the Soliscloud interface and retrieve your API credentials, which will consist of three things
- The API url (consisting of scheme, domain and port)
- An API Key ID
- A shared secret
If necessary, the credentials can be re-retrieved from the UI at a later date.
Installing and Configuring the Plugin
Once you've got the credentials, it's just a case of hooking the plugin up (Installing Telegraf is easy, so I'll skip over that section).
The first thing to do is to fetch a copy of my Soliscloud plugin and save it somewhere that Telegraf can access it. The easiest way to do this is to clone my plugins repo (that way, any future updates are just a git pull away):
git clone https://github.com/bentasker/telegraf-plugins.git bentasker-telegraf-plugins
sudo mv bentasker-telegraf-plugins /usr/local/src/
If you want to test the plugin before adding it to Telegraf, you can provide credentials and invoke manually
export API_ID="<your api id>"
export API_SECRET="<your api secret>"
export API_URL="<your api url>"
/usr/local/src/bentasker-telegraf-plugins/soliscloud/soliscloud.py
After a few seconds, you should see some line protocol being returned:
solar_inverter,type=device,device_type=battery,inverter_id=123456,inverter_sn=78901112,station=13141516178,userId=1920212223,batteryType=1.0,influxdb_database=Systemstats,batteryState=charging batteryPowerUnit="kW",batteryPowerPerc=46.0,batteryHealthPerc=100.0,batteryCurrentStr="A",batteryTodayChargeEnergy=2.5,batteryTodayChargeEnergyStr="kWh",batteryTodayDischargeEnergy=0.3,batteryTodayDischargeEnergyStr="kWh",readingAge=6i,batteryVoltage=50.6,batteryChargeRate=0.612,batteryDischargeRate=0.0,batteryCurrent=12.1
solar_inverter,type=device,device_type=inverter,inverter_id=123456,inverter_sn=78901112,station=13141516178,userId=1920212223,inverter_model=3101,influxdb_database=Systemstats state=3,todayYield=6.4,todayYieldStr="kWh",power_ac=1.097,power_ac_str="kW",temperature=36.8,gridBuyToday=5.0,gridSellToday=0.0,batterySupplyToday=0.3,batteryChargeToday=2.5,readingAge=6i,stationCapacity=3.28,stationCapacityUsedPerc=38.0,consumptionToday=10.0,panel_1=558.0,panel_2=540.0,panel_3=0.0,panel_4=0.0,panel_5=0.0,panel_6=0.0,panel_7=0.0,panel_8=0.0,panel_9=0.0,panel_10=0.0,panel_11=0.0,panel_12=0.0,panel_13=0.0,panel_14=0.0,panel_15=0.0,panel_16=0.0,panel_17=0.0,panel_18=0.0,panel_19=0.0,panel_20=0.0,panel_21=0.0,panel_22=0.0,panel_23=0.0,panel_24=0.0,panel_25=0.0,panel_26=0.0,panel_27=0.0,panel_28=0.0,panel_29=0.0,panel_30=0.0,panel_31=0.0
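For anyone unfamiliar with the format, the following is a minimal sketch of how a Telegraf exec plugin might assemble a line like those above. It is not taken from the plugin itself: the function names are hypothetical and only the basic escaping rules are handled.

```python
def escape_tag(value):
    """Escape commas, equals signs and spaces, which are special in tags."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field(value):
    """Render a field value using line protocol's type markers."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line_protocol(measurement, tags, fields):
    """Build one line: measurement,tag_set field_set (timestamp omitted)."""
    tag_str = ",".join(f"{escape_tag(k)}={escape_tag(v)}" for k, v in tags.items())
    field_str = ",".join(f"{k}={format_field(v)}" for k, v in fields.items())
    return f"{measurement},{tag_str} {field_str}"


line = to_line_protocol(
    "solar_inverter",
    {"type": "device", "device_type": "battery"},
    {"batteryPowerPerc": 46.0, "readingAge": 6, "batteryPowerUnit": "kW"},
)
print(line)
```

Leaving the timestamp off means that Telegraf assigns one at collection time.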
Once you're happy that the plugin is able to fetch data, Telegraf can be configured to run it by adding an inputs.exec block to Telegraf's config (remember to update the environment setting with your API creds):
[[inputs.exec]]
commands = [
    "/usr/local/src/bentasker-telegraf-plugins/soliscloud/soliscloud.py",
]
timeout = "60s"
# The inverter sends stats every 5m, so there's no
# point checking more regularly
interval = "5m"
name_suffix = ""
data_format = "influx"
# update the values here with your API credentials
environment = [
    "API_ID=",
    "API_SECRET=",
    "API_URL=https://www.soliscloud.com:13333"
]
If this is a new Telegraf setup, remember to add your InfluxDB output to the configuration too:
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
urls = ["https://eu-central-1-1.aws.cloud2.influxdata.com"]
## Token for authentication.
token = "<token>"
## Organization is the name of the organization you wish to write to; must exist.
organization = "<org name>"
## Destination bucket to write into.
bucket = "telegraf"
All that's needed after that is a quick restart of Telegraf:
systemctl restart telegraf
Metrics should start appearing in InfluxDB every 5 minutes or so.
Metrics
Most of the metrics exposed by the API are incrementing counters (for example gridBuyToday and gridBuyTotal) and so increase as time passes (although the Today counters obviously reset daily).
The plugin passes these through unchanged, so in order to get point-in-time usage rather than a cumulative total, an aggregate such as difference needs to be used (it's also possible to achieve this using delta in a Grafana transform).
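As a rough illustration of what difference does to a cumulative counter, here is a small sketch using made-up gridBuyToday readings. The helper names are hypothetical. The second helper drops the negative value produced when a Today counter resets, which is roughly what InfluxQL's non_negative_difference does.

```python
def differences(values):
    """Return the change between each consecutive pair of readings."""
    return [round(curr - prev, 3) for prev, curr in zip(values, values[1:])]


def non_negative_differences(values):
    """As above, but discard the negative step seen when a counter resets."""
    return [d for d in differences(values) if d >= 0]


# Hypothetical gridBuyToday readings (kWh), 5 minutes apart
daytime = [4.6, 4.7, 4.7, 5.0]
print(differences(daytime))

# Hypothetical readings spanning the daily reset
over_reset = [9.8, 10.0, 0.1, 0.3]
print(differences(over_reset))
print(non_negative_differences(over_reset))
```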
The current version of the plugin also calculates and exposes a few fields that are not directly provided by the API:
- localSupplyToday: value (in kWh) of energy supplied locally today
- localSupplyPercToday: percentage of consumed energy supplied locally today
- localSupplyTotal: total kWh supplied by local sources
- localSupplyPercTotal: percentage of total consumed energy supplied by local sources
More information on what each of the fields is and means can be found in the plugin's README.
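As a sketch of how values like these could be derived, the following mirrors the arithmetic used in Query A further down (yield plus battery supply, minus battery charge). The percentage formula is my assumption and the function names are hypothetical; the plugin's actual calculation may differ, so treat the README as the authority.

```python
def local_supply(today_yield, battery_supply, battery_charge):
    """Energy (kWh) supplied locally: generation plus battery discharge,
    minus the energy that went into charging the battery."""
    return today_yield + battery_supply - battery_charge


def local_supply_perc(local_kwh, consumption_kwh):
    """Share of consumption met locally (assumed formula)."""
    if consumption_kwh <= 0:
        return 0.0  # avoid dividing by zero just after the daily reset
    return local_kwh / consumption_kwh * 100


# Values taken from the example line protocol earlier in the post
supplied = local_supply(today_yield=6.4, battery_supply=0.3, battery_charge=2.5)
perc = local_supply_perc(supplied, consumption_kwh=10.0)
print(round(supplied, 2), round(perc, 1))
```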
Cumulative Readings & Day 1 Pain
It's probably fair to say that I got a little over-excited when I found out that Soliscloud had an API, because I wrote the plugin about 2 weeks before there was actually any data to fetch. Not having any API responses to examine, I tested against mocked responses built based on the API Doc's description of response structure - testing my understanding of the doc against my understanding of the doc, what could possibly go wrong?
As it turned out, though, I wasn't too far off the mark and, on install day, I was pleased to find that the plugin worked with very few changes needed.
However, as metrics came in, I spotted something odd in one of the fields:
To explain this a little further: for many of the values that the API response provides, there are two attributes: <fieldname> and <fieldname>Str. One gives the reading, whilst the other gives the unit for that reading.
So, for gridPurchasedTodayEnergy, the API was returning the following:
{
    "gridPurchaseTodayEnergy" : "650",
    "gridPurchaseTodayEnergyStr" : "kWh"
}
The API was claiming that we'd purchased 650kWh of energy so far that day. Which is.... uh.... a lot.
I wondered whether, rather than 650 kilowatt hours, we'd actually used 650 watt hours and that the unit was wrong, but I couldn't find anything to support even that usage. So, I decided to wait and see whether it corrected itself once usage ticked over 1000.
When I checked stats the next day, the original stats were still wrong, but the current day's stats were reporting correctly (with sub kWh values correctly reported as decimals):
Whatever had happened on that first day, hadn't corrected itself, but also hadn't repeated on the second.
So, I figured that - rather than being a switch between units - it must be something to do with the inverter having only just been brought online, and therefore wouldn't happen again.
However, a few days later, I saw something similar with a different cumulative counter, gridSellTodayEnergy: it too was reporting a ridiculously high figure. Somehow, near mid-day, we'd allegedly gone from exporting nothing to having exported 35 kWh.
It clearly wasn't a bug in the plugin causing this, because the Soliscloud UI was reporting exactly the same value.
Further checking confirmed that this was the first day that we'd exported any electricity at all. Based on my experience with gridPurchaseTodayEnergy, I theorised that the counters would start to report correctly on subsequent days. Sure enough, we also exported the next day and readings switched to being reported correctly.
So, whilst I can't quite rationalise what logic might be at play on the Soliscloud side, it does look like there is an issue with the way that counters are initialised the very first time that they're used. To confuse things further, this only seems to be the case with the Today fields and not the Total ones.
Unfortunately, there isn't much that the plugin can do to mitigate this without running the risk of introducing serious errors into the data in future.
So, if you're reading this post because you're going to have a Solis inverter installed, it is worth being aware that this seems to be an issue and that you'll probably want to discard/ignore initial metrics for the affected fields (so far, it only seems to be those two).
I ignored the false readings in my graphs by simply adding an additional lower timebound into the underlying InfluxQL queries
SELECT
mean("gridSellToday") AS "Exported to Grid"
FROM "Systemstats"."autogen"."solar_inverter"
WHERE
$timeFilter
-- Whatever the main timebounds
-- constrain to only include readings
-- starting the day after the spike
AND time > '2023-06-04T00:00:00Z'
GROUP BY time($__interval) FILL(null)
It's not the tidiest of solutions, but it's quick and easy and does what's needed.
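The same idea applies if you're post-processing exported data outside of Grafana. A minimal sketch, with made-up readings and a hypothetical helper name, of dropping everything at or before a cutoff:

```python
from datetime import datetime, timezone

# Start of the day after the spike (same bound as the query above)
CUTOFF = datetime(2023, 6, 4, tzinfo=timezone.utc)


def drop_before(points, cutoff):
    """Keep only (timestamp, value) points recorded after the cutoff."""
    return [(ts, value) for ts, value in points if ts > cutoff]


# Hypothetical gridSellToday readings (kWh)
points = [
    (datetime(2023, 6, 3, 12, 0, tzinfo=timezone.utc), 35.0),  # first-day spike
    (datetime(2023, 6, 4, 12, 0, tzinfo=timezone.utc), 0.4),
    (datetime(2023, 6, 5, 12, 0, tzinfo=timezone.utc), 1.1),
]
clean = drop_before(points, CUTOFF)
print([value for _, value in clean])
```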
Dashboarding
With metrics now available in the database, I built a Grafana dashboard to show the current state of the system.
Earlier, I mentioned that part of the reason I was doing this at all, was so that I could factor in data from other sources when visualising the data. So, it seems worth quickly summarising what those are.
- Weather: Solar panel efficiency is, unsurprisingly, quite heavily driven by the weather. I wanted to include information on sunlight levels captured from my weather station.
- Energy cost: I wanted to be able to show the monetary value of the energy we were generating/saving. Although Soliscloud's API can export a per-unit value, because of my existing energy usage monitoring, I already keep that updated in OWL Intuition and didn't want to have to keep two sources of truth in sync.
The result is a Grafana dashboard giving an at-a-glance view of Solar generation, household usage and money saved.
The cost savings graph is generated by running two InfluxQL queries and then applying a Grafana transform.
The first query calculates the amount of locally supplied energy
-- Query A
SELECT
(mean("batterySupplyToday") +
mean("todayYield")) -
mean("batteryChargeToday") AS "nongrid"
FROM "Systemstats"."autogen"."solar_inverter"
WHERE
$timeFilter
GROUP BY time($__interval)
FILL(previous)
(I made the graph before I added localSupplyToday to the plugin)
The second fetches pricing information from the measurement used to store stats from OWL Intuition
-- Query B
SELECT
mean("unit_cost") AS "mean_unit_cost"
FROM "Systemstats"."autogen"."power_watts"
WHERE
$timeFilter
GROUP BY time($__interval)
fill(previous)
Both queries use fill(previous) because the underlying datasources report stats at completely different intervals. fill() writes values into empty windows so that the two sources can be joined for further processing.
Grafana transforms then perform the join before multiplying the fields together
The result is what the local energy used would have cost if we'd bought it in from the grid.
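To make the fill, join and multiply steps a little more concrete, here's a rough Python equivalent. The data is invented and the helper names are hypothetical; in practice all of this is done by InfluxQL and Grafana's transforms.

```python
def fill_previous(series, timestamps):
    """Mimic fill(previous): carry the last seen value into empty windows."""
    filled, last = {}, None
    for ts in timestamps:
        if ts in series:
            last = series[ts]
        filled[ts] = last
    return filled


def cost_saved(nongrid, unit_cost, timestamps):
    """Join the two series on timestamp and multiply kWh by unit cost."""
    energy = fill_previous(nongrid, timestamps)
    price = fill_previous(unit_cost, timestamps)
    return {
        ts: round(energy[ts] * price[ts], 4)
        for ts in timestamps
        if energy[ts] is not None and price[ts] is not None
    }


timestamps = [0, 5, 10, 15]                     # minutes
nongrid = {0: 1.0, 5: 1.2, 10: 1.5, 15: 2.0}    # kWh, reported every 5 minutes
unit_cost = {0: 0.30, 15: 0.32}                 # price per kWh, reported less often
print(cost_saved(nongrid, unit_cost, timestamps))
```

Without the fill step, only the timestamps at 0 and 15 would exist in both series and the join would discard the rest.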
We're not currently receiving payment for exported energy (it takes a little while to get registered), but once we are, a logical improvement would be to add an additional series showing the value of any exports (which are normally paid at a lower per-unit price).
Timezone fun
I ran into another issue whilst trying to display daily statistics.
For example, the following query should extract daily efficiency levels
SELECT
last("localSupplyPercToday") AS "LocalPower"
FROM "Systemstats"."autogen"."solar_inverter"
WHERE
$timeFilter
GROUP BY time(1d)
However, a few of the days returned claimed that 100% of energy was locally supplied. I'd love for that to be true, but it just isn't.
The cause of this issue is to do with timezones.
InfluxDB uses UTC and the timestamps applied in our write pipeline (Telegraf) are also UTC, so there's no issue there.
The problem is, Soliscloud isn't using UTC and resets the daily cumulative counters at 00:00 BST (23:00 UTC). So, those 100% readings are actually saying that 100% of energy provided between 00:00 and 01:00 BST came from the battery.
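The effect is easy to reproduce outside of InfluxDB. The sketch below buckets a single (made-up) reading by day, once using UTC and once using local time. It uses a fixed +1 hour offset to stand in for BST so that it has no dependencies; real code would want a proper timezone database entry (for example Europe/London) so that the switch back to GMT is handled.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-in for British Summer Time (an assumption made for
# brevity: it ignores the autumn switch back to GMT)
BST = timezone(timedelta(hours=1), "BST")


def day_bucket(ts, tz):
    """Return the calendar day that a timestamp falls into in a given zone."""
    return ts.astimezone(tz).date().isoformat()


# A reading taken half an hour after the counters reset at 00:00 BST
reading = datetime(2023, 6, 4, 23, 30, tzinfo=timezone.utc)
print(day_bucket(reading, timezone.utc))
print(day_bucket(reading, BST))
```

Grouped by UTC day, the freshly reset reading becomes the last value of the previous day, which is why taking last() over a UTC day gave such odd results.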
The issue is easily addressed, though, because InfluxQL allows timezones to be specified at query time:
SELECT
last("localSupplyPercToday") AS "LocalPower"
FROM "Systemstats"."autogen"."solar_inverter"
WHERE
$timeFilter
GROUP BY time(1d)
tz('Europe/London')
The result is a more realistic (and really no less pleasing) set of results between 73 and 80%:
Downsampling
The API's 5 minute granularity means that graphing quite long time periods is already relatively inexpensive. However, in a few years' time, it's quite likely that I'm going to want to graph years' worth of data to help assess how successful the install has been.
It's also quite unlikely that I'm going to want to see what happened now at a 5 minute granularity anyway.
So, I've also set up some downsampling to reduce the granularity of data to 30 minutes (a slightly arbitrary figure, although the rationale is described here).
It's not uncommon to see mean as the aggregate of choice when downsampling. However, where cumulative counters are involved, I tend to prefer to use max. For metrics reflecting rate (for example panel output stats) I also wanted to record bounds and percentiles.
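A quick sketch of why the choice of aggregate matters for cumulative counters, using invented todayYield readings and a hypothetical windowing helper:

```python
from statistics import mean


def window(points, minutes):
    """Group (minute_of_day, value) points into fixed-size blocks."""
    buckets = {}
    for minute, value in points:
        start = minute - (minute % minutes)
        buckets.setdefault(start, []).append(value)
    return buckets


# Hypothetical todayYield readings: (minutes since midnight, kWh)
readings = [(600, 2.0), (605, 2.1), (610, 2.3), (615, 2.4), (620, 2.6), (625, 2.8)]

blocks = window(readings, 30)
as_max = {start: max(vals) for start, vals in blocks.items()}
as_mean = {start: round(mean(vals), 2) for start, vals in blocks.items()}
print(as_max, as_mean)
```

With max, the downsampled point holds the counter's value at the end of the window (2.8 kWh), whereas mean reports a figure (2.37 kWh) that the counter had already passed.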
To effect this, I created 3 new hourly downsampling jobs.
One collecting mean:
downsample_soliscloud_stats_mean:
    # Name for the task
    name: "Downsample Soliscloud Solar Stats (Mean)"
    influx: home1x
    # Query the last n mins
    period: 120
    # Window into n minute blocks
    window: 30
    # taken from in_bucket
    bucket: Systemstats
    measurement:
        - solar_inverter
    fields:
        - batteryHealthPerc
        - batteryPowerPerc
        - power_ac
        - state
        - temperature
    aggregates:
        mean:
    output_influx:
        - influx: home2xreal
    output_bucket: Systemstats/rp_720d
One collecting the max of cumulative counters:
downsample_soliscloud_stats_max:
    # Name for the task
    name: "Downsample Soliscloud Solar Stats (Max)"
    influx: home1x
    # Query the last n mins
    period: 120
    # Window into n minute blocks
    window: 30
    # taken from in_bucket
    bucket: Systemstats
    measurement:
        - solar_inverter
    fields:
        - batteryChargeToday
        - batterySupplyToday
        - batteryTodayChargeEnergy
        - batteryTodayDischargeEnergy
        - consumptionToday
        - consumptionTotal
        - gridBuyToday
        - gridBuyTotal
        - gridSellToday
        - gridSellTotal
        - localSupplyPercToday
        - localSupplyPercTotal
        - localSupplyToday
        - localSupplyTotal
        - readingAge
        - stationCapacity
        - stationCapacityUsedPerc
        - todayUsage
        - todayYield
        - totalYield
    aggregates:
        max:
    output_influx:
        - influx: home2xreal
    output_bucket: Systemstats/rp_720d
And one collecting mean, min, max, p99 and p50 for rate based stats:
# This job applies multiple aggregates
downsample_soliscloud_stats_multistats:
    # Name for the task
    name: "Downsample Soliscloud Solar Stats (Multiple Outputs)"
    influx: home1x
    # Query the last n mins
    period: 120
    # Window into n minute blocks
    window: 30
    # taken from in_bucket
    bucket: Systemstats
    measurement:
        - solar_inverter
    fields:
        # I only have 2 strings connected
        # so only need panel_1 and panel_2
        - panel_1
        - panel_2
        - batteryVoltage
        - batteryCurrent
        - batteryChargeRate
        - batteryDischargeRate
        - consumption
    aggregates:
        min:
            field_suffix: "_min"
        max:
            field_suffix: "_max"
        mean:
        percentile:
            - 95
            - 50
    output_influx:
        - influx: home2xreal
    output_bucket: Systemstats/rp_720d
As a result, a copy of my existing dashboard can be used to efficiently graph the same statistics over a much longer time period, as well as being able to show median (p50) and 99th percentiles for battery and panel output.
Alerting
The Soliscloud app will - in theory - display notifications if the inverter enters an alarm state (whether because there's an issue with the battery, or a power cut means that grid power has been lost etc).
However, I didn't want to rely on that alone - I've found that apps that only occasionally alert tend to fail silently on Android (in particular, overly-aggressive memory management at the OS level can cause issues).
The API exposes a numeric indication of the inverter's state:
So, it should have been quite straightforward to build a simple alert.
However, there was a complication: the values noted in the API doc don't align with the responses given by the API. My inverter consistently reports a state of 3, which should mean it's in an Alarm state (uh-oh).
However, the inverter's working and neither the Soliscloud UI nor the app report any alarms.
So, the most likely explanation is that the values listed in the doc are wrong (and that 3 is Online).
Rather than relying on assumptions (or going off and deliberately causing alarms), it seemed prudent to instead build an alert based on steady-state: if the value reported for state changes, we want an alert to fire.
I used the following InfluxQL query in Grafana's alerting to implement this
SELECT
    sum("state")
FROM
(
    SELECT
        difference("state") AS "state"
    FROM "Systemstats"."autogen"."solar_inverter"
    WHERE
        time > now() - 1h
)
GROUP BY time(10m)
The Alert is then configured to take the last 10 minute group and fire if the value of that group is not between 0 and 0.1 (any change will be a whole number, so will exceed this range).
To summarise, the alert:
- Queries the last hour of state readings
- Uses difference to calculate the change in value (which should normally be 0)
- Aggregates into six 10 minute windows, summing the differences in each group
- Takes the most recent 10 minute window
- Alerts if the total difference in that window is not between 0 and 0.1
This means that the alert will fire for any state change, whether that's moving to an alarm state or recovering from one.
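The steps above can be sketched in a few lines of Python. This is only an approximation of what the query and Grafana's alert condition do between them; the function name and the sample readings are hypothetical.

```python
def should_alert(readings, window_minutes=10, low=0.0, high=0.1):
    """Return True if the summed state change in the latest window is out of range.

    readings: list of (minute_offset, state) tuples in time order.
    """
    # difference(): change between each reading and the one before it
    diffs = [
        (minute, state - prev_state)
        for (_, prev_state), (minute, state) in zip(readings, readings[1:])
    ]
    if not diffs:
        return False
    # GROUP BY time(10m): keep only the most recent window and sum it
    latest_window = diffs[-1][0] // window_minutes
    total = sum(d for minute, d in diffs if minute // window_minutes == latest_window)
    # Fire when the total falls outside the accepted range
    return not (low <= total <= high)


# An hour of 5 minute readings with the inverter steadily reporting state 3
steady = [(m, 3) for m in range(0, 60, 5)]
# The same hour, but with the final reading dropping to state 2
changed = steady[:-1] + [(55, 2)]
print(should_alert(steady), should_alert(changed))
```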
When an alert does fire, Grafana is able to send me notifications via PagerDuty, as well as by email
(I adjusted the query to add 10 onto the result to force an alarm, hence the high value)
The Soliscloud API does also have an alarmList endpoint, so at some point I may extend the plugin to fetch alarm details, but for now the alert notification should serve as a prompt to open the app/login to the UI to see what the issue is.
Conclusion
Although the Soliscloud API has a few oddities, writing a plugin to get metric collection up and running was relatively straightforward. Feeding Solar generation and energy usage metrics into InfluxDB allows me to quite trivially track the efficiency of the system.
Having this data in a common location also means that it's possible to do more in-depth (and interesting) analysis, because it allows easy comparison between data sources. The ability to do that has already proven useful when looking at the counter initialisation issues with gridPurchasedTodayEnergy, as I was able to correlate the energy flow values recorded by my clamp meter with those being reported by Soliscloud's API.
Although an important step, getting the plugin up and running was really just the start of a wider range of solar related projects:
Solar panels lose some efficiency as temperatures rise, so I thought it might also be interesting to set up a dashboard or (more likely) a Jupyter notebook correlating panel output with solar energy and outdoor temperature in order to chart out the impact of changes in ambient temperature.
Because we track usage of our appliances, it should also be possible to write a notebook that can consume weather forecasts along with historic generation and weather data in order to (roughly) predict when best to turn appliances like the dishwasher on.
Although, for now, actually turning them on will have to be a manual action:
In the medium term though, my intention is to try and build automation to help with this load-shifting (particularly in winter), so that we try and use any excess energy rather than exporting to the grid only to have to buy it back at a higher price later in the day.
And, of course, it should also be possible to build something that takes the initial install cost as well as energy savings/export income into account to track our path towards return-on-investment.
Suffice to say, the metrics that we're now collecting have the potential to help keep me out of trouble for quite some time.