I'm seeing quite a number of updates and requests on the forums about derived values computed from feed data and presented in different ways (max, min, and average temperature, and kWh delta are the ones that come to mind first). Then, on the other side, 'Delta' has been added to some (but not all) of the individual visualizations.
I'm wondering if there's a nice way we can separate and generalize some of these features.
The Virtual Feeds concept seems like it might be applicable; if we could define a virtual feed that would do these calculations, then we could look at them using any of the existing visualizations.
So, for example, we'd want something that would compute a delta, given a starting point and a period of time. So a kWh/d virtual feed would use the kWh feed, starting at midnight (hopefully I could specify local or UTC...), for 24-hour periods. Or I might want a weekly one, starting at Monday midnight CET, with a 7-day period. And the same for daily (or weekly, or annual) mean (or max, or min) temperatures (or min/max/avg W).
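To make the idea concrete, here's a minimal sketch of what such a virtual-feed definition might compute. Everything here is hypothetical (`CumulativeFeed`, `value_at`, `period_delta` are illustrative names, not emoncms APIs); it just shows that a period delta only needs two reads from a cumulative feed, with the period boundaries defined in a chosen timezone.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

class CumulativeFeed:
    """Stand-in for a stored feed: time-sorted (datetime, value) points.
    value_at() returns the last recorded value at or before `when`."""
    def __init__(self, points):
        self.times = [t for t, _ in points]
        self.values = [v for _, v in points]

    def value_at(self, when):
        return self.values[bisect_right(self.times, when) - 1]

def period_delta(feed, period_start, period):
    # A delta needs only two reads: value at the end minus value at the start.
    return feed.value_at(period_start + period) - feed.value_at(period_start)

# A kWh/d value for 1 Jan 2024, midnight to midnight in local (CET) time.
tz = ZoneInfo("Europe/Paris")
feed = CumulativeFeed([
    (datetime(2024, 1, 1, 0, 0, tzinfo=tz), 100.0),
    (datetime(2024, 1, 1, 12, 0, tzinfo=tz), 104.5),
    (datetime(2024, 1, 2, 0, 0, tzinfo=tz), 109.2),
])
print(period_delta(feed, datetime(2024, 1, 1, tzinfo=tz), timedelta(days=1)))
```

The same `period_delta` call with `timedelta(days=7)` and a Monday-midnight start would give the weekly variant.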
Then, in the background, we could choose to optimize and/or cache the calculations. Delta may be OK to calculate in real time (we're just reading two numbers and subtracting), but max/min/avg might need the sampling process. Or we could even allow a virtual feed to have a storage mechanism that would cache results. E.g. daily max temp for today needs to be calculated by reading all the data points. But outside the current 'period' the value isn't ever going to change, so it could be calculated once properly (using all data points) and cached. Calculating an annual mean could take a while, but if the annual mean was instead described as the yearly mean of another virtual feed that was the daily mean, it would only be looking at 364 cached data points, plus all the data points for the current day.
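The "cache anything outside the current period" rule above can be sketched in a few lines. This is an illustration only, with invented names and a plain dict as the cache; the point is that a completed day is computed once from all its raw points and then served from cache, while the current day is always recomputed live.

```python
def daily_aggregate(samples, day, today, agg, cache):
    """samples: dict mapping a day key to that day's raw readings.
    Completed days can never change, so their result is computed once
    (from all raw points) and cached; the current day is recomputed live."""
    if day != today and day in cache:
        return cache[day]
    result = agg(samples[day])        # full pass over the day's raw points
    if day != today:
        cache[day] = result           # cache only once the period is closed
    return result

samples = {"2024-01-01": [3.1, 7.4, 5.0], "2024-01-02": [4.2, 6.0]}
cache = {}
daily_aggregate(samples, "2024-01-01", "2024-01-02", max, cache)  # computed, then cached
daily_aggregate(samples, "2024-01-02", "2024-01-02", max, cache)  # today: never cached
print(cache)
```

The annual-mean trick from the paragraph above is the same idea one level up: the yearly aggregate reads the cached daily results rather than the raw feed.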
As another generalization, if we could add 'export' as a 'visualization', then that could be applied to any of the virtual feeds. We could think about text or Excel as well as CSV.
Opinions?
Re: Some Thoughts on Separating Visualizations, Data Definitions & Calculation
Hello blaal02, I'm currently writing a post-processing module for emoncms which will initially be used for calculating kWh data from historic power data, but it could easily be extended in the way you describe to calculate other, more complicated processed outputs such as monthly averages (correct for timezones), histograms, min, max, etc.
The big advantage over input processing is that you don't have to set everything up at the start, and carrying this kind of thing out as a post-processing step significantly reduces write load, as data can be written to disk in large blocks (rather than millions of small writes). It may not even need to be written to disk; as you point out, such a processed output could be cached in memory, since it can be rebuilt from the base data at any time.
Here's the progress I've made so far. While it works provisionally, it still needs a service script and full installation and usage documentation. It can calculate cumulative kWh data from power data, calculate solar export data from solar and consumption input feeds, and trim the start of a feed to delete old data.
http://github.com/emoncms/postprocess
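For readers wondering what the kWh-from-power conversion involves, it boils down to integrating power over time. The sketch below is not the postprocess module's actual code; it assumes a simple "each reading holds until the next sample" rule, which is one common choice.

```python
def power_to_cumulative_kwh(points):
    """points: time-sorted (unix_seconds, power_watts) samples.
    Assumes each power reading holds until the next sample arrives."""
    kwh, out = 0.0, []
    for (t0, p0), (t1, _) in zip(points, points[1:]):
        kwh += p0 * (t1 - t0) / 3_600_000.0   # watt-seconds -> kWh
        out.append((t1, kwh))
    return out

# 1000 W held for a full hour accumulates exactly 1 kWh:
print(power_to_cumulative_kwh([(0, 1000.0), (1800, 1000.0), (3600, 0.0)]))
# [(1800, 0.5), (3600, 1.0)]
```

Because the whole history is processed in one pass like this, the results can be written out in large blocks, which is the write-load advantage Trystan describes.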
Re: Some Thoughts on Separating Visualizations, Data Definitions & Calculation
Hi Trystan,
Post-processing may be the way to go. As long as the processing is run at least as often as the shortest 'period' being calculated over, the only difference will be the final data point (or maybe two): the one representing the current, not-yet-complete period, and perhaps the prior one that is complete but that the process hasn't quite run for yet.
As I read this, the result of the post-processing is just another feed that can be used by any of the visualizations?
Post-processing and virtual feeds seem rather similar, in the sense that both can be added "after the fact", and both create additional feeds that pass into the normal visualizations.
So, some thoughts about the user interface... Two UIs describing the same kind of result (differing only in the back-end way of getting there) seems a little weird. Maybe rename Virtual Feeds to Calculated? Or Derived? Then have a flag to indicate whether a feed is dynamically calculated or post-processed? Or even decide automatically what the "best" implementation of a particular feed is?
The "caching" I'd been thinking about was actually storing (on "disk") the results of anything that wasn't going to change the next time it was requested, which would be basically the same as your post-processing results.
Re: Some Thoughts on Separating Visualizations, Data Definitions & Calculation
Hello blaal02, from what I understand, virtual feeds would not be able to do the kind of calculations required to convert things like power data to kWh data. From what chaveiro describes here https://openenergymonitor.org/emon/node/10977, they can do a sampled approach to averaging that might be quite close to the actual average, but it's not quite the same as running a process through the entire source feed. Although it does look like it might be sufficient for summing and subtracting feeds, I'll admit I haven't really used this feature; in my mind it was something else. So perhaps it can do some of the things the post-processing module might do, but not others.
Perhaps there is an opportunity to combine it in some way, maybe chaveiro can add to the discussion on this.
From my perspective both looking at my current requirements and also taking a longer view the main outputs I want are the ability to
I'm working on updating the apps module and the documentation for home energy monitoring and the solar PV monitor, and I'd like to make it as easy as possible for people with a lot of existing data to convert their data over to the new feeds and standardised engines required by these app modules. Trying to develop the app modules to support every feed engine, and both cumulative and daily kWh data, results in endless bugs and support difficulty, so sorting that out is my main focus at the moment and the reason I'm developing the post-processing module.
Re: Some Thoughts on Separating Visualizations, Data Definitions & Calculation
Once a feed is converted using the post-processing module, you could choose to keep re-running the post-processor before you want to view the data, which would be a write-efficient approach for systems running off SD cards. But perhaps more likely, you use the post-processor to build the processed feeds you need from historic source data, and then, once they're built, you connect those feeds back to the source inputs with standard input processing so that they continue to update as new data comes in. I think that addresses your first point: "so long as the processing is run at least as often as the shortest 'period' that's being calculated over".
Re: Some Thoughts on Separating Visualizations, Data Definitions & Calculation
Hi Trystan, I'm with you. I've used Virtual Feeds for simple things like adding my two split phases (US) to get a total supply, and C-to-F calculations. But I've seen references in there to 'Schedules', and I was wondering whether that could be used (or abused!) as the user interface for those 'periods' we want things delta'd, min'd, max'd, and mean'd over. I don't think it can do that now, but maybe it could be 'adapted'.
From a user's perspective, I'm thinking this could be how they'd describe what they want to see. Internally, I guess the 'ideal' would be all of the things we've been talking about! When the feed is used (at 'visualization' time), it would look at all the historical data that had been saved by the post-processing and calculate anything beyond that. Depending on what calculation was needed, this could be the sampling process, just for that display. Or it could be a 'proper' process using all the data, with the result added to the stored data (at least for the completed periods); or perhaps it could dynamically run your post-processing procedures to bring everything current, then show it. Or, for some simple calculations, it could even stay totally virtual and read the original base data every time.
But as principles go, I'm again with you: log the basic "raw" data from the sensors, then separately (perhaps much later, and over several iterations) define what we'd actually like to see on the graphs, and have all of those apply historically. It would also be good if we can come up with general tools/calculations that cover everything you need, but can (hopefully) be used for other things too.
For kWh/d processing, going forward you're thinking that we'd log cumulative kWh, then calculate the periodic deltas at display time? But for people who didn't log cumulative kWh originally, you want the post-processing to be able to reconstruct it from the kWh/d data that they did log?
Thanks, Sandy
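For what it's worth, the back-fill asked about here is, at its core, just a running sum over the per-day totals. A sketch under that assumption (not the postprocess module's code, and ignoring timestamps for brevity):

```python
from itertools import accumulate

daily_kwh = [9.2, 8.7, 10.1]              # one kWh/d value per day
cumulative = list(accumulate(daily_kwh))  # running total, day by day
print(cumulative)
```

Each cumulative value would then be stamped with the end of its day, giving a feed the display-time delta calculation can work from.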
Re: Some Thoughts on Separating Visualizations, Data Definitions & Calculation
I've just read this topic...
Trystan, I think what you are describing as post-processing is really off-line processing of historical data.
Virtual feeds do their post-processing in real time.
They have two ways of calculating data: sampled or absolute. The default configuration is absolute, which is slower but precise.
The concept of virtual feeds is that they are used for data visualization, so they are (virtual) feeds that can be queried via a graphic widget. When you move around the graph, the data is calculated just for the time window you are viewing.
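The sampled-versus-absolute trade-off can be illustrated with a mean over a graph window. These are hypothetical functions, not the emoncms implementation: the absolute mode reads every point in the window, while the sampled mode reads only every Nth point, which is faster but approximate.

```python
def mean_absolute(window):
    # Reads every data point in the window: precise but slower.
    return sum(window) / len(window)

def mean_sampled(window, step=10):
    # Reads only every `step`-th point: fast, but an approximation.
    picked = window[::step]
    return sum(picked) / len(picked)

# A synthetic window of 10,000 readings:
window = [float(i % 7) for i in range(10_000)]
print(mean_absolute(window), mean_sampled(window, step=10))
```

On smooth data the two agree closely; on spiky data (power readings, say) sampling can miss short peaks, which matches chaveiro's "close to the actual average but not quite the same".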
The engine behind them supports being called from code other than the graphics and can be used for other purposes. Say, for example, you want to make a manual job where the user sets a processlist, defines a start and end time, and outputs the result to a new feed: 90% of the work is already coded in the virtual feeds logic.
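Reduced to its essentials, that manual job might look like the sketch below. The names are invented and the real engine works on emoncms processlists rather than Python callables; the point is just the shape: a chain of per-sample operations applied over one time window, producing the points for a new feed.

```python
def run_processlist(source, processlist, start, end):
    """Apply a chain of per-sample operations to one time window of a
    source feed, producing the points for a new feed."""
    out = []
    for t, v in source:
        if start <= t < end:
            for op in processlist:     # each op transforms the running value
                v = op(v)
            out.append((t, v))
    return out

c_to_f = lambda c: c * 9 / 5 + 32      # like the C-to-F virtual feed mentioned earlier
round1 = lambda v: round(v, 1)
source = [(0, 20.0), (60, 21.5), (120, 23.0)]
print(run_processlist(source, [c_to_f, round1], 0, 120))
```

With `start`/`end` spanning the whole feed this becomes the off-line conversion; with them bound to the visible graph window it is the real-time virtual-feed case.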
Real-time post-processing can't be used to read a large time range of data, because that can mean megabytes of information and it's slow.
Off-line post-processing can, because you don't need to wait for it to be ready; you use the data after the conversion ends.
Some of the use cases can be done already with virtual feeds' real-time post-processing.
Yes:
Yes, with a dedicated processor and on a limited time range (or it becomes very slow):
No, because of the need to read full source feed data, and it's slow: