Node to Feed mapping - idea to save space

So currently we have the notions of Node, Input and Feed.

With these relationships:

  • Node to Input (1->Many)
  • Each Input has a processor that writes to multiple Feeds (1->Many)

Assumptions:

  • Node data comes from a bulk JSON post with all the node's input values at the same timestamp
  • Inputs have processors running that save to multiple Feeds at the same timestamp.
  • Feed data is saved individually by the engine with an associated timestamp (can be realtime, averaged, or aggregated).

What I'm thinking is: if we could have the node context on the processor, we could have that same node context at the feed level for the same node.
Since the processors would be running in the node context, they could see all of the node's input data at the same timestamp, and a new feed object (let's call it NodeFeed) could be used with a shared timestamp for all of its inputs that are based on the same time resolution.

We could have different NodeFeed objects with different time resolutions coming from the same node processor.

The saving in data size comes from the fact that, for each node, input data at the same time resolution would require only a single shared timestamp.

In SQL this translates to a NodeFeed table with a timestamp column and one column per node input.

For example, if you have a node publishing a bulk message to the server at a regular interval with 1 voltage + 8 CT sensors + 1 frequency + 8 power factors, that is 18 values per update. Currently we require 18 feed values + 18 timestamps = 36 'size units' to save each sensor's data to a feed. With this NodeFeed object we require just 18 values + 1 shared timestamp = 19 'size units', a (36 - 19) / 36 ≈ 47.22% saving (see the sketch below).
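
A minimal sketch of what such a NodeFeed table could look like for this example node, assuming MySQL; the table name and column names (node10_feed, ct1..ct8, pf1..pf8) are made up for illustration:

    -- one row per update: 1 shared timestamp + 18 sensor values = 19 columns
    CREATE TABLE node10_feed (
        time      INT UNSIGNED NOT NULL PRIMARY KEY,  -- shared unix timestamp
        voltage   FLOAT,
        ct1 FLOAT, ct2 FLOAT, ct3 FLOAT, ct4 FLOAT,
        ct5 FLOAT, ct6 FLOAT, ct7 FLOAT, ct8 FLOAT,
        frequency FLOAT,
        pf1 FLOAT, pf2 FLOAT, pf3 FLOAT, pf4 FLOAT,
        pf5 FLOAT, pf6 FLOAT, pf7 FLOAT, pf8 FLOAT
    );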

What do you think about this idea?

pb66

Re: Node to Feed mapping - idea to save space

"Node data came from a bulk json post with all the node inputs values at the same timestamp"

This isn't necessarily true, as it has always been possible to post to named inputs using URLs like http://emoncms.org/input/post.json?node=10&json={power:200,voltage:24000} etc. That method isn't compatible with bulk upload, so bulk upload was changed to accept missing-data placeholders: sending a bulk upload like [[100,,300,400],[,200,300,400],[100,200,,400]] would result in inputs 1 through 3 each being updated only once, and input 4 three times. I didn't include the node ids and timestamps, but these frames would normally each have a different timestamp, so inputs 1, 2 and 3 would all show different update times.
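
For completeness, a sketch of what those frames might look like with the node id and time offsets restored, assuming the bulk API's [offset, node_id, value1, value2, ...] frame layout (node 10 and the 10-second offsets are made up for illustration):

    http://emoncms.org/input/bulk.json?data=[[0,10,100,,300,400],[10,10,,200,300,400],[20,10,100,200,,400]]

Each frame carries its own time offset, which is why inputs 1, 2 and 3 end up with different update times.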

This was implemented, among other reasons, to allow MQTT not to force the use of a different input for each sensor, especially with the current maximum of 32, and individual values posted with the same timestamp would get processed the same way as a single complete frame.

Only PHPTimeSeries actually records a timestamp; a PHPFina feed is only the size of the data, as a data point's position in the file denotes its timestamp relative to when the file was created (timestamp = start time + position x interval, with the start time and interval recorded in the metadata). So a table replacing a number of PHPFina feeds would actually increase the size by adding an unneeded timestamp, and where a fixed-interval feed may only update once per hour or per day, the all-in-one table would allocate space for that feed's value at every update.

I do however think the concept of recording unprocessed input data in a table as you describe could have a massive benefit, in that you could effectively "replay time" if down the line you find yourself thinking "I wish I had set up a feed that does xyz". Perhaps a new processor or feed engine gets developed: you could actually build that feed from when you started collecting data to the present day, or rebuild lost or corrupt feeds, etc. This would only work with a table, because as inputs get added a table can grow horizontally whereas the feed files cannot; even a group feed per node won't work, because of adding or altering individual inputs.

If using input data from a table, in SQL you can effectively do many calculations and graphs on the fly, so keeping feeds is less important. And I did propose a change to Trystan some time ago that I would like to share: include the processing chain in the feed.dat file, so that in normal running a feed would be "triggered" directly, either by data coming in on a particular input (PHPTimeSeries) or every n seconds (PHPFina or PHPFiwa), and it would then use the current input data held in RAM to do whatever processing was defined to calculate and arrive at the value to be retained.
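
As a sketch of the on-the-fly idea, assuming an input table laid out like the hypothetical node10_feed above (MySQL syntax), an hourly average of one CT channel could be computed at query time instead of being stored as a separate feed:

    -- hourly average of one CT channel, computed on demand rather than stored
    SELECT (time DIV 3600) * 3600 AS hour_start,
           AVG(ct1)               AS avg_power
    FROM node10_feed
    GROUP BY time DIV 3600
    ORDER BY hour_start;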

Creating a feed could be as easy as selecting a type of feed, not an engine but a type with processing included, e.g. "export" would be: allow negative, x -1, append to feed. But with an SQL-enabled emoncms you could also select a start time and date to "replay time" from the input table and voila, a new "historical" feed, so the feeds could still be used to speed up SQL.
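
As a sketch of that "replay time" step, assuming the same hypothetical node10_feed input table and a made-up feed_export table, the "export" processing (allow negative, x -1, append to feed) could be rebuilt from a chosen start date in a single query:

    -- build a historical "export" feed from stored input data (MySQL syntax)
    INSERT INTO feed_export (time, value)
    SELECT time, ct1 * -1          -- "x -1"
    FROM node10_feed
    WHERE ct1 < 0                  -- "allow negative"
      AND time >= 1388534400;      -- chosen "replay time" start date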

To accommodate the trend of moving away from SQL to flat files, emoncms "lowwrite" could also just save all the input data to a CSV file that could be opened in SQL if a situation demanded it, rather than to an SQL table, making the low-write version less dependent on SQL, as currently the SQL tables are the link between inputs and feeds.
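
A sketch of the one-off import such a situation might demand, assuming MySQL's LOAD DATA INFILE and a hypothetical CSV path whose columns match the node10_feed layout:

    -- pull the logged CSV into SQL only when a query is actually needed
    CREATE TABLE node10_replay LIKE node10_feed;
    LOAD DATA INFILE '/var/opt/emoncms/node10_inputs.csv'
        INTO TABLE node10_replay
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\n';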

This could be a step towards a single emoncms version for all. Sorry to hijack your thread, but it would be really good to see the various emoncms versions converge, and since the mandatory use of SQL is contrary to Trystan's low-write direction, and the lack of SQL functionality is the main reason many of us don't use the low-write version, including both as options, so that one or the other or both could be enabled in the settings, would be a good way to go.

Paul

chaveiro

Re: Node to Feed mapping - idea to save space

Your ideas are interesting.

I'm running only on the SQL engine, not only for historic reasons but also for convenience of data manipulation. Space and performance are not issues on my hosting, so I have less knowledge of the other engines.

My main criterion is the need to correlate different inputs at the same timestamp.

I want to add GPS coordinates as an input, and the idea of having two feeds for lon and lat is not the best. I don't want to use JSON nor strings in the engine; in SQL these are just two columns. I may end up using a new input module for it, but it would be good if the current input module supported it somehow out of the box.
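
A minimal sketch of that two-column idea, with hypothetical names, where lat and lon share a single timestamp (MySQL syntax):

    CREATE TABLE gps_input (
        time INT UNSIGNED NOT NULL PRIMARY KEY,  -- shared unix timestamp
        lat  DOUBLE,
        lon  DOUBLE
    );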

Regarding post-processing original data, I have the idea of virtual feeds that apply logic in realtime to existing feed data: configurable like the input processor list, but per virtual feed, where the end value is not saved anywhere but returned to the client requesting the virtual feed's data.
From the consumer's side, a virtual feed looks like any other existing feed.
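
On the SQL engine, one way to picture such a virtual feed is as a view over existing feed tables: nothing is stored, the value is computed on every read, and the consumer queries it like any other feed. A sketch with made-up table and column names:

    -- a virtual "net power" feed, computed on read and never saved
    CREATE VIEW virtual_net_power AS
    SELECT g.time, g.value - s.value AS value
    FROM feed_grid g
    JOIN feed_solar s ON s.time = g.time;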

pb66

Re: Node to Feed mapping - idea to save space

Generally I think many users with database experience would be more comfortable with SQL if they already know it; given a choice, only less tech-savvy newcomers and those technical enough to embrace an application-specific data storage solution would willingly give it up.

I would like the best of both worlds (don't we all?): the power of SQL and the option to run from an SD card. But when forced to choose, I think SQL wins, as I can add a HDD, at which point space is no longer an issue either.

I think we are thinking along the same lines but in different terms.

Your "single table for all feeds" idea I like but believe it should be unprocessed input values rather than feed values, which sort of ties in with your virtual feed idea's to not save processed data.

My idea of defining a feed's processing chain in its feed.dat fits with your virtual feed configurations, although you don't want persistent data and I say make it optional: don't save one-off queries or log unnecessary calculated data on a powerful machine, but do persist frequently used data on Pis and busy servers etc.

So all the raw input values and the calculated "feed" values would be held in RAM, and only written to disk if configured to do so: on a per-feed basis for feeds, and on an "SQL vs no SQL" basis for input values. Obviously if the input value table isn't retained (i.e. non-SQL mode), then on-the-fly calculations are restricted to the feed values that have been retained, but each user can choose what to retain. Even if SQL isn't enabled on an SD card install, a CSV file can be written every few minutes, and if feeds need to be created down the line, simply start SQL, create the feed(s) from the CSV and shut it down again.

"I may end up using a new input module for it" maybe that is the better way to go but rather than a new inputs module a new feeds or processing module? I believe inputs are just that "inputs" they should just receive the data, full stop, job done! inputs can be named, allocated units etc but each of the "collection of processes that defines a feed value" files (needs a better name I think) should be managed separately to the inputs as a "virtual feed" with or without a persistent feed attached.

Interesting stuff!!

Paul

chaveiro

Re: Node to Feed mapping - idea to save space

So I'm looking at the emoncms code, and it's doable to reuse the input module's process-list editing code in the feeds module. For that I've started to modularize the processlist code into a new module called process.
I will report on my test ideas soon.

pb66

Re: Node to Feed mapping - idea to save space

I'm looking forward to seeing what you have.

Paul

chaveiro

Re: Node to Feed mapping - idea to save space

I opted to abandon the initial idea of processing full node inputs as a whole, because it would only benefit the SQL engine, in favor of the virtual feed concept: feeds that act like regular feeds, but whose data is calculated in realtime from existing feed data.

See : http://openenergymonitor.org/emon/node/10977
