emonBase (OKG Wiznet W5200) works for 1-10 minutes before stopping

I've recently built an emonTx and emonBase (emonGLCD and second emonTx await their build!).

emonBase is receiving data from emonTx (no CT hooked up yet but I can see power data: 3312mV).

emonBase is successfully sending this data to emoncms.org and I can graph it.

However after between 1 and 10 minutes, emonBase just stops.  I'm monitoring the serial output and have put something in the loop to output free memory every second.  I get 543 bytes free consistently before the loop stops.  The loop stops at seemingly random points (e.g. after time has been requested from CMS, after time has been received from CMS, etc.).  Hitting the reset button on the board kicks it off again and it works for another limited time...

The sketch I'm using can be seen here:-

https://github.com/baldrick/Open-Kontrol-Gateway/blob/master/OKG_Wiz5200_RFM12B_emoncms_multinode/OKG_Wiz5200_RFM12B_emoncms_multinode.ino

The apiurl & timeurl values have been updated in my local repo but I haven't uploaded those to github for obvious reasons ;-)

This is a fork of the standard sketch with a few extra bits of logging (via Serial.print) to see if I could work out what was going on.  The standard sketch exhibits the same random-stoppage behaviour but gives less of a hint as to what's going on.  But now I'm stuck.

Once in this state the OKG remains on the network and responds to pings.

Has anyone else seen this behaviour?  Any idea what it might be or how I can go about debugging / fixing it?

Robert Wall's picture

Re: emonBase (OKG Wiznet W5200) works for 1-10 minutes before stopping

Two suggestions to try:

  1. Set the highest baud rate you have for serial comms.
  2. Don't use serial comms at all (for ease of use, wrap all the comms statements in #ifdef SERIALCOMMS ... #endif, then define it when debugging and leave it out otherwise).

I've not noticed any stability issues - certainly none inside a 10 minute window - but I'm doing both the above.
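A minimal sketch of the #ifdef wrapping, assuming a macro called DEBUG_PRINT and a helper called report (both names are hypothetical, not taken from the standard sketch). On the OKG the macro body would be Serial.print(x); std::printf stands in here so the fragment also compiles off the board.

```cpp
#include <cstdio>

// Define SERIALCOMMS only while debugging; leave it commented out otherwise.
// #define SERIALCOMMS

#ifdef SERIALCOMMS
  // On the OKG this body would be Serial.print(x); printf is a stand-in.
  #define DEBUG_PRINT(x) std::printf("%s", (x))
  static const bool kSerialEnabled = true;
#else
  // Expands to nothing, so no serial code ends up in the compiled sketch.
  #define DEBUG_PRINT(x) ((void)0)
  static const bool kSerialEnabled = false;
#endif

// Hypothetical helper: emit a progress message only when debugging is
// compiled in, and report whether anything could have been sent.
bool report(const char *msg) {
  DEBUG_PRINT(msg);
  return kSerialEnabled;
}
```

With SERIALCOMMS left undefined, every DEBUG_PRINT call disappears at compile time, so the sketch carries no serial overhead in normal use.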

Lord Baldrick's picture

Thanks for the suggestions Robert.  I haven't been able to look at this stuff for a bit as am recovering from an op last Weds ;-)

However I did finish my GLCD today so thought I'd give the OKG another go tonight...

What is the highest baud rate I can use for serial comms?  9600 works (i.e. I see sensible messages when using the Arduino IDE Tools > Serial Monitor), but a couple of other values I tried (57600 & 115200) gave me gibberish in the monitor.  I'm using a relatively new (<18 months old) PC so I'd be surprised if that's at fault ;-)

I've wrapped all the Serial output in #ifdef's as you suggested and left the OKG running with them switched off (so no serial output ... although I've left the programmer connected, does that matter? ... I'm trying again with it unplugged anyway).  Of course this means the only way I can monitor whether it's working is via flashes of the LED I've put in and seeing data on emoncms.org.

The way the LEDs are flashing I wonder whether the setup() function is being called multiple times (which I also saw before ... although I assumed this is what the "Reset Capacitor Fix" fixed)...

Even so, after a while I get the same behaviour as before: the OKG seemingly crashes after a few minutes.  No LED flashes, no data on emoncms.org and "11min baseFail" on my newly-built GLCD.

Any thoughts how I should go about figuring out what's wrong?

Robert Wall's picture

I regularly use 115200 baud - on a Windows XP laptop, so that's not likely to be a problem unless you have a ridiculously long cable (and mine is something like 4 - 5 m!). (You did change the baud rate in the sketch "Serial.begin(115200);" as well as in the IDE?). I can't easily test my OKG now at 9600 baud, but I had a problem with something resetting until I upped the baud rate. I wish I could remember exactly what!

If the setup function is being called regularly, it means either:
the watchdog (if you have one) is firing, which means something isn't completing inside the acceptable time limit,
or it means the power supply is falling over and the processor is resetting. What power supply are you using, and is it stable and noise-free? (There's something on this site about switched-mode PSU quality - check it out).
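One way to tell those two cases apart is to look at the processor's reset flags. The fragment below is a rough sketch, assuming the OKG's ATmega328 and a hypothetical helper name reset_cause; it only decodes a value that setup() would first have to copy out of the MCUSR register.

```cpp
#include <cstdint>

// Reset-flag bit positions in the ATmega328P MCUSR register.
const uint8_t kPORF  = 1 << 0;  // power-on reset
const uint8_t kEXTRF = 1 << 1;  // external reset (reset pin)
const uint8_t kBORF  = 1 << 2;  // brown-out reset (supply dipped too low)
const uint8_t kWDRF  = 1 << 3;  // watchdog reset

// Hypothetical helper: name the most telling flag in a saved MCUSR value.
const char *reset_cause(uint8_t mcusr) {
  if (mcusr & kWDRF)  return "watchdog";
  if (mcusr & kBORF)  return "brown-out";
  if (mcusr & kEXTRF) return "external";
  if (mcusr & kPORF)  return "power-on";
  return "unknown";
}
```

On the board, setup() would do something like "uint8_t flags = MCUSR; MCUSR = 0;" and print reset_cause(flags). Two caveats: the bootloader may clear MCUSR before the sketch runs, and the brown-out flag is only set if brown-out detection is enabled in the fuses, so "unknown" does not rule out a supply problem.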

Is anything getting hot that shouldn't be? Are the voltages - there are 3 regulated supplies - within limits?

As far as I can determine, the "Reset Capacitor fix" corrects a timing problem when programming via the FTDI interface. I don't think it is likely to be related to your problem.

I've been using the standard Github sketch -"OKG_Wiz5200_RFM12B_emoncms_multinode" which I think yours is derived from - but with ALL the serial comms removed as I've said.

If I'm able, tomorrow I'll try mine with low speed comms and see if I can get it to fail.

Lord Baldrick's picture

Ah, no, I'd only changed speed in the sketch. I'll take a look at the IDE tomorrow, cheers.

The PSU is an old one I dug out of a drawer, so it could be at fault.  Another thing on the list for tomorrow!  Thx.

Lord Baldrick's picture

Yes my sketch is derived from the standard OKG_Wiz5200_RFM12B_emoncms_multinode.

I'm not using the watchdog (at least there's no wdt_enable in my sketch ... nor the standard one AFAICS).

Ok, serial speed now at 115200 and being monitored without gibberish ;-)

I don't think anything's getting particularly hot ... a couple of components (voltage regulator, processor) are ~4-5 deg C over ambient (25-26 vs 21) and it looks like the Wiznet is even warmer (29-30 deg C) but that doesn't feel like a big deal to me.  Is it?

In fact it's now been running ok for a little over 10 minutes with no problems (and serial comms on...) ... I think I'll leave it to see how long it lasts!

Lord Baldrick's picture

Hmm... It seems it lasted about 20 minutes before "crashing".  Temperatures are about the same as before.

Does the Wiznet chip operate independently of the processor?  I ask because the OKG remains on the network and responds to pings after the crash...

I guess next I'll break open the multimeter and check voltages again ... and replace that PSU!

Robert Wall's picture

Yes, the Wiznet module has its own processor. Those temperatures aren't alarming enough to worry about, though I'll check mine when I next fire it up (the system is still in development mode: it is going to be protracted because deployment is 300 miles away, so I need to get it right!).

The power supply looks to be the best place to look at the moment. See here: http://openenergymonitor.blogspot.co.uk/2011/08/not-all-usb-power-suppli...

Robert Wall's picture

Just looking through the sketch, there's a potential bug where the time is being parsed in line 161 of OKG_Wiz5200_RFM12B_emoncms_multinode.ino

      char tmp[] = {line_buf[1],line_buf[2]};

That declares the array with only two elements, so there is no terminator. atoi() expects a NUL-terminated string, and this only works if the next location in memory happens to hold a zero byte. It should therefore be:

      char tmp[] = {line_buf[1],line_buf[2],0};

I've no idea whether that is your problem, but it could well be a problem.
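To see the corrected line in isolation, here is a small sketch that compiles off the board. The function name parse_two_digits is hypothetical, and the sample reply layout in the comment is an assumption; only the line_buf[1] and line_buf[2] indexing comes from the sketch.

```cpp
#include <cstdlib>

// Parse the two-digit field held in line_buf[1] and line_buf[2].
// Assumed example input: a time reply such as "t12,34,56".
int parse_two_digits(const char *line_buf) {
  // Three elements: two digits plus an explicit terminator for atoi().
  char tmp[] = {line_buf[1], line_buf[2], '\0'};
  return std::atoi(tmp);
}
```

Without the third element, atoi() keeps reading whatever follows tmp in memory until it meets a non-digit, which is undefined behaviour and can give a different result from one run to the next.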

Lord Baldrick's picture

You're right.  I made the fix you suggested and uploaded the new sketch ... but soon experienced the same problem.

I wondered whether another similar bug exists when sending the time over the air so I added ,0 to give:-

char data[] = {'t',hour,minute,second,0};

Unfortunately this doesn't seem to have solved my problem ... although I do now see I receive a fair number of failed RF packets (even with no emonTx or emonGLCD powered up!) ... by failed I mean non-zero rf12_crc (at which point I assume rf12_hdr, and indeed any data, is garbage).  Perhaps an artifact of living in a congested air space (London)?
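For what it's worth, the check I mean looks roughly like this when pulled out into a function that compiles off the board. The name usable_node_id is hypothetical; on the OKG the two arguments would be rf12_crc and rf12_hdr after rf12_recvDone() returns true, and the 0x1F mask is the node-ID mask used by the RF12 driver.

```cpp
#include <cstdint>

// Low five bits of the RF12 header byte hold the sender's node ID.
const uint8_t kHdrMask = 0x1F;

// Returns the sender's node ID, or -1 if the packet should be ignored.
int usable_node_id(uint16_t crc, uint8_t hdr) {
  if (crc != 0) {
    return -1;  // checksum failed: header and payload cannot be trusted
  }
  return hdr & kHdrMask;
}
```

Anything that returns -1 is dropped before the header or payload is looked at, so noise on the band should cost nothing more than a wasted pass through loop().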

I'm hoping a new power supply will fix my problem.  Will update once I've sourced one ;-)

Robert Wall's picture

"char data[] = {'t',hour,minute,second,0};"

No, I think that one is safe because it isn't being processed as a string - at least in the base.

I was trying without success to make mine fall over: sending data slowly to the serial port didn't do it, and then the emoncms server itself fell over, so I had to stop. When Trystan gets that going again, I'll have another try (or I'll have a go at setting up my own WAMP server, but for that I need to write a time server - argh!).

"Will update once I've sourced [a power supply]"  A tried, tested and approved one is here. (Requires one of these too).

Lord Baldrick's picture

Ah, you're right, that one is safe ... but because its size is passed to rf12_sendStart.
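A quick illustration of why the explicit size matters, written so it compiles off the board. Both function names are hypothetical; the point is only that the payload is binary, so a string function would cut it short whenever one of the values is zero (midnight, for instance).

```cpp
#include <cstddef>
#include <cstring>

// Length sent when the array's own size is passed explicitly.
std::size_t explicit_length(char hour, char minute, char second) {
  char data[] = {'t', hour, minute, second};
  return sizeof data;  // always 4, whatever the values are
}

// Length a string function would see for the same bytes.
std::size_t string_length(char hour, char minute, char second) {
  char data[] = {'t', hour, minute, second, '\0'};
  return std::strlen(data);  // stops at the first zero byte
}
```

So the extra ,0 neither helps nor hurts much here: passing sizeof means the radio sends every byte of the array, zero values included.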

Good luck setting it up on WAMP ... in my experience you're better off with LAMP ... even if that means running a Linux VM within Windows under VirtualBox or similar ;-)  My assumption then is that you won't need to write a time server :)

I've ordered one of these as it's half the price and free shipping ... looks identical ;-)  I'll run it off my PC since I'll most likely keep the OKG next to it ... so no separate PSU required (USB voltage was very stable when powering the emonTx ... if I still get problems I'll go the whole hog and spring for the separate brick!).
