Read-only image time issues

The latest SD card image has had some issues with fake-hwclock, and many of the problems encountered when using a read-only file system on a Pi come down to incorrect timestamps, either because a network connection isn't available or because the files are locked read-only and cannot be updated. The latest image has the fake-hwclock.data file relocated to the writable data partition with a symlink in its place. In most cases this works well, but it can cause an issue if events don't happen in exactly the expected order during boot-up.

Having helped a few users with this issue I did some research and this is what I found.

When the ntp daemon is started at boot-up it attempts to correct the current system time (previously loaded from the fake-hwclock.data file) using ntp servers. Only the initial synchronization attempts are able to overcome a difference greater than 1000 secs. If any subsequent attempt discovers a difference of 1000 secs or more, the ntp daemon will terminate and no longer monitor or correct the time, as it assumes user intervention is required to resolve a greater issue than a minor alignment. This means that if there is no internet connection at the time of the initial attempts, it is unable to resolve any time difference unless the ntp daemon is restarted and performs a successful initial sync.

Providing the time difference remains less than 1000 secs, the ntp daemon will continue to check and resync the time approximately every 5-6 mins (usually cycling 4 servers, each checked at intervals of up to 1024 secs). This time is held in memory and is not affected directly by read-only access.

A cron job is run hourly to save the current system time to fake-hwclock.data. This requires write access, so in read-only mode the "saved" time remains unchanged (unless rpi-rw is used and the Pi is then left in RW mode). In the last pre-configured SD card image the fake-hwclock.data file was moved to the RW "data" partition, where it can be written to at any time.

If the Pi is shut down or rebooted correctly, the current system time is also saved to fake-hwclock.data during the shutdown process, provided write access is available at the time.

This arrangement generally doesn't present a problem. What appears to be happening, though, is that the Pi is not ready to follow the symlink during the early stages of booting, so it cannot see the "data" partition's copy of fake-hwclock.data. The file appears empty, which results in a unix time of 0 (1970). If the internet connection is then not available for that first resync, ntp will exit at a subsequent check because the gap from 1970 to now is far more than 1000 secs.

Only rebooting the Pi or restarting ntpd with an active internet connection can correct the time once this happens.

I therefore recommend reverting fake-hwclock.data to its former position (only if it's located on the data partition) and then editing 2 files to momentarily provide write access to enable that file to update at hourly intervals and again at shutdown.

To do this, first switch from read-only mode to write mode using

rpi-rw

1) Delete the symlink and move the relocated fake-hwclock.data back to its original location

sudo rm /etc/fake-hwclock.data

sudo mv /home/pi/data/fake-hwclock.data /etc/fake-hwclock.data

This should ensure the file can be found at boot time, so the Pi shouldn't revert to unixtime 0 (1970). Instead it should use the last saved time and date temporarily until the time is corrected by ntp (this may not be accurate but it should be more recent than 1970).

2) Enable the hourly cron job to back up the current time by temporarily remounting the partition as RW.
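A quick way to confirm that step 1 took effect is to check whether the file is now a regular file and no longer a symlink. This is only a sketch; the helper name file_kind is made up for illustration and is not part of fake-hwclock.

```shell
#!/bin/sh
# file_kind PATH: print "symlink", "regular" or "missing" for PATH.
# (hypothetical helper name, for illustration only)
file_kind() {
    # -L must be tested first, because -f follows symlinks
    if [ -L "$1" ]; then
        echo symlink
    elif [ -f "$1" ]; then
        echo regular
    else
        echo missing
    fi
}

# Usage after step 1 (expect "regular" if the move worked):
#   file_kind /etc/fake-hwclock.data
```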

sudo nano /etc/cron.hourly/fake-hwclock

   replace this line

fake-hwclock save

   with these 3

sudo mount -o remount,rw /dev/mmcblk0p2  /
fake-hwclock save
sudo mount -o remount,ro /dev/mmcblk0p2  /

With the time backed up every hour, a power interruption can result in an error of at most 1 hr (plus downtime) at reboot, until the time is corrected by ntp. Again this is not accurate, but it is better than running fully RO and going back to some much earlier point in the past, or running fully RW and risking damage to the SD card or file system.
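One side effect of the three-line edit is that it always ends by remounting read-only, even if the Pi had deliberately been left in RW mode with rpi-rw. A possible variation, sketched below, records the current state first and only remounts when the root was read-only to begin with. The function names root_mode and save_with_rw are made up for this example, and the device name is the one used in the steps above.

```shell
#!/bin/sh
# root_mode [FILE]: print "rw" or "ro" for the last "/" entry in a
# /proc/mounts style file (defaults to /proc/mounts itself).
root_mode() {
    awk '$2 == "/" {
             n = split($4, opt, ",")
             for (i = 1; i <= n; i++)
                 if (opt[i] == "ro" || opt[i] == "rw") mode = opt[i]
         }
         END { print mode }' "${1:-/proc/mounts}"
}

# save_with_rw CMD...: run CMD with / writable, then restore the
# previous state. Hypothetical wrapper, must be run as root.
save_with_rw() {
    was=$(root_mode)
    if [ "$was" = "ro" ]; then mount -o remount,rw /dev/mmcblk0p2 /; fi
    "$@"
    status=$?
    if [ "$was" = "ro" ]; then mount -o remount,ro /dev/mmcblk0p2 /; fi
    return "$status"
}

# Usage in /etc/cron.hourly/fake-hwclock (not run here):
#   save_with_rw fake-hwclock save
```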

3) Also enable the fake-hwclock service to save the current time at shutdown the same way.

sudo nano /etc/init.d/fake-hwclock

   and replace this line

fake-hwclock save;;

   with these 3

sudo mount -o remount,rw /dev/mmcblk0p2  /
fake-hwclock save
sudo mount -o remount,ro /dev/mmcblk0p2  /
;;

(edited to correct ";;" location) 

Whilst nothing can be done if there is a sudden loss of power this edit will mean a graceful reboot should only lose seconds.

These three steps should restore the Pi's timekeeping to its former Raspbian standard and overcome any issues imposed by mounting the file system as read-only.

Restart the ntp daemon and return to read-only mode

sudo /etc/init.d/ntp restart
rpi-ro

Additionally, as the accuracy of the data we collect depends on an accurate timestamp, I suspect a 4th step may be needed to improve the Pi's timekeeping. This additional step could counteract delayed network connections at start-up caused by a poor internet connection, or even just a router taking longer than the Pi to reboot and get a connection after a power outage.

4) Create an init script to run a small utility at start-up that keeps testing for an internet connection every 60 secs or so and, when successful, restarts the ntpd service to initiate the first run again and then exits. There will be no further use for it unless the Pi is rebooted or the time gets over 1000 secs out of sync (if that's happening there's a bigger issue somewhere).
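As a rough idea of what the step 4 utility could look like (this is not the script mentioned in this post, just a minimal sketch), the loop below retries a probe command until it succeeds and then restarts ntpd. The function name retry_until, the probe URL and the retry counts are all assumptions chosen for illustration.

```shell
#!/bin/sh
# retry_until TRIES DELAY CMD...: run CMD until it succeeds or TRIES
# attempts have been used, sleeping DELAY seconds between attempts.
retry_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then sleep "$delay"; fi
    done
    return 1
}

# Usage (not run here): probe a web page every 60 secs for up to a day,
# then restart ntpd once the probe succeeds.
#   retry_until 1440 60 wget -q --spider http://emoncms.org \
#       && /etc/init.d/ntp restart
```

Probing a web page and not just checking that the interface is up covers the case where the router hands out an IP address before it has an external connection.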

5) Possibly create a cron entry to re-start the utility and therefore the ntp daemon periodically

I have started writing a script for step 4 and will post it when I have completed it and done some testing. Hopefully we can resolve this fully before the next image is compiled.

If you are not totally confident about making these changes, please hang on until a couple of other users have tried them and hopefully confirmed they are OK. I see no reason for it not to work, but I really would not like to be responsible for any corrupted or lost data.

So any volunteers ?

Paul

dBC's picture

Re: Read-only image time issues

Something doesn't sound quite right there.  It's true that ntpd will only give you one shot at a jump greater than 1000 secs(*), but you don't spend that one-shot until it has successfully set the time.   If ntpd has successfully changed the kernel time by more than 1000 seconds and it later decides it needs to do so again, then it exits.  But if it can't currently reach any servers, it just waits until it can.

You can verify that you don't waste your one-shot during network outages with this simple experiment:

1. stop ntpd

2. manually set the time to 1970

3. pull the ethernet cable

4. start ntpd

5. wait as long as you like

6. plug the ethernet cable back in

ntpd will then jump the kernel time all the way forward from 1970 to the correct time.  On some systems it might take a while, depending on local jitter.  On one of my machines (an Asus netbook) it takes about 5 mins before it decides it's going to trust the NTP servers.  But on everything else I've tried (including 2 Raspberry PIs running various distros) it latches on after just a couple of updates.

(*) you only get the one-shot if you're running ntpd with the -g option, which most  Linux distros do.  If you repeat the above experiment without the -g,  ntpd will exit after step 6 when it hears from the NTP servers and decides the jump is too large. See the man page excerpt below.   Most Linux distros set the ntpd switches in /etc/default/ntp
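To check whether a given Pi actually starts ntpd with -g, something like the following could be used. It is a sketch that assumes the Debian-style NTPD_OPTS='...' line in /etc/default/ntp and only recognises -g as a separate word (not bundled as in -gx); the function name ntpd_has_g is made up.

```shell
#!/bin/sh
# ntpd_has_g [FILE]: succeed if the NTPD_OPTS line in FILE
# (default /etc/default/ntp) contains -g as a separate option.
ntpd_has_g() {
    # strip the variable name and any quote characters around the value
    opts=$(sed -n 's/^NTPD_OPTS=//p' "${1:-/etc/default/ntp}" | tr -d "'\"")
    for o in $opts; do
        if [ "$o" = "-g" ]; then return 0; fi
    done
    return 1
}

# Usage (not run here):
#   if ntpd_has_g; then echo "one-shot big jump is enabled"; fi
```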

What do you do with the driftfile when you mount readonly?   On most Linux distros it lives in /var/lib/ntp, but its location is specified in /etc/ntp.conf with the driftfile directive.  ntpd writes to that every hour (unless the value it wants to write hasn't changed).  

man ntpd:

-g     Normally, ntpd exits with a message to the system log if the offset exceeds the  panic  threshold,  which  is
        1000 s by default.  This option allows the time to be set to any value without restriction; however, this can
         happen only once.  If the threshold is exceeded after that, ntpd will exit with a message to the system  log.
         This option can be used with the -q and -x options.

 

man ntp.conf

driftfile driftfile
              This  command  specifies the name of the file use to record the frequency offset of the local clock oscillator.
              If the file exists, it is read at startup in order to set the initial frequency offset and  then  updated  once
              per  hour with the current frequency offset computed by the daemon.  If the file does not exist or this command
              is not given, the initial frequency offset is assumed to be zero.  In this case, it may take some hours for the
              frequency to stabilize and the residual timing errors to subside.

              The  file  format  consists  of a single line containing a single floating point number, which records the fre‐
              quency offset measured in parts-per-million (PPM).  The file is updated by  first  writing  the  current  drift
              value  into  a  temporary  file and then renaming this file to replace the old version.  This implies that ntpd
              must have write permission for the directory the drift file is located in, and that file system links, symbolic
              or otherwise, should be avoided.

 

 

pb66's picture

Re: Read-only image time issues

I was of the same opinion that "Something doesn't sound quite right there". Despite my doubt, my understanding was that the ntpd -g was not repeated after a successful internet connection was established. I could not find anything official to verify this one way or the other, but I did trawl through many forum discussions, guides and tutorials to arrive at that conclusion. It makes far more sense to only adjust the time after successfully confirming the correct time with live ntp servers over the internet.

This however doesn't explain why several users have had a system time/date of 1970 for prolonged periods of time (days) despite having ssh access and posting data to emoncms.org, as barmy as my findings seemed it fitted the issue.

I could only find limited info on the Raspbian ntp implementation and the fake-hwclock's involvement. What I couldn't establish specifically was whether the loading of the "fake time" at boot affected the one-shot.

I'm not able to do the tests at the moment but will try them at a later date, in the meantime I will assume you are correct as you appear to know the system well and it's certainly more feasible for it to work that way. If the one-shot is retained until a working internet connection is available that's great as my suggested steps 4 & 5 will not be required.

The main issue, though, is that the Pi is apparently unable to read the symlinked copy of the fake-hwclock.data file at boot-up, which results in a 0 timestamp. This seems to get rectified if there is an internet connection available, but data can be disrupted in the interim even when there is a good internet connection, and obviously the longer it takes the worse the impact will be.

See attached log1: this is an excerpt from an emonhub.log from a Pi running the latest SD image with a perfect internet connection. The timestamps are good to begin with, and at 2014-09-21 14:07:19,967 a reboot command is issued, emonhub closes, the Pi reboots and emonhub restarts. emonhub's log messages start at 1970-01-01 01:00:58,039 (unixtime 58.039) and around 12 seconds later the time sets to 2014-09-21 14:08:44,914. The fake-hwclock.data file on this Pi was on the writable data partition and would have been saved at shutdown; it was confirmed that the file is being written but then ignored at boot.

Log2 is from a read-only system that did not have the fake-hwclock.data file on a writable partition. However, the Pi was in RW mode until it was restarted at around midnight the previous evening (power outage), therefore the "saved time" was 00:17:02. Following a reboot at around 15:50 the initial time is the "saved time", not 1970, for a few seconds and then gets corrected very quickly by ntp.

Log3 is the second Pi again a few minutes later, it is put in RW mode before rebooting again and it clearly saves the time at shutdown and loads it as expected at boot.

The suggested steps 1 to 3 will allow the fake-hwclock.data file to be found at boot, plus allow it to be saved hourly and at shutdown.

I don't think the drift time is catered for in the read only image, the file is located  /var/lib/ntp/ntp.drift which is not currently writable, I'm guessing the impact of that is quite minimal, but I may look into it once we get the Pi to reliably report the correct millennium :-)

Paul

dBC's picture

Re: Read-only image time issues

I'm not particularly familiar with fake-hwclock (none of my distros include it).  If I get time tomorrow I'll play with it.  From what I can find on the web, it's a way of loading the kernel time with an approx time (+/- an hour) rather than using 1970, when ntp hasn't done its stuff yet.   Personally, I prefer my clocks to display 1970 when they're wrong, that way I know they need attention, but each to their own.

When one is stuck in the 70's have you been able to determine whether or not ntpd is still running?  If it is running and just not syncing, ntpq is a good utility for poking around inside ntpd's state and looking at its associations and peers.  

"I don't think the drift time is catered for in the read only image, the file is located  /var/lib/ntp/ntp.drift which is not currently writable, I'm guessing the impact of that is quite minimal,"
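For anyone wanting to check this on an affected Pi, a minimal sketch follows. It relies on ntpq's peer listing marking the currently selected server with a leading "*"; the function names are made up for illustration.

```shell
#!/bin/sh
# has_sys_peer: read "ntpq -pn" output on stdin and succeed if any peer
# line starts with "*", the tally code for the selected system peer.
has_sys_peer() {
    grep -q '^\*'
}

# ntp_status: report whether ntpd is running and whether it has synced.
ntp_status() {
    if ! pgrep -x ntpd > /dev/null; then
        echo "ntpd is not running"
    elif ntpq -pn | has_sys_peer; then
        echo "ntpd is running and synced to a server"
    else
        echo "ntpd is running but has not selected a server"
    fi
}

# Usage (not run here):
#   ntp_status
```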

I'm not sure if it's relevant or not, but I suspect if you did have a valid driftfile, ntpd might latch onto the NTP servers faster (in the case where the network is up).

P.S.  Do you have ntpdate installed?  That will immediately force the system time to what the NTP servers say, without waiting around for ntpd to sync up  (again only useful in the case where the network is up).

pb66's picture

Re: Read-only image time issues

Fake-hwclock seems simple enough http://manpages.ubuntu.com/manpages/raring/man8/fake-hwclock.8.html#contenttoc5 (not using ubuntu, but fake-hwclock seems quite uniform across distros) it just saves the current time to file hourly by cron and at shutdown. That time is then supposed to be loaded at boot. I assume the symlink is the hurdle.

"Personally, I prefer my clocks to display 1970 when they're wrong, that way I know they need attention, but each to their own"

I can see the benefit there and would normally agree but then I would have to implement some sort of test before allowing emonhub, emoncms or any other time recording software to start and that could result in lost data rather than slightly inaccurately timestamped data, (always seem to be deciding the lesser evils)

Emonhub buffers the incorrectly timestamped data until a network connection is available, and a local emoncms doesn't need a network connection to receive from a local emonHub, so in either case the data with the wrong timestamp can play havoc on some feeds, for example a whaccumulator that suddenly goes back in time by a few days or hours due to a reboot.

Having now read a bit about the drift file, I think you are right that a valid drift file would result in a quicker correction, as an absent file causes lengthy calculations to begin that can cause huge delays to corrections. I have checked all my Pi's and they all appear to have a file, but it is not writable on the RO ones, so I assume the file is created during the initial set-up stages and only updated if ntpd happens to try to update it while I have the Pi in RW mode for another reason. So I guess in my cases the Pi's will quickly establish a slightly inaccurate time (+/- 20ppm is not a lot), another lesser evil trade-off!

Ideally I would like to make the drift file RW momentarily to save but cannot find where this is done. It would be pointless symlinking it to "data" as would creating a file in RAM at boot as this would initiate the calculating at each boot.

I don't have ntpdate installed as apparently it will not work after ntpd is running so I would effectively only mimic what ntpd -g does currently unless ntpd is stopped and restarted which in itself would rerun the "one-hit" and would need a trigger once the network is up which is where I was going with step 4.

Having only read up on this over the weekend, I haven't established whether ntpd is running when these issues occur. I don't actually experience the issues on my Pi's, but several other users have, and as the issue always seems to be perceived as an emonhub error I thought I would have a go at resolving it. To do that I need to set up a Pi running the pre-configured image with a screen and keyboard to be able to monitor the effects of no network, as ssh requires a network connection unless I disconnect my whole network from the outside world.

 

 

dBC's picture

Re: Read-only image time issues

"Ideally I would like to make the drift file RW momentarily to save but cannot find where this is done."

It's rewritten by ntpd every hour (regardless of whether it previously existed).  So in the case of a virgin system you've just installed, the file will first appear one hour after you first install and run ntpd.

dBC's picture

Re: Read-only image time issues

"I don't have ntpdate installed as apparently it will not work after ntpd is running so I would effectively only mimic what ntpd -g does currently unless ntpd is stopped and restarted"

On all the machines I have ntpdate installed, it puts a script called ntpdate into /etc/network/if-up.d (scripts in that directory get executed when an interface comes up).  That script stops ntpd, runs the ntpdate binary, and then restarts ntpd.

ntpdate is a lot more brute-force than ntpd.  ntpdate pretty much grabs the time from an NTP server and slams it into the system clock, with very few questions asked.  By comparison, ntpd sits in the background listening to NTP server messages, runs filters on them, and if you're lucky, will eventually decide to sync to one of them.. but this can take tens of minutes.  From there on in, it's very good at tracking those servers and keeping your system time extremely accurate. 

So the two work very nicely together.  ntpdate will force your system time to be fairly correct pretty much immediately, and ntpd will then get it (and keep it) extremely correct after that.

Here's an example log of me bringing my desktop machine out of suspend at 21:02:

Oct  6 21:02:13 ntpd[21608]: ntpd exiting on signal 15
Oct  6 21:02:28 ntpdate[22939]: adjust time server 59.167.227.65 offset 0.373472 sec
Oct  6 21:02:28 ntpd[22966]: ntpd 4.2.6p3@1.2290-o Tue Jun  5 20:12:08 UTC 2012 (1)
Oct  6 21:02:28 ntpd[22967]: proto: precision = 0.111 usec

You can see it killed ntpd, then it ran ntpdate to brute-force my system time (which had drifted by 0.37 secs while suspended), then it restarted ntpd.   ntpd then took a full 4 minutes before it sync'd to an NTP server (not shown in the logs).  You can see ntpdate took just 15 seconds to fetch the time and fix my system time.

pb66's picture

Re: Read-only image time issues

Ok so ntpdate could force a correction at anytime as and when the network comes up,

How would that be affected by an active network connection without internet access ? for example a power outage following which the Pi is quick to reboot and a router could provide an ip address but an external connection to the internet takes a bit longer to establish. The script I had written tests connectivity by trying to open a webpage, I'm guessing the if-up.d could get caught out here.

Although once triggered my script restarts ntpd I suppose it could be used to run ntpdate instead.

The ntp.drift file is still equally important regardless of how the initial time is established I imagine, if the file is missing the calcs will take considerably longer to fine tune the time.

I can only think of one way around it, and that is to create a /var/lib/ntp directory in RAM using an entry in the fstab (currently done for /var/logs). As the Pi is booting, but before ntpd is started, an ntp.drift.BAK file is copied to /var/lib/ntp/ntp.drift for ntpd to find and keep updated. The hourly cron job that currently saves fake-hwclock.data gets extended to save the ntp.drift file to ntp.drift.BAK, and the same is done at shutdown, so that there is always a recent drift value available at boot-up.

Even if an RTC was fitted to remove the need for fake-hwclock, the drift file would probably still need to be functional.

If there is a recent fake-hwclock.data entry and a recent ntp.drift file, is there a need for ntpdate as well, do you think? ntpd's gradual changes have their advantages as well, e.g. less chance of skipping a scheduled cron job or jumping forwards or backwards in time while mid-measurement.

pb66's picture

Re: Read-only image time issues

There again if ntpdate is used the drift at that time would be 0 so a drift file could just be created with a zero value to prevent  the longer ntp calcs starting, then there would be no need to back it up just the RAM copy would do.

dBC's picture

Re: Read-only image time issues

"Ok so ntpdate could force a correction at anytime as and when the network comes up,  How would that be affected by an active network connection without internet access ?"

Just as your script is polling a webpage, ntpdate is polling an NTP server.   In the man page you'll find several switches that let you control its behaviour in that regard:

-p samples
              Specify the number of samples to be acquired from each server as the integer samples, with values from 1  to  8
              inclusive. The default is 4.

-t timeout
              Specify the maximum time waiting for a server response as the value timeout, in seconds and fraction. The value
              is is rounded to a multiple of 0.2 seconds. The default is 1 second, a value suitable for polling across a LAN.
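Put together with the stop/ntpdate/restart sequence described earlier in the thread, a manual resync using these two switches might look like the sketch below. The server name and the 5 second timeout are assumptions picked for illustration, and RUN is a made-up dry-run hook: set RUN=echo to print the commands without running them. Run as root.

```shell
#!/bin/sh
# resync_now [SERVER]: stop ntpd, step the clock once with ntpdate,
# then start ntpd again. SERVER defaults to pool.ntp.org (an assumption).
resync_now() {
    server=${1:-pool.ntp.org}
    # ntpdate cannot bind the NTP port while ntpd is running
    $RUN /etc/init.d/ntp stop
    # 4 samples per server, wait up to 5 secs for each reply
    $RUN ntpdate -p 4 -t 5 "$server"
    $RUN /etc/init.d/ntp start
}

# Usage (not run here):
#   resync_now
#   RUN=echo; resync_now    # dry run, prints the three commands
```

A timeout longer than the 1 second default seems sensible when the servers are reached over the internet and not across a LAN.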

 

"If there is a recent fake-hwclock.data entry and a recent ntp.drift file is there a need for the ntpdate as well do you think? ntpd's gradual changes..."

Ultimately I guess that's a design decision for your server, but in general I think the answer is yes.  The options' features are:

fake-hwclock:  instant time setting, no network required, accuracy to within about an hour

real-hwclock:  instant time setting, no network required, accuracy to within tens of seconds

ntpdate:  ~15 secs to set the time, network required, accuracy to within hundreds of msecs

ntpd:   tens of minutes to set time, network required, accuracy to within one or two msecs

 

On an always-on server the first 3 only happen after a major event, i.e. a reboot or a LAN failure.  The general consensus of most distro designs is you want to get the server time as accurate as you can as quickly as you can, and then fine tune it, so they do real-hwclock, ntpdate, ntpd in that order.   Given you have to use fake-hwclock with its incredibly poor accuracy, I would think ntpdate would be even more important.

But you're right to consider all the possible failure modes and how you want your server to behave under those conditions. For example, if the ethernet cable has fallen out of the RPi the interface will never come up, but if it's receiving data via RF and has local storage, you might still want it to log.  Or your DSL service might be down, so the interface comes up, but ntpdate and ntpd will never get you the correct time.... etc. etc.

 

"There again if ntpdate is used the drift at that time would be 0"

The driftfile doesn't contain the difference between your time and the NTP server's time (that difference is known as "offset" in NTP parlance).  The driftfile contains the frequency adjustment that needs to be applied to your local crystal in PPM.  As ntpd runs it effectively calibrates your crystal for you.  That has two benefits:

. if you fall off the network, time will continue to advance based on your adjusted crystal

. next time you restart, ntpd can immediately be using your adjusted crystal while it tries to sync to the servers
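To put a PPM figure into perspective, it can be converted into seconds gained or lost per day: one part per million of an 86400 second day is 0.0864 secs. The small helper below (name made up for illustration) does that arithmetic with awk.

```shell
#!/bin/sh
# ppm_to_sec_per_day PPM: convert a frequency offset in parts per
# million into clock error in seconds per day (86400 secs).
ppm_to_sec_per_day() {
    awk -v ppm="$1" 'BEGIN { printf "%.3f\n", ppm * 86400 / 1000000 }'
}

# Usage:
#   ppm_to_sec_per_day 20    # the +/- 20ppm figure mentioned earlier
```

So an uncorrected 20 PPM crystal error works out at roughly 1.7 secs per day, which only matters when the Pi runs for long periods without reaching an NTP server.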

 

"a drift file could just be created with a zero value to prevent  the longer ntp calcs starting"

A driftfile containing 0 is identical to a missing driftfile to ntpd (see man page entry from an earlier post above).

dBC's picture

Re: Read-only image time issues

I wonder if there's any path through that maze of scripts that can end up writing 1970 to the fake h/w clock, and then periodically loading the fake h/w clock into the system time, behind ntpd's back.  That would be a way to use up your one-shot big jump.

pb66's picture

Re: Read-only image time issues

As I recall, the cases that had 1970 stick for a prolonged time were run for a while without an internet connection. Although I am unsure of the exact details and durations, I suspect the symlinked fake-hwclock.data was not accessible at boot, so the 0 unix timestamp was loaded. Then, if no working internet connection was provided for ntpd to rectify the time within 1 hour of booting up, the 1970's time and date would get saved to fake-hwclock.data. There have been instances where fake-hwclock.data has been confirmed to hold a 1970's timestamp.

Steps 1 to 3 above are definitely going to overcome the initial 1970's problem and, in the majority of cases, the Pi will function as expected.
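One way to stop a 1970's time from being written over a good saved value would be to guard the hourly save with a sanity check on the current clock. This is only a sketch of the idea, not something the image does; the function name clock_plausible and the cut-off (1388534400, i.e. 2014-01-01 00:00:00 UTC) are assumptions.

```shell
#!/bin/sh
# clock_plausible EPOCH [MIN_EPOCH]: succeed if EPOCH (seconds since
# 1970) is not earlier than MIN_EPOCH, default 2014-01-01 00:00:00 UTC.
clock_plausible() {
    [ "$1" -ge "${2:-1388534400}" ]
}

# Usage in /etc/cron.hourly/fake-hwclock (not run here):
#   if clock_plausible "$(date +%s)"; then fake-hwclock save; fi
```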

A sturdier solution is still required for Pi's with weak or intermittent network connections, like ntpdate, my script or both maybe, I'm not sure until I can get a test rig set up to try a few things. The ntpdate script may poll the ntp servers once it has been triggered by the initial network connection but that may not be when the internet connection is established, by polling a site until a reply is obtained and then running ntpdate the time is sure to be corrected unless the connection is lost again within a second or 2 and not regained.

Although I was aware the drift value wasn't the difference. I had thought it was the required frequency correction to align the current and correct times gradually and therefore would be zero if a correction is forced. So the drift value ideally needs to be made "editable" using something like previously suggested, but as a last resort could be left as is so that it is always micro seconds out but doesn't restart the lengthy initial calcs which could delay any adjustments being made at all.

I think as the pre-configured image will have a date already set in fake-hwclock it would be good advice for users to initially connect to the internet for a period or to set the clock manually and reboot or run sudo fake-hwclock save to refresh the fake-hwclock ensuring no matter what the Pi will not then revert back before that date.

EnergyRnR's picture

Re: Read-only image time issues

Paul,

 I can confirm your 1st para above; this is precisely what I've validated to be happening for me here.

And, it's a good idea that to 'deploy' this emonbase, we need an install doc or procedure that requires the user to get the date correct as they set it up. 

I expect to be going to my 'offline' emonbase tomorrow so it'll be interesting to see if I can get it working without access to internet. My worry would be if there were a power outage, it'll need manual intervention to recover.... Admittedly this completely offline use case is a severe example but we all like a challenge :-)

Thanks a lot for going to such trouble investigating this very interesting issue.

I've implemented your steps 1-3 on another system I have here. The 'last updated' status on the input page is performing a lot better ; i.e. there are 2 nodes, and both are updating very quickly. I was seeing long delays for one of the nodes. There are of course other variables, such as whether doors are closed etc but it's definitely much 'cleaner' just from watching the input page update.

Eamonn.

pb66's picture

Re: Read-only image time issues

Thanks for the feedback Eamonn. In the case of your offline Pi, doing steps 1 to 3 will be enough to allow the Pi to count from the time and date you set and store that value at hourly intervals and at shutdown. This means the accuracy will not be corrected and the clock may lose or gain a few seconds, which is not the end of the world. But a sudden power loss will result in a time loss of up to 1 hr plus outage time, so a prolonged outage will disrupt the validity of the ongoing data, as would a series of momentary interruptions, e.g. a momentary outage just before the "hourly save" will set the timekeeping back an hour to the previously saved time.

Unless you can be sure the power is totally reliable I would recommend an RTC, as a power outage may be untraceable and invalidate all your data. The Pi is none the wiser and keeps logging from the last saved time after a reboot as if time stood still. Days, weeks or months later you would just see a time or date difference, but the data probably won't show the outages and there is no way for it to alert you at the time either.

EnergyRnR's picture

Re: Read-only image time issues

agreed ; I do need to get a config with the RTC for the offline scenario. 

Eamonn

dBC's picture

Re: Read-only image time issues

"I had thought it was the required frequency correction to align the current and correct times gradually and therefore would be zero if a correction is forced."

Ah ok.  ntpd is constantly trying to fix the frequency and fix the offset.  The driftfile is where it keeps track of the frequency fix it's calculated for this machine. So even if you somehow did get the system time perfect  such that no offset adjustment was currently required, the value stored in driftfile wouldn't change.

Whether or not it can write the file, internally it continues to use the value it wants to use (and would have written). So not being able to write to the file will only affect things next time it starts up.  Instead of starting with the frequency tweak it's learned over the months/years it's been watching, it has to start over.  On all the machines I've ever looked at, once it's been up for a few days (months?) the contents of that file barely changes.  But it does vary a lot from machine to machine of course.  

So maybe you can design something with your readonly vs read-write filesystem to take advantage of that, i.e. you could perhaps get away with writing it once after ntpd has been sync'd for a few days, and then burning that value into the readonly version for use from then on (but on a per-machine basis, not globally).

pb66's picture

Re: Read-only image time issues

Finding a point to intervene in the drift file saving that isn't too complex and has minimal opportunity to introduce side effects could be tricky. Whereas fake-hwclock uses cron, so a couple of simple edits to remount as RW momentarily works, it appears the ntpd code does the hourly drift file save directly. That is an assumption based on the fact that I couldn't find a cron job for it, and I'm unable to view the ntp code to confirm.

I'm not sure if I would be able to query the drift value from ntp and then write it to a file, whereas if the ntp.drift file is moved to RAM, ntpd can update it as usual and that file can then just be copied to a RO location that can be copied back at boot time.

Since the ntpd saves once an hour only if the value has changed I feel that is probably the safest route for the "fix" to take and amalgamate the hourly back-up of the current time and drift value (if != existing back-up) into one momentary remount cycle.
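The "only if changed" part of that idea could be sketched as below. The helper name backup_if_changed is made up, and the ntp.drift.BAK path in the usage line is hypothetical; it would be called from the hourly cron job while the file system is momentarily RW.

```shell
#!/bin/sh
# backup_if_changed SRC DST: copy SRC over DST only when SRC is
# non-empty and its contents differ from DST (or DST does not exist).
backup_if_changed() {
    src=$1
    dst=$2
    # an empty or missing source is not worth saving
    if [ ! -s "$src" ]; then return 1; fi
    # identical contents: skip the write to spare the SD card
    if cmp -s "$src" "$dst" 2> /dev/null; then return 0; fi
    cp "$src" "$dst"
}

# Usage (not run here, backup path is hypothetical):
#   backup_if_changed /var/lib/ntp/ntp.drift /etc/ntp.drift.BAK
```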

I need to experiment a bit really...

dBC's picture

Re: Read-only image time issues

it appears the ntpd code does the hourly drift file save directly

Yes, it does.

 

I'm unable to view the ntp code to confirm

http://www.ntp.org/downloads.html

 

I'm not sure if I would be able to query the drift value from ntp

You can do it from C by calling ntp_adjtime(), or from a script with the kerninfo command in ntpdc and parsing the output for the item labeled "pll frequency":

# ntpdc -c kerninfo | grep "pll frequency"
pll frequency:        -11.870 ppm
#
# more /var/lib/ntp/ntp.drift
-11.869
#

In the above example, my ntp.drift file is about 40 minutes old and you can see it's already changed its mind (ever so slightly) about what the value should be.

[EDIT]

I feel that is probably the safest route for the "fix" to take

Don't interpret the details I've added above as an attempt to talk you out of your proposed solution.  If anything, I probably agree with you that it's the safest route.  I was just filling in the bits you were unsure about so you've got plenty of options to choose from.

pb66's picture

Re: Read-only image time issues

Fantastic so something as simple as 

ntpdc -c kerninfo | grep "pll frequency" | awk -F' ' '{print $3}' > /var/lib/ntp/ntp.drift

would just overwrite the existing value in ntp.drift, if run as root by cron.

And if that could be improved further to only overwrite when the value is not equal to the existing contents, I think that would be much better than creating a file in RAM and copying at boot.

 

 

TrystanLea's picture

Re: Read-only image time issues

To echo Eamonn, thanks a lot for looking into this Paul and dBC. I think I'm just about following.

pb66's picture

Re: Read-only image time issues

So something like this one line added to both the fake-hwclock hourly cron and the ntpd shutdown script

dval=$(ntpdc -c kerninfo | grep "pll frequency" | awk '{print $3}'); dfile=/var/lib/ntp/ntp.drift; if [ "$dval" != "$(cat "$dfile")" ]; then echo "$dval" > "$dfile"; fi

should save the drift value to file, but only if it has changed, while the fake-hwclock routine has the file system mounted as RW momentarily.

Shell isn't my strong point, so I welcome any corrections if that syntax can be improved. I have tested the logic and it seems to function correctly, but I have not actually tried it in place yet.

pb66's picture

Re: Read-only image time issues

I still don't have a Pi set up that I can remove the network connection from (as I run them headless), but I have now added these mods to a read-only Pi to see if it works ok normally.

Step 2 adds a neat safety function, as every hour the system will be switched to RO regardless of whether it is RO or RW at the time. Handy for those of us who never remember to use rpi-ro after making changes :-)

dBC's picture

Re: Read-only image time issues

A couple of corner cases you might want to test against:

1. if ntpd has exited for some reason, then ntpdc will output an error: "ntpdc: read: Connection refused", so you'd want to make sure you don't clobber your good drift file with a bad one (or null one) in that case

2. it's also quite legal for the drift file not to exist, in which case you want to create it, so you'd want to make sure your "has it changed?" test doesn't prevent that.

While it's true that ntpd does include that "has it changed?" optimisation before it rewrites ntp.drift every hour, in practice I think it's a code path rarely exercised.  On all my machines, I can't find a single one where ntp.drift is older than 1 hour.  Even on machines that have been up a long time, by the time an hour has come around it almost always wants to tweak the value, even if only by 0.001 as in the example I pasted above.  So if it makes it easier (or more robust) you could drop that test and I suspect it would make little difference to how often the file is written.

pb66's picture

Re: Read-only image time issues

You're right, if the drift value is most likely going to be different more often than not, then the "if changed" test may introduce more issues than it solves, i.e. not creating the file if it doesn't exist yet. I like the simplicity of "just copy it regardless", although as you point out ntpd may not be running; for example, if a difference of 1000 secs is detected after the "one big step", it will exit.

Would a simple check to see if the ntpd.pid file exists be sufficient ?

if [ -f /var/run/ntpd.pid ] ; then
  dval=`ntpdc -c kerninfo | grep "pll frequency" | awk -F' ' '{print $3}'`
  dfile=/var/lib/ntp/ntp.drift
  echo $dval > $dfile
fi

I am also considering creating a cron job and init script called something like "ntp-backup", so that the existing file(s) can be deleted, disabled or just ignored. That way our mods are independent and unlikely to be altered by updated packages. And if someone fits an RTC and removes or deletes fake-hwclock, as most guides instruct, the drift file will still work; if fake-hwclock is later re-installed or re-enabled, it will work with the RO file system again.

solutions101's picture

Re: Read-only image time issues

Thanks for that

 

Now I am seeing a timestamp on my inputs; however, I'm not getting any reports from my CT inputs...

 

Here is some output from my logs; I have increased the log level:

pi@raspberrypi:~$ tail /var/log/emonhub/emonhub.log
2014-10-09 13:11:35,983 DEBUG 8 Append to 'emonCMS' buffer => time: 1412860295.92, data: [10, 0, 0, 0, 0, 25433, 0], ref: 8
2014-10-09 13:11:36,089 INFO Sending: http://localhost/emoncms/input/bulk.json?apikey=E-M-O-N-C-M-S-A-P-I-K-E-Y&data=[[1412860295.92,10,0,0,0,0,25433,0]]&sentat=1412860296
2014-10-09 13:11:37,251 DEBUG Receipt acknowledged with 'ok' from http://localhost/emoncms
2014-10-09 13:11:46,763 DEBUG 9 NEW FRAME : 1412860306.76  10 0 0 0 0 0 0 0 0 103 99 0 0
2014-10-09 13:11:46,768 DEBUG 9 Timestamp : 1412860306.76
2014-10-09 13:11:46,771 DEBUG 9      Node : 10
2014-10-09 13:11:46,773 DEBUG 9    Values : [0, 0, 0, 0, 25447, 0]
2014-10-09 13:11:46,803 DEBUG 9 Append to 'emonCMS' buffer => time: 1412860306.76, data: [10, 0, 0, 0, 0, 25447, 0], ref: 9
2014-10-09 13:11:46,906 INFO Sending: http://localhost/emoncms/input/bulk.json?apikey=E-M-O-N-C-M-S-A-P-I-K-E-Y&data=[[1412860306.76,10,0,0,0,0,25447,0]]&sentat=1412860306
2014-10-09 13:11:47,385 DEBUG Receipt acknowledged with 'ok' from http://localhost/emoncms

 

The mains power voltage seems to work fine; just the CTs come back with no readings.

 

please advise

 

thanks

 

dBC's picture

Re: Read-only image time issues

Would a simple check to see if the ntpd.pid file exists be sufficient ?

No, the pid file only gets cleaned up if ntpd is shut down gracefully by management. If the daemon just quietly exits, the stale pid file will remain around forever (well, until it's replaced by a new one the next time it gets restarted). It's probably better to do a sanity check on dval before you write it to the file. In the case where the daemon is not running, dval should end up empty after your assignment because the grep will fail to find anything.
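A sanity check along those lines might look something like this sketch. save_drift is a made-up name, and the "looks like a signed decimal number" test is just one assumption about what counts as sane.

```shell
#!/bin/sh
# Sketch only: write a drift value to a file, but refuse anything that does
# not look like a signed decimal number such as -11.870.
save_drift() {
    dval=$1
    dfile=$2
    # An empty string (ntpd not running, so grep matched nothing) or any
    # stray text fails this test and leaves the existing file untouched.
    printf '%s\n' "$dval" | grep -Eq '^-?[0-9]+(\.[0-9]+)?$' || return 1
    printf '%s\n' "$dval" > "$dfile"   # also creates the file if it is missing
}
```

It could then be called as: save_drift "$(ntpdc -c kerninfo 2>/dev/null | grep "pll frequency" | awk '{print $3}')" /var/lib/ntp/ntp.drift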

pb66's picture

Re: Read-only image time issues

Ok so a more reliable check would be to just check $dval for content, makes sense and it's simpler, win-win ! I'll take another look at doing this over the weekend hopefully and get a spare screen down from the loft to run some tests, thanks, I really appreciate all your help with this!

pb66's picture

Re: Read-only image time issues

Although still being tested I have packaged this up so that it can be implemented with one command from the home directory when in RW mode (rpi-rw) and so far, it seems to work ok. 

git clone https://github.com/emonhub/ntp-backup.git && ~/ntp-backup/install

EnergyRnR's picture

Re: Read-only image time issues

nice one. spoiling us . .... Thanks!

pb66's picture

Re: Read-only image time issues

No worries!

Let us know how you get on if you use it, I've tidied up some permissions etc this morning and tested some more, but you can't beat real world testing.

 

pb66's picture

Re: Read-only image time issues

I've modified this a little to be less sensitive to where it is initiated from. It does still need to be in "write" mode before running:

git clone https://github.com/emonhub/ntp-backup.git ~/ntp-backup && ~/ntp-backup/install

pb66's picture

Re: Read-only image time issues

doh!
