Friday, February 16, 2018

Mystery solved...

Mystery solved...  For a couple of years now, we've had a problem with three different digital clocks.  All of them mysteriously ran too fast, but intermittently.  Over the course of a month, they might gain a half hour or so.  The thing that's really weird about this is that three completely different digital clocks – different brands, different circuitry, different chips at the core – all did the same thing.  Also, the problem happened whether the clock was in our house or in my barn office – and those two places have different electrical supplies (even different transformers!).  I still suspected something amiss with our AC power supply, but then when I ran one from my UPS (a model that is continuously supplying inverted power) it still had the same behavior.  I gave up, and started to think of explanations that included Harry Potter and divine intervention.

But then a few weeks ago, the problem suddenly stopped.  WTF?  Three separate clocks having a problem, and all of a sudden they all work?  How could this be??

A few days ago I had the first thought of a possible explanation.  The clocks all started working about the same time that the power company fixed the top part of our power pole (blogged here) – could that be what fixed my clocks?  I didn't know exactly what they did beyond replacing the top crossbar.  So I called the power company, hoping I might get some answers.  And I did!  Turns out they keep good records of what was found and what was done – and one of the notes they'd made on that day said “heavily corroded HV connection”.  The “HV” means “high voltage”, the input into the transformers for my house and my barn – and those two transformers shared that same HV connection.  Ah ha!  Now there was a possibility!  A heavily corroded connection might exhibit intermittent connections when, for example, the wind shook the power pole a bit. 

So I rigged up a bit of an experiment with one of the clocks.  I wired up an outlet whose power came through a wire that I cut, stripped, and then bound together with a rubber band.  Then I tried wiggling that joint.  Lo and behold, I was able to replicate the peculiar behavior.  The clock didn't lose power, but it did pick up the glitches from the intermittent connection as though they were additional cycles to be counted – and the result was the clock gained time, just as we used to observe.
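For the curious, the effect can be sketched in a few lines of Python.  This is only a toy model under stated assumptions, not the circuitry in any of these clocks: it assumes the clock counts rising zero-crossings with a simple comparator and treats 60 counts as one second.  The glitch itself (50 ms of 3 kHz ringing at 40 V peak, once per second) and the 20 kHz sample rate are made-up values chosen for illustration.  The last function just works out how many spurious counts a half-hour gain represents.

```python
import math

LINE_HZ = 60.0        # nominal US line frequency
LINE_PEAK_V = 170.0   # peak of a 120 V RMS sine wave
SAMPLE_HZ = 20_000    # simulation sample rate (assumption)


def line_voltage(n, noisy):
    """Voltage at sample n; optionally adds a hypothetical noise burst."""
    t = n / SAMPLE_HZ
    # Start at the negative peak so every cycle has one clean rising crossing.
    v = -LINE_PEAK_V * math.cos(2 * math.pi * LINE_HZ * t)
    # Assumed glitch: 50 ms of 3 kHz ringing, 40 V peak, once per second.
    if noisy and (t % 1.0) < 0.05:
        v += 40.0 * math.sin(2 * math.pi * 3000.0 * t)
    return v


def count_rising_crossings(seconds, noisy):
    """Naive cycle counter: one count per negative-to-non-negative step."""
    count = 0
    prev = line_voltage(0, noisy)
    for n in range(1, int(seconds * SAMPLE_HZ)):
        cur = line_voltage(n, noisy)
        if prev < 0.0 <= cur:
            count += 1
        prev = cur
    return count


def extra_counts_for_gain(seconds_gained):
    """Spurious counts needed for the clock to gain this many seconds."""
    return int(seconds_gained * LINE_HZ)


clean_count = count_rising_crossings(10, noisy=False)
noisy_count = count_rising_crossings(10, noisy=True)
print("clean:", clean_count, "counts ->", clean_count / LINE_HZ, "clock seconds")
print("noisy:", noisy_count, "counts ->", noisy_count / LINE_HZ, "clock seconds")
print("counts for a 30-minute gain:", extra_counts_for_gain(30 * 60))
```

With clean power the counter sees exactly one crossing per cycle, so ten real seconds read as ten clock seconds.  With the noise bursts added, the waveform wobbles back and forth across zero near each crossing and the counter picks up extra counts, so the simulated clock runs fast.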

I'm left with the mystery of why the problem still happened with a clock connected to my UPS.  I think the most likely explanation is that the intermittent connection issue was transmitted through the UPS as coupled noise – which is certainly disappointing.  The UPS works fine if I switch off its primary power, so I know the inverter it contains is working properly.  I see no glitches on its output (using a 'scope) if I flip its primary power on and off.  Nonetheless, the clocks malfunctioned when running on it prior to January 11. 

I have to conclude that for our entire time in this house, up until the repairs on January 11, we've had flaky power and didn't even know it!  The clocks have been rock-solid now for over a month.  I'm very glad to have the problem fixed, but I'm still amazed that we didn't even know we had a problem!


2 comments:

  1. Most UPSes just pass utility power through when it’s on; they are not using the inverter.

    Interestingly, AC line frequency can be quite inaccurate over short periods of time but very accurate over long periods of time, because the power companies count cycles, compare to an atomic clock, and adjust. Conversely, crystal clocks are more accurate over short periods of time but, because any inaccuracy accumulates, less accurate over the long haul.

  2. Hi, Dithermaster! You found my silly little blog! :)

    Here's the UPS in question. It is an actual online UPS, with the inverter supplying power all the time. There is a bypass relay, but the way I have it configured, it's only engaged manually. Since I wrote this post, I've repeated my intermittent connection test by making the input to the UPS intermittent – and it does couple the noise, greatly attenuated, right through the UPS. With a 'scope on the output, I'm seeing 50 to 80 volts peak-to-peak of noise, mostly around the 60 Hz zero-crossing – quite likely to be interpreted as cycles by the cycle-counting circuitry in my clocks. I'm very disappointed by that (lack of) performance on the UPS. All my computer gear is powered by switching power supplies with hold-up caps plenty big enough to handle that noise, so I'm not too worried about them (and they're the main reason I have the UPS). Still, I'd have expected better from Liebert...

    In a past life, late '90s and early '00s, I used to run a medium-sized datacenter (~2,000 servers) for two companies doing stock, option, and bond trading (including algorithmic trading). You can probably imagine they were quite interested in uptime. :) I learned way more about two things than I ever wanted to know: UPS/generator systems, and dual Internet connections. Despite having a UPS/generator system that cost millions of dollars, we still experienced high failure rates, approaching 30%. Worse, we had two fires (one in the battery room, the other in an ATS). Amongst the lessons I learned there: bypass UPSs can be counted on to fail (I exaggerate, but not by too much!), and don't believe a word the UPS vendors claim about dwell time. For my tiny “datacenter” here, I'm running that UPS at 1/3 rated load and I have a propane backup generator with a 30 second start time. I only need a dwell time of 60 seconds or so to feel safe from power problems (frequent here, unfortunately).

    The cycle-adjustment that the power companies do, I've only recently discovered, really only applies to power companies that participate in the large geographic grid interconnections. We have quite a few local (town-owned) systems here that have no interconnect – and most of them don't count cycles at all. There's one just four miles north of me (the town of Hyrum) where part of the system (just the city properties these days, 500 MW total) is powered by a 101-year-old hydro turbine. I got to visit it last year, and it was a real hoot to see that ancient thing. Anyway, the turbine speed is controlled by an old-fashioned spinning-ball governor (really!) that controls the water gate feeding water to the turbine. The operator there told me the frequency varied from about 52 Hz to about 74 Hz, and that the governor was really bad (slow) at adjusting to abrupt load changes. Cycle-counting was definitely not part of their operation! :)

    I did something fun last year related to time-keeping: I rigged up a GPS-referenced NTP server on a Raspberry Pi, so I've got my own stratum 1 NTP server on-premises. Total overkill to be sure, and I have absolutely no practical purpose for this – but I've got sub-millisecond time error on my computers here! When I was running that datacenter I spent an inordinate amount of time on keeping track of time. It's critical in the trading world that your timestamps be accurate – and even more critical for the algorithmic trading platforms. I was CTO at that company, and I joked that it meant Chief Time Officer, 'cause that's what I spent so much of my efforts on. The main thing I did, though, was to exterminate all the custom time-tracking stuff a predecessor had written and embedded in our code, and replace it with a much simpler NTP system, based off a triply-redundant cell-tower referenced clock. Back in those days it was surprisingly difficult to do this; modern electronics has made it almost trivial...
