Thursday, August 28, 2014

Office hours...

Saw an article on this study, which looks at when people actually go to work (graph at right).  This immediately brought two thoughts to mind:

First, in my last job I kept very strange hours, which will surprise exactly nobody who knows me :)  I generally arrived at work between 5:00 am and 5:30 am (after a one-hour drive), and left around 2:00 pm.  That put me in almost perfect sync with people who worked on the East Coast, in a time zone three hours ahead of mine.

Then I recalled one quite interesting problem I worked on at that same job.  Our product was a SaaS product (otherwise known as a web application), and generally people logged into it when they came into work, and remained logged into it all day long.  Each of our customers had a separate instance of our product.  One of our large customers – out of hundreds of customers – was having a problem we were having trouble tracking down.  Every morning their system would start thrashing like mad: CPU utilization spiked, disk I/O spiked, and response times were in minutes instead of seconds.  After a half hour or so, every single morning, their system would settle down and start behaving normally.  Diagnostics showed nothing.  We swapped out hardware; no change.  People were pulling their hair out trying to figure it out, while the customer was issuing escalating threats on a daily basis.

I was one of the team troubleshooting the problem.  I was chasing down a strange symptom we observed, and happened to look at the table that contained login information – and noticed that almost 1,000 (of 1,200 total) users were logging in within a 3-minute period.  This seemed almost impossibly synchronized – how could 1,000 users all log in on such a synchronized schedule?  That question prompted a phone call to the customer, who verified that in fact all their employees did report to work in exactly this synchronized fashion.  About 90% of the employees were union members, and they clocked in precisely at 9:00 am, and out at 5:00 pm.  In fact, our contact at the company was surprised there was as much as a 3-minute spread in the logins, until he realized that many workers probably grabbed a cup of coffee on the way in.

Anyway, the high number of concurrent logins was the root of our issue.  We had the company solve it temporarily by telling their folks to log in on a staggered schedule (based on the last two digits of their SSN); then we fixed the concurrency issue with a simple queue mechanism.  No other customer had ever reported this issue, in years of experience with hundreds of thousands of users.  It's all the union's fault :)
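
For readers wondering what a "simple queue mechanism" for logins can look like, here is a rough Python sketch.  It is not the original fix, whose details aren't described here; it only illustrates the general idea of funneling a burst of login requests through a small, fixed number of workers so the backend never handles more than a few at once.  The names `LoginQueue`, `simulate_burst`, and `fake_login`, the limit of 4 workers, and the sleep standing in for real login work are all made up for illustration.

```python
import queue
import threading
import time


class LoginQueue:
    """Funnels login requests through a fixed number of worker threads.

    Hypothetical sketch: requests wait in a FIFO queue, and at most
    `max_concurrent` of them are processed at any moment.
    """

    def __init__(self, handler, max_concurrent=4):
        self._handler = handler
        self._requests = queue.Queue()
        for _ in range(max_concurrent):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, user_id):
        """Enqueue a login request; returns immediately."""
        self._requests.put(user_id)

    def wait_until_drained(self):
        """Block until every submitted request has been handled."""
        self._requests.join()

    def _work(self):
        while True:
            user_id = self._requests.get()
            try:
                self._handler(user_id)
            finally:
                self._requests.task_done()


def simulate_burst(total_users=1000, max_concurrent=4):
    """Submit a burst of logins; return (completed, peak in flight)."""
    lock = threading.Lock()
    state = {"active": 0, "peak": 0, "done": 0}

    def fake_login(user_id):
        # Stand-in for the expensive part of a real login.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.0005)
        with lock:
            state["active"] -= 1
            state["done"] += 1

    logins = LoginQueue(fake_login, max_concurrent)
    for user_id in range(total_users):
        logins.submit(user_id)
    logins.wait_until_drained()
    return state["done"], state["peak"]


if __name__ == "__main__":
    done, peak = simulate_burst()
    print(f"{done} logins completed, at most {peak} in flight at once")
```

The trade-off is that the last users in a burst wait a little longer for their login to go through, but the system as a whole stays responsive instead of thrashing for half an hour.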
