Recently in Downtime Category
February 23, 2013
Server Shutdowns Redux
- anatolius
- bayes
- freesurface
- earnserv1
- gosset
- inviscid
- earnserv2
February 22, 2013
Servers Back Up
- anatolius
- bayes
- earnserv1
- freesurface
- inviscid
- gosset
- earnserv2
February 21, 2013
Server Shutdowns - update no. 1
- gosset
- earnserv1
Server Shutdowns
- anatolius
- bayes
- freesurface
- gosset
- inviscid
- earnserv1
- earnserv2
January 1, 2013
Main server down for a bit
December 13, 2012
Anatolius Down Friday Afternoon
November 19, 2012
Anatolius Down for Upgrades Monday, Nov. 19
November 5, 2012
Anatolius Down for Upgrades on Thursday

Mathematicians (Photo credit: KennethMoyle)
October 12, 2012
Server Back to Normal
- email should be responding normally;
- files in home directories are editable;
- workstations will allow logins.
Server Update
Server Problems - what's happening
August 28, 2012
Bayes Still Down
August 27, 2012
Server Upgrades
August 25, 2012
Mail Flowing Again
Systems Coming Up Post-Power-Outage
August 24, 2012
Change of Plans re. Servers During Power Outage
August 22, 2012
Saturday's Power Outage & Our Systems
- the network in Hamilton Hall is on emergency power and will remain up; MacSecure wireless might well be available, too;
- linux and Macintosh workstations managed by RHPCS will shut down automatically at 4 am;
- the main server has limited battery backup and will remain up until about 9 am on Saturday;
- these servers will be shut down on Friday at 10 pm:
- anatolius
- inviscid
- earnserv1
- earnserv2
- these servers will remain up but will not have access to home directories once the main server is shut down, and jobs running from /home may fail (jobs running from the local disk - i.e. /1/home or /scratch - will be OK):
- bayes
- gosset
- freesurface
January 26, 2012
Bayes Still Down
December 20, 2011
Brief Downtime Wednesday Morning
Partial Service Interruption at 4:40 pm Today
November 16, 2011
More about email during downtime
Mail is up Wednesday morning; other files to come
November 15, 2011
Actually, file array looks bad after all
- web sites remain up
- workstations can be used with the tempuser account
- mail forwarding works (for those who had forwarding set)
- home directories are accessible via sftp and Windows file sharing
- printing works from ms and from the tempuser accounts on the workstations
File array recovered, but ...
November 14, 2011
File array on its way back up
- new mail will start flowing again
- workstations will work
- one disk failed at about 6pm on Sunday; this did not affect any services
- a second disk failed at about 8pm that night; this took the storage array off line
- web sites were back up at 10:15 pm on Sunday and have been up since
- most workstations are usable for web browsing and printing via the tempuser account (contact me or Sheree for the password)
- mail from before Sunday at 3:00 am is available via:
- http://mathmail.mcmaster.ca
- imap/pop mail clients
- home directory files are accessible read-only via Windows file sharing:
- smb://ms.mcmaster.ca (in OS X and linux)
- \\ms.mcmaster.ca\ (in Windows)
November 13, 2011
New Server Problem Sunday Night
Back to Normal Sunday Afternoon
The new battery-backup unit - put in place on Saturday - should prevent a power problem such as we encountered on Friday evening.
November 12, 2011
Bayes, freesurface back up
Shutdown Sunday Afternoon
November 11, 2011
System Problems Friday Evening
November 8, 2011
Bayes down for maintenance Monday afternoon
August 24, 2011
File-server Problem Wednesday Afternoon
May 3, 2011
Service Interruption Wednesday Morning
Facility Services will be testing emergency power in Hamilton Hall at 7:30 am on Wednesday, May 4th. This will affect the UPS (i.e. battery-backup unit) which gave us trouble on May 2nd and April 15th, and we can't trust this unit to coast through the power interruption.
I'm going to perform a prophylactic shutdown of the systems on that UPS unit at 7:25 am. Everything should be up again at 7:35 am.
The main server will not be shut down, but mail and web will be unavailable because the main file array will be.
Note that the faulty UPS will be replaced in the next week or so; the replacement will require about 30 minutes of downtime.
May 2, 2011
Power Failure Monday Morning
We had a server-room power failure this morning - web, email and workstations were unavailable between 8:37 AM and 9:15 AM. Just as on April 15th, the main server wasn't down, but the main storage array was.
This second failure confirms that we have a problem with one of our battery-backup units; I will be replacing it ASAP.
April 15, 2011
Possible Service Interruptions
We are still trying to pinpoint the source of the partial power failure in the server room earlier this afternoon. We know that something went wrong with a UPS unit which has served us faithfully for six years now, but we don't know precisely what.
Depending on what we find, we may need to shut down the storage array and compute servers with very little notice. And a similar power failure might be possible, too.
There may be loss of access to home directories and interruptions to mail and web service with little or (should the power fail) no notice. So: save early and save often. I'll post an update once we know that things are stable again.
Server/Power Problem Friday Afternoon
We lost power to part of the Hamilton Hall server room on Friday afternoon just before 3:00 pm. The main server wasn't affected, but the main storage array was, which means that mail, web and workstations were unavailable until the problem was corrected. Web sites were back up by half past three, but other services were spotty until about four o'clock.
There was no damage to the files on the storage array, though some mail may have been returned to senders as undeliverable.
Most workstations will need to be rebooted (Alt-Ctrl-F1 then Alt-Ctrl-Del); some will need to be restarted (hold the power button for ten seconds to turn the workstation off, then turn it back on).
Any jobs running on bayes, gosset or freesurface will have been lost as those servers were connected to the part of the power system which failed.
April 8, 2011
Partial Shutdown on Saturday, April 30th
Facility Services has announced that air conditioning to Hamilton Hall will be turned off from 6:00 am to 4:30 pm on Saturday, April 30th. In order to prevent damage from overheating, we will be shutting down most systems in the server room on Friday afternoon: this means bayes, gosset, freesurface, etc.
I will leave the main file/web/mail server up, but if the room starts getting too hot I will shut down everything but web services (no email, no workstations, no changes to the web server).
Announcement from FS follows...
April 6, 2011
Post-Downtime Update
Our main server (ms.mcmaster.ca) is now back in HH after a few months in the ABB server room and is using a new, larger disk array, also in HH (we were borrowing space in ABB while ms was there). Thanks for your patience as we completed another part of the migration to new server infrastructure.
A few notes regarding the downtime and recovery ...
- contrary to my plan, the xguest login on the ms workstations did not work
- the downtime extended to 8:45 pm instead of 7:00 pm
- the web server was down from 4:55 pm to 5:20 pm
- the main page (and other database-driven pages) were down for another hour
- other sites (e.g. course and instructor pages) were OK
- most workstations are working fine as of 9 o'clock Wednesday morning, but a few will need to be rebooted; if your workstation is frozen:
- hold down the power button for ten seconds to turn it off
- wait five seconds
- turn it back on
- note that the boot may take five minutes or so while the disk is checked for errors
April 5, 2011
Mail Hiccoughs
Some of you will be having trouble getting to your mail via imap clients or webmail until later this evening: I neglected to redirect mathmail.mcmaster.ca to the new network location of the server. My apologies.
Note that anyone using the addresses mail.math.mcmaster.ca or ms.mcmaster.ca won't see these problems - though mathmail.mcmaster.ca is the preferred address.
Downtime Extended but Web Sites Up
Downtime This Evening - Changes
The scheduled downtime from 4:45 pm to 7:00 pm this evening will proceed as planned but with these differences ...
- the www.math.mcmaster.ca web site will not be down for more than a few seconds (though you won't be able to make changes during this period)
- limited-use guest accounts will be available on most workstations
April 4, 2011
Downtime Tuesday Afternoon
We will be moving our main server and storage array from their temporary berth in ABB back to our HH server room on Tuesday. Email, workstation and home-directory access will be down from 4:45 pm to 7:00 pm on Tuesday; web sites will be down from ca. 6:50 pm to 7:00 pm.
If all goes well, I should have the linux workstations set up so that you can login and run a browser without logging into the server; you won't be able to read your mail @math.mcmaster.ca or get to your files, though.
March 29, 2011
Servers Were Offline for ca. 3 hrs Monday Evening
A few servers were off-line from 5:45 pm to 9:05 pm Monday, March 28th due to a network-configuration problem introduced by some new equipment installed that afternoon. The affected systems were
- freesurface
- gosset
- webwork
- earnserv2
March 18, 2011
Downtime 4:30 pm - 5:00 pm Monday, March 28th
The main server and all services (web, mail, workstation) will be down from 4:30 pm to 5:00 pm on Monday, March 28th while I move the hardware to a new location.
February 24, 2011
Systems Up Again Following Scheduled Downtime
We're up and running again as of 5:30 pm, which means that we were down for 90 minutes instead of the announced 30 minutes. While we had the system off-line, we moved to a larger storage system. So we're now running with more than twice the storage, double the RAM and twelve CPUs instead of eight.
While email and workstations were down for the entire period, web sites were up and down a few times - I had them up and running whenever I could safely do so.
February 23, 2011
Downtime Thursday Afternoon
The main server will go down at 4:00 pm on Thursday. The server itself should be up again almost immediately but it may take up to half an hour for all services to resume (mail, web, workstations, etc.).
February 17, 2011
Server Problems - sort of
The ms server and the workstations have been agonizingly slow (at best) since about 8:15 this morning. A disk on our main storage array failed and the array was hobbled (in "degraded mode", for those who follow these sorts of things). We do not yet know why performance was as miserable as it was - it should have been poor, not horrible.
There were three interruptions of five to ten minutes as I sought the cause of the problem - working on the invalid assumption that it was our server again.
The disk has been replaced and the storage array is rebuilding itself. Performance is going to be poor until the rebuild is complete.
Workstations may need to be rebooted if they have got confused over the state of the links to the home directories (though I have forced a refresh remotely on all systems which were responding).
February 16, 2011
More About Web Sites During Server Problems
I stated in an earlier post today that "some web sites were partially down". I've had some questions about what that means, precisely.
All web sites hosted on ms.mcmaster.ca were down from 4:30 to 6:15 yesterday evening.
From 6:15 to 9:00 pm, many pages on the main math web site (the official-looking blue pages) were failing; other sites (e.g. iidda.mcmaster.ca, mathmail.mcmaster.ca) were OK, as were personal and course pages on www.math.mcmaster.ca.
From 9:00 pm yesterday to 9:45 am today, all www.math.mcmaster.ca pages were working from on campus and from VPN connections, but not from off campus. As of 9:50 am today, things were back to normal.
Delayed Mail Delivery
You may notice that some mail is arriving later than expected or in the wrong order. That's because mail which could not be delivered earlier when the server was busy or down was held upstream for a few hours before delivery was attempted again.
Lordy - Server Sorted Out
Ok - that was no fun. My clever-clever hop from one piece of hardware to another yesterday evening went from bad to worse: server performance was periodically horrible and some web sites were partially down.
We're now back to running perfectly well and normally on some borrowed hardware while I get this sorted out ... "this" being "being able to swap server hardware quickly and without significant downtime, frustration and grey hairs".
We will try the switch again in a few days - most likely Saturday afternoon.
Note that there is no worry of data or mail loss.
February 15, 2011
Server Up But with Some Web Problems
The half-hour of downtime scheduled for 4:30 this afternoon extended to nearly two hours: a theoretically routine hardware switchover wasn't. The upside is that we learned some new things about iSCSI storage arrays. The downside was ... well, two hours of downtime.
I am having a very unexpected problem with the web server: the main www.math.mcmaster.ca is failing, though other sites on the same server (mathmail.mcmaster.ca, wiki.math.mcmaster.ca, iidda.mcmaster.ca), personal sites (www.math.mcmaster.ca/matt etc.) and course sites (e.g. www.math.mcmaster.ca/S1cc3) are all fine.
Downtime This Afternoon
I'm going to take the main server off-line for about half an hour this afternoon starting at 4:30. I've been trying to keep the downtime required for this upgrade to a minimum and to off hours, but as time is pressing, we're going to have this daytime interruption.
Workstation, printing and email access will be shut off during most of this period. I will keep the web sites up for as much of the period as possible.
February 13, 2011
Systems Back Up
The downtime early Sunday afternoon lasted a little longer and was a little more disruptive than I expected: all systems served by ms were down from ca. 2:15 pm to 3:00 pm (mail and web were intermittently down between noon and 2:00 pm).
Everything is now back up.
Most workstations will probably need to be rebooted in order to work properly.
February 12, 2011
I still have a little more testing to do before finalizing some server upgrades. I will be taking services off-line between 11 am and 1 pm on Sunday. Web sites will stay up (read-only) with only very brief interruptions. Workstation, email and printer access will be down for five to 30 minutes at a time during this period.
February 11, 2011
Downtime Saturday Afternoon
I didn't finish the update work during the downtime scheduled for Thursday afternoon - nor was there any downtime to speak of. I will be taking services off-line between 3pm and 5pm on Saturday. Web sites will stay up (read-only) with only very brief interruptions. Workstation, email and printer access will be down for five to 30 minutes at a time during this period.
February 9, 2011
Downtime Thursday and Friday Mornings
I'm going to be taking services off-line for about one hour on Thursday and Friday mornings so that I can complete some server work. The Thursday outage will start at 7:30 am and will affect workstations and mail intermittently; web sites will be largely unaffected. The Friday downtime will start at 7:00 am and will affect workstations and mail; web sites will be up most of the time.
November 29, 2010
Two Power Failures Last Friday
There were two power failures last Friday morning: the first at ca. 1:15 am was brief and only took out systems which were not on battery backup. The second one, at ca. 8:50 am, lasted about ten minutes and took out most systems on battery backup as well.
My thanks to the sysadmins who helped pick up the pieces in Math & Stats while I was enjoying my holiday in blissful ignorance of the problems here on campus.
November 12, 2010
We're OK after a rough week, computerwise
We've experienced more than our share of computer woes this week: in the department, across campus, and beyond. Most things have settled down as of late Thursday evening.
In Math & Stats
Our new main server has been just fine following a shake-in period earlier in the term, but a new interim storage system failed on Wednesday and Thursday; web sites were up most of the time, but email and workstations were out of commission for several hours. We're using some borrowed file space for now.
The HH-303 printer has gone from bad to worse: I disabled printing on Tuesday while I get HP to deal with this properly. Scanning still works just fine. The HH-403 printer is the main alternative.
Campus-wide and Beyond
There was a fourteen-hour campus Internet outage which started on Monday afternoon; this was due to a cut cable some 10 km from campus. Tuesday saw problems with some UTS systems such as univmail, resulting (as I understand it) from the strain of catching up after the Internet outage.
November 11, 2010
Systems Back Up
All systems are back up as of 9 p.m. using borrowed file-storage space. We're still shuffling files around and so the server and workstations will feel a little bit slow for a few hours.
We have no reason to believe that any data was lost, and postponed mail deliveries should be finished by Friday morning.
Note that while email and workstations were down much of the afternoon and evening, the web sites have been up since 4:00 p.m. (and up and down earlier in the day).
We will be transitioning to a permanent storage space of our own next week; brief downtime periods will be announced in advance.
Almost Up
We should have everything running again some time later this evening using the borrowed file-server space. There might be some brief unannounced downtime late this evening or on Friday. There will be plenty of notice before we shift back to our own upgraded storage in a week or so.
It might be necessary to reboot your workstation: just press Alt-Ctrl-F1 and then Alt-Ctrl-Del.
Note that web sites are already up - we were able to shift those to another file server much earlier.
Server Problems Cont'd: Web up; mail & workstations down
The flaky (though new) storage server continues to be flaky and will not stay up long enough for us to copy the file updates to the fail-over storage. We have disabled logins and email for the next hour or so.
Web sites remain up using a different file server - though changes made this morning are not reflected as we are using last night's backups.
Storage-Server Problems
While our new server is stable, we are having repeated problems with a borrowed storage server: it crashed yesterday afternoon and again this morning, taking email, web sites and the workstations down with it.
As we speak, we are getting a fail-over system ready ... two, actually. Workstation and mail performance will suffer while we are copying data from the current system.
There will be brief periods of downtime without advance warning so that we can take the unreliable storage system out of play as soon as possible.
Note that you can subscribe to Computing News blog entries to keep abreast of service announcements - see the SUBSCRIBE VIA EMAIL in the right-hand column.
October 18, 2010
Server and Systems Back Up
That took longer than the planned fifteen minutes, but we are now running ms on more powerful hardware (more processors, more RAM). We will be monitoring performance.
Thanks for bearing with us.
Server & Workstation Downtime Monday Afternoon
Because the main server is struggling so badly under the new configuration (introduced last week), we are going to make a change this afternoon which we believe will provide immediate improvement.
The main server, ms, will be down for about fifteen minutes, during which time the workstations, email and www.math.mcmaster.ca will not work. The change will happen some time between 3:00 and 4:30.
It should not be necessary to reboot your linux workstation afterwards.
October 14, 2010
Servers Back Up
The ten minutes of downtime scheduled for 4:45 actually took 20 minutes, but everything's back up and we are able to proceed with some upgrades.
Servers to Go Down Briefly at 4:45 Today
The main server, ms, and the compute server anatolius will go down for five to ten minutes at 4:45 today (Thursday). Workstations, websites, email and printing will be out of commission for that time.
Sorry for the short notice - this is a part of the longer-term server-upgrade plan which we are accelerating so outages such as the one earlier today are less likely.
Server Down Briefly
The main server, ms, was locked up from approx. 2:00 to 2:10 this afternoon. The Web server was down for a further five minutes. Everything's OK now.
October 1, 2010
Mail/Workstation Downtime Friday Morning
Access to mail and workstations will be down again from 6 to 7 Friday morning.
September 28, 2010
Mail/Workstation Downtime Wednesday & Thursday Mornings
I will be moving user home directories from mathserv to our new (borrowed, really) file server on Wednesday and Thursday mornings.
Between 6:00 and 7:30 on Wednesday and Thursday mornings, the following will be unavailable:
- webmail
- access from mail clients
- ssh login (and thus pine, etc.)
- incoming mail (delivery will simply be deferred)
- workstation access
- access to network printers
Web access will not be affected except briefly for sites in ~/public_html folders (while individual folders are in the process of being moved).
September 20, 2010
Server Reboot Today at 5 pm
I will be rebooting the new main server (ms.mcmaster.ca) today at 5 pm in order to implement a performance tweak. Web and email will be unavailable for two to five minutes.
Workstations may pause during this period.
September 17, 2010
Services Back Up; Possible File Loss
One of our two file servers - one which was to be taken off-line next week - failed rather spectacularly this afternoon. About half of our home directories (starting with m - z, mostly) were unavailable from 12:50 pm to 3:00 pm.
What Was Affected
Because I've not yet recovered the failed file system and may not be able to do so anytime soon, I've reverted to last night's backups. If your home directory was on that disk, you will have lost changes/additions to your files and email from ca. 2:30 am to 12:50 pm. All users whose files might have been affected will receive email from me with more information.
When (and if) I recover the failed file system, I will make files and mail boxes updated during that period available to you.
What Was Not Affected
Note that almost all of the web site was unaffected. Email sent between 12:50 pm and 3:00 pm will not have been lost but simply queued for later delivery. MS workstations still running the previous OS were down between 12:50 pm and 1:20 pm; systems running the new OS should not have been affected (except that some users could not log in).
Server Crash
So we're going through all of the grief of the upgrade to get off of an unstable server ... and that server crashed. The problem primarily affects people whose last name starts with m - z, though other people might see problems, too; e.g.
- workstations which have not been upgraded will likely need to be rebooted
- mail will be turned off periodically
We are working on the problem and will bring things back up ASAP.
September 15, 2010
Mail & Workstation Downtime Thursday Afternoon, Friday Morning
Email and the post-doc/grad-student linux workstations will be unavailable between 5:00 pm and 6:00 pm Thursday, September 16th and between 7:00 am and 8:00 am on Friday, September 17th while we move over to the new file server. Web sites will stay up except for very brief interruptions.
Downtimes for Server Upgrades
Today: In-bound mail to @math.mcmaster.ca addresses will be paused from 3:30 pm to 4:00 pm today (Wednesday). You will still be able to access your inbox and mail folders via mail clients, pine, and web mail. If the out-bound/SMTP address of your mail client is mathmail or smtp1 (and not mail.math.mcmaster.ca or mathserv), then you will still be able to send mail, too.
Tomorrow: The workstations, email and access to home directories will be down from 4:30 pm to 6:00 pm tomorrow (Thursday). Web sites will stay up, though with very brief interruptions.
After 4:00 today, all mail/spam processing will be handled by our new, faster server. After tomorrow at 6:00, we will be using a borrowed (and faster) file server so that we can upgrade our own file servers.
September 14, 2010
Server Reboot at Noon
The new server, ms.mcmaster.ca, will be rebooted at noon. Web and email will be down for a few minutes.
August 2, 2010
Mail/Login Problem for Some Users
Some time after the file server recovered on Saturday morning, there was a hiccough with the non-crashed server accessing home directories starting with a - l. This was easily corrected, but unfortunately was not caught until after I returned from camping. As of 10:30 Monday, all home directories are accessible and mail is starting to flow to accounts starting with a - l.
July 31, 2010
Server & Services Back Up Saturday Morning
The server which failed is back up as of 8:00 AM today. No files were lost. Queued mail for people with usernames m - z is now flowing, although filtering and delivery of the thousands of queued messages (mostly spam, of course) may take an hour or so.
The source problem will not be corrected until after the long weekend and so the server must be considered unstable for the next few days.
Server-Problem Update
The failed file server is back up as of 12:45 am Saturday but it will take several hours for the file systems to recover; I am going to let that process finish and then perform backups before I bring them back on-line.
Web services will continue to work. Mail and login will be available for people with usernames starting with a - l. MS workstations as well as mail/login access for everyone else will be down until some time midmorning on Saturday.
The ultimate source of the problem is still unknown, but is related to power supplies. The file server must be considered unstable for the next few days.
July 30, 2010
Server Down Friday Evening
One of our file servers went down Friday evening. I'm not sure when it will be back up but I will have an updated notice up early Saturday morning.
July 22, 2010
Post-Downtime Update
The scheduled downtime went as planned and the servers and workstations were back on-line as of 11:05 AM. Thanks for your patience.
July 19, 2010
Downtime Thursday Morning
The Math & Stats servers will be down from 9 AM to 11 AM on Thursday, July 22nd while we install new equipment in the server room. All of the ms workstations will be down, the computation servers will be turned off, and email will be unavailable (mail sent to our server should simply be delayed). Note that a read-only version of the www.math.mcmaster.ca site will be up on a backup system during the downtime.
February 8, 2010
Servers Back on Line
We are running with two file servers again and last week's performance strain should be over. Anyone whose username begins with m-z who was logged into one of the ms workstations before 8 am today should log out and back in (or press Alt-Ctrl-Bksp) to avoid session instability.
Services Down Monday from 7 am to 8 am
As announced last week, email and workstation access are down between 7 am and 8 am this morning so that I can bring the second file server back into production.
February 5, 2010
Sluggish during opportunistic upgrade; downtime Monday morning
As the second file server was already down and we had failed over to a single server, I'm taking this opportunity to upgrade the size and speed of the server's main file system (originally planned for next month). This means that workstation and web-site performance will be sluggish until Monday morning.
Workstations and email (but not most web sites) will be down from 7 am to 8 am next Monday while I bring the second file server back into production mode.
February 1, 2010
Possible File/Mail Loss for Some Users
Accounts starting with the letters m to z have home directories on the failed server; these home directories have been recovered on the other server using backups. If your account is in this range, you may have lost mail or file changes from early Monday morning.
More specifically, mail received for these accounts and file changes made between the time of the backups (ca. 1:30 am) and the time of the server failure (ca. 2:30 am) are not reflected on the recovered home directories being used.
Once I have the time to analyze the failed server, I should be able to recover any missing messages or files.
Workstations up; mail restricted; systems slow
The workstations and most other services are up. Access to mail via imap and pop clients will be restricted at times while the server struggles to process the backlog of spam and the workstation reboots; use pine from the command line or web mail at http://mail.math.mcmaster.ca.
Because we are now running all services off of one server instead of two, the workstations and some web sites will be slow.
Preparing for Server Failover
The faulty file server is still not working properly and we are preparing to fail over to a single server. Once the home-directory mirror is updated (that should take 45 minutes), all services will come back on line.
So we expect everything to be working, albeit more slowly than usual, by 11:15.
Note that mail for usernames starting with letters a to l is back up as of 10:00 am.
Server Problem Continues
The server problem has not been solved. We are still working on a minimally disruptive solution. Email and workstations are down; most web sites are up.
Server Problem Monday Morning
One of the two main file servers was found to be having trouble at 7 am today. We are working on the problem. Mail has been turned off for now; web and workstation access will be interrupted for half an hour or so.
August 31, 2009
Compute Servers Up Post-AC-Outage
The compute servers were brought back up at 9 AM. The primary file/web/mail servers (and thus the workstations) remained up all weekend as the server-room stayed relatively cool even without the AC running.
August 29, 2009
Reminder: Servers Down August 29th & 30th
As announced earlier, the compute servers will be down from Saturday evening to Monday morning while the AC is turned off. The main servers (web, email, workstations) will stay up unless the server room starts to overheat.
August 27, 2009
Server Back Up
The unexpected downtime which started at 7:20 pm was resolved at 8:15 pm.
Unexpected Downtime Thursday Evening
The file server didn't come up properly after a routine reboot. I'm working on the problem. Most web sites are up; email and workstations are down.
Server Reboot at 6:45 pm This Evening
The main server (mathserv) will be rebooted at 6:45 pm this evening as the final part of today's maintenance. Mail, web and workstation access will be interrupted for ca. ten minutes.
August 24, 2009
Reminder: Servers Down August 29th & 30th
The compute servers will be down Saturday evening & all day Sunday (Aug. 29, 30) because the campus air-conditioning system will be turned off. I hope to be able to leave the email/web/workstation servers up, but they will be shut down if the room becomes too hot.
See the original announcement for details.
August 21, 2009
Downtime Afternoon of Thursday, August 27th
The HH/BSB workstations and email access will be down Thursday afternoon between 3 pm and 7 pm while I perform some maintenance and updates to improve performance and capacity. Web sites will be up most of the time.
July 28, 2009
Servers Down August 29th & 30th
July 16, 2009
Compute Servers Down During Power Outage
Contrary to my note yesterday, I will be shutting down the compute servers before the power outage (A/C will be off in the server room and we need to reduce the chance of overheating the room).
June 8, 2009
Workstation Interruption at Noon
The ms workstations will freeze up for about one minute shortly after noon while I make adjustments on the server. You should not need to reboot.
June 2, 2009
Workstation Access
Workstation access is still being restored; most will be ready by 10:00 am.
June 1, 2009
Partial Service Recovery; Some Data Lost
I have declared the second failed disk in the main data array officially dead after following a few false leads. Any mail received and any file changes between 4:30 am and 10:15 am are irrecoverably lost.
We are now running with the backup of the home folders on the fail-over file server (which is actually mathserv, the mail/web server).
Mail is flowing again as of 5:20 pm. Access to mail clients was opened at 5:30 pm.
Workstation access will be down until Tuesday morning.
Mail, web and workstation access may be slow on Tuesday while I get the main file server back into full service.
Web Sites Still Up During Downtime
Note that all web sites are back up after a brief interruption. Web sites under home directories (e.g. www.math.mcmaster.ca/~moylek) are available read-only from the backup server and so cannot be modified.
Servers/Systems Down Monday
It appears fairly certain that the second disk really did fail before the replacement for the first failed disk could be built into the array. The file server, workstations and email will be down all day while I replace the disks and recover files from backup.
Once recovery is under way, I will make a final attempt to recover data from the old disks which may mean that no data is lost. If that fails, any file changes or mail received between 4am and 10:15 am will be lost.
February 22, 2009
Server Reboot at 9 PM Sunday
I will be rebooting the primary file server at 9 PM Sunday in order to complete some software upgrades. Web, email and workstation access will be interrupted for ca. 10 minutes.
Everything Back on Line
All services are back on line as of 1:40 PM following the earlier crash. There was no file loss; most workstations will start working again without rebooting. Investigation continues.
File Server Problem Early Sunday Morning
The primary file server crashed early Sunday morning. I am going to keep the server off-line while I try to isolate and fix the problem. Mail, ssh and workstation logins will be down for the next few hours. Web access will stay up most of the time.
February 20, 2009
Post-Power Failure Problems
I spoke too soon: the main file server did not come up cleanly. I am working on it.
Possible Power Loss Friday Afternoon
Facility Services plans to cut the emergency power this afternoon at ca. 3:00. If this cut affects the server room (which it should not, but that didn't help us this morning), then the compute/group servers will be shut down but the primary servers will stay up and there will be no effect on email, web or workstations.
Update: Power Loss Friday Morning
The power loss in the server room this morning was related to the planned cut to emergency power in Hamilton Hall. But the server room battery-backup unit is meant to be on regular power, so this was not expected. Facility Services is investigating.
February 18, 2009
Problems with Off-Campus Access Resolved
We discovered at ca. 9 PM that mathserv was not accessible from off-campus (unless using VPN access) and had not been since ca. 2 PM. This was related to the network problem affecting workstations this morning and is fixed as of 9:30 PM.
Ok - We're Back Up (for real this time)
The primary server is OK again. If your workstation is still behaving oddly (no login, no mail, no icons), please reboot. It appears that a latent network problem, present for a few days, became a real one (possibly triggered by the file server crash).
Mathserv Problems Continue - More Reboots Possible
Problems with mathserv persist. I may need to reboot again between now and 2 pm.
Server Reboot at 12:15 to Solve Scattered, Lingering Problems
I'm going to reboot mathserv at 12:15 today in order to resolve scattered workstation problems related to the file server crash early this morning. Web, email and workstations will be down for 10 - 15 minutes. Reboot your workstation if it doesn't start behaving properly by 12:30.
... and You Can Now Login, too.
While all systems were go at 7:30 today, I neglected to turn off the logon block until 7:50. So now you can login and get mail.
Systems Back up
The main file server crashed at 1:15 am today; web sites were up again at 6:30; all services are up again as of 7:30 am. Mail is catching up quickly; we ran for 15 seconds without a spam filter, so expect a burst; most workstations should work without rebooting.
File Server Problem Early Wednesday Morning
The main file server is down. I don't know why yet, except that it has nothing to do with the upgrades (since they are being done to a different server), but I am going to look shortly.
February 16, 2009
Downtime for Upgrades During Reading Week - Correction
I'd quite forgotten that the first day of Reading Week is also Family Day, so the downtime planned for Monday and Tuesday will in fact take place on Tuesday and Wednesday.
February 10, 2009
Server Reboot - Late but Successful
The planned server reboot happened at 5:20 rather than 5:00, but was quick and otherwise successful. Thanks for enduring yet another (brief) interruption.
Server Reboot this Afternoon
I will be rebooting the primary file server at 5 pm this afternoon in order to implement some OS updates. Email and workstations will be down for ca. ten minutes.
Downtime for Upgrades During Reading Week
I will be upgrading hardware and software on the primary department server over Reading Week. Most services will be up through the upgrades, but there will be periods of five to fifteen minutes on Monday afternoon and Tuesday afternoon when email is down and the workstations do not respond; there should be only very brief web interruptions.
Server Crash Tuesday Morning
The primary file server crashed at 6 AM today and came back up after file-system repairs were complete at 9:50 AM. The primary web and mail server took collateral damage but was back up at 10:15, at which point all services were running again. Things will be slow for another couple of hours while file-system repairs are completed in the background and the mail queue filters 5000+ messages (mostly spam, of course).
January 19, 2009
OMG! What's with all of the downtime?
Funny you should ask; I was asking myself that very thing this weekend. Here's how I answered myself ....
January 18, 2009
Server and Systems Working Again
Servers and systems are all responding again as of 3:00 pm Sunday following the hardware problem this morning. The disk responsible for failures today and Dec. 30th has been replaced.
Unexpected Downtime Sunday
I was unable to finish the preventative maintenance on Friday evening and had planned to reschedule it for next week. The hardware in question decided otherwise and died this morning. All services should be up mid-afternoon; web sites are up as of 11:30.
January 17, 2009
Workstation Problem Saturday Morning
The workstations did not come up properly after the power outage and server work of Friday evening. Things were working again by noon.
January 16, 2009
Reminder: Shutdown at 4:30 pm Friday
The server and workstations are going down at 4:30 pm today. Web sites will remain readable through the upgrade (4:30 - 6:30) and power outage (6:00 - 9:00).
January 15, 2009
More About Server Disk Failure
A disk failed in one of the main file-server arrays this morning and until it is fixed the server will be under performance strain and sensitive to data loss. The files on that array are all to do with the linux workstations, not mail, web or home directories. Normally, we could fix such a failure by swapping out disks on the fly, but in this case the disk in question is the one the server boots from.
Server Disk Problem and Maintenance
A primary disk in one of the main file-server arrays failed Thursday morning. I will be shutting the file server down at 4:30 pm Friday (prior to the power outage) in order to replace the disk and perform some upgrades.
Electrical Shutdown in HH and BSB Friday Evening
Facility Services will cut power in HH and BSB between 6 pm and 9 pm Friday evening. In order to perform some server maintenance at the same time, I will be taking down the file server at 4:30 pm. Web sites hosted by the departmental server will be accessible during the downtime and power outage but not workstations, email or file-server access.
January 12, 2009
Recovery Updates
There are some hangovers resulting from the overheating and shutdowns: some mail will have bounced back to senders; mail service will be slow while a backlog of mail (mostly spam) is processed; workstations will be slow while the file server corrects some disk errors caused by the heat-related crash.
Weekend Emergency Shutdown and Recovery
As of 9:50 am today, most systems are running again; web service was restored at 9:30. The servers were shut down Saturday morning after an air-conditioning failure in the server room. We are investigating ways of preventing or ameliorating such problems.
December 18, 2008
Power Outage Thursday Evening
Facility Services has just announced that there will be a power outage this evening from 5 pm to 7 pm. The main servers and the network will remain up on backup/emergency power, so if you are outside of HH or BSB, web and email will work.
I will be shutting down non-essential servers and office Macintosh and linux systems at 4:50 pm.
December 14, 2008
Workstations Up
The workstations are able to connect to the file server as of 4:30 pm. All major services are now fully operational as far as my testing shows. Things will be slow this evening while the file systems are being rebuilt, though. Send us email if you see any problems.
Mail Services Up
Mail services are back on line and ssh logins are no longer read-only. There will be a delay with workstation access while a file-system problem is corrected.
LIMITED ACCESS DURING UPGRADE
While the primary file server is being upgraded, the following are up:
Mail delivery, webmail, and imap/pop mail are down until the file server comes back up.
Servers Going Down at 1 PM
The announced system downtime has been pushed forward a bit and the systems will go down at 1 pm. Web service will come back shortly thereafter and other systems about an hour later.
December 12, 2008
Extended Downtime for Server Upgrade Sunday Afternoon
While all systems are down due to the network upgrade this Sunday I will be upgrading hardware and software on our primary file server. The file server, email access and workstations will remain down for about an hour after the network comes back up; most web sites will be accessible immediately.
December 8, 2008
Workstation Hiccoughs
The ms-workstations went pretty much unresponsive for about two minutes mid-morning and for about ten minutes late this afternoon. These hiccoughs are related to the recent weekend crashes and my attempts to ameliorate things. You may see similar, brief problems again this week, though I am, of course, trying to keep interruptions to a minimum. Your patience as we try to sort out this server problem is appreciated.
If your workstation stops responding or gives strange errors this week, please wait five minutes before rebooting - it will very likely come back to life with all applications and windows still open.
December 7, 2008
All Systems Go
The workstations and mail are functional again as of 10:30 am (other services were up earlier or didn't go down at all).
File Server Problem Early Sunday Morning
Our primary file server face-planted early Sunday morning. Email is down but web service is restored as of 9:20 am. All services should be up by 10:00 am.
Efforts to determine the elusive cause will be intensified this week.
November 30, 2008
Server Outage on Sunday
The file server was down Sunday morning from 4:30 am to 10:00 am. Still investigating.
November 28, 2008
Mathserv Reboot at 2:30 This Afternoon
I will be rebooting mathserv at 2:30 pm today to sort out some lingering problems. Web, email and workstations will be down for about ten minutes.
Please don't reboot your workstations; they will freeze when the server goes down and should return to life when the server comes back up.
Systems Down for 30 Minutes
Systems were down for half an hour late this morning because the servers seized up. Everything is back on line as of 11:52 and we are investigating.
October 30, 2008
Power Outage Thursday Night
The power in Hamilton Hall and BSB will be out for about two hours starting at 11 pm tonight. I will be shutting down the compute servers and linux workstations at 10 pm. The workstations should boot up on their own when the power comes back on.
August 18, 2008
Power Outage Wednesday Morning
The power to Hamilton Hall and the Burke Science Building will be cut from 6:00 am to 7:30 am this Wednesday. We will shut down all RHPCS-managed linux and OS X workstations and all servers except for mathserv at 4:00 am. You should shut down - and perhaps unplug - your office systems if you manage your own.
Another Power Outage Last Thursday
Just for the record, there was a power outage at 7:30 AM on Thursday the 14th. Email was sent to department members Wednesday afternoon, which was as soon as we found out about it.
August 11, 2008
Power Outage Tomorrow - More Systems Going Down
Contrary to my earlier announcement, I will be shutting down all servers except mathserv prior to the power shutdown tomorrow morning.
August 8, 2008
Computing Updates: Internet Outage; Power Outage; more ...
This summary of recent postings to the Computing News blog was emailed to all department members this afternoon.
Power Outage Tuesday Morning
The power will be turned off in Hamilton Hall on Tuesday morning between 5 am and 7 am. Bluespruce and freesurface are not on battery backup right now and will be shut down at 4 am. We will shut linux workstations down remotely; you should turn off your office Windows or OS X workstation when you leave on Monday.
Internet Outage Friday Night
UTS advises us that the campus connection to the Internet will be down from Friday at 11pm until Saturday at 5am. Internal networks will still work, but external sites will be unavailable from campus, campus sites will be unavailable from the outside, and email will be essentially halted.
AppleTalk Access to Mathserv Disabled
AppleTalk access to mathserv is disabled because the protocol does not play well with NFS. You can use smb://mathserv to reach shares on mathserv from OS X.
Departmental Computing Updates via RSS
You can read the departmental computing updates in Apple Mail, Thunderbird and some other mail clients if you subscribe to the RSS feed at
http://www.math.mcmaster.ca/blogs/computing_news/index.rdf
If You are Leaving the Department Soon ...
.... please have a look at the So Long, Farewell... page on the Computing Resources site.
HH-403 Printer Update
The HH-403 printer is still down and will remain so until I return from vacation next week; I've not yet isolated the source of the problem.
Power Outage Tuesday Morning
The power will be turned off in Hamilton Hall on Tuesday morning between 5 am and 7 am. Bluespruce and freesurface are not on battery backup right now and will be shut down at 4 am. We will shut linux workstations down remotely; you should turn off your office Windows or OS X workstation when you leave on Monday.
August 6, 2008
Internet Outage Friday Night
UTS advises us that the campus connection to the Internet will be down from Friday at 11pm until Saturday at 5am. Internal networks will still work, but external sites will be unavailable from campus, campus sites will be unavailable from the outside, and email will be essentially halted.
July 30, 2008
Campus Power Outage
The power went out on campus from ca. 8:30 to 8:50 Wednesday evening. Mathserv and some of the compute servers coasted through on backup power; other systems and most workstations will need to be restarted on Thursday morning.
July 21, 2008
Scheduled Mathserv Downtime Wednesday Afternoon
I will be shutting mathserv down at 4:45 pm on Wednesday afternoon for a memory upgrade; it should be back up by 5:15 pm. Web, email and linux workstation access will be down for the duration.

