Recently in Downtime Category

February 23, 2013

Server Shutdowns Redux

The recent cooling problem in the Hamilton Hall server room persists, I'm afraid.  

In order to keep the temperature down, I've shut off the servers which were either idle or did not have any jobs with significant runtime on them:

  • anatolius
  • bayes
  • freesurface
  • earnserv1

These servers are still running:

  • gosset
  • inviscid
  • earnserv2

Before I go so far as to shut the above servers down, I may start pausing jobs so as to reduce the heat output.  I will let you know if your jobs are affected.
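
For anyone curious what "pausing" a job can mean in practice: on a Linux system, a running process can be suspended and later resumed with the standard SIGSTOP and SIGCONT signals. The following is only a minimal sketch under that assumption, not a description of the exact procedure used on these servers; the helper names pause_job and resume_job are hypothetical.

```python
import os
import signal


def pause_job(pid: int) -> None:
    """Suspend the process with this PID so that it stops using CPU time."""
    # SIGSTOP cannot be caught or ignored by the target process.
    os.kill(pid, signal.SIGSTOP)


def resume_job(pid: int) -> None:
    """Let a previously suspended process continue where it left off."""
    os.kill(pid, signal.SIGCONT)
```

A paused job keeps its memory and carries on from the same point once resumed; it simply does no computing in between. Signalling a process owned by another user requires administrator privileges.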

My priority is to keep ms.mcmaster.ca and the storage servers (and their many disks) healthy so that we have web, email, file and workstation services.

I will follow up with Facility Services on Monday morning.


February 22, 2013

Servers Back Up

The server room temperature has fallen since last night's shutdowns and I've powered the compute servers back up.  No particular cooling problem has been identified by Facility Services yet, but I have asked them to turn the temperature down in the server room so that we have more wiggle room, temperature-wise.

These servers were turned off last night:
  • anatolius
  • bayes
  • earnserv1
  • freesurface
  • inviscid
  • gosset
These servers remained up:
  • earnserv2
Note that email, web sites, printing, file services and workstations were not affected as ms.mcmaster.ca - the main server - was left up and running.

February 21, 2013

Server Shutdowns - update no. 1

The cooling problems in the server room persist.  I've now shut down two more compute servers:

  • gosset
  • earnserv1

Server Shutdowns

There has been a cooling problem in the Hamilton Hall server room since Wednesday afternoon, a problem exacerbated by heavy computational load (and thus heat output) on several servers.

I have already shut down compute servers which did not have any long-running jobs on them:

  • anatolius
  • bayes
  • freesurface

If the temperature climbs any higher, I may have to shut down the following (busy) systems without warning:

  • gosset
  • inviscid
  • earnserv1
  • earnserv2
My priority is to keep ms.mcmaster.ca and the storage servers (and their several disks) healthy so that we have web, email, file and workstation services.  I am still trying to get more information and an estimate of the repair time from Facility Services.

January 1, 2013

Main server down for a bit

The main departmental server, ms - which provides email, web and file services - crashed on New Year's Eve at 5 pm.  It is back up as of 9 am New Year's Day.  There's no obvious problem and we will investigate further on January 2nd.

There appears to be no file corruption or mail loss, though some mail sent after 5 pm on December 31st will have bounced back to the sender with an error message.

December 13, 2012

Anatolius Down Friday Afternoon

I will be taking anatolius down on Friday afternoon in order to upgrade some underlying software (which will yield more processing power).  It should be up again by 4 pm.

November 19, 2012

Anatolius Down for Upgrades Monday, Nov. 19

I'm taking anatolius down today in order to complete the upgrades originally scheduled for the 5th.

November 5, 2012

Anatolius Down for Upgrades on Thursday

I will be taking anatolius down on Thursday morning in order to upgrade the underlying software.  When it comes back up, it will have more processors available.  I expect anatolius to be back up later on the same day.

If this will be a great problem for you, please let me know as soon as possible.

October 12, 2012

Server Back to Normal

As of 11:00 pm, the storage array (which died this morning) has been recovered and the main file system is once again mounted on ms.mcmaster.ca: 

  • email should be responding normally;
  • files in home directories are editable;
  • workstations will allow logins.
A check of the file system revealed no errors so we don't expect that there was any file loss.

Incoming mail was only briefly interrupted; no mail should have been lost though some may have bounced.  Web sites were up except for a few periods of a few minutes.

My thanks to Todd Pfaff for diving into the XML config files of the storage array when things got weird.

The disk array from the dead storage server is now running in the fail-over server.  A priority over the weekend and early next week will be to bring the dead storage server back to life so that we have a live fail-over server again.

Server Update

The file-storage array is still down but we are making progress.

I believe that I have identified the hardware which needs to be replaced and I have the spare part, but replacing it will involve disassembling much of the server and will take several hours.

Before beginning that process, I am - very carefully - attempting to bring up the disks from the production storage array in our fail-over system.  If this is successful, then everything should be up this evening.  If it does not work, I will replace the failed part on Saturday.

Server Problems - what's happening

The storage-server problem and resulting service outages are described in this earlier message.  Here's how things look for recovery.

My first priority will be to recover the primary storage server.  My second priority is to prepare the backup storage server to take over for the primary one; unfortunately, that server was already being worked on and will need a few hours' work to finish some upgrades.

If all goes well, we should have one of the two storage servers on line and all services working by the end of the day.  It is possible that the current state (web up; email mostly up; workstations down) may continue through the weekend.

August 28, 2012

Bayes Still Down

Bayes is still offline while I work on some software problems; it will be up this afternoon, if all goes well.

Update @ noon: bayes simply will not boot at all now.  I am going to concentrate on bringing up more cores on a newer and faster box.

August 27, 2012

Server Upgrades

Before bringing up bayes and freesurface following the power outage on the weekend, I'm going to perform some hardware tests and OS upgrades.  Both systems should be up by the end of today.

August 25, 2012

Mail Flowing Again

The departmental server wasn't receiving mail after it came back up at the end of the power outage; this was corrected at 10:45 pm.

Systems Coming Up Post-Power-Outage

I powered up the main server, ms.mcmaster.ca, at 6:00 pm on Saturday following the twelve-hour power outage.  Web sites - including http://www.math.mcmaster.ca - and email were working again by 6:10.

Much of the mail sent to your @math.mcmaster.ca account will have been queued up for delivery and will have arrived by now; other messages may have timed out, in which case a message will have been sent to the sender.

I was not able to have a copy of www.math.mcmaster.ca up during the outage, which had been my plan.  Unlike past outages, this time we did not have emergency power to all the network links between the HH server room and the campus network trunk.

I will need to visit the server room to bring up the computation servers anatolius, bayes, gosset and freesurface; I plan to do that on Sunday.

August 24, 2012

Change of Plans re. Servers During Power Outage

While I had announced that I was going to leave some of the compute servers running throughout Saturday's power outage, I have decided that I will, after all, have to shut those computers down: the cooling system will be powered off and the computer room will overheat if the servers are left running without air conditioning for twelve hours.

All of the compute servers will be shut off at 10:00 pm on Friday.

Workstations will shut down at 4:00 am on Saturday.

A read-only copy of www.math.mcmaster.ca will be available Saturday morning, except perhaps for a few hours in the very early morning.


August 22, 2012

Saturday's Power Outage & Our Systems

You will likely have heard that there will be a power outage affecting Hamilton Hall and the Burke Science Building this coming Saturday from 5 am to 5 pm (see announcement from Facility Management below).   

The Math & Stats servers and other systems will be affected as follows ...

  • the network in Hamilton Hall is on emergency power and will remain up; MacSecure wireless might well be available, too;
  • linux and Macintosh workstations managed by RHPCS will shut down automatically at 4 am;
  • the main server has limited battery backup and will remain up until about 9 am on Saturday;
  • these servers will be shut down on Friday at 10 pm
    • anatolius
    • inviscid
    • earnserv1
    • earnserv2
  • these servers will remain up but will not have access to home directories once the main server is shut down and jobs running from /home may fail (jobs running from the local disk - i.e. /1/home or /scratch - will be OK);
    • bayes
    • gosset
    • freesurface
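
A quick way to tell which side of that line a job falls on is to look at the directory it runs from. The sketch below is illustrative only: the path prefixes come from the list above, while the function name and the example user directories are made up, and it compares path strings without resolving symbolic links.

```python
from pathlib import PurePosixPath

# From the announcement: /home is served by the main server, while
# /1/home and /scratch are on each compute server's local disk.
LOCAL_ROOTS = (PurePosixPath("/1/home"), PurePosixPath("/scratch"))
NETWORK_ROOT = PurePosixPath("/home")


def _is_under(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if path is root itself or lies somewhere below it."""
    return path == root or root in path.parents


def survives_main_server_shutdown(job_dir: str) -> bool:
    """Return True for a local-disk directory, False for one under /home."""
    path = PurePosixPath(job_dir)
    if any(_is_under(path, root) for root in LOCAL_ROOTS):
        return True
    if _is_under(path, NETWORK_ROOT):
        return False
    # Anything else is outside what the announcement covers.
    raise ValueError(f"cannot classify {job_dir!r}")


# Hypothetical job directories.
print(survives_main_server_shutdown("/scratch/alice/run1"))
print(survives_main_server_shutdown("/home/alice/sim"))
```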
Web Sites

I will have a read-only version of the web sites hosted by ms - including www.math.mcmaster.ca - available via another server once ms is shut down.

Email

Email will not be available once the main server is shut down on Saturday morning; in-bound mail will be queued for delivery once ms comes back on line.


Restarting Systems

I plan to restart the main server as well as anatolius, inviscid, earnserv1 and earnserv2 once the power comes back on.

Following is the original announcement from Facility Services.

Continue reading Saturday's Power Outage & Our Systems.

January 26, 2012

Bayes Still Down

Bayes is still down while I figure out just where the cooling failure is.  

December 20, 2011

Brief Downtime Wednesday Morning

I will be shutting down mail, web and file server access to the main server for about five minutes on Wednesday morning at 8 am in order to correct a small problem.

Partial Service Interruption at 4:40 pm Today

We had read-only access to the main disk array for about ten minutes this afternoon, starting at about 4:40.  This was followed by ca. one minute during which the web folders and home directories were not accessible at all.

Email was not being delivered during this period, nor could one write to files or mailboxes.  Workstation users may have seen browsers or apps crash.

As of 4:52, all is well again.


The problem was the result of a configuration tweak which didn't go as planned; my apologies for any frustration.

November 16, 2011

More about email during downtime

The server was not able to accept email from Sunday at about 8 pm until Wednesday at 10 am* (though the web sites were up most of that time and existing messages were readable).  Senders will have received messages saying that mail was not deliverable; mail will not simply have vanished.

Some mail sent in the past day or so may still make it to you if it has been queued for retransmission up stream.

Mail which arrived between Sunday at 3 am and Sunday at 8 pm was not in the backups and is not recovered yet.  I pulled as much of that mail as I could from the damaged file system and will make it available later.

* This is pretty much unprecedented and avoiding this in the future will mean making some changes to our system so that storage problems (which are inevitable) don't take mail out of service for so long.  


Mail is up Wednesday morning; other files to come

Mail is flowing as of 10:09 am today, once we worked out hardware and software complications - hydra-like, as they tend to be sometimes - at about 2 am and then recovered files from backup.

The backups are from Sunday morning at 3 am.  We have recovered some of the changes between 3 am and 8 pm from the broken file system; I will make these available to you later.

Home directories contain only mail and web files right now; other files will follow.

You can get to backups of your files using Windows networking or sftp via the HomeDirectoryBackup link you'll find at the top of your home directory.

You will be able to login to workstations again later on today once the rest of the data is recovered.  You can continue to use the tempuser account for now; Alt-F2 -> smb://ms/ will get you to your files.

I'll be posting some forensic information and remedial plans later for those who are interested.


November 15, 2011

Actually, file array looks bad after all

Despite my optimism of an hour ago, it appears that the file system on the storage array is corrupted and we will have to restore from Sunday's backups.  The restore will take more than a day to complete, but some people will be able to receive mail and use their workstations early Tuesday morning.

I'm going to let some checks run until morning, recover any changes dating from between the backups and the crash which I can, and then begin.

For now ...
  • web sites remain up
  • workstations can be used with the tempuser account
  • mail forwarding works (for those who had forwarding set)
  • home directories are accessible via sftp and Windows file sharing
  • printing works from ms and from the tempuser accounts on the workstations

File array recovered, but ...

We've managed to recover the main file array (the 4.5 Tb file system which holds mail, web sites and home directories, among other things).  But there may be errors and I don't want to bring it back on line until I'm sure that there is no significant file corruption.

Web sites will remain up but logins and mail will be disabled for the next few hours.

November 14, 2011

File array on its way back up


After an awful lot of reading and frowning - a few hours of unexpected screwdriver work - I've got the damaged storage array reassembling itself without any data loss.

I should be able to bring the array on line this evening.  At that point ...
  • new mail will start flowing again
  • workstations will work

And a reminder of what happened ...
  • one disk failed at about 6pm on Sunday; this did not affect any services
  • a second disk failed at about 8pm that night; this took the storage array off line
... and what the status of services has been ...
  • web sites were back up at 10:15 pm on Sunday and have been up since
  • most workstations are usable for web and printing via the tempuser account (contact me or Sheree for the password)
  • mail from before Sunday at 3:00 am is available via
    • http://mathmail.mcmaster.ca
    • imap/pop mail clients
  • home directory files are accessible read-only via Windows file sharing
    • smb://ms.mcmaster.ca (in OS X and linux)
    • \\ms.mcmaster.ca\ (in Windows)

November 13, 2011

New Server Problem Sunday Night

The main storage array is off line as of Sunday at 8:10 pm due to a disk problem. As of 11:15 pm, web sites are on line via backups.  Email & home directory access are not available and won't be until some time Monday.

Printing to the network printers should still work.

 The problem is not directly related to the power problem of Friday evening, but may well be an indirect consequence. I will post updates here.

Back to Normal Sunday Afternoon

I moved the main storage array onto the new battery backup unit at 5pm today; web, email, workstations and printing were paused for ten minutes during the move. All systems were back on line as of 5:12pm.

The new battery-backup unit - put in place on Saturday - should prevent a power problem such as we encountered on Friday evening.

November 12, 2011

Bayes, freesurface back up

Bayes and freesurface are back on line as of 2:00 pm today; gosset needs some attention, which it will get on Monday.

Shutdown Sunday Afternoon

The main departmental server will be down for about half an hour on Sunday afternoon starting at 4:00 so that I can replace a faulty battery-backup unit: web, email and workstation access will be down for that period.

Once power is restored, I will be able to bring the compute servers bayes, freesurface and gosset back on line.

November 11, 2011

System Problems Friday Evening

We had an unfortunate confluence of power and system problems on Friday afternoon and web, email and workstations were down from 4pm to 6:45 pm. 

There is no damage to the main file system and there is no reason to believe any files were lost.  Any mail sent to @math.mcmaster.ca addresses will have been queued for retransmission; there is now a steady flow of mail coming in, though it will be a few hours before all delayed mail is retransmitted.

Some of the workstations came back to life when the file and storage servers came back on line at 6:45 but others will need to be power cycled (i.e. hold down the power button for 10s to turn off, then turn the power back on).

Please let me know if you find any on-going problems, of course.

Due to the power problems, the compute servers bayes, gosset and freesurface will be powered off for the weekend.

I will be scheduling downtime for Monday evening in order to rectify the power problems and bring the rest of the compute servers up.

November 8, 2011

Bayes down for maintenance Monday afternoon

I will be taking bayes (the general-purpose Stats compute server) down on the afternoon of Monday, November 14th in order to check a hardware problem and upgrade the operating system.

Bayes will probably be available again by Monday evening.

If this schedule poses a great problem for you, please let me know by Thursday, November 10th.

August 24, 2011

File-server Problem Wednesday Afternoon

We had a problem with the connection between the main file/web/mail server (ms.mcmaster.ca) and the primary storage array this afternoon.  File access - and thus all other kinds of access - became spotty at 3:46 pm, then became read-only and then was lost altogether.  Things are OK again as of 4:15 pm.

There is no reason to believe that any files or mail messages were lost.

May 3, 2011

Service Interruption Wednesday Morning

Facility Services will be testing emergency power in Hamilton Hall at 7:30 am on Wednesday May 4th.  This will affect the UPS (i.e. battery-backup unit) which gave us trouble on May 2nd and April 15th - and we can't trust this unit to coast through the power interruption.

I'm going to perform a prophylactic shutdown of the systems on that UPS unit at 7:25 am.  Everything should be up again at 7:35 am.

The main server will not be shut down, but mail and web will be unavailable because the main file array will be shut down.

Note the faulty UPS will be replaced in the next week or so; the replacement will require about 30 minutes of downtime.

May 2, 2011

Power Failure Monday Morning

We had a server-room power failure this morning - web, email and workstations were unavailable between 8:37 AM and 9:15 AM.   Just as was the case on April 15th, the main server wasn't down but the main storage array was.

This second failure confirms that we have a problem with one of our battery-backup units; I will be replacing it ASAP.

April 15, 2011

Possible Service Interruptions

We are still trying to pinpoint the source of the partial power failure in the server room earlier this afternoon.  We know that something went wrong with a UPS unit which has served us faithfully for six years now, but we don't know precisely what.

Depending on what we find, we may need to shut down the storage array and compute servers with very little notice.   And a similar power failure might be possible, too.

There may be loss of access to home directories and interruptions to mail and web service with little or (should the power fail) no notice.  So: save early and save often.  I'll post an update once we know that things are stable again.



Server/Power Problem Friday Afternoon

We lost power to part of the Hamilton Hall server room on Friday afternoon just before 3:00 pm.  The main server wasn't affected, but the main storage array was, which means that mail, web and workstations were unavailable until the problem was corrected.  Web sites were back up by half past three, but other services were spotty until about four o'clock.

There was no damage to the files on the storage array, though some mail may have been returned to senders as undeliverable.

Most workstations will need to be rebooted (Alt-Ctrl-F1 then Alt-Ctrl-Del); some will need to be restarted (hold power button for ten seconds to turn off then turn back on).

Any jobs running on bayes, gosset or freesurface will have been lost as those servers were connected to the part of the power system which failed.

April 8, 2011

Partial Shutdown on Saturday, April 30th

Facility Services has announced that air conditioning to Hamilton Hall will be turned off from 6:00 am to 4:30 pm on Saturday, April 30th.   In order to prevent damage from overheating, we will be shutting down most systems in the server room on Friday afternoon: this means bayes, gosset, freesurface, etc.



I will leave the main file/web/mail server up, but if the room starts getting too hot I will shut down everything but web services (no email, no workstations, no changes to the web server).


Announcement from FS follows...

Continue reading Partial Shutdown on Saturday, April 30th.

April 6, 2011

Post-Downtime Update

Our main server (ms.mcmaster.ca) is now back in HH after a few months in the ABB server room and is using a new, larger disk array, also in HH (we were borrowing space in ABB while ms was there).  Thanks for your patience as we completed another part of the migration to new server infrastructure.

A few notes regarding the downtime and recovery ...

  • contrary to my plan, the xguest login on the ms workstations did not work
  • the downtime extended to 8:45 pm instead of 7:00 pm
  • the web server was down from 4:55 pm to 5:20 pm
    • the main page (and other database-driven pages) were down for another hour
    • other sites (e.g. course and instructor pages) were OK
  • most workstations are working fine as of 9 o'clock Wednesday morning, but a few will need to be rebooted; if your workstation is frozen
    • hold down the power button for ten seconds to turn it off
    • wait five seconds
    • turn it back on
    • note that the boot may take five minutes or so while the disk is checked for errors

April 5, 2011

Mail Hiccoughs

Some of you will be having trouble getting to your mail via imap clients or webmail until later on this evening: I neglected to redirect mathmail.mcmaster.ca to the new network location of the server. My apologies.

Note that anyone using the addresses mail.math.mcmaster.ca or ms.mcmaster.ca won't see these problems - though mathmail.mcmaster.ca is the preferred address.

Downtime Extended but Web Sites Up

The scheduled downtime is not quite over.  Websites and printing are up but email and workstations are still down while I finish some work on the storage array.

Things should be working again at ca. 8:30 pm.  My apologies for the delay.

Note that all web sites hosted by ms.mcmaster.ca were down from 4:55 pm to 5:20 pm (contrary to my announced intention of limiting downtime to a few seconds).  The www.math.mcmaster.ca main page and other database-dependent pages were generating errors until 6:30 due to a network problem introduced during the server move.  Other parts of the site (i.e. most course and instructor pages) were fine.

Downtime This Evening - Changes

The scheduled downtime from 4:45 pm to 7:00 pm this evening will proceed as planned but with these differences ...

  1. the www.math.mcmaster.ca web site will not be down for more than a few seconds (though you won't be able to make changes during this period)
  2. limited-use guest accounts will be available on most workstations
Once the upgrades begin at 4:45 today, you will not be able to get to your email, use your linux account on your workstation or print.  But if you logout and login with the username xguest, you should find that you are able to use a browser (if firefox doesn't work, use Chromium).

All services should be on-line again by 7:00 pm; some (e.g. printing) will come up sooner.

April 4, 2011

Downtime Tuesday Afternoon

We will be moving our main server and storage array from their temporary berth in ABB back to our HH server room on Tuesday. Email, workstation and home-directory access will be down from 4:45 pm to 7:00 pm on Tuesday; web sites will be down from ca. 6:50 pm to 7:00 pm.

If all goes well, I should have the linux workstations set up so that you can login and run a browser without logging into the server; you won't be able to read your mail @math.mcmaster.ca or get to your files, though.


March 29, 2011

Servers Were Offline for ca. 3 hrs Monday Evening

A few servers were off-line from 5:45 pm to 9:05 pm Monday, March 28th due to a network-configuration problem introduced by some new equipment installed that afternoon.  The affected systems were 

  • freesurface
  • gosset
  • webwork
  • earnserv2
RHPCS and UTS are sorting the problem out.  These servers might lose their network connections momentarily later on today.

March 18, 2011

Downtime 4:30 pm - 5:00 pm Monday, March 28th

The main server and all services (web, mail, workstation) will be down from 4:30 pm to 5:00 pm on Monday, March 28th while I move the hardware to a new location.

February 24, 2011

Systems Up Again Following Scheduled Downtime

We're up and running again as of 5:30 pm - which means that we were down for 90 minutes instead of the announced 30 minutes. While we had the system off-line, we moved to a larger storage system. So we're now running with more than twice the storage, double the RAM and twelve CPUs instead of eight.

While email and workstations were down for the entire period, web sites were up and down a few times - I had them up and running whenever I could safely do so.

February 23, 2011

Downtime Thursday Afternoon

The main server will go down at 4:00 pm on Thursday. The server itself should be up again almost immediately but it may take up to half an hour for all services to resume (mail, web, workstations, etc.).

February 17, 2011

Server Problems - sort of

The ms server and the workstations have been agonizingly slow (at best) since about 8:15 this morning. A disk on our main storage array failed and the array was hobbled (in "degraded mode", for those who follow these sorts of things). We do not yet know why performance was as miserable as it was - it should have been poor, not horrible.

There were three interruptions of five to ten minutes as I sought the cause of the problem - working on the invalid assumption that it was our server again.

The disk has been replaced and the storage array is rebuilding itself. Performance is going to be poor until the rebuild is complete.

Workstations may need to be rebooted if they have got confused over the state of the links to the home directories (though I have forced a refresh remotely on all systems which were responding).

February 16, 2011

More About Web Sites During Server Problems

I stated in an earlier post today that "some web sites were partially down". I've had some questions about what that means, precisely.

All web sites hosted on ms.mcmaster.ca were down from 4:30 to 6:15 yesterday evening.

From 6:15 to 9:00 pm, many pages on the main math web site (the official-looking blue pages) were failing; other sites (e.g. iidda.mcmaster.ca, mathmail.mcmaster.ca) were OK, as were personal and course pages on www.math.mcmaster.ca.

From 9:00 pm yesterday to 9:45 am today, all www.math.mcmaster.ca pages were working from on campus and from VPN connections, but not from off campus. As of 9:50 am today, things were back to normal.

Delayed Mail Delivery

You may notice that some mail is arriving later than expected or in the wrong order. That's because mail which could not be delivered earlier when the server was busy or down was held upstream for a few hours before delivery was attempted again.
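
To see why retries can shuffle the order, here is a small illustration. It is a toy model with made-up numbers: a single fixed retry interval and a single outage window are assumptions chosen for the example (real mail servers use their own retry schedules), and the function name is hypothetical.

```python
def delivery_time(sent: float, down_from: float, down_until: float,
                  retry_after: float) -> float:
    """Return the hour at which a message sent at `sent` finally arrives.

    The server refuses mail during the interval [down_from, down_until).
    A refused message is held upstream and retried every `retry_after` hours.
    """
    attempt = sent
    while down_from <= attempt < down_until:
        attempt += retry_after
    return attempt


# Assumed scenario: server down from 16:00 to 18:00, retries every 4 hours.
first = delivery_time(17.0, 16.0, 18.0, 4.0)   # sent during the outage
second = delivery_time(18.5, 16.0, 18.0, 4.0)  # sent after the outage
print(first, second)  # → 21.0 18.5
```

The message sent at 17:00 is refused and not retried until 21:00, so it lands after the one sent at 18:30.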

Lordy - Server Sorted Out

Ok - that was no fun. My clever-clever hop from one piece of hardware to another yesterday evening went from bad to worse: server performance was periodically horrible and some web sites were partially down.

We're now back to running perfectly well and normally on some borrowed hardware while I get this sorted out ... "this" being "being able to swap server hardware quickly and without significant downtime, frustration and grey hairs".

We will try the switch again in a few days - most likely Saturday afternoon.

Note that there is no worry of data or mail loss.

February 15, 2011

Server Up But with Some Web Problems

The half-hour of downtime scheduled for 4:30 this afternoon extended to nearly two hours: a theoretically routine hardware switchover wasn't. The upside is that we learned some new things about iSCSI storage arrays. The downside was ... well, two hours of downtime.

I am having a very unexpected problem with the web server: the main www.math.mcmaster.ca is failing, though other sites on the same server (mathmail.mcmaster.ca, wiki.math.mcmaster.ca, iidda.mcmaster.ca), personal sites (www.math.mcmaster.ca/matt etc.) and course sites (e.g. www.math.mcmaster.ca/S1cc3) are all fine.

Downtime This Afternoon

I'm going to take the main server off-line for about half an hour this afternoon starting at 4:30. I've been trying to keep the downtime required for this upgrade to a minimum and to off hours, but as time is pressing, we're going to have this daytime interruption.

Workstation, printing and email access will be shut off during most of this period. I will keep the web sites up for as much of the period as possible.

February 13, 2011

Systems Back Up

The downtime early Sunday afternoon lasted a little longer than I expected and was a little downer than I expected: all systems served by ms were down from ca. 2:15 pm to 3:00 pm (mail and web were intermittently down between noon and 2:00 pm).

Everything is now back up.

Most workstations will probably need to be rebooted in order to work properly.

February 12, 2011

I still have a little more testing to do before finalizing some server upgrades. I will be taking services off-line between 11 am and 1 pm on Sunday. Web sites will stay up (read-only) with only very brief interruptions. Workstation, email and printer access will be down for five to 30 minutes at a time during this period.

February 11, 2011

Downtime Saturday Afternoon

I didn't finish the update work during the downtime scheduled for Thursday afternoon - nor was there any downtime to speak of. I will be taking services off-line between 3pm and 5pm on Saturday. Web sites will stay up (read-only) with only very brief interruptions. Workstation, email and printer access will be down for five to 30 minutes at a time during this period.

February 9, 2011

Downtime Thursday Morning and Evening

I'm going to be taking services off-line for about one hour on Thursday and Friday mornings so that I can complete some server work. The Thursday outage will start at 7:30 am and will affect workstations and mail intermittently; web sites will be largely unaffected. The Friday downtime will start at 7:00 am and will affect workstations and mail; web sites will be up most of the time.

November 29, 2010

Two Power Failures Last Friday

There were two power failures last Friday morning: the first at ca. 1:15 am was brief and only took out systems which were not on battery backup. The second one, at ca. 8:50 am, lasted about ten minutes and took out most systems on battery backup as well.

My thanks to the sysadmins who helped pick up the pieces in Math & Stats while I was enjoying my holiday in blissful ignorance of the problems here on campus.

November 12, 2010

We're OK after a rough week, computerwise

We've experienced more than our share of computer woes this week: in the department, across campus, and beyond. Most things have settled down as of late Thursday evening.

In Math & Stats

Our new main server has been just fine following a shake-in period earlier in the term, but a new interim storage system failed on Wednesday and Thursday; web sites were up most of the time, but email and workstations were out of commission for several hours. We're using some borrowed file space for now.

The HH-303 printer has gone from bad to worse: I disabled printing on Tuesday while I get HP to deal with this properly. Scanning still works just fine. The HH-403 printer is the main alternative.

Campus-wide and Beyond

There was a fourteen-hour campus Internet outage which started on Monday afternoon - this was due to a cut cable some 10 km from campus.

Tuesday saw problems with some UTS systems such as univmail, resulting (as I understand it) from the strain of catching up after the Internet outage.

November 11, 2010

Systems Back Up

All systems are back up as of 9 p.m. using borrowed file-storage space. We're still shuffling files around and so the server and workstations will feel a little bit slow for a few hours.

We have no reason to believe that any data was lost, and postponed mail deliveries should be finished come Friday morning.

Note that while email and workstations were down much of the afternoon and evening, the web sites have been up since 4:00 p.m. (and up and down earlier in the day).

We will be transitioning to a permanent storage space of our own next week; brief downtime periods will be announced in advance.

Almost Up

We should have everything running again some time later this evening using the borrowed file-server space. There might be some brief unannounced downtime late this evening or Friday. There will be plenty of notice before we shift back to our own upgraded storage in a week or so.

It might be necessary to reboot your workstation: just press Alt-Ctrl-F1 and then Alt-Ctrl-Del.

Note that web sites are already up - we were able to shift those to another file server much earlier.

Server Problems Cont'd: Web up; mail & workstations down

The flakey (though new) storage server continues to be flakey and will not stay up long enough for us to get the file updates to the fail-over storage. We have disabled logins and email for the next hour or so.

Web sites remain up using a different file server - though changes made this morning are not reflected as we are using last night's backups.

Storage-Server Problems

While our new server is stable, we are having repeated problems with a borrowed storage server: it crashed yesterday afternoon and again this morning, taking email, web sites and the workstations down with it.

As we speak, we are getting a fail-over system ready ... two, actually. Workstation and mail performance will suffer while we are copying data from the current system.

There will be brief periods of downtime without advance warning so that we can take the unreliable storage system out of play as soon as possible.

Note that you can subscribe to Computing News blog entries to keep abreast of service announcements - see the SUBSCRIBE VIA EMAIL in the right-hand column.

October 18, 2010

Server and Systems Back Up

That took longer than the planned fifteen minutes, but we are now running ms on more powerful hardware (more processors, more RAM). We will be monitoring performance.

Thanks for bearing with us.

Server & Workstation Downtime Monday Afternoon

Because the main server is struggling so under the new configuration (introduced last week), we are going to make a change this afternoon which we believe will provide immediate improvement.

The main server, ms, will be down for about fifteen minutes, during which time the workstations, email and www.math.mcmaster.ca will not work. The change will happen some time between 3:00 and 4:30.

It should not be necessary to reboot your linux workstation afterwards.

October 14, 2010

Servers Back Up

The ten minutes of downtime scheduled for 4:45 actually took 20 minutes - but everything's back up and we are able to proceed with some upgrades.

Servers to Go Down Briefly at 4:45 Today

The main server, ms, and the compute server anatolius will go down for five to ten minutes at 4:45 today (Thursday). Workstations, websites, email and printing will be out of commission for that time.

Sorry for the short notice - this is a part of the longer-term server-upgrade plan which we are accelerating so outages such as the one earlier today are less likely.

Server Down Briefly

The main server, ms, was locked up from approx. 2:00 to 2:10 this afternoon. The Web server was down for a further five minutes. Everything's OK now.

October 1, 2010

Mail/Workstation Downtime Friday Morning

Access to mail and workstations will be down again from 6 to 7 Friday morning.

September 28, 2010

Mail/Workstation Downtime Wednesday & Thursday Mornings

I will be moving user home directories from mathserv to our new (borrowed, really) file server on Wednesday and Thursday mornings.

Between 6:00 and 7:30 on Wednesday and Thursday mornings, the following will be unavailable:


  • webmail

  • access from mail clients

  • ssh login (and thus pine, etc.)

  • incoming mail (delivery will simply be deferred)

  • workstation access

  • access to network printers

Web access will not be affected except briefly for sites in ~/public_html folders (while individual folders are in the process of being moved).

September 20, 2010

Server Reboot Today at 5 pm

I will be rebooting the new main server (ms.mcmaster.ca) today at 5 pm in order to implement a performance tweak. Web and email will be unavailable for two to five minutes.

Workstations may pause during this period.

September 17, 2010

Services Back Up; Possible File Loss

One of our two file servers - one which was to be taken off-line next week - failed rather spectacularly this afternoon. About half of our home directories (starting with m - z, mostly) were unavailable from 12:50 pm to 3:00 pm.

What Was Affected

Because I've not yet recovered the failed file system and may not be able to do so anytime soon, I've reverted to last night's backups. If your home directory was on that disk, you will have lost changes/additions to your files and email from ca. 2:30 am to 12:50 pm.

All users whose files might have been affected will receive email from me with more information.

When (and if) I recover the failed file system, I will make files and mail boxes updated during that period available to you.

What Was Not Affected

Note that almost all of the web site was unaffected. Email sent between 12:50 pm and 3:00 pm will not have been lost but simply queued for later delivery. MS workstations still running the previous OS were down between 12:50 pm and 1:20 pm; systems running the new OS should not have been affected (except that some users could not login).

Server Crash

So we're going through all of the grief of the upgrade to get off of an unstable server ... and that server crashed. The problem primarily affects people whose last name starts with m - z, though other people might see problems, too; e.g.


  • workstations which have not been upgraded will likely need to be rebooted

  • mail will be turned off periodically


We are working on the problem and will bring things back up ASAP.

September 15, 2010

Mail & Workstation Downtime Thursday Afternoon, Friday Morning

Email and the post-doc/grad-student linux workstations will be unavailable between 5:00 pm and 6:00 pm Thursday, September 16th and between 7:00 am and 8:00 am on Friday, September 17th while we move over to the new file server. Web sites will stay up except for very brief interruptions.

Downtimes for Server Upgrades

Today: In-bound mail to @math.mcmaster.ca addresses will be paused from 3:30 pm to 4:00 pm today (Wednesday). You will still be able to access your inbox and mail folders via mail clients, pine, and web mail. If the out-bound/SMTP address of your mail client is mathmail or smtp1 (and not mail.math.mcmaster.ca or mathserv), then you will still be able to send mail, too.

Tomorrow: The workstations, email and access to home directories will be down from 4:30 pm to 6:00 pm tomorrow (Thursday). Web sites will still be up, though with very brief interruptions.

After 4:00 today, all mail/spam processing will be handled by our new, faster server. After tomorrow at 6:00, we will be using a borrowed (and faster) file server so that we can upgrade our own file servers.

September 14, 2010

Server Reboot at Noon

The new server, ms.mcmaster.ca, will be rebooted at noon. Web and email will be down for a few minutes.

August 2, 2010

Mail/Login Problem for Some Users

Some time after the file server recovered on Saturday morning, there was a hiccough with the non-crashed server accessing home directories starting with a - l. This was easily corrected, but unfortunately was not caught until after I returned from camping. As of 10:30 Monday, all home directories are accessible and mail is starting to flow to accounts starting with a - l.

July 31, 2010

Server & Services Back Up Saturday Morning

The server which failed is back up as of 8:00 AM today. No files were lost. Queued mail for people with usernames m - z is now flowing, although filtering and delivery of the thousands of queued messages (mostly spam, of course) may take an hour or so.

The source problem will not be corrected until after the long weekend and so the server must be considered unstable for the next few days.

Server-Problem Update

The failed file server is back up as of 12:45 am Saturday but it will take several hours for the file systems to recover; I am going to let that process finish and then perform backups before I bring them back on-line.

Web services will continue to work. Mail and login will be available for people with usernames starting with a - l. MS workstations as well as mail/login access for everyone else will be down until some time midmorning on Saturday.

The ultimate source of the problem is still unknown, but is related to power supplies. The file server must be considered unstable for the next few days.

July 30, 2010

Server Down Friday Evening

One of our file servers went down Friday evening. I'm not sure when it will be back up but I will have an updated notice up early Saturday morning.

July 22, 2010

Post-Downtime Update

The scheduled downtime went as planned and the servers and workstations were back on-line as of 11:05 AM. Thanks for your patience.

July 19, 2010

Downtime Thursday Morning

The Math & Stats servers will be down from 9 AM - 11 AM on Thursday, July 22nd while we install new equipment in the server room. All of the ms workstations will be down, the computation servers will be turned off, and email will be unavailable (mail sent to our server should simply be delayed). Note that a read-only version of the www.math.mcmaster site will be up on a backup system during the downtime.

February 8, 2010

Servers Back on Line

We are running with two file servers again and last week's performance strain should be over. Anyone whose username begins with m-z who was logged into one of the ms workstations before 8 am today should log out and back in (or press Alt-Ctrl-Bksp) to avoid session instability.

Services Down Monday from 7 am to 8 am

As announced last week, email and workstation access are down between 7 am and 8 am this morning so that I can bring the second file server back into production.

February 5, 2010

Sluggish during opportunistic upgrade; downtime Monday morning

As the second file server was already down and we have failed over to a single server, I'm taking this opportunity to upgrade the size and speed of the server's main file system (originally planned for next month). This means that workstation and web-site performance will be sluggish until Monday morning.

Workstations and email (but not most web sites) will be down from 7 am to 8 am next Monday while I bring the second file server back into production mode.

February 1, 2010

Possible File/Mail Loss for Some Users

Accounts starting with the letters m to z have home directories on the failed server; these home directories have been recovered on the other server using backups. If your account is in this range, you may have lost mail or file changes from early Monday morning.

More specifically, mail received for these accounts and file changes made between the time of the backups (ca. 1:30 am) and the time of the server failure (ca. 2:30 am) are not reflected on the recovered home directories being used.

Once I have the time to analyze the failed server, I should be able to recover any missing messages or files.

Workstations up; mail restricted; systems slow

The workstations and most other services are up. Access to mail via imap and pop clients will be restricted at times while the server struggles to process the backlog of spam and the workstation reboots; use pine from the command line or web mail at http://mail.math.mcmaster.ca.

Because we are now running all services off of one server instead of two, the workstations and some web sites will be slow.

Preparing for Server Failover

The faulty file server is still not working properly and we are preparing to failover to a single server. Once the home-directory mirror is updated - that should take 45 minutes - all services will come back on line.

So we expect everything to be working, albeit more slowly than usual, by 11:15.

Note that mail for usernames starting with letters a to l is back up as of 10:00 am.

Server Problem Continues

The server problem has not been solved. We are still working on a minimally disruptive solution. Email and workstations are down; most web sites are up.

Server Problem Monday Morning

One of the two main file servers was found to be having trouble at 7 am today. We are working on the problem. Mail has been turned off for now; web and workstation access will be interrupted for half an hour or so.

August 31, 2009

Compute Servers Up Post-AC-Outage

The compute servers were brought back up at 9 AM. The primary file/web/mail servers (and thus the workstations) remained up all weekend as the server-room stayed relatively cool even without the AC running.

August 29, 2009

Reminder: Servers Down August 29th & 30th

As announced earlier, the compute servers will be down from Saturday evening to Monday morning while the AC is turned off. The main servers (web, email, workstations) will stay up unless the server room starts to overheat.

August 27, 2009

Server Back Up

The unexpected downtime which started at 7:20 pm was resolved at 8:15 pm.

Unexpected Downtime Thursday Evening

The file server didn't come up properly after a routine reboot. I'm working on the problem. Most web sites are up; email and workstations are down.

Server Reboot at 6:45 pm This Evening

The main server (mathserv) will be rebooted at 6:45 pm this evening as the final part of today's maintenance. Mail, web and workstation access will be interrupted for ca. ten minutes.

August 24, 2009

Reminder: Servers Down August 29th & 30th

The compute servers will be down Saturday evening & all day Sunday (Aug. 29, 30) because the campus air-conditioning system will be turned off. I hope to be able to leave the email/web/workstation servers up, but they will be shut down if the room becomes too hot.
See the original announcement for details.

August 21, 2009

Downtime Afternoon of Thursday, August 27th

The HH/BSB workstations and email access will be down Thursday afternoon between 3 pm and 7 pm while I perform some maintenance and updates to improve performance and capacity. Web sites will be up most of the time.

July 28, 2009

Servers Down August 29th & 30th

The compute servers will be down Saturday evening & all day Sunday (Aug. 29, 30) because the campus air-conditioning system will be turned off. I hope to be able to leave the email/web/workstation servers up, but they will be shut down if the room gets too hot.

July 16, 2009

Compute Servers Down During Power Outage

Contrary to my note yesterday, I will be shutting down the compute servers before the power outage (A/C will be off in the server room and we need to reduce the chance of overheating the room).

June 8, 2009

Workstation Interruption at Noon

The ms workstations will freeze up for about one minute shortly after noon while I make adjustments on the server. You should not need to reboot.

June 2, 2009

Workstation Access

Workstation access is still being restored; most will be ready by 10:00 am.

June 1, 2009

Partial Service Recovery; Some Data Lost

I have declared the second failed disk in the main data array officially dead after following a few false leads. Any mail received and any file changes between 4:30 am and 10:15 am are irrecoverably lost.

We are now running with the backup of the home folders on the fail-over file server (which is actually mathserv, the mail/web server).

Mail is flowing again as of 5:20 pm. Access to mail clients was opened at 5:30 pm.

Workstation access will be down until Tuesday morning.

Mail, web and workstation access may be slow Tuesday while I get the main file server into full service.

Web Sites Still Up During Downtime

Note that all web sites are back up after a brief interruption. Web sites under home directories (e.g. www.math.mcmaster.ca/~moylek) are available read-only from the backup server and so cannot be modified.

Servers/Systems Down Monday

It appears fairly certain that the second disk really did fail before the replacement for the first failed disk could be built into the array. The file server, workstations and email will be down all day while I replace the disks and recover files from backup.

Once recovery is under way, I will make a final attempt to recover data from the old disks which may mean that no data is lost. If that fails, any file changes or mail received between 4am and 10:15 am will be lost.

February 22, 2009

Server Reboot at 9 PM Sunday

I will be rebooting the primary file server at 9 PM Sunday in order to complete some software upgrades. Web, email and workstation access will be interrupted for ca. 10 minutes.

Everything Back on Line

All services are back on line as of 1:40 PM following the earlier crash. There was no file loss; most workstations will start working again without rebooting. Investigation continues.

File Server Problem Early Sunday Morning

The primary file server crashed early Sunday morning. I am going to keep the server off-line while I try to isolate and fix the problem. Mail, ssh and workstation logins will be down for the next few hours. Web access will stay up most of the time.

February 20, 2009

Post-Power Failure Problems.

I spoke too soon: the mail file server did not come up cleanly. I am working on it.

Possible Power Loss Friday Afternoon

Facility Services plans to cut the emergency power this afternoon at ca. 3:00. If this cut affects the server room - which it should not, but that didn't help us this morning - then the compute/group servers will be shut down but the primary servers will stay up and there will be no effect on email, web or workstations.

Update: Power Loss Friday Morning

The power loss in the server room this morning was related to the planned cut to emergency power in Hamilton Hall. But the server room battery-backup unit is meant to be on regular power, so this was not expected. Facility Services is investigating.

February 18, 2009

Problems with Off-Campus Access Resolved

We discovered at ca. 9 PM that mathserv was not accessible from off-campus (unless using VPN access) and had not been since ca. 2 PM. This was related to the network problem affecting workstations this morning and is fixed as of 9:30 PM.

Ok - We're Back Up (for real this time)

The primary server is OK again. If your workstation is still weird (no login, no mail, no icons), please reboot. It appears that a latent network problem which was around for a few days became quite actual (possibly triggered by the file server crash).

Mathserv Problems Continue - More Reboots Possible

Problems with mathserv persist. I may need to reboot again between now and 2 pm.

Server Reboot at 12:15 to Solve Scattered, Lingering Problems

I'm going to reboot mathserv at 12:15 today in order to resolve scattered workstation problems related to the file server crash early this morning. Web, email and workstations will be down for 10 - 15 minutes. Reboot your workstation if it doesn't start behaving properly by 12:30.

... and You Can Now Login, too.

While all systems were go at 7:30 today, I neglected to turn off the logon block until 7:50. So now you can login and get mail.

Systems Back up

The main file server crashed at 1:15 am today; web sites were up again at 6:30; all services are up again as of 7:30 am. Mail is catching up quickly; we ran for 15 seconds without a spam filter, so expect a burst; most workstations should work without rebooting.

File Server Problem Early Wednesday Morning

The main file server is down. I don't yet know why, except that it has nothing to do with the upgrades (since they are being done to a different server). But I am going to look shortly.

February 16, 2009

Downtime for Upgrades During Reading Week - Correction

I'd quite forgotten that the first day of Reading Week is also Family Day; the downtime planned for Monday and Tuesday will in fact take place on Tuesday and Wednesday.

February 10, 2009

Server Reboot - Late but Successful

The planned server reboot happened at 5:20 rather than 5:00, but was quick and otherwise successful. Thanks for enduring yet another (brief) interruption.

Server Reboot this Afternoon

I will be rebooting the primary file server at 5 pm this afternoon in order to implement some OS updates. Email and workstations will be down for ca. ten minutes.

Downtime for Upgrades During Reading Week

I will be upgrading hardware and software on the primary department server over reading week. Most services will be up through the upgrades, but there will be periods of five to fifteen minutes on Monday afternoon and Tuesday afternoon when email is down and the workstations do not respond; there should be only very brief web interruptions.

Server Crash Tuesday Morning

The primary file server crashed at 6 AM today and came back up after file-system repairs were complete at 9:50 AM. The primary web and mail server took collateral damage but was back up at 10:15, at which point all services were running again. Things will be slow for another couple of hours while file-system repairs are completed in the background and the mail queue filters 5000+ messages (mostly spam, of course).

January 19, 2009

OMG! What's with all of the downtime?

Funny you should ask; I was asking myself that very thing this weekend. Here's how I answered myself ....

January 18, 2009

Server and Systems Working Again

Servers and systems are all responding again as of 3:00 pm Sunday following the hardware problem this morning. The disk responsible for failures today and Dec. 30th has been replaced.

Unexpected Downtime Sunday

I was unable to finish the preventative maintenance on Friday evening and planned to reschedule for next week. The hardware in question decided otherwise and died this morning. All services should be up mid-afternoon; web sites are up as of 11:30.

January 17, 2009

Workstation Problem Saturday Morning

The workstations did not come up properly after the power outage and server work of Friday evening. Things were working again by noon.

January 16, 2009

Reminder: Shutdown at 4:30 pm Friday

The server and workstations are going down at 4:30 pm today. Web sites will remain readable through the upgrade (4:30 - 6:30) and power outage (6:00 - 9:00).

January 15, 2009

More About Server Disk Failure

A disk failed in one of the main file-server arrays this morning and until it is fixed the server will be under performance strain and sensitive to data loss. The files on that array are all to do with the linux workstations, not mail, web or home directories. Normally, we could fix such a failure by swapping out disks on the fly, but in this case the disk in question is the one the server boots from.

Server Disk Problem and Maintenance

A primary disk in one of the main file-server arrays failed Thursday morning. I will be shutting the file server down at 4:30 pm Friday (prior to the power outage) in order to replace the disk and perform some upgrades.

Electrical Shutdown in HH and BSB Friday Evening

Facility Services will cut power in HH and BSB between 6 pm and 9 pm Friday evening. In order to perform some server maintenance at the same time, I will be taking down the file server at 4:30 pm. Web sites hosted by the departmental server will be accessible during the downtime and power outage but not workstations, email or file-server access.

January 12, 2009

Recovery Updates

There are some hangovers resulting from the overheating and shutdowns: some mail will have bounced back to senders; mail service will be slow while a backlog of mail (mostly spam) is processed; workstations will be slow while the file server corrects some disk errors caused by the heat-related crash.

Weekend Emergency Shutdown and Recovery

As of 9:50 am today, most systems are running again; web service was restored at 9:30. The servers were shut down Saturday morning after an air-conditioning failure in the server room. We are investigating ways of preventing or ameliorating such problems.

December 18, 2008

Power Outage Thursday Evening

Facility Services has just announced that there will be a power outage this evening from 5 pm to 7 pm. The main servers and the network will remain up on backup/emergency power, so if you are outside of HH or BSB, web and email will work.

I will be shutting down non-essential servers and office Macintosh and linux systems at 4:50 pm.

December 14, 2008

Workstations Up

The workstations are able to connect to the file server as of 4:30 pm. All major services are now fully operational as far as my testing shows. Things will be slow this evening while the file systems are being rebuilt, though. Send us email if you see any problems.

Mail Services Up

Mail services are back on line and ssh logins are no longer read-only. There will be a delay with workstation access while a file-system problem is corrected.

LIMITED ACCESS DURING UPGRADE

While the primary file server is being upgraded, the following are up:

  • most web sites;
  • READ-ONLY shell login to mathserv;
  • READ-ONLY Windows file sharing;

Mail delivery, webmail, and imap/pop mail are down until the file server comes back up.

Servers Going Down at 1 PM

The announced system downtime has been pushed forward a bit and the systems will go down at 1 pm. Web service will come back shortly thereafter and other systems about an hour later.

December 12, 2008

Extended Downtime for Server Upgrade Sunday Afternoon

While all systems are down due to the network upgrade this Sunday I will be upgrading hardware and software on our primary file server. The file server, email access and workstations will remain down for about an hour after the network comes back up; most web sites will be accessible immediately.

December 8, 2008

Workstation Hiccoughs

The ms-workstations went pretty much unresponsive for about two minutes mid-morning and for about ten minutes late this afternoon. These hiccoughs are related to the recent weekend crashes and my attempts to ameliorate things. You may see similar, brief problems again this week, though I am, of course, trying to keep interruptions to a minimum. Your patience as we try to sort out this server problem is appreciated.

If your workstation stops responding or gives strange errors this week, please wait five minutes before rebooting - it will very likely come back to life with all applications and windows still open.

December 7, 2008

All Systems Go

The workstations and mail are functional again as of 10:30 am (other services were up earlier or didn't go down at all).

File Server Problem Early Sunday Morning

Our primary file server face-planted early Sunday morning. Email is down but web service is restored as of 9:20 am. All services should be up by 10:00 am.
Efforts to determine the elusive cause will be intensified this week.

November 30, 2008

Server Outage on Sunday

The file server was down Sunday morning from 4:30 am to 10:00 am. Still investigating.

November 28, 2008

Mathserv Reboot at 2:30 This Afternoon

I will be rebooting mathserv at 2:30 pm today to sort out some lingering problems. Web, email and workstations will be down for about ten minutes.

Please don't reboot your workstations; they will freeze when the server goes down and should return to life when the server comes back up.

Systems Down for 30 Minutes

Systems were down for half an hour late this morning because the servers seized up. Everything is back on line as of 11:52 and we are investigating.

October 30, 2008

Power Outage Thursday Night

The power in Hamilton Hall and BSB will be out for about two hours starting at 11 pm tonight. I will be shutting down the compute servers and linux workstations at 10 pm. The workstations should boot up on their own when the power comes back on.

August 18, 2008

Power Outage Wednesday Morning

The power to Hamilton Hall and the Burke Science Building will be cut from 6:00 am to 7:30 am this Wednesday. We will shut down all RHPCS-managed linux and OS X workstations and all servers except for mathserv at 4:00 am. You should shut down - and perhaps unplug - your office systems if you manage your own.

Another Power Outage Last Thursday

Just for the record, there was a power outage at 7:30 AM on Thursday the 14th. Email was sent to department members Wednesday afternoon, which was as soon as we found out about it.

August 11, 2008

Power Outage Tomorrow - More Systems Going Down

Contrary to my earlier announcement, I will be shutting down all servers except mathserv prior to the power shutdown tomorrow morning.

August 8, 2008

Computing Updates: Internet Outage; Power Outage; more ...

This summary of recent postings to the Computing News blog was emailed to all department members this afternoon.

Power Outage Tuesday Morning
The power will be turned off in Hamilton Hall on Tuesday morning between 5 am and 7 am. Bluespruce and freesurface are not on battery backup right now and will be shut down at 4 am. We will shut linux workstations down remotely; you should turn off your office Windows or OS X workstation when you leave on Monday.

Internet Outage Friday Night
UTS advises us that the campus connection to the Internet will be down from Friday at 11pm until Saturday at 5am. Internal networks will still work, but external sites will be unavailable from campus, campus sites will be unavailable from the outside, and email will be essentially halted.

AppleTalk Access to Mathserv Disabled
AppleTalk access to mathserv is disabled because the protocol does not play well with NFS. You can use smb://mathserv to reach shares on mathserv from OS X.

Departmental Computing Updates via RSS
You can read the departmental computing updates in Apple Mail, Thunderbird and some other mail clients if you subscribe to the RSS feed at
http://www.math.mcmaster.ca/blogs/computing_news/index.rdf

If You are Leaving the Department Soon ...
.... please have a look at the So Long, Farewell... page on the Computing Resources site.

HH-403 Printer Update
The HH-403 printer is still down and will remain so until I return from vacation next week; I've not yet isolated the source of the problem.

Power Outage Tuesday Morning

The power will be turned off in Hamilton Hall on Tuesday morning between 5 am and 7 am. Bluespruce and freesurface are not on battery backup right now and will be shut down at 4 am. We will shut linux workstations down remotely; you should turn off your office Windows or OS X workstation when you leave on Monday.

August 6, 2008

Internet Outage Friday Night

UTS advises us that the campus connection to the Internet will be down from Friday at 11pm until Saturday at 5am. Internal networks will still work, but external sites will be unavailable from campus, campus sites will be unavailable from the outside, and email will be essentially halted.

July 30, 2008

Campus Power Outage

The power went out on campus from ca. 8:30 to 8:50 Wednesday evening. Mathserv and some of the compute servers coasted through on backup power; other systems and most workstations will need to be restarted on Thursday morning.

July 21, 2008

Scheduled Mathserv Downtime Wednesday Afternoon

I will be shutting mathserv down at 4:45 on Wednesday afternoon for a memory upgrade; it should be back up by 5:15pm. Web, email and linux workstation access will be down for the duration.
