Incident history

Introduction

This section lists significant incidents in reverse chronological order. It does not attempt to reconstruct events from before the record was started.

The expression DoS attack, which appears in various places in the following material, stands for denial of service attack. It describes any of a variety of attacks that work by consuming resources to a degree that renders the service in question either inaccessible or so slow as to be useless. For example, an attack that sends meaningless data packets to a machine in such volume that the pipe to the machine is saturated with bogus data is one such attack. These attacks are becoming increasingly popular, perhaps because there is no real defence against them.

A note about times: In the history below, all times are expressed in local time. This is explained to some extent on the site home page.

The Incident History

ADSL service failure [2004-10-31 02:46 to 06:48]
Noted that we had no connectivity for four hours this morning. No explanation forthcoming from Pacific Internet and things magically went back to normal as we began to investigate.

Power interruption [2004-08-26 06:27 to 08:14]
After managing 205+ days of uptime, we had a local unexplained power outage for an hour and three-quarters.

ADSL service failure [2004-07-24 11:30 to 13:10]
Yet another failure of our ADSL service, neither acknowledged nor explained by our ISP.

ADSL service failures [2004-07-22 10:11 to 10:51 and 16:36 to 17:13]
Our ISP admitted to part of the first service interruption but not to the second; no explanation was offered for either.

Telstra line cut [2004-07-14 15:13 to 2004-07-15 14:11]
Telstra workers installing services to a new house a few doors away cut our ADSL line. It took several service calls to get them to fix it.

Power interruption [2004-01-30 16:25 to 2004-01-31 22:36]
Continuing storms brought down more local trees and our power yet again, this time for 30 hours.

Power interruption [2004-01-28 17:14 to 2004-01-29 15:18]
Widespread power outages across Brisbane on the fourth successive day of storms left us without power for 22 hours.

Power interruption [2003-12-13 21:39 to 22:31]
Yet another power outage in the Indooroopilly and St Lucia area. This privatisation of services really is a wondrous thing.

Power interruption [2003-12-11 15:46 to 17:32]
A hot day, so all the air conditioners were on, and the new fuse in the street pole (replaced after the tree falling incident on 29 October) failed. When the Energex technician finally arrived, he expressed surprise that the previous people had installed a 45 amp fuse, and he replaced it with a 60 amp fuse. Let's hope that one lasts the summer.

ADSL failure - update [2003-12-10 11:17, 17:21]
In the absence of any information from Pacific Internet or Telstra, decided to just try the ADSL line this morning; it worked and has been up since 11:17 (it's now 18:39). At 17:21, Pacific Internet rang to say that Telstra had reset the line (didn't they claim to have done that yesterday?) and that they believed it was OK. I pointed out that I'd been using it for about 6 hours and they suggested that they could close the ticket. I guess we're done with this, for now. Who knows what the problem was, or if it was fixed.

ADSL failure [2003-12-09 01:30 to 06:46]
Discovered that ppp was continually retrying on the ADSL line; set up a connection via dialup modem. Pacific Internet say the fault is with Telstra; Telstra say they reset the line with no effect and will have to deploy a "specialist". We have been given until close of business tomorrow for resolution. No idea at all if/when this will be fixed.

Install a new kernel [2003-11-25 15:48 to 15:52]
Unfortunately, the previous reboot was not entirely successful, as the kernel on the disk we had installed had been built for 686-class machines and would not boot on a Pentium. Fortunately, there was a GENERIC kernel on the disk that allowed us to boot; but it had way too much stuff in it to be a viable thing for such a low end machine. So, over almost an hour we built a new kernel with 586-class support and then shut down again for a few minutes to reboot from that kernel.

More hardware modifications [2003-11-25 14:04 to 14:56]
Since the previous effort this morning was a failure, we have now put the disks from our gateway box into an older machine as a temporary measure. This means that the gateway is now running on a Pentium-100 instead of a Celeron-366, but it still has 384 MBytes of memory and will be able to limp along for a while. This outage was to allow both boxes to be shut down and various disks and some other minor components to be swapped around. The necessary data was copied to the disks while the systems were running normally.

Hardware modifications [2003-11-25 11:12 to 11:22]
Brief shutdown to allow installation of a supported video card in the temporary hardware - its own card is too new for the OS release in use; but it turns out that the older, supported, AGP card from the original machine cannot be fitted to a current AGP slot. Whatever are these people thinking of when they design these things? Must be money.

Disk failure [2003-11-21 04:20 to 20:15]
Another recent disk (Maxtor, manufactured 03/2003) failed in the early hours this morning, causing incoming email to be rejected until the system was shut down at 06:41. New Seagate disk purchased but it was dead on arrival, although it gave bizarre symptoms causing two hours to be wasted as it was tested in a variety of machines. Second replacement disk was installed in the gateway system, but it had developed other faults as well - multiple hardware failures in one box are difficult to diagnose quickly. Eventually, the two disks were installed in another machine (which is now languishing without its real disks) so that we could get back on the Net. The task of restoring several GB from two tapes took a couple of extra hours. There will be further (hopefully brief) interruptions to service in the next few days when we build a temporary gateway, get it running, rebuild the cannibalised system and eventually revert to a new gateway box.

Storm damage [2003-10-29 11:11 to 14:33]
Six months ago, an arborist told us the large palm tree at the front of our house would still be standing long after the house fell down. As of today, the house is about 88 years old; the palm tree was about 45 years old. We had a huge electrical storm last night; today we had strong gusty winds. At 11:05, the palm tree snapped off and ripped out the power lines to the house as it dived across the front fence and into the street. Good work by various emergency services people saw us back on the air within three hours. There was a further slight delay in getting back on the Net due to my need to get some cold water and air into me after chasing around to get the power back.

Power interruption [2003-09-19 17:24 to 18:05]
Yet another major power outage in Indooroopilly.

Power interruption [2003-09-13 15:19 to 15:51]
Major power outage in Indooroopilly.

Telstra ADSL failure [2003-09-12 04:29 to 15:15]
We lost our ADSL link at 04:29, due to some undeclared problems at Telstra. We switched to a 28k modem backup link at 09:42 and left that up until we discovered (by accident, as there was no advice from any of the responsible parties) that the ADSL service was working at 15:15.

Power interruption [2003-08-24 07:00 to 10:10]
Power outage in our street when a fuse failed in a street transformer; for some reason, it took almost 3 hours to get workers to the site for a five minute fix.

Power interruption [2003-08-11 19:28 to 20:52]
Major power outage in Indooroopilly and St Lucia area.

Power interruption [2003-07-01 21:14 to 22:43]
Major power outage in Indooroopilly and St Lucia area.

Replacement UPS [2003-06-22 13:26 to 13:29]
Our damaged UPS was finally returned yesterday after 2 months away for service. It seems that the people in The Philippines who were supposed to provide components managed to supply the wrong parts three times. We kept the outage as short as possible while switching out the loan UPS and restoring our unit. With a bit of luck, we'll be OK for a while now.

Disk failure [2003-05-17 03:30 to 15:02]
Our newest (and therefore most important) disk failed catastrophically and without any warning some time between 03:15 and 03:30 this morning. It has been returned for warranty replacement, but this is expected to take up to one month (thanks Seagate for the great service). We purchased a replacement disk and installed it around 11:00, but it took several more hours to restore what we could from backup tapes. Unfortunately, 20 years of reliable disks had got us into a habit of not backing up a lot of bulky data that seemed to be not quite essential and so we've lost 230,000 files (about 7 GB) and will be wasting plenty of time over the next few months recovering from that. A better backup plan will result, although we hope it won't need to show its worth for another 20 years. We were off the Net for almost 12 hours and no doubt some incoming email bounced if it was on servers with a short fuse, for which our apologies.

System maintenance [2003-05-16 17:16 to 17:40]
Our UPS repair people have now admitted that they're waiting on parts from The Philippines and offered us a loan UPS. While we went through the mandatory period off-line to hook up the UPS, we took the opportunity to do full memory checks and fsck runs on each system. The gateway box has over 500,000 files, so the fsck run took some time. There will be another brief outage when the UPS is removed for return to the repair place and again when we return our UPS to service.

Power interruption [2003-05-02 12:05 to 12:09]
Brief power outage in the local area; had the UPS been on-line, we would have kept running, but without it we could not.

System maintenance [2003-04-22 16:58 to 17:46]
We shut the network down for a short time this afternoon in order to remove the UPS for service and to re-route many cables. There will be a further (very brief) outage when the UPS returns.

Power interruption [2003-04-03 17:44 to 18:10]
Power went out in our area during a brief but severe electrical storm.

Power interruption [2002-12-24 19:02 to 22:49]
Power went out in our area during a severe electrical storm which appears to have put the Tennyson power station off the air for some hours.

System upgrade [2002-07-03 17:35 to 18:15]
We were off-line for 40 minutes while we transferred several Gbytes of data between two disks and swapped them between machines. There will be a further brief outage sometime soon when we switch our gateway machine over to a different box, but we expect that disruption to be minimal.

Network outage [2002-06-07 23:20 to 00:45]
Pacific Internet experienced some unexplained network connectivity problems for Brisbane ADSL customers.

Network outage [2002-02-20 08:45 to 09:15]
Unexplained outage on ADSL line.

Power interruption [2002-01-09 13:47-17:52]
Power went out in our area for several hours this afternoon.

Network packet loss [2002-01-02 to 2002-01-09]
For the past week, we have been experiencing packet loss in common with all Pacific Internet's Brisbane customers. They claim the problem is now cured (albeit without revealing what the problem was), and it seems to us that line quality is back to normal.

Power interruption [2001-12-04 15:30-15:52]
Off line briefly while electrical contractors added a new circuit for our air conditioning.

Network outage [2001-11-27 11:50 to 12:10]
Our network connectivity was severely broken for about 20 minutes today. One of Pacific Internet's links went down (reason not known) and was replaced by an alternate.

Switched to ADSL connection [2001-11-26 15:10]
Returned from long weekend to discover that we had sync on the ADSL modem; brought up the link and it seems to be running OK.

Wildlife on power lines [2001-11-21 14:35-17:50]
The power went off across a wide area of Brisbane just after we solved the previous problem this afternoon. Initially, Energex had no idea about either the reason for the outage or when service would be restored. At 16:30, they said the problem was "wildlife on power lines" and service would be restored by 17:00. At 17:15, they claimed service would be back at 18:00. It was in fact restored at 17:51.

Kernel panic problem fixed [2001-11-21 14:00]
It turns out that there's a bug in FreeBSD releases prior to 4.3 in which things get confused if netgraph(4) is used on an interface that is not "up". Release 4.3 does not exhibit the problem and 4.2 can work around it by ensuring the interface is up before doing anything.

Downtime with kernel panics [2001-11-20 13:00-16:00]
I've finally managed to crash a FreeBSD kernel (several times). For reasons which I haven't understood at all (yet), the addition of the netgraph(4) module to the kernel (required for PPPoE which is required for the ADSL modem) has caused it to crash every time ppp was started with the PPPoE profile. Each crash was different and none of them provided any real help. There's more to be done on this.

Failed ADSL connection [2001-11-16]
The tech arrived with the new ADSL modem and installed it but failed to get it working because some inept person in the chain provided the wrong phone number to Telstra who duly did the ADSL setup on the wrong line. Now we wait until some unknown future date for another attempt.

Network outage [2001-11-15 12:30 to 13:15]
Our network connectivity was severely broken for some unexplained reason for about 45 minutes today. The rest of the world (or some part thereof) seems to have been able to get to us, but we could not get out.

Local network upgrade [2001-11-11]
We had intermittent interruptions to service during the day today while we added a few bits of new hardware to some of the local machines and changed our LAN over from the 10 Mbps half-duplex setup we've been running for several years to a 100 Mbps full-duplex network. We're now ready, at least from the hardware perspective, for our ADSL connection, but we still don't know when it will be installed.

Preparation for ADSL connection [2001-11-07]
Rebooted several times during the day between 06:00 and 14:15 while we installed and tested the additional NIC that will be connected to the ADSL line shortly. Also noted late in the day that all the DNS delegations we have been waiting on have now been effected; and the reverse lookups have now been set up. There are still a few minor bits of DNS housekeeping to be done, but it's all under our control now and should not provide any surprises along the road.

Switch to Pacific Internet [2001-11-04 02:00]
We're now running via a 33.6k modem link to Pacific Internet after several hours of switching back and forward between them and the old AsiaOnline service while we got our Melbourne clients moved without having anybody on site. Quite an adventure: it seems to have gone well, but there are still many details to be concluded.

As part of this project, we have ordered a 256k/64k ADSL line and we expect to have this installed some time in the next 1 to 3 weeks. Watch this space for further news.

Power station explosion and fire [2001-11-02 15:15-16:32]
We were off the air for an hour and a quarter because of an explosion and subsequent fire in a local power station. Since our nice long uptimes have now been clobbered, we'll have a few more short outages over the next few days while we experiment with some new hardware and with connections to our new ISP. More on that as it happens.

AsiaOnline goes under [2001-10-26 17:43]
We received a notice from AsiaOnline (our current ISP) with the following statement: "Asia Online's Australian subsidiaries were placed in voluntary administration on Wednesday 24th October 2001." It would have been nice to have been notified sooner, rather than waiting until the end of the work week. We are now negotiating for a new service with another ISP. This will involve various disruptions, including a new IP address--so there will be some problems with access until things settle down. Apologies to all who are affected by this. As part of our immediate response, we have significantly shortened the TTLs on all our DNS records until we complete the switch.

Subscribed to MAPS RBL+ service [2001-07-31]
This is not an incident per se, but seems relevant in the light of our RBL problems in June. We have now become full subscribers to the new MAPS RBL+ service, so all incoming email is checked against the three MAPS databases in a single DNS query instead of the three separate queries that we used to have to make.
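For readers unfamiliar with how such checks work: a DNS-based blacklist is normally consulted by reversing the octets of the connecting host's IP address and looking the result up as a name under the list's zone, so a combined list means one lookup per message instead of three. The following is only a minimal sketch in Python of how the query name is formed under that usual convention. The function name and the zone `rbl.example` are placeholders made up for illustration; they are not the real MAPS zone, nor anything taken from our mail configuration.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to test an IPv4 address against a blacklist zone."""
    # Validate the address; a malformed string raises ValueError.
    addr = ipaddress.IPv4Address(ip)
    # Reverse the octets (a.b.c.d becomes d.c.b.a) and append the zone.
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 'rbl.example' is a hypothetical zone; 192.0.2.99 is a documentation address.
print(dnsbl_query_name("192.0.2.99", "rbl.example"))  # 99.2.0.192.rbl.example
```

A mail server would then resolve that name; by the usual convention an answer means the address is listed and a "no such name" reply means it is not.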

DoS attack [2001-07-18 19:40 to 2001-07-19 15:40]
When it became clear that this ping flood was not going to stop, we dropped our Internet connection at 21:54 (having received 6294 ICMP packets of 1052 bytes from 193.226.179.169). We made several brief connections over the following 17 hours to allow mail in and out and eventually persuaded our ISP to block the offender, which allowed us back on the Net. No email should have been lost.

DoS attack [2001-07-09 22:45 to 2001-07-11 16:00]
The exact start time of this attack is unclear, but it was somewhere between 21:00 and 22:00. It was noticed at 22:45 and the link was dropped until we had evidence that it had petered out. For most of the time, it completely saturated our modem, making it pointless to maintain our connection. We were able to restore service before mail started bouncing from our backup MX host, so no mail was lost although many people will have been irritated by sendmail `warnings' once their messages had been in the queue for four hours. And our HTTP server was inaccessible for about 42 hours. Apologies to everybody who was affected.

Memory upgrade [2001-06-28 15:56-16:59]
Although it's less than two years old, the memory that is specified for this box is no longer available and our vendor was unwilling to just sell us additional memory to install; instead he wanted to see the box and install and test the memory himself. As it happened, the 256 Mbyte stick we tried did not work; but a pair of 128 Mbyte sticks appear to be doing the right thing. So we now have 384 Mbytes of memory.

Bogus UDP packet attack [2001-06-17 02:09 to 2001-06-20 11:35]
During this period of 81hrs 26min, we were receiving bogus UDP packets addressed to random ports with no listeners, purporting to come from 193.231.202.166 and arriving at the rate of about 1 packet/second with a volume of about 5 Mbytes/hour (almost half the bandwidth of our modem). Despite repeated requests to our ISP to block this traffic, and very helpful responses from one of the ISP's staff, no blocking was provided. The perpetrator seems to have lost interest or perhaps was caught somewhere upstream. We were disconnected from the Internet for several hours on 19 June, until our ISP agreed not to bill us for the bogus traffic. We made connections every 3 hours for long enough to allow email to come in from our secondary MX, and we don't think any email was lost. Of course, people who wanted to access our HTTP server or our GNATS server were out of luck for much of that day. Apologies to all.

RBL lookup failures [2001-06-12 10:45 to 2001-06-13 02:15]
After rejecting over 1,000 incoming email messages with `451 temporary RBL lookup error', we turned off RBL checks for the time being. The reason for the problem has not been made clear. It caused a substantial amount of incoming email to be delayed by up to 15.5 hours, but nothing should have permanently bounced in that time.

UPS monitor tests [2001-05-29 15:15-15:45]
Rebooted twice after simulating power failures to test the latest revision of upsd, our internal UPS monitoring software.

Routing problem [2001-05-25 10:00-15:45]
Our 203.9.155.248/29 addresses were dropped from our ISP's routing tables (again). Since most services travel via 203.24.22.66, this had a limited effect on services to outside (except those looking for gba.oz.au hosts); but it prevented several internal services from working.

Operating system upgrade [2001-05-22 07:10-07:18]
Added some security patches to FreeBSD-4.2-Release and took the opportunity to do a full fsck for the first time since February.

DNS problems [2001-04-21 to 2001-04-23]
Our ISP was playing around with their DNS servers over the weekend and managed to turn one of them (that provided reverse lookups for some of our IP addresses) into a lame server for about 48 hours; they also just stopped providing reverse lookups for any IPs in another block. They refuse to delegate these addresses to us and seem unable to provide consistent working service. This makes life difficult, as we don't see the problems immediately because our systems use our name servers for everything and they always provide correct information. We have no problem with name-based lookups, because we are the owners of gbch.net and can always provide correct DNS service.

Service interruption [2001-04-12 00:17-06:27]
Due to a failure by our ISP's equipment to notice that our line had dropped out at 00:17, it rejected our attempts to reconnect. This was discovered after 285 wasted calls (cost: $52.70). The ISP's NOC chief claimed that they could not fix the problem until their NZ office opened at 08:30 NZ time. When the NZ office called me, it took less than one minute to fix the problem, but we were off the air for over 6 hours, making a total of 8 hours 37 minutes lost in the past day.
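The total of 8 hours 37 minutes combines this outage with the one on 2001-04-11 recorded in the next entry. As a quick check of that arithmetic, here is a small sketch in Python that adds up the two periods from the times given in the entry headings; the helper name `outage_minutes` is just something made up for this illustration.

```python
from datetime import datetime


def outage_minutes(start: str, end: str) -> int:
    """Return the length of an outage in whole minutes."""
    fmt = "%Y-%m-%d %H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Start and end times taken from the two entry headings (local time).
outages = [
    ("2001-04-11 14:18", "2001-04-11 16:45"),  # Optus cable cut
    ("2001-04-12 00:17", "2001-04-12 06:27"),  # reconnection rejected
]

total = sum(outage_minutes(start, end) for start, end in outages)
hours, minutes = divmod(total, 60)
print(f"{hours} hours {minutes} minutes")  # 8 hours 37 minutes
```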

Service interruption [2001-04-11 14:18-16:45]
We lost connectivity for two and a half hours due to a cut in an Optus fibre-optic cable. It took two hours before the ISP's NOC could even tell us what the problem was.

Operating system upgrade [2001-03-14 17:25-22:45]
Upgraded operating system to FreeBSD-4.2-Release with several security and performance patches; took the opportunity to reorganise the server to take advantage of last month's disk upgrade, with approximately half a million files moving to new locations. This may lead to surprises, so be sure to let us know if things seem to have broken.

Power interruption [2001-03-08 14:59-15:09]
Off line for ten minutes while electrical contractors added a new circuit for our air conditioning. This was just outside the capacity of our UPS.

Routing problem [2001-03-03 02:00 to 2001-03-05 16:00]
Another glitch in our connectivity to the world: a routing problem caused by some broken filters installed by our ISP prevented us from getting packets back onto the Net. Impact: incoming email was delayed by up to 72 hours. Comment: the takeover of our ISP by a multinational group has had a negative impact on the quality of service. The technical manager has now assured us that they will act promptly if there are any future problems.

Routing problem [2001-02-28 01:00-07:00]
Something went awry with our connectivity to the world during the early hours of the morning; the problem had corrected itself by the time we started for the day. Mail deliveries were delayed, but there is no sign that anything was lost. [Footnote: the subsequent routing mistake on 2001-03-03 to 2001-03-05 (see above) makes it pretty clear that this event was also a mistake by the ISP as all symptoms were identical.]

Hardware repair/upgrade [2001-02-13]
Power supply replaced; disk capacity increased to 24 Gbytes. Off-line for approximately two hours while we were trying to get the second disk recognised -- why does the BIOS have to have two places where you have to enable the second IDE channel?

Start [2001-02-13]
Start of incident record.


Copyright © 2001, 2002, 2003, 2004 Greg Black -- All Rights Reserved
Questions and comments about this page to webmaster@gbch.net
$Id: incidents.html 2.36 2004-08-31 15:12:38+10 gjb Exp gjb $
