Incident history
Introduction
This section lists significant incidents in reverse
chronological order.
It does not attempt to reconstruct past history.
The expression DoS attack, which appears in various places in
the following material, stands for denial-of-service attack. It
describes any of a variety of attacks that work by consuming resources
to a degree that renders the service in question either inaccessible or
so slow as to be useless. For example, an attack that sends meaningless
data packets to a machine in such volume that the pipe to the machine is
saturated with bogus data is one such attack. These attacks are
becoming increasingly popular, perhaps because there is no real defence
against them.
A note about times: In the history below, all times are
expressed in local time. This is explained to some extent on the site
home page.
The Incident History
ADSL service failure [2004-10-31 02:46 to 06:48]
Noted that we had no connectivity for four hours this morning. No
explanation forthcoming from Pacific Internet and things magically went
back to normal as we began to investigate.
Power interruption [2004-08-26 06:27 to 08:14]
After managing 205+ days of uptime, we had a local unexplained power
outage for an hour and three-quarters.
ADSL service failure [2004-07-24 11:30 to 13:10]
Yet another failure of our ADSL service, neither acknowledged nor
explained by our ISP.
ADSL service failures [2004-07-22 10:11 to 10:51 and 16:36 to 17:13]
Our ISP admitted to part of the first service interruption but not to
the second; no explanation was offered for either.
Telstra line cut [2004-07-14 15:13 to 2004-07-15 14:11]
Telstra workers installing services to a new house a few doors away cut
our ADSL line. It took several service calls to get them to fix it.
Power interruption [2004-01-30 16:25 to 2004-01-31 22:36]
Continuing storms brought down more local trees and our power yet again,
this time for 30 hours.
Power interruption [2004-01-28 17:14 to 2004-01-29 15:18]
Widespread power outages across Brisbane on the fourth successive day of
storms left us without power for 22 hours.
Power interruption [2003-12-13 21:39 to 22:31]
Yet another power outage in the Indooroopilly and St Lucia area. This
privatisation of services really is a wondrous thing.
Power interruption [2003-12-11 15:46 to 17:32]
A hot day, so all the air conditioners were on; so the new fuse in the
street pole (replaced after the tree falling incident on 29 October)
failed. When the Energex technician finally arrived, he expressed
surprise that the previous people had installed a 45 amp fuse; and he
replaced it with a 60 amp fuse. Let's hope that one lasts the
summer.
ADSL failure - update [2003-12-10 11:17, 17:21]
In the absence of any information from Pacific Internet or Telstra,
decided to just try the ADSL line this morning; it worked and has been
up since 11:17 (it's now 18:39). At 17:21, Pacific Internet rang to say
that Telstra had reset the line (didn't they claim to have done that
yesterday?) and that they believed it was OK. I pointed out that I'd
been using it for about 6 hours and they suggested that they could close
the ticket. I guess we're done with this, for now. Who knows what the
problem was, or if it was fixed.
ADSL failure [2003-12-09 01:30 to 06:46]
Discovered that ppp was continually retrying on the ADSL line; set up a
connection via dialup modem. Pacific Internet say the fault is with
Telstra; Telstra say they reset the line with no effect and will have to
deploy a "specialist". We have been given until close of business
tomorrow for resolution. No idea at all if/when this will be fixed.
Install a new kernel [2003-11-25 15:48 to 15:52]
Unfortunately, the previous reboot was not entirely successful, as the
kernel on the disk we had installed had been built for 686-class
machines and would not boot on a Pentium. Fortunately, there was a
GENERIC kernel on the disk that allowed us to boot; but it had way too
much stuff in it to be a viable thing for such a low end machine. So,
over almost an hour we built a new kernel with 586-class support and
then shut down again for a few minutes to reboot from that kernel.
More hardware modifications [2003-11-25 14:04 to 14:56]
Since the previous effort this morning was a failure, we have now put
the disks from our gateway box into an older machine as a temporary
measure. This means that the gateway is now running on a Pentium-100
instead of a Celeron-366, but it still has 384 MBytes of memory and will
be able to limp along for a while. This outage was to allow both boxes
to be shut down and various disks and some other minor components to be
swapped around. The necessary data was copied to the disks while the
systems were running normally.
Hardware modifications [2003-11-25 11:12 to 11:22]
Brief shutdown to allow installation of a supported video card in the
temporary hardware - its own card is too new for the OS release in use;
but it turns out that the older, supported, AGP card from the original
machine cannot be fitted to a current AGP slot. Whatever are these
people thinking of when they design these things? Must be money.
Disk failure [2003-11-21 04:20 to 20:15]
Another recent disk (Maxtor, manufactured 03/2003) failed in the early
hours this morning, causing incoming email to be rejected until the
system was shut down at 06:41. New Seagate disk purchased but it was
dead on arrival, although it gave bizarre symptoms causing two hours to
be wasted as it was tested in a variety of machines. Second replacement
disk was installed in the gateway system, but it had developed other
faults as well - multiple hardware failures in one box are difficult to
diagnose quickly. Eventually, the two disks were installed in another
machine (which is now languishing without its real disks) so that we
could get back on the Net. The task of restoring several GB from two
tapes took a couple of extra hours. There will be further (hopefully
brief) interruptions to service in the next few days when we build a
temporary gateway, get it running, rebuild the cannibalised system and
eventually revert to a new gateway box.
Storm damage [2003-10-29 11:11 to 14:33]
Six months ago, an arborist told us the large palm tree at the front of
our house would still be standing long after the house fell down. As of
today, the house is about 88 years old; the palm tree was about 45 years
old. We had a huge electrical storm last night; today we had strong
gusty winds. At 11:05, the palm tree snapped off and ripped out the
power lines to the house as it dived across the front fence and into the
street. Good work by various emergency services people saw us back on
the air within three hours. There was a further slight delay in getting
back on the Net due to my need to get some cold water and air into me
after chasing around to get the power back.
Power interruption [2003-09-19 17:24 to 18:05]
Yet another major power outage in Indooroopilly.
Power interruption [2003-09-13 15:19 to 15:51]
Major power outage in Indooroopilly.
Telstra ADSL failure [2003-09-12 04:29 to 15:15]
We lost our ADSL link at 04:29, due to some undeclared problems at
Telstra. We switched to a 28k modem backup link at 09:42 and left that
up until we discovered (by accident, as there was no advice from any of
the responsible parties) that the ADSL service was working at 15:15.
Power interruption [2003-08-24 07:00 to 10:10]
Power outage in our street when a fuse failed in a street transformer;
for some reason, it took almost 3 hours to get workers to the site for a
five minute fix.
Power interruption [2003-08-11 19:28 to 20:52]
Major power outage in Indooroopilly and St Lucia area.
Power interruption [2003-07-01 21:14 to 22:43]
Major power outage in Indooroopilly and St Lucia area.
Replacement UPS [2003-06-22 13:26 to 13:29]
Our damaged UPS was finally returned yesterday after 2 months away for
service.
It seems that the people in The Philippines who were supposed to provide
components managed to supply the wrong parts three times.
We kept the outage as short as possible while switching out the loan UPS
and restoring our unit.
With a bit of luck, we'll be OK for a while now.
Disk failure [2003-05-17 03:30 to 15:02]
Our newest (and therefore most important) disk failed catastrophically
and without any warning some time between 03:15 and 03:30 this morning.
It has been returned for warranty replacement, but this is expected to
take up to one month (thanks Seagate for the great service).
We purchased a replacement disk and installed it around 11:00, but it
took several more hours to restore what we could from backup tapes.
Unfortunately, 20 years of reliable disks had got us into a habit of not
backing up a lot of bulky data that seemed to be not quite essential and
so we've lost 230,000 files (about 7 GB) and will be wasting plenty of
time over the next few months recovering from that.
A better backup plan will result, although we hope it won't need to show
its worth for another 20 years.
We were off the Net for almost 12 hours and no doubt some incoming email
bounced if it was on servers with a short fuse, for which our
apologies.
System maintenance [2003-05-16 17:16 to 17:40]
Our UPS repair people have now admitted that they're waiting on parts
from The Philippines and offered us a loan UPS.
While we went through the mandatory period off-line to hook up the UPS,
we took the opportunity to do full memory checks and fsck runs on
each system.
The gateway box has over 500,000 files, so the fsck run took some
time.
There will be another brief outage when the UPS is removed for return to
the repair place and again when we return our UPS to service.
Power interruption [2003-05-02 12:05 to 12:09]
Brief power outage in the local area; had the UPS been on-line, we would
have kept running, but without it we could not.
System maintenance [2003-04-22 16:58 to 17:46]
We shut the network down for a short time this afternoon in order to
remove the UPS for service and to re-route many cables.
There will be a further (very brief) outage when the UPS returns.
Power interruption [2003-04-03 17:44 to 18:10]
Power went out in our area during a brief but severe electrical
storm.
Power interruption [2002-12-24 19:02 to 22:49]
Power went out in our area during a severe electrical storm which
appears to have put the Tennyson power station off the air for some
hours.
System upgrade [2002-07-03 17:35 to 18:15]
We were off-line for 40 minutes while we transferred several Gbytes of
data between two disks and swapped them between machines.
There will be a further brief outage sometime soon when we switch our
gateway machine over to a different box, but we expect that disruption
to be minimal.
Network outage [2002-06-07 23:20 to 00:45]
Pacific Internet experienced some unexplained network connectivity
problems for Brisbane ADSL customers.
Network outage [2002-02-20 08:45 to 09:15]
Unexplained outage on ADSL line.
Power interruption [2002-01-09 13:47-17:52]
Power went out in our area for several hours this afternoon.
Network packet loss [2002-01-02 to 2002-01-09]
For the past week, we have been experiencing packet loss in common with
all Pacific Internet's Brisbane customers. They claim the problem is
now cured (albeit without revealing what the problem was), and it
seems to us that line quality is back to normal.
Power interruption [2001-12-04 15:30-15:52]
Off line briefly while electrical contractors added a new circuit for
our air conditioning.
Network outage [2001-11-27 11:50 to 12:10]
Our network connectivity was severely broken for about 20 minutes
today. One of Pacific Internet's links went down (reason not known) and
was replaced by an alternate.
Switched to ADSL connection [2001-11-26 15:10]
Returned from long weekend to discover that we had sync on the ADSL
modem; brought up the link and it seems to be running OK.
Wildlife on power lines [2001-11-21 14:35-17:50]
The power went off across a wide area of Brisbane just after we solved
the previous problem this afternoon. Initially, Energex had no idea
about either the reason for the outage or when service would be
restored. At 16:30, they said the problem was "wildlife on power lines"
and service would be restored by 17:00. At 17:15, they claimed service
would be back at 18:00. It was in fact restored at 17:51.
Kernel panic problem fixed [2001-11-21 14:00]
It turns out that there's a bug in FreeBSD releases prior to 4.3 in
which things get confused if netgraph(4) is used on an interface that is
not "up". Release 4.3 does not exhibit the problem and 4.2 can work
around it by ensuring the interface is up before doing anything.
Downtime with kernel panics [2001-11-20 13:00-16:00]
I've finally managed to crash a FreeBSD kernel (several times). For
reasons which I haven't understood at all (yet), the addition of the
netgraph(4) module to the kernel (required for PPPoE which is required
for the ADSL modem) has caused it to crash every time ppp was started
with the PPPoE profile. Each crash was different and none of them
provided any real help. There's more to be done on this.
Failed ADSL connection [2001-11-16]
The tech arrived with the new ADSL modem and installed it but failed to
get it working because some inept person in the chain provided the wrong
phone number to Telstra who duly did the ADSL setup on the wrong line.
Now we wait until some unknown future date for another attempt.
Network outage [2001-11-15 12:30 to 13:15]
Our network connectivity was severely broken for some unexplained reason
for about 45 minutes today. The rest of the world (or some part
thereof) seems to have been able to get to us, but we could not get
out.
Local network upgrade [2001-11-11]
We had intermittent interruptions to service during the day today while
we added a few bits of new hardware to some of the local machines and
changed our LAN over from the 10 Mbps half-duplex setup we've been
running for several years to a 100 Mbps full-duplex network. We're now
ready, at least from the hardware perspective, for our ADSL connection,
but we still don't know when it will be installed.
Preparation for ADSL connection [2001-11-07]
Rebooted several times during the day between 06:00 and 14:15 while we
installed and tested the additional NIC that will be connected to the
ADSL line shortly. Also noted late in the day that all the DNS
delegations we have been waiting on have now been effected; and the
reverse lookups have now been set up. There are still a few minor bits
of DNS housekeeping to be done, but it's all under our control now and
should not provide any surprises along the road.
Switch to Pacific Internet [2001-11-04 02:00]
We're now running via a 33.6k modem link to Pacific Internet after
several hours of switching back and forward between them and the old
AsiaOnline service while we got our Melbourne clients moved without
having anybody on site. Quite an adventure: it seems to have gone
well, but there are still many details to be concluded.
As part of this project, we have ordered a 256k/64k ADSL line and we
expect to have this installed some time in the next 1 to 3 weeks. Watch
this space for further news.
Power station explosion and fire [2001-11-02 15:15-16:32]
We were off the air for an hour and a quarter because of an explosion
and subsequent fire in a local power station. Since our nice long
uptimes have now been clobbered, we'll have a few more short outages
over the next few days while we experiment with some new hardware and
with connections to our new ISP. More on that as it happens.
AsiaOnline goes under [2001-10-26 17:43]
We received a notice from AsiaOnline (our current ISP) with the
following statement: "Asia Online's Australian subsidiaries were
placed in voluntary administration on Wednesday 24th October 2001."
It would have been nice to have been notified sooner, rather than
waiting until the end of the work week. We are now negotiating for a
new service with another ISP. This will involve various disruptions,
including a new IP address--so there will be some problems with access
until things settle down. Apologies to all who are affected by this.
As part of our immediate response, we have significantly shortened the
TTLs on all our DNS records until we complete the switch.
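The reasoning behind shortening TTLs ahead of an address change can be
sketched roughly as follows. This is an illustration only: the switch
time is the one recorded in the 2001-11-04 entry, but the 24-hour old TTL
and the function name are assumptions, not values taken from our zone
files.

```python
from datetime import datetime, timedelta

def latest_time_to_lower_ttl(switch_time: datetime, old_ttl_seconds: int) -> datetime:
    """Return the latest moment at which a shortened TTL must be published.

    A resolver that cached a record just before the TTL was lowered may keep
    it for the full old TTL, so the lowered value has to be in place at least
    that long before the address changes.
    """
    return switch_time - timedelta(seconds=old_ttl_seconds)

# Switch time taken from the history; the 24-hour old TTL is an assumption.
switch = datetime(2001, 11, 4, 2, 0)
deadline = latest_time_to_lower_ttl(switch, old_ttl_seconds=86400)
print(deadline)
```

Once the short TTL has been live for that long, stale answers should
persist in caches only for the length of the new, shorter TTL after the
address changes.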
Subscribed to MAPS RBL+ service [2001-07-31]
This is not an incident per se, but seems relevant in the light
of our RBL problems in June. We have now become full subscribers to the
new MAPS RBL+ service, so all incoming email is checked against the
three MAPS databases in a single DNS query instead of the three separate
queries that we used to have to make.
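For readers unfamiliar with DNS-based blocklists, the usual convention is
to reverse the octets of the connecting IPv4 address, append the list's
zone and look up the resulting name; a combined list therefore needs one
lookup per message rather than one per database. The sketch below only
builds the query name. The zone dnsbl.example.org is a placeholder, not
the real MAPS zone, and the function name is hypothetical.

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name looked up to test an IPv4 address against a blocklist zone."""
    # Validate the address, then reverse its four octets.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

# 192.0.2.1 is a documentation address; dnsbl.example.org is a placeholder zone.
name = dnsbl_query_name("192.0.2.1", "dnsbl.example.org")
print(name)
```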
DoS attack [2001-07-18 19:40 to 2001-07-19 15:40]
When it became clear that this ping flood was not going to stop, we
dropped our Internet connection at 21:54 (having received 6294 ICMP
packets of 1052 bytes from 193.226.179.169). We made several brief
connections over the following 17 hours to allow mail in and out and
eventually persuaded our ISP to block the offender, which allowed us
back on the Net. No email should have been lost.
DoS attack [2001-07-09 22:45 to 2001-07-11 16:00]
The exact start time of this attack is unclear, but it was somewhere
between 21:00 and 22:00. It was noticed at 22:45 and the link was dropped
until we had evidence that it had petered out. For most of the time, it
completely saturated our modem, making it pointless to maintain our
connection. We were able to restore service before mail started
bouncing from our backup MX host, so no mail was lost although many
people will have been irritated by sendmail `warnings' once their
messages had been in the queue for four hours. And our HTTP server was
inaccessible for about 42 hours. Apologies to everybody who was
affected.
Memory upgrade [2001-06-28 15:56-16:59]
Although it's less than two years old, the memory that is specified for
this box is no longer available and our vendor was unwilling to just
sell us additional memory to install; instead he wanted to see the box
and install and test the memory himself. As it happened, the 256 Mbyte
stick we tried did not work; but a pair of 128 Mbyte sticks appear to be
doing the right thing. So we now have 384 Mbytes of memory.
Bogus UDP packet attack [2001-06-17 02:09 to 2001-06-20 11:35]
During this period of 81hrs 26min, we were receiving bogus UDP packets
addressed to random ports with no listeners, purporting to come from
193.231.202.166 and arriving at the rate of about 1 packet/second with a
volume of about 5 Mbytes/hour (almost half the bandwidth of our modem).
Despite repeated requests to our ISP to block this traffic, and very
helpful responses from one of the ISP's staff, no blocking was provided.
The perpetrator seems to have lost interest or perhaps was caught
somewhere upstream. We were disconnected from the Internet for several
hours on 19 June, until our ISP agreed not to bill us for the bogus
traffic. We made connections every 3 hours for long enough to allow
email to come in from our secondary MX, and we don't think any email was
lost. Of course, people who wanted to access our HTTP server or our
GNATS server were out of luck for much of that day. Apologies to all.
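The figures quoted for this attack can be checked with a few lines of
arithmetic. The sketch below recomputes the length of the period from its
start and end times and converts the hourly volume into an average bit
rate; it assumes 1 Mbyte means 10**6 bytes, which the entry does not
specify.

```python
from datetime import datetime

start = datetime(2001, 6, 17, 2, 9)
end = datetime(2001, 6, 20, 11, 35)

# Length of the attack period in whole hours and minutes.
total_minutes = int((end - start).total_seconds() // 60)
hours, minutes = divmod(total_minutes, 60)
print(f"{hours} h {minutes} min")

# Average inbound rate implied by about 5 Mbytes/hour
# (taking 1 Mbyte = 10**6 bytes, which is an assumption).
bytes_per_second = 5 * 10**6 / 3600
bits_per_second = bytes_per_second * 8
print(round(bits_per_second))
```

The resulting rate is the number to compare against the modem's line
speed, which this entry does not state.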
RBL lookup failures [2001-06-12 10:45 to 2001-06-13 02:15]
After rejecting over 1,000 incoming email messages with `451 temporary
RBL lookup error', we turned off RBL checks for the time being. The
reason for the problem has not been made clear. It caused a substantial
amount of incoming email to be delayed by up to 15.5 hours, but nothing
should have permanently bounced in that time.
UPS monitor tests [2001-05-29 15:15-15:45]
Rebooted twice after simulating power failures to test the latest
revision of upsd, our internal UPS
monitoring software.
Routing problem [2001-05-25 10:00-15:45]
Our 203.9.155.248/29 addresses were dropped from our ISP's routing
tables (again). Since most services travel via 203.24.22.66, this had a
limited effect on services to outside (except those looking for
gba.oz.au hosts); but it prevented several internal services from
working.
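The limited outside impact follows from the addressing: the dropped /29
is a small block that does not contain the address most services use.
A quick check with Python's standard ipaddress module, using only the
two values quoted in this entry (the variable names are illustrative):

```python
import ipaddress

block = ipaddress.ip_network("203.9.155.248/29")
main_address = ipaddress.ip_address("203.24.22.66")

print(block.num_addresses)    # number of addresses in the /29
print(block[0], block[-1])    # first and last address in the block
print(main_address in block)  # whether the main service address is affected
```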
Operating system upgrade [2001-05-22 07:10-07:18]
Added some security patches to FreeBSD-4.2-Release and took the
opportunity to do a full fsck for the first time since February.
DNS problems [2001-04-21 to 2001-04-23]
Our ISP was playing around with their DNS servers over the weekend and
managed to turn one of them (that provided reverse lookups for some of
our IP addresses) into a lame server for about 48 hours; they also just
stopped providing reverse lookups for any IPs in another block. They
refuse to delegate these addresses to us and seem unable to provide
consistent working service. This makes life difficult, as we
don't see the problems immediately because our systems use our name
servers for everything and they always provide correct information. We
have no problem with name-based lookups, because we are the owners of
gbch.net and can always provide correct DNS service.
Service interruption [2001-04-12 00:17-06:27]
Due to a failure by our ISP's equipment to notice that our line had
dropped out at 00:17, it rejected our attempts to reconnect. This was
discovered after 285 wasted calls (cost: $52.70). The ISP's NOC chief
claimed that they could not fix the problem until their NZ office opened
at 08:30 NZ time. When the NZ office called me, it took less than one
minute to fix the problem, but we were off the air for over 6 hours,
making a total of 8 hours 37 minutes lost in the past day.
Service interruption [2001-04-11 14:18-16:45]
We lost connectivity for two and a half hours due to a cut in an Optus
fibre-optic cable. It took two hours before the ISP's NOC could even
tell us what the problem was.
Operating system upgrade [2001-03-14 17:25-22:45]
Upgraded operating system to FreeBSD-4.2-Release with several security
and performance patches; took the opportunity to reorganise the server
to take advantage of last month's disk upgrade, with approximately half
a million files moving to new locations. This may lead to surprises, so
be sure to let us know if things seem to have broken.
Power interruption [2001-03-08 14:59-15:09]
Off line for ten minutes while electrical contractors added a new
circuit for our air conditioning. This was just outside the capacity of
our UPS.
Routing problem [2001-03-03 02:00 to 2001-03-05 16:00]
Another glitch in our connectivity to the world: a routing problem
caused by some broken filters installed by our ISP prevented us from
getting packets back onto the Net.
Impact: incoming email was delayed by up to 72 hours.
Comment: the takeover of our ISP by a multinational group has had a
negative impact on the quality of service. The technical manager has
now assured us that they will act promptly if there are any future
problems.
Routing problem [2001-02-28 01:00-07:00]
Something went awry with our connectivity to the world during the early
hours of the morning; the problem had corrected itself by the time we
started for the day.
Mail deliveries were delayed, but there is no sign that anything was
lost.
[Footnote: the subsequent routing mistake on 2001-03-03 to 2001-03-05
(see above) makes it pretty clear that this event was also a mistake by
the ISP as all symptoms were identical.]
Hardware repair/upgrade [2001-02-13]
Power supply replaced; disk capacity increased to 24 Gbytes.
Off-line for approximately two hours while we were trying to get
the second disk recognised -- why does the BIOS have to have
two places where you have to enable the second IDE channel?
Start [2001-02-13]
Start of incident record.
Copyright © 2001, 2002, 2003, 2004 Greg Black -- All Rights Reserved
Questions and comments about this page to
webmaster@gbch.net
$Id: incidents.html 2.36 2004-08-31 15:12:38+10 gjb Exp gjb $