THOUGHT LEADERS FOR MANUFACTURING & SUPPLY CHAIN
ARC INSIGHTS
By Harry Forbes
The Joint US-Canadian Task Force
named three major causes of the US
Blackout. These were “situational
unawareness”, poor tree trimming, and
lack of effective diagnostic support for
critical operations.
INSIGHT# 2003-51MP
DECEMBER 10, 2003
Blackout Task Force
Highlights SCADA System Woes
Keywords
Blackout, Critical Condition Monitoring (CCM), SCADA
Summary
The interim report on the US Blackout of 2003
points to failures of SCADA systems and critical
software applications as the chief culprits. Companies that operate SCADA
systems and advanced online applications should note carefully the role
these systems played in the Blackout events.
Analysis
The joint US-Canadian Task Force charged with investigating the causes of
the 2003 US Blackout issued its interim report in November 2003. The report
adds much to the publicly available information concerning the events
of August 14, 2003. Press reports have emphasized that the Task Force
blamed Ohio utility FirstEnergy (FE) for the outage.
A careful reading of the document, however, shows that the blame is fixed
both on FirstEnergy and on the Midwest Independent System Operator
(MISO). The new information shows that personnel in both organizations
were effectively “flying blind” during the critical 1-2 hours before the
outage began to cascade. Their key tools for detecting and managing abnormal
operation had failed, and for most of the time they were unaware of that
fact.

[Figure: Number of abnormal events during the 75 minutes before 4:11 PM EDT
on August 14, a window in which FirstEnergy's SCADA alarms and the MISO
State Estimator were both inoperable.]
Background
Electric system operators use a “State Estimator” (SE) application to gauge
the current state of a power system. This is essentially a data reconciliation
application that contains a math model of the power system. It estimates a
single consistent set of process data based on the various real-time
measurements available. The output of the SE is used by a Real Time
Contingency Analysis (RTCA) application, which calculates the effect of
additional faults on the power system. NERC reliability regulations require
that a power system be able to withstand any single contingency without
moving beyond equipment operating limits.
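To make the SE concept concrete, here is a minimal sketch in Python of the
weighted least-squares (WLS) data reconciliation at the heart of a state
estimator. The matrix H, the measurements z, and the meter accuracies sigma
are invented toy values, not data from any real power system.

    # Minimal WLS state estimation sketch -- illustrative values only
    import numpy as np

    def wls_estimate(H, z, sigma):
        # x_hat minimizes sum_i ((z_i - (H x)_i) / sigma_i)^2
        W = np.diag(1.0 / sigma**2)        # weight accurate meters more
        G = H.T @ W @ H                    # gain matrix
        return np.linalg.solve(G, H.T @ W @ z)

    # Two state variables observed by three redundant, noisy measurements
    H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    z = np.array([1.02, 0.97, 2.05])       # slightly inconsistent readings
    sigma = np.array([0.01, 0.01, 0.02])   # assumed meter accuracies
    print(wls_estimate(H, z, sigma))       # one consistent state estimate

The redundancy is the point: with more measurements than state variables,
the SE can smooth out noise and, as discussed below, detect inputs that do
not fit the rest of the picture.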
State Estimator Maladies
The report concludes:
“The MISO state estimator and real time contingency analysis tools were
effectively out of service between 12:15 EDT and 16:04 EDT. This prevented
MISO from promptly performing pre-contingency “early warning”
assessments of power system reliability over the afternoon of August 14.”
The reason for this loss of function was that some data inputs to the SE
were manual rather than automatic. Twice during the day, each time after a
different transmission line opened, the SE failed to converge to an
acceptable solution. A technician fixed the first occurrence at 13:00 EDT
but forgot to set the state estimator back to its automatic five-minute
cycle. By the time this second error was discovered, another transmission
line had opened, so the SE again failed to converge. Only then was the
second abnormal line status entered manually. The application was fully
restored only at 16:04 EDT. By that time the situation was nearly hopeless;
the cascade was less than six minutes away. The report states that “MISO
considers its SE and RTCA tools to be still under development and not fully
mature.” That is a memorable example of understatement.
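How might such an inconsistency be caught automatically? One standard
safeguard, sketched here as a hypothetical illustration rather than
anything MISO actually ran, is a chi-square test on the weighted residuals
of the SE solution: a stale, manually entered line status inflates the
residuals well past the statistical threshold.

    # Hypothetical residual consistency check -- not MISO's actual code
    import numpy as np
    from scipy.stats import chi2

    def residuals_consistent(z, H, x_hat, sigma, alpha=0.01):
        r = z - H @ x_hat                     # residuals vs. current model
        J = float(np.sum((r / sigma) ** 2))   # weighted residual sum
        dof = len(z) - H.shape[1]             # measurement redundancy
        return J <= chi2.ppf(1 - alpha, dof)  # False flags bad inputs

    # Toy demo: the third reading disagrees badly (e.g., a stale status)
    H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    sigma = np.array([0.01, 0.01, 0.02])
    z_bad = np.array([1.02, 0.97, 3.00])
    W = np.diag(1.0 / sigma**2)
    x_hat = np.linalg.solve(H.T @ W @ H, H.T @ W @ z_bad)
    print(residuals_consistent(z_bad, H, x_hat, sigma))  # -> False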
Another interesting note in the report describes the status of FirstEnergy’s
own RTCA software:
“FirstEnergy (FE) had and ran a state estimator every 30 minutes...FE
indicated that it has experienced problems with the automatic contingency
analysis operation since the system was installed in 1995. As a result, FE
operators or engineers ran contingency analysis manually rather than
automatically and were expected to do so when there were questions about
the state of the system. Investigation team interviews of FE personnel
indicate that the contingency analysis model was likely running but not
consulted at any point in the afternoon of August 14.”
Loss of SCADA Alarms
FirstEnergy’s SCADA system operators were not alerted to events in the two
hours preceding the blackout because their system stopped processing alarm
messages at 14:14 EDT. A couple of quotes from the report bring out the key
points:
“FE’s computer SCADA alarm and logging software failed shortly after
14:14 EDT (the last time a valid alarm came in). After that time, the FE
control room consoles did not receive any further alarms nor were there
any alarms being printed or posted on the EMS’s [Energy Management
System] alarm logging facilities.”
“At 14:41 EDT the primary server hosting the EMS alarm processing
application failed…Following preprogrammed instructions, the alarm system
application and all other EMS software running on the first server
automatically transferred onto the backup server. However, because the
alarm application moved intact onto the backup while still stalled and
ineffective, the backup server failed 13 minutes later, at 14:54 EDT.
Accordingly, all of the EMS applications on these two servers stopped
running.”
The report also notes that the SCADA system was not running the latest
version of its software and was slated for replacement by FE. The alarm
function was not restored until FE performed a complete “cold” reboot of
the SCADA system the following day. During the crisis, a cold reboot was
considered but rejected because SCADA information was needed during such a
perilous operating period. It is interesting to note that in this case a
“hot standby” configuration did not add reliability, because a faulted
software application was transferred onto it.
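The design lesson can be sketched in a few lines of hypothetical code (not
FE’s EMS): failover logic that equates health with mere process existence
will adopt a stalled application intact, whereas health defined as progress
on the alarm queue would have caught the stall.

    # Hypothetical liveness check: health means progress, not existence
    from dataclasses import dataclass

    @dataclass
    class AlarmAppStatus:
        queued: int      # alarms waiting to be processed
        processed: int   # cumulative alarms handled since start

    def making_progress(before: AlarmAppStatus,
                        after: AlarmAppStatus) -> bool:
        """Unhealthy if work is waiting but none got done between samples."""
        stalled = after.queued > 0 and after.processed == before.processed
        return not stalled

    # A stalled alarm application passes a process-existence check but
    # fails this one, so failover should restart it rather than copy it:
    t0 = AlarmAppStatus(queued=40, processed=1000)
    t1 = AlarmAppStatus(queued=95, processed=1000)  # queue grows, no work
    assert not making_progress(t0, t1)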
Model Mismatch
A third cause highlighted by the Task Force was excessive tree growth under
transmission lines in FE’s rights-of-way. This caused the loss of three
345 kV lines during the sequence, as heavily loaded conductors sagged and
came into contact with trees. While the overgrown trees are certainly the
root cause, the effect of this neglected maintenance was that power system
operators were working with overly optimistic models of these lines. The
faults occurred “at conditions well within specified operating parameters.”
So at the time of the Blackout these transmission lines had far less actual
capacity than the online models used to calculate grid reliability credited
to them.
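One simple ongoing validation, sketched here with invented numbers, is to
compare the loading at which lines actually trip against their modeled
ratings; a line that trips far below its rating betrays an overly
optimistic model.

    # Hypothetical post-event check: modeled rating vs. flow at trip (MW)
    TRIPS = [("line_A", 1200.0, 530.0),   # tripped well below rating
             ("line_B", 950.0, 930.0)]    # tripped near rating: plausible
    for name, rating, flow_at_trip in TRIPS:
        utilization = flow_at_trip / rating
        if utilization < 0.9:
            print(f"{name}: tripped at {utilization:.0%} of modeled rating"
                  " -- revalidate line model and right-of-way clearance")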
Recommendations
• Carefully consider the level of maintenance given to all your SCADA
software, including vendor upgrades. Do so even for systems that are
“legacy” or expected to be replaced.
• Evaluate your high availability hardware system configurations for
vulnerability to common software faults.
• Validate, on an ongoing basis, the process models that are used in
everyday decision-making.
• Evaluate the effectiveness of your key online applications. How often
are they actually being used? Are they providing useful assistance to
your normal and abnormal operations?
Please help us improve our deliverables to you – take our survey linked to this
transmittal e-mail or at www.arcweb.com/myarc in the Client Area. For further
information, contact your account manager or the author at HForbes@arcweb.com.
Recommended circulation: All MAS-P clients. ARC Insights are published and
copyrighted by ARC Advisory Group. The information is proprietary to ARC and
no part of it may be reproduced without prior permission from ARC.