SCADA Design – Communication Resiliency & Reliability
Business strategy should always be supported by a solid technology strategy; however, business strategy sometimes places undue and unreasonable requirements on what technology can deliver. In the utility world, one requirement I often hear from Utility Directors is that we need access to control systems all the time, every time. Within the controlled environment of a data center, five or six 9's of availability are reasonably achievable; providing communication access to a large number of PLC locations spread over 50 square miles of rough terrain poses a far tougher challenge.
With the challenge laid down, a very talented PLC and SCADA developer, Roe Vernon, and I worked diligently to come up with an advanced design that takes into consideration every possible data requirement and communication failure that can occur in this type of environment. The design and build we accomplished met the goal of ensuring 100% communication uptime, 100% of the time.
To lay the groundwork: our design was built to support a very challenging water utility system comprising over 40 pressure zones and moving water almost a mile vertically through wells, pump stations, control valves, and storage tanks. The SCADA system is programmed to automatically move and chlorinate water through 130 miles of pipeline as demand fluctuates: ground to tap without a single human intervention. Such a system requires dozens of data points to move between stations located miles apart, quickly and without data loss. To ensure the PLC logic can perform this automation, the communication backbone is critical. Without it, overfilling a tank and creating an environmental hazard, or depressurizing a pipeline and causing a health issue, are both real possibilities.
The system, as depicted in the image, starts with a dual data center configuration connected to each and every water station by a fiber-optic SONET ring (dual path). The data centers are stretched across Layer-3 core switches, with each core connecting to one port on each Layer-2 edge switch at every station. This dual hub/spoke configuration makes up a 1 Gb/s high-bandwidth (HBW) redundant transport layer. During normal operating conditions, all data between the PLCs, the Historian, and the control center HMI travels over HBW. This is all normal and typical of Ethernet-based SCADA systems. The really cool part is how we added a serial component to this system to compensate for switch failure, fiber cuts, or even port flapping.
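To make the dual hub/spoke idea concrete, here is a minimal sketch of the transport layer as data, assuming each edge switch is dual-homed with one uplink to each stretched core. The names and the dictionary layout are my illustrative assumptions, not the actual network configuration.

```python
# Two stretched Layer-3 cores, one in each data center (names are assumptions)
CORES = ["core-dc1", "core-dc2"]
STATIONS = ["station-01", "station-02", "station-03"]

# Each station's Layer-2 edge switch has one port to each core: dual hub/spoke
uplinks = {station: {core: "up" for core in CORES} for station in STATIONS}

def station_on_hbw(station: str) -> bool:
    """A station stays on the HBW path as long as at least one uplink is up."""
    return any(state == "up" for state in uplinks[station].values())
```

The value of the design shows up when a single core, fiber path, or port fails: every station still has one working uplink, so the HBW layer survives any single fault.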
Before describing how the serial communication works, it is important to understand how we grouped stations into regionally dependent systems called PODs. These PODs are made up of multiple stations, usually no more than five, that are hydrologically dependent upon each other to move water within a smaller region. Within each POD, a single station is designated the Primary station, with all other POD stations taking on a secondary role. Serial communications always run either between the Primary and its secondary stations, or between the Primary stations of different PODs to allow flow between PODs.
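The POD peering rule above can be sketched in a few lines. This is a hypothetical model, assuming PODs are encoded as simple dictionaries; the station names, and the choice of "next POD in order" for the Primary-to-Primary link, are my illustrative assumptions, not the authors' implementation.

```python
# Hypothetical POD table: each POD has one Primary and up to four secondaries
PODS = {
    "POD-A": {"primary": "PS-01", "secondaries": ["WELL-02", "TANK-03", "CV-04"]},
    "POD-B": {"primary": "PS-10", "secondaries": ["TANK-11", "WELL-12"]},
}

def serial_peer(station: str) -> str:
    """Return the station this one talks to over serial radio.

    Secondaries talk to their POD's Primary; a Primary talks to the
    Primary of another POD (here, simply the next POD in order).
    """
    pods = list(PODS.items())
    for i, (_, pod) in enumerate(pods):
        if station in pod["secondaries"]:
            return pod["primary"]
        if station == pod["primary"]:
            # Primary-to-Primary link carries inter-POD flow data
            return pods[(i + 1) % len(pods)][1]["primary"]
    raise KeyError(station)
```

Keeping the serial paths short and few (secondary-to-Primary within a POD, Primary-to-Primary between PODs) is what makes the low-bandwidth radio network viable at all.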
By coupling the PLC, OIT, and local historian together with both an Ethernet connection and a serial connection, we were able to develop high-bandwidth (HBW)/low-bandwidth (LBW) communication redundancy. If the HBW path fails, several things happen. First, data stops flowing to the historian located in the data center and begins to be stored locally at the edge. Second, the PLC begins to communicate with its POD Primary over serial radio. If a Primary station loses HBW, it communicates serially with another Primary station in order to provide data back to the main HMI and data centers. Once the HBW path becomes available again, communication immediately transfers back.
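The failover behavior just described can be summarized in a short sketch: route samples to the central historian while HBW is up, and buffer at the edge plus transmit over serial while it is down. The class and method names here are illustrative assumptions, not the actual PLC logic.

```python
class StationComms:
    """Sketch of one station's HBW/LBW failover behavior (names assumed)."""

    def __init__(self):
        self.hbw_up = True
        self.local_buffer = []       # edge historian storage during an outage
        self.central_historian = []  # stands in for the data center historian
        self.serial_sent = []        # stands in for the serial radio link

    def record(self, sample):
        if self.hbw_up:
            self.central_historian.append(sample)
        else:
            # HBW down: buffer locally and push data over serial to the Primary
            self.local_buffer.append(sample)
            self.serial_sent.append(sample)

    def hbw_lost(self):
        self.hbw_up = False

    def hbw_restored(self):
        # Transfer back immediately and backfill the central historian
        self.hbw_up = True
        self.central_historian.extend(self.local_buffer)
        self.local_buffer.clear()
```

Note the backfill on restore: it is what guarantees the central trend data has no gap even though the central historian saw nothing during the outage.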
To accomplish this, a couple of designs were required. The first was defining one packet size for HBW and a second, smaller packet size for LBW. Serial can only carry so much data quickly, so only a preset subset of the most critical data is packaged together: tank level, pump run status, valve position, and so on. As my colleague loves to point out: "The only way to move a camel through the eye of a needle is to blend it up and pour very slowly." Over HBW, all data is transported regardless of importance.

The second design was a clock system to determine when HBW was lost between stations. Two clocks were designed: an IP Clock and a Serial Clock. Like a ping, each PLC listened for activity from another PLC; as communication continued, the clock continued to increase. Once the IP Clock stopped, the PLC would check the Serial Clock, and if it was still spinning, the system would fail over to the serial environment. Once the IP Clock began to spin again, failback was instantaneous.

During normal HBW operation, both the LBW and HBW packets were transferred between PLCs to ensure we always had the latest data for both systems. During a serial failover, this meant we were not waiting for fresh information over the slower serial link and could continue to operate the system with no loss of data. Throughout any failover, the local historian would continue to record historical data at the station, waiting for HBW to return; when it did, it would upload all of that data to the central historian. The result is that even with the loss of Ethernet connectivity, historical data never has a gap, and trends remain continuous regardless of communication failures.
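The dual-clock watchdog is the heart of the failover decision, so here is a minimal sketch of the idea, assuming each PLC watches a counter that its peer increments on every message. The stall threshold, register names, and path labels are illustrative assumptions, not the authors' actual PLC register layout.

```python
STALL_LIMIT = 3  # scans with no clock movement before declaring a path dead (assumed)

class PathClock:
    """Tracks whether a peer's heartbeat counter is still 'spinning'."""

    def __init__(self):
        self.value = 0    # last counter value heard from the peer
        self.stalled = 0  # consecutive scans with no movement

    def update(self, peer_value: int):
        if peer_value != self.value:
            self.value = peer_value
            self.stalled = 0
        else:
            self.stalled += 1

    @property
    def alive(self) -> bool:
        return self.stalled < STALL_LIMIT

def choose_path(ip_clock: PathClock, serial_clock: PathClock) -> str:
    """Prefer HBW; fail over only while the IP Clock is stopped and the
    Serial Clock is still spinning. Failback is immediate once the IP
    Clock moves again, since HBW wins whenever it is alive."""
    if ip_clock.alive:
        return "HBW"
    if serial_clock.alive:
        return "LBW-serial"
    return "ISOLATED"
```

Because both the HBW and LBW packets are exchanged continuously during normal operation, the moment `choose_path` flips to serial, the Primary already holds the latest critical values and no data is lost waiting for the slow link to catch up.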
How have you improved the resiliency of your SCADA communications? Let me know, and leave a comment.