Beware the M580, as Well as Your Own Accountability
Do you remember dominoes? The game where you spent hours designing the perfect system, with really cool tricks, like splitting the line, or having a block fall off the table to start the next run by knocking down a block on the floor. All that hard work, planning, design, and development finally paid off in a perfect execution.
Implementing the M580 PLC into our SCADA system didn’t go so smoothly. The blocks kept pushing other blocks, but it was a design I was no longer in charge of. It cut a destructive path of redevelopment and redesign, all due to a product that was not ready for prime time, as well as a lack of true understanding on my part and my team’s.
This story begins with the great California drought around 2014, followed by yet another unfunded California state mandate. The mandate had a noble goal: to conserve water and improve environmental protection. It required water providers to implement highly accurate flow meters with no data loss. Before this requirement, we had implemented the M340 PLC, which worked flawlessly. It was able to scan across subnets. It allowed you to shut down ports not in use. It gave you the option of connecting to the network via a backplane or an Ethernet port. And it was more than powerful enough to run our system. What it could not do, however, was communicate with the HART module, an add-on module designed to communicate with the water totalizers our operations group purchased to meet the California regulation. The HART module could only communicate with the M580 via the backplane. Why Schneider Electric (SE) made this design decision is beyond me, but at this point, the first domino fell.
Having to swap out our M340s wherever we deployed a HART module was not only expensive, but also the start of multiple dominoes falling in our attempt to build a resilient and redundant system. Our network was built on a Layer 3 architecture, with each remote station on its own subnet, so we required all of our PLCs to communicate across subnets. This is when the second domino fell. The M580 could not scan across subnets. It is as if the engineers at SE had never heard of a Layer 3 network before. As a result, we had to rewrite most of our PLC communication code using the tried and true method of read_VAR/write_VAR, a slower process that carries less data per packet. This was frustrating enough, but as we proceeded to add security to the PLC, we encountered several other limitations.
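To make the subnet problem concrete, here is a minimal sketch in Python of the kind of check that would have flagged it against an addressing plan. The addresses, the /24 masks, and the function name are made up for illustration; they are not our real network.

```python
import ipaddress

def peers_needing_routing(plc_interface, peer_addresses):
    """Return the peers that sit outside the PLC's own subnet."""
    local_net = ipaddress.ip_interface(plc_interface).network
    return [p for p in peer_addresses
            if ipaddress.ip_address(p) not in local_net]

# Hypothetical addressing plan: one /24 per remote station.
plc = "10.10.1.5/24"
peers = ["10.10.1.20", "10.10.2.5", "10.10.3.5"]

# Any peer listed here can only be reached through a router.
print(peers_needing_routing(plc, peers))
```

In a Layer 3 design with one subnet per station, nearly every peer lands on that list, which is exactly why a PLC that can only scan its local subnet becomes a problem.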
First, we tried to add a management VLAN to the PLC so that process data would flow through the data ports and administrative traffic would be isolated on a separate port. The M580 wouldn’t allow us to do this, even though the M340 did. As such, we had to run both types of traffic on the same port and the same subnet. This left us with two ports we no longer needed. So, as any good cyber security analyst would do, we disabled them.
Shortly thereafter, another domino fell, this time during a power outage at a remote site where we had not deployed a generator. The outage was long enough that our UPS finally gave up the ghost, and our system shut down. This was expected and part of the design. The station was not critical enough to justify the cost of a generator, and after 14 hours without power, communications went down. What we did expect, however, was that when power returned, so too would communications. While our switches and other network gear came alive, our M580 and HART module did not.
After much troubleshooting, we discovered that the M580 could not recover from a power loss if any of its ports were disabled. This was an incredible discovery, because SE’s own cyber security documentation specifically states to disable unused ports. What quality control test was performed that let this major security flaw make it into a customer’s hands? We had to re-enable the ports and find IT workarounds to compensate for this flaw.
The last domino to fall, so far, came after we had designed security workarounds for the port-disable flaw: the HART module still would not recover. It turned out that the HART module would not accept a static IP address and instead relied on DHCP, making it the only device in the network that required it. Upon starting up and looking for an IP address, it created a DHCP boot storm, trying to communicate with every M580, because every M580 is set up by default as a DHCP server. The HART module could never get an IP address to start back up. As such, we went back to our trusty IT network to fix an OT problem and had to block DHCP traffic from leaving the M580 entirely.
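For illustration only, a rule like that boils down to matching DHCP by its well-known UDP ports: 67 for servers and 68 for clients. The sketch below, in Python, shows just the matching logic. The function name and the packet fields are assumptions for this example, not the configuration of any real switch or firewall.

```python
# DHCP uses UDP port 67 (server side) and UDP port 68 (client side).
DHCP_PORTS = {67, 68}

def is_dhcp(protocol, src_port, dst_port):
    """Return True if the packet header looks like DHCP traffic."""
    if protocol.lower() != "udp":
        return False
    return bool({src_port, dst_port} & DHCP_PORTS)

# Hypothetical packets seen on a network port facing a PLC.
print(is_dhcp("udp", 68, 67))       # a client looking for a DHCP server
print(is_dhcp("tcp", 49152, 502))   # Modbus TCP polling, not DHCP
```

A filter built on this test drops only the DHCP chatter and leaves ordinary PLC traffic alone.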
One domino, the California drought, led to another, an unfunded mandate, which led to the purchase of the HART module, which led to the replacement of M340s with M580s, which led to a long redevelopment of our entire SCADA system. None of this would have been a significant issue had the M580 provided the same functionality as its little brother, the M340, or had my team and I done our due diligence in understanding the product’s limitations.
I could lay the blame at the feet of the developer. However, that would be a poor choice on my part, and it would not allow anyone to grow and learn. It is every technology leader’s responsibility, when implementing new technology, to understand its limitations fully, because the accountability lies with the leader alone. While we may have had to go down this road regardless, due to the California regulation, it was a large learning experience for myself and the team in my charge. It is a lesson I can provide you free of charge, because the next lesson you face may be extremely expensive.
Have you had a bad implementation experience? Leave a comment and let us know what you learned from it.