Industrial Network Troubleshooting
Root-cause analysis and permanent fix for intermittent communication losses across a PROFINET ring topology in a high-EMI press shop.
Context & challenge
A press shop running four S7-1500 PLCs interconnected over a PROFINET MRP (Media Redundancy Protocol) ring via SCALANCE X switches experienced communication faults every day. Each fault caused a line stop of 3–15 minutes while operators acknowledged alarms and restarted the affected programs. Maintenance had replaced cables and switches several times without resolving the issue. The faults were intermittent and followed no predictable schedule, which made diagnosis difficult.
Approach
Rather than replacing hardware again, the investigation started with data collection. Three data sources were set up in parallel:
- SCALANCE diagnostic logs — port statistics (CRC errors, late collisions, port resets) were captured continuously via SNMP and visualised in a simple Python dashboard to correlate fault events with switch-port anomalies.
- PLC diagnostic buffer — TIA Portal diagnostic buffers on all four CPUs were exported at each fault event using automated scripts, building a timeline of which device dropped off the ring first and when.
- Network baseline capture — a Wireshark capture on the ring manager port over three days established a baseline for frame sizes, cycle times, and LLDP topology advertisements.
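The correlation step can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard code used on site: it assumes the SNMP poller has already stored CRC error counters as (timestamp, port, count) tuples, and the function names, port labels, and sample values are all made up for the example.

```python
from datetime import datetime, timedelta

COUNTER32_MAX = 2**32  # SNMP Counter32 values wrap around at this modulus


def counter_delta(previous, current):
    """Increase between two Counter32 readings, allowing for a single wrap.

    Assumption: a counter reset after a switch reboot is not handled here
    and would be misread as a wrap.
    """
    return (current - previous) % COUNTER32_MAX


def ports_with_errors_before(samples, fault_time, window):
    """Sum CRC-error increases per port in the window leading up to a fault.

    samples: list of (timestamp, port, crc_error_count), sorted by timestamp.
    Returns {port: total_increase} for ports whose counter rose in the window.
    """
    last_seen = {}
    increases = {}
    for timestamp, port, count in samples:
        if port in last_seen:
            delta = counter_delta(last_seen[port], count)
            if delta and fault_time - window <= timestamp <= fault_time:
                increases[port] = increases.get(port, 0) + delta
        last_seen[port] = count
    return increases


if __name__ == "__main__":
    # Hypothetical polling data: two ports sampled every 30 seconds.
    t0 = datetime(2024, 1, 1, 8, 0, 0)
    samples = [
        (t0, "sw2/P3", 100),
        (t0, "sw3/P1", 7),
        (t0 + timedelta(seconds=30), "sw2/P3", 160),
        (t0 + timedelta(seconds=30), "sw3/P1", 7),
        (t0 + timedelta(seconds=60), "sw2/P3", 185),
        (t0 + timedelta(seconds=60), "sw3/P1", 7),
    ]
    fault = t0 + timedelta(seconds=65)
    print(ports_with_errors_before(samples, fault, timedelta(minutes=2)))
    # prints {'sw2/P3': 85}
```

Running the same function for every fault timestamp taken from the PLC diagnostic buffers gives a per-fault list of suspect ports, which is the kind of view that points at a single cable segment or uplink.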
Root causes found
Data analysis identified three independent contributing factors:
- EMI-induced CRC errors on a 40 m cable segment that ran alongside an unshielded servo drive power cable. The segment was re-routed through a separate cable duct and replaced with shielded Cat 6A.
- MRP ring manager misconfiguration — two switches had both been configured as manager, creating split-brain events during ring healing. One was reconfigured to client mode.
- Incorrect port speed/duplex negotiation on one SCALANCE uplink. Auto-negotiation was disabled in favour of fixed 100 Mbit/s full duplex on all inter-switch links, eliminating the transient half-duplex collisions that the failed negotiation had caused.
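The two configuration findings lend themselves to an automated check. The sketch below assumes the ring configuration has been exported into plain Python dictionaries; that data model, the field names, and the `audit_ring` function are hypothetical and do not correspond to any SCALANCE or TIA Portal export format.

```python
def audit_ring(switches):
    """Return a list of findings for one MRP ring.

    Each switch is a dict (hypothetical layout):
      {"name": str,
       "mrp_role": "manager" or "client",
       "ring_ports": [{"port": str, "autoneg": bool,
                       "speed_mbps": int, "duplex": "full" or "half"}]}
    """
    findings = []

    # Rule 1: the ring must have exactly one manager.
    managers = [s["name"] for s in switches if s["mrp_role"] == "manager"]
    if len(managers) != 1:
        findings.append(
            f"expected exactly 1 MRP manager, found {len(managers)}: {managers}"
        )

    # Rule 2: every ring port must be fixed at 100 Mbit/s full duplex.
    for switch in switches:
        for port in switch["ring_ports"]:
            fixed_ok = (
                not port["autoneg"]
                and port["speed_mbps"] == 100
                and port["duplex"] == "full"
            )
            if not fixed_ok:
                findings.append(
                    f"{switch['name']} {port['port']}: not fixed 100 Mbit/s full duplex"
                )
    return findings


if __name__ == "__main__":
    good_port = {"port": "P1", "autoneg": False, "speed_mbps": 100, "duplex": "full"}
    auto_port = {"port": "P1", "autoneg": True, "speed_mbps": 100, "duplex": "full"}
    ring = [
        {"name": "sw1", "mrp_role": "manager", "ring_ports": [good_port]},
        {"name": "sw2", "mrp_role": "manager", "ring_ports": [auto_port]},
        {"name": "sw3", "mrp_role": "client", "ring_ports": [good_port]},
    ]
    for finding in audit_ring(ring):
        print(finding)
```

For the example ring, the audit reports two findings: the duplicate manager role and the auto-negotiating port on sw2. An empty list means both rules hold.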
Outcome
After all three fixes were applied, the press shop ran for 60 days without a single PROFINET fault. The SNMP dashboard developed during the investigation was kept in place as a permanent monitoring layer that alerts the maintenance team before errors escalate to line stops.
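One simple way such an early-warning alert could work is a threshold on the CRC error rate per polling interval. The sketch below is an assumption about the approach, not the site's actual alert logic: the function name, the threshold value, and the sample readings are illustrative only.

```python
def crc_rate_alerts(samples, max_errors_per_minute):
    """Return (timestamp_s, errors_per_minute) for each interval above the limit.

    samples: list of (timestamp_seconds, crc_error_count) for one port,
    in time order. Counter32 wrap-around is tolerated via the modulus.
    """
    alerts = []
    for (t_prev, c_prev), (t_cur, c_cur) in zip(samples, samples[1:]):
        elapsed = t_cur - t_prev
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order readings
        increase = (c_cur - c_prev) % (2**32)
        rate = increase * 60 / elapsed
        if rate > max_errors_per_minute:
            alerts.append((t_cur, rate))
    return alerts


if __name__ == "__main__":
    # Hypothetical readings, one per minute; the threshold of 30 is arbitrary.
    readings = [(0, 10), (60, 12), (120, 72), (180, 75)]
    print(crc_rate_alerts(readings, max_errors_per_minute=30))
    # prints [(120, 60.0)]
```

A suitable threshold depends on the link and has to be derived from the recorded baseline; a healthy link should show almost no CRC errors, so a low limit is a reasonable starting point.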
The case also led the site to create a standard PROFINET commissioning checklist, covering cable routing, MRP role assignment, and speed/duplex configuration, that is now applied to all new installations.