Fabric management at CERN faced challenges because the growing number of PCs and disk servers had outgrown the capabilities of the existing automation tools. Those tools had been effective for clusters of tens of nodes, but managing thousands of systems required a single source of configuration information, improved installation processes, and service-level monitoring to support grid computing and the reconfiguration of systems over time. The goals were to develop standards-based management that provided reproducible installations across nodes, defined node roles and states, and automatically handled recovery from failures and any needed reconfiguration.