Removed unnecessary memory fences and added pause instructions to tight spin loops, reducing power consumption and improving performance on Hyper-Threading (HT) enabled systems.
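
For illustration only, here is a minimal sketch of the pattern this change describes, not the code that was actually modified. It assumes a spin-wait on a shared flag; the names `spin_until_set` and `demo` are hypothetical. Rust's `std::hint::spin_loop()` stands in for the pause operation (on x86 it emits a spin-wait hint such as `PAUSE`), and acquire/release orderings on the flag stand in for the removed standalone fences.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: busy-wait until `flag` becomes true.
fn spin_until_set(flag: &AtomicBool) {
    // The Acquire load pairs with the producer's Release store,
    // so no separate fence is needed inside or after the loop.
    while !flag.load(Ordering::Acquire) {
        // Spin-wait hint: lets the core save power and yield pipeline
        // resources to a sibling hardware thread while we wait.
        hint::spin_loop();
    }
}

/// Hypothetical demo: a producer publishes a value, the caller spins for it.
fn demo() -> u32 {
    let ready = Arc::new(AtomicBool::new(false));
    let data = Arc::new(AtomicU32::new(0));

    let producer = {
        let ready = Arc::clone(&ready);
        let data = Arc::clone(&data);
        thread::spawn(move || {
            data.store(42, Ordering::Relaxed);
            // Release store publishes the write to `data` above.
            ready.store(true, Ordering::Release);
        })
    };

    spin_until_set(&ready);
    // Visible because of the Release/Acquire pairing on `ready`.
    let value = data.load(Ordering::Relaxed);
    producer.join().unwrap();
    value
}
```

Calling `demo()` returns the value published by the producer thread. Whether a given fence is truly unnecessary depends on the memory model and the orderings already in place, so each removal should be checked against the surrounding synchronization.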