Performance optimization: hold the lock in slot_call_iterator with a scoped_ptr instead of a shared_ptr. The lock has a single owner, so shared ownership is not needed. Reduces per-slot invocation overhead by about 50%.