Abstract—Multithreading and multicore techniques are widely adopted in the design of modern high-performance CPUs. Multithreading allows multiple threads to share the functional units (FUs) within a core so that the FUs are better utilized. As a result, threads can conflict over the use of some FUs, for instance the floating-point unit (FPU). In such a case, some floating-point instructions are suspended until the FPU becomes available. The multicore technique implements a small-scale multiprocessor on a chip. A thread that runs on one core cannot use the FUs of other cores. This results in poor utilization of the FPUs in some cores if the threads running on those cores contain no floating-point instructions at all, even while threads in other cores are struggling to compete for the FPU. Unlike traditional multiprocessors, which are implemented with multiple CPU chips, a multicore CPU implements the multiprocessor on a single chip, so it becomes possible to let the threads in a core group share all the FPUs in the group. When a conflict over the use of an FPU occurs, some floating-point operations can be redirected to cores of the same group whose FPUs are idle, thereby improving the overall performance of the multicore CPU. This paper investigates such a group architecture and evaluates its performance improvement over the traditional multicore architecture. Our experimental results show that, on average for the floating-point benchmarks, performance improvements of 4.25%, 7.34%, and 7.45% can be achieved by redirecting floating-point operations to other cores within the group for group sizes of two, four, and eight, respectively, under the assumption that the instruction-redirection overhead is zero.
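The redirection idea can be illustrated with a toy single-cycle model. This is only a sketch under stated assumptions, not the simulator used for the reported results: it assumes one FPU per core, one floating-point operation per FPU per cycle, consecutive cores forming a group, and zero redirection overhead. The function name `stalled_fp_ops` and its parameters are hypothetical.

```python
from typing import List


def stalled_fp_ops(requests: List[int], group_size: int,
                   fpus_per_core: int = 1) -> int:
    """Count floating-point operations that must wait in one cycle.

    requests[i] is the number of FP operations that the threads on core i
    want to issue this cycle. Cores are split into consecutive groups of
    `group_size`. Inside a group, surplus operations may be redirected to
    idle FPUs of other cores at zero cost (an assumption of this sketch).
    A group size of 1 models a traditional multicore with no sharing.
    """
    if group_size < 1 or len(requests) % group_size != 0:
        raise ValueError("core count must be a multiple of group_size")

    stalled = 0
    for start in range(0, len(requests), group_size):
        group = requests[start:start + group_size]
        # Every FPU in the group is usable by any thread in the group.
        capacity = fpus_per_core * group_size
        stalled += max(0, sum(group) - capacity)
    return stalled


# Hypothetical per-core FP demand for an eight-core chip in one cycle.
demand = [2, 0, 1, 0, 3, 0, 0, 0]
for size in (1, 2, 4, 8):
    print(size, stalled_fp_ops(demand, size))
```

For this illustrative demand pattern, three operations stall without sharing, one stalls with groups of two, and none stall with groups of four or eight. The numbers show only the direction of the effect and are not comparable to the benchmark results.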