This paper investigates issues in pairing Java applications for multithreaded execution on Intel's Hyper-Threading Pentium 4 processor. We first quantify the overall performance of multiprogrammed Java applications using a metric called combined speedup. Using the performance counters provided by the Pentium 4, we then quantitatively evaluate the performance of the underlying microarchitectural components and their implications for the combined speedup. A statistical model is proposed to analyze the collected data. This novel approach reveals that the trace cache is the major factor determining pairing performance. In particular, we find that the trace cache miss rates of Java applications can be used to predict the combined speedups. Based on these observations, we propose and evaluate three new scheduling strategies. The experimental results show that the proposed strategies outperform the conventional round-robin scheduling scheme. Overall, our best strategy reduces execution time by 10.5% relative to serial execution, compared with a reduction of 5.92% achieved by round-robin scheduling. We expect the improvement to become increasingly significant on future SMT processors.