An efficient I/O subsystem enables cost-effective network processing. To speed up high-speed data transfer, the I/O subsystem delivers data directly into the processing core's register file. An implementation of this subsystem in a single-chip network processor, the Pro3, can sustain advanced inspection firewall processing of TCP traffic at 2.5 Gbps.