This paper proposes a memory-efficient real-time 3-D DWT algorithm and its architectural implementation. Because the running 3-D DWT refreshes the wavelet coefficients with the arrival of every two new frames, the latency of the conventional 3-D DWT is reduced by at least ¼. The transform is realized with canonical signed digit (CSD) multipliers. Parallelism is exploited for fast processing, and the architecture uses three pipelined stages. Coefficient mapping exploits the correlation between the low-pass filter (LPF) and the high-pass filter (HPF) of the orthogonal Daubechies wavelet. The memory requirement of the design is optimized to O(KN² + (K − 2) × N). The proposed architecture has been implemented on Xilinx FPGA devices at an operating frequency of 75 MHz. This low-complexity architecture ensures 100% hardware utilization.
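As an illustration of two ideas named above, the following minimal Python sketch shows (a) one standard way in which the HPF of an orthogonal Daubechies wavelet can be obtained from its LPF, and (b) how a fixed-point coefficient can be recoded in canonical signed digit form so that multiplication reduces to shifts and additions or subtractions. Several choices here are assumptions and are not taken from the proposed design: the 4-tap Daubechies filter, the alternating-flip relation g[n] = (-1)^n h[L-1-n], the 8-bit fractional scaling, and all function names are hypothetical. The actual coefficient mapping and word lengths used in the architecture may differ.

```python
import math


def db4_lowpass():
    """4-tap orthogonal Daubechies low-pass filter (assumed filter length)."""
    s3 = math.sqrt(3.0)
    d = 4.0 * math.sqrt(2.0)
    return [(1 + s3) / d, (3 + s3) / d, (3 - s3) / d, (1 - s3) / d]


def highpass_from_lowpass(h):
    """Alternating-flip relation g[n] = (-1)^n * h[L-1-n] (assumed mapping)."""
    L = len(h)
    return [((-1) ** n) * h[L - 1 - n] for n in range(L)]


def to_csd(value):
    """Return the canonical signed digits of an integer, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are both non-zero.
    """
    digits = []
    while value != 0:
        if value % 2:
            # +1 when value is 1 mod 4, -1 when value is 3 mod 4
            d = 2 - (value % 4)
            value -= d
        else:
            d = 0
        digits.append(d)
        value //= 2
    return digits


def csd_multiply(x, digits):
    """Multiply integer x by a CSD-coded constant using shifts and add/subtract."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    h = db4_lowpass()
    g = highpass_from_lowpass(h)
    FRAC_BITS = 8  # assumed fixed-point precision, for illustration only
    for name, taps in (("LPF", h), ("HPF", g)):
        for c in taps:
            q = round(c * (1 << FRAC_BITS))
            print(name, q, to_csd(q))
```

Because the HPF taps in this sketch are the LPF taps reordered with alternating signs, only one set of coefficient magnitudes needs to be stored or recoded, which is the kind of saving a coefficient mapping between the two filters can provide.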