IEEE Transactions on Parallel and Distributed Systems

IEEE Transactions on Parallel and Distributed Systems (TPDS) is a scholarly archival journal published monthly. Parallelism and distributed computing are foundational research and technology to rapidly advance computer systems and their applications. Read the full scope of TPDS.

Expand your horizons with Colloquium, a monthly survey of abstracts from all CS transactions! Replaces OnlinePlus in January 2017.

From the February 2018 Issue

Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes

By Erfan Azarkhish, Davide Rossi, Igor Loi, and Luca Benini

Free Featured Article High-performance computing systems are moving towards 2.5D and 3D memory hierarchies, based on High Bandwidth Memory (HBM) and Hybrid Memory Cube (HMC) to mitigate the main memory bottlenecks. This trend is also creating new opportunities to revisit near-memory computation. In this paper, we propose a flexible processor-in-memory (PIM) solution for scalable and energy-efficient execution of deep convolutional networks (ConvNets), one of the fastest-growing workloads for servers and high-end embedded systems. Our co-design approach consists of a network of Smart Memory Cubes (modular extensions to the standard HMC) each augmented with a many-core PIM platform called NeuroCluster. NeuroClusters have a modular design based on NeuroStream coprocessors (for Convolution-intensive computations) and general-purpose RISC-V cores. In addition, a DRAM-friendly tiling mechanism and a scalable computation paradigm are presented to efficiently harness this computational capability with a very low programming effort. NeuroCluster occupies only 8 percent of the total logic-base (LoB) die area in a standard HMC and achieves an average performance of 240 GFLOPS for complete execution of full-featured state-of-the-art (SoA) ConvNets within a power budget of 2.5 W. Overall 11 W is consumed in a single SMC device, with 22.5 GFLOPS/W energy-efficiency which is 3.5X better than the best GPU implementations in similar technologies. The minor increase in system-level power and the negligible area increase make our PIM system a cost-effective and energy efficient solution, easily scalable to 955 GFLOPS with a small network of just four SMCs.

download PDF View the PDF of this article      csdl View this issue in the digital library

Editorials and Announcements


  • We are pleased to announce that Manish Parashar, a Distinguished Professor of Computer Science at Rutgers, The State University of New Jersey University, has been selected as the new Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems starting in 2018.
  • According to Clarivate Analytics' 2016 Journal Citation Report, TPDS has an impact factor of 4.181.


Guest Editorials

Reviewers List

Annual Index

Access recently published TPDS articles

RSS Subscribe to the RSS feed of recently published TPDS content

mail icon Sign up for e-mail notifications through IEEE Xplore Content Alerts

preprints icon View TPDS preprints in the Computer Society Digital Library

TPDS is indexed in ISI