# Processor-Based Strong Physical Unclonable Functions With Aging-Based Response Tuning

Joonho Kong, Department of Electrical and Computer Engineering, Rice University
Farinaz Koushanfar, Department of Electrical and Computer Engineering, Rice University

Pages: 16–29

Abstract—A strong physically unclonable function (PUF) is a circuit structure that extracts an exponential number of unique chip signatures from a bounded number of circuit components. These unique signatures can enable a variety of low-overhead security and intellectual property protection protocols on several computing platforms. This paper proposes a novel lightweight (low-overhead) strong PUF based on the timing characteristics of a classic processor architecture. A small amount of circuitry is added to the processor for on-the-fly extraction of the unique timing signatures. To achieve desirable strong PUF properties, we develop an algorithm that leverages intentional post-silicon aging to tune the inter- and intra-chip variation of the signatures. Our evaluation results show that the new PUF meets the desirable inter- and intra-chip strong PUF characteristics, while its overhead is much lower than that of existing strong PUFs. For processors implemented in 45 nm technology, our post-silicon tuning method increases the average inter-chip Hamming distance of 32-bit responses by 16.1% and decreases the average intra-chip Hamming distance of 32-bit responses by 98.1%.

Keywords—Physically unclonable function; multi-core processor; secure computing platform; postsilicon tuning; circuit aging; negative bias temperature instability

## I.   Introduction

Achieving secure and trustworthy computing and communication is a grand challenge. Several known data/program security and trust methods leverage a root of trust in the processing units to achieve their goals. Microprocessors and other heterogeneous processing cores, which form the kernels of most modern computing and communication systems, have become increasingly mobile, which limits the energy and resources available to them. Traditional security and trust methods based on classic cryptography are often computationally intensive and thus undesirable for low-power portable platforms. Mobility and low power also favor smaller and simpler form factors, which are unfortunately known to be more susceptible to attacks such as side-channel or invasive exploits. There is therefore a need for low-overhead, attack-resilient security methods that can operate on low-power computing platforms.

A physically unclonable function (PUF) is a promising circuit structure for addressing the pressing security needs of portable and resource-constrained computing platforms. Thanks to the unique and unclonable process variations (PVs) on each chip, PUFs can generate a specific signature for each manufactured IC. Technically, PVs mainly affect the threshold voltage $(V_{th})$ and effective gate length $(L_{eff})$ of the devices in a chip [1], [2]. These unique device characteristics can be measured by structural side-channel tests, such as the timing or current of specific test vectors. To ease integration into higher-level digital security primitives, it is desirable to transform the measured structural test results into digital values. The unclonability and inherent uniqueness of the signatures make the PUF an attractive security primitive [3].

PUF signatures are typically extracted by a challenge-response protocol. In response to a challenge (or input), the PUF generates a unique response (or output) that depends on the specific PVs of the underlying chip. PUFs are classified into two broad categories: weak and strong. Weak PUFs have a limited number of challenge-response pairs (CRPs), which restricts them to applications that require only a few secret bits, such as key generation. Strong PUFs generate an exponential number of CRPs from a limited number of circuit components, and this huge CRP space enables a wider range of security and trust protocols.

Although previously proposed strong PUFs have shown promising results [4], their application is still limited by non-negligible overhead and instability. For example, the AEGIS secure processor design [5], which realizes a trustworthy hardware platform, incurs non-negligible hardware overhead for the added logic, including the arbiter PUF, that supports secure execution. Apart from the PUF logic itself, a large portion of the hardware overhead often comes from error correction logic. Since PUFs must produce stable outputs under varying environmental conditions (e.g., voltage and temperature fluctuations), error correction logic is unavoidable, but its overhead should be kept small. Moreover, natural PUFs may have undesirable statistical distributions of inter-chip variation, which significantly restricts their practical applicability. The statistical distribution becomes even worse when spatial correlations between device characteristics due to process variation (in particular, systematic variation) are prevalent across chips.

In this paper, we introduce an alternative strong PUF architecture based on a conventional multi-core processor. Our design is a low-overhead and stable strong PUF. By leveraging built-in structures of typical multi-core microprocessors (the adders in the ALUs) instead of building additional delay logic (e.g., the series of switches in arbiter PUFs or the inverter chains in ring oscillator (RO) PUFs [6]), our design realizes a low-overhead and secure strong PUF that can be employed in many security applications. A proof-of-concept implementation is demonstrated on a two-core architecture. To further improve the security, reliability, and stability of the PUF, and to compensate for possible drawbacks of the two-core design, we also propose a systematic post-silicon tuning method. Our new algorithm leverages intentional aging based on one of the most significant circuit aging mechanisms, negative bias temperature instability (NBTI) [7]. Because it carefully selects the gates to be intentionally aged, our post-silicon aging algorithm incurs no performance overhead in most chips. The algorithm also greatly improves the statistical properties of our PUF in terms of both inter-chip and intra-chip variation.

Our main contributions include:

• We propose a low overhead strong PUF design, two-core PUF, which leverages built-in components in general processor architectures.
• Our new PUF design shows good statistical results, comparable to those of previously proposed strong PUF designs, while its hardware overhead is lower than theirs.
• We propose a systematic method that leverages intentional aging to further enhance the statistical properties of our multi-core PUF in terms of both inter-chip and intra-chip variation, compensating for possible drawbacks of our PUF design.
• Our simulation results on a two-core architecture show that our intentional aging algorithms successfully improve the statistical properties of the two-core PUF with negligible performance overhead in most cases.

The rest of this paper is organized as follows. Section II provides background on process variation, the delay model, and the circuit aging mechanism and model. Section III explains our two-core PUF design, and Section IV introduces our systematic method for tuning the statistical properties of the PUF through intentional aging. Evaluation results for the two-core realization and the intentional aging algorithms are discussed in Section V. Section VI briefly reviews recent literature on PUFs and intentional post-silicon aging. Finally, Section VII concludes the paper.

## II.   Background and Preliminaries

In this section, we provide background on process variation, gate delay, and circuit aging. This material makes the paper self-contained and accessible to a broader audience who may not be familiar with these topics.

### A. Process Variation

Process variation (PV) generates inherent randomness in silicon structures. PV mainly affects the threshold voltage $(V_{th})$ and effective gate length $(L_{eff})$ of devices, which in turn causes variation in characteristics such as delay and power consumption across chip instances.

PV can be classified into two broad categories: random and systematic variation. Random variation is caused by random dopant fluctuations or random defects in devices and has no spatial correlation between devices. Unlike random variation, systematic variation causes spatially correlated device fluctuations: devices located close together are more likely to have similar characteristics than devices located far apart. In contemporary process technologies, random and systematic variation coexist in manufactured chips.

Fig. 1 shows sample $V_{th}$ distribution maps generated by a quad-tree PV model [1]. The $V_{th}$ distribution appears fairly random both within a single chip and across chips, yet similar colors tend to cluster together (i.e., $V_{th}$ values are spatially correlated).

Fig. 1. Four process variation map examples generated by the quad-tree process variation model [1]. The numbers on the right side of each map are Z-values of the Gaussian distribution.
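The quad-tree model cited as [1] is only summarized above. As a rough illustration of how such a model produces maps like those in Fig. 1, the Python sketch below divides the die recursively into quadrants, assigns every region at every level an independent Gaussian value, and sums the values of all regions that cover a grid point. The equal per-level variance and the name `quad_tree_vth_map` are assumptions made for this sketch, not the parameters of [1] or of the maps in Fig. 1.

```python
import math
import random


def quad_tree_vth_map(levels, seed):
    """Sketch of a quad-tree spatial-variation map (hypothetical helper).

    Level l splits the die into 2^l x 2^l regions; each region gets an
    independent Gaussian value, and each grid point sums the values of all
    regions containing it. Equal variance per level is an assumption.
    Returns a 2^(levels-1) x 2^(levels-1) grid of Z-values (unit variance).
    """
    rng = random.Random(seed)
    n = 2 ** (levels - 1)                  # finest level: one region per grid point
    level_sigma = 1.0 / math.sqrt(levels)  # so the summed variance per point is 1
    vmap = [[0.0] * n for _ in range(n)]
    for level in range(levels):
        cells = 2 ** level                 # regions per die edge at this level
        size = n // cells                  # grid points per region edge
        for cy in range(cells):
            for cx in range(cells):
                value = rng.gauss(0.0, level_sigma)
                for y in range(cy * size, (cy + 1) * size):
                    for x in range(cx * size, (cx + 1) * size):
                        vmap[y][x] += value
    return vmap


vmap = quad_tree_vth_map(levels=6, seed=0)  # 32 x 32 grid
```

Neighboring grid points share the coarse-level terms, so their values are correlated (systematic variation), while the finest level adds an independent term per point (random variation).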

### B. Delay Model

To estimate the $V_{th}$-dependent gate delay, we use the delay model described in [8]. The gate-level delay can be expressed as

$$\mathit{Delay} \propto \left(\frac{L_{eff}}{\phi_{t}}\right)^{2} \times \frac{V_{dd}}{\left(\ln\left(e^{\frac{(1+\sigma)V_{dd}-V_{th}}{2n\phi_{t}}}+1\right)\right)^{2}} \tag{1}$$

where $\phi_{t}$ is the thermal voltage, $n$ is the subthreshold slope factor, and $\sigma$ is the drain-induced barrier lowering (DIBL) coefficient. The other key factors that affect gate delay are the supply voltage $(V_{dd})$, the threshold voltage $(V_{th})$, and the effective gate length $(L_{eff})$. Due to process variation, these factors fluctuate, which in turn produces delay differences across the gates of a chip. Furthermore, circuit aging (covered in detail in Section II-C) also affects gate delay, since aging increases the $V_{th}$ of the gate.
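To illustrate how a $V_{th}$ shift translates into delay through (1), the sketch below evaluates the expression up to its unknown proportionality constant, so only delay ratios are meaningful. The function name and the values chosen for $n$, $\sigma$, $V_{dd}$, and $V_{th}$ are hypothetical placeholders, not technology parameters from this paper.

```python
import math


def relative_gate_delay(vth, vdd=1.0, leff=1.0, phi_t=0.0259, n=1.5, sigma=0.1):
    """Gate delay from Eq. (1), up to an unknown proportionality constant.

    vth, vdd in volts; leff in arbitrary length units; phi_t is the thermal
    voltage (about 25.9 mV at 300 K). n and sigma are the coefficients of
    Eq. (1); the defaults are illustrative placeholders only.
    """
    x = ((1 + sigma) * vdd - vth) / (2 * n * phi_t)
    return (leff / phi_t) ** 2 * vdd / math.log(math.exp(x) + 1) ** 2


fresh = relative_gate_delay(vth=0.35)
aged = relative_gate_delay(vth=0.38)  # e.g., after a 30 mV aging-induced shift
print(f"delay increase from a 30 mV Vth shift: {100 * (aged / fresh - 1):.1f}%")
```

Because the constant cancels in ratios such as `aged / fresh`, this form is enough to compare gates or chips that differ only in $V_{th}$ or $L_{eff}$.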

### C. Aging Model

Circuit aging is a phenomenon in which circuit performance degrades with use. Under intensive utilization or extreme environmental conditions (e.g., very high temperature), it may eventually cause the circuit to malfunction. Compared to fresh (i.e., not aged) chips, aged chips have lower performance because of the $V_{th}$ shift caused by hot carrier injection (HCI) and negative bias temperature instability (NBTI). The $V_{th}$ of a device increases continuously as the device switches or remains under stress for a large fraction of the time, resulting in higher delay and lower (leakage) power consumption.

In deep-submicron process technologies, NBTI is known to be the most threatening aging mechanism [7]. Thus, in this paper, we consider NBTI as the main aging mechanism. The $V_{th}$ shift $(\Delta V_{th})$ caused by NBTI is commonly modeled as

$$\Delta V_{th} = A \times e^{B V_{g}} \times e^{\frac{-E_{\alpha}}{kT}} \times t^{0.25} \tag{2}$$

where $V_{g}$ is the gate voltage, $E_{\alpha}$ is the activation energy, $k$ is Boltzmann's constant, and $A$ and $B$ are technology-dependent constants. As (2) shows, the $V_{th}$ shift depends heavily on the temperature $(T)$ and the stress time $(t)$. Using this aging model, one can derive the stress time $(t)$ required at a given temperature $(T)$ to intentionally increase $V_{th}$ by a target amount.
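Because (2) is a power law in $t$, it can be inverted in closed form: $t = \left(\Delta V_{th} / (A\,e^{BV_g}\,e^{-E_{\alpha}/kT})\right)^{4}$. The sketch below does this; the constants in `params` and the function names are hypothetical, and $t$ is expressed in whatever time unit $A$ is calibrated for.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant k in eV/K


def nbti_prefactor(temp_k, a, b, vg, ea_ev):
    """A * exp(B*Vg) * exp(-E_alpha/(kT)) from Eq. (2); ea_ev in eV."""
    return a * math.exp(b * vg) * math.exp(-ea_ev / (BOLTZMANN_EV_PER_K * temp_k))


def delta_vth(t, temp_k, a, b, vg, ea_ev):
    """Threshold-voltage shift after stress time t, per Eq. (2)."""
    return nbti_prefactor(temp_k, a, b, vg, ea_ev) * t ** 0.25


def stress_time_for(target_dvth, temp_k, a, b, vg, ea_ev):
    """Invert Eq. (2): t = (dVth / prefactor)^4."""
    return (target_dvth / nbti_prefactor(temp_k, a, b, vg, ea_ev)) ** 4


# Hypothetical technology constants, for illustration only.
params = dict(a=0.005, b=1.0, vg=1.2, ea_ev=0.1)
for temp_k in (300.0, 400.0):
    t = stress_time_for(0.03, temp_k, **params)
    print(f"T = {temp_k:.0f} K: stress time for a 30 mV shift = {t:.3g} time units")
```

The fourth-power dependence means that a modest rise in temperature (which enlarges the prefactor) shortens the required stress time substantially.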

The stress time $t$ strongly depends on the signal probability (SP) [9], which is the fraction of time a gate output stays at logic high (1) during circuit operation. Depending on the SP of a gate, the gate alternates between stress periods, in which its $V_{th}$ increases, and recovery periods, in which its $V_{th}$ decreases. Hence, to age a gate intentionally, one should carefully set its SP so that the gate spends much more time in the stress period than in the recovery period.
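As a rough illustration of how the operands of a challenge program set the SP of adder nodes, the sketch below propagates signal probabilities through a ripple-carry adder. It assumes that all operand bits are mutually independent, in which case the propagation is exact for this circuit; it is a textbook probability-propagation approach, not necessarily the estimation method used in this paper or in [9], and `rca_signal_probabilities` is a hypothetical name.

```python
def sp_xor(pa, pb):
    """P(a XOR b = 1) for independent inputs a and b."""
    return pa * (1 - pb) + pb * (1 - pa)


def rca_signal_probabilities(pa_bits, pb_bits, p_cin=0.0):
    """Signal probabilities of the sum and carry-out nodes of a ripple-carry adder.

    pa_bits[i], pb_bits[i]: probability that operand bit i is 1 (LSB first).
    Assumes all operand bits are mutually independent; the carry into bit i
    depends only on lower bits, so it is independent of a_i and b_i.
    """
    p_c = p_cin
    sum_sp, carry_sp = [], []
    for pa, pb in zip(pa_bits, pb_bits):
        p_prop = sp_xor(pa, pb)              # propagate signal a_i XOR b_i
        sum_sp.append(sp_xor(p_prop, p_c))   # s_i = a_i XOR b_i XOR c_i
        # c_{i+1} = a_i b_i + c_i (a_i XOR b_i); the two terms are mutually exclusive
        p_c = pa * pb + p_c * p_prop
        carry_sp.append(p_c)
    return sum_sp, carry_sp


uniform_sum, uniform_carry = rca_signal_probabilities([0.5] * 8, [0.5] * 8)
biased_sum, biased_carry = rca_signal_probabilities([0.9] * 8, [0.9] * 8)
print([round(p, 3) for p in uniform_carry])
print([round(p, 3) for p in biased_carry])
```

With uniformly random operands, every sum node has SP 0.5 and the carry SP approaches 0.5 along the chain; biasing the operand bits toward 1 drives the carry nodes close to logic high, which shows how the choice of operands shifts the time each node spends at a given logic level.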

## III.   Two-Core PUF

### A. Design Philosophy and Design Decisions

#### 1) Base Platform-Multi-Core Microprocessor

Since our design is fundamentally based on the delay comparison mechanism of arbiter PUFs, we need symmetric (homogeneous) structures to generate diverse path delays affected by process variations. The symmetric multi-core microprocessor is one of the best design candidates since most commodity microprocessors (or microcontrollers) have multiple homogeneous cores.

Typical strong PUF designs have separate delay circuits to generate PUF responses, which incur additional area and power overhead. In contrast, our PUF design utilizes built-in components in typical multi-core microprocessors, which minimizes additional hardware and communication overhead. Compared to the AEGIS design [5] which employs separate switches to implement an arbiter PUF, our design is implementable with a much smaller logic overhead.

#### 2) Path Delay Source—ALUs

Our design uses ALUs as path delay sources. The main reason is that ALUs accept an exponential number of operand combinations, which can also serve as challenge inputs. Moreover, add instructions produce challenge-dependent responses because they stimulate the complex carry chains in the adder structures. An add instruction can take an exponential number of different operand pairs ($2^{64}$ with two 32-bit operands), so our PUF can generate an exponential number of diverse responses that depend on both the challenge inputs and the disorder in the silicon structures. Our ALU-based PUF design can therefore be classified as a strong PUF.

The other reason for choosing ALUs as path delay sources is that ALUs are combinational logic blocks in microprocessors whose delay paths consist of long series of gates. This makes model-building attacks difficult, because adversaries must perform multiple stages of gate-level delay table lookups and additions to obtain accurate path delays from their PUF model. Determining carry propagation behavior also introduces many control dependencies, which makes it difficult for adversaries to exploit massively parallel computation to produce a modeled PUF response as quickly as the real PUF hardware. One can therefore impose a timing constraint (time bound) during the PUF challenge to distinguish the real PUF from a modeled one. Time-bounded authentication using PUFs has been introduced earlier [10].
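The time-bounded check described above can be sketched as a verifier-side function. The function name, the mismatch tolerance `max_mismatches`, and the nanosecond units are illustrative assumptions; in practice the bound would need to separate the latency of the genuine PUF hardware from that of the fastest model evaluation, which is the premise of time-bounded authentication [10].

```python
def time_bounded_verify(expected, received, elapsed_ns, time_bound_ns, max_mismatches):
    """Accept only if the response arrived within the bound AND is close enough.

    expected, received: response bit lists; elapsed_ns: measured latency
    between issuing the challenge and receiving the response.
    """
    if elapsed_ns > time_bound_ns:
        return False  # too slow: possibly a software model emulating the PUF
    if len(expected) != len(received):
        return False
    mismatches = sum(e != r for e, r in zip(expected, received))
    return mismatches <= max_mismatches  # tolerate a few noisy response bits


ok = time_bounded_verify([0, 1, 1, 0], [0, 1, 0, 0], elapsed_ns=80,
                         time_bound_ns=100, max_mismatches=1)
```

Checking latency first means a slow but otherwise correct response is rejected, which is the property that distinguishes the hardware from a model.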

Our PUF design can be applied to any adder structure, although in this paper we build it on ripple-carry adders (RCAs) as a proof of concept. In fact, PUFs are broadly used in small embedded systems (e.g., sensor nodes or RFIDs) [11], [12] and FPGAs [13]–[15], in which RCAs are more energy-efficient than high-performance adders such as carry-lookahead adders (CLAs). Note that the primary design consideration in those embedded systems is typically energy efficiency, not performance.

### B. Overall Design

#### 1) PUF Design

Delay-based PUFs [6] exploit delay differences between multiple paths which have inherently different delays across chips due to process variations. One may deploy arbiters (or counters/comparators in case of ring-oscillator PUFs) to capture the delay difference between two delay lines and convert it into a digitized value. In this paper, we propose an alternative strong PUF design which utilizes already built-in components in a processor architecture as our delay lines instead of building separate delay lines (e.g., a series of the switches in arbiter PUFs or a series of the inverters in ring oscillator PUFs).

Although our new strong PUF can be built on any multi-core processor architecture, in the remainder of the paper we focus on a two-core proof-of-concept design. Generalization to more cores is straightforward. Fig. 2 shows a high-level design of our two-core PUF; for simplicity, the figure depicts a 4-bit version. Our PUF uses the arithmetic logic units (ALUs) in multi-core microprocessors/microcontrollers as symmetric delay lines. To apply a challenge input to the PUF, the identical challenge program runs on both cores. As shown in Fig. 2, two 4-bit operands (operands $A$ and $B$) are fed into each ALU, and a 4-bit output $(S_{1}\sim S_{4})$ is obtained from each ALU. For delay comparison, the $n$-th output lines $(S_{n})$ of the two ALUs are connected to the $n$-th arbiter $(Arbiter_{n})$. The challenge program must start in the same cycle on both cores to guarantee correct PUF operation. Note that the arbiters must be placed very carefully in the circuit layout for correct operation of the two-core PUF. In addition, the wires from the two ALUs to each arbiter should have symmetric lengths so as not to bias the PUF outputs.

Fig. 2. The basic structure of our two-core PUF (bit width = 4 bits).

In our proof-of-concept example, the bit width of our base microprocessor is 32 bits. Hence, each core has a 32-bit ALU. $S_{n}$ from Core0 and Core1 is connected to $Arbiter_{n}$ for $n = 1, \ldots, 32$, so we need 32 arbiters for delay comparison. Note that our design can be easily extended to 64-bit microprocessors by simply adding 32 more arbiters and connecting the corresponding ALU output ports to them.

#### 2) Security Enhancement by XOR Obfuscation

Typical security applications require high inter-response variation (i.e., high unpredictability). Low inter-response variation may make the PUF vulnerable to modeling attacks [16], because a small set of CRPs may then allow adversaries to build an accurate model of a specific PUF. To improve the inter-response variation of our PUF design, one can add an XOR obfuscation step between two different response bits, as described in [17].

At a small additional hardware cost, one can XOR the $i$-th bit and the $(i+\frac{bitwidth}{2})$-th bit of a response, as shown in Fig. 3. Because this step halves the number of response bits, the PUF operation must be performed twice with different challenges to generate a $bitwidth$-bit response, which also incurs timing overhead. Considering the trade-off among hardware cost, performance, and security, one can employ the XOR obfuscation step only when a high level of security is required.

Fig. 3. Additional logic for XOR obfuscation.
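The XOR obfuscation step of Fig. 3 can be expressed in a few lines. The sketch below uses 0-based bit indices (the text counts from 1) and represents responses as bit lists; `xor_fold` and `obfuscated_response` are hypothetical names. Folding halves each response, which is why two PUF runs with different challenges are combined to restore the full bit width.

```python
def xor_fold(response):
    """Fold a w-bit response to w/2 bits: out[i] = r[i] XOR r[i + w/2] (0-based)."""
    w = len(response)
    if w % 2:
        raise ValueError("bit width must be even")
    half = w // 2
    return [response[i] ^ response[i + half] for i in range(half)]


def obfuscated_response(resp_a, resp_b):
    """Build a w-bit obfuscated response from two w-bit PUF runs with different challenges."""
    return xor_fold(resp_a) + xor_fold(resp_b)
```

For example, `obfuscated_response([1, 0, 1, 1], [0, 0, 1, 0])` returns `[0, 1, 1, 0]`.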

As shown in Fig. 4, XOR obfuscation greatly improves inter-response variation: the average inter-response Hamming distance increases from 5.06 bits to 10.64 bits for the 32-bit two-core PUF and from 11.81 bits to 20.53 bits for the 64-bit two-core PUF.

Fig. 4. Inter-response Hamming distance variations when 10,000 different random inputs are fed into the two-core PUF. The x-axis and y-axis correspond to the Hamming distance and the probability mass function, respectively.
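The inter-response Hamming distance in Fig. 4 compares responses of one PUF instance to different challenges. The sketch below computes the Hamming distance between two bit-list responses and, as one possible summary, the mean over all response pairs; the function names are hypothetical, and the pairing used to produce Fig. 4 may differ from this all-pairs average.

```python
from itertools import combinations


def hamming_distance(r1, r2):
    """Number of bit positions in which two equal-length responses differ."""
    if len(r1) != len(r2):
        raise ValueError("responses must have the same bit width")
    return sum(b1 != b2 for b1, b2 in zip(r1, r2))


def mean_pairwise_hd(responses):
    """Mean Hamming distance over all unordered pairs of responses."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        raise ValueError("need at least two responses")
    return sum(hamming_distance(a, b) for a, b in pairs) / len(pairs)
```

For 10,000 responses the all-pairs average covers roughly 50 million pairs, so a random sample of pairs would be a cheaper approximation.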

### C. Detailed Design and Architectural Modifications

Delay characteristics in our PUF depend on the carry propagation behavior of the conventional ripple-carry adder included in the ALUs. As shown in Fig. 5, two operand bits ($A_{i}$ and $B_{i}$) are fed into each full adder. Between the full adders are carry bits $(C_{i})$, which depend on the operands ($A_{i}$ and $B_{i}$) and the previous carry bit $(C_{i-1})$. Depending on the carry, the delay of a full adder's output is determined either by the preceding full adders (when the carry propagates) or by the current full adder alone. These carry propagation behaviors create an exponential number of signal propagation patterns in the adder, which enables challenge-dependent PUF outputs. The sum bits $(S_{i})$ of the ALU in each core are connected to the arbiters. $S_{i}$ is also connected to the ALU output storage that already exists in general processor architectures, although this is not shown in Fig. 5. The signals from the two ALUs race to the arbiter, which generates a digitized output depending on which delay line is faster. The arbiter output is stored in a temporary register ('PUF $Response_{i}$' in Fig. 5).

Fig. 5. A more detailed structure of our two-core PUF. For simplicity, only one arbiter and one temporary register (flip-flop) are shown. The XOR obfuscation logic is drawn with dashed lines since it is optional.
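The race described above can be illustrated with a deliberately simplified static timing model. In the sketch below, each core's full adders have per-bit carry and sum delays drawn from a Gaussian to mimic process variation; a sum bit settles once its incoming carry has arrived, the carry waits for its predecessor only when the bit propagates ($A_i \oplus B_i = 1$), and an arbiter outputs 1 when Core0's $S_i$ arrives first. The delay distribution, the arbiter polarity, and all function names are assumptions; the model ignores wire delays, transition history, and arbiter metastability.

```python
import random

WIDTH = 32  # bit width of each core's ALU in the proof-of-concept


def sample_core_delays(seed, width=WIDTH, nominal=1.0, sigma=0.05):
    """Hypothetical per-bit (carry, sum) delays of one core's full adders."""
    rng = random.Random(seed)
    carry = [rng.gauss(nominal, sigma) for _ in range(width)]
    sum_ = [rng.gauss(nominal, sigma) for _ in range(width)]
    return carry, sum_


def sum_arrival_times(a, b, carry_delay, sum_delay):
    """Settling time of each sum bit S_i in a ripple-carry adder (static model)."""
    t_carry = 0.0  # carry-in of bit 0 is the constant 0, available at t = 0
    times = []
    for i in range(len(carry_delay)):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        times.append(t_carry + sum_delay[i])  # S_i settles after its carry-in
        if ai ^ bi:
            t_carry += carry_delay[i]         # propagate: wait for the incoming carry
        else:
            t_carry = carry_delay[i]          # generate/kill: decided locally
    return times


def two_core_response(a, b, core0, core1):
    """Arbiter_i outputs 1 if Core0's S_i arrives first (polarity assumed)."""
    t0 = sum_arrival_times(a, b, *core0)
    t1 = sum_arrival_times(a, b, *core1)
    return [1 if x < y else 0 for x, y in zip(t0, t1)]


chip_core0 = sample_core_delays(seed=1)
chip_core1 = sample_core_delays(seed=2)
challenge = (0x12345678, 0x0F0F0F0F)  # operands A and B of the challenge program
response = two_core_response(*challenge, chip_core0, chip_core1)
print("".join(map(str, response)))
```

Because the propagate pattern of the operands decides which earlier stages contribute to each $S_i$'s arrival time, different challenges compare different sums of delays, which is what makes the response challenge-dependent.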

## VI.   Related Work

### A. Physically Unclonable Functions (PUFs)

A plausible method for unique and unclonable identification of devices and objects is based on the inherent, hard-to-forge randomness or disorder of their underlying physical fabric. To overcome the exposure associated with storing digital keys, a novel class of secret embedding, storage, and extraction methods, widely known as PUFs, has emerged. The secret generation and storage mechanisms in PUFs are based on the inherent disorder present in silicon [3]. Memory-based PUFs, a type of weak PUF [4], [28], [29], are typically used for secure key storage. Arbiter PUFs [6], which belong to the strong PUF family, are composed of a series of switches (MUXes) that change delay paths according to the input challenge bits. To improve statistical properties and make the structure resilient to modeling attacks, different PUF outputs can also be XOR-ed [30]. Ring-oscillator (RO) PUFs [6] are composed of long chains of inverters. Glitch PUFs [31] exploit variability in glitch propagation along delay paths. In [5] and [32], PUF structures combined with microprocessor architectures are proposed. Apart from PUF design studies, the literature includes detailed analyses [33], [34], formal models [35], and modeling attacks on PUFs [16].

In this work, we proposed a new strong PUF design, which is fundamentally based on delay comparison between two symmetric paths by using arbiters. Our PUF design is instruction-controlled, and leverages built-in components, i.e., arithmetic logic unit (ALU) in a classic processor architecture for path delay sources instead of deploying separate delay sources as presented in [6].

### B. Leveraging Aging to PUFs and Circuits

Circuit aging is a common mechanism by which circuit performance degrades with use. Although aging resilience of circuit structures has been studied extensively, in this paper we focus on leveraging intentional aging of PUFs to tune the statistical properties of their responses. Reference [36] provided the first set of formal properties for the statistical distribution of PUF responses in terms of inter-chip and intra-chip variation.

A hardware aging-based software metering technique [37] precisely tracks software usage by feeding test vectors to a specific circuit. Device aging-based PUF design [38] leverages an aging mechanism to shape the PUF responses. It can also be used to gradually change the PUF responses, which makes them robust to modeling attacks, or to improve the statistical properties of the PUF by changing its responses. Public PUFs (PPUFs) [39]–[41] also leverage aging to shape the PUF responses. The main purpose of applying aging to PPUFs is to make the responses of PUFs shared among trusted parties identical, enabling low-power and fast authentication. Leveraging intentional aging to generate stable outputs in SRAM (static random access memory) PUFs has also been proposed [42]; there, the NBTI aging mechanism enables more stable output generation.

To the best of our knowledge, our work is the first to introduce systematic aging of a strong PUF (two-core PUF) to get a better statistical distribution of PUF responses (i.e., signatures) both in terms of inter-chip and intra-chip variations.

## VII.   Conclusion

In this paper, we proposed a two-core strong PUF architecture. Our design is low-overhead and robust to systematic variation because of its inherently symmetric construction. To improve the statistical distribution of the PUF outputs, we devised a novel intentional aging algorithm that makes the PUF instances much more secure and stable in terms of both inter- and intra-chip variation. Our evaluation results suggest that the proposed algorithms greatly improve the statistical properties of the PUF challenge-response pairs. By applying the algorithm that increases inter-chip variation, one obtains PUF responses with higher uniqueness across chip instances. The algorithm that reduces intra-chip variation makes our PUF much more robust to environmental fluctuations, which in turn enables low-overhead error correction schemes for robust and stable PUFs.

## Acknowledgment

The contractor acknowledges government support in the publication of this paper. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of AFRL.

## References

• [1] B. Cline, K. Chopra, D. Blaauw, and Y. Cao, “Analysis and modeling of CD variation for statistical static timing,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, Nov. 2006, pp. 60–66.
• [2] S. R. Sarangi, B. Greskamp, R. Teodorescu, J. Nakano, A. Tiwari, and J. Torrellas, “VARIUS: A model of process variation and resulting timing errors for microarchitects,” IEEE Trans. Semicond. Manuf., vol. 21, no. 1, pp. 3–13, Feb. 2008.
• [3] U. Rührmair, S. Devadas, and F. Koushanfar, Security Based on Physical Unclonability and Disorder. New York, NY, USA: Springer-Verlag, 2011.
• [4] R. Maes and I. Verbauwhede, Physically Unclonable Functions: A Study on the State of the Art and Future Research Directions. New York, NY, USA: Springer-Verlag, 2010.
• [5] G. E. Suh, C. W. O'Donnell, and S. Devadas, “AEGIS: A single-chip secure processor,” Inf. Security Tech. Rep., vol. 10, no. 2, pp. 63–73, 2005.
• [6] G. E. Suh and S. Devadas, “Physical unclonable functions for device authentication and secret key generation,” in Proc. 44th ACM/IEEE DAC, Jun. 2007, pp. 9–14.
• [7] R. Vattikonda, W. Wang, and Y. Cao, “Modeling and minimization of PMOS NBTI effect for robust nanometer design,” in Proc. 43rd Annu. Design Autom. Conf., 2006, pp. 1047–1052.
• [8] D. Markovic, C. Wang, L. Alarcon, T.-T. Liu, and J. Rabaey, “Ultralow-power design in near-threshold region,” Proc. IEEE, vol. 98, no. 2, pp. 237–252, Feb. 2010.
• [9] F. Najm, “A survey of power estimation techniques in VLSI circuits,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 2, no. 4, pp. 446–455, Dec. 1994.
• [10] M. Majzoobi and F. Koushanfar, “Time-bounded authentication of FPGAs,” IEEE Trans. Inf. Forensics Security, vol. 6, no. 3, pp. 1123–1135, Sep. 2011.
• [11] S. Schulz, A.-R. Sadeghi, and C. Wachsmann, “Short paper: Lightweight remote attestation using physical functions,” in Proc. 4th ACM Conf. Wireless Netw. Security, 2011, pp. 109–114.
• [12] S. Devadas, G. E. Suh, S. Paral, R. Sowell, and T. Ziola, “Design and implementation of PUF-based 'unclonable' RFID ICs for anti-counterfeiting and security applications,” in Proc. IEEE Int. Conf. RFID, Apr. 2008, pp. 58–64.
• [13] L. N. Chakrapani, K. K. Muntimadugu, A. Lingamneni, J. George, and K. V. Palem, “Highly energy and performance efficient embedded computing through approximately correct arithmetic: A mathematical foundation and preliminary experimental validation,” in Proc. Int. Conf. Compil., Architect. Synth. Embedded Syst., Oct. 2008, pp. 187–196.
• [14] Z. M. Kedem, V. J. Mooney, K. K. Muntimadugu, and K. V. Palem, “An approach to energy-error tradeoffs in approximate ripple carry adders,” in Proc. 17th IEEE/ACM ISLPED, Jan. 2011, pp. 211–216.
• [15] D. G. Bailey, Design for Embedded Image Processing on FPGAs. New York, NY, USA: Wiley, 2011.
• [16] U. Rührmair, F. Sehnke, J. Sölter, G. Dror, S. Devadas, and J. Schmidhuber, “Modeling attacks on physical unclonable functions,” in Proc. 17th ACM Conf. Comput. Commun. Security, 2010, pp. 237–249.
• [17] M. Majzoobi, M. Rostami, F. Koushanfar, D. Wallach, and S. Devadas, “Slender PUF protocol: A lightweight, robust, and secure authentication by substring matching,” in Proc. IEEE Symp. SPW, Jun. 2012, pp. 33–44.
• [18] J. Kong, S. W. Chung, and K. Skadron, “Recent thermal management techniques for microprocessors,” ACM Comput. Surv., vol. 44, no. 3, pp. 1–42, 2012.
• [19] E. Humenay, D. Tarjan, and K. Skadron, “Impact of process variations on multicore performance symmetry,” in Proc. Conf. Design, Autom. Test Eur., 2007, pp. 1653–1658.
• [20] D. A. Patterson and J. L. Hennessy, Computer Organization and Design—The Hardware/Software Interface (ser. Comput. Architecture and Design), 4th ed. San Mateo, CA, USA: Morgan Kaufmann, 2012.
• [21] J. Kong, Y. Pan, S. Ozdemir, A. Mohan, G. Memik, and S. W. Chung, “Fine-grain voltage tuned cache architecture for yield management under process variations,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 8, pp. 1532–1536, Aug. 2012.
• [22] S. V. Kumar, C. H. Kim, and S. S. Sapatnekar, “NBTI-aware synthesis of digital circuits,” in Proc. 44th ACM/IEEE DAC, Jun. 2007, pp. 370–375.
• [23] M. Chen, V. Reddy, S. Krishnan, V. Srinivasan, and Y. Cao, “Asymmetric aging and workload sensitive bias temperature instability sensors,” IEEE Design Test Comput., vol. 29, no. 5, pp. 18–26, Oct. 2012.
• [24] M. Valdes-Pena, J. F. Freijedo, M. M. Rodriguez, J. Rodriguez-Andina, J. Semiao, I. Teixeira, J. Teixeira, and F. Vargas, “Design and validation of configurable online aging sensors in nanometer-scale FPGAs,” IEEE Trans. Nanotechnol., vol. 12, no. 4, pp. 508–517, Jul. 2013.
• [25] J. Kong, J. K. John, E.-Y. Chung, S. W. Chung, and J. S. Hu, “On the thermal attack in instruction caches,” IEEE Trans. Dependable Secure Comput., vol. 7, no. 2, pp. 217–223, Apr.–Jun. 2010.
• [26] J. Kong and S. W. Chung, “Exploiting narrow-width values for process variation-tolerant 3-D microprocessors,” in Proc. 49th Annu. DAC, Jun. 2012, pp. 1197–1206.
• [27] P. Zicari and S. Perri, “A fast carry chain adder for Virtex-5 FPGAs,” in Proc. 15th IEEE Medit. Electrotech. Conf., Apr. 2010, pp. 304–308.
• [28] D. Holcomb, W. Burleson, and K. Fu, “Power-up SRAM state as an identifying fingerprint and source of true random numbers,” IEEE Trans. Comput., vol. 58, no. 9, pp. 1198–1210, Sep. 2009.
• [29] S. S. Kumar, J. Guajardo, R. Maes, G. J. Schrijen, and P. Tuyls, “The butterfly PUF: Protecting IP on every FPGA,” in Proc. IEEE Int. Workshop Hardw.-Oriented Security Trust, Dec. 2008, pp. 67–70.
• [30] M. Majzoobi, F. Koushanfar, and M. Potkonjak, “Lightweight secure PUFs,” in Proc. IEEE/ACM ICCAD, Nov. 2008, pp. 670–673.
• [31] D. Suzuki and K. Shimizu, “The glitch PUF: A new delay-PUF architecture exploiting glitch shapes,” in Proc. 12th Int. Conf. Cryptograph. Hardw. Embedded Syst., 2010, pp. 366–382.
• [32] A. Maiti and P. Schaumont, “A novel microprocessor-intrinsic physical unclonable function,” in Proc. 22nd Int. Conf. FPL, Aug. 2012, pp. 380–387.
• [33] S. Katzenbeisser, Ü. Kocabas, V. Rozic, A.-R. Sadeghi, I. Verbauwhede, and C. Wachsmann, “PUFs: Myth, fact or busted? A security evaluation of physically unclonable functions (PUFs) cast in silicon,” in Proc. 14th Int. Conf. Cryptograph. Hardw. Embedded Syst., Sep. 2012, pp. 283–301.
• [34] M.-D. M. Yu, R. Sowell, A. Singh, D. M'Raïhi, and S. Devadas, “Performance metrics and empirical results of a PUF cryptographic key generation ASIC,” in Proc. IEEE Int. Workshop Hardw.-Oriented Security Trust, Sep. 2012, pp. 108–115.
• [35] F. Armknecht, R. Maes, A.-R. Sadeghi, F.-X. Standaert, and C. Wachsmann, “A formalization of the security features of physical functions,” in Proc. IEEE Symp. Security Privacy, Aug. 2011, pp. 397–412.
• [36] M. Majzoobi, F. Koushanfar, and M. Potkonjak, “Techniques for design and implementation of secure reconfigurable PUFs,” ACM Trans. Reconfigurable Technol. Syst., vol. 2, no. 1, pp. 133, 2009.
• [37] F. Dabiri and M. Potkonjak, “Hardware aging-based software metering,” in Proc. Conf. Design, Autom. Test Eur., 2009, pp. 460–465.
• [38] S. Meguerdichian and M. Potkonjak, “Device aging-based physically unclonable functions,” in Proc. 48th DAC, 2011, pp. 288–289.
• [39] M. Potkonjak, S. Meguerdichian, A. Nahapetian, and S. Wei, “Differential public physically unclonable functions: Architecture and applications,” in Proc. 48th DAC, Jun. 2011, pp. 242–247.
• [40] S. Meguerdichian and M. Potkonjak, “Matched public PUF: Ultra low energy security platform,” in Proc. 17th IEEE/ACM Int. Symp. Low-Power Electron. Design, Aug. 2011, pp. 45–50.
• [41] S. Meguerdichian and M. Potkonjak, “Using standardized quantization for multi-party PPUF matching: Foundations and applications,” in Proc. IEEE ICCAD, Nov. 2012, pp. 577–584.
• [42] M. Bhargava, C. Cakir, and K. Mai, “Reliability enhancement of bi-stable PUFs in 65 nm bulk CMOS,” in Proc. IEEE Int. Symp. HOST, May 2012, pp. 25–30.

Joonho Kong (S'07–M'11) received the B.S. degree in computer science from Korea University, Seoul, Korea, in 2007, and the M.S. and Ph.D. degrees in computer science and engineering from Korea University in 2009 and 2011, respectively. He is currently a post-doctoral researcher with the Department of Electrical and Computer Engineering, Rice University. His research interests include computer architecture design, temperature-aware microprocessor design, reliable microprocessor cache design, and hardware security.
Farinaz Koushanfar (S'99–M'06) received the Ph.D. degree in electrical engineering and computer science and the M.A. degree in statistics from the University of California Berkeley, in 2005, and the M.S. degree in electrical engineering from the University of California Los Angeles. She is currently an Associate Professor with the Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA, where she directs the Texas Instruments DSP Leadership University Program. Her research interests include adaptive and low power embedded systems design, hardware security, and design intellectual property protection.
She is a recipient of the Presidential Early Career Award for Scientists and Engineers, the ACM SIGDA Outstanding New Faculty Award, the National Academy of Science Kavli Foundation Fellowship, the Army Research Office Young Investigator Program Award, the Office of Naval Research Young Investigator Program Award, the Defense Advanced Project Research Agency Young Faculty Award, the National Science Foundation CAREER Award, the MIT Technology Review TR-35, the Intel Open Collaborative Research Fellowship, and the Best Paper Award at Mobicom.