Active Learning and Effort Estimation: Finding the Essential Content of Software Effort Estimation Data
PrePrint
ISSN: 0098-5589
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/TSE.2012.88
Background: Do we always need complex methods for software effort estimation (SEE)? Aim: To characterize the essential content of SEE data, that is, the smallest number of features and instances required to capture the information within SEE data. If the essential content is very small, then (1) the contained information must be very brief and (2) the value added by complex learning schemes must be minimal. Method: Our QUICK method computes the Euclidean distance between rows (instances) and columns (features) of SEE data; then prunes synonyms (similar features) and outliers (distant instances); then assesses the reduced data by comparing predictions from (1) a simple learner using the reduced data and (2) a state-of-the-art learner (CART) using all data. Performance is measured using hold-out experiments and expressed in terms of mean and median MRE, MAR, PRED(25), MBRE, MIBRE, or MMER. Results: For 18 data sets, QUICK pruned 69% to 96% of the training data (median = 89%). k=1 nearest-neighbor predictions (in the reduced data) performed as well as CART’s predictions (using all data). Conclusion: The essential content of some SEE data sets is very small. Complex estimation methods may be over-elaborate for such data sets and can be simplified. We offer QUICK as an example of such a simpler SEE method.
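The pruning idea described in the Method can be illustrated with a minimal sketch. It is not the authors' QUICK implementation: the abstract gives no thresholds or exact selection rules, so the function names, the `threshold` and `keep_fraction` parameters, the toy data, and the assumption that features are already scaled to [0, 1] are all hypothetical. The sketch also leaves out the active-learning step named in the title and shows only distance-based pruning followed by a k=1 nearest-neighbor estimate.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prune_synonyms(rows, threshold):
    """Return indices of columns kept after dropping near-duplicate features.

    Assumed rule: a column is kept only if it lies farther than `threshold`
    from every column already kept.
    """
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(euclidean(col, cols[k]) > threshold for k in kept):
            kept.append(j)
    return kept

def prune_outliers(rows, keep_fraction):
    """Return indices of rows kept after dropping the most isolated instances.

    Assumed rule: rank rows by distance to their nearest neighbor and keep
    the closest `keep_fraction` of them.
    """
    def nn_dist(i):
        return min(euclidean(rows[i], rows[j]) for j in range(len(rows)) if j != i)
    ranked = sorted(range(len(rows)), key=nn_dist)
    n_keep = max(1, int(round(keep_fraction * len(rows))))
    return sorted(ranked[:n_keep])

def predict_1nn(train_x, train_y, query):
    """Estimate effort as the effort of the single nearest training row."""
    best = min(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    return train_y[best]

def mre(actual, predicted):
    """Magnitude of relative error: |actual - predicted| / actual."""
    return abs(actual - predicted) / actual

# Toy data (hypothetical): three features scaled to [0, 1]; the second
# feature duplicates the first, and the last project is far from the rest.
X = [
    [0.00, 0.00, 0.00],
    [0.05, 0.05, 0.15],
    [0.03, 0.03, 0.00],
    [1.00, 1.00, 1.00],
]
y = [100, 120, 110, 900]

kept = prune_synonyms(X, threshold=0.05)              # feature indices kept
reduced = [[row[j] for j in kept] for row in X]
rows_kept = prune_outliers(reduced, keep_fraction=0.75)
train_x = [reduced[i] for i in rows_kept]
train_y = [y[i] for i in rows_kept]

query = [0.04, 0.04, 0.14]                            # new project, same scale
estimate = predict_1nn(train_x, train_y, [query[j] for j in kept])
print(kept, rows_kept, estimate, mre(125, estimate))
```

On this toy data the duplicated feature and the distant project are dropped before the nearest-neighbor lookup. In a hold-out experiment of the kind the abstract describes, the pruning would be fitted on the training split only and the error measures computed on the held-out projects.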
Index Terms:
Estimation, Indexes, Labeling, Frequency selective surfaces, Euclidean distance, Complexity theory, Principal component analysis, k-NN, software cost estimation, active learning, analogy
Citation:
Ekrem Kocaguneli, Tim Menzies, Jacky Keung, David Cok, Ray Madachy, "Active Learning and Effort Estimation: Finding the Essential Content of Software Effort Estimation Data," IEEE Transactions on Software Engineering, 11 April 2013. IEEE Computer Society Digital Library. IEEE Computer Society, <http://doi.ieeecomputersociety.org/10.1109/TSE.2012.88>

