Issue No.01 - January (2009 vol.31)
pp: 86-98
Dongfang Zhao , Information, Distribution & Marketing, Inc., Atlanta
Li Yang , Western Michigan University, Kalamazoo
ABSTRACT
Most nonlinear data embedding methods use bottom-up approaches to capture the underlying structure of data distributed on a manifold in high-dimensional space. These methods often share a first step that defines the neighbors of every data point by building a connected neighborhood graph, so that all data points can be embedded into a single coordinate system. Many applications require these methods to work incrementally for dimensionality reduction. Because the input data stream may be under-sampled or skewed from time to time, building a connected neighborhood graph is crucial to the success of incremental data embedding with these methods. This paper presents algorithms for updating $k$-edge-connected and $k$-connected neighborhood graphs after a new data point is added or an old data point is deleted. It further uses a simple algorithm for updating all-pair shortest distances on the neighborhood graph. Combining these with incremental classical multidimensional scaling based on iterative subspace approximation, the paper devises an incremental version of Isomap with enhancements to deal with under-sampled or unevenly distributed data. Experiments on both synthetic and real-world data sets show that the algorithm is efficient and maintains low-dimensional configurations of high-dimensional data under various data distributions.
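The abstract does not spell out how the all-pair shortest distances are updated. As a rough illustration only, the sketch below shows the standard update for inserting one vertex into an undirected graph with non-negative edge lengths; it is not claimed to be the paper's algorithm. The function name `insert_point` and the `neighbors` dictionary are hypothetical names chosen for this example. Deleting a point is harder and is not covered here.

```python
import math


def insert_point(dist, neighbors):
    """Return the all-pairs shortest-distance matrix after adding one vertex.

    dist      -- n x n list of lists of current shortest distances (symmetric)
    neighbors -- dict {existing vertex index: edge length to the new vertex}
                 (hypothetical input format, assumed for this sketch)
    """
    n = len(dist)
    # Shortest distance from the new vertex to each old vertex j: leave through
    # some neighbor u, then follow the already-known shortest path u -> j.
    to_new = [
        min((w + dist[u][j] for u, w in neighbors.items()), default=math.inf)
        for j in range(n)
    ]
    # Any old pair (i, j) may now be shorter by routing through the new vertex.
    updated = [
        [min(dist[i][j], to_new[i] + to_new[j]) for j in range(n)] + [to_new[i]]
        for i in range(n)
    ]
    # Append the row for the new vertex itself.
    updated.append(to_new + [0.0])
    return updated


# Path graph 0 - 1 - 2 with unit edges; the new vertex 3 links to 0 and 2.
d = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]
d_new = insert_point(d, {0: 0.5, 2: 0.5})
print(d_new[0][2])
```

In this toy example the new point creates a shortcut between vertices 0 and 2, so their distance drops from 2.0 to 1.0. The update costs O(n * |neighbors| + n^2) per inserted point, which is why an incremental scheme of this kind can be cheaper than recomputing all shortest paths from scratch.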
INDEX TERMS
Pattern Recognition, Models, Geometric, Statistical, Design Methodology, Feature evaluation and selection, Discrete Mathematics, Graph Theory, Graph algorithms, Database Management, Database Applications, Data mining.
CITATION
Dongfang Zhao, Li Yang, "Incremental Isometric Embedding of High-Dimensional Data Using Connected Neighborhood Graphs," IEEE Transactions on Pattern Analysis & Machine Intelligence, vol. 31, no. 1, pp. 86-98, January 2009, doi:10.1109/TPAMI.2008.34