A New Approach to the Design of Reinforcement Schemes for Learning Automata: Stochastic Estimator Learning Algorithms
August 1994 (vol. 6 no. 4)
pp. 649-654

A new class of learning automata is introduced. These automata use a stochastic estimator and can operate in nonstationary environments with high accuracy and a high adaptation rate. Under the stochastic estimator scheme, the estimates of the mean rewards of the actions are computed stochastically, so they do not depend strictly on the environmental responses. The coupling between the stochastic estimates and the contents of the deterministic estimator is looser when the latter are old and probably invalid. As a result, actions that have not been selected recently get the opportunity to be estimated as "optimal", to increase their choice probability, and consequently to be selected. The estimator is therefore kept up to date and can adapt to environmental changes. The Stochastic Estimator Learning Automaton (SELA) outperforms the previously known S-model ergodic schemes. Furthermore, SELA is proved to be absolutely expedient in every stationary S-model random environment.
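
The abstract gives the idea but not the update equations, so the following Python is only a minimal sketch of how a stochastic estimator might be wired into a learning automaton. It is not the scheme from the paper. Everything specific here is an assumption made for illustration: the class name, the sliding reward window, Gaussian noise whose standard deviation grows linearly with the time since an action was last selected (capped at `max_noise`), and the pursuit-style probability update.

```python
import random
from collections import deque


class StochasticEstimatorAutomaton:
    """Illustrative sketch; names and update rules are assumptions, not the paper's."""

    def __init__(self, n_actions, window=10, noise_rate=0.05,
                 max_noise=0.5, step=0.01, seed=None):
        self.n = n_actions
        self.rng = random.Random(seed)
        # Start from a uniform action probability vector.
        self.probs = [1.0 / n_actions] * n_actions
        # Deterministic estimator: the last `window` rewards seen per action.
        self.history = [deque(maxlen=window) for _ in range(n_actions)]
        # Steps elapsed since each action was last selected.
        self.age = [0] * n_actions
        self.noise_rate = noise_rate
        self.max_noise = max_noise
        self.step = step

    def choose(self):
        """Sample an action according to the current probability vector."""
        return self.rng.choices(range(self.n), weights=self.probs)[0]

    def deterministic_estimate(self, i):
        """Mean of the stored rewards for action i (0.0 if none yet)."""
        h = self.history[i]
        return sum(h) / len(h) if h else 0.0

    def stochastic_estimate(self, i):
        """Deterministic estimate plus noise that grows with the estimate's age."""
        sigma = min(self.max_noise, self.noise_rate * self.age[i])
        return self.deterministic_estimate(i) + self.rng.gauss(0.0, sigma)

    def update(self, action, reward):
        """Record an S-model reward in [0, 1] and adjust the probabilities."""
        self.history[action].append(reward)
        for i in range(self.n):
            self.age[i] = 0 if i == action else self.age[i] + 1
        estimates = [self.stochastic_estimate(i) for i in range(self.n)]
        best = max(range(self.n), key=estimates.__getitem__)
        # Pursuit-style step toward the action with the highest
        # stochastic estimate; the vector keeps summing to 1.
        for i in range(self.n):
            target = 1.0 if i == best else 0.0
            self.probs[i] += self.step * (target - self.probs[i])
```

A typical loop would call `choose()`, obtain a reward in [0, 1] from the environment, and pass both to `update()`. Because the noise on an action's estimate widens the longer that action goes unselected, a stale action can occasionally be ranked best and probed again, which is the mechanism the abstract credits for adaptation to environmental changes.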

Index Terms:
finite automata; stochastic automata; unsupervised learning; reinforcement schemes; learning automata; stochastic estimator learning algorithms; stochastic estimator; nonstationary environments; high adaptation; SELA; Stochastic Estimator Learning Automaton; S-model ergodic scheme; absolute expediency
G.I. Papadimitriou, "A New Approach to the Design of Reinforcement Schemes for Learning Automata: Stochastic Estimator Learning Algorithms," IEEE Transactions on Knowledge and Data Engineering, vol. 6, no. 4, pp. 649-654, Aug. 1994, doi:10.1109/69.298183