Internet and Web Applications and Services, International Conference on (2010)
May 9, 2010 to May 15, 2010
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICIW.2010.80
In this article, we propose an improved structured and progressive electronic dictionary for the Arabic language (iSPEDAL), which can be represented either as a relational database or as an XML document and can therefore be queried easily with suitable query languages. Many Arabic dictionaries exist, but they are stored as flat text files, so they are unstructured and cannot be exploited directly. iSPEDAL contains no duplicated data (roots, prefixes, suffixes, infixes, patterns, or derived words). Moreover, for a given word, it provides links to its root, its associated affixes, and its patterns. iSPEDAL is populated automatically from one or more traditional textual dictionaries and is continually enriched from any Arabic textual corpus using a system that we built. This system is composed of a Parser, a Selector, a Classifier, an Extractor, a Comparator, an Analyzer, and a Validator. The Parser transforms a textual source (a dictionary or a textual corpus) into a set of words. The Selector determines whether a word is new or already exists in iSPEDAL. The Classifier classifies a given word and adds it to iSPEDAL as a root or as a derived word. The Extractor applies an Arabic root extraction method to deduce the root of any word that arrives at this component without its root or any indication of its root. The Comparator prevents duplication of roots, affixes, or patterns in iSPEDAL. The Analyzer extracts the affixes and the pattern from a derived word and its root. The Validator validates the information (word, root, patterns, and affixes) before it is added to the iSPEDAL database. This dictionary can be used to evaluate methods for extracting information from Arabic documents, given that the vocabulary of the Arabic language is essentially built from roots.
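To make the relational form described above more concrete, the following is a minimal sketch, not the authors' actual schema or code. It assumes an in-memory SQLite database, and every table, column, and function name (`roots`, `patterns`, `affixes`, `words`, `word_affixes`, `get_or_create`, `add_word`, `word_links`) is hypothetical. The sample words derived from the root كتب are illustrative only. `get_or_create` loosely plays the role of the Comparator (no duplicated roots, affixes, or patterns), and the existence check in `add_word` loosely plays the role of the Selector.

```python
import sqlite3

# Hypothetical schema: each root, pattern, and affix is stored exactly once,
# and each word links to them by id.
SCHEMA = """
CREATE TABLE roots    (id INTEGER PRIMARY KEY, form TEXT UNIQUE NOT NULL);
CREATE TABLE patterns (id INTEGER PRIMARY KEY, form TEXT UNIQUE NOT NULL);
CREATE TABLE affixes  (id INTEGER PRIMARY KEY, kind TEXT NOT NULL,
                       form TEXT NOT NULL, UNIQUE (kind, form));
CREATE TABLE words    (id INTEGER PRIMARY KEY, form TEXT UNIQUE NOT NULL,
                       root_id INTEGER NOT NULL REFERENCES roots(id),
                       pattern_id INTEGER REFERENCES patterns(id));
CREATE TABLE word_affixes (word_id INTEGER NOT NULL REFERENCES words(id),
                           affix_id INTEGER NOT NULL REFERENCES affixes(id),
                           PRIMARY KEY (word_id, affix_id));
"""


def open_dictionary():
    """Create an empty in-memory dictionary database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def get_or_create(conn, table, **cols):
    """Comparator-like step: reuse an existing row instead of duplicating it."""
    values = tuple(cols.values())
    where = " AND ".join(f"{c} = ?" for c in cols)
    row = conn.execute(f"SELECT id FROM {table} WHERE {where}", values).fetchone()
    if row:
        return row[0]
    names = ", ".join(cols)
    marks = ", ".join("?" for _ in cols)
    cur = conn.execute(f"INSERT INTO {table} ({names}) VALUES ({marks})", values)
    return cur.lastrowid


def add_word(conn, word, root, pattern=None, affixes=()):
    """Selector-like step: add the word only if it is new. Returns True if added."""
    if conn.execute("SELECT 1 FROM words WHERE form = ?", (word,)).fetchone():
        return False
    root_id = get_or_create(conn, "roots", form=root)
    pattern_id = get_or_create(conn, "patterns", form=pattern) if pattern else None
    cur = conn.execute(
        "INSERT INTO words (form, root_id, pattern_id) VALUES (?, ?, ?)",
        (word, root_id, pattern_id),
    )
    for kind, form in affixes:
        affix_id = get_or_create(conn, "affixes", kind=kind, form=form)
        conn.execute("INSERT OR IGNORE INTO word_affixes VALUES (?, ?)",
                     (cur.lastrowid, affix_id))
    return True


def word_links(conn, word):
    """Follow the links from a word to its root, pattern, and affixes."""
    row = conn.execute(
        "SELECT r.form, p.form FROM words w "
        "JOIN roots r ON r.id = w.root_id "
        "LEFT JOIN patterns p ON p.id = w.pattern_id WHERE w.form = ?",
        (word,),
    ).fetchone()
    if row is None:
        return None
    affixes = conn.execute(
        "SELECT a.kind, a.form FROM affixes a "
        "JOIN word_affixes wa ON wa.affix_id = a.id "
        "JOIN words w ON w.id = wa.word_id WHERE w.form = ? ORDER BY a.id",
        (word,),
    ).fetchall()
    return row[0], row[1], affixes


if __name__ == "__main__":
    conn = open_dictionary()
    add_word(conn, "مكتوب", "كتب", "مفعول", [("prefix", "م"), ("infix", "و")])
    add_word(conn, "كاتب", "كتب", "فاعل", [("infix", "ا")])
    add_word(conn, "مكتب", "كتب", "مفعل", [("prefix", "م")])
    print(word_links(conn, "مكتوب"))
```

In this sketch the three sample words share a single row in `roots`, and the prefix م is stored once even though two words use it. Root extraction itself (the Extractor) is deliberately left out, since the abstract does not specify the method.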
Arabic Language, Corpus, Dictionary, Information Extraction, Root
Mohammad Hajjar, Abd El Salam Al Hajjar, Khaldoun Zreik, Patrick Gallinari, "An Improved Structured and Progressive Electronic Dictionary for the Arabic Language: iSPEDAL", Internet and Web Applications and Services, International Conference on, pp. 489-495, 2010, doi:10.1109/ICIW.2010.80