The SIGNUM
lexicon is based on a core of over 115,000
terms, which includes a large share of the regional
terms used in Spanish-speaking countries and
technical vocabulary used in many areas of
science and trade. It contains new terms that
are currently used in publications but do not
appear in many dictionaries, as well as seldom-used
words found in specialized literature.
The 115,000-lemma core is supplemented by
the inflected forms of these words. These include
inflections for gender and number, as well
as forms for diminutives, augmentatives,
superlatives and pejoratives; all verb conjugations;
the most commonly used enclitics; and also
derivational morphemes, such as -mente,
-ismo, super-, semi-, pre- and pos-, among
others. Taking all these forms into account,
the lexicon comprises some 5,000,000 word forms.
Each term in the lexicon carries dozens
of attributes that provide morphological, grammatical,
semantic and other types of data, such as
a frequency index that indicates how common
or rare the word is.
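
As a rough illustration of what an attributed entry might look like, the
sketch below models a single inflected form in Python. It is not the actual
SIGNUM data format: the class name LexiconEntry, the field names, the
attribute labels and the 0.0 to 1.0 frequency scale are all assumptions made
for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LexiconEntry:
    """Hypothetical record for one word form (not the real SIGNUM schema)."""
    form: str        # the inflected or derived word form
    lemma: str       # the base form it belongs to
    pos: str         # part of speech (tag names are assumed)
    features: Dict[str, str] = field(default_factory=dict)  # morphological attributes
    frequency: float = 0.0  # assumed scale: 0.0 (rare) to 1.0 (very common)


def describe(entry: LexiconEntry) -> str:
    """Render an entry as one readable line, with features in sorted order."""
    feats = ", ".join(f"{k}={v}" for k, v in sorted(entry.features.items()))
    return f"{entry.form} <{entry.lemma}> {entry.pos} [{feats}] freq={entry.frequency:.2f}"


# Illustrative entry: a feminine plural diminutive of "casa".
casitas = LexiconEntry(
    form="casitas",
    lemma="casa",
    pos="noun",
    features={"gender": "feminine", "number": "plural", "degree": "diminutive"},
    frequency=0.2,  # made-up value for the example
)

print(describe(casitas))
```

A real entry would hold many more attributes; the point here is only that
each form links back to its lemma and carries its own labeled features.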
Two application areas of the SIGNUM lexicon
are worth pointing out. First, our linguistic engines
draw on the lexicon efficiently
to perform their processing. Second, the lexicon
is the basis from which word lists, consisting
of specific types of lemmas and inflected forms,
are generated according to the specific requirements
of an application or user in need of a high-quality
lexicon.
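
The idea of generating a word list to specification can be sketched as a
simple filter over attributed entries. This is only an illustrative Python
example under stated assumptions: the Entry type, the build_word_list
function, its parameters and the sample data are hypothetical and do not
describe how SIGNUM actually produces its lists.

```python
from typing import Iterable, List, NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical minimal lexicon record used only for this sketch."""
    form: str         # word form
    lemma: str        # base form
    pos: str          # part of speech (assumed tag names)
    frequency: float  # assumed scale: 0.0 (rare) to 1.0 (very common)


def build_word_list(
    entries: Iterable[Entry],
    pos: Optional[str] = None,
    min_frequency: float = 0.0,
    lemmas_only: bool = False,
) -> List[str]:
    """Return the sorted, de-duplicated forms that satisfy every criterion."""
    selected = set()
    for entry in entries:
        if pos is not None and entry.pos != pos:
            continue  # wrong part of speech
        if entry.frequency < min_frequency:
            continue  # too rare for this list
        if lemmas_only and entry.form != entry.lemma:
            continue  # skip inflected forms when only lemmas are wanted
        selected.add(entry.form)
    return sorted(selected)


# Made-up sample data for the example.
SAMPLE = [
    Entry("casa", "casa", "noun", 0.9),
    Entry("casas", "casa", "noun", 0.8),
    Entry("casita", "casa", "noun", 0.4),
    Entry("cantar", "cantar", "verb", 0.7),
    Entry("canto", "cantar", "verb", 0.6),
]

common_nouns = build_word_list(SAMPLE, pos="noun", min_frequency=0.5)
lemma_list = build_word_list(SAMPLE, lemmas_only=True)
```

With this sample, common_nouns holds the two frequent noun forms and
lemma_list holds only the base forms; other criteria, such as region or
subject field, could be added in the same way.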
Lexicon Benefits
- The lexicon provides a comprehensive and
up-to-date vocabulary of the Spanish language.
- The contents of the lexicon have undergone
rigorous quality controls to make them error-free
and highly reliable.
- The lexicon is categorized and labeled
with morphological and syntactic information.
- The user needing a word list can specify
the type of relevant information required
from our lexicon.