Arabic NLP Help

Mar 30, 2009 at 12:30 PM
Edited Mar 30, 2009 at 12:34 PM

Introduction:

---------------------

Description

In this project, students are expected to implement different Arabic morphological analyzers at the word and sentence level. The implementation must run in a distributed environment to ensure that the service scales. The service will then be made available to the public through a web service architecture.

Requirements

Networking, Distributed Computing, Service Oriented Architecture

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Rooting (reducing a word to its root) is useful for improving information retrieval (IR) effectiveness.

Words in Arabic:
  • nouns
  • verbs
  • particles

Roots usually have 3 or 4 letters, rarely 5.

Stem: any word derived from a root by applying a template (root --template--> stem).
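The root --template--> stem step can be sketched in a few lines of Python. This is only an illustration, assuming the transliteration used in these notes, where the letters f, 3 and l in a template stand for the first, second and third root letters. The names SLOTS and apply_template are made up for this example.

```python
# Placeholder letters in a template (assumption: f, 3, l mark the root slots).
SLOTS = ("f", "3", "l")


def apply_template(root, template):
    """Fill the f/3/l slots of a template with the letters of a 3-letter root."""
    if len(root) != len(SLOTS):
        raise ValueError("this sketch only handles 3-letter roots")
    # Build the slot -> root letter mapping once and substitute in a single
    # pass, so a root letter that happens to be 'f', '3' or 'l' is not
    # replaced a second time.
    mapping = dict(zip(SLOTS, root))
    return "".join(mapping.get(ch, ch) for ch in template)


print(apply_template("ktb", "fa3el"))   # kateb
print(apply_template("ktb", "maf3ol"))  # maktob
print(apply_template("3lm", "fa3el"))   # 3alem
```

A 4-letter root would need templates with a fourth slot, which this sketch does not cover.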

A word can also carry a prefix and a suffix, which adds complexity to the analysis.

Word = PREFIX + STEM + SUFFIX, where STEM = root + template

Prefix: waw, fa2, a2l, ka, la, wal, ...
Suffix: ho, hom, ha, kam, ke, ee, ...
Template: fa3al -> fa3el, fa3al, maf3ol, fa3a3el, fa3ol, ...
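To see how the decomposition Word = PREFIX + STEM + SUFFIX could work in practice, here is a rough Python sketch. It tries every prefix/suffix split and keeps only the splits whose stem fits a template, which avoids stripping "ka" from a word like kateb. The affix spellings are simplified guesses based on the lists above (e.g. "wa" for waw, "al" for a2l), only a subset of the templates is used, and the names match_template and analyze are hypothetical. It is not a real analyzer.

```python
# Hypothetical, simplified lists based on the transliterated notes above.
# The empty string means "no prefix" / "no suffix".
PREFIXES = ["", "wal", "al", "wa", "fa", "ka", "la"]
SUFFIXES = ["", "hom", "kam", "ho", "ha", "ke", "ee"]
TEMPLATES = ["fa3al", "fa3el", "maf3ol", "fa3ol"]
SLOTS = "f3l"  # placeholder letters standing for the three root letters


def match_template(stem, template):
    """Return the root letters if the stem fits the template, else None."""
    if len(stem) != len(template):
        return None
    root = []
    for s, t in zip(stem, template):
        if t in SLOTS:
            root.append(s)   # slot position: capture a root letter
        elif s != t:
            return None      # fixed template letter does not match
    return "".join(root)


def analyze(word):
    """List every (prefix, stem, suffix, root, template) reading of a word."""
    readings = []
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES:
            if not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            for template in TEMPLATES:
                root = match_template(stem, template)
                if root is not None:
                    readings.append((prefix, stem, suffix, root, template))
    return readings


print(analyze("walkatebhom"))  # [('wal', 'kateb', 'hom', 'ktb', 'fa3el')]
print(analyze("kateb"))        # [('', 'kateb', '', 'ktb', 'fa3el')]
```

The function returns a list because a word may have more than one valid reading. Choosing among them needs more than the word itself, for example a root dictionary or the sentence context.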

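Vowels in Arabic: long vowels {alf, waw, ya2}; short vowels {fat7a, damma, kasra}. The other diacritics {skoon, madda, shadda} are not vowels: skoon marks the absence of a vowel, shadda doubles a consonant, and madda marks a long alf.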

An Arabic word can have different meanings depending on its syntactic category (for example, 7afla can mean "bus" or "a lot").