Morphological Tagging for Arabic
MADAMIRA: A new version of MADA, called MADAMIRA, is now available. It is
entirely rewritten in Java and requires only the SAMA database described
below; it does not need SVMTools or the SRI Toolkit. It can be obtained here:
MADA is a Perl-based tool for full morphological tagging of Modern Standard Arabic,
developed by Nizar Habash, Owen Rambow and Ryan Roth. It is distributed along with TOKAN,
a highly customizable Arabic tokenizer. Descriptions of both
(along with pointers to papers) can be found on the main MADA+TOKAN Web site.
We distribute MADA and TOKAN free of charge for educational, research, and in-house use. Note, however, that MADA currently depends on the following tools, which are not included in the MADA package. If they are not already present on your system, you must download and install them separately before you can use MADA. The required tools are:
- Perl 5.8 or later
- SVMTools 1.3.1
Version 1.3.1 updates version 1.3 and fixes a major Perl incompatibility bug. We strongly recommend that all users upgrade to this version.
- The SRI Toolkit
Only its disambig utility is needed.
- Either SAMA 3.1 (LDC catalog number LDC2010L01) or BAMA 2.0 (LDC catalog number LDC2004L02)
SAMA 3.1 is the successor to BAMA 2.0. Because MADA was built with SAMA 3.1 in mind, it is the preferred option. However, SAMA 3.1 may not yet be available to the general public, so MADA can also be configured to use BAMA 2.0, at the cost of a small absolute accuracy drop of about 2-4%. Unfortunately, obtaining BAMA 2.0 currently also requires a license from the LDC (see the above link for details). Note that MADA does not actually use the SAMA or BAMA software itself; instead, it uses the prefix, word, and suffix lexicon files included with these tools.
MADA and TOKAN are currently developed only for Linux/Unix systems; they have not been tested on Windows or Mac OS.
To obtain MADA+TOKAN, go to the MADA Download location on foliodirect.net, read the license agreement, and download the package directly. With this new distribution system, it is no longer necessary to submit signed license forms via fax. Once you have downloaded the package, follow the instructions in the included MADA.README file. Additional information can be found in the included MADA+TOKAN Manual.
In addition, you may want to join the MADA-users mailing list. We use this list to send email announcements to users of MADA and TOKAN, such as when new releases and patches are available.
If you have any questions about MADA or TOKAN, please contact Owen Rambow
(<last-name> <at> ccls . columbia . edu).