Morphological Tagging for Arabic

MADAMIRA: A new version of MADA, called MADAMIRA, is now available. It is entirely rewritten in Java and requires only the SAMA database mentioned below, not SVMTools or the SRI Toolkit. It can be obtained from the MADAMIRA download page.

MADA is a Perl-based, full morphological tagger for Modern Standard Arabic developed by Nizar Habash, Owen Rambow, and Ryan Roth. It is distributed along with TOKAN, a highly customizable Arabic tokenizer. Descriptions of both (along with pointers to papers) can be found on the main MADA+TOKAN Web site.

We distribute MADA and TOKAN free of charge for educational, research, and in-house uses. However, MADA currently requires several external tools that are not distributed with the MADA package. If they are not already present on your system, you must download and install them separately before you can use MADA. The required tools include the SAMA database, SVMTools, and the SRI Toolkit.
MADA and TOKAN are currently built only for Linux/Unix systems. They have not been tested on Windows or Mac systems.

To obtain MADA+TOKAN, go to the MADA download location, read the license agreement, and download the package directly. With this new distribution system, it is no longer necessary to submit signed license forms via fax. Once you have downloaded the package, follow the instructions in the included MADA.README file. Additional information can be found in the included MADA+TOKAN Manual.

In addition, you may want to join the MADA-users mailing list. We use this list to send email announcements to users of MADA and TOKAN, such as when new releases and patches are available.

If you have any questions about MADA or TOKAN, please contact Owen Rambow (<last-name> <at> ccls . columbia . edu).