Foma - a finite-state compiler and C library
Foma is a compiler, programming language, and C library for constructing finite-state automata and transducers. It has specific support for natural language processing applications such as morphological analyzers. NLP is probably the main use of foma, but it is generic enough to serve many other purposes.
The library contains efficient implementations of all classical automata/transducer algorithms: determinization, minimization, epsilon-removal, composition, and Boolean operations. More advanced construction methods are also available: context restriction, quotients, first-order regular logic, transducers from replacement rules, etc.
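To make one of the classical operations concrete, here is a minimal Python sketch of composition for epsilon-free transducers. It does not use foma or reflect how foma implements composition; the dictionary layout and the helper names `compose` and `apply_down` are assumptions made up for this illustration.

```python
from collections import deque

def compose(t1, t2):
    """Compose two epsilon-free transducers (illustrative data layout).

    A transducer is a dict with:
      'start':  start state
      'finals': set of final states
      'arcs':   dict mapping state -> list of (input, output, next_state)
    The result maps x to z whenever t1 maps x to y and t2 maps y to z.
    """
    start = (t1["start"], t2["start"])
    arcs, finals = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        p, q = state
        if p in t1["finals"] and q in t2["finals"]:
            finals.add(state)
        out_arcs = arcs.setdefault(state, [])
        for a, b, p2 in t1["arcs"].get(p, []):
            for b2, c, q2 in t2["arcs"].get(q, []):
                # An arc survives only if t1's output matches t2's input.
                if b == b2:
                    nxt = (p2, q2)
                    out_arcs.append((a, c, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}

def apply_down(t, word):
    """Return the set of outputs the transducer produces for `word`."""
    results = set()
    stack = [(t["start"], 0, "")]
    while stack:
        state, i, out = stack.pop()
        if i == len(word):
            if state in t["finals"]:
                results.add(out)
            continue
        for a, b, nxt in t["arcs"].get(state, []):
            if a == word[i]:
                stack.append((nxt, i + 1, out + b))
    return results

# t1 rewrites every "a" as "b"; t2 rewrites every "b" as "c".
t1 = {"start": 0, "finals": {0}, "arcs": {0: [("a", "b", 0)]}}
t2 = {"start": 0, "finals": {0}, "arcs": {0: [("b", "c", 0)]}}
t12 = compose(t1, t2)
```

With these toy machines, `apply_down(t12, "aa")` yields `{"cc"}`, while an input outside the domain of `t1`, such as `"ab"`, yields the empty set. Handling epsilon arcs correctly needs additional care and is left out of this sketch.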
Installation
apt install foma
(for Debian/Ubuntu packaging)
- Source code @ GitHub
- Precompiled binaries for Win32, OSX, Linux
Prerequisites (for compilation)
Most of these, if missing, can be installed on Linux systems with apt install
Features
- Xerox-compatible regular expression and scripting syntax (xfst/lexc)
- Python bindings
- Separate C API for constructing and handling automata
- Import/export from Xerox/AT&T/OpenFST tools
- Separate utility (flookup) for applying automata with various strategies
- Supports flag diacritics
- Contains functions for constraining reduplication (_eq())
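As a rough illustration of what flag diacritics do, the following Python sketch checks whether the flag symbols along a single path are mutually consistent. It follows the commonly documented `@P.F.V@`, `@R.F.V@`, `@D.F.V@`, `@C.F@`, and `@U.F.V@` semantics, omits the `@N.F.V@` (negative set) operation for brevity, and is not foma's implementation. The function name `check_flags` is hypothetical.

```python
import re

# Matches symbols such as @U.CASE.nom@ or @C.CASE@ (value is optional).
FLAG = re.compile(r"@([PRDCU])\.([^.@]+)(?:\.([^.@]+))?@")

def check_flags(symbols):
    """Return True if the flag diacritics in `symbols` are consistent.

    `symbols` is the sequence of symbols along one path; anything that
    is not a flag diacritic is ignored.
    """
    features = {}
    for sym in symbols:
        m = FLAG.fullmatch(sym)
        if m is None:
            continue  # ordinary symbol, no effect on the feature store
        op, feat, val = m.groups()
        cur = features.get(feat)
        if op == "P":            # positive set: always succeeds
            features[feat] = val
        elif op == "C":          # clear: forget the feature
            features.pop(feat, None)
        elif op == "U":          # unify: set if unset, else must match
            if cur is None:
                features[feat] = val
            elif cur != val:
                return False
        elif op == "R":          # require: must be set (to val, if given)
            if cur is None or (val is not None and cur != val):
                return False
        elif op == "D":          # disallow: must not be set (to val, if given)
            if cur is not None and (val is None or cur == val):
                return False
    return True
```

For example, a path carrying `@U.CASE.nom@` twice is accepted, while a path carrying `@U.CASE.nom@` followed by `@U.CASE.acc@` is rejected. This is how flags can enforce long-distance agreement without multiplying states.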
Documentation
Cite
If you use foma, you can use the following citation for attribution. Please note that the details regarding the functionality of foma in that article are somewhat obsolete by now.
@inproceedings{hulden2009,
title={Foma: a finite-state compiler and library},
author={Hulden, Mans},
booktitle={Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics},
pages={29--32},
year={2009},
organization={Association for Computational Linguistics}
}