Open-source
No longer maintained, but feel free to have a look.
Maui - Multi-purpose automatic topic indexing
Current repository: github.com/zelandiya/maui
Old repository: maui-indexer.googlecode.com
Maui extends the keyphrase indexing algorithm Kea and is a GNU GPL-licensed library.
It performs the following tasks:
keyphrase extraction
automatic tagging
term assignment with a controlled vocabulary, thesaurus or taxonomy
subject indexing
extracting most relevant concepts and entities from Wikipedia
It can also be used for terminology extraction and semi-automatic topic indexing.
About the name: In Māori mythology, Māui is a culture hero.
He fished out the North Island of New Zealand with a hook made out of his jaw-bone.
KEA - Keyphrase extraction algorithm
I have extended the original version of the keyphrase extraction algorithm, Kea-3.0 (designed for free indexing), into a new version, Kea-4.1 (also known as Kea++), that performs controlled indexing.
Given a document and a thesaurus or controlled vocabulary (Kea accepts any vocabulary in the SKOS format), Kea selects from this vocabulary a list of phrases that describe the document's main topics. (See examples of Kea's performance on different domains.)
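To make the idea of controlled indexing concrete, here is a minimal Python sketch of the input and output involved: a document and a list of vocabulary terms go in, and a short ranked list of terms from that vocabulary comes out. This is not Kea's code and not how Kea ranks phrases. The function name `assign_terms`, the plain-list vocabulary (instead of a SKOS file) and the ranking by raw occurrence count are all assumptions made purely for illustration.

```python
import re


def assign_terms(text, vocabulary, top_n=3):
    """Toy controlled indexing (illustrative only, NOT Kea's algorithm).

    Returns up to top_n vocabulary terms that occur in the text,
    ranked by occurrence count, with ties broken alphabetically.
    """
    # Normalize the document to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    counts = {}
    for term in vocabulary:
        term_norm = " ".join(re.findall(r"[a-z0-9]+", term.lower()))
        if not term_norm:
            continue
        # Count whole-word occurrences of the (possibly multi-word) term.
        pattern = r"\b" + re.escape(term_norm) + r"\b"
        hits = len(re.findall(pattern, normalized))
        if hits:
            counts[term] = hits
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


if __name__ == "__main__":
    document = (
        "Soil erosion reduces crop yield. "
        "Soil erosion is worse after heavy rainfall."
    )
    # Hypothetical vocabulary; a real one would be loaded from a SKOS file.
    vocabulary = ["soil erosion", "rainfall", "crop yield", "irrigation"]
    print(assign_terms(document, vocabulary, top_n=2))
```

The key property of controlled indexing that the sketch preserves is that only terms from the supplied vocabulary can be returned, in contrast to free indexing, where phrases are taken from the document itself.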
ELKB - Electronic Lexical Knowledge Base
A Java package for accessing and exploring Roget's Thesaurus, originally developed by Mario Jarmasz at the University of Ottawa. ELKB includes several NLP applications: detecting lexical chains in text, determining the semantic distance between words and phrases, clustering words by meaning, and solving a word quiz.
Other interesting projects:
FLAX (Flexible Language Acquisition Project) by Shaoqun Wu
Smart information retrieval tool Koru by Dave Milne
Realistic Books by Veronica Liesaputra
Digital library toolkit Greenstone by the Digital Libraries group