No longer maintained, but feel free to have a look.
Maui - Multi-purpose automatic topic indexing
Current repository: github.com/zelandiya/maui
Old repository: maui-indexer.googlecode.com
Maui extends the keyphrase indexing algorithm Kea and is a GNU GPL licensed library.
It performs the following tasks:
term assignment with a controlled vocabulary, thesaurus, or taxonomy
extracting the most relevant concepts and entities from Wikipedia
It can also be used for terminology extraction and semi-automatic topic indexing.
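As a rough illustration of the Wikipedia task, one signal often used in Wikipedia-based keyphrase extraction is keyphraseness. It is the fraction of Wikipedia articles that mention a phrase in which that phrase also appears as a link anchor. The Java sketch below assumes that the link and occurrence counts have already been computed from a Wikipedia dump. The class name, method name and counts are hypothetical, and this is not Maui's own code.

```java
import java.util.Map;

public class Keyphraseness {

    /**
     * Hypothetical helper: fraction of articles mentioning the phrase in which
     * it is also a link anchor. Both count maps are assumed to be precomputed.
     */
    static double keyphraseness(String phrase,
                                Map<String, Integer> linkCounts,
                                Map<String, Integer> occurrenceCounts) {
        int occurrences = occurrenceCounts.getOrDefault(phrase, 0);
        if (occurrences == 0) {
            return 0.0; // phrase never seen, so no evidence that it is a topic
        }
        return (double) linkCounts.getOrDefault(phrase, 0) / occurrences;
    }

    public static void main(String[] args) {
        // Toy counts invented for illustration, not real Wikipedia statistics
        Map<String, Integer> links = Map.of("topic indexing", 30, "the", 0);
        Map<String, Integer> occurrences = Map.of("topic indexing", 120, "the", 5000000);
        System.out.println(keyphraseness("topic indexing", links, occurrences)); // 0.25
        System.out.println(keyphraseness("the", links, occurrences));            // 0.0
    }
}
```

Frequent function words almost never appear as links, so they score near zero. Phrases that editors routinely link score high.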
About the name: In Māori mythology, Māui is a culture hero. He fished up the North Island of New Zealand with a hook made from his grandmother's jawbone.
KEA - Keyphrase extraction algorithm
I have extended the original keyphrase extraction algorithm, Kea-3.0 (designed for free indexing), into a new version, Kea-4.1 (also known as Kea++), that performs controlled indexing.
Given a document and a thesaurus or controlled vocabulary (Kea accepts any vocabulary in the SKOS format), Kea selects from this vocabulary a list of terms that describe the document's main topics. (See examples of Kea's performance in different domains.)
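Below is a minimal sketch of controlled indexing in this style. It assumes that the vocabulary has already been loaded from SKOS into memory as preferred labels (skos:prefLabel), each with its alternative labels (skos:altLabel). The sketch works in three steps:

1. Normalize the word n-grams and match them against every label.
2. Map each match to its preferred descriptor.
3. Give each descriptor two features modelled on Kea's TF×IDF and first-occurrence features.

Several parts are assumptions. The normalizer is a crude stand-in for a real stemmer, and the document-frequency counts are invented. Ranking by TF×IDF alone replaces the model Kea learns from training documents. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ControlledIndexingSketch {

    /** Features for one vocabulary descriptor found in a document. */
    record Candidate(String descriptor, double tfIdf, double firstOccurrence) {}

    /** Crude normalizer (assumption): lowercase and drop a plural "s"; not a real stemmer. */
    static String normalize(String phrase) {
        List<String> out = new ArrayList<>();
        for (String w : phrase.toLowerCase().split("\\s+")) {
            if (w.isEmpty()) continue;
            if (w.length() > 3 && w.endsWith("s")) w = w.substring(0, w.length() - 1);
            out.add(w);
        }
        return String.join(" ", out);
    }

    /** Maps every normalized label, preferred or alternative, to its preferred descriptor. */
    static Map<String, String> buildVocabulary(Map<String, List<String>> prefToAltLabels) {
        Map<String, String> vocab = new LinkedHashMap<>();
        prefToAltLabels.forEach((pref, alts) -> {
            vocab.put(normalize(pref), pref);
            for (String alt : alts) vocab.put(normalize(alt), pref);
        });
        return vocab;
    }

    /** Matches n-grams (up to maxN words) against the vocabulary and scores each descriptor. */
    static List<Candidate> extract(String text, Map<String, String> vocab,
                                   Map<String, Integer> docFreq, int numDocs, int maxN) {
        String[] words = text.replaceAll("[^\\p{L}\\p{N}\\s]", " ").trim().split("\\s+");
        Map<String, int[]> stats = new LinkedHashMap<>(); // descriptor -> {count, first word index}
        for (int i = 0; i < words.length; i++) {
            for (int n = 1; n <= maxN && i + n <= words.length; n++) {
                String gram = normalize(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
                String descriptor = vocab.get(gram);
                if (descriptor == null) continue;
                int[] s = stats.get(descriptor);
                if (s == null) {
                    s = new int[] {0, i}; // first match records the earliest position
                    stats.put(descriptor, s);
                }
                s[0]++;
            }
        }
        List<Candidate> result = new ArrayList<>();
        for (Map.Entry<String, int[]> e : stats.entrySet()) {
            double tf = (double) e.getValue()[0] / words.length;
            // Assumption: descriptors missing from docFreq are treated as occurring in one document
            double df = docFreq.getOrDefault(e.getKey(), 1);
            double idf = -Math.log(df / numDocs) / Math.log(2);
            double first = (double) e.getValue()[1] / words.length;
            result.add(new Candidate(e.getKey(), tf * idf, first));
        }
        result.sort((a, b) -> Double.compare(b.tfIdf(), a.tfIdf()));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary: preferred label -> alternative labels
        Map<String, List<String>> skosLike = Map.of(
            "Subject indexing", List.of("topic indexing"),
            "Thesauri", List.of("thesaurus"));
        Map<String, Integer> docFreq = Map.of("Subject indexing", 5, "Thesauri", 20); // invented counts
        String doc = "Topic indexing assigns thesaurus terms. Subject indexing uses a thesaurus.";
        for (Candidate c : extract(doc, buildVocabulary(skosLike), docFreq, 40, 3)) {
            System.out.printf("%s tfidf=%.2f first=%.2f%n", c.descriptor(), c.tfIdf(), c.firstOccurrence());
        }
    }
}
```

With the toy data in main, "Subject indexing" should rank first, with a TF×IDF of about 0.60. It is found once through its alternative label "topic indexing" and once directly. This mapping of synonyms onto one descriptor is what sets controlled indexing apart from free keyphrase extraction.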
ELKB - Electronic Lexical Knowledge Base
A Java package for accessing and exploring Roget's Thesaurus, originally developed by Mario Jarmasz at the University of Ottawa. ELKB includes several NLP applications: detecting lexical chains in text, determining the semantic distance between words and phrases, clustering words by meaning, and solving a word quiz.
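Here is a greedy toy lexical chainer in Java that gives a feel for the idea. Each word joins the first chain that shares a thesaurus category with it. If no chain shares a category, the word starts a new chain. The word-to-category map stands in for Roget's categories, and its category numbers are invented. This is not ELKB's API or its chaining algorithm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LexicalChainSketch {

    /** A chain of words plus the union of thesaurus categories its words belong to. */
    static final class Chain {
        final List<String> words = new ArrayList<>();
        final Set<Integer> categories = new HashSet<>();
    }

    /**
     * Appends each word to the first chain sharing a category with it, otherwise
     * starts a new chain. Words missing from the (hypothetical) thesaurus map are skipped.
     */
    static List<Chain> buildChains(List<String> words, Map<String, Set<Integer>> categoriesOf) {
        List<Chain> chains = new ArrayList<>();
        for (String w : words) {
            Set<Integer> cats = categoriesOf.get(w);
            if (cats == null) continue;
            Chain target = null;
            for (Chain c : chains) {
                if (!Collections.disjoint(c.categories, cats)) {
                    target = c;
                    break;
                }
            }
            if (target == null) {
                target = new Chain();
                chains.add(target);
            }
            target.words.add(w);
            target.categories.addAll(cats);
        }
        return chains;
    }

    public static void main(String[] args) {
        // Invented category numbers, not real Roget's categories
        Map<String, Set<Integer>> thesaurus = Map.of(
            "car", Set.of(1), "vehicle", Set.of(1), "truck", Set.of(1),
            "apple", Set.of(2), "fruit", Set.of(2));
        List<String> text = List.of("car", "vehicle", "apple", "truck", "fruit", "the");
        for (Chain c : buildChains(text, thesaurus)) {
            System.out.println(c.words); // [car, vehicle, truck] then [apple, fruit]
        }
    }
}
```

Merging a chain's categories as it grows keeps the sketch short. However, it can let loosely related words drift into one long chain. More careful chainers also weigh the kind of thesaurus relation and how far apart the words are in the text.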
Other interesting projects:
Koru, a smart information retrieval tool by Dave Milne
Realistic Books by Veronica Liesaputra