Software

Maui - Multi-purpose automatic topic indexing - maui-indexer.googlecode.com

Maui builds on the keyphrase indexing algorithm Kea (below) and adds further functionality. It can assign topics to documents using terms from Wikipedia, via Dave Milne's tool Wikipedia Miner, and it has many new features that help identify topics more accurately.

Maui performs the following tasks:
  • keyphrase extraction
  • automatic tagging
  • term assignment with a controlled vocabulary
  • subject indexing
  • topic indexing with terms from Wikipedia.
It can also be used for terminology extraction and semi-automatic topic indexing.

About the name: In Māori mythology, Māui is a culture hero. He fished out the North Island of New Zealand with a hook made out of his jaw-bone (above).



KEA

KEA - Keyphrase extraction algorithm - www.nzdl.org/kea

I have extended Kea-3.0, the original version of the keyphrase extraction algorithm (designed for free indexing), into a new version, Kea-4.1 (also known as Kea++), that performs controlled indexing.

Given a document and a thesaurus or controlled vocabulary (Kea accepts any vocabulary in the SKOS format), Kea selects a list of phrases from this vocabulary describing the document's main topics. (See examples of Kea's performance on different domains).
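To make the idea of controlled indexing concrete, here is a toy sketch in Python. It is not Kea's algorithm or API: it only matches word n-grams from the document against a small hand-written vocabulary and ranks the matches by frequency. The function names, the vocabulary, and the sample text are all made up for illustration, and a real system would load the vocabulary from a SKOS file and use richer features than raw counts.

```python
import re
from collections import Counter


def candidate_ngrams(text, max_len=3):
    """Yield every lowercase word n-gram of length 1..max_len (illustrative helper)."""
    words = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def assign_terms(text, vocabulary, top_k=3):
    """Return up to top_k vocabulary terms, ranked by how often they occur in text."""
    vocab = {term.lower() for term in vocabulary}
    # Keep only candidates that are terms of the controlled vocabulary.
    counts = Counter(g for g in candidate_ngrams(text) if g in vocab)
    return [term for term, _ in counts.most_common(top_k)]


# Hypothetical vocabulary and document, standing in for a thesaurus and a real text.
vocabulary = ["soil erosion", "irrigation", "crop rotation", "forestry"]
document = (
    "Soil erosion reduces yields. Irrigation can worsen soil erosion "
    "unless crop rotation is practised."
)
print(assign_terms(document, vocabulary))
```

Only phrases that appear in the vocabulary can be returned, which is the difference between controlled indexing and free keyphrase extraction.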




ELKB
ELKB - Electronic Lexical Knowledge Base - www.nzdl.org/elkb

A Java package for accessing and exploring Roget's Thesaurus, originally developed by Mario Jarmasz, University of Ottawa. ELKB includes several NLP applications: detecting lexical chains in text, determining the semantic distance between words and phrases, clustering words by meaning, and solving a word quiz.



Other interesting projects: