Always Learning!

The world through the prism of my mind

Computational Linguistics – News update for Oct 9, 2006

Posted by Alexandre Rafalovitch on October 9, 2006

A couple of interesting things have happened recently in fields related to Computational Linguistics that I thought were worth linking to:

  • ACM Queue had an interview with Mike Cohen of Google (previously of Nuance Communications) discussing recent advances and changes in speech recognition technology.
  • Pluggd, with its hotly discussed HearHere demo, uses speech recognition and some sort of topic clustering to show a heatmap of where your search keyword occurs over the course of a podcast. The idea is that the heatmap lets you skip straight to the discussion of the topic you are interested in and ignore the irrelevant parts (and the adverts). They have a short presentation about the product in the DEMOfall archives. Warning: it sometimes takes a couple of tries to get the DEMO video to play (depending on system load).
  • PodZinger, which already used speech recognition to search within podcasts for search terms, has just added an advertising platform that chooses ads based on both the podcast content and the search term.
  • Netflix has created a challenge in which they provide their recommendation data, so that other people can try to develop an algorithm better than the one built by Netflix's own data mining team. With a grand prize of a million dollars ($1,000,000), there are already a lot of competitors. While the dataset only includes movie titles, and therefore is not enough for any text/description analysis, it is still a huge dataset to try various graph and neural network methods on. Most people suggest mashing it up with IMDB or some other movie information database, but that obviously requires additional data-matching work.
  • ClearForest, on the other hand, is only offering $2,000 in their competition and you have to bring your own data, but at least they provide an API that does named entity recognition. That beats having to load up GATE every time and, who knows, maybe somebody can create another Gutenkarte-style mashup.
  • And to finish on a funny note, maybe you would like this joke generated by the STANDUP system (popularised writeup): What do you get when you cross a car with a sandwich? A traffic jam.
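To make the Pluggd idea a bit more concrete, here is a minimal Python sketch of a keyword heatmap. It assumes the speech recognizer has already produced a transcript as (start time in seconds, word) pairs; that format, the function names, the one-minute bins and the character shading are all my own illustrative choices, not anything Pluggd has published. It also only counts exact keyword matches and leaves out the topic clustering that HearHere uses on top.

```python
import math
from typing import List, Tuple

# Hypothetical input format: (start_seconds, word) pairs from a speech recognizer.
Transcript = List[Tuple[float, str]]


def keyword_heatmap(transcript: Transcript, keyword: str,
                    duration: float, bin_seconds: float = 60.0) -> List[float]:
    """Count exact (case-insensitive) keyword hits per time bin, scaled to [0, 1]."""
    n_bins = max(1, math.ceil(duration / bin_seconds))
    counts = [0] * n_bins
    target = keyword.lower()
    for start, word in transcript:
        if word.lower() == target:
            # Clamp so a word stamped at or past `duration` lands in the last bin.
            idx = min(int(start // bin_seconds), n_bins - 1)
            counts[idx] += 1
    peak = max(counts)
    if peak == 0:
        return [0.0] * n_bins
    # Scale by the busiest bin so the hottest stretch is always 1.0.
    return [c / peak for c in counts]


def render(heat: List[float], shades: str = " .:*#") -> str:
    """Draw one character per bin; denser characters mean more mentions."""
    top = len(shades) - 1
    return "".join(shades[round(h * top)] for h in heat)


transcript = [(5.0, "welcome"), (70.0, "Netflix"), (75.0, "prize"),
              (80.0, "netflix"), (200.0, "Netflix")]
heat = keyword_heatmap(transcript, "netflix", duration=240.0)
print(heat)                # [0.0, 1.0, 0.0, 0.5]
print(repr(render(heat)))  # ' # :'
```

Scaling by the busiest bin rather than by the total number of hits keeps the hottest minute fully "dark" regardless of how long the podcast is; jumping to a hot spot would then just mean seeking to `index * bin_seconds`.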

3 Responses to “Computational Linguistics – News update for Oct 9, 2006”

  1. Hi everybody,
    TermExtractor, my master's thesis, is online at the address !!!

    TermExtractor is a software package for the automatic building, validation and maintenance of glossaries in the English language.

    TermExtractor extracts the terminology consensually referred to in a specific application domain. The package takes as input a corpus of domain documents, parses the documents, and extracts a list of “syntactically plausible” terms (e.g. compounds, adjective-nouns, etc.). Document parsing assigns a greater importance to terms with text layouts (title, bold, italic, underlined, etc.). Two entropy-based measures, called Domain Relevance and Domain Consensus, are then used. Domain Consensus is used to select only the terms that are consensually referred to throughout the corpus documents. Domain Relevance is used to select only the terms that are relevant to the domain of interest; it is computed with reference to a set of contrastive terminologies from different domains. Finally, the extracted terms are further filtered using Lexical Cohesion, which measures the degree of association of all the words in a terminological string. Accepted file formats are: txt, pdf, ps, dvi, tex, doc, rtf, ppt, xls, xml, html/htm, chm, wpd and also zip archives.

  2. Francesco,
    This is not really a good place to announce a new piece of software or a new service. Nobody will find it here.

    You will have much better luck with any of the repositories listed on the ACL Wiki.

  3. […] Lots of new sightings of CL/NLP technologies since the last update: […]
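For anyone curious about the two entropy-based filters described in the first comment, here is a minimal Python sketch. It assumes the formulation usually given for this family of term extractors: Domain Relevance as the term's probability in the target domain divided by its highest probability across all (target plus contrastive) domains, and Domain Consensus as the entropy of the term's distribution over the target domain's documents. The Counter-per-document representation, the base-2 logarithm and the function names are my own assumptions, and Lexical Cohesion and any thresholds for combining the scores are left out.

```python
import math
from collections import Counter
from typing import Dict, List

# Assumed representation: each document is a Counter of candidate-term frequencies.
Corpus = List[Counter]


def term_prob(term: str, corpus: Corpus) -> float:
    """Estimate P(t | D) as the term's share of all term occurrences in corpus D."""
    total = sum(sum(doc.values()) for doc in corpus)
    return sum(doc[term] for doc in corpus) / total if total else 0.0


def domain_relevance(term: str, target: str, domains: Dict[str, Corpus]) -> float:
    """DR: P(t | target) divided by the highest P(t | D_k) over all domains k."""
    probs = {name: term_prob(term, corpus) for name, corpus in domains.items()}
    best = max(probs.values())
    return probs[target] / best if best else 0.0


def domain_consensus(term: str, corpus: Corpus) -> float:
    """DC: entropy (in bits) of how the term's occurrences spread over documents."""
    counts = [doc[term] for doc in corpus if doc[term] > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts]
    return sum(-p * math.log2(p) for p in probs)


# Toy corpora: "finance" is the target domain, "biology" a contrastive one.
domains = {
    "finance": [Counter({"interest rate": 2, "bank": 3}),
                Counter({"interest rate": 2, "loan": 1})],
    "biology": [Counter({"cell": 4, "bank": 1}), Counter({"gene": 3})],
}
print(domain_relevance("interest rate", "finance", domains))  # 1.0
print(domain_consensus("interest rate", domains["finance"]))  # 1.0
```

In the toy data, "interest rate" appears only in the finance corpus and is spread evenly over both finance documents, so it scores high on both measures; "bank" is also most relevant to finance, but all its finance occurrences sit in a single document, so its Domain Consensus is zero.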
