Update on grammar challenge

Posted by Alexandre Rafalovitch on May 2, 2006

About a month ago, Steve Kaufmann (creator of theLinguist service) and I disagreed on whether grammar matters when studying a language. In short, he thinks grammar should be studied last, if at all, while I think grammar gives you a mental infrastructure that makes learning easier.

Since Steve is currently studying Russian, we agreed to have a Skype conversation in which he would demonstrate his Russian to me. That happened today. I am not sure whether he recorded it as originally planned, but it was interesting either way.

I have to say that his pronunciation is quite good. His words were clear, and only a couple of times did he slip into a strong accent, usually around soft and hard vowels. At one point I heard him self-correct an accented pronunciation (нэ->не), so the correct sounds are clearly very much on his mind.

The grammar was a bit more difficult. His cases were not correct, and neither was his use of singular and plural. That, of course, is where I think grammar would help, while Steve thinks that simply remembering the appropriate sentence fragments in context (из дома, "from home"; домой, "homeward") will do the trick. I would be very impressed if that turns out to be true.

So the challenge is settled with a 3-2 score (Steve wins on pronunciation), but more importantly, I have something to look forward to in another 9-12 months, when we should be able to converse in my mother tongue instead of his.

Screencast about IBM’s UIMA text processing architecture

Posted by Alexandre Rafalovitch on May 1, 2006

UIMA is a new-ish framework on the block that competes (and cooperates) with the GATE framework for NLP, annotation and search. Jon Udell recorded a screencast with a couple of IBM-ers who show off and explain UIMA.

While the screencast moves a little slowly for anyone familiar with sentence tokenization principles, it is still interesting to see how it all hangs together.

The only problem I see with UIMA is the licensing confusion. One version of UIMA is under the alphaWorks (you'll pay us later) license; another is under the uncommon Common Public License; yet another is under an IBM commercial license. This may or may not matter to a researcher, but it still needs careful consideration.

Still, running under Eclipse, UIMA (which, by the way, stands for Unstructured Information Management Architecture) has a very nice interface, nicer than GATE, which I find quite clunky. And it is theoretically possible to plug GATE and OpenNLP components into UIMA with little or no wrapper code.
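To make the "how it hangs together" and "little wrapper code" ideas a bit more concrete, here is a minimal Java sketch of the general pattern, under stated assumptions: every component shares one document and adds standoff annotations (a type plus begin/end offsets) instead of rewriting the text, and an outside component joins the pipeline through a thin adapter. All names here (PipelineSketch, Annotator, SentenceAnnotator, ForeignTokenizer, TokenizerWrapper) are hypothetical. This is not the UIMA API, which has its own CAS data structure and annotator base classes, and ForeignTokenizer only stands in for a GATE or OpenNLP component without using their real interfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a UIMA-style pipeline; NOT the real UIMA, GATE or OpenNLP API.
final class PipelineSketch {

    // A standoff annotation: a typed span over the original text, never a copy of it.
    static final class Annotation {
        final String type;
        final int begin, end;
        Annotation(String type, int begin, int end) {
            this.type = type; this.begin = begin; this.end = end;
        }
    }

    // Shared document passed down the pipeline (loosely analogous to UIMA's CAS).
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();
        Document(String text) { this.text = text; }
        List<Annotation> select(String type) {
            List<Annotation> out = new ArrayList<>();
            for (Annotation a : annotations) if (a.type.equals(type)) out.add(a);
            return out;
        }
    }

    // The single interface every pipeline component implements (hypothetical).
    interface Annotator { void process(Document doc); }

    // Native component: naive splitter that ends a sentence after '.', '!' or '?'.
    static final class SentenceAnnotator implements Annotator {
        public void process(Document doc) {
            String t = doc.text;
            int start = 0;
            for (int i = 0; i < t.length(); i++) {
                char c = t.charAt(i);
                if (c == '.' || c == '!' || c == '?') {
                    doc.annotations.add(new Annotation("Sentence", start, i + 1));
                    start = i + 1;
                    // Skip whitespace so the next sentence starts at its first character.
                    while (start < t.length() && Character.isWhitespace(t.charAt(start))) start++;
                    i = start - 1;
                }
            }
            if (start < t.length()) doc.annotations.add(new Annotation("Sentence", start, t.length()));
        }
    }

    // Stand-in for a third-party tokenizer with its own API (hypothetical):
    // it returns {begin, end} offset pairs for whitespace-separated tokens.
    static final class ForeignTokenizer {
        List<int[]> spans(String s) {
            List<int[]> out = new ArrayList<>();
            int i = 0;
            while (i < s.length()) {
                while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
                int b = i;
                while (i < s.length() && !Character.isWhitespace(s.charAt(i))) i++;
                if (i > b) out.add(new int[] {b, i});
            }
            return out;
        }
    }

    // The "wrapper coding": an adapter that turns the foreign output into annotations.
    static final class TokenizerWrapper implements Annotator {
        private final ForeignTokenizer wrapped = new ForeignTokenizer();
        public void process(Document doc) {
            for (int[] span : wrapped.spans(doc.text))
                doc.annotations.add(new Annotation("Token", span[0], span[1]));
        }
    }

    // Run every annotator over the same shared document, in order.
    static Document run(String text, List<Annotator> pipeline) {
        Document doc = new Document(text);
        for (Annotator a : pipeline) a.process(doc);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run("UIMA is new. GATE is older.",
                List.of(new SentenceAnnotator(), new TokenizerWrapper()));
        for (Annotation a : doc.annotations)
            System.out.println(a.type + " [" + a.begin + "," + a.end + ") "
                    + doc.text.substring(a.begin, a.end));
    }
}
```

The design point is that the wrapper is the only code that knows about the foreign component; the rest of the pipeline sees just another annotator writing spans into the shared document. A real tokenizer would split off punctuation rather than leave "new." as one token.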

