Always Learning!

The world through the prism of my mind

Archive for May, 2006

Update on grammar challenge

Posted by Alexandre Rafalovitch on May 2, 2006

About a month ago, Steve Kaufmann (creator of theLinguist service) and I had a disagreement on whether grammar is important in studying a language. In summary, he thinks that grammar should be studied last, if at all, while I think grammar allows one to create a mental infrastructure that makes learning easier.

Since Steve is currently studying Russian, we agreed to have a Skype discussion where he can demonstrate his Russian to me. Today, this happened. I am not sure if he recorded it as originally planned or not, but it was interesting either way.
I have to say that his pronunciation is quite good. The words he used were clear, and he slipped into a strong accent only a couple of times, usually around soft and hard vowels. In one instance, I could hear him self-correct an accented pronunciation (нэ->не), so the correct sounds are obviously very much on his mind.

The grammar was a bit more difficult. The cases were not correct, and neither was the use of singular and plural. That, of course, is where I think grammar would help, while Steve thinks that just remembering appropriate sentence fragments in context (из дома, домой) will do the trick. I would be very impressed if that turns out to be true.
So, the challenge is settled with a 3-2 score (Steve is winning on pronunciation), but more importantly I have something to look forward to in another 9-12 months, when we should be able to converse in my mother tongue instead of his.

Posted in Language acquisition

Screencast about IBM’s UIMA text processing architecture

Posted by Alexandre Rafalovitch on May 1, 2006

UIMA is a newish framework on the block, both competing and cooperating with the GATE framework for NLP, annotation and search. Jon Udell recorded a screencast with a couple of IBM-ers to show off and explain UIMA.

While the screencast moves a little slowly for a person familiar with sentence tokenizing principles, it is still interesting to see how it all hangs together.

The only problem I see with UIMA is the confusion in licensing. One version of UIMA is under the alphaWorks (you'll pay us later) license; another is under the uncommon Common Public License; yet another is under an IBM commercial license. This may or may not matter for a researcher, but it is still something that needs to be carefully considered.

Still, running under Eclipse, UIMA (which, by the way, stands for Unstructured Information Management Architecture) has a very nice interface. It is nicer than GATE's, which I find quite clunky. And it is theoretically possible to plug GATE and OpenNLP components into UIMA with little or no wrapper coding.


Posted in Computational Linguistics