Always Learning!

The world through the prism of my mind

Archive for November, 2006

On open e-book standards and whether translating to Esperanto will bring more readers?

Posted by Alexandre Rafalovitch on November 5, 2006

There is a fight brewing between David Rothman of TeleRead and Bill Janssen of Plucker fame. The point of contention (as I understand the issue) is what would be a good format to produce e-books in.

Bill’s position is that any format that is not already widely accepted (specifically, anything other than HTML) is a lock-in and a disadvantage, whether that format is an open standard (like OpenReader) or a proprietary one (like Sony’s BBeB). He advocates using web browsers as e-book readers.

David’s point (and he invokes me in there) is that HTML is not sufficient for all e-books, mostly due to layout and browser-change issues. So, if HTML is not sufficient, we have to choose a new format. Therefore, it is better if that format is an open standard that can be implemented and maintained by multiple parties.

I am with David here, mostly for the reasons he pointed out. For my interests (language-learning e-books), HTML is not a good enough format. Sure, I could hack HTML into submission for some of my goals, but it would require so much JavaScript that it would not work in anything but a full-blown browser. I invite Bill to replicate the functionality of the Pocket e-Sword so that it works well in IE, Firefox, Opera and Safari. Maybe that’s why Pepper Pad is integrating FBReader despite already having a built-in Firefox web browser.

So, where does Esperanto come into it? Well, here is Bill’s quote (emphasis is mine):

Trying to standardize on a common “ebook format”, be it some IDPF creation, some OASIS masterpiece, or even the so-called OpenReader, would only be an attempt to force them all to publish in Esperanto, instead of their house languages. They still wouldn’t have customers.

Publishing in Esperanto does not bring customers? Really! I wonder where Bill gets that data. I don’t know how many (human) languages he speaks, but the only reasonable way I can interpret that statement is as “publishing English material in Esperanto would not bring any more English customers”. On that point he would be mostly correct. Of course, the market for Esperanto is not English; it is global.

As an example, I want to take the book/movie Night Watch by my favourite author, Sergey Lukyanenko. The book started in Russian, was made into a Russian movie with English subtitles, made an impact on the American market and finally was translated (quite well) into English. What about the Chinese or the Egyptians? Would they be interested in this book? Maybe, but there is no easy way to find out, because translation or even subtitling is very expensive.

Except that there is a way. Night Watch has just been translated into Esperanto (announcement in Russian). There is even an excerpt available (unfortunately in PDF). Now the book is accessible to people in China, Egypt or Germany, as long as they can read Esperanto. And if there is enough interest from those people, the book can be translated into their native languages as well, to reach the rest of the audience. The push model of finding markets suddenly becomes a pull model of the market finding you. This is not a new idea; it is already used by newspapers and even the Vatican. It is called establishing a beachhead, I believe.

And that’s exactly the strength of open standards. They can expand the audience beyond the originally planned targets and bring new markets to your solution, adapting the solution to the market’s needs in the process.

Closed standards control the markets they know about; open standards create new, unplanned markets. I am currently in a market segment that Sony does not want to think about. Do I wait another 5 years for Sony to catch up, or do I look for open-standard and open-source alternatives? There should be no need to guess.

Posted in Esperanto, Publishing, e-books

Lirix – computational linguistics aspects

Posted by Alexandre Rafalovitch on November 2, 2006

In my last update on applied computational linguistics, I wrote about PodZinger, which uses speech recognition to figure out which advertisement to match to the podcast you are searching for with their service.

Another company is claiming to do that with songs – Lirix. Their upcoming AdLirix platform is supposed to be so effective that Lirix would be able to give away songs for free and make back the income by embedding well-targeted advertisements.

The devil, of course, is in the details. Many songs have so little meaning in them that it might be a trial to figure out what they are about even manually, never mind automatically at the volume required to fill an attractively large catalog.

Their DEMOFall presentation did not go into that level of detail, so I emailed some questions to the Lirix people directly. They promptly replied with an example:

…, here’s a lyrical excerpt from a hiphop song named “How We Do” by a rapper named “The Game”. (This song was a big radio hit last year.)

“I put Lamborghini doors on the Escalade
Low-pro so it looks like I’m riding on blades”

In this case, we would tag the specific words “Lamborghini” and “Escalade”, the phrase “low profile”, and the themes “high-end automotive”, “after-market automotive”, and “bling”.

This looks quite advanced, if the algorithm uses true computational methods. Unfortunately, I have doubts that it does.

I can see how Lamborghini could be matched to the high-end automotive theme (named entity recognition, clustering, even a database lookup). I have no idea how they would also connect the lines above to the after-market automotive theme.
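To make the database-lookup idea concrete, here is a minimal sketch. It is my own illustration, not anything Lirix has described: the table `BRAND_THEMES`, its entries and the function `tag_lyrics` are all hypothetical, and a real system would need a far larger, curated vocabulary.

```python
import re

# Hypothetical hand-built table: brand name -> theme.
# The entries are assumptions made for this illustration only.
BRAND_THEMES = {
    "lamborghini": "high-end automotive",
    "escalade": "high-end automotive",
}


def tag_lyrics(text, table=BRAND_THEMES):
    """Return (matched words, themes) found by plain dictionary lookup."""
    # Lowercase the text and split it into alphabetic words.
    words = re.findall(r"[a-z]+", text.lower())
    # Keep only the words present in the table, in order of appearance.
    matched = [w for w in words if w in table]
    # Collect the distinct themes those words map to.
    themes = sorted({table[w] for w in matched})
    return matched, themes


lyric = (
    "I put Lamborghini doors on the Escalade\n"
    "Low-pro so it looks like I'm riding on blades"
)
matched, themes = tag_lyrics(lyric)
print(matched)
print(themes)
```

A lookup like this finds the two brand names and the one theme the table maps them to. Nothing in it can produce “after-market automotive”, which depends on understanding that the doors are being fitted onto a different car.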

I suspect that behind the scenes, Lirix will be doing a lot of manual categorisation. I asked my contact about this issue and got a reply that effectively said “good question – no answers at this stage”. Fair enough. If they can do it automatically, they have a strong competitive advantage; if they cannot, they may not be able to scale fast. Either way, they may have a reason to keep quiet for now.

We will wait and see. I imagine the competition for making money from ‘free(ish)’ songs is heating up. Many techniques will be tried, and Natural Language Processing algorithms may prove to be important for a successful business.

Posted in Computational Linguistics

There! are the blogs of computational linguists

Posted by Alexandre Rafalovitch on November 1, 2006

Nine months ago, I asked “Where are the blogs of computational linguists?” Now, there is an answer.

The Association for Computational Linguistics has moved its documents (formerly ACL Universe) into a wiki, and there is now a separate page for blogs. It has all of the blogs I have found so far and more. It even has my blog in it. Must be scraping the bottom of the barrel :-). I don’t know why more CL people are not blogging.

The best news, however, is that – being a wiki – it can be updated by anybody. So if you have a blog about computational linguistics, add yourself in there. This is not Wikipedia; (modest) self-promotion is allowed.

You can also register and add the pages you are interested in to your watchlist, though at this point there is no RSS or email notification when a change occurs.

Posted in Computational Linguistics