Posted by Alexandre Rafalovitch on November 29, 2006
I frequently say that public domain books are a great source of further innovation and small business ideas. Today I found another example that brings together several of the themes I track: Language acquisition, Publishing and Public Domain books.
Mark Phillips has taken Tarzan of the Apes, which is now in the public domain, and rewritten parts of it to teach grammar as part of the story. The resulting self-published book, Tarzan and Jane’s Guide to Grammar (or Amazon link), has been selling quite well in schools for a year or so. The book’s idea is similar to that of The Twisted Doors, but it is targeted at English readers wishing to increase their vocabulary rather than at learners of a foreign language. It also feels to me like a precursor to the third idea from my earlier article on How e-books could revolutionize language-learning.
About a month ago (from what I can tell), Mark decided to push the book to the general public more aggressively. He set up the website and sent some copies out as promotion. I heard of it in one of the Grammar Girl podcasts.
He did not contact me (this is not a sponsored post), but I liked the idea of the book since – as I mentioned at the start – it connects to several of my interests. I hope his work will become better known and spur other people to experiment with using public domain material in innovative ways, especially innovative language-learning ways.
Posted in Language acquisition, Publishing | Leave a Comment »
Posted by Alexandre Rafalovitch on November 15, 2006
Lots of new sightings of CL/NLP technologies since the last update:
- On the commercial speech recognition front, Nexidia is currently in beta with phoneme-mapping audio search. But don’t go to the company’s site. Instead, read the explanation and collection of links in the ResourceShelf’s article.
- If, instead of waiting for commercial offerings, you would like to contribute to an open source one, VoxForge always needs more transcribed audio recordings to improve their Command and Control acoustic models.
- Switching from speech recognition to speech synthesis, E-health-insider has a fascinating podcast from the field (Somalia), with a practical example of how even an imperfect technology can bring tangible benefits to people in need.
- Text generation might also soon become a more interesting topic. Indiana University recently launched The Synthetic Worlds Initiative and – as part of it – very recently started the ARDEN project, which will try to produce a synthetic 3D world in the universe of William Shakespeare. They are not planning to have bots in there, but can they resist it, given that a virtual world interface and the availability of the full texts of Shakespeare’s works make it an ideal playground for advanced A.L.I.C.E competitions?
- If you like text classification tasks and/or machine learning, there is an Agnostic Learning vs. Prior Knowledge Challenge & Workshop. The Nova dataset is the one for text classification; there are others for different machine learning tasks. There might even be a small prize.
- For those who only get out of bed for big(ger) prizes, there is the Second Annual CyC prize. The prize is $2,500, but to get it you must publish an academic paper that has something to do with CyC’s knowledge base of assertions about the everyday world. This may or may not be a hard task; you can judge it for yourself by checking out the winners of last year’s prize. The deadline is February 21st, 2007, and some people may have had an early start, since the competition has been running since February of this year.
- Named Entities and Semantic Web come together in the demo put together by InFact that parsed and cross-linked public domain books into a web of names, places and relations. Just don’t try to manually change the urls; the implementation itself is a bit brittle (the company has been notified). Speaking on a more abstract level, this demo also shows the benefits of actually having unrestricted full-text access to books. I feel that public domain books are just waiting to be remixed and experimented with beyond what we see now.
- Finally, those who missed AOL’s attempt to beat Google’s release of n-gram models by releasing and then withdrawing 20 million web queries that included private data can still get access to that data from multiple websites, including one with a semi-useful search interface. One wonders if the AOL executive responsible for the release decision likes proverbs, specifically the one that goes “A word spoken is past recalling”.
Posted in Computational Linguistics | Leave a Comment »
Posted by Alexandre Rafalovitch on November 13, 2006
I keep hearing the claim that one should try learning a foreign language the way children do. Rosetta Stone is a famous example of software that convinces people that they can do just that.
I have a couple of problems with that approach.
The first is that even if the immersion method were sufficient, it would have to be as immersive as what a child gets – 24 hours a day minus sleep. One hour a day is not sufficient, in my opinion. And if you are studying a foreign language in an immersive environment, Rosetta Stone is just a way to concentrate your mind more than anything. And with its price tag, a very expensive way to concentrate the mind.
The other problem is that when people say immersive environment, they usually mean no grammar rules. Just listening and talking, reading and writing. That’s what children do, right?
Wrong! At least it is wrong for the Russian language. Schools in the USSR used to have a class called Russian Language, which ran for several school years. It was not about Russian literature; that was a second, separate class. The Russian Language class was about learning the orthography and grammar of our own mother tongue and – trust me! – it was hard.
Declensions were hell. Russian has six cases, and we needed mnemonics just to remember their order (I still remember «Иван Родил Девчонку, Велел Тащить Пелёнку»). The rules for when to write the soft and hard sign letters were a story of their own. And dictations! That is when you think that the teacher’s whole purpose in life is to make you want to cry, when every misspelling and every missing comma would drop your grade! And then (the next year) you get rephrasing exercises where you listen to a story three times and have to write it out in your own words afterwards. And you are marked for style as well as orthography.
And, I am sorry to say, we made fun of Georgians and Armenians, because – trying to learn their own complex languages – they never sounded quite right speaking Russian, even though they were also part of the USSR. We learned how to say things correctly, because we had anecdotes being told and retold about exactly how they got it wrong.
I always admire people who decide to learn Russian and persevere with its alphabet, its grammar and its pronunciation. But those who think that the ‘learning like children’ approach means learning through absorption and with no grammar study, I don’t have much time for. It did not work for us when we were children. I don’t see how it will work for you, however much you pay for the software with the fancy claims on its cover.
Posted in Language acquisition | Leave a Comment »
Posted by Alexandre Rafalovitch on November 9, 2006
Amie St is a very interesting business idea with good execution. They are a music discovery service and store with a twist – songs start free and the price goes up based on how popular they get. To encourage ratings and downloads, they even pay users who discovered good songs early and recommended them to others. And the songs never get as expensive as on iTunes.
I like free music. I have enough music in my personal collection not to buy new tracks for a while (especially not from RIAA members), but I will listen to free songs to see if something really special catches my attention.
I have tried Pandora and liked it, but obviously those songs are not for download and cannot be repeated easily (if one cares to stay legal anyway). I have tried iTunes’ free single of the week and was very disappointed.
Amie St makes it easy to discover good songs in the genres I like and will certainly keep me coming back for more.
One feature I wish they had is the ability to subscribe to a channel (artist, genre, price range) and have it downloadable as a podcast of samples. Price range as a parameter might be useful for somebody ready to buy good songs: they could set it to the 50-60 cent range and get a good deal, yet one filtered by the crowd’s rating. The delivery could be either individual 30- or 45-second files or one big file with chapter marks for easy skipping and with embedded info/album art. That way I could listen to the songs on the move and buy/download them later. This obviously also increases the user’s stickiness to the site, as the user interacts with the site even when not at the computer.
Or maybe they just need to open an API and somebody else will do that for them.
(Update: Another good review of the service is at: CoolBusinessIdeas.com)
Posted in web2.0 | 2 Comments »
Posted by Alexandre Rafalovitch on November 7, 2006
The Royal Scottish Country Dance Society has updated their website. It now looks prettier, runs on a more modern technological base and promises more up-to-date information.
This is the next step, after the redesign of the society magazine, in moving with the times while preserving the original goals of the Society.
It is good to see the Society recognising that the internet is worth putting time and effort into, especially with members and branches all over the world.
Posted in RSCDS | Leave a Comment »