Tuesday, July 10, 2007

Hutter prize

First post! (with the Clipmarks Firefox extension)

I wish I could include this in my 'Shared RSS' feed, but hey, what's a boy to do?

edit: The clip includes line breaks somehow. Very annoying; I can't figure out why either. If anyone knows, please do tell me... I need an Adams-esque readership to fix these things for me, dammit.
clipped from cs.fit.edu

The Large Text Compression Benchmark and the Hutter Prize are designed to encourage research in natural language processing (NLP). I argue that compressing, or equivalently, modeling natural language text is "AI-hard". Solving the compression problem is equivalent to solving hard NLP problems such as speech recognition, optical character recognition (OCR), and language translation. I argue that ideal text compression, if it were possible, would be equivalent to passing the Turing test for artificial intelligence (AI), proposed in 1950 [1]. Currently, no machine can pass this test [2]. Also in 1950, Claude Shannon estimated the entropy (compression limit) of written English to be about 1 bit per character [3]. To date, no compression program has achieved this level. In this paper I will also describe the rationale for picking this particular data set and contest rules.
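The "bits per character" figure the abstract mentions is easy to measure yourself. Here's a minimal sketch (my own illustration, not from the paper) that compresses a short English passage with Python's standard zlib module and reports the result; the sample text is made up, and a real benchmark like the Hutter Prize uses a far larger corpus:

```python
import zlib

# A short made-up English passage; longer corpora give more stable estimates.
text = (
    "The Hutter Prize rewards better compression of natural language text, "
    "on the argument that modeling English well enough to compress it near "
    "Shannon's estimated limit of about one bit per character would require "
    "genuine language understanding."
)

data = text.encode("ascii")
compressed = zlib.compress(data, 9)  # level 9 = maximum compression

# Bits of compressed output per character of input text.
bpc = 8 * len(compressed) / len(data)
print(f"{bpc:.2f} bits per character")
```

On a tiny sample like this, zlib lands well above Shannon's ~1 bit/character estimate (header overhead and little context to exploit); even on huge corpora, general-purpose compressors still don't reach it, which is the whole point of the prize.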

