First post! (with the Clipmarks Firefox extension)
I wish I could include this in my 'Shared RSS' feed, but hey, what's a boy to do?
Edit: it includes line breaks somehow. Very annoying, and I can't figure out why. If anyone knows, please do tell me... I need an Adams-esque readership to fix these things for me, dammit.
The Large Text Compression Benchmark and the Hutter Prize are designed to encourage research in natural language processing (NLP). I argue that compressing, or equivalently, modeling natural language text is "AI-hard". Solving the compression problem is equivalent to solving hard NLP problems such as speech recognition, optical character recognition (OCR), and language translation. I argue that ideal text compression, if it were possible, would be equivalent to passing the Turing test for artificial intelligence (AI), proposed in 1950 [1]. Currently, no machine can pass this test [2]. Also in 1950, Claude Shannon estimated the entropy (compression limit) of written English to be about 1 bit per character [3]. To date, no compression program has achieved this level. In this paper I will also describe the rationale for picking this particular data set and contest rules.
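For a rough feel of what "bits per character" means in the clip above, here is a minimal sketch in Python. It is not taken from the paper: the function names and the sample string are made up for illustration, and zlib merely stands in for "some general-purpose compressor". It compares the order-0 entropy of a text (character frequencies only, no context) with the bits per character that zlib actually achieves. Both functions assume non-empty input.

```python
import math
import zlib
from collections import Counter


def order0_entropy_bpc(text: str) -> float:
    """Shannon entropy of the character frequencies, in bits per character.

    This ignores all context between characters, so for real English it is
    far above Shannon's ~1 bit per character estimate.
    """
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compressed_bpc(text: str) -> float:
    """Bits per character achieved by zlib (DEFLATE) at its highest level."""
    packed = zlib.compress(text.encode("utf-8"), 9)
    return 8 * len(packed) / len(text)


if __name__ == "__main__":
    # Hypothetical sample; substitute any longer English text of your own.
    sample = "the quick brown fox jumps over the lazy dog " * 200
    print(f"order-0 entropy: {order0_entropy_bpc(sample):.3f} bits/char")
    print(f"zlib level 9:    {compressed_bpc(sample):.3f} bits/char")
```

A repeated sentence like the sample compresses unrealistically well, so the numbers only become meaningful on real prose. The point is just that a compressor's score is a measurement of how well its model predicts the text, which is the equivalence the abstract leans on.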