Estimating the Entropy of Chinese Using the Sliding-Window Entropy Estimator

Tan Choon-Peng
Yap Saw-Teng

Abstract

Three different text sources, namely a Chinese newspaper, the classical novel "Red Chamber Dream" and the modern prose work "The Sahara", are selected for small-sample studies of the entropy of Chinese. We use the sliding-window entropy estimator with the window size fixed at 1000 characters. By varying the number of window shifts up to 1000, we obtain entropy estimates of Chinese for the three text sources. To improve the slow rate of convergence of the sliding-window entropy estimator, we adopt the restricted sliding-window estimator due to Kontoyiannis et al. The experiments indicate that modern Chinese has an entropy of less than 4.5 bits/character and that this entropy is less than that of classical Chinese.
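
For readers unfamiliar with match-length entropy estimation, the following is a minimal sketch of one common form of the sliding-window estimator: for each of k shifts, find the longest string starting at the current position that also starts somewhere in the preceding n characters, then divide log2(n) by the average of these match lengths (plus one). The abstract does not give the exact definition used in the article, nor that of the restricted variant, so the formula, the function names `match_length` and `sliding_window_entropy`, and the choice to let matches run past the window boundary are all assumptions made for illustration.

```python
import math


def match_length(text, i, n):
    """Return 1 + the length of the longest prefix of text[i:] that also
    starts at some position in the window text[i-n:i].

    Assumption: a match may extend past position i, as long as it starts
    inside the window.
    """
    longest = 0
    for j in range(i - n, i):
        length = 0
        while i + length < len(text) and text[j + length] == text[i + length]:
            length += 1
        longest = max(longest, length)
    return longest + 1


def sliding_window_entropy(text, n, k):
    """Estimate entropy in bits per symbol from k shifts of a window of size n.

    Assumed form of the estimator: log2(n) divided by the mean match length.
    """
    if len(text) < n + k:
        raise ValueError("need at least n + k symbols")
    total = sum(match_length(text, i, n) for i in range(n, n + k))
    mean_length = total / k
    return math.log2(n) / mean_length
```

With the settings described in the abstract, a call would look like `sliding_window_entropy(characters, n=1000, k=1000)`, where `characters` is a string of at least 2000 Chinese characters. This brute-force search costs on the order of n comparisons per shift, which is acceptable for small-sample studies of this size.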

Article Details

How to Cite
Choon-Peng, T., & Saw-Teng, Y. (2002). Estimating the Entropy of Chinese Using the Sliding-Window Entropy Estimator. Malaysian Journal of Science, 21(1&2), 77–83. Retrieved from https://sare.um.edu.my/index.php/MJS/article/view/8593
Section
Original Articles