Theory and applications of universal data compression and probability estimation

Calificaciones

Para ver las notas de los alumnos aprobados haga click aquí.
Cualquier consulta respecto a las notas, escriba a eci@dc.uba.ar

Turno:

Tarde (14 a 17 hs)

Idioma:

Inglés

Requisitos:

Conocimientos básicos de probabilidades y estadística

Síntesis:

The class covers the theory and applications of universal data compression, and explores their relation to probability estimation. No prior knowledge is assumed and the material ranges from standard textbook to ongoing research.
Time permitting, topics will include:

Compression and its entropy limit. Standard compression algorithms such as Shannon, Huffman, and arithmetic coding. Showing that they compress certain information processes to their entropy.
Challenges arising when the underlying distribution is unknown, for example when compressing text or images. The universal compression approach. Commonly used universal compression algorithms such as Krichevsky-Trofimov and Lempel-Ziv. Their optimality.
The necessity to compress data over large alphabets. The effect of alphabet size on compression performance. Techniques for data compression over large alphabets.
The relation between universal compression and probability estimation. Estimation of low- probability events, population sizes, and the number of species. The Good-Turing estimator.
The relation between universal compression and investment portfolio theory.
Relations to combinatorial topics such as the Bell and Stirling numbers and to Hardy and Ramanujan’s seminal work on the number of partitions.

Bibliografía

P.S. Laplace. Philosphical essays on probabilities. Springer Verlag, New York.
G.H. Hardy and S. Ramanujan. Asymptotic formulae in combinatory analysis. Proceedings of London Mathematics Society, 17(2):75–115, 1918.
R. Fisher, A. Corbet, and C. Williams. The relation between the number of species and the number of individuals in a random sample of an animal population. jae, 12:42–48, 1943.
C.E. Shannon. A mathematical theory of communication. Bell Systems Technical Journal, 27:379–423, 623–656, 1948. I.J. Good. The population frequencies of species and the estimation of population parameters. Biometrika, 40(3/4):237–264, December 1953.
J. Rissanen. Universal coding, information, prediction, and estimation. IEEE Trans. on Information Theory, 30(4):629–636, July 1984.
R. Thisted and B. Efron. Did shakespeare write a newlydiscovered poem? Biometrika, 74:445–455, 1987.
T. Cover and J. Thomas. Elements of Information Theory. John Wiley and sons., 1991.
T.M. Cover. Universal portfolios. Mathematical Finance, 1(1):1–29, January 1991.
A. Orlitsky, N.P. Santhanam, and J. Zhang. Always Good Turing: Asymptotically optimal probability estimation. Science, 302(5644):427–431, October 17 2003.
A. Orlitsky, N.P. Santhanam, and J. Zhang. Universal compression of memoryless sources over unknown alphabets. IEEE Trans. on Information Theory, 50(7):1469–1481, July 2004.

Profesor:

Alon Orlitsky, ECE Department, University of California, San Diego, USA.

Alon Orlitsky joined the UCSD faculty in 1997, after a two-year stint as a quantitative analyst with D.E. Shaw & Company. From 1986-1996, he was a member of the technical staff at AT&T Bell Labs' Mathematical Sciences Research Center. Orlitsky received his Ph.D. in Electrical Engineering from Stanford University in 1986, and his M.Sc., also from Stanford, in 1982. He did his undergraduate work in mathematics (B.Sc. 1980) and Electrical Engineering (B.Sc. 1981) at Israel's Ben-Gurion University. His honors include an ITT International Fellowship in 1982, and an IEEE W.R.G. Baker best-paper award in 1992. Orlitsky co-edited a book on "Theoretical advances in neural computation and learning".