Computationele leertheorie

De computationele leertheorie is een onderdeel van de theoretische informatica waarin algoritmes die gebruikt worden in het machinaal leren worden geanalyseerd.

Overzicht

Binnen de computationele leertheorie wordt voornamelijk onderzoek gedaan naar een manier van inductief leren die supervised learning wordt genoemd. Bij supervised learning krijgt een algoritme voorbeelden (samples), die door de supervisor op een bepaalde bruikbare manier worden gelabeld. De samples kunnen bijvoorbeeld beschrijvingen van paddenstoelen zijn, en de labels kunnen aangeven of de paddenstoelen eetbaar zijn of niet. Het algoritme leidt een classificatiefunctie af uit de eerder gelabelde samples. Deze classificatiefunctie kent etiketten toe aan samples, inclusief samples die het algoritme nog nooit eerder is tegengekomen. Het doel van een algoritme voor supervised leren is om de prestatie op een bepaald gebied te optimaliseren, bijvoorbeeld het minimaliseren van het aantal fouten dat gemaakt wordt bij nieuwe samples.

Computationele leertheoretici bestuderen, naast de grenzen van algoritmes voor machinaal leren, ook de tijdscomplexiteit en de haalbaarheid van het leren. In de computationele leertheorie wordt een berekening als haalbaar beschouwd als het in de polynomiale tijd uitgevoerd kan worden. Er zijn twee soorten resultaten bij tijdscomplexiteit:

positieve resultaten laten zien dat een klasse van functies in polynomiale tijd kan worden geleerd;
negatieve resultaten laten zien dat een klasse van functies niet in in polynomiale tijd kan worden geleerd.

Negatieve resultaten hangen meestal af van veronderstellingen. Veel voorkomende veronderstellingen bij negatieve resultaten zijn:

Computationele complexiteit: P ≠ NP
Cryptografie - eenrichtingsfuncties bestaan.

Er bestaan verschillende methodes in de computationele leertheorie. Ze verschillen in de veronderstellingen over de principes van gevolgtrekking die worden gebruikt om algemene conclusies te trekken op basis van een beperkt aantal gegevens. Hiertoe behoren de gebruikte vorm van kansrekening (bijvoorbeeld frequentistische kansrekening en Bayesiaanse kansrekening) en de verschillende veronderstellingen over de productie van samples. Voorbeelden van de verschillende benaderingen zijn:

probably approximately correct learning (PAC-leren), ontwikkeld door Leslie Valiant;
VC-theorie, ontwikkeld door Vladimir Vapnik;
Bayesiaanse statistiek, ontstaan uit onderzoek dat werd begonnen door Thomas Bayes.
algoritmische leertheorie, ontwikkeld door E.M. Gold;
Online machinaal leren, ontwikkeld door Nick Littlestone.

De computationele leertheorie heeft een aantal praktische algoritmes opgeleverd. Op basis van de PAC-theorie ontstond bijvoorbeeld boosting, de VC-theorie leverde support vector machines op en met behulp van de Bayesiaanse statistiek ontwikkelde Judea Pearl probabilistische netwerken.

Zie ook

Informatietheorie

Bibliografie

Onderzoeken

Angluin, D. 1992. ‘Computational learning theory: Survey and selected bibliography.’ In: Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing (May 1992), blz. 351-369. http://portal.acm.org/citation.cfm?id=129712.129746
Haussler, D. 1990. ‘Probably approximately correct learning.’ In: AAAI-90 Proceedings of the Eight National Conference on Artificial Intelligence, Boston, MA. blz. 1101-1108. American Association for Artificial Intelligence. https://citeseer.ist.psu.edu/haussler90probably.html

VC theorie

Vapnik, V. and A. Chervonenkis. 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2). blz. 264-280.

Feature selection

Dhagat, A. and L. Hellerstein. 1994. PAC learning with irrelevant attributes. In Proceedings of the IEEE Symp. on Foundation of Computer Science. https://citeseer.ist.psu.edu/dhagat94pac.html

Inductieve gevolgtrekking

Gold, E.M., Language identification in the limit. Information and Control. 10. blz. 447-474.

Leren van Optimale O notatie

Goldreich, Oded, en Dana Ron. On universal learning algorithms. https://citeseer.ist.psu.edu/69804.html

Negatieve resultaten

Kearns, M., en Leslie Valiant. 1989. ‘Cryptographic limitations on learning boolean formulae and finite automata.’ In: Proceedings of the 21st Annual ACM Symposium on Theory of Computing, blz. 433-444. https://citeseer.ist.psu.edu/kearns89cryptographic.html

Boosting

Schapire, Robert E., 1990. ‘The strength of weak learnability.’ In: Machine Learning, 5(2). Blz. 197-227. https://citeseer.ist.psu.edu/schapire90strength.html

Ockhams scheermes

Blumer, A., A. Ehrenfeucht, D. Haussler en M.K. Warmuth. 1987. ‘Occam’s razor’. In: Inf.Proc.Lett. 24. Blz. 377-380.
Blumer, A., A. Ehrenfeucht, D. Haussler en M. K. Warmuth. 1989. ‘Learnability and the Vapnik-Chervonenkis dimension.’ In: Journal of the ACM 36(4). blz. 929-865.

Probably approximately correct learning

Valiant, L., ‘A Theory of the Learnable.’ In: Communications of the ACM 27(11). blz. 1134-1142.

Foutentolerantie

Kearns, Micheal, en Ming Li. 1993. ‘Learning in the presence of malicious errors.’ In: SIAM Journal on Computing 22(4). blz. 807-837. https://citeseer.ist.psu.edu/kearns93learning.html
Kearns, M. 1993. ‘Efficient noise-tolerant learning from statistical queries.’ In: Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing blz. 392-401. https://citeseer.ist.psu.edu/kearns93efficient.html

Equivalentie

Haussler, D., M. Kearns, N. Littlestone en M. Warmuth. 1988. ‘Equivalence of models for polynomial learnability.’ In: Proc. 1st ACM Workshop on Computational Learning Theory blz. 42-55.
Pitt, L. en M. K. Warmuth. 1990. ‘Prediction preserving reduction’ In: Journal of Computer System and Science 41 blz 430-467.

Externe links

On-line boek: Information Theory, Inference, and Learning Algorithms, door David MacKay, geeft een nauwkeurige beschrijving van Bayesiaanse benadering van machinaal leren.
Bespreking van An Introduction to Computational Learning Theory
Bespreking van The Nature of Statistical Learning Theory
Basics of Bayesian inference