The impact of population size on code growth in GP: Analysis and empirical validation

Riccardo Poli; Nicholas Freitag McPhee; Leonardo Vanneschi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The crossover bias theory for bloat [18] is a recent result which predicts that bloat is caused by the sampling of short, unfit programs. This theory is clear and simple, but it has some weaknesses: (1) it implicitly assumes that the population is large enough to allow sampling of all relevant program sizes (although it does not explain what to expect in the many practical cases where this is not true, e.g., because the population is small); (2) it does not explain what is meant by its assumption that short programs are unfit. In this paper we discuss these weaknesses and propose a refined version of the crossover bias theory that clarifies the relationship between bloat and finite populations, and explains what features of the fitness landscape cause bloat to occur. The theory, in particular, predicts that smaller populations will bloat more slowly than larger ones. Additionally, the theory predicts that bloat will only be observed in problems where short programs are less fit than longer ones when looking at samples created by fitness-based importance sampling, i.e., samplings of the search space in which fitter programs have a higher probability of being sampled (e.g., the Metropolis-Hastings method). Experiments with two classical GP benchmarks fully corroborate the theory.
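As a rough illustration of the fitness-based importance sampling mentioned in the abstract, the sketch below runs a Metropolis-Hastings chain over a toy program space and reports mean fitness per program length. It is not the authors' experimental setup: the bit-string "programs", `toy_fitness`, `MAX_LEN`, and the move set are all assumptions made for this example. The chain targets a distribution proportional to fitness, so fitter programs are sampled more often.

```python
import random
from collections import defaultdict

MAX_LEN = 12  # hypothetical cap on program length


def toy_fitness(program):
    """Hypothetical fitness: strictly positive, rewards 1-bits."""
    return 1.0 + sum(program)


def propose(program, rng):
    """Return (candidate, proposal_correction); candidate is None for an invalid move."""
    move = rng.choice(("flip", "append", "delete"))
    if move == "flip":
        i = rng.randrange(len(program))
        cand = program[:i] + (1 - program[i],) + program[i + 1:]
        return cand, 1.0  # symmetric move, no correction needed
    if move == "append":
        if len(program) == MAX_LEN:
            return None, 0.0
        # forward prob 1/3 * 1/2, reverse (delete) prob 1/3, so q_rev / q_fwd = 2
        return program + (rng.randrange(2),), 2.0
    if len(program) == 1:
        return None, 0.0
    # forward prob 1/3, reverse (append the same bit) prob 1/3 * 1/2, so ratio = 0.5
    return program[:-1], 0.5


def metropolis_hastings(n_samples, seed=0):
    """Sample programs with probability proportional to toy_fitness."""
    rng = random.Random(seed)
    current = (0,)
    samples = []
    for _ in range(n_samples):
        cand, correction = propose(current, rng)
        if cand is not None:
            ratio = toy_fitness(cand) / toy_fitness(current) * correction
            if rng.random() < min(1.0, ratio):
                current = cand
        samples.append(current)  # a rejected move repeats the current program
    return samples


def mean_fitness_by_length(samples):
    """Average fitness of the sampled programs, grouped by program length."""
    totals = defaultdict(lambda: [0.0, 0])
    for p in samples:
        totals[len(p)][0] += toy_fitness(p)
        totals[len(p)][1] += 1
    return {n: s / c for n, (s, c) in sorted(totals.items())}


if __name__ == "__main__":
    samples = metropolis_hastings(20000)
    for length, mean_fit in mean_fitness_by_length(samples).items():
        print(length, round(mean_fit, 3))
```

Under the refined theory described in the abstract, a table like this in which short programs have lower mean fitness than longer ones would indicate a problem where bloat is expected. With this toy fitness that is true by construction, so the output only demonstrates the measurement, not a result.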

Original language	English (US)
Title of host publication	GECCO'08
Subtitle of host publication	Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation 2008
Pages	1275-1282
Number of pages	8
State	Published - 2008
Event	10th Annual Genetic and Evolutionary Computation Conference, GECCO 2008 - Atlanta, GA, United States
Duration	Jul 12 2008 → Jul 16 2008

Keywords

Bloat
Genetic programming
Population size

Cite this

The impact of population size on code growth in GP: Analysis and empirical validation. / Poli, Riccardo; McPhee, Nicholas Freitag; Vanneschi, Leonardo.
GECCO'08: Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation 2008. 2008. p. 1275-1282 (GECCO'08: Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation 2008).

@inproceedings{15cb3ccf2d4349248be5ce58063092b6,
title = "The impact of population size on code growth in GP: Analysis and empirical validation",
abstract = "The crossover bias theory for bloat [18] is a recent result which predicts that bloat is caused by the sampling of short, unfit programs. This theory is clear and simple, but it has some weaknesses: (1) it implicitly assumes that the population is large enough to allow sampling of all relevant program sizes (although it does not explain what to expect in the many practical cases where this is not true, e.g., because the population is small); (2) it does not explain what is meant by its assumption that short programs are unfit. In this paper we discuss these weaknesses and propose a refined version of the crossover bias theory that clarifies the relationship between bloat and finite populations, and explains what features of the fitness landscape cause bloat to occur. The theory, in particular, predicts that smaller populations will bloat more slowly than larger ones. Additionally, the theory predicts that bloat will only be observed in problems where short programs are less fit than longer ones when looking at samples created by fitness-based importance sampling, i.e., samplings of the search space in which fitter programs have a higher probability of being sampled (e.g., the Metropolis-Hastings method). Experiments with two classical GP benchmarks fully corroborate the theory.",
keywords = "Bloat, Genetic programming, Population size",
author = "Riccardo Poli and McPhee, {Nicholas Freitag} and Leonardo Vanneschi",
year = "2008",
language = "English (US)",
isbn = "9781605581309",
series = "GECCO'08: Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation 2008",
pages = "1275--1282",
booktitle = "GECCO'08",
note = "10th Annual Genetic and Evolutionary Computation Conference, GECCO 2008 ; Conference date: 12-07-2008 Through 16-07-2008",
}

TY - GEN
T1 - The impact of population size on code growth in GP
T2 - 10th Annual Genetic and Evolutionary Computation Conference, GECCO 2008
AU - Poli, Riccardo
AU - McPhee, Nicholas Freitag
AU - Vanneschi, Leonardo
PY - 2008
Y1 - 2008
AB - The crossover bias theory for bloat [18] is a recent result which predicts that bloat is caused by the sampling of short, unfit programs. This theory is clear and simple, but it has some weaknesses: (1) it implicitly assumes that the population is large enough to allow sampling of all relevant program sizes (although it does not explain what to expect in the many practical cases where this is not true, e.g., because the population is small); (2) it does not explain what is meant by its assumption that short programs are unfit. In this paper we discuss these weaknesses and propose a refined version of the crossover bias theory that clarifies the relationship between bloat and finite populations, and explains what features of the fitness landscape cause bloat to occur. The theory, in particular, predicts that smaller populations will bloat more slowly than larger ones. Additionally, the theory predicts that bloat will only be observed in problems where short programs are less fit than longer ones when looking at samples created by fitness-based importance sampling, i.e., samplings of the search space in which fitter programs have a higher probability of being sampled (e.g., the Metropolis-Hastings method). Experiments with two classical GP benchmarks fully corroborate the theory.
KW - Bloat
KW - Genetic programming
KW - Population size
UR - http://www.scopus.com/inward/record.url?scp=57349088128&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=57349088128&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:57349088128
SN - 9781605581309
T3 - GECCO'08: Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation 2008
SP - 1275
EP - 1282
BT - GECCO'08
Y2 - 12 July 2008 through 16 July 2008
ER -
