created: 2025-03-14, modified: `=this.modified`

tags:y2025lost

Generation loss is the loss of quality in successive copies of data. An increase in file size is a common result, as the artifacts introduced at each generation can increase the entropy of the data.

Image caption: digital generation loss induced by rotating a JPEG image 90 degrees (from top to bottom: 0, 100, 200, 500, 900, and 2000 times, without using lossless tools).
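
The caption above describes exactly this kind of cumulative lossy re-encoding. As a rough illustration (a minimal sketch assuming the Pillow library and a hypothetical input file `photo.jpg`, not the tool behind the original image), the loop below rotates an image and re-saves it as JPEG on each pass, so compression artifacts accumulate with every generation:

```python
# Minimal sketch of digital generation loss with Pillow.
# Each iteration rotates the image 90 degrees (lossless in memory) and then
# re-encodes it as JPEG, so lossy-compression artifacts accumulate.
import io
from PIL import Image

def degrade(path: str, generations: int, quality: int = 85) -> Image.Image:
    img = Image.open(path).convert("RGB")
    for _ in range(generations):
        img = img.rotate(90, expand=True)               # rotate in memory
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=quality)   # lossy re-encode
        buf.seek(0)
        img = Image.open(buf).convert("RGB")
    return img

# Hypothetical usage: compare 0 vs. 500 generations of the same input file.
# degrade("photo.jpg", 500).save("photo_gen500.jpg", quality=85)
```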

Model Collapse

Model collapse is a phenomenon where machine learning models gradually degrade due to errors arising from uncurated training on the outputs of another model, including prior versions of the model itself.

In early model collapse, the model begins to lose information about the tails of the distribution, mostly affecting minority data. Later work has found that this stage is hard to notice, since overall performance may appear to improve while performance on minority data degrades. In late model collapse, the model loses a significant proportion of its performance and begins confusing concepts.
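
The early-collapse mechanism (losing the tails first) can be illustrated with a toy simulation rather than an actual LLM: repeatedly fit a Gaussian to samples drawn from the previous generation's fit, and the estimated spread tends to shrink over generations because finite samples underrepresent the tails. A minimal numpy sketch, purely illustrative:

```python
# Toy sketch of early model collapse: each "generation" fits a Gaussian to
# samples drawn from the previous generation's fit. With finite samples the
# tails are underrepresented, so the estimated spread tends to shrink and
# the tails of the distribution are gradually lost. Not an LLM, just the
# statistical intuition.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0      # generation 0: the "real data" distribution
n = 50                    # samples available per generation

for gen in range(1, 201):
    samples = rng.normal(mu, sigma, size=n)    # sample from the current model
    mu, sigma = samples.mean(), samples.std()  # "train" the next model on them
    if gen % 40 == 0:
        print(f"generation {gen}: mu={mu:+.3f}, sigma={sigma:.3f}")
```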

In the context of large language models, research has found that training LLMs on text generated by predecessor models causes a consistent decrease in the lexical, syntactic, and semantic diversity of model outputs through successive iterations, an effect that is especially pronounced for tasks demanding high levels of creativity.
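
One common way to quantify lexical diversity is a distinct-n score: the fraction of n-grams in a text that are unique. The sketch below is a minimal version with naive whitespace tokenization, and the two example strings are invented for illustration, not taken from the cited research:

```python
# Minimal sketch of a distinct-n lexical diversity score: the fraction of
# n-grams in a text that are unique. Lower scores across successive model
# generations would reflect the diversity loss described above.
# Uses naive whitespace tokenization; real evaluations use proper tokenizers.
def distinct_n(text: str, n: int = 2) -> float:
    tokens = text.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

# Hypothetical outputs from an early and a later model generation.
gen0 = "the quick brown fox jumps over the lazy dog near the quiet river"
gen5 = "the cat sat on the mat and the cat sat on the mat again and again"
print(distinct_n(gen0, 2))   # 1.0  (every bigram is unique)
print(distinct_n(gen5, 2))   # ~0.67 (repetition lowers the score)
```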