Prediction Bounds’ Implicit Consequences on Model Composition and Validation

Physicalism imposes requirements on prediction models, and those requirements derive from the history of predictions and from the goal of obtaining tight prediction bounds. As a result, making a prediction (or a series of predictions in an area) implicitly constrains the set of models that can successfully make it, and also limits the ways in which correct models can differ from one another.

Consider the statement, “at time X the value of Y will be A.” With time as the domain/x-axis, any sort of crazy model with wild fluctuations could be correct, as long as it has value A when it needs to. However, if the statement is “the value of Y will monotonically increase until it reaches A”, then far fewer model types apply.
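
Below is a minimal sketch of this narrowing effect. The candidate models, the time grid, and the tolerance are all illustrative assumptions; the point is only that every candidate can satisfy the point prediction, while the monotonicity claim rules some of them out.

```python
import math

X, A = 10.0, 5.0                       # "at time X the value of Y will be A"
ts = [i * 0.5 for i in range(21)]      # coarse time grid from 0 to X

# Hypothetical candidate models, chosen only to illustrate the contrast.
candidates = {
    "linear ramp":      lambda t: A * t / X,
    "wild oscillation": lambda t: A * t / X + 3 * math.sin(math.pi * t),
    "step at the end":  lambda t: 0.0 if t < 9.5 else A,
}

def hits_point(f, tol=1e-6):
    """Satisfies only the point prediction: f(X) is A."""
    return abs(f(X) - A) < tol

def monotone_to_A(f, tol=1e-6):
    """Satisfies the stronger claim: f rises monotonically until it reaches A."""
    values = [f(t) for t in ts]
    nondecreasing = all(b >= a - tol for a, b in zip(values, values[1:]))
    return nondecreasing and abs(values[-1] - A) < tol

for name, f in candidates.items():
    print(f"{name:18s} point-only: {hits_point(f)}  monotone: {monotone_to_A(f)}")
```

All three candidates pass the point check; only two survive the monotonicity check, which is the sense in which the stronger statement admits far fewer model types.
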
When we construct causal models of the real world, we demand that they backtest (match the historical data) and do not overtrain (do not fail when applied to future data). This effectively turns the historical record into a series of predictions that models must hit, and further partitions it into pairs of “past time” and “immediate/bounded future,” where the past, fed into the model as input, must produce the future as output. By the definition of value (i.e., what you can actually use a model for), a model with tighter bounds predicts better than one with loose bounds, even though both produce valid output.
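
A minimal sketch of this subdivision idea, assuming a toy scalar history and a deliberately naive "carry the last increment forward" model; both the data and the tolerance are illustrative, not taken from the text.

```python
# Rolling "past time" / "immediate, bounded future" splits over a toy history.
history = [1.0, 1.2, 1.5, 1.9, 2.4, 3.0, 3.7, 4.5, 5.4, 6.4]   # scalar measurements in time

def naive_trend_model(past):
    """Predict the next value by extending the last observed increment."""
    return past[-1] + (past[-1] - past[-2])

def backtest(model, series, window=3, tolerance=0.2):
    """Treat every point after `window` as a prediction the model must hit."""
    hits, trials = 0, 0
    for t in range(window, len(series)):
        past, actual = series[:t], series[t]
        trials += 1
        if abs(model(past) - actual) <= tolerance:   # a tighter tolerance means tighter claimed bounds
            hits += 1
    return hits, trials

hits, trials = backtest(naive_trend_model, history)
print(f"model hit {hits}/{trials} historical 'futures' within its claimed bounds")
```

Shrinking `tolerance` is the crude analogue of claiming tighter bounds: the model becomes more useful if it still hits, and easier to reject if it does not.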

Historical data about the real world is a series of scalar measurements. The relationships between those measurements may be modeled/approximated/recreated by very complex functions with random/stochastic components, but the data itself is the result of measuring and recording scalar values at given points in time. Hence, within the bounds it claims, a model accepts scalars in time and outputs scalars in time, and those output scalars have to fall within the bounds the model claims.
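
One way to make that framing concrete is an interface sketch; the names here (Prediction, ScalarModel) are hypothetical and exist only to spell out the claim.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence, Tuple

@dataclass
class Prediction:
    time: float
    low: float     # lower bound the model claims
    high: float    # upper bound the model claims

class ScalarModel(Protocol):
    def predict(self, past: Sequence[Tuple[float, float]], at_time: float) -> Prediction:
        """Accept (time, value) scalar measurements; return a bounded scalar prediction."""
        ...

def within_claimed_bounds(prediction: Prediction, measured_value: float) -> bool:
    """The output scalar has to land inside the bounds the model claimed."""
    return prediction.low <= measured_value <= prediction.high
```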

From a mathematical perspective, there are infinitely many valid models for a given data series; terms and constants can be arbitrarily added and manipulated, even to create “pure-synthetics” that still match the reality. From an Occam’s razor or Karl Popper-style scientific perspective, these infinite models would be eliminated or disregarded by the principles of simplicity and of the strongest constraint on future bounds. The strongest constraint on future bounds, however, does not sit well with the principle of avoiding overtraining, nor does it accord with well-understood issues surrounding measurement uncertainty. There are, though, other types of bounds that almost universally apply: the physical limitations of known entities, and measurement’s own constraints on what type of future could produce a given measurement.
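
The “pure-synthetic” point can be shown directly: add to any fitted model a term that vanishes at every observed time, and the backtest is untouched while the future diverges. The base model, sample times, and constant below are illustrative assumptions.

```python
observed_times = [0.0, 1.0, 2.0, 3.0, 4.0]
base_model = lambda t: 2.0 * t + 1.0               # assume this already matches the history

def synthetic_variant(t, c=5.0):
    """Base model plus c * (t - t0)(t - t1)...(t - tn): identical on every observed point."""
    bump = c
    for t_i in observed_times:
        bump *= (t - t_i)
    return base_model(t) + bump

for t in observed_times:
    assert synthetic_variant(t) == base_model(t)    # the historical fit is unchanged

print("at t=5:", base_model(5.0), "vs", synthetic_variant(5.0))   # the futures are wildly different
```

Since `c` can be anything, this one construction already yields infinitely many models that match the record equally well.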

Physical limitation of known entities: the notions of “caused and uncaused” are relevant here. Across the sum total of all predictions and all history, the vast majority of events, even in the nominal, uncritical historical record, are grounded in the activity and existence of ordinary physical entities, in conventional cause-and-effect relationships. Miracles are few in proportion to the total record, and cannot be reliably invoked. Consequently, if a prediction model is to be reliable, it cannot rely on miracles; and, to distinguish its composition from miracles, it must consist of physically grounded entities.

Measurement constraints: in the weather-prediction case, if cloud cover is expected to arrive over a given area, then the measurement device (a satellite, a human eye, or some other detector) must receive the light emanations and other physical signatures of that cloud cover. Hence some entity, almost certainly one known to physics, must deliver those emanations or otherwise register on the measurement device.

Combining the physical limitations and the measurement constraints: a model must, within its claimed bounds, propose some mapping of existing physical entities, and of future influences on them, within the constraints of some physics, such that they register on the measurement devices. Since human/sentient beings adhere to physicalism in their actions (though perhaps not in the mind that is assumed to produce those actions), such physical entities and influences are the only proper objects for a model that claims to be based on historical data.

Again, in theory, a model could propose a miracle, but such a model cannot be generally relied upon, because miracles cannot be reliably reproduced; it would be said to have effectively infinite bounds, and would hence be inferior in predictive power, on any concrete problem, to models with tighter bounds. (Even if you accept that miracles really happened, without a very tight rule for when and how they occur, incorporating them into your model would cause its overall accuracy to decrease.)

Since the models incorporate physical realities for their non-probability terms at a macro (non-subatomic) level, their relationships must consist of mathematical propositions that correspond to the relationships among macro-level primitives. The uncertainty bounds must correspond to assumed tolerances for measurement error and bad historical data, or to known issues with the equations’ approximation of physical reality.
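
As a minimal sketch of tying a bound to its sources: the figures below are invented, and the root-sum-of-squares combination is just the usual convention for independent error sources, not something claimed in the text.

```python
import math

measurement_tolerance = 0.5   # assumed instrument / historical-record error
approximation_error = 0.3     # assumed shortfall of the equations vs. the physical reality

# Combine independent error sources; the resulting bound is justified term by term.
claimed_bound = math.sqrt(measurement_tolerance**2 + approximation_error**2)
print(f"justified uncertainty bound: +/- {claimed_bound:.2f}")
```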

The treatment of a probability term in a model has a huge impact on the ability to validate the model. Take an important and common probability term as an example: human behavior. If human behavior is assumed to be non-constant over time, then it is always possible to construct a model that asserts that, whatever humans did in the past, they will now do something different. This term really is non-constant, but bounding its non-constancy is required in order to reject the “end of history” models that have failed in the past. The means by which this is done is to refer to the subdivisions of history: “end of history” models, applied at various earlier times, would have failed to predict the last 2500 years.
Some human behaviors have changed over those 2500 years, and some have not. The sub-model of non-constant human behavior proposed by a model has to explain both the behaviors that changed and the behaviors that did not, and must therefore be consistent with that history. In particular, it must explain why some behaviors have persisted across technological eras and civilizations.

Hence the areas where models might deviate from each other are:
– Meta-model of which historical data is really accurate
– Assumed measurement data completeness/quality/precision limitations
– Known issues with predicting the outcomes of science experiments, and the resulting ripple effects on the macro model (e.g. the butterfly effect)
– Differing models of how human behavior is non-constant; what parts have changed and how much
– General difficulties in predicting the specific number of humans who will respond in a certain way to a given action

In most social models, the human behavior component completely dominates. In models of technological progress and futurism, differences over human behavior play a part, but known gaps in measurement data (e.g. the properties of future compounds) are also a huge factor. In models of weather, known issues with the long-term stability of the equations, together with measurement data limitations, are the bounding factor.

The bounds of a model imply “confidence” levels as well. A statement about a future event with no time bound is consistent with a large number of models, including pure-synthetic or miraculous models that have no relationship to historical data or reality. By contrast, a model that makes predictions at frequent intervals can be validated over a high number of trials.
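
A minimal sketch of why frequency matters for validation: with many bounded predictions you can put a confidence interval on the model's hit rate, while a single open-ended claim tells you almost nothing. The hit counts below are invented; the Wilson score interval is a standard choice, not something from the text.

```python
import math

def wilson_interval(hits, trials, z=1.96):
    """Wilson score confidence interval (95% by default) for an observed hit rate."""
    p = hits / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half

print(wilson_interval(18, 20))   # frequent, bounded predictions: a usable interval
print(wilson_interval(1, 1))     # a single success: the interval stays very wide
```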