**Statistics**

**Research ideas under development**

**A new graded concept of randomness and probability:** We introduce a new concept of statistical randomness. Randomness in statistics depends on the observer: it is due to the lack of information available to the cognitive observer. If we were in the place of the creator, we might not need the concepts of randomness and statistical probability at all. But we observe from the point of view of a human mind, and much of the causal information is missing when we observe a phenomenon. Traditionally, randomness in statistics is defined through a large sample of independent, identically distributed repeated events. The sample in its turn defines the histogram and the probability distribution, and therefore the statistical frequencies or probabilities of the various events. It is often objected that there is no such thing as independent and identically distributed events in physical reality; still, such a simplification is a fair start. We nevertheless introduce a graded concept of randomness and statistical probability that covers the whole range between strongly dependent, non-identically distributed events in physical reality and the simplification of independent and identically distributed events.

**The key is that any sample is always a stratified or layered sample, from the single event up to the largest available number of repetitions of events.** In other words, we have a sample of samples, each element of which is again a sample, down to the final one-element samples, the events. At the bottom layer, the samples of single events, each sample yields a probability distribution, which is however limited by the permitted sample size. For larger samples we must account for samples of samples (the 2nd layer), so the 1st-layer statistical frequencies or probabilities themselves become random variables, which allows the initial probability distribution to vary randomly; and so on, up to the largest available sample of samples. If the largest possible sampling consists of N repetitions of events, this scheme is clearly different from the flat, straitjacket concept that for this large N there are fixed statistical frequencies (probabilities) of occurrence. The stratification or layering of samples of course requires some key values for the maximum sample size in each layer; these have to be introduced from universal concepts of time and space wherever such statistical experiments take place, e.g. on the surface of the planet Earth. There also holds the principle of "**synergy**" or "non-totalitarian distribution of information": no level can contain all the randomness, or all the probabilistic and statistical information, of all the levels.
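The layered picture can be sketched numerically. The following minimal Python example (the Beta mixing distribution and all parameter values are illustrative assumptions, not part of the proposal) shows how the 1st-layer statistical frequencies become random variables once a 2nd layer is present:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer 2: each block's event probability is itself random
# (a Beta distribution is an illustrative assumption).
n_blocks, block_size = 50, 100
block_probs = rng.beta(2.0, 2.0, size=n_blocks)

# Layer 1: within each block the events are i.i.d. given that block's probability.
events = rng.binomial(1, block_probs[:, None], size=(n_blocks, block_size))

# The 1st-layer statistical frequencies, one per block, are random variables.
layer1_freqs = events.mean(axis=1)

# A flat analysis pools everything into one fixed frequency ...
pooled_freq = events.mean()
# ... but the spread of the layer-1 frequencies exceeds pure binomial noise,
# revealing the hidden 2nd-layer variation.
binomial_sd = np.sqrt(pooled_freq * (1 - pooled_freq) / block_size)
print(layer1_freqs.std(), ">", binomial_sd)
```

The excess spread of the layer-1 frequencies over the binomial noise level is exactly the information carried by the 2nd layer, which no flat single-layer description can reproduce.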

In the next paragraphs we apply the above idea to stochastic processes and time series, and we obtain new concepts of statistical forecasting.

In the next paper we formulate the idea that forecasting, e.g. with a time series, should be carried out in more than one scale, so as to include short-, mid- and long-term scales or levels or layers. Since each scale yields, by estimation, a different model, it is not possible to assign a single model to the phenomenon unless we resort to the concept of higher-stochastic-order random variables, that is, random variables whose expected average (1st moment) and other moments are themselves another, hidden stochastic variable, and so on. Both the *top-down* (hidden) influence of the coarser resolutions and the *bottom-up* influence of the finer resolutions have then to be accounted for.
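A minimal numerical illustration of a higher-stochastic-order random variable (the Gaussian choices and parameter values are assumptions for illustration): the observed variable's 1st moment is itself a hidden random variable, and the law of total variance separates the two layers:

```python
import numpy as np

rng = np.random.default_rng(1)

# 2nd-stochastic-order random variable: the mean itself is hidden and random.
n_days, n_obs = 200, 50
hidden_means = rng.normal(loc=10.0, scale=3.0, size=n_days)    # hidden layer
observations = rng.normal(loc=hidden_means[:, None], scale=1.0,
                          size=(n_days, n_obs))                # observed layer

# Law of total variance: Var(X) = E[Var(X|M)] + Var(E[X|M]) = 1^2 + 3^2 = 10.
total_var = observations.var()
print(total_var)  # close to 10.0
```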

The lines of quantitative interpretation include the following principles:

**Based on a set of data for a phenomenon:**

**0) Events are never defined at one level only, although we may attribute to them a focus level and a spectrum of significant neighboring levels on which the event has a trace or counterpart. (Multi-level events; only the no-event is single-leveled.)**

**1) The information across all levels can be projected as information on a single focus level only (the bottom, the top, or a middle level) without violating the principle of simplicity; nevertheless this is only a partial, classical description. The standard description, in this approach, defines the distribution of the data-information among the levels, so that no level has the information of all the other levels, without violating the principle of simplicity. It is nevertheless possible to make an inclusive ordering of the data-information (bottom-up, top-down or middle-out) by violating the principle of simplicity. If the reduction is done without violating the principle of simplicity, it does not give an adequate description of the horizontal and vertical stochastic causalities, which remain non-reducible. Forecasting under the principle of simplicity at a single level (e.g. with ARMA(p,q) models and the Box-Jenkins parsimony of low orders p,q = 0,1,2) leads to rather clumsy forecasting with a much higher forecasting error. (Law of fair or "non-totalitarian" distribution of information across levels. Buckminster Fuller is said to have repeatedly used the term "SYNERGY" for systems thinking with this "democratic" or non-reductionist meaning: no part (here, level) can contain the information for a forecasting of the whole (here, all levels).)**

**2) Horizontal stochastic causality and innovation exist and enter at each level in a basically separate and different way from that of the other levels, describable within the principle of simplicity. (Horizontal stochastic causality is always level-wise; e.g. with ARMA(p,q) models we would require different coefficients and different p, q for different levels.)**
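The level-wise difference of coefficients is easy to see numerically. The sketch below (plain lag-1 autocorrelation as a Yule-Walker-style AR(1) estimate; the parameter values are illustrative) fits an AR(1) at two granulation levels of the same simulated phenomenon and obtains clearly different coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a fine-scale AR(1):  x_t = 0.9 * x_{t-1} + e_t
n, phi = 20000, 0.9
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

def lag1_autocorr(series):
    """Yule-Walker estimate of the AR(1) coefficient from the lag-1 autocorrelation."""
    s = series - series.mean()
    return (s[:-1] * s[1:]).sum() / (s * s).sum()

# Level 0: the raw series.  Level 1: non-overlapping means of 10 observations.
phi_level0 = lag1_autocorr(x)
phi_level1 = lag1_autocorr(x.reshape(-1, 10).mean(axis=1))

# The same phenomenon yields clearly different AR coefficients at the two levels.
print(phi_level0, phi_level1)
```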

**3) The inter-level stochastic causality relates and gives the interplay of the different horizontal stochastic causalities, and is always defined on a reference observation level. (There is also a vertical, inter-level dependence of the horizontal level-wise causalities over a reference binding level. With ARMA(p,q) models of p,q = 0,1,2, for example, this would give a recursive way to integrate all levels into a single stochastic process.)**

**4) The horizontal stochastic causalities of levels other than the focus level, smaller or larger, have a corresponding higher-stochastic-order horizontal causality on the focus level, after an appropriate nested enhancement of the information of the focus level. (Orders of hidden horizontal causalities may correspond to the horizontal causalities of other levels, as projected onto the focus level. E.g. with ARMA(p,q), p,q = 0,1,2 models, if we are to project all the information of the other levels onto a single level, we must violate the principle of simplicity (p,q <= 2) and create a higher-order ARMA model.)**

**5) The horizontal causality of a focus level is not derivable from the horizontal causality of other levels, larger or smaller, together with the inter-level interplay. (Law of irreducibility of a horizontal, level-wise, 1st-stochastic-order causality to smaller- or larger-scale causalities and the inter-level interplay. E.g. with ARMA(p,q), p,q = 0,1,2 models at different levels, if we integrate at a single focus level by a higher-order ARMA(p,q), the latter cannot be derived from the ARMA(p,q) models of each level alone, without some binding inter-scale law, which is extra information.)**
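One way to sketch such a binding inter-scale law (the tanh link and all parameter values are purely illustrative assumptions) is to let a coarse-level AR(1) state drive the coefficient of the fine-level AR(1):

```python
import numpy as np

rng = np.random.default_rng(3)

# Coarse level: a slowly varying AR(1) state, one step per "season".
n_coarse, steps_per_season = 100, 50
z = np.zeros(n_coarse)
for k in range(1, n_coarse):
    z[k] = 0.95 * z[k - 1] + 0.1 * rng.normal()

# Inter-level (vertical) binding law: the fine-level AR coefficient is a bounded
# function of the coarse state.  This law is the extra information that no
# single level contains on its own.
phi_fine = 0.5 + 0.4 * np.tanh(z)          # one coefficient per season

# Fine level: an AR(1) whose coefficient is modulated season by season.
x = np.zeros(n_coarse * steps_per_season)
for t in range(1, len(x)):
    x[t] = phi_fine[t // steps_per_season] * x[t - 1] + rng.normal()

print(x.shape, phi_fine.min(), phi_fine.max())
```

Estimating a single ARMA model from `x` alone cannot recover `phi_fine`'s law; the vertical equation linking `z` to `phi_fine` has to be modeled explicitly.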

**The representation of the phenomenon**

This technique, in its main ideas but with all its details, could be combined with *pattern-recognition statistical forecasting* or robust **non-parametric statistical forecasting**. This means that although we may accept a finite memory in the process, we need not assume stationarity, nor even time invariance of the partial correlations. **We assume only time invariance of the conditional (on the values within the memory horizon) distributions, which may be different for different patterns of values.** Such a class of processes requires a particular sampling technique for its statistics, resembling the matching technique in sampling; it is known that matching increases the power of hypothesis tests. We can of course use a parametric approach with **multilevel statistical models, multilevel random coefficient models** (MRCM) and **Hierarchical Linear Models** (HLM). *The technique of creating a next sampling-granulation level and a next equation (causality) by letting constant parameters vary (or by summarizing varying parameters into a random constant) within the previous sampling-granulation level of the previous equation, with a new equation sampled on the next granulation level, can be used for bottom-up, top-down or both (middle-out, middle-in) enhancement of the single-level model.* Easily accessible software (statistical packages) for MRCM, HLM and multilevel models is available on the Internet (e.g. HLM, BIRAM, BMDP, BUGS, EGRET, GENSTAT, ML3, VARCL, SABRE, SAS; these can be used together with SPSS, Minitab, Lisrel etc. For relevant books see H. Goldstein, "Multilevel Statistical Models", Arnold, 1995, and N. T. Longford, "Random Coefficient Models", Oxford University Press, 1993). We may enhance the traditional HLM, which contains only **horizontal linear equations**, with **vertical linear equations** as well, and with an appropriate further sampling technique on the same sequence of data. In multi-resolution analysis this is taken care of by the self-similarity property. The linear equations have the exploratory meaning of a partial-correlation structure, which always exists in random variables, rather than an ad hoc confirmatory meaning. Resorting also to structural equation modeling (SEM) and **factor analysis** for the vertical equations only, we may explain inter-scale causality and the clustering of different scales as factors. Hierarchical Linear Models permit the combination of the best-fit simple linear models for forecasting at different sampling-granulation scales (resolution levels) into a single vector stochastic process. Although at each level the equations are linear, the overall system is not a linear system of equations but rather a system of multivariate polynomials of higher order, the polynomial order increasing with the number of levels entering. It can be proved that, with such a way of combining linear models, the forecasting error at each level is less than the error that would result if we extended the best-fit linear forecasting from the finest granulation to the other horizon levels. The author has designed and supervised an MSc dissertation especially so as to prove the above. This paradox is very relevant to **Simpson's paradox**: a single granulation level or resolution may lead to biased forecasting. The effect of one granulation level on another can also be handled with the Mantel-Haenszel method for confounding; the confounding factor here is the effect of one granulation level, or stratum, on another.
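A minimal two-level random-coefficient sketch in the spirit of HLM (all parameter values are illustrative assumptions, and plain least squares stands in for a full HLM fit): level-1 slopes are estimated within groups, and then become the response of a level-2 (vertical) equation:

```python
import numpy as np

rng = np.random.default_rng(4)

# Two-level random-coefficient data: 30 groups, each with its own slope.
n_groups, n_per = 30, 40
group_z = rng.normal(size=n_groups)              # level-2 covariate
true_slopes = 2.0 + 1.5 * group_z + 0.2 * rng.normal(size=n_groups)

x = rng.normal(size=(n_groups, n_per))
y = true_slopes[:, None] * x + rng.normal(scale=0.5, size=(n_groups, n_per))

# Level 1: ordinary least squares within each group (horizontal equations).
slopes = (x * y).sum(axis=1) / (x * x).sum(axis=1)

# Level 2: the level-1 coefficients become the response of a new equation
# sampled on the coarser granulation level (vertical equation).
A = np.column_stack([np.ones(n_groups), group_z])
gamma, *_ = np.linalg.lstsq(A, slopes, rcond=None)
print(gamma)  # close to the level-2 parameters (2.0, 1.5)
```

The same two-step pattern, constants of one level becoming random coefficients of the next, iterates to as many granulation levels as the data support.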

The above analysis is also very relevant to the recent developments of **wavelet analysis** and **multi-resolution analysis** (MRA) in signal processing and, more generally, in numerical harmonic analysis (see e.g. G. Kaiser, "A Friendly Guide to Wavelets", Birkhäuser, 1994; L. Debnath, "Wavelets and Signal Processing", Birkhäuser, 2003, and in particular the article of A. Benassi, S. Cohen, S. Deguy and J. Istas, "**Self-similarity and Intermittency**", in the last-mentioned book). Multi-resolution analysis and the resulting wavelet bases form a remarkable new system of techniques (with an algebra of up-sampling and down-sampling operators resembling the above analysis), with impressive success among different scientific disciplines. Although wavelet analysis is concerned mainly with the analysis and composition of a single path (signal), in the above discussion we are interested in stochastic processes, that is, in the statistical properties of a group (sample) of parallel paths. Of course such groups of paths could, in particular special cases, be produced by a deterministic dynamical system. The concept of self-similar stochastic processes (e.g. among resolution levels) is, as we see it, a very restrictive condition, and in the above discussion we adopt a more flexible approach, that of (**partial**) **self-similarity of a stochastic process with respect to a statistical property**, **among a sequence of different resolution levels**. This statistical property may or may not be a characteristic, defining property of the stochastic process; if it is, we recover the classical definition of a self-similar stochastic process. Which sequence of different resolutions the self-similarity holds over is part of what can be called the **"vertical law"** of the process (see also C. A. Cabrelli and U. M. Molter, "Generalized Self-Similarity", J. Math. Anal. Appl. 230 (1999), 251-260). While the technique of HLM gives ways in which the coefficients of the "law" of the process may change among the resolution levels, the partial self-similarity shows what is not to change among the resolution levels.
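Partial self-similarity with respect to one statistical property can be checked directly. The sketch below (with i.i.d. Gaussian increments as the assumed test case) measures how the standard deviation of aggregated increments scales across a sequence of dyadic resolution levels:

```python
import numpy as np

rng = np.random.default_rng(5)

# One statistical property -- the scaling of the variance of aggregated
# increments -- checked across a sequence of dyadic resolution levels.
increments = rng.normal(size=2**16)      # i.i.d. Gaussian increments (H = 1/2)

scales = [2**k for k in range(8)]
stds = [increments[:(len(increments) // m) * m]
        .reshape(-1, m).sum(axis=1).std() for m in scales]

# Partial self-similarity: log(std) is linear in log(scale) with slope H.
H, _ = np.polyfit(np.log(scales), np.log(stds), 1)
print(H)  # close to 0.5 for independent increments
```

The property (here the variance scaling exponent) need not be a defining property of the process; checking it over a chosen sequence of resolutions is exactly the "vertical law" viewpoint.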

Such substantial improvements in forecasting are of course based on a much more sophisticated multiple analysis of the data, and may have interesting consequences in **statistical forecasting of weather, earthquakes, medical phenomena, ecological phenomena and social phenomena**, and in combining their **real-time forecasting** with **seasonal probabilistic forecasting**. E.g. for the forecasting of earthquakes, we may assume event processes whose hazard rate follows a Hierarchical Linear Model with random-coefficient level-wise equations. Each equation is like a linear difference equation whose time steps or space-granulation steps are at different time and space scales. Thus, for example, although at a small time scale the hazard rate may be uniform, at a different time scale it may have cycles. Such cycles (e.g. depending on the cycles of motion of pieces of the earth's solid surface, or on the rotation of the moon and its tidal effects on the earth), in their turn and at a different scale (a new level-wise equation on the coefficients of the previous one), have fluctuations in their period or amplitude. (*These fluctuations might be due to the sun's 11- or 22-year cycles, or to alignments of many planets in the same direction, sun or moon eclipses, or the times of 21 June/22 December when the earth's orbital motion changes direction toward or away from the sun, etc.*) Earthquake forecasting, or more exactly the forecasting of the probability or hazard rate of earthquakes, may seem to have a very negative emotional effect on many, but the real goal is simply an intelligent use of the information available hitherto. Of course we should not think of the celestial bodies as a main source of the earthquake events; geology considers the stresses of the earth's solid surface as the main source. In addition, celestial-body cycles may very well smooth out the intensity of earthquakes, as they trigger many earthquakes of smaller intensity, which reduce the stresses. Still, the triggering of earthquake events may be correlated with celestial-body cycles. Similar reasoning may apply to weather formations and events, with their daily or yearly seasonal cycles, and again to medical and health events in populations. A Hierarchical Linear Model with level-wise random-coefficient models may describe the effects and contingencies of health events at different scales of social groups of the population. Here the different sizes of the samples, or of the active population, give rise to different equations, which are the level-wise equations of an HLM at an overall larger population scale. In the same way the HLM model could be about fertility or water-level ecological events, which again may depend on the period of the moon, the earth's daily and seasonal cycles, and furthermore the sun's 11- or 22-year cycles. In addition, cycles of alignment of planets may affect the solar wind that reaches the earth, and thus again fertility and ecological cycles. Similar forecasting (with HLM) may apply to social events and events of interest to sociology. The levels in the sampling, the statistical causality and the level-wise equations shall correspond to levels of population, social-organization scale and time: e.g. business, domestic economy, unions of societies (e.g. the European Union) or all the societies on the planet; year cycles, political cycles, etc.
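As a toy version of such a hazard-rate model (all rates, periods and amplitudes are invented for illustration, not estimated from data), one can simulate daily event counts whose Poisson rate carries a yearly cycle modulated by a slower cycle; the coarse yearly totals look almost uniform, while averaging over years reveals the finer cycle:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hazard rate with nested cycles: a yearly cycle whose amplitude is itself
# modulated by a slower (illustrative 11-"year") cycle.
days = np.arange(365 * 33)
slow_amp = 0.3 * (1 + np.sin(2 * np.pi * days / (365 * 11)))   # slow modulation
rate = 2.0 * (1 + slow_amp * np.sin(2 * np.pi * days / 365))   # events per day

# Daily event counts from an inhomogeneous Poisson model.
counts = rng.poisson(rate)

# At the coarse (yearly) scale the rate looks almost uniform ...
yearly = counts.reshape(33, 365).sum(axis=1)
print(yearly.std() / yearly.mean())   # small relative variation

# ... while averaging over years reveals the seasonal cycle at the finer scale.
profile = counts.reshape(33, 365).mean(axis=0)
season = np.sin(2 * np.pi * np.arange(365) / 365)
print(np.corrcoef(profile, season)[0, 1])   # clearly positive correlation
```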

**0) The Rainbow Stochastic Growth Model** By Dr Costas Kyritsis (2006)

We give an example of a new concept of stochastic process that has some of the above properties and represents stochastic growth (of clusters of cells, of tree or animal populations, of clusters of measurable human activities, etc.). We call the stochastic process **the Rainbow Stochastic Growth Model**. The stochastic process has the features of multi-level causality (12 layers), as described above. If we wished to approximate this process with a linear ARMA(p,q), ARIMA(p,q), SARIMA or other familiar type that reduces to linear systems, we would need a memory of at least 144 terms. But the chosen formulation is much simpler than a linear time series, and reflects a stochastic growth with an innovation that is not "white noise" or "pink noise" but rather a "multi-color noise". This is also the inspiration for the chosen term: Rainbow. We have tuned the "spectral colors" to 12 basic cycles that the sciences of astronomy, meteorology and ecology have detected for normal conditions of events on this planet.
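Without access to the author's tuned cycles, a hedged sketch of such a "multi-color" innovation can be composed from 12 placeholder cycles (the periods, amplitudes and growth parameters below are invented for illustration) with random amplitudes and phases, driving a cumulative growth path:

```python
import numpy as np

rng = np.random.default_rng(7)

# 12 placeholder cycle periods (in days) -- NOT the author's tuned values.
t = np.arange(4096)
periods = np.array([7.0, 14, 29.5, 91, 182, 365, 730, 1095,
                    1460, 2048, 2920, 4015])
amps = rng.uniform(0.5, 1.5, size=12)
phases = rng.uniform(0, 2 * np.pi, size=12)

# "Multi-color" innovation: a superposition of 12 spectral lines,
# unlike the flat spectrum of white noise.
noise = sum(a * np.sin(2 * np.pi * t / p + ph)
            for a, p, ph in zip(amps, periods, phases))

# A stochastic growth path driven by this structured innovation.
growth = np.cumsum(0.01 + 0.005 * noise + 0.02 * rng.normal(size=t.size))
print(growth.shape)
```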

**1) "Multi-resolution system of stochastic processes, and forecasting, with higher-stochastic-order random variables and Bayes estimators"** By Dr Costas Kyritsis 1999

In this paper we define stochastic differential equations and calculi over finite resolutions. We compare them with the usual stochastic differential equations with limits, of Itô or Stratonovich type or via generalized functions (distributions), and discuss their tremendous advantages and simplicity in definition and solution. While very few Itô stochastic differential equations have been solved, practically all stochastic differential equations over finite resolutions are easily solved. The choice of the rounding, relative to the accuracy level, is a key point in stochastic differential equations over finite resolutions. As a first simple approach, only 1st-stochastic-order random variables are used.
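A sketch of the finite-resolution idea as described above (Euler-type stepping with the state rounded to a chosen accuracy level after every step; the equation and all parameters are illustrative assumptions, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(8)

def solve_finite_resolution(a, b, x0, dt, n_steps, accuracy):
    """Euler-style path of dX = a*X dt + b*X dW over a finite time resolution dt,
    with the state rounded to a finite accuracy level after every step."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt))          # finite-resolution increment
        step = x[k] + a * x[k] * dt + b * x[k] * dw
        x[k + 1] = np.round(step / accuracy) * accuracy   # rounding to accuracy
    return x

path = solve_finite_resolution(a=0.05, b=0.2, x0=1.0, dt=0.01,
                               n_steps=1000, accuracy=1e-4)
print(path[0], path.size)
```

Every step is an explicit finite computation, which is why such equations are "solved" directly, in contrast to the limit constructions of the Itô calculus; the rounding level plays the role the paper assigns to the chosen accuracy.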

**2) Multi-resolution stochastic differential calculi**

By Dr Costas Kyritsis 2000

This paper is a direct application of the previous one: a simple qualitative analysis of the solutions of a random-coefficient linear system of two 1st-order stochastic differential equations.

**3) Application of the solution of stochastic differential equations over multi-resolution systems to the creation of a statistical method for estimating the probability of the random formation of hurricanes (tornadoes)** By Dr Costas Kyritsis 2001

4) To appear in the Journal "Archives of Economic History"

**How to choose the time scales in statistical time series forecasting**

By Dr Costas Kyritsis

University of Portsmouth UK
Department of Mathematics and Computer Science
(Franchise in Athens)

and

Software Laboratory
National Technical University of Athens

*Comments and Interpretation:*

In this paper we analyze how the practice of time series forecasting for a phenomenon depends on the time scale that we must choose. We discuss the possibility of accepting different models for the same phenomenon and data, in particular at different time scales, as well as structural equation modeling and hierarchical linear models (HLM). In general, if forecasting at a horizon h is required, then it is optimal to fit models at the same scale, with time bin h, rather than at the densest bins of the data, where we might consider there to be more information. We suggest a new method based on the spectrum of the time series: the time scales of best forecasting are defined by the multiple maxima of the spectrum, if they exist. For strongly discrete spectra this method works even better. The bins of each scale are again defined by the maxima of the spectrum. We expand the time series into a superposition of independent component time series that have a narrow spectrum only around the maxima of the original time series. We discuss how the standard forecasting (at the shortest time scale) can be represented as a superposition of forecasting terms, each corresponding to one of the previously mentioned maxima of the spectrum of the time series. Among the many maxima, only one, selected by specific features, gives the best forecasting, with minimum error compared to the other scales. This unique peak defines the best forecasting scale, and the relevant expansion term in the series defines the best-fit model of the time series at this optimal scale. We also suggest a different algorithm, based on the time domain and least squares rather than the frequency domain, to find the best time scale for forecasting. We also prove a new theorem on stratified sampling that comes directly from the law of large numbers. We introduce new measures of the forecasting error, other than the usual goodness of fit of least-squares estimation, and we give examples to show how the above analysis leads to time series forecasting with less error on the same data.
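The spectral-maxima step of the suggested method can be sketched as follows (the two planted cycles are an illustrative test case, not real data): the largest maxima of the periodogram recover the periods, which then serve as candidate forecasting scales:

```python
import numpy as np

rng = np.random.default_rng(9)

# A series with two genuine cycles; the spectral maxima recover their periods,
# which then serve as candidate forecasting scales (time bins).
n = 2048
t = np.arange(n)
x = (np.sin(2 * np.pi * t / 32) + 0.7 * np.sin(2 * np.pi * t / 128)
     + 0.3 * rng.normal(size=n))

power = np.abs(np.fft.rfft(x - x.mean())) ** 2
freqs = np.fft.rfftfreq(n)

# Candidate scales = the periods at the largest spectral maxima.
top = np.argsort(power)[-2:]
candidate_scales = sorted((1 / freqs[top]).round().astype(int))
print(candidate_scales)  # → [32, 128]
```

Each recovered period defines a time bin at which a separate model would be fitted, in line with the paper's claim that the best forecasting scale is selected among these spectral peaks.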

**Key words**

Statistical forecasting, time series,