The nature of forecasting
- It is impossible to forecast the future.
The world is just not like that.
This leaves us with our two best options for assessing the future.
- 1. Assume that things happen by luck (chance).
2. Assume that the future will be like the past, like last time.
- Currently, the more popular option is 'by chance', which also looks more scientific to the priests (see parametric statistics - writing down stats : using the standard normal distribution table).
- The priests always like long, complicated words and phrases that make them look knowledgeable/important.
People, including the 'priests', like to believe that they understand the world. Much of this is wishful thinking.
- We just do the best we can, and statistics is one of those attempts at understanding.
- At the centre of human understanding is the assumption that the future is more or less like the past.
- The second option for assessing the future is the empiric method, which is based on what happened last time.
This is known as non-parametric statistics, or Bayes' theorem.
(See also pages on parametrics - how to work out averages - mean, mode, and median.)
- The very bottom line is you cannot forecast or tell the future.
- In climate change, one of the known factors is that when certain gases are added to the atmosphere, the world temperature goes up.
Ash clouds in the air, produced by active volcanoes, lower the atmospheric temperature locally.
However, volcanic ash landing on ice (in the Arctic, for instance) darkens the surface and stops the ice reflecting the sun's heat back, so the ice melts faster.
- These events, the increase in CO2 and receding glaciers, are basic to measuring the extent and advance of global warming.
- These are two ways of looking at data: as a matter of chance, or as a matter of experience.
- In the real world, there are many variables, and the more of them you attend to, the closer you can get to understanding the realities of a problem, or of any real situation.
- At present, energy situations are high among human concerns, and so are on the agenda of politicians, reporters, capitalists and others.
I have covered many of these variables
in the sections of abelard.org that start a discussion on global warming, the environment, and energy.
- Statistical methods have often been developed by humans with a great interest in gambling, for example, horse racing.
- Hume on cause.
how Bayesian statistics work
- A key feature of Bayesian methods is the notion of using an empirically
derived probability distribution for a population parameter. The Bayesian
approach permits the use of objective data or subjective
opinion [2] in specifying a prior distribution [3].
With the Bayesian approach, different individuals might specify different
prior distributions. Classical statisticians argue that, for this reason,
Bayesian methods suffer from a lack of objectivity.
- Bayesian proponents argue, correctly, that the classical methods of statistical
inference have built-in subjectivity (through the choice of a sampling
plan and the assumption of ‘randomness’ of distributions)
and that an advantage of the Bayesian approach is that the subjectivity
is made explicit [4]. However, an empirically derived prior distribution
cannot easily be argued to be strongly ‘subjective’.
- Bayesian methods have been used extensively in statistical decision theory.
In this context, Bayes' theorem provides a mechanism for combining a
prior probability distribution for the states of nature with new sample
information, the combined data giving a revised probability distribution
about the states of nature, which can then be used as a prior probability
with a future new sample, and so on. The intent is that the earlier probabilities
are then used to make ever better decisions. Thus, this is an iterative
or learning process, and is a common basis for establishing computer programmes
that learn from experience (see Feedback
and crowding).
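A minimal sketch of that learning loop in Python. The two-state coin example, the 70% bias figure and the function name are illustrative assumptions, not taken from the text:

```python
# Iterative Bayesian updating: the posterior from each observation
# becomes the prior for the next one.

def update(prior, likelihoods):
    """One Bayes step: prior and likelihoods are dicts keyed by state of nature."""
    unnormalised = {s: prior[s] * likelihoods[s] for s in prior}
    total = sum(unnormalised.values())
    return {s: p / total for s, p in unnormalised.items()}

# Two hypothetical states: a coin is fair, or biased towards heads (70%).
prior = {"fair": 0.5, "biased": 0.5}
# Likelihood of observing 'heads' under each state:
heads = {"fair": 0.5, "biased": 0.7}

# Each observed head shifts belief towards 'biased'; the revised
# distribution is re-used as the prior for the next observation.
for _ in range(5):
    prior = update(prior, heads)
print(prior)   # belief in 'biased' has risen well above 0.5
```

After five heads in a row, the posterior probability of the biased coin is 0.7⁵ / (0.7⁵ + 0.5⁵), roughly 0.84: each new sample revises the distribution, which is the iterative process described above.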
black and blue taxis
- Consider the witness problem in law courts. Witness reports are notoriously
unreliable, which does not stop people being locked away on the basis
of little more.
- Consider a commonly cited scenario.
- First piece of data:
A town has two taxi companies, one runs blue taxi-cabs and the other uses
black taxi-cabs. It is known that Blue Company has 15 taxis and the Black
Cab Company has 85 vehicles. Late one night, there is a hit-and-run accident
involving a taxi. It is assumed that all 100 taxis were on the streets
at the time.
- Second piece of data:
A witness sees the accident and claims that a blue taxi was involved.
At the request of the defence, the witness undergoes a vision test under
conditions similar to those on the night in question. Presented repeatedly
with a blue taxi and a black taxi, in ‘random’ order, the witness
shows he can successfully identify the colour of the taxi 4 times out
of 5 (80% of the time). The rest of the time (1 time in 5), he misidentifies
a blue taxi as black or a black taxi as blue.
- Bayesian probability theory asks the following question, “If the witness
reports seeing a blue taxi, how likely is it that he has the colour correct?”
- As the witness is correct 80% of the time (that is, 4 times in 5), he
is also incorrect 1 time in 5, on average.
- For the 15 blue taxis, he would (correctly) identify 80% of them as being
blue, namely 12, and misidentify the other 3 blue taxis
as being black.
- For the 85 black taxis, he would also incorrectly identify
20% of them as being blue, namely 17.
- Thus, in all, he would have misidentified the colour of 20 of the taxis.
Also, he would have called 29 of the taxis blue where there are only 15
blue taxis in the town!
- In the situation in question, the witness is telling us that the taxi was
blue.
- But he would have identified 29 of the taxis as being blue. That is, he has
called 12 blue taxis ‘blue’, and 17 black taxis he has also called
‘blue’.
- Therefore, in the test the witness has said that 29 taxis are blue and only
been correct 12 times!
- Thus, the probability that a taxi the witness claims to be blue actually
is blue, given the witness's identification ability, is 12/29, i.e. 0.41 (41%).
- When the witness said the taxi was blue, he was therefore incorrect nearly
3 times out of every 5. The test showed the witness to be correct less
than half the time.
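As a quick check, the arithmetic above can be reproduced in a few lines of Python (the taxi counts and the 80% accuracy figure come from the text; the variable names are mine):

```python
blue_taxis, black_taxis = 15, 85
accuracy = 0.8   # witness correct 4 times in 5

called_blue_correctly = blue_taxis * accuracy         # 12 blue taxis called blue
called_blue_wrongly = black_taxis * (1 - accuracy)    # 17 black taxis called blue

# Of all taxis the witness would call blue, the fraction actually blue:
p_blue_given_says_blue = called_blue_correctly / (
    called_blue_correctly + called_blue_wrongly)
print(round(p_blue_given_says_blue, 2))   # 0.41, i.e. 12 out of 29
```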
false positives and false negatives
- Bayesian probability takes account of the real
distribution of taxis in the town. It takes account, not just of the ability
of a witness to identify blue taxis correctly (80%), but also the witness’s
ability to identify the colour of blue taxis among all the taxis in town.
In other words, Bayesian probability takes account of the witness’s propensity
to misidentify black taxis as well. In the trade, these are called ‘false
positives’.
- The ‘false negatives’ were the blue taxis that
the witness misidentified as black. Bayesian probability
statistics (BPS) becomes most important when attempting to calculate comparatively
small risks. BPS becomes important in situations where distributions
are not uniform, as in this case, where there were far more black taxis than
blue ones.
- Had the witness called the offending taxi as black, the calculation would
have been {the 68 taxis the witness correctly named as black} over {the 71
taxis the witness thought were black}. That is, 68/71 (the difference being
the 3 blue taxis the witness thought were black); or nearly 96% of the time,
when the witness thought the taxi was black, it was indeed black.
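The black-taxi calculation, with the false negatives (the 3 blue taxis called black) made explicit, can be checked in the same way (again, only the numbers come from the text):

```python
blue_taxis, black_taxis = 15, 85
accuracy = 0.8

false_negatives = blue_taxis * (1 - accuracy)   # 3 blue taxis miscalled black
true_blacks = black_taxis * accuracy            # 68 black taxis correctly called black

# Of all taxis the witness would call black, the fraction actually black:
p_black_given_says_black = true_blacks / (true_blacks + false_negatives)
print(round(p_black_given_says_black, 3))   # 0.958, i.e. 68 out of 71
```

Because black taxis dominate the town, a 'black' report is far more trustworthy (96%) than a 'blue' report (41%), even though the witness's accuracy is the same 80% in both cases.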
- Unfortunately, most people untrained in the analysis of probability tend
to intuit, from the 80% accuracy of the witness, that the witness can identify
blue taxis among many others with an 80% rate of accuracy. I hope the example above will convince you that
this is a very unsafe belief. Thus, in a court trial, it is not the ability
of the person to identify a person among 8 (with a 1/8th, or 12.5%,
chance of guessing ‘right’ by luck!) in a pre-arranged line up
that matters, but their ability to recognise them in a crowded street or a
darkened alleyway in conditions of stress.
Testing for rare conditions - HIV example
- Virtually every lab-conducted test involves sources of error. Test samples
can be contaminated, or one sample can be confused with another. The report
on a test you receive from your doctor may just belong to someone else, or
the test may have been sloppily performed. When the supposed results are bad, such tests can produce
fear. But let us assume the laboratory has done its work well, and the medic
is not currently drunk and incapable.
- The problem of false positives is still a considerable
difficulty. Virtually every medical test designed to detect a disease or medical
condition has a built-in margin of error. The margin of error size varies
from one test procedure to another, but it is often in the range of 1-5%,
although sometimes it can be much greater than this. Error here means that
the test will sometimes indicate the presence of the disease, even when there
is no disease present.
- Suppose a lab is using a test for a rare condition, a test that has a 2%
false-positive rate. This means that the test will indicate the disease in
2% of people who do not have the condition.
- Among 1,000 people tested who do not have the disease,
the test will suggest that about 20 of them do have it. If, as we are supposing,
the disease is rare (say it occurs in 0.1% of the population, 1 in 1,000),
it follows that the majority (here, 95%, or 19 in 20) of the people whom the
test reports to have the disease will be misdiagnosed!
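A sketch of the arithmetic, assuming for simplicity that the test always detects the genuinely ill (the text does not state a false-negative rate):

```python
population = 1000
prevalence = 0.001          # 1 in 1,000 actually have the condition
false_positive_rate = 0.02  # test wrongly flags 2% of the healthy

sick = population * prevalence                                # ~1 true case
false_positives = (population - sick) * false_positive_rate   # ~20 healthy flagged

# Of all positive reports, the fraction that are misdiagnoses:
misdiagnosed = false_positives / (false_positives + sick)
print(round(misdiagnosed, 2))   # 0.95, the '19 in 20' from the text
```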
- Consider a concrete example [5]. Suppose that a woman
(let us suppose her to be a white female, who has not recently had a blood
transfusion and who does not take drugs and doesn’t have sex with intravenous
drug users or bisexuals) goes to her doctor and requests an HIV test. Given
her demographic profile, her risk of being HIV-positive is about 1 in 100,000.
Even if the HIV test were so good that it had a false-positive rate as low
as 0.1% (and it is nothing like that good), this means that approximately
100 women among 100,000 similar women will test positive for HIV, even though
only one of them is actually infected with HIV.
- When considering both the traumatising effects of such reports on people
and the effects on future insurability, employability and the like, it becomes
clear that the false-positive problem is much more than just an interesting
technical flaw.
- If your medic ever reports that you tested positive for some rare disorder,
you should be extremely skeptical. There is a considerable likelihood the
diagnosis itself is mistaken. Knowing this, intelligent physicians are very
careful in their use of test results and in their subsequent discussion with
patients. But not all doctors have the time or the ability to treat test results
with the skepticism that they often deserve.
how bad (imprecise) can it get?
- In general:
The rarer a condition and the less precise the test (or judgement), the more
frequent the error.
- Consider the HIV test above. Many such tests are wrong 5%, or more, of the
time. Remember that the real risk for our heterosexual white woman was around
1 in 100,000, but the test would indicate positive for 5000 of every 100,000
tested! Thus, if applied to a low risk group like white heterosexual females
(who did not inject drugs, and did not have sex with a member of a high-risk
group like bisexuals, or haemophiliacs, or drug injectors), then a positive HIV result
would be incorrect 4999 times out of 5000!
- In general, if the risk were even lower and the test method
still had a 5% error rate, the proportion of positive results that are false
would be even greater. That proportion would also increase if the test
accuracy were lower.
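The general rule can be wrapped in a small illustrative helper (the function name, and the simplifying assumption that every true case tests positive, are mine, not the text's):

```python
def misdiagnosis_rate(prevalence, false_positive_rate, per=100_000):
    """Fraction of positive reports that are wrong, assuming the test
    always catches the genuinely affected (a simplifying assumption)."""
    sick = per * prevalence
    false_pos = (per - sick) * false_positive_rate
    return false_pos / (false_pos + sick)

# The figures from the text: risk 1 in 100,000, test wrong 5% of the time.
print(misdiagnosis_rate(1e-5, 0.05))    # ~0.9998, i.e. 4999 wrong in 5000
# A much better test still misleads when the condition is this rare:
print(misdiagnosis_rate(1e-5, 0.001))   # ~0.99, i.e. about 99 wrong in 100
```

The helper makes the rule visible: shrink the prevalence or grow the error rate, and the fraction of false positives among the positives climbs towards 1.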