Method natural frequency tree

No test is ever perfect - How reliable are HIV self-tests?

Consumer decisions on the basis of tests and algorithms

Before consumers make decisions, they usually gather information. Therefore, they read certain magazines, for example, to find out how well a product has scored in a test. In the health sector, however, consumers often do a test first in order to have a basis for a decision, e.g. on whether to change their diet. But also beyond the health sector, more and more tests and prognostic tools are being used, e.g. algorithms for decision support. An example would be Robo-advisors that make recommendations for the decision on which type of loan you should take out. The problem is that there is no perfect test, no perfect algorithm, and no perfect model. This means that the statement, the result or the recommendation given by a test or algorithm can always be wrong.

Why is it relevant to support consumer decisions on the basis of tests and algorithms?

The increasing spread of diagnostic and predictive tests and algorithms in the health sector, particularly for consumers who are not at all affected by the respective disease, represents a highly lucrative business field for the developers of these tests and algorithms. Earlier detection or even the prediction of future diseases are powerful purchasing arguments. In addition, products with algorithmic prediction models for decision support are developed in all other areas of life. However, since all tests or algorithms can produce false results, the question always arises whether they are at least reliable enough before purchase or use.

Why is it problematic to support consumer decisions on the basis of tests and algorithms?

Obstacles for an informed use of tests and algorithms lie in the lack of awareness of their limited reliability, in the lack of knowledge about where key information can be found, but also in the limited opportunities to be willing and able to deal with such statistical questions. A test is always an "observation" of reality, which always raises the question: If the test makes an observation, how likely is it to be correct? This conditional probability requires a consumer-friendly approach.

What is a suitable scientific approach?

Natural Frequency Trees (NFTs) allow an improved understanding of conditional probabilities, particularly in comparison to Bayesian formula calculations and unstructured text tasks (McDowell & Jacobs, 2017). It is important for NFTs that consistent subsets of a single base quantity (initial group) in the form of frequencies are used. It is not normalised or converted. Instead, the simple frequencies can be combined in their dependencies in such a way that the conditional probabilities can be easily determined (Gigerenzer & Hoffrage, 1995; Hoffrage & Gigerenzer, 1998). In this way, the use of the Bayesian formula is no longer necessary. It has been shown that even primary school students can use NFT training to make Bayesian conclusions and better calculate how likely it is that a test result is to be true (Zhu & Gigerenzer, 2006).

NFTs show - beginning with a fictitious group - how often something that the test is looking for or that the algorithm wants to predict occurs (see example below). In other words: How many people are actually affected? At the same time, it is contrasted how often it does not occur (unaffected people).

For each of these subgroups (those affected and those not affected), it is recorded separately how many times the test or algorithm detects those affected and how many of those not affected receive a false alarm. A false alarm occurs when a non-affected person receives an alert, although the test or algorithm should stay silent.

For each of the subgroups (those affected and those not affected), it is also recorded separately how many times the test or algorithm "remains silent", i.e. how many of those not affected correctly do not receive an alert. However, it is also important to ask how many of those affected are overlooked if the test or algorithm stays silent although it should detect them.

Consumers then receive the most important information if they only look at those subgroups for which the test or algorithm issues an alert. They can now see how many of those alerted are actually affected. This is also called positive predictive value (PPV). Hence, the user knows in advance how likely it is that a test or algorithm is going to be correct. They can then consider whether it is worth using it.

Natural frequency trees allow consumers to check the positive predictive value of tests and thus their reliability, e.g. if they are offered as an individual health service (known as "IGeL" services). Consumers can also test the reliability of tests that they can purchase directly (in retail shops) or via the Internet (Direct2Consumer area). In addition, natural frequency trees allow citizens to address key questions to organisations with algorithms that influence decisions - and to check them: How often does the search occur, how many are recognised, how many are overlooked, how many are exonerated, how many are falsely alerted?

How to construct a natural frequency tree?

A. What do you need?

For the evidence-based development of Natural Frequency Trees (NFTs) for your own questions, you need three key pieces of information: how often the feature you are looking for is present (prevalence of the feature if it is about all affected persons; incidence of the feature if it is only about new discoveries), how many of the feature carriers or affected people are detected (sensitivity of the test), and how many of the non-feature carriers or not affected people would be correctly cleared (specificity of the test).

B. How do you proceed?

Regarding health questions, it is recommended to research the information (prevalence or incidence, sensitivity, specificity) using services offered by established organizations of evidence-based medicine (e.g. www.igel-monitor.de, www.gesundheitsinformation.de), unless you have your own medical service or public health department. This procedure should be applied before proceeding to a systematic scientific literature search.

This scientific research should be orientated towards avoiding a review of your own. This would require at least two staff members with experience in the area of systematic search and health science, who would then prepare a summary (also statistically) of published studies. Instead, you should search exclusively via www.cochrane-library.org or www.ncbi.nlm.nih.gov/pubmed/ for Cochrane Reviews or systematic reviews for your own research question.

Outside the health sector, the evidence-based appraoches are rather fragmented. Exceptions are www.campbell-collaboration.de (political measures), whatworks.college.police.uk (crime) and educationendowmentfoundation.org.uk (education). The evidence base will be very difficult to determine, particularly for questions of algorithmic tests and prognostic tools. This is structurally related to the fact that neither requirements nor supervision as with medical products are required so far. The frequency with which the sought-after feature occurs can be determined by research independently of the algorithm. However, the information on sensitivity ("recall" in the language of developers) and specificity or false alarms (also fall-out), usually require a direct exchange with developers. If you want to develop your own model, please consult the final report on the Risk Atlas project from July 2020 or contact us. Contact details can be found here.

Recommended literature on methodological basics

Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102(4), 684.
Hoffrage, U., & Gigerenzer, G. (1998). Using natural frequencies to improve diagnostic inferences. Academic Medicine, 73(5), 538–540.
McDowell, M., & Jacobs, P. (2017). Meta-analysis of the effect of natural frequencies on Bayesian reasoning. Psychological Bulletin, 143(12), 1273–1312.
Zhu, L., & Gigerenzer, G. (2006). Children can solve Bayesian problems: The role of representation in mental computation. Cognition, 98(3), 287–308.

How can you implement the method?

If you would like to adopt a consumer topic from our website, you can do so in the following three ways:

You are using a digital copy. Either you directly save an illustration or download our PDF, or you integrate the illustration via Link(a href) or iframe.
You take your analogue copy and print out our PDF. The resolution and vector-based graphic is suitable for posters and brochures.
You recommend the app and refer to the Risikokompass from the PlayStore and AppStore.

If you would like to develop your own model, please consult the final report on the RiskAtlas project from July 2020 or contact us. Contact details can be found here.

When using the instruments, please mention the funding agency, which is the German Federal Ministry of Justice and Consumer Protection, and the Harding Centre for Risk Literacy as the responsible developers.

Logos can be downladed here.

Links to other methods

FFT-method expert validity

FFT-method case validity

FFT-method model

FFT-Methode getesteter Merkmalsvalidität

Method risk reading assistant for your browser

Visualisation with frame text

No test is ever perfect – How reliable are HIV self-test kits?

If you want to be tested for HIV, you might feel uncomfortable going to a counselling centre for sexual health and have a consultation. Ordering an HIV test over the Internet or buying it from a drugstore or pharmacy is more anonymous. Since 2018, these tests have been available for purchase over the counter, starting at around 20 euros. But can you trust the test result? How can you assess the reliability of such quick tests? Scientists assume that only 1 out of 13 people with a positive HIV self-test result is actually infected - 12 would thus be wrongly alarmed. With our frequency tree you can see for yourself how reliable such an HIV self-test is.

How to read this figure?

Among 100,000 women and men, 17 are infected with HIV without knowing it.

According to the manufacturer, these 17 are always detected due to the sensitivity of the test. At the same time, however, 200 men and women who have no HIV infection nevertheless receive a critical test result, which should be examined further.

This means that 17 of all those who have a critical result (17 + 217) actually have HIV. Simplified: 17 out of 217 critical results. This corresponds to a positive predictive value of 7.8% (=17/217).

The probability that the HIV self-test is correct when it says "You probably have HIV" is 8%. Please decide for yourself whether buying such a test without any particular reason can be beneficial for you.

Please note that the probability is even lower if you do not belong to a risk group (e.g. drug addiction with syringe use).

All statistics on HIV self-testing are currently based on manufacturers' studies. Future studies must first show what the actual benefit-to-harm ratio of these tests is. The possibility of new risk behaviour cannot be ruled out.

Sources and quality of the data

Source of the proposed prevalence value: an der Heiden M et al. (2017). Epidemiologisches Bulletin, 47, 531-545.
Please note that this value can be significantly higher for certain risk groups.

Source of the proposed sensitivity and specificity values: indication from the test autotest VIH (manufacturer's conflict of interests).

Empirical evaluation with consumers

All research results on the fundamentals and on the effectiveness of the RiskoAtlas tools in terms of competence enhancement, information search and risk communication will be published together with the project research report on 30 June 2020. If you are interested beforehand, please contact us directly (Felix Rebitschek, rebitschek@mpib-berlin.mpg.de).

Links to other topics

Intraocular pressure measurement

Down-Syndrom-Test

Diabetes-Test HbA1c

Harnblasenkrebs-Test NMP22

Toxoplasmose