The misuse of statistics is a delicate subject, but it is a discussion worth having, because statistical analysis benefits your business only if the data is properly gathered, processed and analysed. In the words of Karl Pearson, the English mathematician and biostatistician, “Statistics is the grammar of science”, but only when it is used accurately and the conclusions drawn from the numbers are free of bias.
In today’s world, the misuse of statistics is common and can be spotted everywhere, from news to marketing. In this article, we focus on how you can detect that practice and avoid it in your business.
Data science is about collecting and processing meaningful information from massive amounts of raw data. The goal is to detect patterns and trends in the gathered information and to present them in a way that helps you in the decision-making process.
The most significant advantage of data science and statistical analysis is that numbers don’t lie. People, however, are free to interpret and present them in a way that does.
For a business manager, the hardest part of working with data is interpreting the numbers, which is where Minerra’s business intelligence experts can help. When it comes to “reading the numbers right”, the potential to be misled is high, because numbers can be used as half-truths that paint a different picture from the accurate one. You can find examples of this practice everywhere, from social media to news outlets, from advertising companies to simple sale postings.
A 2009 meta-analysis by Dr Daniele Fanelli of the University of Edinburgh found that up to 33.7% of surveyed scientists admitted to questionable research practices. Such practices include modifying results, subjective testing, biased interpretation of data, withholding important analytical details and more.
This can damage your business more than simply relying on gut feeling when making decisions, because what you thought was backed up by numbers is, in reality, a modified “truth”.
Misleading statistics examples
In this section, we look at examples of misleading statistics so that you can recognise the misuse of statistics and be more cautious in the future.
Polling is the most common way to track people’s opinions and tastes in society; you can see this practice in every political campaign of the last 50 years. However, one thing about polling that might make you reconsider the results is the way the question is worded. For example:
- Do you support a tax reform that would mean higher taxes?
- Do you support a tax reform that would be beneficial for social redistribution?
Both questions ask essentially the same thing, but they will not produce the same results in the polls. This is known as asking “loaded questions”.
Polling should be impartial. If you want the “real picture” and the genuine opinions of the subjects, you have to ask a question that doesn’t imply the answer and doesn’t provoke an emotional response. If you’re not sure about the statistics from a poll, always take a look at the question that was presented to the target group.
Weak correlations appear when you process so much data that patterns eventually emerge purely by chance. This is a common type of statistical manipulation: there is not enough evidence to prove causation, but the sheer amount of analysed data makes it possible to find some correlation. You can recognise this in the news when an absurd factor is presented as the cause of an unrelated issue, such as the claim that millennials eating avocados is hurting the diamond industry.
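The effect is easy to reproduce. The following minimal Python sketch generates series of pure random noise and searches for the most strongly correlated pair. The function names, the number of series and their length are illustrative assumptions, not values taken from any real dataset.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_series=100, n_points=10, seed=0):
    """Return (i, j, r) for the most correlated pair among unrelated random series."""
    rng = random.Random(seed)
    # Every series is independent noise, so any correlation is pure coincidence.
    series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]
    best_pair, best_r = None, 0.0
    for i, j in combinations(range(n_series), 2):
        r = pearson(series[i], series[j])
        if abs(r) > abs(best_r):
            best_pair, best_r = (i, j), r
    return best_pair[0], best_pair[1], best_r


if __name__ == "__main__":
    i, j, r = strongest_chance_correlation()
    print(f"Series {i} and {j} correlate with r = {r:.2f}, purely by chance")
```

With 100 series there are 4,950 pairs to compare, so a few strong-looking correlations are almost guaranteed even though no series has anything to do with any other.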
Data fishing, or data dredging, is when you analyse vast amounts of data (as in the case of weak correlations) to discover relationships between data points without having a working hypothesis. You can see examples of this every day and in every industry: a scandalous fact is “proven” with data mining, only to be contradicted a week later by another, even more outrageous fact, again produced by data mining.
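As a rough illustration of why dredging produces "findings", the sketch below runs many comparisons between two groups that are drawn from exactly the same distribution and counts how many look statistically significant. It is a simplified simulation under stated assumptions: the data is synthetic, the standard deviation is assumed to be known (so a simple z-test applies), and the function name and parameters are hypothetical.

```python
import random
from statistics import NormalDist, mean


def count_false_positives(n_tests=1000, n_per_group=50, alpha=0.05, seed=1):
    """Count 'significant' differences between groups that are truly identical."""
    rng = random.Random(seed)
    standard_normal = NormalDist()
    # Standard error of the difference between two means when sigma is known to be 1.
    std_error = (2 / n_per_group) ** 0.5
    hits = 0
    for _ in range(n_tests):
        # Both groups come from the same distribution: there is no real effect.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (mean(group_a) - mean(group_b)) / std_error
        p_value = 2 * (1 - standard_normal.cdf(abs(z)))  # two-sided z-test
        if p_value < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_false_positives()
    print(f"{hits} of 1000 no-effect comparisons looked significant")
```

At a 5% significance threshold, roughly one in twenty of these no-effect comparisons will look like a finding, which is why a result dug out of many untargeted comparisons deserves scepticism until it is confirmed on fresh data.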
Confusing data visualisation
Dashboards, statistical graphs and charts are only useful if they provide insights on a specific subject in an orderly manner. If you don’t have the context, or if you’re not familiar with the basics of the issue, data visualisation can be confusing and even damaging, because the conclusions drawn from it will be faulty.
The most prudent way around this is to use automated tools that compare multiple data points side by side, which also supports the growth of your business.
Purposeful bias is the most hazardous form of statistical misuse because it involves manipulating results and deliberately trying to influence data findings. As the study by Dr Daniele Fanelli mentioned above found, up to 33.7% of surveyed scientists admitted to questionable research practices, and that is only in the scientific community.
How statistics can be misleading
The easiest way to show how statistics can be misleading is the Colgate example, which the U.K.’s Advertising Standards Authority (ASA) ruled was in breach of U.K. advertising rules. The slogan claimed that “More than 80% of Dentists recommend Colgate”, but it was found to be misrepresentative and misleading: the dentists surveyed could recommend more than one brand, so the figure did not mean that they preferred Colgate over its competitors. The polling was not even carried out by an independent research company.
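To see how a headline percentage like this can mislead, consider a survey in which each respondent may name more than one brand. The sketch below uses made-up responses and hypothetical brand names (assumptions for illustration only, not the actual survey data) to show that several brands can each be "recommended by more than 80%" at the same time.

```python
from collections import Counter


def recommendation_shares(responses):
    """Share of respondents who named each brand (respondents may name several)."""
    counts = Counter(brand for answer in responses for brand in answer)
    return {brand: count / len(responses) for brand, count in counts.items()}


# Hypothetical answers from 10 dentists; each set lists every brand they would recommend.
responses = [
    {"BrandA", "BrandB"},
    {"BrandA", "BrandB"},
    {"BrandA", "BrandB", "BrandC"},
    {"BrandA", "BrandB"},
    {"BrandA", "BrandB", "BrandC"},
    {"BrandA", "BrandB"},
    {"BrandA", "BrandB"},
    {"BrandA", "BrandB", "BrandC"},
    {"BrandA"},
    {"BrandB"},
]

shares = recommendation_shares(responses)
for brand in sorted(shares):
    print(f"{brand}: recommended by {shares[brand]:.0%} of dentists")
```

In this invented data, both BrandA and BrandB could truthfully advertise a 90% recommendation rate, yet neither figure says anything about which brand dentists prefer.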
You can find examples of the misuse of statistics in news, politics, advertising and even in science. The best defence against these faulty statistics is to take them with a grain of salt, and to rely on experts and business software that are not affected by bias.