Aamir Omair (Department of Medical Education, King Saud-bin-Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia)

#### November 2012, Volume 62, Issue 11

### Learning Research

This is the follow-up to the article on 'Descriptive Statistics' by Mehwish Hussain, which was published in the July 2012 issue of JPMA.^{1} As mentioned in that article, presenting results in a comprehensive manner is a challenge for many authors, because they may not be familiar with the statistical concepts related to the presentation and interpretation of results.^{2,3} The inappropriate use of statistics and errors in the presentation of statistical results have been reported extensively in the medical literature.^{4} A review of articles published in psychology journals found that errors in the statistical reporting of results were more common in low impact factor journals than in high impact factor journals.^{5} A recent review of indexed journals from Pakistan found that the wrong statistical test/analysis was used in 29% of the 80 articles reviewed.^{6}

The purpose of this article is to present some guidelines, along with some illustrative tables and graphs, to help authors present their results effectively. The first part of the article presents the basic concepts of inferential statistics and the second part discusses some effective methods for presenting statistical output in tables and graphs. A more detailed account of the common statistical errors in biomedical research articles is given in the article by Lang,^{7} and a comprehensive guideline on how to report statistics in medicine, written for authors, editors, and reviewers, is available in the book by Lang and Secic.^{8}

**Basic Concepts:**

One of the first things to consider is having an adequate sample size and a representative sample in order to confidently generalize the results of the study. This requires that the desired objectives and outcomes are clearly identified before the start of the study. The acceptable degree of accuracy (margin of error), the confidence level (commonly 95%, which corresponds to a significance level of 0.05) and the desired power of the study must be specified to determine the required sample size.^{2} Different statistical software packages and websites are available for determining the required sample size for different types of study designs and specific conditions.^{9,10}
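As a rough illustration of how these inputs determine the sample size, the sketch below uses the standard normal-approximation formula for estimating a single proportion, n = z² p(1 − p) / e². It is not the method used by the calculators cited above, and the function name and example values are assumptions chosen for illustration. This formula covers only the margin of error and the confidence level; comparing groups additionally requires the desired power and is better done with dedicated software.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(expected_p, margin_of_error, confidence=0.95):
    """Sample size needed to estimate one proportion (hypothetical helper).

    expected_p      -- anticipated proportion (0.5 is the most conservative)
    margin_of_error -- acceptable half-width of the confidence interval
    confidence      -- confidence level, e.g. 0.95 for a 95% interval
    """
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * expected_p * (1 - expected_p) / margin_of_error ** 2
    # Always round up, since a fraction of a participant cannot be recruited
    return math.ceil(n)


# Illustrative values: 50% expected proportion, 5% margin of error, 95% confidence
print(sample_size_for_proportion(0.5, 0.05))  # 385
```

A narrower margin of error or a higher confidence level increases the required sample size, which is why these values need to be fixed before the study begins.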

The other important point to consider is which statistical tests are to be used for comparing the outcomes between the different groups.

Table-1 gives the basic outline for choosing a statistical test depending upon the type of input (grouping) and outcome variables.^{11} Categorical (qualitative) variables are those which are grouped into categories. These may be dichotomous (two categories) or may have more than two categories. These variables can be nominal (e.g. ethnicity, gender) or ordinal (e.g. education level, socioeconomic status). They are generally presented as percentages, and it is important to mention the absolute number (n) for the categorical variable along with the percentage.^{12} Numerical (quantitative) variables are generally those that are measured or have some associated numerical value, e.g. height, weight, age, parity or number of cigarettes smoked. These variables are generally presented as mean ± standard deviation^{2} and are considered 'scale' data, in which the difference between successive values is equal.

Another statistical concept to consider is the use of confidence intervals where appropriate, since most results are based on samples, and different samples can give different results.^{12,13} The appropriate epidemiological measures of association must be used in accordance with the type of study design being employed. Prevalence of a disease or a risk factor can be ascertained only from a cross-sectional study. The association between the risk factor and the outcome can be determined in cross-sectional or case-control study designs by using the Odds Ratio. A cohort study can determine the incidence of a disease, which can be compared between the exposed and non-exposed groups using the Relative Risk/Risk Ratio.^{14} It is preferable to report the 95% confidence interval (CI) when using any of the epidemiological measures of association.^{15}
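The measures of association mentioned above can be sketched for a 2×2 table as follows. This is a minimal illustration that assumes the commonly used log-based (Wald) method for the confidence interval; the function names and the counts in the example are hypothetical and are not taken from any study.

```python
import math
from statistics import NormalDist


def _z(confidence):
    # Two-sided critical value of the standard normal distribution
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def odds_ratio_ci(a, b, c, d, confidence=0.95):
    """Odds Ratio with CI for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: non-exposed with outcome  d: non-exposed without outcome
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    half_width = _z(confidence) * se_log
    return (estimate,
            math.exp(math.log(estimate) - half_width),
            math.exp(math.log(estimate) + half_width))


def risk_ratio_ci(a, b, c, d, confidence=0.95):
    """Relative Risk (Risk Ratio) with CI, same cell layout as above."""
    estimate = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    half_width = _z(confidence) * se_log
    return (estimate,
            math.exp(math.log(estimate) - half_width),
            math.exp(math.log(estimate) + half_width))


# Hypothetical counts: 20 of 100 exposed and 10 of 100 non-exposed develop the outcome
for name, result in (("OR", odds_ratio_ci(20, 80, 10, 90)),
                     ("RR", risk_ratio_ci(20, 80, 10, 90))):
    estimate, lower, upper = result
    print(f"{name} = {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

When the interval includes 1, as it does for both measures in this hypothetical example, the data are compatible with no association, which is the kind of information a point estimate alone does not convey.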

The p-value is the most widely used measure for determining the statistical association between different variables. It is the probability of obtaining a difference/association at least as large as the one observed if there were in fact no difference/association between the variables.^{15} The generally accepted cutoff value is p<0.05, which means that a result this extreme would be expected less than 5% of the time by chance alone if the variables were not associated with each other. It is recommended that the exact p-value should be reported instead of just stating it as p<0.05.^{13,16} The p-values should generally be given to two decimal places, e.g. 0.02 or 0.46, except when the values are less than 0.01, in which case three decimal places may be used, e.g. 0.006, or it can be stated as p<0.01. Sometimes the statistical output gives the p-value as '0.000'. This does not mean that the probability is zero; it only shows that there is no non-zero digit up to the third decimal place, so it is preferable to state the p-value as p<0.001 (instead of p=0.000) in such cases.^{5}
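The reporting rules described above can be expressed as a small helper. This is only a sketch of one way to apply them; the function name is an assumption, and individual journals may specify different conventions.

```python
def format_p_value(p):
    """Format a p-value following the reporting rules described above."""
    if not 0 <= p <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.001:
        # Software output such as '0.000' is reported as an inequality
        return "p<0.001"
    if p < 0.01:
        # Small values keep three decimal places
        return f"p={p:.3f}"
    # All other values are given to two decimal places
    return f"p={p:.2f}"


for value in (0.46, 0.02, 0.006, 0.0004):
    print(format_p_value(value))
# p=0.46
# p=0.02
# p=0.006
# p<0.001
```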

**Presentation of Inferential Statistics:**

The results section of an article presents the findings of the study and is generally the main part of the paper that is based extensively on the efforts of the investigators, so there is generally some freedom to write this section as comprehensively as possible. At the same time, care should be taken not to get overwhelmed and try to present all the results that have been obtained from the study. As quoted by Grewal,^{17} "the fool collects facts, the wise selects them". It is important to remember that the reader does not have the time or the patience to sift through all the output, and presenting too many tables or graphs may distract readers from the main message. Care must therefore be taken to present only those results (both positive and negative) that are related to the specific objectives/hypotheses of the study. Any secondary findings or other relevant information can be briefly mentioned in the article and made available online in the electronic version of the journal, if applicable.^{12,16}

Most journals limit the number of tables and graphs/figures to a maximum of three to four per article. It is therefore important to consider carefully which information is to be presented in the text, tables or graphs,^{16} and the text should not simply repeat all the information shown in the tables and graphs.^{12,13} It is recommended to present the main descriptive statistics in the first paragraph of the results section^{16} and to present any relevant demographic or baseline characteristics in a single table. The most important/significant findings of the study can be presented in one or two graphs (if suitable), while the main statistical results can be presented in one or two tables.^{17} The second paragraph of the results section presents the main findings of the study and reports any negative findings that may be relevant. The main findings from the tables and graphs can be summarized in this paragraph as well. This paragraph can be broken down into two to three smaller paragraphs if there are different categories of results to be reported. The last paragraph of the results section may be used to present any further sub-analysis or multivariable analysis (if applicable). The results should be presented in an objective manner, and the discussion of the findings should be left for the discussion section.^{16}

**Tables and Graphs:**

It is essential to keep the tables and graphs simple and easy to understand from the readers' point of view. There is a strong temptation to show all the results in tabulated form, but this is not advisable as it may distract the reader from the main message.^{2} Care should also be taken not to submit the tables and graphs exactly as obtained from the statistical programmes. The tables and graphs should be standalone (i.e. the reader should not have to refer to the text to understand them), with a comprehensive title, appropriate legends, and explanations for any abbreviations used. The units of measurement for any laboratory or technical data should also be clearly shown.^{17} The exact p-values should always be given, and the statistical test that was used should be indicated in the footnote of the table/graph. The significant p-values may be marked with an asterisk (*) as appropriate.^{18}

It is important to always give the relevant 'n' for all the column or row subheadings. It is also necessary to give the standard deviation (SD) values wherever the mean is mentioned.^{2,13,18} This is demonstrated in the second and third rows of Table-2. The general rule for the number of decimal places for the mean and SD of numerical data is that it should be just one more than that used in measuring the original data.^{19} Categorical data are generally presented as frequencies 'n' with the percentages given in parentheses (%). It is recommended that the percentages should not have any decimal places and should be rounded off to whole numbers, especially if the relevant 'n' is less than one hundred.^{20} All this is important to avoid cluttering the tables and figures with numbers and to make them more easily readable.
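The formatting rules above for table cells can be sketched as two small helpers. The function names and the example data are hypothetical and are used only to illustrate the rounding conventions.

```python
from statistics import mean, stdev


def format_mean_sd(values, measured_decimals):
    """Report mean ± SD with one more decimal place than the raw data."""
    decimals = measured_decimals + 1
    return f"{mean(values):.{decimals}f} ± {stdev(values):.{decimals}f}"


def format_count_percent(count, total):
    """Report categorical data as n (%) with a whole-number percentage."""
    return f"{count} ({round(100 * count / total)}%)"


# Hypothetical weights recorded in whole kilograms (zero decimal places)
weights_kg = [60, 65, 70, 75, 80]
print(format_mean_sd(weights_kg, measured_decimals=0))  # 70.0 ± 7.9

# Hypothetical group in which 23 of 80 participants are smokers
print(format_count_percent(23, 80))  # 23 (29%)
```

Here `stdev` is the sample standard deviation (n − 1 in the denominator), which is an assumption about which SD the authors intend to report.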

Table-2 shows an example of a table comparing numerical and categorical data between two groups.

Three-dimensional graphs should be avoided, as should unnecessary gridlines on the graphs. The y-axis values should always start from zero, and if more than one graph is being presented it is preferable to have the same vertical scale for all the graphs (if presenting similar data). It is also better to have wider scaling for the y-axis, i.e. instead of having the major tick marks at intervals of '5', it may be better to have intervals of '10' or '20' as appropriate. The values for the bars in the graphs may be given as 'data labels', provided that they do not clutter the graphs too much.^{21} The preferred graphical representation for categorical data is the 'Bar Chart', as shown in Figure-1.

'Pie charts' are not recommended for presenting data in scientific publications. Numerical data should not be presented in the form of bar charts; instead, box plots should be used, as shown in Figure-2.

Scatter plots may be used as appropriate or the data may be better given in the form of a table.^{21}

**Summary:**

The results are the most significant part of the article since they represent the original work of the authors. It is essential to apply the correct statistical tests. Data should be presented in a comprehensive but simple manner that is easily understandable by the general readership. Care should be taken to keep the tables and graphs simple and uncluttered. Also avoid duplication of the information in the text and tables/graphs. The most important thing to keep in consideration is that it is more relevant to present the main findings rather than all the findings of the study.

### References

1. Hussain M. Descriptive statistics - Presenting your results I. J Pak Med Assoc 2012; 62: 741-3.

2. Okeh UM. Statistical problems in medical research. African J Biotech 2008; 7: 4819-26.

3. Butt A, Awais SM. Importance and understanding of biostatistics among postgraduate students at King Edward Medical University, Lahore, Pakistan. Ann King Edward Med Uni 2009; 15: 107-10.

4. Altman DG. Statistical reviewing for medical journals. Stat Med 1998; 17: 2661-74.

5. Bakker M, Wicherts JM. The (mis)reporting of statistical results in psychology journals. Behav Res 2011; 43: 666-78.

6. Hanif A, Ajmal T. Statistical errors in medical journals (A critical appraisal). Ann King Edward Med Uni 2011; 17: 178-82.

7. Lang T. Twenty statistical errors even YOU can find in biomedical research articles. Croatian Med J 2004; 45: 361-70.

8. Lang T, Secic C. How to report statistics in medicine: Annotated guidelines for authors, editors, and reviewers. 2nd ed. Philadelphia: American College of Physicians; 2006.

9. Raosoft Inc. Raosoft Sample Size Calculator. (Online) 2004. (Cited 2012 Aug 19). Available from URL: http://www.raosoft.com/samplesize.html.

10. Lenth RV. Java Applets for Power and Sample Size [Computer software]. Iowa: University of Iowa; 2006 [updated 2012 Oct 2]. (Online) (Cited 2012 Oct 3). Available from URL: http://www.stat.uiowa.edu/~rlenth/Power.

11. Swinscow TDV, Campbell MJ. Study design and choosing a statistical test. In: Statistics at Square One. London: BMJ Publishing Group; 1997. Ch 13. (Online) (Cited 2012 Aug 17). Available from URL: http://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/13-study-design-and-choosing-statisti.

12. International Committee of Medical Journal Editors (ICMJE). Uniform requirements for manuscripts submitted to biomedical journals: Writing and editing for biomedical publication. 2007. Retrieved from: http://www.icmje.org/2007_urm.pdf.

13. Altman DG, Gore SM, Gardner MJ, Pocock SJ. Statistical guidelines for contributors to medical journals. BMJ 1983; 286: 1489-93.

14. Williams CFM, Nelson KE (eds.). Study design. In: Infectious Disease Epidemiology: Theory and Practice. 2nd ed. Sudbury, MA: Jones and Bartlett Publishers; 2007; p. 74.

15. Akobeng AK. Confidence intervals and p-values in clinical decision making. Acta Paediatr 2008; 97: 1004-7.

16. Drotar D. Editorial: How to write an effective results and discussion for the Journal of Pediatric Psychology. J Pediatr Psychol 2009; 34: 339-43.

17. Grewal A. How to write a paper for publication. (Online) (Cited 2012 Aug 23). Available from URL: http://www.pacifichealthvoices.org/files/how_to_write.pdf.

18. Hesson-McInnis M. Reporting statistics in APA style: A short guide to handling numbers and statistics in APA format. (Online) (Cited 2012 Aug 23). Available from URL: http://my.ilstu.edu/~mshesso/apa_stats.htm.

19. San Francisco Edit. Effective use of numbers and statistics. (Online) (Cited 2012 Aug 23). Available from URL: http://www.sfedit.net/numbersandstatistics.pdf.

20. Kahn J. Reporting statistics in APA style. (Online) (Cited 2012 Aug 23). Available from URL: http://my.ilstu.edu/~jhkahn/apastats.html.

21. Schriger DL, Cooper RJ. Achieving graphical excellence: Suggestions and methods for creating high-quality visual displays of experimental data. Ann Emerg Med 2001; 37: 75-87.
