Selecting An Appropriate Statistical Test

Background and Motive

Statistical tests are often part of managerial business decisions. In other words, behind every successful managerial decision lies data analysis. Whether the decision is to open a retail banking operation in a new geographic location or to extend today's internet banking facility, it is no exception. To arrive at a sound decision, management often relies on a variety of business research methods.

How to determine which test is needed

The appropriate statistical test for any experiment depends mostly on the nature of the independent and dependent variables to be analyzed. For the purpose of choosing a statistical test, variables can be classified as Categorical or Continuous. The values of a categorical variable (e.g. Male-Female, Asian-American, Windows-Macintosh) cannot be sequentially ordered or differentiated from each other using any mathematical method.

Continuous variables are numeric values (e.g. the weight of a Coca-Cola bottle, the number of characters in a webpage) that can be ordered sequentially and that do not naturally fall into discrete ranges.

The model is straightforward: it shows how the nature of the independent and dependent variables drives the choice of a statistical test.

With an understanding of the basic model for choosing a test, we can add relevant details to it. First, we will address two additional types of variables: ordinal and interval.

Ordinal variables are similar to continuous variables in that they can be ordered sequentially. They are also similar to categorical variables because they cannot be distinguished from each other using a mathematical method. As an example, various levels of educational achievement (high school, some college, undergraduate degree, etc.) can be sequenced in the order in which they are achieved, but when defined this way they cannot be differentiated from each other mathematically. So the question is: using the simple model for choosing a statistical test, is an ordinal variable Categorical or Continuous? It depends entirely on how the researcher defines the variable. When education levels are defined as high school, some college, undergraduate degree, etc., the levels are categorical, and we should choose a test for categorical data. We can, however, define education level in a slightly different way. If we instead define education level as years of full-time education, then the variable takes on the characteristics of a Continuous variable, and we should choose a statistical test for a Continuous variable.
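The two ways of defining education level can be sketched in a few lines of Python. The survey responses and the mapping from level to years of schooling below are illustrative assumptions, not data from this article.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses (illustrative only).
responses = [
    "high school",
    "undergraduate degree",
    "some college",
    "high school",
    "undergraduate degree",
]

# Categorical view: the levels are labels, so the natural summary is a count.
level_counts = Counter(responses)

# Continuous view: an assumed mapping from each level to years of
# full-time education turns the same answers into numbers.
YEARS_OF_EDUCATION = {
    "high school": 12,
    "some college": 14,
    "undergraduate degree": 16,
}
years = [YEARS_OF_EDUCATION[r] for r in responses]
mean_years = mean(years)

print(level_counts)
print(mean_years)
```

The counts would feed a test for categorical data, while the list of years could be used with a test for continuous data.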

Interval variables also exhibit characteristics of both Categorical and Continuous variables. Interval variables fall into equally spaced ranges. Suppose we collect annual salary levels using the following ranges:

  • $30 K – $60 K

  • $60 K – $90 K

  • $90 K – $120 K

  • $120 K – $150 K

  • $150 K – $180 K, etc.

These values can be sequenced numerically, so they are similar to Continuous variables. Because the ranges are equally spaced, though, an unnatural restriction is placed on the values. Hence they are also similar to Categorical values. When it comes to choosing a statistical test, there is no hard and fast rule for defining interval data as Categorical or Continuous, and we can use our discretion in making the choice. The granularity of the ranges is a reasonable guide for deciding how to define the data. For example, when the intervals are fine-grained, we may decide to define the variable as Continuous, and when they are coarse, as Categorical.
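As a rough illustration of how granularity changes the character of interval data, the sketch below bins the same salaries with a coarse and a fine band width. The salary figures, the helper name `salary_band`, and the choice to treat each band's lower bound as inclusive are all assumptions made for the example.

```python
from collections import Counter

def salary_band(salary, width, start=30_000):
    """Return the (low, high) band of the given width that contains salary.

    Assumption: the lower bound is inclusive and the upper bound exclusive,
    so a salary of exactly 60,000 falls in the 60K-90K band.
    """
    index = (salary - start) // width
    low = start + index * width
    return (low, low + width)

# Hypothetical annual salaries in dollars.
salaries = [42_000, 58_500, 61_000, 95_250, 118_000, 151_000]

# Coarse bands ($30K wide) group several people together: category-like.
coarse = Counter(salary_band(s, 30_000) for s in salaries)

# Fine bands ($1K wide) put almost everyone in a separate band: close to
# the raw numeric values, so treating the data as continuous is reasonable.
fine = Counter(salary_band(s, 1_000) for s in salaries)

print(len(coarse), "coarse bands used")
print(len(fine), "fine bands used")
```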

Number of variables

The number of independent and dependent variables in the experiment also affects which statistical test to choose. For example, simple linear regression applies when the researcher relates 1 continuous dependent variable to 1 continuous independent variable. Multiple regression applies when the researcher relates 2 or more continuous independent variables to 1 continuous dependent variable.
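For the single-predictor case, a least-squares fit can be sketched with the standard library alone. The function name and the data (years of experience against salary in $K) are hypothetical. Multiple regression needs a matrix solve and is usually delegated to a numerical library, so it is not shown here.

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical data: 1 continuous independent variable (years of
# experience) and 1 continuous dependent variable (salary in $K).
experience = [1, 2, 3, 4, 5]
salary_k = [35, 40, 45, 50, 55]

slope, intercept = simple_linear_regression(experience, salary_k)
print(slope, intercept)
```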

The number of levels of a categorical variable can also drive which statistical test to use. For example, suppose we want to test whether gender affects the amount of time needed to perform a task using a given user interface. Gender serves as a 2-level categorical independent variable, since it has 2 possible values (male and female). The time to complete the task, such as coding a responsive web page, serves as the continuous dependent variable. In this example, a 2-sample t-test would be the most appropriate statistical test. If the categorical independent variable has more than 2 values, however, one-way ANOVA should be applied.

Similarly, when both the independent and dependent variables are categorical, a Chi-Square test is appropriate. An example is testing whether there is any relationship between gender and the outcome “Voted” or “Did not Vote” in U.S. general elections.
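A minimal sketch of the Pearson chi-square statistic for a contingency table follows. The counts are invented for illustration, the function name is hypothetical, and no continuity correction is applied. As before, only the statistic and degrees of freedom are computed, not a p-value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical counts: rows are the two genders,
# columns are ("Voted", "Did not Vote").
observed_counts = [
    [30, 20],
    [20, 30],
]

chi2, dof = chi_square_statistic(observed_counts)
print(chi2, dof)
```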

When these concepts are combined, they form a simple model for selecting the correct statistical test. This is summarized in tabular format: