Calculate the Product-Moment Correlation Coefficient

The product-moment correlation coefficient measures the strength of the linear relationship between two variables (referred to as x and y). Suppose, for example, that you own a restaurant. For every 10th customer you record the time they stayed in your restaurant (x, in minutes) and the amount they spent (y, in dollars). Is it generally true that customers who stay longer also spend more? That would be a positive correlation. Or is it the other way around, e.g., the more a customer spends, the less time they take for lunch? That would be a negative correlation. To shed some light on this question, you can calculate the product-moment correlation coefficient, r, also known as Pearson's correlation coefficient.

Note: The equations below come from linear least-squares regression, which fits a straight line to the set of data pairs.

Steps

  1. Remove incomplete pairs. In the next steps, use only the observations where both x and y are known. However, do not exclude an observation just because one of its values equals zero.
  2. Summarize the data into the values needed for the calculation.
    • n - the number of data pairs.
    • Σ(x²) - the sum of the squares of the x values.
    • Σx - the sum of all the x values.
    • Σ(x*y) - the sum of each x value multiplied by its corresponding y value.
    • Σy - the sum of all the y values.
    • Σ(y²) - the sum of the squares of the y values.
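  3. Calculate ssxy, ssxx and ssyy using these values. The worked numbers are for an example data set with n=5, Σx=12, Σy=93, Σ(x*y)=283, Σ(x²)=40 and Σ(y²)=2089.
    • ssxy=Σ(x*y)-(Σx*Σy÷n)=283-(12*93÷5)=59.8
    • ssxx=Σ(x²)-(Σx*Σx÷n)=40-(12*12÷5)=11.2
    • ssyy=Σ(y²)-(Σy*Σy÷n)=2089-(93*93÷5)=359.2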
  4. Insert these values into the equation for r, the product-moment correlation coefficient. The result always lies between -1 and 1, inclusive.
    r=ssxy÷√(ssxx*ssyy)=59.8÷√(11.2*359.2)=0.9428
    • A value close to 1 implies strong positive correlation. (The higher the x, the higher the y).
    • A value close to 0 implies little or no correlation.
    • A value close to -1 implies strong negative correlation. (The higher the x, the lower the y).
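
The four steps above can be sketched in Python roughly as follows. This is only an illustration: the function name pearson_r and the use of None to mark a missing value are assumptions, and the sample pairs are hypothetical data chosen so that the complete pairs reproduce the sums used in the worked example (the data behind those sums is not given here).

```python
from math import sqrt

def pearson_r(pairs):
    """Return (r, ssxy, ssxx, ssyy) for a list of (x, y) pairs."""
    # Step 1: keep only complete pairs. None marks a missing value (an
    # assumption of this sketch); a value of 0 is real data and is kept.
    data = [(x, y) for x, y in pairs if x is not None and y is not None]

    # Step 2: summarize the data.
    n = len(data)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    sum_x = sum(x for x, _ in data)
    sum_y = sum(y for _, y in data)
    sum_xy = sum(x * y for x, y in data)
    sum_x2 = sum(x * x for x, _ in data)
    sum_y2 = sum(y * y for _, y in data)

    # Step 3: the three intermediate values.
    ssxy = sum_xy - sum_x * sum_y / n
    ssxx = sum_x2 - sum_x * sum_x / n
    ssyy = sum_y2 - sum_y * sum_y / n
    if ssxx == 0 or ssyy == 0:
        raise ValueError("r is undefined when all x or all y values are equal")

    # Step 4: the correlation coefficient.
    r = ssxy / sqrt(ssxx * ssyy)
    return r, ssxy, ssxx, ssyy

# Hypothetical observations; two of them are incomplete and get dropped.
pairs = [(1, 10), (1, 14), (2, 12), (None, 20), (3, 25), (5, 32), (4, None)]
r, ssxy, ssxx, ssyy = pearson_r(pairs)
print(round(ssxy, 1), round(ssxx, 1), round(ssyy, 1), round(r, 4))
# prints: 59.8 11.2 359.2 0.9428
```

The guards for fewer than two pairs and for zero spread are additions of this sketch; without them the divisions in steps 3 and 4 would fail.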



Tips

  • Always make a scatter plot. The product-moment correlation coefficient only measures how well a straight line describes the data, so it can be close to 0 even when y depends strongly on x along a curve. Without a plot you may miss that discovery.
  • There is often a reason why many questionnaires repeat the same questions, even though this makes them tedious to answer. The researchers may already know a lot about question x and question y individually, but not yet how the two are related or correlated.
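
The scatter-plot tip can be illustrated with a small Python sketch. The data here is made up for the purpose: y is exactly x squared, so y is completely determined by x, yet the relationship is a curve and r comes out as 0. The helper name correlation is an assumption of this sketch.

```python
from math import sqrt

def correlation(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    ssxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ssxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ssyy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return ssxy / sqrt(ssxx * ssyy)

# Made-up data: a perfect curve (parabola) that is symmetric around x = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

print(correlation(xs, ys))  # prints: 0.0
```

A scatter plot of these points would show the parabola immediately, while the coefficient alone suggests there is no relationship at all.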

Warnings

  • Even when the correlation is significant, you have not demonstrated causality, i.e., that one variable "causes" the other. You have only shown that knowing the value of x may help to some degree in predicting the value of y, and/or the other way around.
  • Before you state that two variables are correlated, make sure the correlation coefficient is statistically significant, that is, that the calculated value is unlikely to be the result of pure chance. For example, any two points always lie exactly on a straight line, which gives a coefficient of +1 or -1, yet two observations prove nothing. (When the coefficient is not significant, there is generally no point in reporting its value.)
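
The article does not say how to check significance. One common approach, shown here only as a hedged sketch, is a t-test: compute t = r × √((n-2) ÷ (1-r²)) and compare it with the critical value for n-2 degrees of freedom from a t table. This assumes the data are roughly normally distributed. The function name t_statistic is hypothetical, and 3.182 is the standard two-tailed 5% critical value for 3 degrees of freedom.

```python
from math import sqrt, copysign, inf

def t_statistic(r, n):
    """t value for testing whether a correlation r from n pairs differs from 0."""
    if n < 3:
        # With only 2 pairs there are 0 degrees of freedom, so no test is possible.
        raise ValueError("need at least 3 pairs to test significance")
    if abs(r) >= 1:
        # A perfect straight line: the formula would divide by zero.
        return copysign(inf, r)
    return r * sqrt((n - 2) / (1 - r * r))

# Worked example from the steps: r = 0.9428 from n = 5 pairs.
t = t_statistic(0.9428, 5)
T_CRITICAL = 3.182  # two-tailed, 5% level, 3 degrees of freedom (from a t table)

# t is roughly 4.9, which is above 3.182, so r is significant at the 5% level.
print(round(t, 2), abs(t) > T_CRITICAL)
```

For other sample sizes, look up the critical value for n-2 degrees of freedom instead of reusing 3.182.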
