Create a Brief Linear Regression Model in Excel

Excel does a nice job with statistics, or they have a third party write their Add-On, which is available for free. In this article, you'll learn how to project a trend using Excel and Linear Regression Analysis.

Steps

The Tutorial

  1. Open a new workbook in Excel and make 3 worksheets: Data, Chart, and Saves. Save the workbook as Linear Regression - Brief Lesson, or something similar, into a logical file folder.
  2. Set Preferences: Open Preferences in the Excel menu. Recommended Settings: Set General to R1C1 Off and Show the 10 Most Recent Documents; Edit - set all the Top options to checked except Automatically Convert Date System. Display number of decimal places = blank (for integers preferred), Preserve display of dates and set 30 for 21st century cutoff; View - show Formula Bar and Status Bar, hover for comments and all of Objects, Show gridlines and all boxes below that auto or checked; Chart - show chart names and data markers on hover. Leave rest unchecked for now; Calculation -- Automatically and calc before save, max change .000,000,000,000,01 w/o commas as goal-seeking is done a lot and save external link values and use 1904 system; Error checking - check all; Save - save preview picture with new files and Save Autorecover after 5 minutes; Ribbon -- all checked except Hide group titles and Developer.
  3. Select between the 1 and the A in the top left worksheet corner to select the entire sheet and do Format Cells Alignment Horizontal Center and Font Size 9 or 10, or whatever you're comfortable with viewing.
  4. Enter the column headers in row 1: B1: X; C1: Y; D1: X. E1: Y F1: Y. Enter the column headers of rows 2: A2: Student; B2 SAT; C2: GPA; D2: VARIANCE; E2: VARIANCE; F2: TREND, G2: Change.
  5. Edit Go To cell range A3: A22, enter 1, and do Edit Fill Series Columns Linear Step Value 1 OK, There are 20 students who want to know the Trend of their Dependent GPAs given the Independent Variable of the SAT Exam score when entering a new college, i.e. are their grades likely to rise, fall or remain about the same?
  6. Enter the data to answer this question. In {x,y} pairs as shown in the picture below Step 22, or as follows: for student 1, {X,Y} = {935 for X or SAT, 2.2 for Y or GPA}; student 2 {1260,3,1}; then {1105,2.6}; {1320,3.3}; {1450,3.8}; {960,2.2; {1360,3.2}; {900,2.2}; {1020,2.2}; {1380,3.6}; {940,2.3}; {1190,2.8}; {1000,2.2}; {945,2.3}; {990,2.4}; {1000,2.2}; {1040,2.3}; {1570,3.9}; {1530,3.8}; {980,2.4}.
  7. Select cell A23 and enter MEAN. Input the formula w/o quotes into cell B23 "=AVERAGE(B3:B22)" and copy it and paste it to cell C23. Select cell C23 and Insert Name Define name Y_Bar for cell $C$23. Select cell B23 and Insert Name Define name X_Bar for cell $B$23. Select cell range B23:C23 and Format Cells Font color Red and Bold.
  8. Edit Go To cell range D3:D22 and with cell D3 the active and high-lighted cell, enter w/o quotes the formula "=(B3-X_BAR)^2" and Edit Fill Down. Edit Go To cell range E3:E22 and with cell E3 the active and high-lighted cell, enter w/o quotes the formula "=(C3-Y_BAR)^2" and Edit Fill Down. Select cell range D3:D23 and do Format Cells Number Number Decimal places 4.
  9. Select cell D23 and enter w/o quotes the formula "=SUM(D3:D22)/(20-1)" and copy it and paste it to cell E23. Select cell range E3:E23 and do Format Cells Number Number Decimal places 6. Select cell A24 and enter VARIANCE and select cell B24 and input the formula "=D23" and copy it and paste it to cell C24. Select cell range B24:D25 and do Format Cells Number Number Decimal places 6.
  10. Select cell A25 and enter STD DEV. (for Standard Deviation) and select cell B25 and enter w/o quotes the formula "=STDEVPA(B3:B22)" and copy and paste it to cell C25. Do Format Cells Number Number Decimal Places 7 for selected cell range B25:C25.
  11. Select cell A27 and enter FORECAST and select cell B27 and enter w/o quotes the formula "=STDEVPA(B3:B22)" which is an array formula so you must press CONTROL+ SHIFT+ENTER to properly enter the array formula with brackets. That is the projection of one student's GPA given an SAT of 1290 and the rest of the population or sample data. S/he may expect a GPA of 3.170409192 once you Format Cells Number Number Decimal Places 9 (which is matched later on, exactly). Not that such an accurate GPA is necessary, but the point is to prove the formula, and that requires some accuracy.
  12. Edit Go To cell range F3:F22 and input the formula w/o quotes "=TREND(C3:C22,B3:B22,,TRUE)" which is an array formula so you must press CONTROL+ SHIFT+ENTER to properly enter the array formula with brackets.That is the projection of the Student Population's GPA's given their prior performance.
  13. Edit Go To cell range G3:G22 and with G3 the active high-lighted cell, enter the formula w/o quotes "=F3-C3" and Edit Fill Down. Do Format Cells Number Number Custom +0.0;-0.0;+0.0.

Explanatory Charts, Diagrams, Photos

  1. Create the chart (dependent on the tutorial data above). Select cell range B3:C23 and go to the Ribbon (or do Chart Wizard) and select Charts, All, scroll down to Scatter, Marked Scatter.. Then Edit cut or copy the new chart to the Charts worksheet. Do Chart Layout Chart Title - Title Above Chart and fill it in with "Line Regression - SATs vs. GPAs" (without quotes). Select Gridlines Vertical Gridlines Major Gridlines checked. Select Axis Titles Horizontal Axis Title, Title Title Below Axis and edit in "SAT Score" (without quotes). Select Axis Titles Vertical Axis Title, Horizontal Title and edit in "G.P.A." (without quotes). By convention, the Independent x variable goes on the bottom horizontal axis and the Dependent y variable goes on the left vertical axis.
  2. Locate the data marker for the Mean at {1143.75,2.755} which will read "Series 1 Point 1143.75" (1143.75, 2.8) when you're hovering over it or have clicked on it. Click on it and do menu Format, Data Point Marker Style Automatic Size 9, then Marker Fill Color Red.
  3. Return to the Data worksheet to figure out the Regression Line of Y' = mX + b, where m = the slope and b = the y-intercept. Looking at the chart data with low values below 10 on the left and values around 1000 on the bottom, one would expect a very slight decimal slope and a y-intercept close to 0. Scaled charts can be deceiving as to slope sometimes.
  4. Select cell G1 and enter Y. Select cell H1 and enter m and copy H1 and paste it to I1 via command c, select I1, and command v. Select cell H2 and enter Numerator and select cell I2 and enter Denominator. Select cell range H3:H22 and with H3 as the active highlighted cell, enter w/o quotes the formula "=(B3-X_BAR)*(C3-Y_BAR)". Edit Fill Down. Select cell range I3:I22 and with I3 as the active highlighted cell, enter w/o quotes the formula "=(B3-X_BAR)^2". Edit Fill Down. Select columns H and I and Format Cells Number Number Decimal Places 1.
  5. Select cell H23 and enter the formula w/o quotes "=SUM(H3:H22)" and Format Cells Border Black bold Outline and copy it to I23.
  6. Select cell H24 and enter m and do Format Cells Font Color Red. Copy it to cell H25 and enter b in cell H25. Select cell I24 and enter w/o quotes the formula "=H23/I23", which is the slope m, and then select cell I25 and enter the formula w/o quotes "=Y_BAR-I24*X_BAR", that is, the y-intercept b = Y_Mean-m*X_Mean.
  7. Copy cell H25 and paste it to cell I26 and enter into I26 Y' - mX + b. Select cell H27 and enter Forecast, then enter the formula into I27 w/o quotes "=I24*1290+I25". Your answer should exactly equal the FORECAST answer in B27 once you have Format Cells Number Number Decimal Places 9.
  8. Edit copy cell I26 to cell range H29:I29. Into cell H29 enter Line X and into cell I29 enter Line Y. Enter 800 into cell H30 and 1600 into cell H31. Select cell I 27 and copy its formula up in the formula bar -- do not copy the cell and paste it -- that won't work out right. Select cell I30 and paste into the editing Formula Bar the formula you just copied. Do the same for cell I31. Edit cell I31's formula to read "=I24*H31+I25" and press enter and then edit cell I30's formula in the formula bar to read "=I24*H30+I25" and press enter . I30'S result should be 1.7786108729206 and I31's result should be 4.05093465957812 (which I realize is higher than a 4.0 GPA, but we're creating a regression line, so that doesn't matter so much).
  9. Activate the Chart worksheet and click in the chart and in the menu, do Chart Add Data and in response to the range query, go back to the Data worksheet and select cell range H30:I31. Now mine comes out wrong and I have to edit the series. Not a big deal. Select the data marker and in the formula bar, edit the series to read "=SERIES(,Sheet1!$H$30:$H$31,Sheet1!$I$30:$I$31,2)" and click on the data marker at {800,1.78} and make Line red and Weight 0.75 pt, then Marker Fill Color Red, then Marker Style Round Dot size 5.

Helpful Guidance

  1. Make use of helper articles when proceeding through this tutorial:
    • See the article How to Create a Spirallic Spin Particle Path or Necklace Form or Spherical Border for a list of articles related to Excel, Geometric and/or Trigonometric Art, Charting/Diagramming and Algebraic Formulation.
    • For more art charts and graphs, you might also want to click on Microsoft Excel Imagery, Mathematics, Spreadsheets or Graphics to view many Excel worksheets and charts where Trigonometry, Geometry and Calculus have been turned into Art, or simply click on the category as appears in the upper right white portion of this page, or at the bottom left of the page.

Warnings

  • ERRORS: If you have errors or error values, either the sheet in incomplete and needs further input or Lookup Tables for critical variables or perhaps you've made a mistake somewhere along the line. If the instructions have been completed and there are still errors, select the cell that has the error value that is furthest left and topmost first. Look for a typo in a formula or unmatched parentheses. Possibly, a Defined Name is wrong -- they need to be input into the formulas exactly as they were defined. Do Insert Name Define to check. If you have DIV/0!, I do not, so look for a variable that somehow did not get filled in with a value perhaps. At any rate, what you want to do is select the cell with the error, and after checking all those typical errors, do Tools Auditing Trace Precedents and/or Trace Error. If fixing all the topmost leftmost errors does not fix the rest of your errors on your worksheet, you may need to do it the hard way, from the bottom right upwards then leftwards; that is the slow but sure way to fix all errors.
  • Also, errors in your chart data will most likely plot as zeroes. This may be acceptable or desirable even. However, if too many lines (or curves) are returning to 0, it may indicate a logical flaw in the data -- or too many tiny values and then perhaps rescaling the chart is needed by inspecting the horizontal and vertical axes and changing them to zero in on the problem. Hover over or click on a data marker on the series plot and then do a search in the proper column by value for that value, and identify its precedents.

Things You'll Need

  • Microsoft Excel. The above was done with MicroSoft® Excel® for Mac 2011, v 14.3.1 and other versions are compatible, except for Reflection, Shadow, Glow and other more recent Effects additions.

Related Articles

Sources and Citations

  • "Statistical Analysis with Excel for DUMMIES", Joseph Schmuller, 2005, Wiley Publishing Inc., Hoboken, NJ, ISBN 0-7645-7594-5, LOC Card #2005920688, [pps. 219-252 (Ch. 14)]
  • The source file for this article is "Linear Regression -Brief Lesson.xlsx".