Statistics: Data Distribution

Mean, Median, Mode, Range

Mean, median, mode, and range are values that describe data. Mean and median are measures of central tendency and provide a number to represent the central position of data in a set. Mode is also a measure of central tendency; it can be a numerical or nonnumerical value and represents the data item that occurs most often. The range describes the spread of the data by identifying the difference between the highest and lowest data values.

For a given data set, it’s important to know which measure of central tendency best describes the data. Outliers can pull the mean higher or lower, in which case the median or mode may be the better measure.

This data set represents a student’s test scores: 59, 88, 92, 94, 95, 96, 100

For this data set, the median of 94 is the best measure of central tendency. The outlier, 59, skews the mean lower, and no score repeats, so there is no mode.

• mean: about 89.1
• median: 94
• mode: none
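
The values above can be checked with a short Python sketch. It uses the standard library's statistics module; the helper name modes is an assumption made for this illustration, and it returns an empty list when no value repeats, matching the "mode: none" convention used here.

```python
from collections import Counter
from statistics import mean, median

scores = [59, 88, 92, 94, 95, 96, 100]  # test scores from the example above


def modes(values):
    """Return the values that occur most often, or [] when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once, so there is no mode
    return [v for v, c in counts.items() if c == top]


score_mean = mean(scores)                 # 624 / 7, roughly 89.14
score_median = median(scores)             # middle value of the sorted scores
score_modes = modes(scores)               # empty list: no score repeats
score_range = max(scores) - min(scores)   # highest minus lowest

# Dropping the outlier shows how strongly it pulls the mean down.
mean_without_outlier = mean(s for s in scores if s != 59)

print(score_mean, score_median, score_modes, score_range, mean_without_outlier)
```

Without the outlier the mean rises to roughly 94.2, close to the median, which is why the median is the better summary of the full set.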

Quartiles and Box and Whisker Plot

Quartiles divide ordered numerical data into four groups, each containing the same number of values. Five values mark the boundaries: minimum, first quartile, second quartile (median), third quartile, and maximum.

A box and whisker plot is used to graphically represent quartile data.

From a series of basketball games, here are the scores for a team:

• 87 (minimum)
• 92
• 103 (median of lower data)
• 104
• 108
• 108
• 109
• 112 (median of upper data)
• 116
• 118 (maximum)

The box and whisker graph shown here displays the data with a box around the interquartile range, $Q_{3}-Q_{1}$ (here $112-103=9$), and whiskers ending at the minimum and maximum values.
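
As a rough check on the basketball example, the sketch below computes the five values and the interquartile range. It assumes the median-of-halves method implied by the labels in the list (the first quartile is the median of the lower half, the third quartile the median of the upper half); other quartile conventions exist and can give slightly different values. The function name five_number_summary is hypothetical.

```python
from statistics import median

scores = [87, 92, 103, 104, 108, 108, 109, 112, 116, 118]  # team scores from the list above


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # lower half of the data
    upper = data[half + n % 2:]    # upper half (the middle value is skipped when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]


minimum, q1, q2, q3, maximum = five_number_summary(scores)
iqr = q3 - q1  # width of the box in a box and whisker plot

print(minimum, q1, q2, q3, maximum, iqr)
```

With ten scores, the overall median is the average of the fifth and sixth values (108 and 108), and each half holds five scores.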

Standard Deviation

The standard deviation measures how far the data values typically lie from the mean. The symbol for the standard deviation is the Greek letter sigma, $\sigma$.

The standard deviation is the square root of the variance: $\sigma =\sqrt{\text{variance}}$

This data set lists the heights in inches of 5 students: 63, 64, 58, 48, 65. The mean is 59.6.

The variance is the average of the squared differences from the mean, and the standard deviation is its square root. Here is the calculation:

\begin{align} \sigma &=\sqrt{(3.4^{2} +4.4^{2} + (-1.6)^{2} + (-11.6)^{2} +5.4^{2})\div5}\\ \sigma &=\sqrt{(11.56 + 19.36 + 2.56 + 134.56 +29.16)\div5}\\ \sigma &=\sqrt{197.2\div5}\\ \sigma &=\sqrt{39.44}\\ \sigma &\approx 6.28 \end{align}
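
The same calculation can be sketched in Python. Because the worked example divides by the number of values (5), this is the population standard deviation, which corresponds to pstdev in the standard library's statistics module; the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import pstdev

heights = [63, 64, 58, 48, 65]  # heights in inches from the example above

mean_height = sum(heights) / len(heights)                   # 298 / 5 = 59.6
squared_diffs = [(h - mean_height) ** 2 for h in heights]   # squared difference of each value from the mean
variance = sum(squared_diffs) / len(heights)                # average of the squared differences
sigma = sqrt(variance)                                      # standard deviation

print(round(mean_height, 1), round(variance, 2), round(sigma, 2))

# Cross-check against the library's population standard deviation.
library_sigma = pstdev(heights)
```

Note that statistics.stdev divides by n - 1 instead (the sample standard deviation) and would give a larger value for the same data.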

Scatter Plots, Line of Best Fit, and Correlation

Scatter plots are used to determine whether there is a correlation between two sets of data. If the data points cluster around the line of best fit, a correlation is indicated. The line of best fit may pass through some of the points, none of the points, or all of the points graphed on a scatter plot.

When the data points cluster closely around a line of best fit that slants upward, the relationship is a high positive correlation. If they cluster closely around a line that slants downward, the relationship is a high negative correlation. If the data points are only loosely grouped around the line, the correlation is low positive or low negative, and if they show no pattern, there is no correlation at all.

This data is shaped around the line of best fit and slants upwards, so there is a high positive correlation between the two sets of data.
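
One common way to put numbers on these ideas, not described in the text above, is a least-squares line of best fit together with Pearson's correlation coefficient r, which runs from -1 (high negative) through 0 (no correlation) to 1 (high positive). The sketch below makes that assumption; the data points are hypothetical values invented for illustration, and the function name best_fit_and_r is likewise hypothetical.

```python
from math import sqrt

# Hypothetical paired data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def best_fit_and_r(xs, ys):
    """Return the least-squares slope and intercept, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    slope = sxy / sxx                    # m in y = mx + b
    intercept = mean_y - slope * mean_x  # b in y = mx + b
    r = sxy / sqrt(sxx * syy)            # sign matches the slant of the line
    return slope, intercept, r


slope, intercept, r = best_fit_and_r(x, y)
print(slope, intercept, r)
```

For these made-up points the line slants upward and r is positive, so the correlation is positive; the closer r is to 1, the more tightly the points cluster around the line.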

Algebra equations have recognizable graphs. Learning to match equations to graphs can help you save time when completing algebra problems.

The graph of a linear equation is a straight line. $y=mx +b$

The graph of a quadratic equation is in the shape of a parabola. $y= ax^{2} + bx +c$

The graph of an exponential function is asymptotic to the x-axis. The graph gets close to the x-axis but does not touch it. $y=b^{x}$
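
A few sample values make the three shapes easier to recognize. In this sketch the coefficients (m = 2, b = 1, a = 1, base 2) are arbitrary choices for illustration, not values taken from the text.

```python
def linear(x, m=2, b=1):
    """y = mx + b: equal steps in x give equal steps in y."""
    return m * x + b


def quadratic(x, a=1, b=0, c=0):
    """y = ax^2 + bx + c: with b = 0 the parabola is symmetric about the y-axis."""
    return a * x ** 2 + b * x + c


def exponential(x, base=2):
    """y = b^x: positive for every x when the base is positive."""
    return base ** x


xs = list(range(-10, 1))                     # x = -10, -9, ..., 0
exp_values = [exponential(x) for x in xs]    # shrinks toward 0 as x decreases, never reaching it

for x in (-10, -5, 0, 5):
    print(x, linear(x), quadratic(x), exponential(x))
```

The exponential values get very small for negative x but stay above zero, which is the asymptotic behavior described above.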
