Basics on the topic Scatter plots
Scatter Plot Definition
A scatter plot is a graphical way of presenting a set of data consisting of ordered pairs, usually using Cartesian coordinates.
A scatter plot is drawn by plotting all the ordered pairs in the given set of data. The main objective is to show the relationship between the variables. If the points on the graph are grouped together, we can say that there is a strong correlation between the two variables.
- The correlation is positive when both variables increase together.
- It is negative when one variable decreases as the other increases.
When points are grouped together, we can draw a trend line, also called the line of best fit. From the line of best fit, we can predict the expected value of one variable given a value of the other. Learn how to make and analyze scatter plots by helping Billy Fakespeare the Ghost plan his 400th birthday party.
Common Core Reference
Scatter Plot Exercises: Let's practice
Understanding scatter plots
A scatter plot is a type of data visualization that uses dots to represent the values obtained for two different variables - one plotted along the x-axis and the other plotted along the y-axis.
Scatter plots are used to determine the relationship between two variables. They can show whether the variables are correlated and the strength and direction of the correlation.
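The strength and direction of a correlation can also be expressed as a single number. The sketch below is a minimal illustration, not part of the lesson itself: it computes the Pearson correlation coefficient $r$, a standard measure that the text above does not name. The function name `pearson_r` and all data values are hypothetical. A value of $r$ near $1$ suggests a strong positive correlation, a value near $-1$ a strong negative one, and a value near $0$ little or no linear correlation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists.

    Returns None when the coefficient is undefined: fewer than two points,
    mismatched lengths, or one variable that never changes.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant variable has no defined correlation
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration only
hours_studied = [1, 2, 3, 4, 5, 6]
test_scores = [52, 58, 65, 70, 78, 83]
hours_tv = [1, 2, 3, 4, 5, 6]
scores_tv = [85, 80, 72, 69, 60, 55]

r_study = pearson_r(hours_studied, test_scores)
r_tv = pearson_r(hours_tv, scores_tv)
print(f"study hours vs. scores: r = {r_study:.2f}")  # positive
print(f"TV hours vs. scores:    r = {r_tv:.2f}")     # negative
```

The sign of $r$ gives the direction of the correlation and its distance from zero gives the strength. Where to draw the line between "weak" and "strong" is a judgment call that depends on the context.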
Plotting Points on Scatter plots
The scatter plot will show a positive correlation.
The scatter plot will show a negative correlation.
Interpreting Scatter Plots
The more hours students study, the higher their test scores tend to be.
The more hours students spend watching TV, the lower their test scores tend to be.
Drawing Lines of Best Fit
The line of best fit is a straight line that best represents the data on a scatter plot. It shows the trend in the data.
It will slope upwards from left to right.
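One common way to make "best represents the data" precise is the least-squares method, which picks the line that minimizes the sum of the squared vertical distances from the points to the line. The lesson does not say which method it has in mind, so treat the following as a sketch under that assumption. The function name `best_fit_line` and the data are hypothetical.

```python
def best_fit_line(points):
    """Return (m, b) of the least-squares line y = m*x + b for (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        # All x-values are equal: the line would be vertical, slope undefined
        raise ValueError("all x-values are equal; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # intercept: the line passes through the mean point
    return m, b


# Hypothetical (hours studied, test score in %) pairs, invented for illustration
data = [(1, 52), (2, 61), (3, 64), (4, 71), (5, 77)]
m, b = best_fit_line(data)
print(f"y = {m:.1f}x + {b:.1f}")
```

For this made-up data the slope is positive, so the line slopes upwards from left to right, matching a positive correlation.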
Using Scatter Plots for Predictions
Based on the trend, a student who studies for 6 hours might score higher than 80%.
The 10-year-old car will likely have a lower resale value than the 5-year-old car.
Challenges with Scatter Plots
Just because two variables have a correlation doesn't mean that changes in one variable cause changes in the other. There might be other factors at play.
With a weak correlation, predictions may not be very accurate because the relationship between the variables isn't strong.
Application of Scatter Plots
The shop owner might decide to stock up on more ice cream during days with more sunlight, expecting higher sales.
The store owner might expect an increase in sales of winter coats.
Transcript Scatter plots
Poor Billy Fakespeare the Ghost - his Medieval Party was a bust. Hardly any ghost guests showed up. But, to celebrate his 400th birthday, he’s determined to have a big luau-themed shindig with lots and lots of guests. To plan the perfect party, he uses scatter plots.
On a Cartesian plane, scatter plots are used to show the relation between variables to identify trends. Take a look at this scatter plot – it shows the relation of the popularity of a DJ to the number of guests attending a party. For example, a DJ with a 50 percent popularity rating had 200 guests in attendance and a DJ with a popularity rating of 80 percent had 350 guests. The graph indicates a trend: The more popular the DJ, the greater the attendance at the party. Notice the points on the graph are grouped together - this indicates a high correlation.
And since both variables increase together, the correlation is positive. When points are grouped together, you can draw a 'trend line', also known as the 'line of best fit'. By using any two points that lie on or near the line, you can calculate the slope of the line and then use the slope and one of the known points to write an equation for the trend line. For this line, using a slope equal to 5 and the ordered pair (50, 200), we can figure out the equation of the line: $y=5x-50$. You can also use the trend line to predict unknown values for 'x' and 'y'. For 'x' equal to 20, we can determine that 'y' equal to 50 is a better prediction than 'y' equal to 300.
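The two-point method described in this paragraph can be sketched in a few lines of Python. The points $(50, 200)$ and $(80, 350)$ come from the DJ example. The function name `line_through` is a hypothetical helper introduced only for this illustration.

```python
def line_through(p1, p2):
    """Return slope m and y-intercept b of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("vertical line: slope is undefined")
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # solve y1 = m*x1 + b for b
    return m, b


# DJ popularity (%) and number of guests, taken from the example above
m, b = line_through((50, 200), (80, 350))

# Predict attendance for a DJ with a 20% popularity rating
predicted_guests = m * 20 + b
print(m, b, predicted_guests)  # 5.0 -50.0 50.0
```

The prediction of 50 guests for a 20 percent popularity rating agrees with the value read off the trend line.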
Fakespeare thinks he’s got the entertainment for the party all figured out. He invites DJ Mozart to rock the house, but he wonders, is music enough? What about games? He does some research. Take a look at the table. Is there a trend between the number of silly party games and party attendance? Let’s design a scatter plot. For the x-axis, list the number of games, and for the y-axis, list the attendance. Now, plot the ordered pairs. Hmmm, the points are grouped together, so the data is highly correlated, but as the number of games increases, the number of guests decreases and this indicates a negative correlation.
When there is a negative correlation, as one variable increases, the other decreases. You don’t need to be a genius to figure out that party games are a terrible idea, so Fakespeare decides, there will be no party games. What about refreshments? Will having tropical drink umbrellas make people want to come to the party? Let’s take a look at the scatter plot and see if there's a trend. The points on the graph are very spread out, so there is no correlation and no trend. Tropical drink umbrellas might not increase attendance, but they won’t have an adverse effect either, so Fakespeare orders a case just because he likes them. It seems as though Fakespeare has got everything under control, but do you? Let’s make sure you are good to go with scatter plots.
When the data is spread out with no pattern, that means there is little to no correlation and no trend. Although this scatter plot shows the points grouped together, there is no trend. If the line of best fit is horizontal, that means that what we measure on the x-axis has no influence on what we're measuring on the y-axis. What if the line of best fit is vertical? Since the slope of a vertical line is undefined, there is no correlation and no trend. One last note: If there is a correlation, don’t automatically jump to the conclusion that there is also a trend. You will need to use common sense because sometimes a correlation is not causation – meaning, one thing does not necessarily cause the other. Take a look at this example. Based on the trend line you might think the house number and party attendance are related, but that’s coincidence, not a trend. When interpreting trends, remember to use common sense. Fakespeare’s party is a huge success! Too bad, though: none of the photos that were snapped lasted very long. Maybe they're on to something?
Scatter plots exercise
Summarize your knowledge about scatter plots.
Hints
We examine if the value of $x$ has an impact on the value of $y$.
A line is given by the equation $y=mx+b$, where $m$ is the slope and $b$ is the $y$-intercept.
A line with a positive slope is increasing. This one has a negative slope and is decreasing.
Solution
On a coordinate plane, scatter plots are used to show relationships between variables in order to recognize trends.
Take the first scatter plot as an example: It shows the effect of a DJ's popularity rating on the number of guests attending a party. A DJ with a 50% popularity rating has 200 guests in attendance, and one with an 80% rating has 350 guests.
So we can assume a trend (correlation): The higher the DJ's popularity rating, the higher the number of guests.
The points on the graph are grouped closely together. This indicates a high correlation. In this case the correlation is positive.
So you can draw a trend line, also called the line of best fit.
For the line of best fit you can calculate the slope as well as the $y$-intercept using two given points on the line.
Interpret the different scatter plots.
Hints
Is there a line that fits the given data? If so, this line is given by the equation $y=mx+b$, where $m$ is the slope.
An increasing line of best fit has a positive slope and thus a positive correlation.
If the data isn't grouped at all there is no correlation.
If the data doesn't change depending on $x$, that is, if the line of best fit is parallel to the $x$-axis, there is no correlation.
Solution
Let's consider the diagrams from left to right:
- When the data is spread out with no pattern, we can conclude that there is no correlation and no trend.
- But even if the data is grouped together, we can't automatically conclude that there is a correlation. If the line of best fit is horizontal, what we measure on the $x$-axis has no influence on what we're measuring on the $y$-axis. Therefore, no correlation exists.
- If the line of best fit is a vertical line, the slope is undefined. Thus, we have no correlation and no trend.
- An increase in grouped data from left to right represents a positive correlation.
- A decrease in grouped data from left to right represents a negative correlation.
Draw a scatter plot.
Hints
Pay attention to the labeling of the $x$-axis and the $y$-axis.
If you want to draw the point $(220,190)$, draw a line parallel to the $x$-axis passing through $y=190$ and one parallel to the $y$-axis passing through $x=220$. The intersection of those lines is the desired point.
The age is represented by $x$, while the number of friends is represented by $y$.
Solution
Here you see the resulting scatter plot. For each age ($x$), a number of friends ($y$) is given. This gives seven ordered pairs in total, which you can see in this diagram from left to right:
Let's have a look at $(280,140)$:
- Draw a line parallel to the $x$-axis passing through $y=140$.
- Draw a line parallel to the $y$-axis passing through $x=280$.
- The intersection of those lines is the desired point.
Interpret the given scatter plot.
Hints
An increasing line of best fit stands for a positive correlation.
The $x$-axis represents the amount of effort needed and the $y$-axis represents the amount of fun had.
Solution
Let's pick some pets:
With turtles, the effort they take isn't so much... however, the resulting amount of fun isn't too high either.
With cats and dogs, perhaps the most beloved pets, the effort for a cat is a little bit less than the effort for a dog. According to this diagram, the fun is also a little bit less for a cat than for a dog. But perhaps cat lovers wouldn't agree.
The pets that take the most effort are horses, and they are also the animals that are the most fun.
We can conclude that the data seems to be grouped and that the line of best fit is increasing, so we have a positive correlation: the higher the effort, the higher the fun, and vice versa.
Determine the slope-intercept form of the line of best fit.
Hints
Use this formula to find the slope: $m=\frac{y_2-y_1}{x_2-x_1}$.
Use the slope-intercept form of a line ($y=mx+b$) to find the $b$ term by plugging in either point as $x$ and $y$.
"DJ with a 50% popularity rating has 200 guests in attendance" can be represented by the ordered pair $(50, 200)$.
"DJ with 80% popularity leads to 350 guests" can be represented with the ordered pair $(80, 350)$.
The point $(50, 200)$ gives us $x_1 = 50$ and $y_1 = 200$.
The point $(80, 350)$ gives us $x_2 = 80$ and $y_2 = 350$.
Solution
Any linear equation can be expressed in slope-intercept form as $y=mx+b$.
- We first determine the slope $m$ by the formula $m=\frac{y_2-y_1}{x_2-x_1}$.
- So we need two points. They are given by the effect of a 50% (80%) popularity rating on the number of guests, 200 (350).
- So we have the two points $(50,200)$ and $(80,350)$. Putting their coordinates into the formula above gives $m=\frac{350-200}{80-50}=\frac{150}{30}=5$.
- To find the $y$-intercept $b$, we plug the slope and the point $(50,200)$ into $y=mx+b$ to get $200=5\cdot 50+b=250+b$.
- Subtracting $250$ results in the $y$-intercept $b=200-250=-50$.
- So the line of best fit is $y=5x-50$.
Explain what kind of data you can represent in a scatter plot.
Hints
Here you see an example of a bar graph.
An ordinal data set is one where each data point is assigned a numerical quantity which establishes an ordering on the entire set of data.
A nominal data set is one where each data point is assigned to a distinct category, which does not provide a measurement or order on the set of data.
The bar graph represents a nominal data set.
Let's have a look at an example: if three people lived in house number $1$, four people lived in house number $2$, five people lived in house number $3$, and so on, then we couldn't conclude that the house number tells us anything about the number of people living in that house.
Solution
Scatter plots are used to show relations between variables in order to recognize trends. The data used must be ordinal in order to make a scatter plot, as there must be a way to order the points so that they can be compared.
You can use scatter plots to try to find correlations. However, a positive (or negative) correlation doesn't have to imply a trend. For example, if three people lived in house number $1$, four people lived in house number $2$, five people lived in house number $3$, and so on, then we couldn't conclude that the house number tells us anything about the number of people living in that house.
For ordinal data, bar or line graphs can be used as well.
Nominal data cannot be represented with a scatter plot, so bar graphs are usually used instead.