Welcome to the Treehouse Community

Want to collaborate on code errors? Have bugs you need feedback on? Looking for an extra set of eyes on your latest project? Get support with fellow developers, designers, and programmers of all backgrounds and skill levels here with the Treehouse Community! While you're at it, check out some resources Treehouse students have shared here.

Looking to learn something new?

Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and join thousands of Treehouse students and alumni in the community today.

Start your free trial

Development Tools

michaelmugo
michaelmugo
3,971 Points

I don't understand the combination of JOIN with GROUP BY

So I understand how GROUP BY works to combine result-sets by columns, and it makes sense coming from the same table: reviews.

SELECT movie_id, MIN(score) AS minimum_score, MAX(score) AS maximum_score,
AVG(score) AS average, 
FROM reviews GROUP BY movie_id;

But later on in the video, we SELECT title to show the results in a more human-understandable form:

SELECT title, MIN(score) AS minimum_score, MAX(score) AS maximum_score,
AVG(score) AS average, 
FROM movies JOIN reviews ON movies.id = reviews.movie_id 
GROUP BY movie_id;

How exactly are the numeric functions still able to get their data, isn't the movies table being used where there's no score ?

Maybe I'm not understanding it properly, but as far as I've learnt the numeric functions use the data using the FROM some_given_table

1 Answer

LaVaughn Haynes
LaVaughn Haynes
12,397 Points

It looks like you are joining 2 tables: Movies and Reviews. The Movie table has titles and the Reviews table has scores. When you join 2 tables it's like you create a virtual 3rd table. So if you had the tables

Movies

ID  |  TITLE
-------------------------
1   |  The Avengers
-------------------------
2   |  The Happening

Reviews

ID  |  SCORE
-------------------------
1   |  100
-------------------------
2   |  50

If you join them

SELECT title, score FROM movies JOIN reviews ON movies.id = reviews.id

Then you really get a table like this as the result

TITLE             |  SCORE
-------------------------------------
The Avengers      |  100
-------------------------------------
The Happening     |  50

So in the original SQL statement from your question, the MAX(), MIN(), AVG() functions are not operating on data from the movies table, it's operating on the data from the reviews table. You just have the data from both tables being queried and returned to that 3rd virtual result table in the same statement. It might be less confusing for you if you used the more specific dot syntax names in your query like this:

SELECT movies.title, MIN(reviews.score) AS minimum_score, MAX(reviews.score) AS maximum_score...

I don't know if that makes sense to you or not?

michaelmugo
michaelmugo
3,971 Points

So, the numeric functions are dealing with the whole result-set and not just the movies table?

LaVaughn Haynes
LaVaughn Haynes
12,397 Points

I would say more that the numeric functions are acting on the rows of the reviews table and pushing that into the result set

SELECT movies.title, AVG(reviews.score)

So above I'm saying, "give me the title from the first table, and an average of all of the scores from the second table"

The only reason that you can write AVG(sore) instead of just AVG(reviews.score) is because there is no score column in the movies table, so you don't have to be specific. But honestly I'm always specific because I feel like it's more clear if I have to look at a query months down the line.