Join returning seemingly too many (duplicate) rows.

Question

-- Use a JOIN to select all patrons with outstanding books. Select their first name and email address.
SELECT p.first_name, p.email FROM patrons as p 
INNER JOIN loans as l ON p.id = l.patron_id
WHERE l.returned_on IS NULL;

1) patron names are being returned multiple times, once for each row in loans where that patron has an outstanding loan (where returned_on IS NULL). It doesn't seem like this can possibly be the results set we want (confusing duplicates included). Can the JOIN or WHERE be written differently to have each relevant patron name returned just once?

2) Adding DISTINCT seems to solve the problem of multiple, duplicate rows...

SELECT DISTINCT  p.first_name, p.email FROM patrons as p 
INNER JOIN loans as l ON p.id = l.patron_id
WHERE l.returned_on IS NULL;

BUT... is this "fix" legitimate/correct? If this were a real-life table with thousands of rows, I wouldn't be able to visually check and confirm the result. What if the patrons table had multiple "Dave"s (for example), who had oustanding loans, then I'm pretty sure the DISTINCT "trick" would NOT work, is this correct?

This version will give ALL the patron names needed (even if/when there are duplicate first names), BUT it selects the id field from patrons, which we don't really want. This is the closest to seemingly-correct query that I can get, so far.

SELECT DISTINCT  p.id, p.first_name, p.email FROM patrons as p 
INNER JOIN loans as l ON p.id = l.patron_id
WHERE l.returned_on IS NULL;

3) Is there a place we can go to see some possible "correct answers" for these challenges/exercises?

Answer 1 · 2022-03-01T23:07:29Z

March 1, 2022 11:07pm

You can trust DISTINCT to remove all duplicates from a result set of any size. Even if there were multiple "Dave"s, the result rows would still be distinct based on their emails.

There's often no single "correct answer" for a challenge or exercise. For example, another approach to this exercise could omit both the DISTINCT and the JOIN by using a sub-query:

SELECT first_name, email FROM patrons
WHERE id in (SELECT patron_id from loans WHERE returned_on IS NULL);

But note that your own solution is more appropriate, since JOINs are the topic of this stage.

Welcome to the Treehouse Community

Looking to learn something new?

Fred Russell

Fred Russell

Join returning seemingly too many (duplicate) rows.

1 Answer

Steven Parker

Steven Parker