Welcome to the Treehouse Community

Want to collaborate on code errors? Have bugs you need feedback on? Looking for an extra set of eyes on your latest project? Get support with fellow developers, designers, and programmers of all backgrounds and skill levels here with the Treehouse Community! While you're at it, check out some resources Treehouse students have shared here.

Looking to learn something new?

Treehouse offers a seven day free trial for new students. Get access to thousands of hours of content and join thousands of Treehouse students and alumni in the community today.

Start your free trial

Databases Querying Relational Databases Joining Table Data with SQL Review and Practice

Join returning seemingly too many (duplicate) rows.

-- Use a JOIN to select all patrons with outstanding books. Select their first name and email address.
SELECT p.first_name, p.email FROM patrons as p 
INNER JOIN loans as l ON p.id = l.patron_id
WHERE l.returned_on IS NULL;

1) patron names are being returned multiple times, once for each row in loans where that patron has an outstanding loan (where returned_on IS NULL). It doesn't seem like this can possibly be the results set we want (confusing duplicates included). Can the JOIN or WHERE be written differently to have each relevant patron name returned just once?

2) Adding DISTINCT seems to solve the problem of multiple, duplicate rows...

SELECT DISTINCT  p.first_name, p.email FROM patrons as p 
INNER JOIN loans as l ON p.id = l.patron_id
WHERE l.returned_on IS NULL;

BUT... is this "fix" legitimate/correct? If this were a real-life table with thousands of rows, I wouldn't be able to visually check and confirm the result. What if the patrons table had multiple "Dave"s (for example), who had oustanding loans, then I'm pretty sure the DISTINCT "trick" would NOT work, is this correct?

This version will give ALL the patron names needed (even if/when there are duplicate first names), BUT it selects the id field from patrons, which we don't really want. This is the closest to seemingly-correct query that I can get, so far.

SELECT DISTINCT  p.id, p.first_name, p.email FROM patrons as p 
INNER JOIN loans as l ON p.id = l.patron_id
WHERE l.returned_on IS NULL;

3) Is there a place we can go to see some possible "correct answers" for these challenges/exercises?

1 Answer

Steven Parker
Steven Parker
230,178 Points

You can trust DISTINCT to remove all duplicates from a result set of any size. Even if there were multiple "Dave"s, the result rows would still be distinct based on their emails.

There's often no single "correct answer" for a challenge or exercise. For example, another approach to this exercise could omit both the DISTINCT and the JOIN by using a sub-query:

SELECT first_name, email FROM patrons
WHERE id in (SELECT patron_id from loans WHERE returned_on IS NULL);

But note that your own solution is more appropriate, since JOINs are the topic of this stage.