Let's get started!
[MUSIC]
0:00
Hello, I'm Craig, and I'm a developer.
0:09
In this course,
0:11
we're going to be exploring
the wonderful data library, pandas.
0:12
Now, pandas is a portmanteau,
or a combination of two words,
0:15
in this case, the words panel and data.
0:19
Panel data is data that
is multidimensional,
0:21
involving measurements over time.
0:23
pandas are also an adorable creature, and
I hope that you're here for the former,
0:25
but I totally understand that I might
have clickbaited you into the latter.
0:29
pandas provides fast, flexible, and
expressive data structures that have been
0:33
designed to make working
with relational or
0:37
labeled data not only easy,
but also intuitive.
0:39
It's the fundamental high level
building block for doing practical and
0:42
real world data analysis in Python.
0:46
Before we get cooking, let's make
sure that we're on the same page.
0:48
There are definitely some
prerequisites for this course,
0:51
so please double check
that you're all caught up.
0:54
The most important of
the prerequisites is NumPy.
0:56
I'd like to make sure that you had a nice
introduction to the NumPy library.
0:59
pandas relies heavily on NumPy, and
1:03
I'm going to assume that you have a basic
understanding of its overarching concepts.
1:05
Now, don't worry if it's been
a while since you've used it,
1:10
we'll revisit the concepts
that you need here in just a bit.
1:13
Don't forget to check the teacher's
notes that are attached to each video.
1:15
I'll try and
remind you to look in there, but
1:19
please do get in the habit of
checking that section out.
1:21
Lots of great information is tucked away
in there waiting for you to dig into it.
1:23
In this course,
I'm gonna try a new approach.
1:27
In an effort to give you more practice
with how data professionals interact,
1:30
I'm going to rely more heavily
than usual on Jupyter notebooks.
1:34
As you are most likely already aware,
1:37
Jupyter notebooks are a great
place to capture your learnings.
1:39
They're also intended to be used for
teaching.
1:42
I've gone ahead and built up some
interactive content that will assist you
1:45
in exploring the pandas library.
1:48
In the Treehouse app,
1:50
you'll encounter these notebooks
as textual instruction steps.
1:51
I've included information in the teacher's
notes about how to get a hold of
1:54
the notebooks so that you can run them and
follow along locally.
1:57
I'd love for you, as a lifelong learner,
2:00
to get in the habit of exploring
every notebook that you come across.
2:02
Use it to poke around as
you learn a new library,
2:06
much like you might expect
to use the Python shell.
2:08
Explore the API and practice different
approaches, and most importantly,
2:11
keep your own notes.
2:15
A common data science workflow
involves multiple stages.
2:16
First you clean the data and
then you analyze and model it.
2:19
And finally, you organize the results
of the analysis into either a graph or
2:22
a table.
2:26
Great news, pandas can do all that,
the entire workflow.
2:27
Even better news,
it's really a pleasure to use.
2:30
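A minimal sketch of the clean, analyze, and organize stages described above. The `team` and `points` columns and every value in them are hypothetical, invented only to illustrate the workflow; plotting is left out so the example needs nothing beyond pandas.

```python
import pandas as pd

# Hypothetical raw data; one row is missing its points value.
raw = pd.DataFrame(
    {
        "team": ["red", "blue", "red", "blue", "red"],
        "round": [1, 1, 2, 2, 3],
        "points": [10, 7, None, 12, 8],
    }
)

# 1. Clean: drop rows where the measurement is missing.
clean = raw.dropna(subset=["points"])

# 2. Analyze: total points per team.
totals = clean.groupby("team")["points"].sum()

# 3. Organize: reshape into a table with one row per team, one column per round.
table = clean.pivot_table(index="team", columns="round", values="points")

print(totals)
print(table)
```

Because one value was missing, pandas stores the `points` column as floating point, so the totals print as floats.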
Since you already have a fundamental
understanding of the numerical library,
2:34
NumPy, pandas is going to
feel very familiar to you.
2:38
In fact, pandas sits directly on
top of NumPy like a little hat.
2:41
I don't know about you, but
2:45
one of the things that I have trouble
with in NumPy is when I have an array.
2:46
I never know just which value is which.
2:49
Like for instance, in this array here, I
don't really know who got the high score.
2:53
I have to remember that Robbie is
the first one here at index zero.
2:58
But I just have to know that.
3:02
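The problem described here can be sketched as follows. The scores are made-up placeholder values, since the transcript does not state the numbers shown on screen; only the fact that Robbie sits at index zero comes from the video.

```python
import numpy as np

# Hypothetical Donkey Kong scores (values invented for illustration).
scores = np.array([874300, 1218000, 1064500])

# NumPy only knows positions, so we have to remember who is at each index.
robbie_score = scores[0]  # Robbie is at index 0, per the video

# argmax returns the position of the high score, not the player's name.
top_position = int(scores.argmax())

print(robbie_score, top_position)
```

The array can say that position 1 holds the high score, but the mapping from position to player lives only in our heads.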
pandas gives you a new ability,
you can label each value.
3:03
It's like a dictionary, a key and a value.
3:07
And that works great for
a single dimension.
3:09
This example is the series of high
scorers for a single game, Donkey Kong,
3:11
labeled by players' initials.
3:16
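A labeled version of that idea might look like the sketch below. The initials and scores are hypothetical placeholders, not the values shown in the video.

```python
import pandas as pd

# Hypothetical high scores for one game, labeled by players' initials.
donkey_kong = pd.Series(
    [874300, 1218000, 1064500],
    index=["RW", "AB", "CD"],  # invented initials; "RW" stands in for Robbie
    name="Donkey Kong",
)

# Look up a value by label, much like a dictionary key.
robbie_score = donkey_kong["RW"]

# idxmax returns the label of the largest value, so we get a name, not a position.
top_player = donkey_kong.idxmax()

print(robbie_score, top_player)
```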
But as you know, we often want
to have multidimensional data.
3:18
We could track more games by
adding a new game dimension,
3:21
like we could add Pac-Man scores.
3:25
But now we have to remember two indexes,
and
3:27
I have to remember that index zero is
Donkey Kong and index one is Pac-Man.
3:29
Now, again,
pandas does a great job with labeling.
3:34
You can also label each of these columns,
so you end up with tabular data.
3:37
The two-dimensional data structure
here is known as a data frame.
3:41
This is a data frame of high scores
on multiple games indexed by players'
3:46
initials.
3:49
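As a rough sketch, a data frame like the one described could be built this way. The game names come from the video; the initials and every score are hypothetical.

```python
import pandas as pd

# Hypothetical high scores: one column per game, one row per player.
high_scores = pd.DataFrame(
    {
        "Donkey Kong": [874300, 1218000, 1064500],
        "Pac-Man": [3333360, 2965400, 3101200],
    },
    index=["RW", "AB", "CD"],  # invented initials used as row labels
)

# Select a single value by row label and column label.
rw_pacman = high_scores.loc["RW", "Pac-Man"]

# For each game (column), find the label of the player with the top score.
top_by_game = high_scores.idxmax()

print(high_scores)
print(top_by_game)
```

Both dimensions are now addressed by name, so there is no need to remember that index zero is Donkey Kong and index one is Pac-Man.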
And that ought to feel pretty familiar,
assuming you've used tabular or
3:50
table based data before,
like a spreadsheet or a database table,
3:54
anything with rows and columns.
3:58
With pandas,
you can put any sort of data in there too.
4:00
It doesn't have the same
restrictions that NumPy does.
4:03
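One way to read this point: a plain NumPy array has a single dtype for all of its elements, while each column of a data frame can have its own. A small sketch, with hypothetical column names and values:

```python
import pandas as pd

# Hypothetical player table mixing text, integers, and booleans.
players = pd.DataFrame(
    {
        "name": ["Robbie", "Ada", "Cam"],
        "high_score": [874300, 1218000, 1064500],
        "is_active": [True, False, True],
    },
    index=["RW", "AB", "CD"],
)

# Each column keeps its own dtype.
print(players.dtypes)
```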
pandas also lets you
relate datasets by label.
4:05
So you can merge and
4:08
join together related information
in a very straightforward manner.
4:09
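A minimal sketch of relating two datasets by label, using `DataFrame.merge` for a shared column and `DataFrame.join` for a shared index. The tables, column names, and values are all hypothetical.

```python
import pandas as pd

# Two hypothetical datasets that share an "initials" column.
scores = pd.DataFrame(
    {"initials": ["RW", "AB", "CD"], "donkey_kong": [874300, 1218000, 1064500]}
)
players = pd.DataFrame(
    {"initials": ["CD", "RW", "AB"], "name": ["Cam", "Robbie", "Ada"]}
)

# merge matches rows on the shared column, regardless of row order.
merged = scores.merge(players, on="initials")

# join does the same kind of matching, but on index labels.
joined = scores.set_index("initials").join(players.set_index("initials"))

print(merged)
print(joined)
```

Note that `players` lists its rows in a different order from `scores`; the rows still line up because they are matched by label rather than by position.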
pandas is a full-featured library, and we
simply won't be able to get to all of its
4:13
amazing powers in this
introductory course.
4:18
I do hope to give you a firm foundation
and guide you to where you can learn and
4:20
practice more.
4:24
For this course, I'm gonna ask that you
imagine that there is a new company
4:25
in town jumping in on that social banking
app craze, like Cash App or Venmo.
4:29
They call themselves Cash Box.
4:32
Basically the way that their app works is
that a user signs up, chooses a username,
4:34
and then they can send money to other
users of the system by their username.
4:39
Now, a common use case for
their app is when it's lunch time and
4:43
people don't have cash on them.
4:46
Their users can just send money through
Cash Box to the person picking up
4:47
the bill.
4:51
Now, each user on Cash Box keeps
a balance of their funds, and
4:51
the app tracks their transactions.
4:55
Good news, Cash Box is hiring and they
are looking for a junior data scientist.
4:57
They've sent out a hiring challenge and
5:03
access to a sample of
some of their datasets.
5:05
So what do you say we
explore their datasets and
5:07
pick up some job skills along the way?
5:09
Let's get ready to rock the Cash Box.
5:12
[LAUGH] Good thing we aren't applying
to be part of the marketing department.
5:14