1. Section 1: Introduction
What is data science?
- Drawing useful conclusions from data, using computation as our primary tool.
And data science, as a practice, has three core activities:
- Exploration is figuring out what patterns exist in the data.
- When you have many observations about some phenomenon, what can you conclude about the phenomenon itself?
- Oftentimes, instead of just looking at large tables of numbers, we'll draw data visualizations, because it's much easier to interpret a lot of information at once when it's portrayed visually.
- Once we've found a pattern, we need to perform statistical inference, because some patterns are there just by chance, while others reflect some underlying process that's really interesting about the world.
- So the goal of statistical inference is to quantify whether the patterns that we observe during the exploration phase are reliable.
- If we collected more data, would we see this pattern again or not?
- The primary tool we have is randomization, because by simulating random processes, we can see what kinds of patterns appear just by chance.
- And if the pattern we observe is not the kind of thing that could just appear by chance, then we can conclude that it's because of some robust or reliable pattern in the underlying phenomenon we want to study.
- And finally, we'll perform prediction.
- This is where we have partial information about something we want to know, and we want to guess about the things we don't know yet.
- Here we're making informed guesses, quantitative guesses, using a discipline called machine learning.
- Normally when we write programs, we just focus on the particular logic of what the computer should do, but machine learning is about not programming every detail, and instead using the data to make decisions or choices within that program.
- So when we write a program, for instance, to recognize speech or automatically translate languages or control a car or a robot, we don't actually write down all the details of what to do, but instead use examples from the world to help computers automatically learn how to behave.
- And that's a form of prediction, one that we'll talk about in this course.
- And these three stages correspond to how we'll approach the material in this course.
- We'll first talk about how to identify patterns, then we'll talk about quantifying whether those patterns are reliable.
- And finally, we'll use the reliable patterns we've discovered to make informed guesses about the information that we wish we knew.
- Once you can do all that, you're well on your way to being a data scientist.
- Now in the process of doing all these things, it's important that you learn how to program a computer, because computing underlies each step of the way and learning to program is just an essential part of participating in this discipline.
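
To make the randomization idea above a bit more concrete, here is a minimal sketch in Python. The coin-flip scenario, the function name `chance_of_at_least`, and the numbers (60 heads in 100 flips, 10,000 simulated runs) are assumptions chosen purely for illustration; they are not from the lecture. The sketch simulates a fair coin many times and reports how often chance alone produces a result at least as extreme as the one observed.

```python
import random


def chance_of_at_least(observed_heads, flips=100, trials=10_000, seed=0):
    """Estimate how often a fair coin gives at least `observed_heads` heads.

    All parameters are illustrative assumptions, not values from the lecture.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    hits = 0
    for _ in range(trials):
        # Simulate one run of `flips` fair coin flips and count the heads.
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if heads >= observed_heads:
            hits += 1
    # Share of simulated runs at least as extreme as the observation.
    return hits / trials


# Hypothetical observation: 60 heads out of 100 flips.
p = chance_of_at_least(60)
print(f"Share of simulated runs with at least 60 heads: {p:.3f}")
```

If that share turns out to be small, chance alone is a poor explanation for the observed result, which is the kind of reasoning statistical inference relies on.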
2. Section 2: