This repository has been archived by the owner on Jan 8, 2023. It is now read-only.

Introduction to Python For Data Science

This repo contains the teaching material for the Introduction to Python (and useful libraries) masterclass at the Data Science Retreat.

Table of Contents

About me

Slides for this section can be found here.

The Python Programming Language

Slide deck for this entire section is available here.

Why Python?

Slides on this topic start here

Python for DS Components

Slides on this topic start here

Python 2 vs. Python 3

Slides on this topic start here

A great notebook covering the main differences has been written by Sebastian Raschka.

To keep your code compatible with both Python 2 and Python 3, you might also want to use this Cheat Sheet.
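As a minimal sketch of what such compatibility code can look like, the snippet below handles two well-known differences: the `unicode` type that exists only in Python 2, and integer division. The names `text_type` and `halve` are illustrative, not taken from the cheat sheet. Another common route for the division case is `from __future__ import division` at the top of the file.

```python
import sys

# Python 2 has a separate `unicode` type; Python 3 only has `str`.
try:
    text_type = unicode  # exists on Python 2 only
except NameError:
    text_type = str  # Python 3: every str is already unicode text


def halve(n):
    """Return n / 2 as a float on both Python 2 and Python 3."""
    # On Python 2, `3 / 2` is 1 (integer division); dividing by a float
    # gives the same result on both major versions.
    return n / 2.0


print("Running on Python {}".format(sys.version_info[0]))
print(halve(3))
```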

Installing Python and all useful packages

Slides on this topic start here

Tools for writing Python Code (from Kristian)

Python shell

The most basic interactive Python command line, where each line starts with a >>>.

IDLE

Standard editor in Python distributions, easy to use but very basic.

IPython

A more sophisticated interactive Python command line. It incorporates tab-completion, interactive help and regular shell commands. Also look up the %-magic commands.

Spyder

Spyder is part of the Anaconda Python distribution. It is a small IDE mostly for data analysis, similar to RStudio. It automatically highlights syntax errors and includes a variable explorer, debugging functionality and other useful things.

Jupyter Notebooks

Interactive environment for the web browser. A Jupyter notebook contains Python code, text, images and any output from your program (including plots!). It is a great tool for exploratory data analysis.

Sublime2

A general-purpose text editor that works on all systems. Many Python plugins are available. Both a free and a commercial version exist.

Atom

The Open Source cousin of Sublime2.

PyCharm

PyCharm is probably the most luxurious IDE for Python. It contains tons of functions that are a superset of all the above. PyCharm is a great choice for bigger Python projects.

Notepad++

If you must use a text editor on Windows to edit Python code, refuse to use anything worse than Notepad++.

Vim

I know people who are successfully using Vim to write Python code and are happy with it.

Emacs

I know people who are successfully using Emacs to write Python code, but haven't asked them how happy they are.

Running the IPython interpreter and a Python file

Slides on this topic start here

Jupyter Notebook

A live demo will be given during the masterclass.

Experiment further with the IPython Notebook environment with this Jupyter Notebook. Try to clone or download it, before opening it, running and modifying its cells.

Many more Jupyter features are covered in this blog post.

Python basics

Time to get your hands dirty. Read the examples provided in The SciPy Lectures -- The Python Language, and test them for yourself.

Practice those examples by alternating between Python files, the IPython interpreter and an IPython Notebook.
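To make the three ways of working concrete, here is a small sketch of a file that can be used in all of them. The file name `temperature.py` and the function are made up for illustration. Saved under that name, it can be run with `python temperature.py` from a terminal, with `%run temperature.py` inside IPython, or pasted into a notebook cell.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5.0 / 9.0


# This block runs when the file is executed as a script,
# but not when the file is imported as a module.
if __name__ == "__main__":
    print(fahrenheit_to_celsius(212))
```

The `if __name__ == "__main__":` guard lets the same file serve both as a script and as a module whose function can be imported elsewhere.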

To practice:

Pandas

Intro tutorials on pandas basics

Data munging with pandas
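For a first taste of what data munging means in practice, the sketch below cleans a tiny table and aggregates it. The records are made-up toy values, only there to have something to clean; the steps (dropping missing rows, normalizing text, filling gaps, grouping) are among the ones the tutorials above cover in depth.

```python
import pandas as pd

# Made-up toy records with typical problems: inconsistent text and missing values
raw = pd.DataFrame({
    "city": ["Berlin", "berlin ", "Paris", None],
    "temp_c": [21.5, None, 25.0, 19.0],
})

clean = (
    raw.dropna(subset=["city"])  # drop rows without a city
       .assign(city=lambda df: df["city"].str.strip().str.title())  # normalize names
)

# Fill the remaining missing temperature with the mean of the known ones
clean["temp_c"] = clean["temp_c"].fillna(clean["temp_c"].mean())

# Average temperature per city
summary = clean.groupby("city")["temp_c"].mean()
print(summary)
```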

NumPy and Matplotlib

NumPy

Start with the official NumPy Tutorial. Note: if this link returns an error, move to the PDF version.

Move on to these exercises.
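If you want a quick warm-up before the exercises, this short sketch touches three ideas that come up constantly: reshaping, reductions along an axis, and boolean indexing. The variable names are arbitrary.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)  # 2 rows, 3 columns: [[0, 1, 2], [3, 4, 5]]

col_means = a.mean(axis=0)      # mean of each column
centered = a - col_means        # broadcasting: the row of means is subtracted from every row
evens = a[a % 2 == 0]           # boolean mask selects the even entries as a 1-D array

print(col_means)
print(centered)
print(evens)
```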

Matplotlib

Learn the basics and some more advanced plotting tricks in Matplotlib with this hands-on tutorial.
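As a starting point before the tutorial, here is a minimal sketch of a line plot built with Matplotlib's object-oriented interface. It selects the non-interactive `Agg` backend and writes the figure to an in-memory buffer so that it runs anywhere; in a notebook you would normally drop those two parts and let the figure display inline.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), linestyle="--", label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# Render to PNG in memory; pass a file name instead to write to disk
buf = io.BytesIO()
fig.savefig(buf, format="png")
```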

Scikit-learn and your first Data Science case

Scikit-learn

Your first data science case

A great source of data problems nowadays is the Kaggle platform. We'll be starting today with a simple but representative dataset: Titanic: Machine Learning from Disaster.

  • A guide to help you approach the problem

IMPORTANT: you will find plenty of materials analyzing this data. However, you'll learn the most if you give the problem some thought and try out several things before resorting to ready-made answers.
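To check that your setup works before you start exploring, a deliberately bare baseline can be sketched as below. It is not a solution and not the approach taught in the masterclass. It assumes a DataFrame with the `Pclass`, `Sex`, `Age` and `Survived` columns of the Kaggle training file; the function names and the choice of logistic regression are illustrative assumptions.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression


def prepare_features(df):
    """Build a small numeric feature table (assumes Pclass, Sex and Age columns)."""
    return pd.DataFrame({
        "Pclass": df["Pclass"],
        "is_female": (df["Sex"] == "female").astype(int),
        # Fill missing ages with the median age of the given data
        "Age": df["Age"].fillna(df["Age"].median()),
    })


def fit_baseline(train_df):
    """Fit a plain logistic regression on the features above."""
    model = LogisticRegression(max_iter=1000)
    model.fit(prepare_features(train_df), train_df["Survived"])
    return model


# Typical use, assuming the training file was downloaded from Kaggle as train.csv:
# train = pd.read_csv("train.csv")
# model = fit_baseline(train)
# predictions = model.predict(prepare_features(train))
```

From here, the interesting work is in exploring the data and engineering better features yourself.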

SciPy

SciPy is a collection of mathematical algorithms and convenience functions built on the NumPy extension of Python. Here is a hands-on overview of this collection, together with practical exercises and more advanced problems.

For those willing to go further on the statistical aspects of SciPy, I recommend having a look at these IPython Notebooks on Effect Size, Random Sampling and Hypothesis Testing.
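The three topics in those notebooks can be previewed in a few lines. The sketch below draws two random samples, runs a two-sample t-test with `scipy.stats.ttest_ind`, and computes Cohen's d as one common effect size measure. The sample parameters and the helper `cohens_d` are assumptions made for illustration and are not taken from the notebooks.

```python
import numpy as np
from scipy import stats

# Random sampling: two simulated groups whose true means differ by 0.5
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=200)
group_b = rng.normal(loc=0.5, scale=1.0, size=200)

# Hypothesis testing: two-sample t-test (assumes equal variances by default)
t_stat, p_value = stats.ttest_ind(group_a, group_b)


def cohens_d(x, y):
    """Effect size: difference in means divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)


print("t = {:.3f}, p = {:.4f}".format(t_stat, p_value))
print("Cohen's d = {:.3f}".format(cohens_d(group_a, group_b)))
```

The p-value tells you whether a difference is detectable; the effect size tells you how large it is, which is why the two are usually reported together.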

License

This repository contains a variety of content: some developed by Amélie Anglade, some derived from or largely inspired by third parties' work, and some entirely from third parties.
The third-party content is distributed under the license provided by those parties. Any derivative work respects the original licenses, and credits its initial authors.

Original content developed by Amélie Anglade is distributed under the MIT license.