Learning Data Science: Day 1 - Environment and Python

As promised, today I will continue my journey of learning Data Science. So, let’s get started.

Image taken from DeZyre

Okay, we are ready. Let’s jump into the Data Science materials. Whoa, hold on. Before that, we have to consider our fluency in Python. Why Python? Recently, Python has attracted a lot of interest as a primary choice for Data Science and analysis. Since I am not familiar enough with Python itself, I think I should revisit a lot of Python lessons first. I’m not going to cover the introduction, history, tips, or tricks of Python, so if you came here expecting that, I’m sorry to disappoint you. Here is what I learned about Python today.

Python 2 or Python 3

Image taken from Learn to Code with Me

I’m not going to pick a winner here. There are pros and cons to each, and there is no right or wrong choice in this case. The link below may help you choose which version to use.

http://sebastianraschka.com/Articles/2014_python_2_3_key_diff.html

I personally chose Python 3, and throughout the day I haven’t found any definite reason to switch to Python 2. But feel free to use Python 2.
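To make the choice a little more concrete, here is a minimal sketch of two well-known differences between the versions: division and text handling. It is written for Python 3, with the Python 2 behavior noted in comments. The variable names are my own, picked only for illustration.

```python
# Runs under Python 3. Comments describe what Python 2 would do instead.

# Division: in Python 2, 7 / 2 between two integers gives 3.
# In Python 3, / always returns a float and // is floor division.
true_division = 7 / 2
floor_division = 7 // 2

# Text: in Python 3, str holds Unicode text and bytes is a separate type.
# In Python 2, a plain "..." literal is a byte string by default.
text = "café"
encoded = text.encode("utf-8")

print(true_division, floor_division)  # 3.5 3
print(len(text), len(encoded))        # 4 5 (the é takes two bytes in UTF-8)
```

The linked article goes through many more differences. These two are simply the ones that tend to surprise people moving numeric or text-processing code between versions.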

Installing Python with Anaconda (Optional)

This section is optional, but I personally recommend installing Python with Anaconda. It’s complete and by far the easiest way to set everything up. To date, it is available with Python 2.7 or Python 3.5, so whichever version you chose in the previous section, you can still use Anaconda. It comes packed with everything you will need later. But if you like to live dangerously, you can of course install everything one by one yourself (that’s what I did several years ago).
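Whichever way you install, a quick sanity check helps confirm the environment is ready. The sketch below is for Python 3 and only uses the standard library. The package names in the list (numpy, pandas, matplotlib) are my assumption about what you will want later, and `check_environment` is a hypothetical helper name, not something Anaconda provides.

```python
import importlib.util
import sys


def check_environment(packages):
    """Map each top-level package name to True if it can be found, else False."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }


# Show which interpreter is actually running.
print("Python version:", sys.version.split()[0])

# Assumed list of packages; adjust it to whatever your course needs.
for name, found in check_environment(["numpy", "pandas", "matplotlib"]).items():
    print(name, "found" if found else "missing")
```

If anything shows up as missing in a manual install, that is the package to install next. With Anaconda, I would expect all three to be found.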

What I learned next

Actually, we don’t really need to learn advanced Python for this. I used two resources to learn today.

1. Dataquest


I’m not affiliated with Dataquest in any way; I just found their Python courses easy to understand, especially for beginners. They teach a lot of Python programming fundamentals, including files, loops, booleans, lists, dictionaries, functions, modules, classes, error handling, regex, etc. The courses are especially useful if you’re not from a Computer Science or IT-related background.
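To give a feel for the level of these fundamentals, here is a small sketch of my own (not taken from Dataquest) that touches loops, lists, dictionaries, functions, regex, and error handling in a few lines. The function names and the sample sentences are made up for illustration.

```python
import re


def count_words(lines):
    """Count how often each lowercase word appears across a list of strings."""
    counts = {}
    for line in lines:
        # The regex pulls out runs of letters, ignoring punctuation and digits.
        for word in re.findall(r"[a-z]+", line.lower()):
            counts[word] = counts.get(word, 0) + 1
    return counts


def safe_ratio(numerator, denominator):
    """Divide two numbers, returning None instead of failing on zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None


sample = ["Data science is fun", "Python makes data work fun"]
counts = count_words(sample)

print(counts["data"])                          # 2
print(safe_ratio(counts["data"], len(sample)))  # 1.0
print(safe_ratio(1, 0))                        # None
```

If every line here makes sense to you, you can probably skim the beginner lessons quickly.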

2. CS109 Data Science

As an alternative, I found a good Data Science course (CS109 Data Science). It is accessible here: http://cs109.github.io/2015/

The lesson about Python itself from the course is available here: https://github.com/cs109/2015lab1

Like the previous resource, it covers the fundamentals of Python. However, the lesson moves quickly and gets straight to the point, so those of you who are not familiar with Python at all might have difficulty following the material. One good thing about this course is that it also includes an introduction to Pandas, a useful data analysis library. If you’re not sure which to pick, I recommend taking both. If you’re already familiar with Python, I suggest taking this one instead. The files provided in this course require Jupyter Notebook. Those of you who chose the Anaconda distribution will have an easy time here, since it’s already included. One quick note: if you chose Python 3, you are going to have a hard time, because the provided code is written in Python 2. But I do think migrating that Python 2 code to Python 3 gives you a bit of an advantage, because you will learn more along the way. For those of you who want the easy way, I suggest using Python 2.
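If you do go the Python 3 route with Python 2 material, most of the changes are mechanical. The sketch below shows a few common ones, with the Python 2 form kept in a comment above each Python 3 line. The `scores` data is hypothetical and not from the CS109 lab; the actual notebooks may need other fixes as well.

```python
# Python 2 originals appear in comments; the live code is the Python 3 form.
scores = {"alice": 90, "bob": 75}  # hypothetical data for illustration

# Python 2: print "Number of students:", len(scores)
print("Number of students:", len(scores))

# Python 2: squares = [i * i for i in xrange(3)]
squares = [i * i for i in range(3)]

# Python 2: for name, score in scores.iteritems(): ...
passed = [name for name, score in scores.items() if score >= 80]

# Python 2: scores.has_key("bob")
has_bob = "bob" in scores

print(squares, passed, has_bob)  # [0, 1, 4] ['alice'] True
```

Working through changes like these by hand is slower than just running Python 2, but it forces you to read every line of the lab, which is where the extra learning comes from.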

Final Words

For those of you wondering how much time you should spend on these courses: it depends on you. For people who are not familiar with Python, I suggest spending around 4 hours. That should give you an overall understanding of how to use Python. You can skip the points you already understand and focus on the fundamentals you haven’t grasped yet.

Feel free to leave responses, suggestions, or comments about this, and see you tomorrow.

For those who haven’t read the other chapters of the series:

Day 0: Motivation

Day 1: Environment and Python - You are here

Day 2: Data Scraping and Effective Visualizations
