Travel Diary of a Data Scientist: Getting a Grip on the Tools

Resources I have been exploring:

The Quandl API, for grabbing copious amounts of clean, useful data (free and paid) and bringing it right into Python as a DataFrame.

The "Talk Python to Me" podcast series, a great resource for hearing how experienced programmers think, how they talk to each other, and how they see the world.

Update: Finally decided to plow through Wes McKinney's "Python for Data Analysis." I admire his intelligence, dedication and entrepreneurship in writing Pandas for Python. I've been avoiding really going through this book for 3 years. Time to take it seriously and dive in.

Key Concepts:

import json
path = 'path1/path2/file.txt'   ## assigns path to a variable
records = [json.loads(line) for line in open(path, 'rb')] ## parses each line of the file as one JSON record; 'rb' opens the file for reading ('r') in binary mode ('b')

records[0] ## shows the first full record (a dict), output spans multiple lines
records[0]['some_column_name']   ## shows one field of the first record; in Python 2 the echoed value carries the u'...' unicode prefix
print records[0]['some_column_name']   ## prints the same field without the u'...' prefix (Python 2 print statement; Python 3 needs print(...))
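
The 'rb' question is easy to settle with a small experiment. Here is a minimal sketch, assuming Python 3.6 or later and a made-up two-line sample file (the field names "tz" and "city" are hypothetical, not taken from the book's dataset). It also uses a with-block so the file gets closed, which the one-liner above does not do.

```python
import json
import os
import tempfile

# Hypothetical sample data: one JSON object per line.
sample_lines = [
    '{"tz": "America/New_York", "city": "Boston"}',
    '{"tz": "Europe/London", "city": "Leeds"}',
]

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('\n'.join(sample_lines) + '\n')
    path = tmp.name

# 'r' means read, 'b' means binary: each line comes back as bytes, not str.
with open(path, 'rb') as f:
    first_raw_line = f.readline()

# Same list comprehension as above; the with-block closes the file afterwards.
with open(path, 'rb') as f:
    records = [json.loads(line) for line in f]

os.remove(path)

print(type(first_raw_line))   # the raw line is a bytes object
print(records[0]['tz'])       # one field of the first record
```

In Python 3, json.loads accepts bytes as well as str, so the binary mode is harmless here; opening with plain 'r' would work too.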

from pandas import DataFrame, Series ## imports the DataFrame and Series classes from the pandas package

import pandas as pd

frame = DataFrame(records) ## converts the list of records into a DataFrame and assigns it to the variable "frame"

frame.info()  ## prints a concise summary: the total number of records, plus the non-null count and dtype of each column (for summary statistics, use frame.describe())
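
To see what the non-null counts actually mean, here is a small sketch with invented records (the "tz" and "city" fields are hypothetical). One record is missing a key on purpose, so one column ends up with fewer non-null values than the frame has rows.

```python
import pandas as pd

# Hypothetical records; the second one has no "city" key.
records = [
    {"tz": "America/New_York", "city": "Boston"},
    {"tz": "Europe/London"},
    {"tz": "America/New_York", "city": "Albany"},
]

frame = pd.DataFrame(records)            # missing keys become NaN

frame.info()                             # prints columns, non-null counts, dtypes

non_null = frame.count()                 # the same non-null counts, as a Series
tz_counts = frame['tz'].value_counts()   # how often each tz value occurs
print(non_null)
print(tz_counts)
```

frame.count() is handy when you want the non-null numbers as data rather than as printed text, since frame.info() only prints.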

We need to import these in order to properly plot with matplotlib:

import matplotlib
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
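
Note that %matplotlib inline is an IPython/Jupyter magic command, so it only works inside a notebook or IPython session. Outside of one, a rough equivalent looks like the sketch below, which assumes a plain Python 3 script with no display available and uses made-up data. It renders the figure to an in-memory PNG; passing a file name to savefig would write it to disk instead.

```python
import io

import matplotlib
matplotlib.use("Agg")             # non-interactive backend; no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(5)                  # hypothetical data: 0, 1, 2, 3, 4
y = x ** 2

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")         # one line with a marker at each point
ax.set_xlabel("x")
ax.set_ylabel("x squared")

buf = io.BytesIO()                # in-memory stand-in for a file on disk
fig.savefig(buf, format="png")
plt.close(fig)                    # free the figure once it has been saved
```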

Tuesday, January 31, 2017
