Resources I have been exploring:
The Quandl API, for grabbing copious amounts of clean, useful data (free and paid) and bringing it right into Python as a DataFrame.
Listening to the "Talk Python to Me" podcast series, which is a great resource for hearing how experienced programmers think, how they talk to each other, and how they see the world.
Update: Finally decided to plow through Wes McKinney's "Python for Data Analysis." I admire his intelligence, dedication and entrepreneurship in writing Pandas for Python. I've been avoiding really going through this book for 3 years. Time to take it seriously and dive in.
Key Concepts:
import json
path = 'path1/path2/file.txt' ## assigns path to a variable
records = [json.loads(line) for line in open(path, 'rb')] ## loads the records, one JSON object per line, from path ('rb' means read in binary mode: the file's bytes are handed over as-is, with no newline or text-encoding translation)
records[0] ## displays the first full record (a dict), which prints across multiple lines
records[0]['some_column_name'] ## displays one column of the first record; under Python 2 the value is shown with the u'' unicode prefix
print(records[0]['some_column_name']) ## prints the same value without the u'' prefix
from pandas import DataFrame, Series ## imports the DataFrame and Series classes from the pandas package
import pandas as pd
frame = DataFrame(records) ## converts the list of records into a data frame and assigns it to the name "frame"
frame.info() ## prints a concise summary: the number of records, each column's non-null count and dtype, and memory usage (for summary statistics, use frame.describe())
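To make the steps above concrete, here is a minimal end-to-end sketch. It assumes Python 3 and pandas are available; the sample records and the column names "city" and "visits" are made up for illustration, and a temporary file stands in for the real file at path. It also shows that reading in 'rb' mode and in text mode yields the same parsed records.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample data: one JSON object per line.
# The second record deliberately lacks the "visits" field.
sample_lines = [
    '{"city": "Lisbon", "visits": 3}',
    '{"city": "Oslo"}',
]

# Write the sample to a temporary file that stands in for the real path.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("\n".join(sample_lines) + "\n")
    path = tmp.name

# 'rb' = read, binary: each line arrives as bytes.
# json.loads accepts bytes directly on Python 3.6 and later.
with open(path, "rb") as f:
    records_from_bytes = [json.loads(line) for line in f]

# Text mode with an explicit encoding: each line arrives as str.
with open(path, "r", encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

os.remove(path)  # clean up the temporary file

# Each dict becomes a row; keys become columns; missing keys become NaN.
frame = pd.DataFrame(records)
frame.info()
```

Using "with" closes the file as soon as the list is built, which the one-line open(...) form in the notes leaves to the garbage collector.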
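We need these imports in order to plot with matplotlib (the last line is an IPython/Jupyter magic command that renders plots inline in the notebook; it is not valid in a plain Python script):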
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
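As a quick check that the plotting imports work, here is a small sketch that draws a sine curve. The data is made up for illustration. It is written as a plain script, so it selects the non-interactive "Agg" backend and saves the figure to an in-memory buffer; in a notebook you would keep %matplotlib inline instead and drop those two pieces.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 50 evenly spaced points over one full period of sin(x).
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

# Create a figure with a single set of axes and draw the curve on it.
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()

# Render to an in-memory PNG rather than opening a window or writing a file.
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```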
