Data Science #1: Drop Everything and Learn SQL
So, you, like many tech companies and marketing strategists throughout the world, have come to the conclusion that you should probably learn Data Science. Why? Because it is, by far, one of the most profitable and useful skill sets you can develop in an era where data is resplendent (DATA!). Are you a scientist? Are you an engineer? Are you a Taco Cart owner? Guess what - Data Science can help you become better at whatever you're trying to do, whether it's collecting beach glass on Lake Michigan or predicting the stock market.
You too can appear like a genius to your friends and family with SQL! |
Alright, next step - what's in the data science tool box and where should you start? Well, from the title of this post, you should start with SQL (but really!). This title, however, assumes many years of programming experience (in Python nonetheless), a physics BS and MS and PhD in Space Science (easy peasy), and problem solving skills. If you don't have these things, you're hot wired version is this: go learn Python (Python Start Here) for 2 weeks and make sure you're good on Calc I-II and can build some cool stuff with legos (like this!). Yep, that's it.
Now onto SQL - what is SQL? It's the Structured Query Language. OK, that's basically gibberish. Real world definition: SQL is a way to store large amounts of data and access them using relational terms like join, union, intersection, etc. SQL is powerful because it's crazy fast and it's amazing for accessing huge amounts of data that's... related! If you are entertaining a dream of working for Facebook then... you should really become a SQL ninja!
Here's a scenario where you should use SQL - let's say you have satellite data from 2 different satellites over a 10 year period. You have a hunch that both satellites have a glitch on Mondays and your satellites measure lower plasma values than they should. You have put your satellite data into 2 different tables, each table representing a different satellite. From there, you can looking for intersections on Mondays and your suspicious University and in one line of code, you have a list of of all the intersections for your analysis. It would look something like this:
SELECT plasmaData
FROM sat1, sat2
WHERE (sat1.time = 'Monday') AND (sat2.time = 'Monday');
Bam. Now you have all the plasma data on Mondays between the two lists and it took a fractional amount of time that it would have taken for you to do so with Python pickles or text files or whatever nonsense you were using before (never fear, I was there too... before). Even better about SQL is that it interfaces with Python beautifully. Look up psycopg2 (Your new best friend awaits you here!) or check out my github for ActiveMe (It isn't pretty but it works!). If you really want to see some powerful, well-written SQL (in space science context...), I really recommend code written by the DaViTpy team out at Virginia Tech (DaViTpy).
Now to get you started, the big thing this blog post is going to tell you is how to think in SQL and then I'm going to give you a bunch of links so you can go set forth and conquer the SQL world. SQL is the most formulaic language I have ever used - you think backwards, starting with what you want your end product to be and then working out the details of how you get there. For instance, you should start every SQL query by just writing down:
SELECT YOUR_DESIRED_OUTPUT
FROM A_TABLE
And that's it really - that's the structure of this language. You are selecting something from a table and printing that output. The FROM A_TABLE part is where things get crazy - you have to think in terms of breaking things down into wheres, joins, etc. But keep it simple - always start with SELECT THE THING YOU WANT FROM A TABLE! You can get some crazy complex SQL queries where you sort in order ASC | DESC, count the length of strings with '____', compare sizes, left joins/right joins, and all these incredible venn-diagram-esque things that are crazy fast and easy obtained once you put your data into SQL data bases. This makes it easy for you to process later and you know who wins then? YOU!
You SQL master you! |
OK, so those awesome links!
For a great place to start learning syntax:
http://www.w3schools.com/sql/
And, on that same site, the best beginner problems I have found yet - you can even test your solutions live if you click on the "Click me to see the solution" buttons:
http://www.w3resource.com/mysql-exercises/restricting-and-sorting-data-exercises/#PracticeOnline
For advanced practice problems (#mastery) and some good instruction, definitely this site is AWESOME:
https://lagunita.stanford.edu/courses/DB/SQL/SelfPaced/courseware/ch-sql/seq-exercise-sql_movie_query_extra/
And really, if you can answer those lagunita problems without cheating, you're ready to go for job interviews. Add SQL to your resume!
1 Comments