Handy Command-Line One-liners for Starting Data Scientists

[6/5/2017 update: I was asked if I had a PDF version of the one-liners below. Here it is: Data-Science-One-Liners.pdf]

Experienced data scientists use Unix/Linux command-line utilities (like grep, sed and awk) a great deal in everyday work. But starting data scientists, particularly those without programming experience, are often unaware of the power and elegance of these utilities.

When interviewing candidates for data scientist positions, I ask simple data manipulation questions that can be done with a command-line one-liner. But often the answer is “I will fire up R, import the CSV into a data frame, and then …” or “I will load the data into Postgres and then …”.

The command line can be much simpler and faster, especially for getting large data files ready for consumption by specialized tools like R. For example, rather than trying to load a million-row CSV into R and sample 10% of it, you can quickly create a 10% sample using this one-liner … (read the rest of the post on Medium)
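The excerpt stops before the one-liner itself, so the following is only a sketch of one common way to draw such a sample with awk, and not necessarily the command used in the full post. The file names `data.csv` and `sample.csv`, the two-column layout, and the seed value are all assumptions made for illustration; a small stand-in file is generated first so the example runs on its own.

```shell
# Build a small stand-in CSV (hypothetical: a header plus 1000 data rows).
seq 1 1000 | awk 'BEGIN { print "id,value" } { print $1 "," ($1 * 2) }' > data.csv

# Keep the header line (NR == 1) and each remaining row with probability 0.10.
# srand(42) fixes the seed so repeated runs with the same awk give the same sample;
# the seed value itself is arbitrary.
awk 'BEGIN { srand(42) } NR == 1 || rand() < 0.10' data.csv > sample.csv

# Compare line counts of the original and the sample.
wc -l data.csv sample.csv
```

Because each row is kept independently, the result is roughly 10% of the rows rather than exactly 10%. The file is read once and streamed line by line, so memory use stays small regardless of file size. If an exact row count matters, GNU `shuf -n` is one alternative, though it does not treat the header line specially.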


