June 19th, 2013

Convert DOS to Unix – Insert Tab A into Slot B

Every day, as part of my 0x immersion program, one of our hackers tries to explain something he is working on: an especially beautiful bit of code, something about data science, how the mechanics of our project work, or whatever. Every day, at least once, I am completely confused. I realize this must be exactly how someone who has never had a statistics class feels when we talk about analysis.

Anyhow, today I spent a shameful amount of time taking the hardest possible path to making sense of the data for a Kaggle submission. Specifically, before I could even begin to look at the data, I had to tinker with the file. Of course it's like 50,000 observations: huge for a social scientist, small for a corporate analyst, and more geared toward small data tools than big ones. I read the file into R, hit enter, and… radio silence. Upload the same file into H2O and there is zero problem. I totally assumed the source of the issue was me (it still may be).

While H2O will inhale and parse anything, Tom taught me some handy code for converting files that were born in DOS (and for whatever random reason won't work properly on my Mac) to Unix. Working on the assumption that not all 5 of the people who read my blog are code hackers, I'll start with the very basics.
In Terminal, make sure you are in the right directory. The right directory is the one where you have put the file that will parse in H2O but not in R (this may go without saying, but seriously, I totally forget this on a regular basis and as a result got to learn the technical term “drop a turd” this evening).
Here's your instruction line: perl -pe 's/\r\n|\n|\r/\n/g' inputfile > outputfiletest. Replace inputfile with the troublesome file you would like to fix, and replace outputfiletest with a name you will recognize. Use a different name for the output than for the input, because the shell empties the output file before perl reads anything. The command rewrites every line ending it finds (\r\n, \n, or a lone \r) as a plain Unix \n. And voila. The caveat: this only fixes line endings, which is the DOS to Unix problem. If Microsoft isn't the source of your sadness, this probably won't help you. Even so, if I find anything else out, I will definitely share.
