Chocolate Cake (Wednesday, June 5, 2013)
You know how sometimes you have one bite of really good chocolate cake, or a really amazing peach, and totally assume you could eat another 30 lbs of it without regard for good manners or physical limitations? Yeah. Diminishing marginal returns mean the last bite almost never turns out to be as good as the first one: having a little and having a lot are different.
Similarly, ingesting 1000 bytes of data and ingesting 1 byte are pretty different. When you're used to little bytes and start fooling around with big ones, the differences might not be immediately obvious or intuitive (maybe they are, and if that's the case, awesome! Now go eat your cake).
When I'm trying to make sense of a problem, I like to start with a small example and work through it to get a feel for the mechanics. With Big Data this gets a little weird, for two reasons: we're almost always mining, so we don't always know what to look for or what to expect, and we need some intuition for how to get from the small to the big. To help with that, I am trying to build some intuitive explanations. You can find them topically under the posts whose titles begin with "Big vs. Little…".
Sometimes using H2O to look at small data sets really makes no sense, for whatever reason. In those cases we'll talk about why, and I'll use R for comparison. I'll also provide the relevant output for each, so that you can see how to get from one to the other. If you're not familiar with R, go here. Additionally, it's worth mentioning that I'm tackling one set of assumptions at a time, so in general I'll work as though we are doing ad-hoc analysis instead of post-hoc analysis. There are some super cool differences between mucking and mining, but I want to talk about those separately.