June 24th, 2013

Saving Big Data Science is Saving Science


For time is the ultimate non-renewable resource!

Data Science represents the convergence of domain knowledge, data collection, and a series of hypotheses validated or invalidated with math. Big Data Science takes that one step further, into the realm of massive datasets that have become a necessary precondition in science and business. So the business of making this work is the business of doing science, from the scientific process to decision making and solving business problems. Removing the drudgery from Data Science frees the brightest minds of our time to ask BIG questions and refine their hypotheses a dozen times a minute, just as search by Google let each of us use the world's information better than ever before. Big Data Science is set to change the world in so many dimensions!
(Reprinted with permission from the experiments of our math geek!) This is emblematic of issues with the state of R, the lingua franca of data science.

---------- Forwarded message ----------
From: Irene Lang <irene@0xdata.com>
Date: Mon, Jun 24, 2013 at 2:44 AM
Subject: Re: more workloads/datasets
To: SriSatish Ambati <srisatish@0xdata.com>

OK. I need your help with this, please.
I can no longer run even really moderately sized datasets on my laptop. For example – I tried running a straightforward glm validation on a 2MB dataset after generating a model using a training set, and I finally gave up at about 2:30AM after letting it run all afternoon because I needed to get something done today.
Doing this sort of thing on my machine is slowing me down because I don’t have the memory to do anything quickly, and it means that I end up taking hours to run a single test, limiting my ability to play with the data in any meaningful way. It also means that I’ve effectively paper-weighted my computer for several hours, which is kind of painful in terms of getting other tasks done while tests are running – because apparently my whole memory is in use, other functions are nil. So, I know the limitation is my computer and not R, and I know that there has to be a relatively straightforward solution to this. I imagine that if I could run R either on the server or on Amazon Web Services, I wouldn’t have to worry about immediately replacing my computer with one that has more memory (which isn’t my first choice, because this one is otherwise perfectly good).
So, this experiment will now continue on EC2 and a bigger server. However, the problem is not her computer.
