Customize and deploy open source AI models, create your own digital assistants and business GPTs.
Open weight small vision-language models for OCR and Document AI
Assess the performance, reliability, safety, and effectiveness of RAG and LLM-based applications.
By H2O.ai Team | minute read | July 09, 2013
Building A TB-Scale Math Platform
Datasets have gotten to PB-scale, but the modeling you can do has been limited to a single-node (e.g. R, SAS) or stuck inside the database or takes hours on Hadoop-like technologies. We have built a simple clustering package, and are using it to do distributed analytics on the sum of all ram in a cluster.
This talk focuses on how the clustering technology, plus a Java-based vector math API, is being used to build full algorithms like GLM/GLMNET, Random Forest and K-means. These algorithms are complex multi-pass programs and traditional distributed programming models expose the distributed boundaries making the algorithms hard to reason about. We have a basic JDK for doing at-scale math, we can run most Plain Olde Java in (distributed) inner loops, communicate via a K/V store with exact Java Memory Model consistency (not lazy consistency). Adding more cpus makes these algorithms run faster, and adding more ram allows larger datasets. We are bringing back Moore’s Law!
Cliff will be presenting.
At H2O.ai, democratizing AI isn’t just an idea. It’s a movement. And that means that it requires action. We started out as a group of like minded individuals in the open source community, collectively driven by the idea that there should be freedom around the creation and use of AI.
Today we have evolved into a global company built by people from a variety of different backgrounds and skill sets, all driven to be part of something greater than ourselves. Our partnerships now extend beyond the open-source community to include business customers, academia, and non-profit organizations.
Make data and AI deliver meaningful and significant value to your organization with our platform.