Open source h2oGPT
Create private, offline GPT with h2oGPT
"Open foundation model developers lead the way"
According to the 2023 Foundation Model Transparency Index from Stanford University's Center for Research on Foundation Models
Compare different LLMs
Chat with your documents
Expert settings: configure your model for your use case
h2oGPT simplifies the process of creating a private LLM
By using a local language model and vector database, you can maintain control over your data and ensure privacy while still having access to powerful language processing capabilities.
One such solution is h2oGPT, a project hosted on GitHub that brings these components together in an easy-to-install package. It includes a large language model, an embedding model, a database for document embeddings, a command-line interface, and a graphical user interface.
It supports several document types, including plain text (.txt), comma-separated values (.csv), Word (.docx and .doc), PDF, Markdown (.md), HTML, EPUB, and email files (.eml and .msg).
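To make the pattern concrete (this is an illustrative sketch, not h2oGPT's own code), the example below wires a local embedding model, a local vector store, and a local LLM into a simple document Q&A loop. The library choices, Hugging Face model ids, sample chunks, and prompt template are all assumptions made for the example:

```python
# Illustrative private document-Q&A sketch; not h2oGPT's internal implementation.
# Assumes sentence-transformers, chromadb, and transformers are installed locally
# and that the chosen models fit on the available hardware.
import chromadb
from sentence_transformers import SentenceTransformer
from transformers import pipeline

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # local embedding model (assumed choice)
store = chromadb.Client()                           # in-memory vector database
docs = store.create_collection("docs")

# Ingest pre-extracted text chunks (PDF/Word/HTML parsing is omitted for brevity).
chunks = [
    "h2oGPT runs entirely on local hardware, so documents never leave the machine.",
    "Document embeddings are stored in a local vector database for retrieval.",
]
docs.add(
    ids=[str(i) for i in range(len(chunks))],
    documents=chunks,
    embeddings=embedder.encode(chunks).tolist(),
)

# Retrieve the most relevant chunks for a question, then answer with a local LLM.
question = "Where does h2oGPT run?"
hits = docs.query(query_embeddings=embedder.encode([question]).tolist(), n_results=2)
context = "\n".join(hits["documents"][0])

generator = pipeline("text-generation", model="h2oai/h2ogpt-oig-oasst1-512-6.9b")  # placeholder model id
prompt = f"Use the context to answer the question.\nContext:\n{context}\nQuestion: {question}\nAnswer:"
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```

h2oGPT packages this workflow for you, adding parsers for the document types listed above and exposing the result through its command-line and web interfaces.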
Released as open source under the Apache 2.0 license
What is it?
Commercially usable code, data, and models
Prompt engineering
Ability to prepare open source datasets for tuning LLMs
Tuning
Code for fine-tuning large language models (currently up to 20B parameters) on commodity hardware and enterprise GPU servers (single or multi-node)
Optimizations
LoRA (low-rank adaptation)
8-bit quantization for memory-efficient fine-tuning and generation (see the example sketch below)
Deployable
Chatbot with UI and Python API
Evaluation
LLM performance evaluation
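As a rough sketch of how the LoRA and 8-bit pieces fit together, a fine-tuning setup with the Hugging Face transformers and peft libraries might look like the following; the base model id, target modules, and hyperparameters are placeholder assumptions, not h2oGPT's exact training recipe:

```python
# Illustrative LoRA + 8-bit fine-tuning setup; not h2oGPT's exact recipe.
# Assumes transformers, peft, and bitsandbytes are installed and a CUDA GPU is available.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "EleutherAI/pythia-6.9b"  # placeholder base model id
tokenizer = AutoTokenizer.from_pretrained(base)  # used to tokenize the instruction dataset

# Load the frozen base weights in 8-bit to roughly halve memory use versus fp16.
model = AutoModelForCausalLM.from_pretrained(base, load_in_8bit=True, device_map="auto")

# Attach small trainable low-rank adapters instead of updating all base weights.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # module names depend on the architecture (assumption)
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of the base parameters
```

Training then proceeds with a standard causal-language-modeling loop (for example, the transformers Trainer) over an instruction-tuning dataset, and only the small adapter weights need to be saved.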
The making of h2oGPT
We have recently released a research paper detailing the work that went into creating the fine-tuned h2oGPT models, including the data and base models used in the process.
Closed AI vs Open Source AI
While popular offerings such as OpenAI's ChatGPT/GPT-4, Anthropic's Claude, Microsoft's Bing AI Chat, Google's Bard, and Cohere's models are powerful and effective, they have certain limitations compared to open source LLMs:
Limitations of Hosted Models
Data Privacy and Security: Using hosted LLMs requires sending data to external servers. This can raise concerns about data privacy, security, and compliance, especially for sensitive information or industries with strict regulations.
Dependency and Customization: Hosted LLMs often limit the extent of customization and control, as users rely on the service provider's infrastructure and predefined models.
Cost and Scalability: Hosted LLMs usually come with usage fees, which can increase significantly with large-scale applications.
Access and Availability: Hosted LLMs may be subject to downtime or limited availability, affecting users' access to the models.
Benefits of Open Source Models
Lower total cost of ownership (TCO): Users can scale the models on their own infrastructure without paying usage fees to a service provider.
Flexible: Models can be deployed on premises or in private clouds, ensuring uninterrupted access and reducing reliance on external providers.
Tunable: Users can tailor the models to their specific needs, deploy them on their own infrastructure, and even modify the underlying code.