
H2O.ai WIKI

Structured vs Unstructured Data

Structured Data

Structured data is data in a standardized format: it has a well-defined structure, conforms to a data model, follows a consistent order, and is easily accessed by humans and computer programs. Structured data is often stored in a relational database management system (RDBMS).

 

Unstructured Data

Unstructured data does not conform to a predefined data model and has no identifiable structure or organization, so it does not fit naturally into rows and columns. Because it has no fixed format or rules, it is often stored in non-relational (NoSQL) databases.

 

Common Characteristics of Structured Data

Structured data tends to have a range of common characteristics, such as:

  • An identifiable structure that conforms to a data model

  • Presented in rows and columns, such as in a relational database

  • Organized so that the definition, format, and meaning of the data are understood

  • Fixed fields in a file or record

  • Similar groups of data clustered together in classes

  • Data in the same group have shared attributes or types

  • Information is easy to access and query for humans and other programs
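
As a rough illustration of these characteristics, the sketch below builds a small in-memory table with Python's built-in sqlite3 module. The `customers` table, its columns, and the sample rows are hypothetical and chosen only for the example; the point is that fixed, typed fields let a query group and aggregate the rows directly.

```python
import sqlite3

# Hypothetical "customers" table: the schema plays the role of the data model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "  id INTEGER PRIMARY KEY,"
    "  name TEXT NOT NULL,"
    "  country TEXT NOT NULL,"
    "  lifetime_spend REAL NOT NULL)"
)

# Every row has the same fixed fields, in the same order, with the same types.
rows = [
    (1, "Ada", "UK", 120.50),
    (2, "Grace", "US", 310.00),
    (3, "Linus", "FI", 75.25),
    (4, "Alan", "UK", 42.00),
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)

# Because the structure is known in advance, grouping and aggregation
# need no parsing or interpretation of the data.
spend_by_country = conn.execute(
    "SELECT country, SUM(lifetime_spend) FROM customers "
    "GROUP BY country ORDER BY country"
).fetchall()

print(spend_by_country)  # [('FI', 75.25), ('UK', 162.5), ('US', 310.0)]
```

The same question asked of free-form text (for example, emails mentioning a customer's spending) would first require extracting those fields from the text.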

 

Why Structured Data is Important in Machine Learning

Machine learning algorithms can use structured data more readily than unstructured data, because each record already has well-defined fields that map directly to input features. Structured data is also easier to manipulate and query.
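
The difference can be sketched in a few lines of plain Python. The records, field names, and sample texts below are made up for illustration, and the bag-of-words step is just one simple way to represent text, not a recommended pipeline. Structured records turn into a feature matrix almost directly, while free text needs a representation to be constructed first.

```python
from collections import Counter

# Structured: every record has the same named, typed fields (hypothetical data).
records = [
    {"age": 34, "income": 52000.0, "is_member": True},
    {"age": 51, "income": 87000.0, "is_member": False},
]
feature_names = ["age", "income", "is_member"]

# Each field becomes one numeric column; no interpretation is needed.
X_structured = [[float(r[name]) for name in feature_names] for r in records]

# Unstructured: free text has no fields, so features must be derived.
texts = ["Great product, great price", "Price too high"]


def tokenize(text):
    """Split on whitespace, strip basic punctuation, and lowercase."""
    return [word.strip(",.!?").lower() for word in text.split()]


# Build a vocabulary from the texts, then count word occurrences per text.
vocabulary = sorted({word for t in texts for word in tokenize(t)})
X_text = [[Counter(tokenize(t))[word] for word in vocabulary] for t in texts]

print(X_structured)  # [[34.0, 52000.0, 1.0], [51.0, 87000.0, 0.0]]
print(vocabulary)    # ['great', 'high', 'price', 'product', 'too']
print(X_text)        # [[2, 0, 1, 1, 0], [0, 1, 1, 0, 1]]
```

Note that the text features depend on choices (tokenization, vocabulary) that the structured records never required.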

An additional benefit of structured data is that business users with only high-level knowledge of the subject matter can work with it more easily, without needing an in-depth understanding of how different data elements relate. 

Structured data has a longer history of use than unstructured data. As a result, more tools have been built for structured data analysis, giving data managers more product choices than they have for unstructured data. 

 

Structured and Unstructured Data FAQs

Is Social Media Structured Data?

The metadata from social media (post ID, hashtags, user, date, and comment, like, and share counts, etc.) is structured; the content itself is unstructured.
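
One way to picture this split is a record whose metadata fields are typed and queryable while the body is free text. The `Post` class, its field names, and the sample posts below are hypothetical and not tied to any real platform's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    # Structured metadata: fixed fields with known types.
    post_id: int
    user: str
    posted_on: date
    likes: int
    hashtags: tuple = ()
    # Unstructured content: free text with no predefined fields.
    content: str = ""


posts = [
    Post(1, "ana", date(2023, 5, 1), 250, ("ml", "data"), "Loving the new release!"),
    Post(2, "raj", date(2023, 5, 2), 12, ("ml",), "Anyone else seeing slow training?"),
    Post(3, "kim", date(2023, 5, 3), 480, ("travel",), "Sunset over the bay."),
]

# Filtering on metadata is a direct comparison against known fields.
popular_ml_ids = [p.post_id for p in posts if p.likes >= 100 and "ml" in p.hashtags]

print(popular_ml_ids)  # [1]
```

Answering a question about the content itself, such as whether a post is a complaint, would require text analysis rather than a field lookup.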

What Are Examples of Unstructured Data?

  • Email

  • Text files

  • Social media content

  • Text messages

  • Voicemails

  • Instant messaging

How is Unstructured Data Stored?

There are numerous ways that unstructured data can be stored, such as:

  • Application forms

  • NoSQL databases

  • Data lakes

  • Data warehouses

What Tools are Used to Analyze Unstructured Data? 

  • Excel and Google Sheets

  • RapidMiner

  • KNIME

  • Power BI

  • Tableau