Merging structured and semi-structured data models gives you the flexibility to interpret and display data in whichever way best represents what is being analysed. It gives you the best of both worlds: high performance and easy navigation together with flexibility and scalability. Before we explain how this combination works, though, it is important to understand what structured and semi-structured data are and how the two differ.
Here are the basics. Fully structured data follows a predefined schema. A typical example of fully structured data is a relational database. Designing a database schema is an elaborate process, because the schema has to be defined before the content is created and the database is populated. The schema defines the type and structure of the data and its relations. The pros? High performance and easy navigation. The cons? Structured data lacks flexibility and scalability.
Here is a more technical breakdown of structured data:
Structured data sits in columns that have to be defined in the data source schema. Each object has a well-defined list of properties (columns) where data is collected.
- Object evolution requires schema evolution. Adding more properties means adding more columns.
- New objects need more tables that define the new object exactly.
- Multi-valued properties are restricted to what the schema defines: one-to-many and many-to-many table relations.
- Relations between objects are well defined and have to evolve with the schema.
- Data is more compressed and efficiently stored.
- Storing and retrieving data is faster and more efficient.
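The points above can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The `customer` and `phone` tables, their columns, and the sample values are hypothetical and exist only for illustration; the sketch shows a multi-valued property modelled as a one-to-many relation, and a new property that cannot be stored until the schema itself is changed.

```python
import sqlite3


def build_structured_db():
    """Create an in-memory database whose schema is defined up front."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    # Every property of a customer is a declared column.
    conn.execute(
        "CREATE TABLE customer ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL)"
    )
    # A multi-valued property (phone numbers) needs its own table
    # and a one-to-many relation back to the customer.
    conn.execute(
        "CREATE TABLE phone ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id),"
        " number TEXT NOT NULL)"
    )
    return conn


def column_names(conn, table):
    """Return the declared column names of a table, in schema order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = build_structured_db()
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO phone (customer_id, number) VALUES (?, ?)",
    [(1, "555-0100"), (1, "555-0101")],
)

# Object evolution requires schema evolution: an email address cannot be
# stored until a column for it has been added.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
conn.execute("UPDATE customer SET email = 'ada@example.com' WHERE id = 1")
print(column_names(conn, "customer"))
```

The schema change is a single statement here, but on a populated production database it is typically a planned migration, which is the lack of flexibility described above.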
Semi-structured data, on the other hand, does not require a schema definition, although you can still define one if you wish. A schema is optional here, whereas it is required for structured data. The schema can also be derived from pre-existing instances. The typical example of semi-structured data is XML. The pros? Flexibility and scalability: semi-structured data is much easier to work with, less time-consuming, and you can build on it. The cons? It is not as efficient, because queries are slower than in a constrained structure.
Here is a more technical breakdown of semi-structured data:
Semi-structured data is loosely defined. An object has no predefined list of columns or values.
- New objects can be added without schema changes; objects are defined without adding new tables.
- Any object can have any number of properties and is not constrained by its object type, i.e. each object type can have a different list of properties.
- Any object can have any number of multi-values, which can in turn have a complex structure of their own. There are no restrictions on the number of nested properties or collections, or on the level of nesting (i.e. how many sub-levels).
- Defining relations between objects is purely a matter of pointing one object at another by reference value; there are no schema changes.
- Data is stored in fewer tables, but there is more overhead in maintaining the complex structure.
- Serving (storing and retrieving) objects (build time) is a more complex operation
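As a rough illustration of the list above, here is a small sketch that uses JSON documents in a Python dictionary as a stand-in for a semi-structured store. JSON is used instead of XML only to keep the example short; the `store`, `put`, `get`, `resolve` and `find` names and all sample objects are hypothetical.

```python
import json

# One generic container holds every object type; there are no per-type tables.
store = {}


def put(obj_id, obj):
    """Serialise an object of any shape and store it under its id."""
    store[obj_id] = json.dumps(obj)


def get(obj_id):
    """Rebuild the object from its serialised form (the 'build time' cost)."""
    return json.loads(store[obj_id])


def resolve(obj, ref_field):
    """Follow a relation, which is nothing more than a stored reference value."""
    return get(obj[ref_field])


def find(field, value):
    """Query by property: every object has to be parsed and inspected."""
    return [oid for oid in store if get(oid).get(field) == value]


# Two object types with different, nested properties and no schema change.
put("cust-1", {
    "type": "customer",
    "name": "Ada",
    "phones": [
        {"kind": "home", "number": "555-0100"},
        {"kind": "work", "number": "555-0101"},
    ],
})
put("order-7", {
    "type": "order",
    "customer_ref": "cust-1",
    "lines": [{"sku": "A1", "qty": 2, "options": {"gift_wrap": True}}],
})

print(resolve(get("order-7"), "customer_ref")["name"])
print(find("type", "order"))
```

Note that `find` has to deserialise every stored object to answer a simple question, which is one way to picture why queries are less efficient than in a constrained structure.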
Combining Structured and Semi-Structured Data Models
Combining the two yields a model that has some defined columns (the structure) as a base, plus extension data that is collected on the fly from various tables/sources.
- Fast access and indexing on the structured part
- The model can evolve without schema changes by using the semi-structured part
- New generic objects can be added to the mix and linked to the structured or semi-structured part
- Data is stored in a mix of data tables
- Serving objects can be split: the fast structured side is served by default, with the additional cost of the semi-structured side as an option
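One possible way to picture such a combined model is a table with a few fixed, indexed columns plus a free-form extension stored as JSON text. This is a sketch under that assumption, not the only way to build a hybrid: the extension data could equally live in separate generic tables. The `product` table, the `extra` column, and the `save` and `load` helpers are hypothetical names.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Structured part: fixed columns. Semi-structured part: the 'extra' JSON text.
conn.execute(
    "CREATE TABLE product ("
    " id INTEGER PRIMARY KEY,"
    " sku TEXT NOT NULL,"
    " price REAL NOT NULL,"
    " extra TEXT NOT NULL DEFAULT '{}')"
)
# Indexing is available on the structured part only.
conn.execute("CREATE INDEX idx_product_sku ON product (sku)")


def save(sku, price, **extra):
    """Store the fixed columns as-is and any other properties as JSON."""
    cur = conn.execute(
        "INSERT INTO product (sku, price, extra) VALUES (?, ?, ?)",
        (sku, price, json.dumps(extra)),
    )
    return cur.lastrowid


def load(product_id, with_extra=False):
    """Serve the structured side; parse the extension only when asked."""
    cols = "sku, price, extra" if with_extra else "sku, price"
    row = conn.execute(
        f"SELECT {cols} FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        return None
    obj = {"sku": row[0], "price": row[1]}
    if with_extra:
        obj["extra"] = json.loads(row[2])
    return obj


# New properties appear without any schema change.
pid = save("A1", 9.5, colours=["red", "blue"], dimensions={"w": 10, "h": 4})
print(load(pid))
print(load(pid, with_extra=True))
```

Callers that only need the structured columns never pay for parsing the extension, which mirrors the split serving described in the last bullet.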
As you can see, data modeling for performance and data modeling for flexibility do not always go together. However, investing in a design that combines the best of the two can be very beneficial for your business and data systems. Contact us to get more information and ideas on how advanced data model designs can work for your business.