Structured vs unstructured data management
Big Data is big business – if you have the skills to manage it
Effective use of data is an increasingly high priority for businesses, with databases and documents of all kinds scrutinised for trends and information that may provide a competitive edge.
Successfully carrying out this kind of analytics is sometimes easier said than done, though, not least because of the variety of data sources out there. Minds may fly quickly to the granular level when thinking about data types: SQL, NoSQL, Excel, Oracle databases and more. Initially, though, it's best to take a more high-level view and consider whether the data in question is structured or unstructured, as it's this, rather than the format in which it's held, that can have the greatest influence on how it's managed and analysed.
It's worth, therefore, taking time to consider whether your data is structured, unstructured, semi-structured or a mixture of all three, and then deciding how best to manage it.
What is structured data and how is it managed?
Structured data is often what first springs to mind when you think of both data in general and Big Data analytics.
This is the kind of information that can be stored in traditional databases composed of columns and rows and is also known as relational data. A customer database consisting of names, addresses, telephone numbers, order frequency and type, and so on is an example of structured data. Similarly, a database for clinical trials containing demographic information, whether someone is on a placebo or the real treatment, dosage and impact would also be structured data.
To an extent, by its very nature, structured data is already "managed": it's kept in an orderly fashion in a single location. Another layer of management can be added to this, however, in the form of a relational database management system (RDBMS).
These systems allow users to create, update and administer relational (i.e. structured) databases. The majority are queried using SQL (Structured Query Language) or a dialect of it, and some of the most widely used, such as MySQL, are open source. A notable proprietary example is Oracle's database system, Oracle DB, which is particularly popular for managing large datasets and as such is often found in use in the financial services sector.
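As a rough illustration of what "create, update and administer" looks like in practice, the sketch below uses SQLite through Python's built-in sqlite3 module. The table name, column names and sample customers are assumptions made up for this example, loosely based on the customer database described earlier; a production RDBMS would be configured quite differently.

```python
import sqlite3

# In-memory SQLite database; the schema below is an illustrative assumption.
conn = sqlite3.connect(":memory:")

# Create: structured data lives in a table with named, typed columns.
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "phone TEXT, "
    "orders_per_year INTEGER)"
)

# Insert a couple of made-up rows.
conn.executemany(
    "INSERT INTO customers (name, phone, orders_per_year) VALUES (?, ?, ?)",
    [("Ada Example", "555-0100", 12), ("Bob Sample", "555-0101", 3)],
)

# Update: change one customer's order frequency.
conn.execute(
    "UPDATE customers SET orders_per_year = ? WHERE name = ?",
    (4, "Bob Sample"),
)
conn.commit()

# Query: because every row shares the same columns, filtering is a one-liner.
frequent = conn.execute(
    "SELECT name FROM customers WHERE orders_per_year >= ? ORDER BY name",
    (4,),
).fetchall()
print(frequent)
```

The point of the sketch is the uniform shape of the rows: it is this regularity that lets a single query answer a question about every customer at once.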
While we won't be discussing it in depth here, it's also worth noting that an RDBMS is often embedded in products that also offer far more bells and whistles than just managing data and making it available to queries. For example, Salesforce, the cloud-based customer relationship management (CRM) platform, manages the structured data put into it, but also offers tools like chat, access to the Force.com development platform, analytics and so on. So depending on your needs, it may be worth looking for more than a bare RDBMS.
What is unstructured data and how is it managed?
Unstructured data is anything that can't be organised into a structured database. Common examples are free-flowing text-based interactions, such as email conversations or chat logs, word processing documents, slideshow presentations, image libraries, or videos.
While this may not look like data as you would first imagine it, it is commonly estimated to make up over 80% of data in existence and often offers a wealth of useful information. Together with structured data, it also accounts for one of the three Vs of Big Data: variety (the other two being velocity and volume).
Unstructured data is more difficult to manage than structured data as it doesn't have a uniform format, even if the data source is the same. Indeed, managing it in the way structured data is managed is something of a novel idea, as it's only been feasible to mine it for information since Big Data analytics and AI have taken off.
Unstructured data management (UDM) is essential for successfully making use of all this data. Rather than there being a handful of tools to point to for UDM, there are instead some basic tenets to be followed.
The first of these is sorting the data. Sometimes also called "discovering" and other related terms, this involves compiling your data to see what's there, how long it's existed, how frequently it's accessed and so on. The aim of this is to determine if it's likely to bring future value to the business and is therefore worth archiving and putting in a UDM system.
This is a long process: it can take weeks to scan and sift all this information, so be prepared to put in a fair amount of time and effort at this stage. It's also the point at which metatags should be added, to ensure that the information is easily searchable later on.
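To make this stage more concrete, here is a minimal sketch of an inventory pass in Python. It walks a folder and records each file's size, how long ago it was modified and how long ago it was last accessed, then attaches a simple tag based on the file extension. The FileRecord class, the inventory function and the tagging rule are all hypothetical names and choices for this example, not part of any particular UDM product, and last-access times are only as reliable as the underlying filesystem's settings.

```python
import os
import tempfile
import time
from dataclasses import dataclass, field
from pathlib import Path

SECONDS_PER_DAY = 86400


@dataclass
class FileRecord:
    """One inventory entry (hypothetical structure for this sketch)."""
    path: str
    size_bytes: int
    age_days: float   # days since last modification
    idle_days: float  # days since last access; depends on filesystem atime settings
    tags: list = field(default_factory=list)


def inventory(root, now=None):
    """Walk `root` and return a FileRecord for every file found."""
    now = time.time() if now is None else now
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            info = full.stat()
            # Assumed tagging rule: use the file extension as a crude metatag.
            suffix = full.suffix.lower().lstrip(".") or "no-extension"
            records.append(FileRecord(
                path=str(full),
                size_bytes=info.st_size,
                age_days=(now - info.st_mtime) / SECONDS_PER_DAY,
                idle_days=(now - info.st_atime) / SECONDS_PER_DAY,
                tags=[suffix],
            ))
    return records


# Small demonstration on a throwaway folder containing a single text file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.txt").write_text("meeting notes")
    records = inventory(tmp)

for record in records:
    print(record.tags, record.size_bytes)
```

In a real project the tags would come from richer sources (document owner, department, content classification), but even a crude pass like this shows what exists and how stale it is.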
Storage and availability
With the data sorted, it now needs to be stored in a suitable location with attributes that make it easily and automatically accessible.
Storage locations include general cloud storage, such as Microsoft Azure or AWS S3, or on-premises data lakes. These both allow the information to be stored in its "natural" state - which is to say there's no need to try and put it into database format - and also make it available for automated querying through APIs.
When considering which type of storage to use, it's also worth taking into account how frequently the data being stored is accessed. If it's relatively infrequent, it should probably be put into "cold" storage, which is frequently much cheaper than keeping it readily accessible at all times - although it is slower to access initially when you finally do need to query it.
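The hot-versus-cold decision can be sketched as a simple rule based on when an item was last accessed. The 90-day threshold, the tier names and the choose_tier function below are assumptions chosen purely for illustration; real storage services define their own tiers, pricing and retrieval delays, so treat this as a way of thinking about the rule rather than a recommended policy.

```python
from datetime import datetime, timedelta

# Assumed threshold: anything untouched for 90 days is a candidate for cold storage.
COLD_AFTER = timedelta(days=90)


def choose_tier(last_accessed, now, cold_after=COLD_AFTER):
    """Return "cold" for data not accessed within `cold_after`, else "hot"."""
    return "cold" if now - last_accessed > cold_after else "hot"


# Fixed dates keep the example reproducible.
now = datetime(2024, 6, 1)
recent = choose_tier(datetime(2024, 5, 20), now)   # accessed 12 days earlier
stale = choose_tier(datetime(2023, 11, 1), now)    # accessed about seven months earlier
print(recent, stale)
```

A rule like this could be fed by the access information gathered during sorting, so that rarely used material is moved to cheaper storage automatically.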