Migrating to Office 365? What is your metadata strategy?

Are you in the process of planning your content migration to Office 365? If so, have you thought about your metadata strategy? The aim of this post is to highlight the importance of having a clear and consistent metadata strategy for the success of your electronic document and records management solution with SharePoint.

A wide variety of documents is used today in business, production and services. Applications, questionnaires, invoices, drafts and other documents are essential for any company. With modern information technology, paper documents alone are no longer sufficient, and most data is converted to electronic form for storage, analysis and processing.

Metadata is the single most important base artifact in SharePoint: it is what makes content actionable. What do I mean by actionable content? Actionability of enterprise content refers to your ability to:

  • Run high-precision content searches
  • Visualize and navigate through the content efficiently
  • Route documents through workflows
  • Apply content security at a granular level
  • Discover content fast

Unfortunately, 80% of all SharePoint implementations lack a clear and consistent metadata strategy. As a result, most organizations get only 30-40% of the native capabilities built into SharePoint.

Where do we start? It usually starts with a network drive full of disorganised documents.

The common approach to bringing this content into SharePoint is to use a data migration tool to move the documents, folders and all, into a document library. The danger of this approach is that you lose the actionability of your content. Why? You won’t have any metadata that provides context for the content you migrate into SharePoint. This approach usually results in creating another expensive file share, or in treating SharePoint as just a file repository.

Many organizations have relied on end users to fill in metadata for documents, but experience shows that this is the least followed information policy in practice. In this post we will look at how we can work with documents smarter rather than harder. The challenge is no longer about document management. It is about understanding your data at the document level.

The Solution:

The most important aspect of arriving at a sustainable electronic document and records management solution (eDRMS) with SharePoint is to analyse the inbound and outbound documents at the content level.

Metadata is the underlying oxygen that powers most of the great SharePoint features listed above. Metadata forms the basis for content types and managed properties. Content types and managed properties in turn form the basis for the design of document libraries, rule-based collaborative workflows, records management and archiving policies, and enterprise search strategy.
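
To make the relationship between metadata and content types concrete, here is a minimal sketch in plain Python. It does not use any SharePoint API: the `ContentType` class, the `missing_metadata` helper and the field names are all hypothetical, chosen only to show how a content type can be thought of as a named set of required metadata fields that a document either satisfies or does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentType:
    """Hypothetical stand-in for a content type: a name plus required metadata fields."""
    name: str
    required_fields: tuple


def missing_metadata(content_type, metadata):
    """Return the required fields that are absent or empty in a document's metadata."""
    return [f for f in content_type.required_fields if not metadata.get(f)]


# Illustrative content type; the field names are assumptions, not a SharePoint schema.
INVOICE = ContentType("Supplier Invoice", ("AccountNumber", "Amount", "InvoiceDate"))

doc_metadata = {"AccountNumber": "ACC-1042", "Amount": "250.00"}
print(missing_metadata(INVOICE, doc_metadata))
```

A document with gaps like this one cannot reliably drive a workflow, a retention policy or a refined search, which is why the metadata has to be settled before the libraries are designed.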

This is the foundation for designing an effective ECM solution to manage the document lifecycle. At the content level, business documents can be broadly categorized into three types.

  1. Structured Documents
  2. Semi-structured Documents
  3. Non-structured Documents

Structured documents: Documents with dedicated data fields that remain constant in quantity, position and formatting from copy to copy are called structured. These forms are often issued in print to be filled in by hand.

Semi-structured documents: Documents with data fields that differ in quantity, position and formatting from copy to copy are called semi-structured or flexible. A supplier invoice is a good example: invoices are issued by different companies, so they differ in formatting and in the number of line items. All invoices include an account number and the amount of payment, but these are located in different parts of the document.
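
As a rough illustration of why semi-structured documents need content-based rather than position-based extraction, the sketch below pulls an account number and an amount out of two invoice texts that place them in different spots. The label patterns, the `extract_invoice_fields` function and the sample invoices are assumptions made for this example. Real capture software typically relies on far richer rules or trained models.

```python
import re

# Assumed label patterns; real invoices would need rules per supplier or a trained model.
ACCOUNT_RE = re.compile(
    r"Account\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
)
AMOUNT_RE = re.compile(
    r"(?:Total|Amount\s+Due)\s*:?\s*\$?\s*([0-9][0-9,]*\.[0-9]{2})", re.IGNORECASE
)


def extract_invoice_fields(text):
    """Find the account number and amount wherever they appear in the text."""
    account = ACCOUNT_RE.search(text)
    amount = AMOUNT_RE.search(text)
    return {
        "AccountNumber": account.group(1) if account else None,
        # Strip thousands separators so the amount is easy to parse later.
        "Amount": amount.group(1).replace(",", "") if amount else None,
    }


# Two made-up invoices with the same fields in different positions.
invoice_a = "ACME Ltd\nAccount No: AC-1001\nItems: 3\nTotal: $1,250.00\n"
invoice_b = "Amount Due 89.90\nThank you for your business\nAccount Number #ZX9-77\n"

print(extract_invoice_fields(invoice_a))
print(extract_invoice_fields(invoice_b))
```

Both invoices yield the same two metadata fields even though the layout differs, which is the property that makes semi-structured documents good candidates for automatic indexing.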

Non-structured documents: Non-structured documents present information in free form, for example contracts, letters, orders and diagrams. Non-structured documents can be automatically identified as supplements to structured or flexible documents, and then exported to image and searchable PDF files. Index fields can be captured from non-structured documents automatically or manually. A typical non-structured document processing scenario would be converting a paper archive to electronic form, capturing a couple of index fields required for attribute-based search.

70% of Business to Business (B2B) documents can be categorized as structured or semi-structured. Here are some specific examples by industry:

  • Healthcare: patient records, medical prescription forms, clinical research surveys
  • Accounting: invoices, purchase orders, payment slips
  • Human Resources: job applications, personnel forms, curricula vitae, contracts
  • Banking and Finance: deposit notes, credit applications, contracts, mortgage documents, loan applications
  • Insurance: insurance claim and benefit forms, policy agreements
  • Transportation: shipping documents, receipt notifications, bills of lading, consignment notes
  • Education: examinations, voting ballots, student surveys, registration forms
  • Government: tax forms, contracts, voting ballots

The most labor-intensive and time-consuming part of working with electronic documents has been data input into SharePoint. Data could only be entered by hand, which was reasonable for a small amount of information. However, this doesn’t work well with large document volumes: the speed of manual entry cannot be increased on demand. Manual data entry is not the optimal way.

The alternative, a simpler and more effective way, is to use an automatic data capture system.

Modern OCR and document data capture software can process structured, semi-structured and non-structured documents with minimal knowledge worker intervention, and integrates seamlessly with SharePoint. When such a tool is correctly integrated with your SharePoint farm, you will look at SharePoint as an operational cost-cutting platform that drives all your inbound business documents rather than a simple file repository for your Word, Excel and PowerPoint files.

Automatic data capture consists of the following stages:

1. Scan: Documents are scanned or captured using a document scanner or a mobile device such as an iPhone, iPad or other tablet. Electronic documents that you already have in file shares do not require this step.

2. Auto classification and indexing: The scanned documents are then sent to intelligent document recognition software, which identifies the document and extracts its metadata based on business rules that you define. Characters that were recognized with low certainty are sent to an operator for checking (verification).

3. Export to SharePoint: Confirmed data is finally auto-exported to workflow-enabled SharePoint document libraries as a searchable PDF file, with the required metadata extracted and validated.
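
The verification and export stages can be sketched as follows. This is only an illustration under stated assumptions: the `RecognizedField` type, the 0.90 confidence threshold, the helper functions and the sample values are hypothetical and do not correspond to any particular capture product or to the SharePoint API. The point is the flow: confident fields pass straight through, uncertain ones wait for an operator, and only a fully confirmed record is exported.

```python
from dataclasses import dataclass

# Assumed cut-off below which a recognized field goes to a human operator.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class RecognizedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as a hypothetical recognition engine might report


def split_for_verification(fields, threshold=CONFIDENCE_THRESHOLD):
    """Separate confidently recognized fields from those needing operator review."""
    confirmed = [f for f in fields if f.confidence >= threshold]
    uncertain = [f for f in fields if f.confidence < threshold]
    return confirmed, uncertain


def build_export_record(file_name, fields, corrections):
    """Merge operator corrections and return the metadata to export with the PDF."""
    confirmed, uncertain = split_for_verification(fields)
    metadata = {f.name: f.value for f in confirmed}
    for f in uncertain:
        if f.name not in corrections:
            # Refuse to export until every uncertain field has been verified.
            raise ValueError(f"Field {f.name!r} still needs verification")
        metadata[f.name] = corrections[f.name]
    return {"file": file_name, "metadata": metadata}


fields = [
    RecognizedField("AccountNumber", "AC-1001", 0.98),
    RecognizedField("Amount", "1250.O0", 0.62),  # letter O misread for a zero
]
record = build_export_record("invoice-0001.pdf", fields, {"Amount": "1250.00"})
print(record)
```

Raising an error for unverified fields is a deliberate choice in this sketch: it keeps unvalidated metadata out of the document library, where it would quietly degrade search and workflow routing.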


Your success:

The days of relying on your end users to input metadata into SharePoint are becoming a thing of the past. With the right tools and processes in place you can work smarter rather than harder with documents in SharePoint.

Having a solid metadata strategy equipped with automation tools is key to your success in gaining business value from SharePoint. The aim should be to align SharePoint with your departmental document-centric business processes. It is so far the most proven method of getting a faster Return On Investment (ROI) from your SharePoint implementation. In doing so, you will:

  • Create faster processes
  • Increase the business bottom line by reducing operational costs
  • Create process transparency
  • Increase environmental control
  • Improve data quality
  • Ensure information assurance and regulatory compliance

So what is your metadata strategy? Whether you are planning to migrate from file shares to SharePoint, or from a previous version of SharePoint such as MOSS 2007 to SharePoint 2010 or SharePoint 2013, it is a good time to rethink your metadata strategy and get it right.