PLANNING CONTENT MIGRATION FROM EDRM AND ECM PLATFORMS
Author: Jonathan Stuckey
Audience: Technical consultant, Solution designer, Information manager, Project manager.
Managing issues in content migration
As part of a migration project everyone focuses on the act of moving content - mostly without understanding what's actually involved. This article is one in a series looking at the practical tasks associated with the actual movement of content in a legacy platform migration to SharePoint Online.
NOTE: This article uses the terms 'Data' and 'Content' interchangeably as all topics covered below apply to migrating structured data, as well as unstructured and semi-structured content. In modern content management systems and data platforms, the same requirements, disciplines and controls are required, and we therefore do not differentiate.
There is a range of issues to manage in any migration, regardless of platform. The core information issues are covered below, but it is important to remember that successful migration projects focus on much more than just moving content. Guidance on other aspects of these kinds of projects is covered in other posts.
Data availability
For migration planning, once you have basic reporting on size, volumes, growth rates etc., the plan becomes hugely dependent on Data Availability, i.e. when and how you can manage exports and deployment.
With large volumes of data / content to migrate, planning is required around managing exports and imports. Extracting data from a legacy system often depends on internal access to content, how often current and dormant content is referenced, and whether system or service jobs need to be suspended or worked around for the export. On the flip side, Microsoft throttles large volumes of data being uploaded unless the uploads are properly scheduled and use the available options for staging and bulk-loading via designated APIs.
Mapping out the usage profile(s) for access to data, and scheduling or managing co-dependent tasks, becomes a significant step before any move. If you have large volumes (tens of terabytes up to petabytes), it is a major part of planning the export, staging of content, design for the new structure and upload management.
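As a rough planning aid, the arithmetic behind an availability window can be sketched in a few lines of Python. This is only a sketch: the function name, the throughput figure and the window length are illustrative assumptions, not measured values, and real throughput varies with throttling and content mix.

```python
import math

def migration_days(total_gb: float, gb_per_hour: float, window_hours_per_day: float) -> int:
    """Whole days needed to move total_gb when transfers only run inside a daily window."""
    if gb_per_hour <= 0 or window_hours_per_day <= 0:
        raise ValueError("throughput and window must be positive")
    gb_per_day = gb_per_hour * window_hours_per_day
    return math.ceil(total_gb / gb_per_day)

# Hypothetical figures: 40 TB to move, 250 GB/hour sustained, 10-hour overnight window.
print(migration_days(40_000, 250, 10))  # 16
```

Re-running the estimate with throughput measured in a pilot batch gives a more defensible schedule than a vendor headline figure.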
Access and permissions
Ensuring that users retain appropriate access while maintaining data integrity is crucial. The key to planning the process involves:
mapping roles and role-based access,
aligning AD/AAD (Entra) security groups to these roles,
translating legacy system-specific permissions to their closest equivalents in SharePoint Online.
The complexity increases with significant volumes of uniquely permissioned content in the legacy environment, necessitating a thorough data clean-up to match roles to new role-based access types and equate unique permission profiles to target permission levels. Addressing these issues is essential to ensure seamless data access and integrity during the migration.
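The translation step can be sketched as a simple lookup that also flags anything without an agreed equivalent. The legacy permission names below are hypothetical, and the pairing with SharePoint's default permission levels is purely illustrative; the real mapping has to be agreed per source system.

```python
# Hypothetical legacy permission names mapped to SharePoint default permission levels.
# The pairing is illustrative only and must be agreed for each source system.
LEGACY_TO_SHAREPOINT = {
    "read": "Read",
    "write": "Contribute",
    "manage": "Edit",
    "admin": "Full Control",
}

def map_permissions(legacy_acl):
    """Translate {principal: legacy_permission}; collect unmapped entries for clean-up."""
    mapped, unmapped = {}, []
    for principal, permission in legacy_acl.items():
        level = LEGACY_TO_SHAREPOINT.get(permission.lower())
        if level is None:
            unmapped.append((principal, permission))
        else:
            mapped[principal] = level
    return mapped, unmapped

mapped, unmapped = map_permissions({"Finance Readers": "READ", "jsmith": "relate"})
print(mapped)    # {'Finance Readers': 'Read'}
print(unmapped)  # [('jsmith', 'relate')]
```

The unmapped list is the useful output here: it is the backlog of unique permission profiles that need a decision before migration.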
Data integrity
Ensuring data integrity in a migration is often hindered by the variable quality of data in the legacy system. Common issues include:
the volume of items which may be corrupt in the source,
the number of items which do not meet the minimum upload requirements,
names and properties of source content which contain ineligible characters or format types,
paths and name lengths which could exceed the software boundaries,
etc.
Scanning and reporting on the state of the data before, during and after migration has significant benefit for planning and for mapping automation rules, and it also provides half of the data needed for post-migration reconciliation and reports.
Reconciliation may have to offset migration numbers against files with no value (unrequired), files with no valid content (corrupt, zero in size), or System specific features which affect file counts between systems, in order to confirm a valid dataset in the target store. Working with partial, inaccurate or corrupt data will stop a migration in its tracks. Worse still, it can lead to false reporting at sign-off.
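The offsetting described above is simple arithmetic, sketched here in Python. The category names and figures are hypothetical, and the sketch assumes the exclusion categories do not overlap (an item is counted in at most one of them).

```python
from dataclasses import dataclass

@dataclass
class SourceCounts:
    total: int        # items reported by the legacy system
    unrequired: int   # items agreed as having no value, not migrated
    corrupt: int      # unreadable or zero-byte items
    system_only: int  # items that exist only because of platform-specific features

def expected_target_count(counts: SourceCounts) -> int:
    """Items that should exist in the target once agreed exclusions are removed."""
    return counts.total - counts.unrequired - counts.corrupt - counts.system_only

def variance(counts: SourceCounts, migrated: int) -> int:
    """Positive means extra items in the target; negative means items are missing."""
    return migrated - expected_target_count(counts)

counts = SourceCounts(total=100_000, unrequired=4_200, corrupt=310, system_only=90)
print(expected_target_count(counts))  # 95400
print(variance(counts, 95_400))       # 0
```

A non-zero variance is the trigger for investigation before sign-off, rather than a number to be explained away afterwards.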
Metadata
Identifying the metadata required is a critical-path activity for the new schema and for setting up the migration process, so that users and tools can accurately (and reliably) locate content by original reference information as well as new platform data. In our general model of information design we represent metadata as a core pillar, but only one of several.
Metadata critical to the new user experience in SharePoint has to enable backwards referencing to the old platform, while not preventing or crippling the capabilities in SharePoint Online and Microsoft 365. The key mapping is of the legacy properties (ECM, EDRM): how they are captured, and how they are injected into the content as it is added to the new structure. See the section on Structure.
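One way to picture the property mapping is as a lookup from legacy property names to target columns, with the source system recorded alongside for backwards referencing. All property and column names below are hypothetical; a real schema comes out of the metadata design work.

```python
# Hypothetical legacy property names mapped to hypothetical target column names.
COLUMN_MAP = {
    "object_id": "LegacyID",
    "title": "Title",
    "author": "LegacyAuthor",
    "created": "LegacyCreated",
}

def to_target_metadata(legacy_props, source_system):
    """Return (target column values, sorted names of legacy properties with no mapping)."""
    target = {COLUMN_MAP[name]: value
              for name, value in legacy_props.items() if name in COLUMN_MAP}
    target["LegacySystem"] = source_system  # keeps the backwards reference explicit
    dropped = sorted(name for name in legacy_props if name not in COLUMN_MAP)
    return target, dropped

target, dropped = to_target_metadata(
    {"object_id": "0900a1", "title": "Annual report", "lock_owner": "svc"},
    "Documentum",
)
print(target)   # {'LegacyID': '0900a1', 'Title': 'Annual report', 'LegacySystem': 'Documentum'}
print(dropped)  # ['lock_owner']
```

Reporting the dropped properties makes the decision not to carry them forward visible, rather than a silent side effect of the tooling.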
Data history and audit
Depending on whether you are migrating from a legacy Document and Records Management system, an Enterprise Content Management system or even just file-servers, there will be decisions on what, and how much, of the data and content audit and history can or should be retained.
EDRM and ECM systems have content versioning and history features that file-servers don't, and they usually have some degree of Audit tracking. So the focus here is to:
understand the source system's storage architecture and how content is managed in the structure,
determine if there is a means to export or extract an item and its associated versions (sequenced),
capture the history of the item exported, including changes to metadata and access.
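Sequencing matters because version labels sorted as plain text put "1.10" before "1.2". The sketch below assumes dotted numeric version labels and a hypothetical export record with a "label" field; legacy systems with other labelling schemes need their own key function.

```python
def version_key(label):
    """Turn a dotted label such as '1.10' into a tuple of ints for numeric ordering."""
    return tuple(int(part) for part in label.split("."))

def sequence_versions(versions):
    """Return exported version records in upload order, oldest first."""
    return sorted(versions, key=lambda v: version_key(v["label"]))

exported = [{"label": "1.10"}, {"label": "1.2"}, {"label": "1.0"}, {"label": "2.0"}]
print([v["label"] for v in sequence_versions(exported)])
# ['1.0', '1.2', '1.10', '2.0']
```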
Structure
The first step with structure is to understand the source system's navigation and physical store model. This is key in evaluating the existing structure for embedded metadata, which can inform the new design or could be captured as metadata, and for potential issues with the target SharePoint limitations and boundaries.
Without addressing these known constraints, direct replication to SharePoint will cause issues. These structural blockers are used to inform pre-migration clean-up and the migration automation rules.
Before migrating to the new system it is important to identify and address:
the depth of the navigation hierarchy (number of subfolders),
path lengths which could exceed the SharePoint boundary limit (400 characters) or segment limit (255 characters),
content file-name lengths which exceed the maximum (256 characters),
path containers (folders) with unsupported characters, e.g. macrons,
individual files whose size exceeds the maximum upload limit for migration.
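A pre-migration scan for the structural checks above can be sketched as follows. The 400 and 255 character figures are the ones quoted in this article; the depth threshold and the set of unsupported characters are assumptions to confirm against current Microsoft guidance, and the path passed in is assumed to be the full target path.

```python
MAX_PATH = 400     # full path limit quoted in this article
MAX_SEGMENT = 255  # per-segment (folder or file name) limit quoted in this article
MAX_DEPTH = 10     # hypothetical project threshold, not a platform limit
UNSUPPORTED = set('"*:<>?\\|')  # assumed character set; verify before relying on it

def check_path(path):
    """Return a list of structural issues found in a '/'-separated target path."""
    issues = []
    segments = [s for s in path.split("/") if s]
    if len(path) > MAX_PATH:
        issues.append("path too long")
    if any(len(s) > MAX_SEGMENT for s in segments):
        issues.append("segment too long")
    if len(segments) - 1 > MAX_DEPTH:  # folders only, excluding the file name
        issues.append("too deep")
    if any(UNSUPPORTED & set(s) for s in segments):
        issues.append("unsupported character")
    return issues

print(check_path("sites/records/Finance/report.docx"))  # []
print(check_path("sites/records/What now?.docx"))       # ['unsupported character']
```

Run across the full export manifest, the output feeds both the pre-migration clean-up and the automation rules (rename, flatten, or exclude).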
System specific features
All EDRM and ECM platforms have specific features and functionality which differentiate them in the market. In some cases these are the key to a specific business process or task management, but for most organisations we see, they support the edge-cases of user requirements or they are gimmicks which haven't been adopted. These features are unique to the platform and to the organisation's use of it; consequently we have to tackle them as 'exceptions'. In order to support content migration your project will need to repeat the following process for each one:
identification of the features deployed,
assessment of their usage and importance,
definition of the minimum bar required for support in the new platform,
solution option development,
testing for migration
...and agree best approach with the customer based on this output. For example:
Documentum has a content-type concept of Virtual Document, which allows the user to combine document content of multiple formats through its life and treat all of it as a single document, e.g.
a report may start as research notes captured in an RTF file,
we capture some data for modelling and to generate tables or charts for the report, but these are in an XLSX file,
we merge content from both into the official Work report template as a DOCX, and
finally export it as a PDF for distribution.
In SharePoint these are captured and managed as separate documents, only related by the users' addition of optional metadata or retaining copies in the same location (library, folder).
The goal is to find a way to export the items from Documentum and keep the association, both in terms of metadata and the virtual document references, so that when we load all the different formats to SharePoint they are grouped together both physically (e.g. in a Document Set) and with metadata properties.
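The grouping step can be sketched as below, assuming the export produces a manifest in which each item carries the identifier of its parent virtual document. The field names "virtual_doc_id" and "name" are hypothetical, and how the manifest is actually produced depends on the export tooling.

```python
from collections import defaultdict

def group_by_virtual_document(items):
    """Group exported file names by parent virtual document: one target container each."""
    groups = defaultdict(list)
    for item in items:
        groups[item["virtual_doc_id"]].append(item["name"])
    return dict(groups)

manifest = [
    {"virtual_doc_id": "VD-001", "name": "research-notes.rtf"},
    {"virtual_doc_id": "VD-001", "name": "model-data.xlsx"},
    {"virtual_doc_id": "VD-002", "name": "policy.docx"},
    {"virtual_doc_id": "VD-001", "name": "work-report.pdf"},
]
print(group_by_virtual_document(manifest))
# {'VD-001': ['research-notes.rtf', 'model-data.xlsx', 'work-report.pdf'], 'VD-002': ['policy.docx']}
```

Writing the same identifier into a metadata column on each file keeps the association intact even if items are later moved out of their shared container.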
There are many such examples depending on the specifics of the legacy system. Common ones to be addressed include:
linked documents - sometimes called a soft-link or redirection, where a document may have many references in the file plan or system structure which point to one physical file, but all the references look like the actual item. Often the item cannot be removed or sentenced until all references to the linked document have been removed.
external documents - sometimes called a remote document or item. This is content, usually rich media like video files where each item may be extremely large, which is held and managed on external storage such as a file-server. The lifecycle, access and management metadata is held in the legacy platform; the physical file is actually elsewhere.
physical files - reference information pointing to actual historic physical files and archive boxes, usually pointing to a reference in externally managed storage or an archive facility. This content is usually just a single entry of metadata and linked references per physical item.
...
There are loads of these in EDRM systems, and ECM systems will also hold reference for webpage, blog, wiki, streamed-media and other assets as well.
All need to be identified by type, and the process for each verified, tested and confirmed prior to migration.
What's next in the migration
During the process we are gathering important information which will inform design in the new world, and which will direct user and stakeholder engagement in making decisions to manage the issues. The identification of the issues, the collection of the decisions and the key design requirements for the future Information Architecture are a natural flow-on from the above.
In addition, ensure you have appropriate, robust and comprehensive tooling (not just PowerShell). The different vendors who created the various legacy platforms over the years used a multitude of development models and technologies, and many of them do not provide simple access for export. At Spoke we undertook an industry review a few years ago and settled on Tzunami migration suite as our preferred option.
To determine which tools you need, and how best to evaluate which is most appropriate for your project, we have a model that enables organisations to short-circuit the process.
Need help with content migration? Give us a call.
If you want to talk about migrating from legacy Document Management or ECM systems to Microsoft 365 and SharePoint without running into all the roadblocks, contact us at: hi@timewespoke.com
About the author: Jonathan Stuckey