Backup vs Archive
Data backups are usually risk mitigation measures against technical production issues such as hardware failure, data corruption or accidental deletion. Backups follow a cyclic process and contain periodic copies of production data that can be restored at any time in the event of an issue. They typically provide short to mid-term storage and restore points, which are overwritten once the retention period expires.
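The cyclic retention described above can be sketched in a few lines. This is a minimal illustration only: the function name `split_by_retention`, the daily schedule and the 7-day retention period are assumptions made for the example, not a description of any particular backup product.

```python
from datetime import datetime, timedelta


def split_by_retention(backup_times, now, retention_days):
    """Return (kept, expired) lists of backup timestamps.

    A backup is kept while it is no older than the retention period;
    anything older is eligible to be overwritten.
    """
    cutoff = now - timedelta(days=retention_days)
    kept = [t for t in backup_times if t >= cutoff]
    expired = [t for t in backup_times if t < cutoff]
    return kept, expired


# Hypothetical schedule: one backup per day over the last 10 days.
now = datetime(2016, 11, 21)
daily_backups = [now - timedelta(days=d) for d in range(10)]

kept, expired = split_by_retention(daily_backups, now, retention_days=7)
print(len(kept), "kept,", len(expired), "eligible for overwrite")
```

A real rotation scheme would usually combine several cycles (for instance daily, weekly and monthly copies), but the principle of a cutoff after which copies are recycled is the same.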
Backup applications tend to keep data in a proprietary format, and should not be considered long-term data retention and protection solutions.
Backups are also related to data migration, data cleansing and technology upgrades, and are most likely to be part of “disaster recovery” (DR) strategies, including full, partial and incremental backups. However, DR approaches are not associated solely with data backup (and vice versa); many other elements must be taken into consideration, such as people, process, tools and technologies (…)
Data archives are similarly important; however, they serve a completely different purpose: with ever-growing production data, some information is no longer required, or at least not needed on a regular or daily basis. Archives represent the ‘migration’ of finished data that has reached its end-of-life and can be ‘shelved’. Data might need to be kept for legislative purposes, or in the unlikely event that a future project needs to retrieve it. With current low storage costs, it makes sense to archive this type of used data, which can then be kept for the long term.
Being able to search a data archive is vital for business and compliance reasons, especially when a formal legal search request carries penalties for late submission of legacy information.
From risk mitigation to effective technology deployment and maintenance
If not planned and tested extensively, restoring production data that has been backed up can be a complex and expensive process, because it might involve selective restoration and affect production users and their work-in-progress data. Backup strategies must carefully consider the restore process and multiple scenarios to minimise the impact on production.
Restoring or retrieving archived data typically has less impact on production systems because it does not carry the same level of urgency. This assumes that the data can be restored at all, as it might be in an unsupported format or come from an unsupported system. Archive strategies must therefore typically include neutral format support, or the ability to consult the data easily and relatively quickly.
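As a minimal sketch of what neutral format support and easy consultation can mean in practice, the example below writes records as JSON lines (a plain-text format readable without the originating application), computes a checksum so the archive can be verified later, and searches the archive directly. The record fields, identifiers and function names are hypothetical and chosen only for illustration.

```python
import hashlib
import json


def archive_records(records):
    """Serialise records as JSON lines and return (payload, sha256 checksum)."""
    lines = [json.dumps(record, sort_keys=True) for record in records]
    payload = "\n".join(lines)
    checksum = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return payload, checksum


def search_archive(payload, field, value):
    """Return the archived records whose `field` equals `value`."""
    records = (json.loads(line) for line in payload.splitlines())
    return [record for record in records if record.get(field) == value]


# Hypothetical end-of-life records taken out of a production system.
records = [
    {"id": "P-001", "status": "obsolete", "title": "Bracket rev A"},
    {"id": "P-002", "status": "released", "title": "Bracket rev B"},
]

payload, checksum = archive_records(records)
hits = search_archive(payload, "status", "obsolete")
print(hits)
```

Because the payload is plain text, it can still be read and searched after the source system has been decommissioned, which is the point of choosing a neutral format.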
From a business perspective, archiving strategies are very important because they help the business recognise, and focus on, what is important:
- There is no need to keep maintaining legacy data forever; as business rules change or evolve over time, keeping that legacy data up-to-date is very expensive.
- Legacy data contributes to complex (and costly) data migration or synchronisation, which can fuel business resistance to change.
- Archiving transaction-driven data is not as complex as archiving relational data. In both cases there are ways to maintain some traceability and searchability without impacting production system performance, provided that the logical data model and its dependencies are understood when defining archive strategies.
- Data associativity must be maintained for production data, but not for end-of-life legacy data.
- New technologies bring new ways to search, analyse and retrieve information, including legacy data.
- Selecting and implementing a new IT solution must be coupled with a valid archive and restore strategy to avoid carrying legacy issues over to the new solution; many new Enterprise Resource Planning (ERP) and Product Lifecycle Management (PLM) system introductions struggle with deployment due to (among other things) inadequate or limited archiving and decommissioning strategies, which lead to ineffective integration and data migration strategies (such as ‘integrate all’ or ‘migrate all’).
- Legacy data and processes might carry gaps and quality issues that might be barriers to archive and restore strategies.
In any case, it is also important to consider data access security and protection once data is archived or backed up (outside of its native system and format), and during the restore process itself.
Considerations related to Engineering data
Managing engineering data is not like any other data management activity: it requires an understanding of the core enterprise data model and decision-making process, combined with the ability to deal with creative design work, virtual and physical engineering correlation and validation, the integration of siloed disciplines and technical domains, technology and process integration, etc. As the backbone of product creation, design and engineering data is (or should be) closely interconnected with enterprise data.
In order to define successful data backup strategies, it is necessary to understand the business logical data model, the physical data models of the underlying IT tools and applications, and the data dependencies and relationships. A full backup requires capturing multiple dependencies across possibly several databases and repositories and multiple Bill of Materials (BoM) systems, including alignment with multiple interfaces and integration processes… Incremental horizontal data backups might then be very complex. Data alignment and quality verification tools might be required to assess data consistency. In addition, engineering data is not only metadata: it also includes technical publications, ECAD and MCAD data, and multi-CAD and CAE data, with multi-directional dependencies and associativity.
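To illustrate why dependencies matter for a consistent backup, the sketch below computes everything that must be captured together with a given item by walking a dependency map. The item names and the `dependencies` structure are purely hypothetical; real engineering data would spread these links across several systems, which is exactly what makes the exercise difficult.

```python
from collections import deque


def backup_closure(dependencies, roots):
    """Return every item reachable from `roots` through the dependency map."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        item = queue.popleft()
        for dep in dependencies.get(item, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical dependency map: item -> items it depends on.
dependencies = {
    "assembly-100": ["part-110", "part-120"],
    "part-110": ["mcad-110.model", "drawing-110.pdf"],
    "part-120": ["mcad-120.model"],
    "mcad-120.model": ["cae-120.mesh"],
    "part-999": ["mcad-999.model"],  # unrelated to assembly-100
}

to_backup = backup_closure(dependencies, ["assembly-100"])
print(sorted(to_backup))
```

Backing up `assembly-100` without the other six items it reaches would produce a copy that cannot be restored consistently, while the unrelated `part-999` branch is correctly left out.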
As far as the data lifecycle is concerned, there are multi-dimensional dependencies to manage as data matures and gets re-used or instantiated in design and engineering contexts; upon archiving, relevant data placeholders must be defined to maintain metadata consistency. Archiving certain files might also require the implementation of data visibility, security and access mechanisms. Archiving and restoring data on demand might require the re-alignment or re-connection of data that was already released and cannot be modified any further without impacting data integrity and consistency.
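The idea of a data placeholder can be sketched as follows: when an item is archived, its content moves out of the production store, but a small stub carrying its identifier and revision stays behind so that references from other items still resolve. The store layout, field names and function names are assumptions made for this example only.

```python
def archive_item(store, archive, item_id):
    """Move an item to the archive and leave a metadata placeholder behind."""
    item = store[item_id]
    archive[item_id] = item
    store[item_id] = {
        "id": item["id"],
        "revision": item["revision"],
        "archived": True,  # marks the stub so tools know where to look
    }


def unresolved_references(store):
    """List (item, reference) pairs that point at items missing from the store."""
    return [
        (item_id, ref)
        for item_id, item in store.items()
        for ref in item.get("uses", [])
        if ref not in store
    ]


# Hypothetical production store: an assembly that uses a released part.
store = {
    "asm-1": {"id": "asm-1", "revision": "B", "uses": ["part-7"],
              "content": "assembly structure"},
    "part-7": {"id": "part-7", "revision": "A", "uses": [],
               "content": "geometry"},
}
archive = {}

archive_item(store, archive, "part-7")
print(unresolved_references(store))  # the placeholder keeps asm-1 consistent
```

Simply deleting `part-7` from the store would leave `asm-1` with a dangling reference; the placeholder avoids that without keeping the full content in production.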
Archive and restore mechanisms must accompany engineering data migration and integration, in combination with technology introduction and legacy decommissioning approaches. Backup, archive and restore strategies must be part of every product development, PLM and IT strategy, and hence of every engineering strategy for managing the product realisation lifecycle.
What are your thoughts?
This post was originally published on LinkedIn on 21 November 2016.