Differences between import and synchronization

This section explains how the import and synchronization operations handle data differently, and the scenarios that each operation suits best.

Import

The import operation is idempotent. You can re-import the same data multiple times, and the Import API only makes the changes needed to bring the database to the state described in the input file. As a result, importing the same data a second time is faster because changes that have already been applied are not written to the database again. However, the Import API still checks every attribute and relation of each asset in the input file.

The import operation only creates new assets and updates existing ones. It never deletes assets.
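The behavior described above can be illustrated with a minimal sketch. It does not use the actual Import API: the in-memory `database` dictionary, the `import_assets` function and the asset names are all hypothetical, chosen only to show idempotent create-or-update logic that never deletes.

```python
def import_assets(database, incoming):
    """Create or update assets from `incoming`; never delete anything.

    Hypothetical model: both arguments map an asset name to a dict of
    attributes. Returns the number of writes made to `database`.
    """
    writes = 0
    for name, attributes in incoming.items():
        existing = database.get(name)
        if existing is None:
            # New asset: create it.
            database[name] = dict(attributes)
            writes += 1
            continue
        # Existing asset: every attribute is compared, but only the
        # ones that differ are written.
        for key, value in attributes.items():
            if existing.get(key) != value:
                existing[key] = value
                writes += 1
    return writes


database = {"legacy_table": {"type": "Table"}}
incoming = {"customers": {"type": "Table", "owner": "sales"}}

first_run = import_assets(database, incoming)   # creates "customers"
second_run = import_assets(database, incoming)  # compares, writes nothing
```

In this sketch the second run still visits every attribute but performs no writes, and `legacy_table` stays in place even though it is absent from the input.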

Synchronization

The synchronization operation is intended for replicating the state of an external system, for example a physical database schema. Typically, you perform this operation on a regular basis: you collect the metadata from the external system, such as database tables and columns, and you upload the full schema to Collibra. The Synchronization component then makes sure that the data in Collibra exactly mirrors the external system.

The main difference between import and synchronization is that synchronization not only adds new assets and updates existing ones, but also removes from Collibra any assets that were removed from the external system.

Another difference is performance: synchronizing the same data a second time is faster than re-importing it because Collibra stores a hash of every synchronized asset and uses it to decide whether the asset needs to be updated. This is faster than comparing each attribute and relation individually.
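The following sketch illustrates the hash-based approach described above. It is not Collibra's implementation: the `synchronize` and `asset_hash` functions, the SHA-256 hash over a JSON serialization, and the in-memory dictionaries are assumptions made purely for illustration.

```python
import hashlib
import json


def asset_hash(attributes):
    """Hash an asset's attributes (assumed scheme: SHA-256 of sorted JSON)."""
    payload = json.dumps(attributes, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def synchronize(database, stored_hashes, incoming):
    """Make `database` mirror `incoming`, including deletions.

    Hypothetical model: `database` and `incoming` map asset names to
    attribute dicts; `stored_hashes` maps asset names to the hash
    recorded during the previous synchronization.
    """
    stats = {"created": 0, "updated": 0, "skipped": 0, "deleted": 0}
    for name, attributes in incoming.items():
        new_hash = asset_hash(attributes)
        if name not in database:
            stats["created"] += 1
        elif stored_hashes.get(name) == new_hash:
            # One hash comparison replaces a per-attribute check.
            stats["skipped"] += 1
            continue
        else:
            stats["updated"] += 1
        database[name] = dict(attributes)
        stored_hashes[name] = new_hash
    # Assets that are missing from the input are removed.
    for name in list(database):
        if name not in incoming:
            del database[name]
            stored_hashes.pop(name, None)
            stats["deleted"] += 1
    return stats


database, stored_hashes = {}, {}
schema = {"customers": {"type": "Table"}, "orders": {"type": "Table"}}
synchronize(database, stored_hashes, schema)  # creates both assets
synchronize(database, stored_hashes, schema)  # skips both: hashes match
```

Under these assumptions, a later run whose input omits `orders` would delete that asset, which is the key behavioral difference from an import.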

Usage recommendations

If you need to import data once, use the import operation because the initial import is slightly faster than an initial synchronization and Collibra doesn’t need to store asset hashes in the database.

If you want to update only a subset of your existing assets, use the import operation because a synchronization will delete all the assets not present in the last input.

If you want to replicate the state of an external system on a regular basis and perform both updates and deletes, use the synchronization operation.
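The recommendations above can be summarized as a small decision helper. This is only an illustrative sketch: the `recommend_operation` function and its parameter names are hypothetical, and the fallback for cases that the recommendations do not cover is an assumption.

```python
def recommend_operation(one_time, subset_only, mirror_deletes):
    """Return "import" or "synchronization" for a given scenario.

    one_time:       the data is loaded once and not refreshed regularly
    subset_only:    the input covers only part of the existing assets
    mirror_deletes: assets removed from the source must also be removed
    """
    if subset_only:
        # A synchronization would delete every asset missing from the input.
        return "import"
    if one_time:
        # Slightly faster initial load, and no asset hashes to store.
        return "import"
    if mirror_deletes:
        # Regular full replication with updates and deletes.
        return "synchronization"
    # Assumption: not covered by the recommendations above. Import is
    # chosen here because it never deletes assets.
    return "import"


nightly_schema_refresh = recommend_operation(
    one_time=False, subset_only=False, mirror_deletes=True
)
```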