Collibra Connect sizing guidelines

General sizing guidelines

  • Only the number of CPU cores has licensing implications. Memory and storage have no impact on license cost.
  • The required number of cores is usually determined by throughput requirements. As a general guideline, we recommend 2 CPU cores per 100 transactions per second (TPS), where a transaction is one processed record. Always increase the number of CPU cores in increments of 2. The CPU class should be Intel Xeon 2+ GHz or equivalent.

    1. You run an integration once a week, but you have only a 1-hour window to process all records. You estimate 400,000 transactions, which is about 111 TPS (400,000 / 3,600 s). The 2-cores-per-100-TPS rule gives about 2.2 cores, so rounding up to the next increment of 2 gives 4 CPU cores.
    2. You run integrations in near real time (24x7), but process only a few hundred records per hour. You could opt for 2 CPU cores.
  • For server memory:
    1. First, estimate the JVM heap size. As a general guideline, we recommend 4 GB plus the maximum size of the payload (records) that must be kept in memory, based on your requirements.

      For example, if the largest payload is about 4 GB, the JVM heap size should be at least 8 GB.

    2. Then use the JVM heap size to estimate the server memory size. This assumes the server is dedicated to Collibra Connect and runs no other enterprise application servers, for example Tomcat for MMC.
      • For Linux/Unix, the server memory size should be the JVM heap size times 2.

        If the JVM needs 8 GB, the server should have at least 16 GB.

      • For Windows, the server memory size should be the JVM heap size times 2, plus 1 GB of overhead.

        If the JVM needs 8 GB, the server should have at least 17 GB.

    3. The JVM heap size is set in wrapper.conf and must be set during installation. Best practice is to set the initial and maximum heap sizes to the same value.
  • For server storage:
    1. Storage size should be at least 4 GB plus the JVM heap size.

      Using the figures above (4 GB + 8 GB heap), the server should have at least 12 GB of disk space.

    2. Server storage should be SSD-class or equivalent.
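The rules above can be chained into a rough sizing calculator. The following is a minimal Python sketch, assuming the constants exactly as stated in this section: 2 cores per 100 TPS in increments of 2, heap = 4 GB + largest payload, memory = 2 × heap on Linux/Unix or 2 × heap + 1 GB on Windows, and storage = 4 GB + heap. All function names are hypothetical. The 2-core floor is inferred from the low-volume example. The wrapper.conf property names (wrapper.java.initmemory and wrapper.java.maxmemory, both in MB) are those of the Java Service Wrapper and are an assumption here, so check them against the wrapper.conf shipped with your installation.

```python
import math
from dataclasses import dataclass

# Constants taken from the general sizing guidelines.
CORES_PER_100_TPS = 2
CORE_INCREMENT = 2        # cores are always added in steps of 2
BASE_HEAP_GB = 4          # fixed part of the heap, added to the largest payload
BASE_STORAGE_GB = 4       # fixed part of the disk, added to the heap
WINDOWS_OVERHEAD_GB = 1


@dataclass
class Sizing:
    cpu_cores: int
    jvm_heap_gb: float
    server_memory_gb: float
    storage_gb: float


def cpu_cores(transactions: int, window_seconds: float) -> int:
    """2 cores per 100 TPS, rounded up to a multiple of 2 (assumed floor: 2)."""
    tps = transactions / window_seconds
    raw_cores = tps / 100 * CORES_PER_100_TPS
    cores = math.ceil(raw_cores / CORE_INCREMENT) * CORE_INCREMENT
    return max(cores, CORE_INCREMENT)


def jvm_heap_gb(max_payload_gb: float) -> float:
    return BASE_HEAP_GB + max_payload_gb


def server_memory_gb(heap_gb: float, os_name: str) -> float:
    # Assumes a server dedicated to Collibra Connect.
    memory = heap_gb * 2
    if os_name.lower() == "windows":
        memory += WINDOWS_OVERHEAD_GB
    return memory


def storage_gb(heap_gb: float) -> float:
    return BASE_STORAGE_GB + heap_gb


def wrapper_conf_lines(heap_gb: float) -> list[str]:
    # Initial and maximum heap set to the same value, in MB (assumed property names).
    heap_mb = int(heap_gb * 1024)
    return [f"wrapper.java.initmemory={heap_mb}", f"wrapper.java.maxmemory={heap_mb}"]


def size(transactions: int, window_seconds: float,
         max_payload_gb: float, os_name: str) -> Sizing:
    heap = jvm_heap_gb(max_payload_gb)
    return Sizing(
        cpu_cores=cpu_cores(transactions, window_seconds),
        jvm_heap_gb=heap,
        server_memory_gb=server_memory_gb(heap, os_name),
        storage_gb=storage_gb(heap),
    )


if __name__ == "__main__":
    # Weekly batch example: 400,000 records in a 1-hour window, 4 GB largest payload.
    # Expect 4 cores, 8 GB heap, 16 GB memory and 12 GB storage.
    result = size(400_000, 3600, max_payload_gb=4, os_name="linux")
    print(result)
    for line in wrapper_conf_lines(result.jvm_heap_gb):
        print(line)
```

Passing os_name="windows" with the same inputs adds the 1 GB overhead, giving 17 GB of server memory.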

These guidelines cover general use cases with typical workloads. If you need very low latency, plan for more cores. If you expect extra-wide records, plan for more memory and storage.

Special use cases

High availability

High availability implies an extra cost for the additional active cores, because licensing is based on the number of active cores. Customers with an active-passive setup only need to license the active cores. Please be aware that the Clustering license is not part of the OEM license. Customers can still run multiple nodes, just not with clustering enabled. Customers who need the clustering capability (e.g. for zero message loss) must acquire the cluster license from MuleSoft directly.
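The active-core counting rule can be sketched as follows. This is a minimal Python illustration. The Node type and the licensed_cores function are hypothetical, and the sketch models only the core count. It does not cover the separate clustering license.

```python
from dataclasses import dataclass


@dataclass
class Node:
    cores: int
    active: bool  # False for a passive (standby) node


def licensed_cores(nodes: list[Node]) -> int:
    # Only cores on active nodes count toward the license; passive nodes are not counted.
    return sum(node.cores for node in nodes if node.active)


if __name__ == "__main__":
    active_passive = [Node(cores=4, active=True), Node(cores=4, active=False)]
    active_active = [Node(cores=4, active=True), Node(cores=4, active=True)]
    print(licensed_cores(active_passive))  # 4
    print(licensed_cores(active_active))   # 8
```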

Zero message loss

If you need zero message loss, you have two options:

  • You request a clustering license from MuleSoft and enable clustering.
  • You use a message broker, for example ActiveMQ, and implement persistent queues and transactions in the integration pattern. The message broker must support high availability and be configured for it.
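The following toy Python sketch illustrates the second option. It uses the standard-library sqlite3 module as a stand-in for a durable broker queue. It is not ActiveMQ's API, and the queue table and the open_queue, enqueue and consume_one functions are hypothetical. The sketch shows the pattern: a message is deleted only in a transaction that commits after processing succeeds. If processing fails, the transaction rolls back and the message stays queued for a retry.

```python
import sqlite3
from typing import Callable


def open_queue(path: str) -> sqlite3.Connection:
    # isolation_level=None: transactions are managed explicitly with BEGIN/COMMIT/ROLLBACK.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, body: str) -> None:
    # Outside an explicit transaction, the INSERT is committed (persisted) immediately.
    conn.execute("INSERT INTO queue (body) VALUES (?)", (body,))


def consume_one(conn: sqlite3.Connection, process: Callable[[str], None]) -> bool:
    """Process the oldest message inside a transaction.

    Returns True if a message was processed and removed, False if the queue was empty.
    If `process` raises, the transaction is rolled back and the message stays queued.
    """
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute("SELECT id, body FROM queue ORDER BY id LIMIT 1").fetchone()
        if row is None:
            conn.execute("COMMIT")
            return False
        msg_id, body = row
        process(body)  # e.g. push the record to the target system
        conn.execute("DELETE FROM queue WHERE id = ?", (msg_id,))
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise


if __name__ == "__main__":
    queue = open_queue(":memory:")  # use a file path for real durability
    enqueue(queue, "record-1")
    consume_one(queue, print)  # prints record-1, then removes it from the queue
```

This pattern yields at-least-once delivery. If the process stops after process() succeeds but before COMMIT, the message is delivered again, so the target system should tolerate duplicates. A real deployment would rely on the broker's own persistent queues and transactions, with the broker configured for high availability.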