When to (and when not to) use Replicated tables in Synapse SQL Pool?

As we know, Azure Synapse Analytics has an MPP architecture, which means there are multiple nodes which can process queries in parallel. Another important concept is table distribution. There are various table distribution options suited to different use cases. We have already discussed Synapse table distributions in a previous post.

Replicated tables are replicated across every node i.e. every compute node has a copy of all the rows of the replicated table. While this sounds very convenient, there are various considerations for choosing replicated tables over others distribution types (round-robin, Hash-distributed)

Synapse SQL Pool : Replicated Table

Considerations for using Replicated Tables:

  • Table size < 2 GB: Microsoft recommends that the table size on disk should be less than 2GB, irrespective of the number of rows. To check the disk size of the table the DBCC PDW_SHOWSPACEUSED command can be used.
  • Table used in Joins: If the table is used in Joins, especially with a Hash-distributed table. Using replicated table instead of Round-robin distributed table, would reduce the amount of data that needs to be transferred between nodes, thereby increasing query performance.

Considerations for NOT using Replicated Tables:

  • Frequent DML Operations: Frequent Data Manipulation Language Operations (Insert, Update, Delete) require rebuilding of the replicated table, thereby resulting in slower performance
  • Frequently scaled SQL Pool: Scaling of SQL pool would change the number of compute nodes and require rebuilding of the replicated tables.
  • Few table columns accessed: If only a small subset of columns is being accessed from the table, then replicating the whole table may not result in better performance. A better option in this scenario, is to distribute the table using round-robin / hash distribution and then create an index that references the most frequently accessed columns.

Microsoft also recommends converting existing round-robin or Hash distributed tables to replicated tables if they satisfy the criteria listed above.

Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-guidance-for-replicated-tables

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: