Generating a unique identifier outside of SQL and persisting it…

I’ve been thinking about a question from a application developer regarding generating a unique identifier outside of SQL and persisting it in SSQL. I have done a little research into it and have found 2 problem with GUIDs:
1. GUID is a 16-byte type while INT is a 4-byte type. With storage nowadays very cheap, this normally is not a problem from the storage point of view. The problem is that reads (and writes) will be multiplied by 4x.
2. The most serious problem is when you use the GUID ID as the clustered index of the table. In this case, with every INSERT you would be changing the layout of the data potentially having to move many data pages. This is unlike IDENTITY INT clustered indexes where data is stored sequentially providing the best performance.
However I have come up with a solution based on information from other developers having the same problem. This is what I propose do to the entity.
1. Add an IDENTITY INT column (I would call it Id) and set it to be the clustered index. If you do not want to use an additional column, you could use a DATETIME column to store a timestamp which is usually very useful
2. Add a UNIQUEIDENTIFIER column (GUID data type in SQL Server) as the main ID. And set that to be the primary key.
3. Add a non-clustered index for the GUID ID
This will keep the data storage sequential while benefiting from system generated IDs instead of database generated IDs. There are two drawbacks: 1) storing an additional column 2) having an additional non-clustered index which takes some space and makes reads and writes slightly slower. Bear in mind, this is recommended only if you keep the entity only for transactional usage and to store and retrieve by key and not for reporting. In most cases, this is what you would normally do especially if you are implementing CQS or CQRS.


4 thoughts on “Generating a unique identifier outside of SQL and persisting it…

  1. Let me guess, you are using an ORM. ORMs want to generate the GUIDs at their tier and then use SQL Server only for persistence. It works great and your “trick” is how I’ve done it plenty of times. But you also correctly recognize the gotchas. You should really try to convince your co-workers to devise an “ID Factory” instead of using the ORM-default GUID. It takes a little bit of time to develop (or steal from Google) but it works better. Here’s the gist:
    1) CREATE SEQUENCE in SQL Server, and stored proc called GetSequenceRange that takes a parameter for how many IDs to reserve
    2) ORM calls GetSequenceRange with, say 5000. The ORM can cache those and dole them out as needed. When the cache runs out a new call is made to GetSequenceRange.
    3)if app is restarted the cache is blown away which means a new call to GetSequenceRange. You simply lost whatever was unused in cache, which is no big deal.

  2. @Dave
    You make some good points here and the idea of the “ID factory” using the SQL Server SEQUENCE is quite nice. I’m also using a combination of INT IDENTITY column for PKs with a clustered index and a separate GUID column with a non-clustered index however in this case I think it’s very justified: I’m not using an ORM (I dislike them because I think they obscure what is being done at the database level and also introduce another layer of potential failure and debugging requirements); the reason for the structure in this case is that the database is the back end for an online / offline application, is widely distributed and only periodically updates between several instances (out-of-date updates on a non-local instance [local to the user] are not a problem). I’m not using SQL Server’s in-built replication for this – it might work ok but I am hesitant because if it screws up then I would have a huge mess to try to sort out and so I’d rather keep a higher level of control over how and when this happens.

    In this situation, I believe that having a GUID for each major entity is really the only way to ensure that the between-instances reconciliation is performed correctly but I’d be very interested if anyone can offer insights and suggestions for a different, perhaps better, methodology?

  3. One can also think of creating a UNIQUEIDENTIFIER PRIMARY KEY column and setting a DEFAULT constraint using the function NEWSEQUENTIALID(). Not the best solution but a possible one though.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s