Once in a while, I need to migrate data from one database to another. One of the problems in doing this is the auto-incrementing ID values. There are a few ways to handle them, but each has its own set of issues.
A note to my friends on Facebook - this is a TECHNICAL posting. If you are not a programmer, database administrator, or system administrator, you probably won't understand or care... :)
One way to handle these is to simply replace them with new values in the new database. A gotcha here though is that you need to track what the OLD ID was and map it to the new ID when you have foreign keys or cross reference tables. Not a huge problem, but something to keep in mind.
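As a rough sketch of that old-to-new mapping, assuming PostgreSQL as the target: every table name below ("parent_old", "child_old", "parent", "child", "parent_id_map") and the sequence name "parent_id_seq" are made up for illustration, and the "_old" tables are assumed to hold rows copied as-is from the source database.

```sql
-- 1. Record which new ID each old ID will become.
--    Drawing the new IDs from the sequence keeps it in step with the table.
CREATE TABLE parent_id_map AS
SELECT old.id AS old_id,
       nextval('parent_id_seq') AS new_id
FROM parent_old AS old;

-- 2. Insert the parent rows under their new IDs.
INSERT INTO parent (id, name)
SELECT m.new_id, old.name
FROM parent_old AS old
JOIN parent_id_map AS m ON m.old_id = old.id;

-- 3. Translate the foreign keys in the child rows through the map.
INSERT INTO child (parent_id, detail)
SELECT m.new_id, c.detail
FROM child_old AS c
JOIN parent_id_map AS m ON m.old_id = c.parent_id;
```

The map table can be dropped once every table that references the old IDs has been migrated.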
Another option is to migrate your data directly, existing IDs and all. The issue with this is that many databases will not let you write a value into an auto-incrementing field without some extra work (MS SQL, for example, requires IDENTITY_INSERT to be switched on for the table). Sometimes, though, the existing IDs MUST be maintained, so how do you get around this?
In my case, I am migrating from an MS SQL database to PostgreSQL. The IDs in question were also used to name related files, so changing the IDs would lose the reference to the correct files.
I found a way to do it, though I didn't have much luck finding the "silver bullet" on Google, so I'm documenting my solution here to hopefully help others.
First, my table definition:
CREATE TABLE mytable
(
id serial NOT NULL,
name character varying(250),
CONSTRAINT mytable_pkey PRIMARY KEY (id)
)
WITH (OIDS=FALSE);
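This is a basic table, using a serial data type for the ID field. This will automagically create a sequence called "mytable_id_seq" when the create command is run. Now, if we try to import our data into here, including the IDs, we can run into errors. PostgreSQL itself will accept an explicit value in a serial column, but the sequence is not advanced when you do that, so later inserts that rely on the default can collide with the imported IDs. To get a clean import, I took the sequence and the primary key out of the way temporarily. Here is how I did this: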
delete from mytable;
drop sequence mytable_id_seq cascade;
alter table mytable drop constraint mytable_pkey;
-- Import data here !!
-- In the next line, replace XXXX with the largest number in the ID field, plus one
create sequence mytable_id_seq start XXXX;
alter table mytable alter column id set default nextval('mytable_id_seq');
alter table mytable add constraint mytable_pkey PRIMARY KEY (id);
And here are the details:
- clear the table of any records. This step isn't absolutely required, but a clean slate often helps.
- drop our sequence. We need the cascade in there so that dependent objects, here the default value on the ID column, are dropped along with it. This should be fine seeing as we just cleared out our table. Even if we had data still, this step shouldn't remove or change any records from the table.
- drop the primary key constraint. An interesting side note here is that once these two commands are done, the data type for our ID field is reported as "integer". In my case, reapplying the sequence and primary key constraint made it show as "serial" again. That is because serial is not a true data type in PostgreSQL, just shorthand for an integer column whose default comes from a sequence. Trivia, but still interesting to know.
- Import our data. In my case, I am using a PHP routine to do this, and looping over all the records in the original MS SQL database. Do whatever you need here to get your data into the table - including the IDs.
- recreate the sequence. The "XXXX" part of the command should be replaced with a number that is reasonable for your data. I tracked the largest ID that got imported, added one to it, and used that value here.
- re-add the default value for the ID field. This should point to the "nextval()" of the sequence we just recreated.
- and finally, re-apply the primary key constraint to our table.
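A couple of helper queries can take the guesswork out of the last three steps. This is only a sketch against the "mytable" example above. The duplicate check and the OWNED BY statement are my additions, not part of the original routine.

```sql
-- Find the value to use in place of XXXX: the largest imported ID, plus one.
-- COALESCE covers the case where nothing was imported.
SELECT COALESCE(MAX(id), 0) + 1 AS next_id FROM mytable;

-- Before re-adding the primary key, confirm the import produced no
-- duplicate IDs. This query should return zero rows.
SELECT id, COUNT(*) AS copies
FROM mytable
GROUP BY id
HAVING COUNT(*) > 1;

-- Optional: tie the recreated sequence back to the column, as serial does,
-- so that dropping the table or the column also drops the sequence.
ALTER SEQUENCE mytable_id_seq OWNED BY mytable.id;
```

If the duplicate check does return rows, the "add constraint" step will fail, so it is worth sorting those out first.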
There may be ways to make this even more efficient, and if you know of one let me know. I don't think the sequence really has to be dropped, but it doesn't hurt.
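One possible shortcut, offered as an untested-on-this-project sketch rather than what I actually ran: leave the sequence and the primary key in place, import the rows with their IDs, and then move the sequence forward with setval(). This assumes the imported IDs are unique, since the primary key stays active during the import.

```sql
-- Import data here, IDs included, with the sequence and primary key untouched.

-- Then move the sequence past the largest imported ID.
-- With the third argument set to false, the next nextval() call
-- returns exactly the value given here.
SELECT setval('mytable_id_seq', COALESCE(MAX(id), 0) + 1, false)
FROM mytable;
```

The trade-off is that every row is checked against the primary key index as it is inserted, which can be slower for a large import than adding the constraint at the end.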
Now that I have worked this out, I am debating going back over the rest of my migration routine to apply this technique instead of tracking old IDs. Doing an "exact" migration may mitigate some data issues in this project. Ultimately though, we very rarely use the IDs directly - that is, the users are usually not aware of them, so this allows us some leeway in how we treat the IDs. But of course, there is always an exception to the rule, which forced me to work through this.
Hope this helps you out.