Skip to content Skip to sidebar Skip to footer

Return Data From Subselect Used In Insert In A Common Table Expression

I am trying to move bytea data from one table to another, updating references in one query. Therefore I would like to return data from the query used for the insert that is not use

Solution 1:

You need to get your table names and aliases right. Plus, the connection between the two tables is the column image (datain the new table file_data):

WITH inserted AS (
  INSERTINTO file_data (data)
  SELECT image
  FROM   task_log
  WHERE  image ISNOTNULL
  RETURNING id, data  -- can only reference target row
)
UPDATE task_log t
SET    attachment_id = i.id
     , attachment_type ='INLINE_IMAGE'FROM   inserted i
WHERE  t.image = i.data;

Like explained in my old answer you referenced, image must be unique in task_log for this to work:

I added a technique how to disambiguate non-unique values in the referenced answer. Not sure if you'd want duplicate images in file_data, though.

In the RETURNING clause of an INSERT you can only reference columns from the inserted row. The manual:

The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted (...) However, any expression using the table's columns is allowed.

Bold emphasis mine.

Fold duplicate source values

If you want distinct entries in the target table of the INSERT (task_log), all you need in this case is DISTINCT in the initial SELECT:

WITH inserted AS (
  INSERTINTO file_data (data)
  SELECTDISTINCT image  -- fold duplicatesFROM   task_log
  WHERE  image ISNOTNULL
  RETURNING id, data  -- can only reference target row
)
UPDATE task_log t
SET    attachment_id = i.id
     , attachment_type ='INLINE_IMAGE'FROM   inserted i
WHERE  t.image = i.data;

The resulting file_data.id is used multiple times in task_log. Be aware that multiple rows in task_log now point to the same image in file_data. Careful with updates and deletes ...

Solution 2:

I needed to replicate duplicates so I ended up adding a temp column for the id of the used data row.

altertable file_data addcolumn task_log_id bigint;
-- insert & update dataaltertable file_data dropcolumn task_log_id;

The full move script was

-- A new table for any file dataCREATETABLE file_data (
  id         BIGSERIAL PRIMARY KEY,
  data  bytea
);

-- Move data from task_log to bytes-- Create new columns to reference file_dataaltertable task_log addcolumn attachment_type VARCHAR(50);
altertable task_log addcolumn attachment_id bigintREFERENCES file_data;

-- add a temp column for the task_id used for the insertaltertable file_data addcolumn task_log_id bigint;

-- insert data into file_data and set referenceswith inserted as (
  INSERTINTO file_data (data, task_log_id)
    select image, id from task_log where image isnotnull
  RETURNING id, task_log_id
)
UPDATE task_log
SET   attachment_id = inserted.id,
      attachment_type ='INLINE_IMAGE'FROM  inserted
where inserted.task_log_id = task_log.id;
-- delete the temp columnaltertable file_data dropcolumn task_log_id;
-- delete task_log imagesaltertable task_log dropcolumn image;

As this produces some dead data I ran a vacuum full afterwards to clean up.

But please let me repeat the warning from @ErwinBrandstetter:

Performance is much worse than for the method using a serial number I proposed in the linked answer. Adding & removing a column require's owner's privileges, a full table rewrite and exclusive locks on the table, which is poison for concurrent access.

Post a Comment for "Return Data From Subselect Used In Insert In A Common Table Expression"