From SQL Joins to Arrays in PostgreSQL

Joins are in the heart of any relational database. Obviously, we use relations to model relationships between tables in SQL, and joins on different flavours (Inner, left, or right) are the semantics for manipulating a relational model. Well, I am not saying joins are bad, however, I believe that they are horrendously abused. If your schema requires a lot of joins in every SQL query to retrieve things, then consider changing it now.

We have a schema for modelling a tree structure, a very typical one.a table called NODE and a bridge table called NODE_RELATION that holds parent node id and child node id . A typical recursive relationship.
Unfortunately, these two tables are massive and queried quite a lot. Querying the bridge table was eating system’s resources and my responsibility was to enhance it.

Well, may be my solution was not SQL acceptable, but I think it is fine, the whole concept of NoSQL databases and CAP theory is to relinquish some ACID properties. Ugly solutions are acceptable if the intent is to improve performance.

The DB we talking about is under PostgreSQL, a durable and free ACID database. Unlike most SQL databases, PostgreSQL supports arrays as columns type with a comprehensive semantics for common functionality (push element,remove element,aggregation, etc..). In simple words, my solution was to store foreign ids in an array.We had to ensure backward compatibility, the bridge table was used everywhere in the code, so deleting it and using only arrays is not an option.
Let’s go back to our schema.

CREATE TABLE node -- ddl for the node table
(
  id integer NOT NULL DEFAULT nextval('node_seq'::regclass),
  name text,
  CONSTRAINT node_pkey PRIMARY KEY (id)
);

CREATE TABLE node_relation -- ddl for the bridge table
(
  id integer NOT NULL DEFAULT nextval('node_parent_child'::regclass),
  parent integer, -- node id
  child integer, -- node id
  CONSTRAINT node_relation_pkey PRIMARY KEY (id)
);

Quite simple I reckon. Well, let’s go to the hack then. Firstly, we need to add the arrays to the node table

ALTER TABLE "node" ADD COLUMN children integer[];
ALTER TABLE "node" ADD COLUMN parents integer[];

The arrays are empty so far, so we have to seed them from a the bridge table.In other words, we need the node’s parents and children.

UPDATE "node" SET children = v.children
FROM(
SELECT node_relation.parent as parent,array_agg(child) AS children FROM node_relation
GROUP BY node_relation.parent
)v
WHERE "node".id = v.parent;

array_agg is the stash here, a function that aggregates some input and convert them into an array.
So far so good, we have managed to add seed our table, but what about new relations that will be inserted or deleted from the bridge table. We need a simple and a quick solution. Nasty solutions are fine sometimes in some circumstances. So , I’ve decided to hire a trigger that watches the insertion and deletion of records on node_relation table and append elements or remove them from the array accordingly. Here we go!

CREATE OR REPLACE FUNCTION sync_node_and_node_relation() RETURNS TRIGGER AS $sync_node_node_relation$
 BEGIN
  IF (TG_OP = 'INSERT') THEN
    -- add new element to children's array
    UPDATE "node" SET children = array_append("node".children,NEW.child)
    WHERE "node".id = NEW.parent;

    -- add it to its parent	
    UPDATE "node" SET parents = array_append("node".parents,NEW.parent)
    WHERE "node".id = NEW.child;
    
    RETURN NEW;
 ELSEIF (TG_OP = 'DELETE') THEN
   -- remove ids from the arrays
   UPDATE "node" SET children = array_remove("node".children,OLD.child)
   WHERE "node".id = OLD.parent;
   
   UPDATE "node" SET parents = array_remove("node".parents,OLD.parent)
   WHERE "node".id = OLD.child;
   	
  RETURN OLD;
  END IF;
 END;
 
$sync_node_node_relation$ LANGUAGE plpgsql;  
DROP TRIGGER IF EXISTS sync_node_node_relation ON node_relation;

CREATE TRIGGER sync_node_node_relation AFTER INSERT OR DELETE ON node_relation
FOR EACH ROW EXECUTE PROCEDURE sync_node_and_node_relation();

This was it!! After we started using the arrays for fetching nodes’ parents and children, we’ve noticed a massive performance gains. Following the rules is usually good, but hacking is playing for cleverness.
Enjoy your day!!

Advertisements

Leave a Reply

Name and email address are required. Your email address will not be published.

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

You may use these HTML tags and attributes:

<a href="" title="" rel=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <pre> <q cite=""> <s> <strike> <strong> 

%d bloggers like this: