Add to del.icio.us

nested sets (again) with group_concat()

Written: 16:34 on September 02, 2008  |  By: jon  |  MORE…
A little development blog I've had on my mind lately. Consider the hierarchical nested set article which I have linked to previously. I began using this paradigm extensively for my hierarchal data and one thing that immediately leapt out at me (about the MySQL article) were the examples. I found that whilst informative, I couldn't really contrast or compare the two methods (adjacency and nested) in several ways, since the outputs were different. The example they gave for retrieving a "single path" from a "root" to a "leaf" is a good one for illustrating my point. Often whilst working with the nested set model, I have wanted to get results related to a node in the system (with joins with other tables). Also, I have wanted to grab all the tree that relates to something (for example, imagine a 'purchased' table that links items within the 'nested_category' table to a user) and also have information about the path to that product. I achieved this with a sub-query.

If you haven't used them before, sub-queries are a very handy addition to your developer's tool kit. Traditionally, they are used to grab parts of data where the grouping in the main SQL query makes it difficult, or for getting similar result sets to outer joins. Going back to our nested set model, if you look at the examples for the adjacency set you can see that when they are getting the path to a node, the result set is one row containing several fields.
+-------------+----------------------+-------------+-------+ | lev1 | lev2 | lev3 | lev4 | +-------------+----------------------+-------------+-------+ | ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH | +-------------+----------------------+-------------+-------+ 1 row in set (0.01 sec)
As the article rightly states, this method is extremely limited as the SQL query has to be written knowing how 'deep' the tree will go. If the depth is always the same, then it isn't so bad. The nested set model offers a lot more flexibility. However, when we are figuring out the 'path' from a specified root to a leaf, the results come out in the same column as different rows (shallowest node first). You can try this for yourself (I'm just reiterating what can be found in the article currently) with the following queries:
CREATE TABLE nested_category ( category_id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(20) NOT NULL, lft INT NOT NULL, rgt INT NOT NULL );
INSERT INTO nested_category VALUES(1,'ELECTRONICS',1,20),(2,'TELEVISIONS',2,9),(3,'TUBE',3,4), (4,'LCD',5,6),(5,'PLASMA',7,8),(6,'PORTABLE ELECTRONICS',10,19), (7,'MP3 PLAYERS',11,14),(8,'FLASH',12,13), (9,'CD PLAYERS',15,16),(10,'2 WAY RADIOS',17,18);
Now, the query…
SELECT parent.name FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt AND node.name = 'FLASH' ORDER BY parent.lft;
…will result in the following result:
+----------------------+ | name | +----------------------+ | ELECTRONICS | | PORTABLE ELECTRONICS | | MP3 PLAYERS | | FLASH | +----------------------+ 4 rows in set (0.00 sec)
Now, I get to my point. What if I wanted to get the full tree, with a column inside that result set that showed me the path to each node in the set. My initial reaction was to do it as a sub query. As my original result set is being built, I know everything about that node (with the select *). I can just use the path query on every row. However, as the above 'path' query pulls back several rows, they will overwrite each other. So, instead of obtaining a path to say, the 'FLASH' leaf, i'll just have the name of the product 'FLASH' again. This is because FLASH is the last row in the pathing query: it will overwrite 'MP3 PLAYERS', which in turn will overwrite 'PORTABLE ELECTRONICS', and so on and so forth.

So, what do we want to do? We could do it programatically with a loop, and use two queries. I am always incredibly weary about placing queries in loops and I decided against this. All we really need to do is concatenate all the node names together into a string. Makes sense? So as we traverse the path, PORTABLE ELECTRONICS is appended to ELECTRONICS, and then MP3 PLAYERS is appended to the previous string, and finally FLASH is appended to this string.

Unfortunately, this method doesn't work either. Your 'node' and 'parent' elements will concatenate together, but again they will overwrite the previous concatenation. The only way around this is to use an aggregate function that is present in MySQL 4.1 called GROUP_CONCAT(). This will concatenate a field together based upon a GROUP BY clause. If we run the query:
SELECT GROUP_CONCAT(parent.name SEPARATOR ' -> ') as path FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt AND node.name = 'FLASH' ORDER BY parent.lft;
It produces:
+-------------------------------------------------------------+ | path | +-------------------------------------------------------------+ | ELECTRONICS -> PORTABLE ELECTRONICS -> MP3 PLAYERS -> FLASH | +-------------------------------------------------------------+ 1 row in set (0.00 sec)
Notice the 'SEPARATOR' modifier I added in. All this does is decide the 'glue' of the concatenation. If you leave this out, it will default to a comma. As you can see, this would be useful for pulling out CSV (comma separated values) of certain datasets. Now, if we introduce this as a sub query to the original query, we shall get the entire result set required.
SELECT parent.name, (SELECT GROUP_CONCAT(parent2.name) FROM nested_category AS node2, nested_category AS parent2 WHERE node2.lft BETWEEN parent2.lft AND parent2.rgt AND node2.name = parent.name ORDER BY parent2.lft) as path FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt AND node.name = 'FLASH' ORDER BY parent.lft;
+----------------------+----------------------------------------------------+ | name | path | +----------------------+----------------------------------------------------+ | ELECTRONICS | ELECTRONICS | | PORTABLE ELECTRONICS | ELECTRONICS,PORTABLE ELECTRONICS | | MP3 PLAYERS | ELECTRONICS,PORTABLE ELECTRONICS,MP3 PLAYERS | | FLASH | ELECTRONICS,PORTABLE ELECTRONICS,MP3 PLAYERS,FLASH | +----------------------+----------------------------------------------------+ 4 rows in set (0.00 sec)
Please forgive my unimaginative naming in the sub-query… but you get the idea! if you use a 'EXPLAIN' query and check out what's happening in that query, I think you'll agree it isn't too bad. The only irksome part I'd say is the temporary table when checking for equality between node2.name and parent.name (i.e. where this path's node name is the same as the original's parent). Note that I also put an INDEX on the 'name' column in the table, and this drastically increases the efficiency of the query (in this instance cutting down row scan from n to just 1).

So there you have it. GROUP_CONCAT()… I feel stupid that I never knew about it before. I use it every now and then in queries like this to produce a CSV from a dataset and use WHERE IN() or just return the string and 'explode' the list with PHP. Hopefully my little adventure with the nested set model and this MySQL string function proved useful to you. If you have any further ideas for making the query more efficient, or you can rewrite this without a sub-query, please: leave a comment!
Add to del.icio.us

programming ides: annoyances

Written: 09:49 on June 09, 2008  |  By: jon  |  MORE…
I have used several IDEs for scripting PHP over the past five years. Since starting my job in 2006, I've been forced to switch a couple of times for hardware reasons. A bit of back story. When I first began to dabble in PHP I used UltraEdit32, on recommendation of a friend that was proficient in C. It suited my needs as a HTML editor originally, and was pretty flexible when it came to PHP (offering syntax highlighting) but no code completion.

Some time in 2004 or 2005 whilst I was at university, I was lucky enough to win a copy of Zend Studio 5. Obviously I began to realize what I was missing in my IDE. Before this point I don't think the IDE I was using really crossed my mind. However, the code completion was absolutely excellent (still unrivaled for PHP, I reckon) and the spread of features was awesome. One of my favourites was the built-in support for version control (CVS and Subversion) which became a bit of a killer app for me whilst coding my final year dissertation project.

Now, some of you might have tweaked that I said I switched due to hardware, and Zend Studio (as it is written in Java) is cross-platform. When I started my job I was thrown into the world of Apple Macintosh. After getting used to the basics of the operating system and falling in love with the BSD-esque underbelly, I got my copy of Zend running nicely under OS 9.0 Tiger. After a couple of months of glorious use, it began to get extremely sluggish. Loading it up took 20+ minutes, and each keystroke seemed to lag for about 2 seconds. Obviously, this was completely unusable. Apparently, it's something to do with the cache folder that Zend creates inside the user preferences directory. Despite following those instructions, it didn't remedy the situation.

I moved to OS X and a new (more powerful) iMac hoping that it would remedy the situation. Not really, I found. It still feels much more responsive on my PC in my home office, despite the specifications of the machines not being that distant from each other. In frustration, I went through some other popular IDEs for Mac:-

  • Dreamweaver CS3 - Terrible. Honestly, I can't understand why people use this IDE. Crash-happy, slow, irritating text-completion. The site-wide (or project-wide) search is a decent feature, though. I hope more IDEs pick it up.
  • Coda - Very enjoyable to use. Obviously an exercise in Cocoaforge for the authors, who have a very good grasp on GUI programming. It is missing certain tools, and some of the features are annoying (and cannot be changed via preferences). The IDE will shine with the advent of modules or plug-ins, if that ever happens.
  • TextMate - My current IDE of choice. I switched to this for it's flexibility (code completion via Textmate 'bundles' is a fantastic idea and full-circle, reminds me of UltraEdit32). Thanks to my co-worker for praising it enough for me to take notice(!)

And so, I come to the REAL reason I started writing this blog. The PHPDoc competition for classes in the standard PHPdoc textmate bundle is ass. It is nowhere near as good as Zend Studios… but since that IDE isn't an option for me currently whilst I'm in the office, I've reverted to hacking away at the bundles in order to make it more 'Zend-like'. After reading through and putting into practice the TextMate and phpDoc Comment Blocks article at Killersoft, it became obvious that it is completely possible to do what I am trying to achieve. (Note: if you also follow that article, make sure you get the newlines correct in the bundle- if you copy-paste like I did, it does not work. Get the text-only version they supply and copy-paste that!)

So yep. Zend is great and all, but for the time being I'm flitting between Coda and Textmate. Is there any better alternative for scripting on a Mac? I can't get the Eclipse-based Zend Studio 5.5 by the way, since my license key won't stretch that far, and I refuse to pay for the upgrade!
Add to del.icio.us

well... liam made me do it.

Written: 14:32 on December 29, 2006  |  By: jon  |  MORE…
It has taken long enough to find the motivation to put something here again, but here we have it. A really plain and rather pathetic blog (god, I hate that "word"). Last night in The Station, our local pub, I mentioned in passing to Liam that I'd found out a few new things about this server during my time at work. We got talking about the old Dovka website and both mentioned that we missed having somewhere to just dump some text, or write an article - you know; that sort of stuff! I also thought it'd be a great way to practice SEO for certain topics, a good place to post any crap about programming or web design etc. that I came across and certainly a good place to post rants about my principle hobby, Counter-Strike.

I did have a look around this morning at the "blogosphere" (another term I hate. I actually grimaced whilst typing it) and there are so many pages that could be carbon copies of each other. If I have to read another post about some person who just finished an awesome layout filled with AJAX, CSS2.x (compatible with 3.x, naturally!) and XHTML1.1S and the obligatory arse-lick comments from passers by that inevitably come with it, I might take a toaster with me next time I take a bath. I'm all for someone celebrating their achievements, but I'm sick of the same achievement being posted again and again ad infinitum on infinite weblogs. Having said all that, I better watch my step. Don't want to be labeled a hypocrite now, do I?
What is dovka?

this is a group blog run by a group of irc zealots, the prefabricators; each is a member of an exclusive irc channel on phrenzy.

Friends

here's a list of destinations that are worth visiting…

Categories

we talk about a range of stuff here at dovka:

Twitter

always dedicated to the cause, the prefabricators microblog whilst out exploring the real world…

  • anthony: This is Shady…not sure if I like the name…Jaedyhn just calls her "my dog"..but she's ours! http://t.co/CBjlIave — 18048 HRS AGO
  • liam: Hey. Terriers is good. #gettingintoshowswayaftertheyarecancelled — 18068 HRS AGO
  • johnny: Think Chipper is disappointed about how the fans acted tonight? #MLBPlayoffs — 18134 HRS AGO
  • johnny: But I feel this is relevant "and here come the pretzels!" #MLBPlayoffs — 18135 HRS AGO
  • johnny: The #Barves got two extra bases out of it, they should be happy #MLBPlayoffs — 18135 HRS AGO
  • johnny: He would've caught it, the ump calling infield fly made him think he was called off #MLBPlayoffs — 18135 HRS AGO
Archives

dig through our monthly archives and find old relics:

RSS Feed for this page
Add to Technorati Favorites