Add to del.icio.us

nested sets (again) with group_concat()

Written: 16:34 on September 02, 2008  |  By: jon  |  MORE…
A little development blog I've had on my mind lately. Consider the hierarchical nested set article which I have linked to previously. I began using this paradigm extensively for my hierarchal data and one thing that immediately leapt out at me (about the MySQL article) were the examples. I found that whilst informative, I couldn't really contrast or compare the two methods (adjacency and nested) in several ways, since the outputs were different. The example they gave for retrieving a "single path" from a "root" to a "leaf" is a good one for illustrating my point. Often whilst working with the nested set model, I have wanted to get results related to a node in the system (with joins with other tables). Also, I have wanted to grab all the tree that relates to something (for example, imagine a 'purchased' table that links items within the 'nested_category' table to a user) and also have information about the path to that product. I achieved this with a sub-query.

If you haven't used them before, sub-queries are a very handy addition to your developer's tool kit. Traditionally, they are used to grab parts of data where the grouping in the main SQL query makes it difficult, or for getting similar result sets to outer joins. Going back to our nested set model, if you look at the examples for the adjacency set you can see that when they are getting the path to a node, the result set is one row containing several fields.
+-------------+----------------------+-------------+-------+ | lev1 | lev2 | lev3 | lev4 | +-------------+----------------------+-------------+-------+ | ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS | FLASH | +-------------+----------------------+-------------+-------+ 1 row in set (0.01 sec)
As the article rightly states, this method is extremely limited as the SQL query has to be written knowing how 'deep' the tree will go. If the depth is always the same, then it isn't so bad. The nested set model offers a lot more flexibility. However, when we are figuring out the 'path' from a specified root to a leaf, the results come out in the same column as different rows (shallowest node first). You can try this for yourself (I'm just reiterating what can be found in the article currently) with the following queries:
CREATE TABLE nested_category ( category_id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(20) NOT NULL, lft INT NOT NULL, rgt INT NOT NULL );
INSERT INTO nested_category VALUES(1,'ELECTRONICS',1,20),(2,'TELEVISIONS',2,9),(3,'TUBE',3,4), (4,'LCD',5,6),(5,'PLASMA',7,8),(6,'PORTABLE ELECTRONICS',10,19), (7,'MP3 PLAYERS',11,14),(8,'FLASH',12,13), (9,'CD PLAYERS',15,16),(10,'2 WAY RADIOS',17,18);
Now, the query…
SELECT parent.name FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt AND node.name = 'FLASH' ORDER BY parent.lft;
…will result in the following result:
+----------------------+ | name | +----------------------+ | ELECTRONICS | | PORTABLE ELECTRONICS | | MP3 PLAYERS | | FLASH | +----------------------+ 4 rows in set (0.00 sec)
Now, I get to my point. What if I wanted to get the full tree, with a column inside that result set that showed me the path to each node in the set. My initial reaction was to do it as a sub query. As my original result set is being built, I know everything about that node (with the select *). I can just use the path query on every row. However, as the above 'path' query pulls back several rows, they will overwrite each other. So, instead of obtaining a path to say, the 'FLASH' leaf, i'll just have the name of the product 'FLASH' again. This is because FLASH is the last row in the pathing query: it will overwrite 'MP3 PLAYERS', which in turn will overwrite 'PORTABLE ELECTRONICS', and so on and so forth.

So, what do we want to do? We could do it programatically with a loop, and use two queries. I am always incredibly weary about placing queries in loops and I decided against this. All we really need to do is concatenate all the node names together into a string. Makes sense? So as we traverse the path, PORTABLE ELECTRONICS is appended to ELECTRONICS, and then MP3 PLAYERS is appended to the previous string, and finally FLASH is appended to this string.

Unfortunately, this method doesn't work either. Your 'node' and 'parent' elements will concatenate together, but again they will overwrite the previous concatenation. The only way around this is to use an aggregate function that is present in MySQL 4.1 called GROUP_CONCAT(). This will concatenate a field together based upon a GROUP BY clause. If we run the query:
SELECT GROUP_CONCAT(parent.name SEPARATOR ' -> ') as path FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt AND node.name = 'FLASH' ORDER BY parent.lft;
It produces:
+-------------------------------------------------------------+ | path | +-------------------------------------------------------------+ | ELECTRONICS -> PORTABLE ELECTRONICS -> MP3 PLAYERS -> FLASH | +-------------------------------------------------------------+ 1 row in set (0.00 sec)
Notice the 'SEPARATOR' modifier I added in. All this does is decide the 'glue' of the concatenation. If you leave this out, it will default to a comma. As you can see, this would be useful for pulling out CSV (comma separated values) of certain datasets. Now, if we introduce this as a sub query to the original query, we shall get the entire result set required.
SELECT parent.name, (SELECT GROUP_CONCAT(parent2.name) FROM nested_category AS node2, nested_category AS parent2 WHERE node2.lft BETWEEN parent2.lft AND parent2.rgt AND node2.name = parent.name ORDER BY parent2.lft) as path FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt AND node.name = 'FLASH' ORDER BY parent.lft;
+----------------------+----------------------------------------------------+ | name | path | +----------------------+----------------------------------------------------+ | ELECTRONICS | ELECTRONICS | | PORTABLE ELECTRONICS | ELECTRONICS,PORTABLE ELECTRONICS | | MP3 PLAYERS | ELECTRONICS,PORTABLE ELECTRONICS,MP3 PLAYERS | | FLASH | ELECTRONICS,PORTABLE ELECTRONICS,MP3 PLAYERS,FLASH | +----------------------+----------------------------------------------------+ 4 rows in set (0.00 sec)
Please forgive my unimaginative naming in the sub-query… but you get the idea! if you use a 'EXPLAIN' query and check out what's happening in that query, I think you'll agree it isn't too bad. The only irksome part I'd say is the temporary table when checking for equality between node2.name and parent.name (i.e. where this path's node name is the same as the original's parent). Note that I also put an INDEX on the 'name' column in the table, and this drastically increases the efficiency of the query (in this instance cutting down row scan from n to just 1).

So there you have it. GROUP_CONCAT()… I feel stupid that I never knew about it before. I use it every now and then in queries like this to produce a CSV from a dataset and use WHERE IN() or just return the string and 'explode' the list with PHP. Hopefully my little adventure with the nested set model and this MySQL string function proved useful to you. If you have any further ideas for making the query more efficient, or you can rewrite this without a sub-query, please: leave a comment!
Add to del.icio.us

rpg2knet development: can it be too complex?

Written: 11:30 on August 25, 2008  |  By: jon  |  MORE…
The development of my pet project, rpg2knet 4 is continuing very nicely. I've overcome certain problems I was having earlier (including one small issue I had with my nested set model code) meaning so far… so good. No problems that remain unsolved. One of the more esoteric problems I have been pondering over lately is this: am I making this website too involved and complex?

Looking through what is complete at the moment (and I'd say we're definitely closing on half-way done), there is a tremendous amount of interactivity. These days, with the advent of the social web and the advances in user experience and interactivity, I'd say this isn't necessarily a bad thing. Conversely, the forum element has been dubbed 4um4-lite, as layers of complexity have been removed. The use of the forums is a much more tied in experience with the rest of the site. Previously, the forum had upwards of 85% of the traffic whilst the rest of the site languished behind. Now, there will be more of a reason to visit the main site as a lot of information is available there.

My main concerns, due to the lack of problems with the codebase, are turning to two other things. The design, which I have tried and tried to perfect but doesn't come off quite right, and content. The site will inevitably feed from user participation and submissions from the users, but I think a site of this size and complexity will definitely need some very devoted people to keep it fully stocked. With the issue of design, I'm contemplating asking (very nicely) a friend and colleague of mine to see what he can come up with. Either way, these concerns are trivial — I still have a lot of motivation, regardless of the criticism and comments from certain people.
Add to del.icio.us

rpg2knet development: nested set model

Written: 19:52 on July 26, 2008  |  By: jon  |  MORE…
Squirrel, the mascot of rpg2knetAs most visitors I imagine will know (since they are mainly friends), I used to run a website about game making called rpg2knet.com. The site has been down for many years now (since around about this time in 2006) and I have been trying to get it sorted and back online again. It has only been recently that I have really been pushing hard to get this all completed. I got really wrapped up with Counter-Strike: Source for several years (starting around about April 2005 and ending just recently), which caused me to struggle finding time for programming. Now that I have effectively retired from the source scene, I have had plenty of time to get on with rpg2knet.

I thought, since I'm coming up to roughly the half-way mark in terms of coding, I could start using the group blog here at dovka as a devlog as well, since posts can be filtered quite easily with our tagging system. So, today I thought I'd let people know about how things are progressing. This weekend has mainly been spent working on part of the site that I have toyfully been calling Nutforge (a play on sourceforge). It is a tool for starting a game-making project, finding your team/friends with certain skills to help you work (and rating them on how useful they've been, etc), talking about your game, showing off screenshots, and uploading/releasing your game to the masses. The 'end-game' for nutforge is to have your hard work reviewed by the panel of reviewers here that we shall (hopefully) hire for rpg2knet.com, just like it happened last time.

One of the features I have implemented is a "skill set" system. Using the excellent Hierachical (Nested) Set paradigm, I've managed to come up with a tree of skills that you can tag with different levels of proficiency. The idea being, you can search for 'free' people that have listed that they are looking for a project to help out with, and order them by their proficiency, filter them out and well… cherry pick the people you like. Obviously I can foresee a problem with trolls putting all their 'proficiencies' at maximum just to clutter up the search results. With that in mind, I've been thinking about adding in a modifier system where community members can vote a member up or down in skill level if they have been involved in a project with them. That way, if somebody is a time waster, they'll essentially be flagged as such and this will warn others about undertaking a project with them.

This tiny feature is something that I consider to be, at the time of writing, very unique. I can't see any other RPG maker based website doing anything like this, although I'm aware I may have been mildly inspired by the excellent assembla.com. This is what rpg2knet.com is all about in my eyes. Something original, that nobody else has thought of, or something that nobody else can do. If all goes to plan and we can couple this website up with the sort of community spirit the old rpg2knet.com had, I believe we are really onto a winner. And that spirit is exactly what keeps me programming every night to get it completed *hurl*. As cheesy as sounds, it's true.
What is dovka?

this is a group blog run by a group of irc zealots, the prefabricators; each is a member of an exclusive irc channel on phrenzy.

Friends

here's a list of destinations that are worth visiting…

Categories

we talk about a range of stuff here at dovka:

Twitter

always dedicated to the cause, the prefabricators microblog whilst out exploring the real world…

  • johnny: So happy it's my last day. All I wanna do is sleep — 7 pm
  • johnny: Just got asked to play beach volleyball this summer, I haven't played volleyball in years and I'm not that good #decisions — 28 HRS AGO
  • johnny: People always ask me why I sound so bored at work. You ask the same questions 50000 times and try and sound excited about it — 35 HRS AGO
  • johnny: I don't want to work tomorrow — 45 HRS AGO
  • johnny: I would much rather be anywhere having beers than at work. Gorgeous day #borderproblems #drinkingproblems — 60 HRS AGO
  • johnny: Ivan Nova just made Albert Pujols look stupid #MLB — 74 HRS AGO
Archives

dig through our monthly archives and find old relics:

RSS Feed for this page
Add to Technorati Favorites