Database implementation in SMF 2.0

November 02, 2007, 09:22:31 AM Posted by Grudge on November 02, 2007, 09:22:31 AM in Database implementation in SMF 2.0 | 17 Comments
There have been a few posts here and there about the way SMF 2.0 addresses database abstraction and I thought that people might appreciate a blog on the topic from a developer to get an insight into the reasons behind our implementation and the benefits we see to the approach.

Firstly, let's talk about $smfFunc - the SMF 2.0 version of $func - name changed to allow for better integration with other applications. Since 1.1 RC2 (I think) this has been used for string functions to allow efficient implementation of unicode support. $smfFunc is an array of function names, so, for example $smfFunc['strpos'] could be assigned to the name 'strpos' or 'uc_strpos'. Whenever $smfFunc['strpos'] is called the function whose name it's values corresponds is actually called. Take a made up example.


function cheap()
{
  return 2;
}

function expensive()
{
  return 200;
}

$smfFunc['price'] = $_POST['name'] == 'grudge' ? 'cheap' : 'expensive';


In the above piece of code if the script was submitted with the name of 'grudge' $smfFunc['price'] would equal 'cheap' - otherwise it is 'expensive'. This variable can then be used like in the below:

echo $smfFunc['price']();


Which would execute the function 'cheap' or 'expensive' dependant on whether the name was grudge - hence displaying either 2 or 200 in the browser.

The above code shows how we can dynamically assign a function to a variable. Note that the function never exists in the variable - only the name of the function. This means that when you access $smfFunc['price']() PHP calls the cheap or expensive function direct - there is no overhead at all. You could do the above set of functionality like below:


function cheap()
{
  return 2;
}

function expensive()
{
  return 200;
}

function price()
{
  if ($_POST['name'] == 'grudge')
    return cheap();
  else
  return expensive();
}


This would work exactly the same used with:

echo price();


However, note that now when you call the price function PHP actually goes through another function to get there. The value returned from cheap/expensive has to pass into the price function then back out again. For a simple return value like an integer this overhead is relatively minimal but now imagine doing this with mysql_fetch_assoc - which is the function used to fetch data from the database. This function can (And on an average page load will) return, say, 25kB of data in an array. In the example above all this data needs to transition through a "pass through" function - whereas the first example (Using $smfFunc) effectively allows PHP to access the function natively.

So - this explains what we are doing with SMF 2.0. In the database loading function you'll find a line effectively doing this:

$smfFunc['db_fetch_assoc'] = 'mysql_fetch_assoc';


And when we call $smfFunc['db_fetch_assoc'] all PHP needs to do is a quick lookup on the value of $smfFunc['db_fetch_assoc'] and then call mysql_fetch_assoc. If you looked in the database loading function for the postgreSQL database the line would say:

$smfFunc['db_fetch_assoc'] = 'postg_fetch_assoc';


and so forth for sqlite. As in the first example we could have done:

function db_fetch_assoc($handle)
{
  return mysql_fetch_assoc($handle);
}


But this has the memory overhead. Hence, and for consistantency, all database functions in SMF 2.0 now follow this $smfFunc model. There are other things we've done in 2.0 which I won't go into here but are quite cool. In particular there is now a whole set of functions for creating, altering and droping tables. This is to ensure mod authors can write there mods *once* and SMF will automatically sort out the database changes so they work regardless of which database the user is using (i.e. the mod will work on MySQL, PostgreSQL and SQLite without any extra effort from the author).

The other thing we've done with 2.0 is try to make the SQL much more standard to enable abstraction to work correctly. So, for example, all cross joins such as:
SELECT *
FROM (smf_topics AS t, smf_messages AS m)
WHERE m.id_msg = t.id_first_msg

Won't work on PostgreSQL and many other schemas as it's not standard. Instead SMF 2.0 now uses inner joins for all queries. These are just as quick but much more standard and IMHO easier to read - so the above function now looks like:
SELECT *
FROM smf_topics AS t
  INNER JOIN smf_messages AS m ON (m.id_msg = t.id_first_msg)

There have been many more changes than the above but we hope they are for the benefit of users and that mod authors will be quick to adapt to the new way of working with the database. Happy to answer any comments you might have.

Grudge

Comments


MPT. on November 03, 2007, 09:38:38 PM said
Nice! It would be useful..

Dannii on November 03, 2007, 10:27:58 PM said
What is cool (though I don't know if it's possible in PHP) are to have functions that rewrite themselves after their first use. The first time the function is run it checks to see which version should be used, and then rewrites itself to use it, meaning that the check is only ever run once, although if the function is never run, it's never actually checked. That's probably not useful in this situation as the settings files mean we know for sure whether the forum is using MySQL or PostgreSQL or something else, but it's very useful in javascript when you really can assume nothing about what the user's browser can do ;)

SleePy on November 04, 2007, 01:50:41 AM said

winrules on November 04, 2007, 09:52:20 AM said
And only if you have a certain extension installed ;)

Nao 尚 on November 08, 2007, 03:26:29 AM said
Hello,

Quote from: Grudge on November 02, 2007, 09:22:31 AM
Since 1.1 RC2 (I think) this has been used for string functions to allow efficient implementation of unicode support.
Yes, and it's a nice implementation. Too bad it's undocumented. People just aren't aware that $func/$smfFunc versions of the same functions are UTF8-compatible. (I don't use UTF8 in my code yet -- still too buggy in SMF, so...)

QuoteThis function can (And on an average page load will) return, say, 25kB of data in an array. In the example above all this data needs to transition through a "pass through" function - whereas the first example (Using $smfFunc) effectively allows PHP to access the function natively.
On average, a SMF page has about 15 queries I'd say... A bit more for the Display.php pages. Doesn't take a lot of memory. 25KB of data? Where? I'm pretty sure it's sci-fi at this point :) From using the db debug info variable, and checking out the viewquery page, I'd say the whole page of queries (all in all) is 10KB at the most. Hardly 25KB per query. I don't think the overhead is significant... And maybe it could be avoided by using variable referencing (&) in the db_query() dummy function? I'm not too proficient with parameter referencing so correct me if I'm wrong.
$smfFunc could still be used for performance-critical operations like maintenance features.

One of the other problems with the new syntax for db_query is the empty parameter at the beginning of the parameter list. Why wasn't it just put at the end, after the __LINE__ and __FILE__ parameters?
Something like $dummy = '' would allow for no changes. I noticed a similar problematic pattern in an SSI function where an "exclude boards" parameter is now available, but not at the end. Interestingly (for me at least), I had already modified my 1.1.3 code to include such a feature, but I did it differently -- I checked the include board list, and if a board had a negative value, then it would be added to the exclude board list. This allows for no changes at all in the SSI function structure, while retaining the "exclude boards" feature.

QuoteThe other thing we've done with 2.0 is try to make the SQL much more standard to enable abstraction to work correctly. So, for example, all cross joins such as:
SELECT *
FROM (smf_topics AS t, smf_messages AS m)
WHERE m.id_msg = t.id_first_msg

Won't work on PostgreSQL and many other schemas as it's not standard.
Ah, too bad... I love this syntax.

QuoteInstead SMF 2.0 now uses inner joins for all queries. These are just as quick
Are you 100% sure they're just as quick? Did you benchmark them?

What is the purpose of the "INNER JOIN" member, as opposed to the more common (in SMF) "LEFT JOIN"? Is "INNER JOIN" supposed to mean that there's always an equivalent in the joined table and that there should be no overhead as opposed to a LEFT JOIN call?

EDIT--I just checked, and INNER JOIN seems to do the same job as a "FROM table1, table2" indeed. It only returns the common entries. And it's more readable I have to agree, although the query is a bit longer. Still, I don't know if it's exactly the same performance as the double FROM.

QuoteThere have been many more changes than the above but we hope they are for the benefit of users and that mod authors will be quick to adapt to the new way of working with the database.
Well I'm not a mod author but I did convert a lot of big mods (Pretty URLs, Googlebot & Spiders and Ultimate Profile come to mind), and $smfFunc was sometimes annoying to implement (not because of the multiple search & replace, but because I had to make sure I added global $smfFunc where needed), but not as annoying as the completely new admin area (menu stuff). Since this is completely undocumented, I had to do a lot of trial & error and it probably took more than half the time I needed to convert the mods.

So, as a conclusion, I'd say it would be nice to explain somewhere the steps to apply when converting a mod that has entries in the admin area :) (Not that I need that now, since I've converted all the mods I needed, but I'm pretty sure most mod authors will be glad to avoid the whole trial & error stuff I had to go through!)

Aaron on November 08, 2007, 05:36:08 AM said
It's a bit logical these changes haven't been documented yet, don't you think?  As SMF 2.0 is still in an early beta stage, certain things are still likely to change and thus extensively documenting probably hasn't got a high priority.

Then again, I'm not a dev, so I can't tell for sure. ;)

Nao 尚 on November 08, 2007, 07:01:40 AM said
Quote from: Aäron on November 08, 2007, 05:36:08 AM
It's a bit logical these changes haven't been documented yet, don't you think?  As SMF 2.0 is still in an early beta stage, certain things are still likely to change and thus extensively documenting probably hasn't got a high priority.
I'm not asking for a full-featured document, but a topic somewhere would be nice, with explanations on the differences to take into account when upgrading ;)
There's already such a topic BTW, but it doesn't mention the admin panel changes.

Grudge on November 08, 2007, 08:12:47 AM said
The thing is that $smfFunc is, in the most part, globalised everywhere in 2.0 as nearly every function uses the database so I don't think it's particularly problematic.

The reason why '' is at the start of every query is that *eventually* every query will have a unique string there - we just have only added the names to those functions which we needed to alter for PostgreSQL and SQLite - it's not optional.

The comment on SSI is fair if that is the case - we shouldn't be changing the position of previous parameters in SSI IMHO.

As for db_fetch_assoc etc. That memory overhead is *very* significant as board sizes scale up to the size of, for example, the community here. I really don't see that using $smfFunc['db_query'] is really any different to smf_db_query in terms of typing. Also if you need to globalise $smfFunc then you'd need to globalize $db_prefix anyway so it's not like it's really extra work. Of course you can simply use smf_db_query if desired but I don't think it's as elegant so it certainly won't be used like that in the core.

INNER JOIN and cross joins should be comparable speed wise. We saw no different in speed at all (Positive or negative) on MySQL.

As for documentation - you are right that it could be better. I did however think that the admin menu stuff is documented in the Subs-Menu.php file?

Nao 尚 on November 08, 2007, 09:34:32 AM said
Quote from: Grudge on November 08, 2007, 08:12:47 AM
The thing is that $smfFunc is, in the most part, globalised everywhere in 2.0 as nearly every function uses the database so I don't think it's particularly problematic.
I did get SMF to crash several times because the mod conversions I was doing didn't add $smfFunc to the globals.
For instance, mod authors may add db queries to functions that usually don't call the database. So they have to add these global declarations manually.

QuoteThe reason why '' is at the start of every query is that *eventually* every query will have a unique string there - we just have only added the names to those functions which we needed to alter for PostgreSQL and SQLite - it's not optional.
You mean we'll have to change the mods again?
And what's going to be the purpose of this mysterious string, then?

QuoteThe comment on SSI is fair if that is the case - we shouldn't be changing the position of previous parameters in SSI IMHO.
Thanks.
Maybe you could consider doing it the way I've been doing it, then? It's only a matter of a few minor rewrites. It's been working great on my boards really. You just go through the board list, if a negative value is found, turn it to positive and add it to the exclude list, then unset the current value, and continue to the next iteration.

QuoteAs for db_fetch_assoc etc. That memory overhead is *very* significant as board sizes scale up to the size of, for example, the community here.
I'm already hearing Eldacar say, "when you've got a huge board, you usually don't worry about these little things. You need larger servers anyway." :P

That's the kind of thing he tells me  :P Because I'm one of those who complain that SMF is sometimes... a little bit on the overgrown side. I wouldn't like to see it become another bloated phpBB. $smfFunc is precisely one of these places where I think SMF might be getting a little too big. I mean, so many changes, just to add PostgreSQL support (among others)... I don't know of anyone who uses this. It's a bit frustrating.

Still, the other new features in SMF2 keep me from going back to v1.1 ;)

QuoteI really don't see that using $smfFunc['db_query'] is really any different to smf_db_query in terms of typing.
I don't know, maybe you don't have to press the AltGr key to enter [ and ], and often end up with "[{...@]" because you released the AltGr key too late ;)
db_query('...') doesn't require any shift/alt/alt+gr combination.

QuoteAlso if you need to globalise $smfFunc then you'd need to globalize $db_prefix anyway so it's not like it's really extra work.
Yes you have a point.

QuoteOf course you can simply use smf_db_query if desired but I don't think it's as elegant so it certainly won't be used like that in the core.
I gave up long ago on convincing the team to use an alias function ;) Although the idea's still playing in my head... It won't shut up but I'm doing my best not to let my fingers type it :P

QuoteINNER JOIN and cross joins should be comparable speed wise. We saw no different in speed at all (Positive or negative) on MySQL.
I hope you're right! :)

QuoteAs for documentation - you are right that it could be better. I did however think that the admin menu stuff is documented in the Subs-Menu.php file?
Yes it is, but what's not documented, is how to convert a mod's admin area to SMF2. Unfortunately it can't be automated.

Dannii on November 08, 2007, 09:42:27 AM said
QuoteI don't know, maybe you don't have to press the AltGr key to enter [ and ], and often end up with "[{...@]" because you released the AltGr key too late ;)
db_query('...') doesn't require any shift/alt/alt+gr combination.
Must be your keyboard layout, I can type [ and ] without using shift or alt. ;)

SleePy on November 08, 2007, 12:27:32 PM said
The empty string at the start as said is for other databases. So the function can pass out to other functions or specific details based on a query that is being handed to it.

It isn't to much work to change this, I have already simply changed all my released mods over to 2.0 beta as well as 1.1.4 and they work fine regardless of the SMF version.
Its a major version jump. This isn't a minor thing such as 1.0 to 1.1 (Which itself was big), this is 1.x to 2.x.
Edit,
A great example is say windows 95 and then installing service packs. But instead of that it is windows 95 to vista. A lot has changed and software most likely wont work anymore.
End Edit..

2.0 Final is a long way off, you got plenty of time yourself to make your mods work with 2.0. And since you are a charter member you have access to 2.0 at this time and can make your mods work with 2.0 before others can who do not have 2.0 and need to wait for the public beta.

We are working on 2.0 Documentation, it takes a lot of work to detail all things and then if something is changed, well we have to know to change that, so its a cat/mouse game with ourselves. This is why documentation is slow, If the doc team is caught up with the devs on features, and the devs change a feature, a whole section might need to be rewrote, proof-read and such.
If you ask the question in the Charter boards me or another team member would be happy to help you out, The admin menus in 2.0 are fairly simple and actually, very close (I wish though that they where the same to use the same functions) as the main menu in 2.0.

Nao 尚 on November 08, 2007, 01:46:09 PM said
Quote from: ܝܠܕܟ on November 08, 2007, 09:42:27 AM
Must be your keyboard layout, I can type [ and ] without using shift or alt. ;)
AZERTY keyboard. These are the French standard...

Quote from: SleePy on November 08, 2007, 12:27:32 PM
The empty string at the start as said is for other databases. So the function can pass out to other functions or specific details based on a query that is being handed to it.
I didn't understand a single word of what you tried to explain, but if you say so... Thanks for trying ;)

Quote2.0 Final is a long way off, you got plenty of time yourself to make your mods work with 2.0.
I don't have any mods ;) I just converted the mods I use for my 1.1.3 board to 2.0. (Well, still haven't converted Spoiler Tag... No idea why...)

QuoteThis is why documentation is slow, If the doc team is caught up with the devs on features, and the devs change a feature, a whole section might need to be rewrote, proof-read and such.
I certainly know the feeling ;)

JayBachatero on November 08, 2007, 07:42:16 PM said
You need to get yourself a programming friendly keyboard.  That is not friendly at all. :P

SleePy on November 08, 2007, 10:15:28 PM said
Nao,

Look at the $smfFunc through out 2.0, you will see where the empty first string is sometimes not empty and how it differs from an actual query. I won't post the code here as I am not a developer and 2.0 isn't public, So just peak around and you will see how its used.

Ben_S on November 09, 2007, 06:19:51 AM said
Quote from: Nao 尚 on November 08, 2007, 09:34:32 AMI'm already hearing Eldacar say, "when you've got a huge board, you usually don't worry about these little things. You need larger servers anyway." :P

Urgh when you have a larger board you look at the little things in much closer detail, adding extra / more powerful servers is the lazy option.

Dannii on November 09, 2007, 06:27:40 AM said
I'm being mis-quoted. I was more saying that the things Nao was worrying about are unlikely to be the first bottlenecks reached when your forum grows super big.

Nao 尚 on November 09, 2007, 06:54:02 AM said
Quote from: Ben_S on November 09, 2007, 06:19:51 AM
Urgh when you have a larger board you look at the little things in much closer detail, adding extra / more powerful servers is the lazy option.
Yes, I agree. This is why I'm very demanding on the performance of PrettyURLs ;) I've already managed to speed it up by about 200%.
Advertisement: