Welcome, Guest. Please login or register.
Did you miss your activation email? October 07, 2008, 07:17:21 PM
Did you miss your activation email? October 07, 2008, 07:17:21 PM
Developers' Blog
Database implementation in SMF 2.0
There have been a few posts here and there about the way SMF 2.0 addresses database abstraction and I thought that people might appreciate a blog on the topic from a developer to get an insight into the reasons behind our implementation and the benefits we see to the approach.
Firstly, let's talk about $smfFunc - the SMF 2.0 version of $func - name changed to allow for better integration with other applications. Since 1.1 RC2 (I think) this has been used for string functions to allow efficient implementation of unicode support. $smfFunc is an array of function names, so, for example $smfFunc['strpos'] could be assigned to the name 'strpos' or 'uc_strpos'. Whenever $smfFunc['strpos'] is called the function whose name it's values corresponds is actually called. Take a made up example.
In the above piece of code if the script was submitted with the name of 'grudge' $smfFunc['price'] would equal 'cheap' - otherwise it is 'expensive'. This variable can then be used like in the below:
Which would execute the function 'cheap' or 'expensive' dependant on whether the name was grudge - hence displaying either 2 or 200 in the browser.
The above code shows how we can dynamically assign a function to a variable. Note that the function never exists in the variable - only the name of the function. This means that when you access $smfFunc['price']() PHP calls the cheap or expensive function direct - there is no overhead at all. You could do the above set of functionality like below:
This would work exactly the same used with:
However, note that now when you call the price function PHP actually goes through another function to get there. The value returned from cheap/expensive has to pass into the price function then back out again. For a simple return value like an integer this overhead is relatively minimal but now imagine doing this with mysql_fetch_assoc - which is the function used to fetch data from the database. This function can (And on an average page load will) return, say, 25kB of data in an array. In the example above all this data needs to transition through a "pass through" function - whereas the first example (Using $smfFunc) effectively allows PHP to access the function natively.
So - this explains what we are doing with SMF 2.0. In the database loading function you'll find a line effectively doing this:
And when we call $smfFunc['db_fetch_assoc'] all PHP needs to do is a quick lookup on the value of $smfFunc['db_fetch_assoc'] and then call mysql_fetch_assoc. If you looked in the database loading function for the postgreSQL database the line would say:
and so forth for sqlite. As in the first example we could have done:
But this has the memory overhead. Hence, and for consistantency, all database functions in SMF 2.0 now follow this $smfFunc model. There are other things we've done in 2.0 which I won't go into here but are quite cool. In particular there is now a whole set of functions for creating, altering and droping tables. This is to ensure mod authors can write there mods *once* and SMF will automatically sort out the database changes so they work regardless of which database the user is using (i.e. the mod will work on MySQL, PostgreSQL and SQLite without any extra effort from the author).
The other thing we've done with 2.0 is try to make the SQL much more standard to enable abstraction to work correctly. So, for example, all cross joins such as:
SELECT *
FROM (smf_topics AS t, smf_messages AS m)
WHERE m.id_msg = t.id_first_msg
Won't work on PostgreSQL and many other schemas as it's not standard. Instead SMF 2.0 now uses inner joins for all queries. These are just as quick but much more standard and IMHO easier to read - so the above function now looks like:
SELECT *
FROM smf_topics AS t
INNER JOIN smf_messages AS m ON (m.id_msg = t.id_first_msg)
There have been many more changes than the above but we hope they are for the benefit of users and that mod authors will be quick to adapt to the new way of working with the database. Happy to answer any comments you might have.
Grudge
Firstly, let's talk about $smfFunc - the SMF 2.0 version of $func - name changed to allow for better integration with other applications. Since 1.1 RC2 (I think) this has been used for string functions to allow efficient implementation of unicode support. $smfFunc is an array of function names, so, for example $smfFunc['strpos'] could be assigned to the name 'strpos' or 'uc_strpos'. Whenever $smfFunc['strpos'] is called the function whose name it's values corresponds is actually called. Take a made up example.
Code: [Select]
function cheap()
{
return 2;
}
function expensive()
{
return 200;
}
$smfFunc['price'] = $_POST['name'] == 'grudge' ? 'cheap' : 'expensive';
In the above piece of code if the script was submitted with the name of 'grudge' $smfFunc['price'] would equal 'cheap' - otherwise it is 'expensive'. This variable can then be used like in the below:
Code: [Select]
echo $smfFunc['price']();
Which would execute the function 'cheap' or 'expensive' dependant on whether the name was grudge - hence displaying either 2 or 200 in the browser.
The above code shows how we can dynamically assign a function to a variable. Note that the function never exists in the variable - only the name of the function. This means that when you access $smfFunc['price']() PHP calls the cheap or expensive function direct - there is no overhead at all. You could do the above set of functionality like below:
Code: [Select]
function cheap()
{
return 2;
}
function expensive()
{
return 200;
}
function price()
{
if ($_POST['name'] == 'grudge')
return cheap();
else
return expensive();
}
This would work exactly the same used with:
Code: [Select]
echo price();
However, note that now when you call the price function PHP actually goes through another function to get there. The value returned from cheap/expensive has to pass into the price function then back out again. For a simple return value like an integer this overhead is relatively minimal but now imagine doing this with mysql_fetch_assoc - which is the function used to fetch data from the database. This function can (And on an average page load will) return, say, 25kB of data in an array. In the example above all this data needs to transition through a "pass through" function - whereas the first example (Using $smfFunc) effectively allows PHP to access the function natively.
So - this explains what we are doing with SMF 2.0. In the database loading function you'll find a line effectively doing this:
Code: [Select]
$smfFunc['db_fetch_assoc'] = 'mysql_fetch_assoc';
And when we call $smfFunc['db_fetch_assoc'] all PHP needs to do is a quick lookup on the value of $smfFunc['db_fetch_assoc'] and then call mysql_fetch_assoc. If you looked in the database loading function for the postgreSQL database the line would say:
Code: [Select]
$smfFunc['db_fetch_assoc'] = 'postg_fetch_assoc';
and so forth for sqlite. As in the first example we could have done:
Code: [Select]
function db_fetch_assoc($handle)
{
return mysql_fetch_assoc($handle);
}
But this has the memory overhead. Hence, and for consistantency, all database functions in SMF 2.0 now follow this $smfFunc model. There are other things we've done in 2.0 which I won't go into here but are quite cool. In particular there is now a whole set of functions for creating, altering and droping tables. This is to ensure mod authors can write there mods *once* and SMF will automatically sort out the database changes so they work regardless of which database the user is using (i.e. the mod will work on MySQL, PostgreSQL and SQLite without any extra effort from the author).
The other thing we've done with 2.0 is try to make the SQL much more standard to enable abstraction to work correctly. So, for example, all cross joins such as:
SELECT *
FROM (smf_topics AS t, smf_messages AS m)
WHERE m.id_msg = t.id_first_msg
Won't work on PostgreSQL and many other schemas as it's not standard. Instead SMF 2.0 now uses inner joins for all queries. These are just as quick but much more standard and IMHO easier to read - so the above function now looks like:
SELECT *
FROM smf_topics AS t
INNER JOIN smf_messages AS m ON (m.id_msg = t.id_first_msg)
There have been many more changes than the above but we hope they are for the benefit of users and that mod authors will be quick to adapt to the new way of working with the database. Happy to answer any comments you might have.
Grudge
Loading...
http://us3.php.net/manual/en/function.override-function.php
The thing is, it only works for built-in functions.
$smfFunc could still be used for performance-critical operations like maintenance features.
One of the other problems with the new syntax for db_query is the empty parameter at the beginning of the parameter list. Why wasn't it just put at the end, after the __LINE__ and __FILE__ parameters?
Something like $dummy = '' would allow for no changes. I noticed a similar problematic pattern in an SSI function where an "exclude boards" parameter is now available, but not at the end. Interestingly (for me at least), I had already modified my 1.1.3 code to include such a feature, but I did it differently -- I checked the include board list, and if a board had a negative value, then it would be added to the exclude board list. This allows for no changes at all in the SSI function structure, while retaining the "exclude boards" feature.
What is the purpose of the "INNER JOIN" member, as opposed to the more common (in SMF) "LEFT JOIN"? Is "INNER JOIN" supposed to mean that there's always an equivalent in the joined table and that there should be no overhead as opposed to a LEFT JOIN call?
EDIT--I just checked, and INNER JOIN seems to do the same job as a "FROM table1, table2" indeed. It only returns the common entries. And it's more readable I have to agree, although the query is a bit longer. Still, I don't know if it's exactly the same performance as the double FROM.
So, as a conclusion, I'd say it would be nice to explain somewhere the steps to apply when converting a mod that has entries in the admin area
Then again, I'm not a dev, so I can't tell for sure.
There's already such a topic BTW, but it doesn't mention the admin panel changes.
The reason why '' is at the start of every query is that *eventually* every query will have a unique string there - we just have only added the names to those functions which we needed to alter for PostgreSQL and SQLite - it's not optional.
The comment on SSI is fair if that is the case - we shouldn't be changing the position of previous parameters in SSI IMHO.
As for db_fetch_assoc etc. That memory overhead is *very* significant as board sizes scale up to the size of, for example, the community here. I really don't see that using $smfFunc['db_query'] is really any different to smf_db_query in terms of typing. Also if you need to globalise $smfFunc then you'd need to globalize $db_prefix anyway so it's not like it's really extra work. Of course you can simply use smf_db_query if desired but I don't think it's as elegant so it certainly won't be used like that in the core.
INNER JOIN and cross joins should be comparable speed wise. We saw no different in speed at all (Positive or negative) on MySQL.
As for documentation - you are right that it could be better. I did however think that the admin menu stuff is documented in the Subs-Menu.php file?
For instance, mod authors may add db queries to functions that usually don't call the database. So they have to add these global declarations manually.
And what's going to be the purpose of this mysterious string, then?
Maybe you could consider doing it the way I've been doing it, then? It's only a matter of a few minor rewrites. It's been working great on my boards really. You just go through the board list, if a negative value is found, turn it to positive and add it to the exclude list, then unset the current value, and continue to the next iteration.
That's the kind of thing he tells me
Still, the other new features in SMF2 keep me from going back to v1.1
db_query('...') doesn't require any shift/alt/alt+gr combination.
It isn't to much work to change this, I have already simply changed all my released mods over to 2.0 beta as well as 1.1.4 and they work fine regardless of the SMF version.
Its a major version jump. This isn't a minor thing such as 1.0 to 1.1 (Which itself was big), this is 1.x to 2.x.
Edit,
A great example is say windows 95 and then installing service packs. But instead of that it is windows 95 to vista. A lot has changed and software most likely wont work anymore.
End Edit..
2.0 Final is a long way off, you got plenty of time yourself to make your mods work with 2.0. And since you are a charter member you have access to 2.0 at this time and can make your mods work with 2.0 before others can who do not have 2.0 and need to wait for the public beta.
We are working on 2.0 Documentation, it takes a lot of work to detail all things and then if something is changed, well we have to know to change that, so its a cat/mouse game with ourselves. This is why documentation is slow, If the doc team is caught up with the devs on features, and the devs change a feature, a whole section might need to be rewrote, proof-read and such.
If you ask the question in the Charter boards me or another team member would be happy to help you out, The admin menus in 2.0 are fairly simple and actually, very close (I wish though that they where the same to use the same functions) as the main menu in 2.0.
Look at the $smfFunc through out 2.0, you will see where the empty first string is sometimes not empty and how it differs from an actual query. I won't post the code here as I am not a developer and 2.0 isn't public, So just peak around and you will see how its used.
Urgh when you have a larger board you look at the little things in much closer detail, adding extra / more powerful servers is the lazy option.