I’ve been developing in PHP since 1996, and PHP has seen a huge upsurge over the last decade. It’s essentially one of the standards for web development out there now (and has been for many years). Every few months there’s a release, whether large or small, which introduces new features and bug fixes. However, with every release also comes fears of tiny, sometimes undocumented, changes that break existing code, and often for no solid reason other than someone with commit access decided they liked the ‘new’ way better than the old way.
Back in 2001 (or perhaps 2002) the behaviour of get_object_vars() changed from only returning values for properties that were in the class definition to returning all of an object’s properties. IIRC, this was between 4.2.0 and 4.2.1 (or 4.1.1 and 4.1.2, I can’t recall that far back). The change was reflected in the online docs months later, but the release notes had no mention of this change at the time. This hurt a project demo for us at the time, and took a while to track down the exact cause.
I’m seeing this same thing play out again today with the recent PHP 5.2.5 release. IliaA saw fit to ‘fix’ behaviour in the glob() function’s safe_mode operation by returning FALSE if glob() is used on a directory which falls outside the accesses allowed under safe_mode. No, correct that – I think it’s trying to catch access outside of the open_basedir setting. That’s my understanding of the issue, at any rate. I’m following this because my brother has been working with Magento, built on the Zend Framework, and there’s was some issue raised on their boards about Magento not working with PHP 5.2.5. People were claiming it’s a Zend Framework issue, but digging deeper it’s because the behaviour of glob() changed in PHP 5.2.5. (ZF has some issues here and here on it already). Interestingly, this report indicates that in PHP 5.3 (devel branch I guess) it’s either ‘fixed’ or works as it always should have.
This link is the CVS diff which seems to be where the breaking change was introduced, and was committed with a comment indicating this was fixing a ‘regression’ (which was probably this issue). Digging deeper, this apparently is the culprit:
no_results:
if (PG(safe_mode) || (PG(open_basedir) && *PG(open_basedir))) {
struct stat s; if (0 != VCWD_STAT(pattern, &s) || S_IFDIR != (s.st_mode & S_IFMT)) {
RETURN_FALSE;
}
}
array_init(return_value);
return;
}
So, if safe_mode is on and open_basedir is set, and some pattern matches (sorry, can’t parse all of this 100%), RETURN FALSE happens. glob’s documentation states that it returns an array, never ‘false/true’. The comments in the glob-area of dir.c itself state that
/* Some glob implementation simply return no data if no matches were found, others return the GLOB_NOMATCH error code. We don't want to treat GLOB_NOMATCH as an error condition so that PHP glob() behaves the same on both types of implementations and so that 'foreach (glob() as ...' can be used for simple glob() calls without further error checking. */
The comment here (the spirit of which has been in there for years) is that foreach(glob() as …) is something that should be allowed. Indeed, many apps have been written which rely on that, and moving to PHP 5.2.5, all these apps will need to be rewritten. Why? Because one person decided that this ‘oversight’ needed to be corrected in a minor point release? Given that ‘safe_mode’ thinking is being deprecated for future versions of PHP, why did we need to introduce BC breakage in the name of honoring safe_mode this late in the game?
If permissions or safe_mode or open_basedir settings or anything else prevent the access to a directory or file, glob() should still return an empty array. If a developer *wants* to investigate why the glob() returned empty, they can do further error checking with is_readable() and similar calls in the code. As it stands now, there’s yet more checking that needs to be done ahead of time now.
Bigger issue
The thing that gets me about all this is not *this* particular bug that was introduced, but the way these continue to happen. There’s a PHP QA team, but I don’t believe they don’t have much authority to prevent releases from going out. Looking at their website, they do not seem to acknowledge a PHP 5 existing (front page only mentioned PHP 4.4.x series of releases). And how would you write .phpt tests to toggle safe_mode and open_basedir restrictions at runtime? There’s only so much automated testing that can happen – automated tests probably couldn’t be written to catch a situation like this (if I’m wrong, please correct me).
PHP does not have one person (or even one small group) to review all of these decisions that happen, and it shows in the quality aspect of PHP. Ruby has Matz, Rails has DHH, Perl has Larry Wall, Linux has Linus, Python has Guido. The ‘scripting’ side of the development world mostly have one or a few core people who have final say over major and even minor directions for the language. PHP does not have that sort of singular vision. And even to the extent that large ideas are agreed upon, minor things like the issues above still get through as a matter of course (for another example, see the PHP 5.1 ‘date’ fiasco).
So, is this just me complaining or do I have some suggestions to fix things? Why, yes I do, thanks for asking.
No changes should be made to the PHP core without an issue being opened, either in the ‘bug’ tracker or some other issues trackerMany issues get discussed on the internals mailing list, then get implemented, but there’s no easy way for people to see the thinking behind what’s going on until after the releases are made. When we see the release notes, many ‘issues’ are addressed, but only some of them have specific ‘bug numbers’ attached to them which point to a chronology of the issue. The issue I raised above, the glob() issue, has no issue # attached to it in the PHP 5.2.5 release notes, just a mention of a ‘fixing a regression’. If there really is a regression, that should be logged someplace public, and the release notes mention it.
Associate every commit with an issue number
Ideally, commits to CVS would use the issue number(s) being referenced so that people looking at the issues could see exactly how the issues were fixed in the PHP codebase. As an added bonus, perhaps every change in CVS would need to have a .phpt test associated with it which demonstrates that the ‘fix’ works. That may be going too far, given the size and scope of PHP as a whole, but it’s a nice goal to work towards.
Restrict point releases (5.2.1, 5.2.2, etc.) to *only* deal with bug fixing
Any new functionality (new functions, changing the behaviour of functions, etc) should be reserved for the minor version changes (5.1->5.2, etc).
Perhaps PHP6 development would be a good place to start a new set of development practices, implementing these ideas and others that would help ensure a higher degree of quality in the core language. What do you think?
Reporting bugs and discussing existing bug reports in blogs is useless, use appropriate bug tracking system for that.
On your comment that it’s “impossible” to test ini directives, on the contrary (http://qa.php.net/write-test.php) –INI– can be used for this case. This is because a separate PHP interpreter is used for each phpt test case.
In this case, it sounds like ZF was cutting corners and not properly checking for directory readability before using glob. As you mention, a simple is_readable() before the foreach would tidy this issue up nicely. In fact, I would argue it’s preferable: how do you know an empty array() from glob is caused because there are no files to return, or there was an error accessing the directory?
There may be something wrong with PHP’s QA process, but I don’t think this particular example is a good one.
@Anthony: I was using these as examples, not trying to report on one particular bug. Indeed, it’s already been hashed out some in the bugs.php.net area. My point was that the way changes get introduced in to PHP is flawed.
@Edward: I was not aware that you could programmatically change the ini settings on all values in the phpt system. That is interesting. Is it just doing ‘ini_set’? Or is it launching a new PHP interpreter with specific INI settings? Based on your comment, it’s the latter, but the qa page you pointed to doesn’t seem to indicate that for sure.
@Edward again, regardless of whether or not you’d argue something is preferable, introducing breaking changes in to a minor point release without any bugs reports (or at least, any reference to the bug reports in the release notes), not updating the documentation, and going against both years of documentation and the own stated goal in the comments right above the code you’re changing is exactly the sort of thing I’m talking about which makes dealing with PHP so frustrating. A bug report should have filed about this at the very least, or a separate public RFC about this change. It’s a major breaking change to a function’s core behaviour, and it was just changed for *no good reason*. Rephrased – a reason that wasn’t good enough. Code that worked in 5.2.4 doesn’t work in 5.2.5
These are the sorts of things which cause hosts to not want to upgrade, and in general give PHP a reputation of being flakey. I think this is a perfect example of what’s wrong with PHP’s “QA” process. I put that in quotes because I’m not sure any QA was done on this particular issue – how could it have been, if there was no issue logged for anyone to know anything about it?
Thanks all for your feedback so far. My goal is not to report on a specific bug, but ideally to get people to think about the entire development process and perhaps (just maybe) influence a bit of change in that process. Unlikely as a direct result of my ramblings, to be sure, but some of these ideas might filter around enough to embed their way in to new dev processes on the PHP team.
About qa.php.net, it only shows the current focus. When there is a PHP 5 release candidate, it’s shown there. You can use the Wayback machine to check about this.
@Marc:
Thanks – I realized it might have sounded like I thought qa.php wasn’t aware of php5 or something like that. I know PHP5 comes with phpt tests. Thanks for the clarification.
@Edward:
re: error checking. In PHP5.1, “GLOB_ERR” was introduced as a flag to pass to glob(), which would have the effect of stopping execution if a directory was unreadable. With that behaviour having been around for *2 years*, I just can’t see a good reason for changing that behaviour, unannounced, especially in a point release.
http://bugs.php.net/bug.php?id=28649
makes it seem that someone has been wanting to change this behaviour for awhile.
I think they are good points that Michael Kimsal says.
I haven’t really been affected by this kind of problem, but revisions should not break applications unless someone was using some bugged way a function was working or a loop hole as a security vulnerability which has then been fixed.
It’s a good idea that every commit should have a bug number to go with it (it’s something I’ve been striving for in the projects I work on). Using something like Trac might be nice for that.
At the same time one of the great things about PHP is it’s ability to quick fix bugs and to add new features within minors or even revisions. I don’t think a more structured way of working would break that. I don’t even think that much would change.
I personally find PHP a very strong and stable language. I’ve come across less regressions and bugs in PHP than I have with any of Microsoft’s languages. But QA can always be tightened specially for revisions.
Thanks James. I think they’re good points too.
Few people are ever affected by any one change, but the point is with so many little changes, and without an overarching and unified way to track them, they will continue to affect many small pockets of users differently all the time. I believe that by adopting a more structured process (perhaps for PHPv6) we can take a good platform and continue to make it even better.
It’s a blog with a valid complaint. I got stung by the realpath() change a while back which wasn’t pretty either
.
Granted these pass the mark of becoming a majority concern, but they do cause problems and it does take development time to figure out what’s gone and jumped off the cliff.
@Edward: Is it cutting corners to rely on historically correct expected behaviour? I wouldn’t think so…
The problem with PHP are 2 fold, first the lack of a unstable build versus stable build, second the amateurish handling of their API.
When you change a certain function in a popular programming language you know that you are going to cause a lot of problems for large groups of programmers using that function.
The proper way to deal with that is to make a second function with a slightly different name incorporating the changes while leaving the original function alone, then advertise the new function and slowly phase out the original one over a longer period of time to make it possible for all the programmers that use that function to replace it with the newer one.
Then when you see from feedback that the original is hardly used you take it out, not sooner.
To keep developing a programming language that has become very popular you need to develop a quite conservative mindset.
It is *not* cutting corners to read and fully understand PHP’s documentation about a bug, and go further and actually read the source code to a function and understand that it *always* returns an array.
Why would you even think that you might be cutting corners and that you should make your code more robust? I could say the same thing about any function? Why are you simply running substr()? Don’t you know that in the next version that will have it return -1 on error sometimes? You are a bad programmer if you don’t check for proper types all the time in a loosely typed language.
Talking specifically about glob() is stupid because you are assuming that the programmer has not already checked for readability. Perhaps they have double checking for readability right after the foreach(). Perhaps it’s a “cron”ed shell script and needs no such fancy user-level error checking. You don’t know.
The heart of the matter is that changing the return type of a function in a point release is awful.
^ you cannot argue that that is good ever. You are changing the language, you are changing a library. A point release, by most people’s definitions include only enhancements and fixes, not changes to the language.
Here’s PHP’s versioning system
MajorNum.MinorNum.PointNum
MajorNum = Changes every decade or so when the developers learn new OOP concepts.
MinorNum = Becomes 2 when the major version is stable.
PointNum = Random changes that break your code and provide no real speed improvements of security fixes that people care about or haven’t already fixed in there code because their host won’t upgrade because of the random changes.
Antony Dovgal
“Reporting bugs and discussing existing bug reports in blogs is useless, use appropriate bug tracking system for that.”
Um… read the article. It basically says that there is no appropriate place for PHP bug tracking. Or, don’t read the article and try to submit a bug about this glob() issue. You will get told in about 2 hours that it is bogus and you shouldn’t waste people’s time by submitting bogus bug reports.
Edward Z. Yang
“In fact, I would argue it’s preferable: how do you know an empty array() from glob is caused because there are no files to return, or there was an error accessing the directory?”
You would be arguing against the original intent of the autor(s) of glob(). In the source code they say that they want people to simply run a foreach() on the returned value all the time. In fact, that is a proper method of error testing in my book. All these people that want to change PHP to make it more like Java so that I have to test the type of every value before and after I use it…. go use Java people.
@Antony Dovgal
“Reporting bugs and discussing existing bug reports in blogs is useless, use appropriate bug tracking system for that.”
Wait a minute. Which bug are you talking about? No one here is talking about a bug, unless you consider all of PHP to be one giant bug.
This is how PHP works. There was no bug number associated with this change. This change is a change and that’s how the function works now, the documentation has bee changed somewhat to reflect this change. There’s not a bug in site.
If you don’t know that this is how PHP works, then you’ve only been using PHP for about a year or 2. Breaking, undocumented, un-announced, and historically divergent changes in point release is what defines PHP. It’s been this way all of 4 and all of 5. I don’t think there were enough functions to really break return values, etc. in 3. And I won’t pick on 2′s development style, no matter how bad it was.
So, I ask again. Which bug are you talking about? I don’t see one here.
I dug in to http://us2.php.net/manual/en/migration52.incompatible.php and noticed that the 5.2.5 ‘glob()’ issue is mentioned there. archive.org is down right now for me, so I can’t check if this was there from release date or added more recently. Guess I’ll need to start checking this page as well in addition to (or perhaps instead of) the release notes for future versions.
Let’s be honest. The bigger issue is that PHP has made a lot of poor and inconsistent design choices during it’s history. Hence the need or desire to break backwards compatibility.
One current scenario: The README.UPDATE_X_X file within php-src gets updated, and this where the documentation steals the migration information from. This file (or the documentation version, if it’s online) is linked to from the 5.2.x release announcements. Here’s said file from the 5_2 branch:
http://cvs.php.net/viewvc.cgi/php-src/README.UPDATE_5_2?view=markup&pathrev=PHP_5_2
As for the aforementioned glob() example, both the docs and readme were updated Oct 01, 2007. And the 5.2.5 release (Nov 08) announcement does link to it… and the README file provides the affected source code, and links to the bug (#41655):
http://cvs.php.net/viewvc.cgi/php-src/README.UPDATE_5_2?r1=1.1.2.39&r2=1.1.2.40&pathrev=PHP_5_2
I’m unsure why the 5.2.5 changelog does not include a link to bug #41655 but will assume it’s an oversight.
So why isn’t there anything in PHP equivalent to the Python “Enhancement Proposal” process (http://www.python.org/dev/peps/)? (or is there?)
It’s somewhat telling that the official python web site is a .org, while the official php site is a .net. The python folks seem a bit better ORGanized.
Of course, there are similar examples of blatant disregard for backward-compatibility in the Java world; not from Sun itself but from frameworks such as Tapestry. Probably one of the main reasons people have avoid Tapestry in the past (see http://tapestry.apache.org/tapestry5/ where at least they freely own up to this issue and promise to do better going forward… time will tell! – but the damage has been done and it’s unlikely Tapestry will become very popular at this point).
I believe the lack of concern shown by the PHP dev team for breaking existing code reflects the short term thinking of most PHP projects. I’ve mentioned before that PHP is a perfect language for Web development in part because most Web projects only last a few years at most. If PHP wants to be taken more seriously as an option for long term projects, they will have to address this issue, so bully for you, and keep up the good fight, Mike!
@philip:
Thanks for pointing that out. That helps mitigate the glob() issue a bit, but still does not address why this change was made in the first place. I will still maintain (and it seems others agree) these sorts of breaking changes are frustrating and annoying, *especially* in a minor point release.
If the previously existing behaviour was truly looked at as a ‘bug’, then a bug should have been filed about it’s behaviour. The proper way to have handled #41655 would have been to have had the function return an empty array, not a FALSE boolean. This broke years of glob() behaviour for not a very good reason. There was already a way to deal with the issue of open_basedir() restrictions – GLOB_ERR. If it was felt that GLOB_ERR wasn’t strong enough, then another workaround that doesn’t break backwards compatibility should have been discussed in the open with some sort of enhancement request proposal (as MPS above noted wrt python).
http://bugs.php.net/bug.php?id=43407 this bug report, filed in November, brings up the issue of the FALSE return type not being correct, and it’s listed as ‘bogus’. Ilia’s response is hard to read as anything other than a bit arrogant.
No it should not because that allows glob() open_basedir/safe_mode
bypass by allowing the user to determine that the directory/file exist.
We specifically return FALSE so you cannot tell if the data exists on
disk or not.
The fact that in 2007 changes are still having to be made to functions to implement interpretations of open_basedir restriction behaviour and safe_mode demonstrates how poorly done the underlying PHP filesystem access code was done years ago, but that’s really a topic for another discussion.
Having a bug tracker link to CVS diffs (by having all CVS commits include a bug/issue number) would have made catching something like this perhaps a bit easier. I’m not saying it *would* have been caught, but the change would have been associated with something more publicly visible, and more likely to get caught. Yes, this particular glob() fix was associated with a bug number, but all we see in the bug tracker is ‘fixed in CVS’, not HOW it was fixed, or where the fix was made (what files affected, etc.). This is an issue I have with the Grails community as well, not solely PHP.
So, bugs get fixed, but in ways that may introduce breaking changes, because there’s little oversight as to HOW the bugs get fixed. And the ‘breaking change’ is now considered ‘bogus’ because ‘it is how it is now’. What hope is there of getting this ‘fix’ rolled back so that PHP 5.2.6 is ‘correct’ (as in how it worked for years before – no access to something results in an empty array) if bug reports on this issue are treated as ‘bogus’?
@mps – I’m not sure it represents intentional short term thinking as much as highlights lack of structure and control over the process. I’m not sure people are explicitly thinking “Ah, I only care how this affects me in the next 3 months” when they commit. But by the same token, there’s no official process for how to deal with these sorts of ‘small’ issues that can break BC. Should there be a voting process? I wouldn’t be against that, as long as ‘breaking’ changes didn’t happen until the next minor version (5.3 instead of 5.2.x). In this particular case it seems someone had a more stringent view on how safe_mode should behave, which completely ignored how issues of permission violations had already been set up to be handled. Public discussion about this before the release might have helped avoid this particular issue.
For those who say “you should have checked the glob() argument for readability first”, I posit this. How do we know that in the future someone won’t get the idea that is_readable() should thrown an exception if used on a non-readable directory based on safe_mode being on or off? Given the state of PHP’s internal use of exceptions, that scenario is probably not likely, but I wouldn’t put it past the core team. There might be some discussion on ‘internals’, and then the change would get committed, and there might be a small backlash/outcry after the next release, but honestly, can you tell me that scenario sounds IMPOSSIBLE? I didn’t think so…
I agree with this blog post.
I’ve been using PHP since 4.1.2 (2002) and I have the same impression about their release process. I got burned a few times by point release upgrades. It’s not fun.
PHP feels amateurish and unstable. That’s the way I perceive it after using it for about 5 years.
I switched to ruby, including web development (but not rails, i use my own framework and combine it with lighter varians like rack or ramaze)
I dont miss php, but I must tell everyone who is afraid of switching – do so side-by-side slowly. Sooner or later I am sure your knowledge in ruby will excel the one in php easily.
@Mark:
“If you don’t know that this is how PHP works, then you’ve only been using PHP for about a year or 2.”
Antony Dovgal aka tony2001 is one of the PHP devs, and his comment is banter, I think.
i love that you ended your post with very specific solutions, as opposed to open-ended complaining and hand-wringing. cheers.
If they are using point releases (ie. 5.2.1) to introduce functionality, I will be the first to say that is idiotic beyond measure.
Please tell me the PHP dev team does not use point releases to introduce functionality! I thought the whole development world knew that point releases are only bug fix releases.
@Robert:
To be fair, I don’t think most of whatever functionality introduced in point releases is intentional. Indeed, it’s not so much that *new* functionality is introduced, but that existing behaviours are changed, often without good reason. That was my bigger point I was trying to make.
@Robert & mgkimsal:
I can assure you that functionality is introduced in point releases. That was an explicit goal of the 5.1.3 release, for instance. (And that version was also immediately superceded; this time, because $_POST was broken!)
If I could go back and do things again, we would have lighttpd and cherrypy instead of mod_php. PHP is just that bad.
看了你的文章,感想颇多.我本来想学PHP的,现在态度也改变了
As I can’t read Chinese, I really wasn’t sure whether to approve the above post or not, but it seemed harmless, and may make sense to Chinese readers.
———
By the power of babelfish, the above post translates to (roughly)
“Read your article quite a lot, the feelings I originally want to study PHP, now the manner also changed”
xland, don’t completely give up on PHP, but be careful when using it and upgrading between versions. It’s still a powerful language with a lot of good. However, if you are starting fresh with programming, there are other good choices too. Ruby, Python, Groovy, C# would all be great learning options. Good luck. Xie xie.
There was actually a regression in a security related fix in the glob function. In some really rare cases, glob has to return false to notify an error.
As stated in http://bugs.php.net/bug.php?id=41655 the regression has been promptly fixed (on my request) and is available in the latest release.
@pierre:
why does glob() *have* to return an error? Either it’s always returned an error in some circumstances and the docs have never been in line with the functionality, or this was a new idea about how unreadable conditions should be handled. As I wrote earlier, the code in the dir.c file itself clearly states the intention is to return an empty array if something is unreadable.
Had the move been made to have this throw an exception, I certainly would have complained as loudly, but it would have at least been a PHP5 piece of functionality being used (exceptions).
No, this was a change made without considering backwards compatibility, existing documentation or the original intentions of the glob() function. It would have been perfectly fine to return an empty array in unreadable or error conditions. A concerned developer can do other checking on why the array is empty after the fact *if they want to*, but one should not be forced in to multiple layers of defensive coding. If this is the ‘master plan’ – requiring is_readable checks before every file system call, then enforce it all over the place in one fell swoop. Do not implement it as-and-when over the next several years, and publicly state that this is going to be done. A public forum about planned changes – large and small – for PHP daily development needs to be put in to place to facilitate this sort of advance notice. If people had to start putting their names to intended backwards compatibility breakages before they even occurred, it might bring some more structure to the PHP development world.
Mike,
I can’t agree with your sentiments more. Being the primary person that interacts with our Systems Administrators, I often advise them of new PHP releases. At one point in time, we would run on the bleeding edge of releases. After a number of borked releases that resulted in existing sites collapsing, my SysAdmins have became very gun shy of any new PHP release. As a result, I tend to wait months (which generally means another release has happened) before I push for an update on our servers. It seems to be a constant thread that has happened for as long as I can remember (since ’97). As alluded to in so many comments above, features should be limited to larger releases, functionality and backward compatibility should be the paramount of any release.
I have also found that this sort of problem (lack of acceptance of new releases) drives the “stay with PHP4″ type mentality. Those that maintain services to clients can’t “stay stable” if the releases that are coming from the PHP core developers aren’t exactly that…. stable.
i love that you ended your post with very specific solutions, as opposed to open-ended complaining and hand-wringing. cheers.