[ale] J2EE vs PHP

Ronald Chmara ron at Opus1.COM
Thu Mar 11 19:12:10 EST 2004


On Mar 11, 2004, at 4:57 PM, Michael Mealling wrote:

> On Thu, 2004-03-11 at 16:30, Ronald Chmara wrote:
>> On Mar 11, 2004, at 10:34 AM, Fletch wrote (and quoted):
>>> You might want to check out this paper.  I've quoted the conclusion
>>> paragraph below.  And I trust there's no confusion about what I'd
>>> suggest instead of either of your suggestions. :) (mod_perl
>>> underneath either HTML::Mason or Template Toolkit, of course)
>>>
>>> Experiences of Using PHP in Large Websites
>>> http://www.ukuug.org/events/linux2002/papers/html/php/
>> PHP is quite
>> easily scalable, if one simply takes the time to learn *how*. Same as
>> most other languages....but trying to scale perl in the *same* way as
>> PHP in the *same* way as ASP in the *same* way as J2EE? That way
>> madness lies. Each environment scales differently.
> Do you have any pointers for what the proper way to scale PHP is? And
> scale doesn't just include TPS rates. Scale includes complexity as 
> well.

Separate the datastore cluster from the presentation cluster (the easy 
one), code apps/pages in such a way as to make each page independent 
from other data, keep complexity per page down by avoiding code 
bloat...

PHP works best like Unix (many small pieces put together as needed,
when needed) as compared to a modern shelfware application (all the 
pieces shoved into large binaries, tracking everything at once). Like 
any other environment, its scalability can be compromised by doing it
"the wrong way". Perhaps it's easier to point out past environments 
I've seen that hobble PHP... most issues seem to come from familiarity 
with other programming environments and workflows.

Problem 1: The mega-load. Programmers accustomed to building monolithic 
apps include all the PHP code in one file, or a group of include'd 
files pulled up on each page. This makes perfect sense for a monolithic 
app (compile once), or with some mod_perl and python setups (only 
recompile as needed), but with PHP, what happens on page 10 in a 
workflow isn't needed for page two, and, PHP "recompiles" every loaded 
PHP and HTML file on every page hit (Read that again if needed... it 
can be a jaw dropper). What eventually happens is that every function, 
every presentation routine, winds up getting reloaded and recompiled 
every single time.... I've seen sites where an entire 10MB of site code
was being loaded, and reinterpreted, on every single page hit (ugh.).
The solution is simple, but runs counter to how many programmers may
be accustomed to thinking: Only load, on each page hit, the code
blocks needed for a given page.
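
A rough sketch of the difference, written in Python rather than PHP so it
runs standalone. Python normally caches imported modules, so this sketch
calls compile() and exec() on every "hit" to imitate PHP recompiling each
included file per request. The file names and the per-page manifest
(SOURCES, PAGE_FILES) are made up for illustration:

```python
# Hypothetical site code: file name -> source text. On a real PHP site these
# would be separate .php files pulled in with include/require.
SOURCES = {
    "layout": "def header(title):\n    return '<h1>' + title + '</h1>'\n",
    "catalog": "def product_line(name, price):\n    return name + ': ' + format(price, '.2f')\n",
    "checkout": "def order_total(prices):\n    return sum(prices)\n",
}

# Hypothetical per-page manifest: the only files each page needs.
PAGE_FILES = {
    "brochure": ["layout", "catalog"],
    "checkout": ["layout", "checkout"],
}


def load_for_page(page, everything=False):
    """Compile and run the source files for one page hit.

    everything=True mimics the "mega-load": every file is compiled on every
    hit. Otherwise only the files listed for this page are compiled.
    Returns (namespace, list_of_compiled_files).
    """
    files = list(SOURCES) if everything else PAGE_FILES[page]
    namespace = {}  # fresh per hit, like a new PHP request
    for name in files:
        code = compile(SOURCES[name], name, "exec")  # "recompiled" every hit
        exec(code, namespace)
    return namespace, files


ns, compiled = load_for_page("brochure")
print(compiled)                 # ['layout', 'catalog']
print(ns["header"]("Widgets"))  # <h1>Widgets</h1>
print("order_total" in ns)      # False: checkout code was never compiled
```

The brochure page never pays for the checkout code. With everything=True
every page compiles all three files, and that cost grows with the whole
site rather than with the page being served.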

Problem 2: The Überfunction. This is related to problem one. Rather
than writing small, tight, exclusive code blocks, programmers tend to
group similar functionality into a single location. So, rather than
having 15 or 20 different pieces of code defining different
"data-access techniques", a single lookup on, say, a user id, may call
functions (that must be reloaded, and recompiled!) for MySQL,
PostgreSQL, flat-file parsing, RSS, and
every-other-data-type-under-the-sun. Solution:
Segment out code like mad, not just into routines/functions, but into
separate files, and only load the code/files that are needed. If you're
just doing an Oracle query, why also load and recompile any custom MySQL
query functions, or for that matter, custom Oracle update functions?
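
Here is a sketch of "one file per data-access technique", again in Python
standing in for PHP. The backend names, the find_user() function and the
compiled_log list are hypothetical. Each backend's code is compiled only
when a page actually asks for that backend:

```python
# Hypothetical per-backend "files". Each holds only one kind of data access,
# so a page doing a SQL lookup never compiles the flat-file parser.
BACKEND_SOURCES = {
    "sql": (
        "def find_user(conn, user_id):\n"
        "    row = conn.execute(\n"
        "        'SELECT name FROM users WHERE id = ?', (user_id,)\n"
        "    ).fetchone()\n"
        "    return row[0] if row else None\n"
    ),
    "flatfile": (
        "import csv, io\n"
        "def find_user(text, user_id):\n"
        "    for rec in csv.reader(io.StringIO(text)):\n"
        "        if int(rec[0]) == user_id:\n"
        "            return rec[1]\n"
        "    return None\n"
    ),
}

compiled_log = []  # which backend files were actually compiled


def load_backend(name):
    """Compile only the requested backend and return its find_user()."""
    namespace = {}
    exec(compile(BACKEND_SOURCES[name], name, "exec"), namespace)
    compiled_log.append(name)
    return namespace["find_user"]


find_user = load_backend("flatfile")
print(find_user("1,alice\n2,bob\n", 2))  # bob
print(compiled_log)                       # ['flatfile']
```

The SQL backend is never compiled unless some page requests it. Adding a
new backend adds a new entry instead of another branch inside a shared
lookup function.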

Problem 3: Not everything is a portal/Not everything deserves to be 
abstracted into loadable modules. This is the opposite extreme of the 
above. In an effort to do great design, I've seen programmers who have 
managed complexity by abstracting virtually everything. So, rather than 
a one-liner piece of PHP code to display the system time being put on a 
page, there's a data gathering component (to get the time), a data 
formatting component (to format the time), a component managing module 
(to keep the time component separate from other components), a module 
determining the page relationship of the component (to indicate where 
the component belongs on a page), and finally, a presentation module 
(to draw the page). Not all abstraction is good.
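
To show the contrast, here is a Python sketch (not PHP). The class names
are invented to mirror the five layers described above, and both versions
render the same clock:

```python
import time


# The direct way: one small function (or one line inline in the page).
def clock_html(now=None):
    t = time.localtime() if now is None else now
    return "<span>" + time.strftime("%H:%M:%S", t) + "</span>"


# The over-abstracted way: five pieces to do the same job.
class TimeSource:  # data gathering component
    def __init__(self, now=None):
        self.now = now

    def get(self):
        return time.localtime() if self.now is None else self.now


class TimeFormatter:  # data formatting component
    def format(self, t):
        return time.strftime("%H:%M:%S", t)


class ComponentManager:  # keeps components separate from each other
    def __init__(self):
        self.components = {}

    def register(self, name, render):
        self.components[name] = render

    def render(self, name):
        return self.components[name]()


class PageLayout:  # records where each component belongs on the page
    def __init__(self):
        self.slots = {"header": []}

    def place(self, slot, name):
        self.slots[slot].append(name)


class Presenter:  # finally draws the page
    def draw(self, layout, manager):
        return "".join(
            "<span>" + manager.render(n) + "</span>" for n in layout.slots["header"]
        )


def clock_html_abstracted(now=None):
    source, formatter = TimeSource(now), TimeFormatter()
    manager = ComponentManager()
    manager.register("clock", lambda: formatter.format(source.get()))
    layout = PageLayout()
    layout.place("header", "clock")
    return Presenter().draw(layout, manager)


fixed = time.strptime("2004-03-11 19:12:10", "%Y-%m-%d %H:%M:%S")
print(clock_html(fixed))                                # <span>19:12:10</span>
print(clock_html_abstracted(fixed) == clock_html(fixed))  # True
```

Both produce identical output. Under PHP's per-hit model, the second
version would also be loaded and compiled on every page that shows the
time.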

Problem 4: Web Applications vs. the desktop. Applications that have 
non-deterministic paths are totally different from web pages. Add in 
the PHP "every page is an independent, compiled application" issue,
and it's easy for programmers who are used to thinking of desktop apps 
(or other web environments that continue the desktop metaphor) to get 
painted into a corner. Rather than consider each page as a standalone 
application, it can be tempting to think about (and use) PHP (or any
other web programming environment) as if all of the features are 
somehow deeply related and interlinked. Each page is only related to 
every other page based on the arguments given... that's it. The 
resulting problem is that programmers are trying to manage hundreds of 
variables, functions, and other constructs, and losing sight of the few 
web pages (the last one, the current one, and the next one) that
actually relate to what they're doing.... this leads to overkill 
session management systems, complex juggling of functionality, 
meta-meta-coding, etc. The solution to the three-page-problem almost 
requires a wholesale shift in thinking, but it totally changes the 
dynamic of how "complex" a web application is..... The page shown to 
the user is a page based on arguments from the last page. No need to 
track the user's home address if all they're doing is looking at an 
online product brochure....If they want to order a printed version of 
that brochure, get/track the address at that time. Only actively track 
the essentials for each page, store and ignore the rest.
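
A minimal sketch of the brochure example, in Python standing in for PHP.
The page functions, URLs and parameter names are hypothetical. Each page
is built only from the query string it receives, and it passes along only
what the next page needs. The address is requested only when a printed
copy is ordered, and no session is involved:

```python
from html import escape
from urllib.parse import parse_qs, urlencode


def brochure_page(query):
    """Build the brochure page purely from the previous page's arguments."""
    product = parse_qs(query)["product"][0]
    # The link forward carries only what the next page needs: the product.
    order_link = "/order?" + urlencode({"product": product})
    return (
        f"<h1>Brochure: {escape(product)}</h1>"
        f"<a href='{escape(order_link)}'>Order a printed copy</a>"
    )


def order_page(query):
    """Ask for the mailing address only when a printed copy is wanted."""
    args = parse_qs(query)
    product = args["product"][0]
    if "address" not in args:
        return (
            "<form action='/order'>"
            f"<input type='hidden' name='product' value='{escape(product)}'>"
            "<input name='address'>"
            "</form>"
        )
    return f"Mailing the {escape(product)} brochure to {escape(args['address'][0])}"


print(order_page("product=widget-9&address=1+Main+St"))
# Mailing the widget-9 brochure to 1 Main St
```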

So, I guess to summarize, PHP's scalability strengths are in creating 
an army of little, independent, things, each being called and used as 
needed. The reason people have problems with scaling PHP up often comes 
from trying to treat PHP as if an n-page dynamic site is one big 
application or document, rather than a bunch of little components that 
are called only when needed. A 10MB code compile per page hit really
drags down speed, and fills each page with 10MB worth of
variable complexity and possible namespace clashes...... Try to then
span that 10MB across servers, passing data through shared memory or
temp files or other local techniques, and you might as well make the
funeral arrangements now. <g>

>> IMNSHO:
>> When one needs highly rapid feature implementation and version
>> iteration.
>> When one needs to keep hardware overhead down.
>> When one's application is primarily web-and-db-centric.
>> When one is not bound to using a specific programming style.
>> When one is going to be using pre-existing PHP code bases to build
>> from (or with).
>> When one doesn't want their job to be farmed to India. ;-)
> Can you speak to other specific concerns about the ability to have
> multiple programmers developing modular systems with PHP? In most of the
> situations I've seen, the number of developers and the complexity of the
> pageflows require blackbox modularity. It would seem the single
> namespace issue would create havoc if you had more than a small handful
> of developers.

It certainly can seem that way, if one thinks of (or works with) 
namespace as site-wide, rather than page-wide. PHP doesn't care what 
namespace is used on any other web page. Every page is protected from 
every other page....

The rope that many folks hang themselves with is making the namespace
much bigger than it needs to be (bulk includes, excessive functions),
avoiding databases for storage (and, as a result, carrying tons of data,
with used variable names, from page to page), or, in some cases, just
plain bad PHP coding habits: nested for ($i = 0; ...) loops that all
reuse $i, functions named things like get_it(), or too-simple variable
names likely to be re-used by even the same coder in the same page
($name, $price, $data). (Yes, I've actually done this to myself.)

> I actively use both J2EE and PHP on the same machine and any PHP system
> just seems to me to be more 'fragile' the more complex the system gets,
> especially when it comes to attempting to extend the application....
> Just a personal impression, though...

Maybe it's just that as people code certain ways, certain systems 
appeal more to some than others.... *shrug*

-Bop


