How Web Servers Work and What Solutions and Technologies Are Available

Saturday |

It is quite likely that you have used the internet in the past, and indeed you may be using it right now. But few of us consider the whole process from the moment one presses “Enter” in a browser to the moment the requested page appears on the screen. At the most basic level, the process is an interaction between the web browser and a remote server. When someone keys a web address, or URL (Uniform Resource Locator, which typically looks something like http://www.blahblah.com), into a browser, the page is unlikely to be stored on that computer, so the browser requests it from a remote server, which then “serves” the page (hence the name “server”) back to the browser. That explanation, however, oversimplifies what goes on “behind the scenes” to deliver the requested page in so little time. There is an intricate sequence of separate events that combine to make the internet what it is today: the headquarters of both information and misinformation!

This article is an attempt to explain that intricate process, and because past attempts to do so have often failed, we will try to keep things simple and speak plain English. The article assumes that readers know next to nothing about web servers. Perhaps the appropriate point of departure is to explain what a web server actually is. The term denotes two things: hardware and software. The hardware is the computer that stores the information, and the software is the program that runs on that machine and processes requests from web browsers. A description of what a web browser is belongs in another article; attempting it here would be a digression.

To go back to where we started: when someone types a URL or clicks on a link, several things happen. The web browser divides the URL into three parts: the protocol, the address (the server’s domain name) and the path name. The browser then communicates with a name server, part of the Domain Name System (DNS), which translates the domain name into a numerical value known as the IP address, the site’s true address on the internet. With that accomplished, the browser opens a connection to the server using the protocol named in the URL, and the software installed on the server begins transferring data between the server and the browser over that protocol. The protocol is usually HTTP, short for Hypertext Transfer Protocol, though it can also be FTP, the File Transfer Protocol. If the protocol is HTTP, the browser connects to port 80 by default, which is the port conventionally used for web page communications. FTP by convention uses port 21 for control commands and port 20 for data transfer. Discussing every port would be onerous, for there are thousands of them, but for this purpose it suffices to say that the well-known port numbers are assigned by a body called the Internet Assigned Numbers Authority (IANA) so that services can be located easily, and that there is nothing sacred about these ports; it is simply customary to use them as they are. The software installed on a server depends on the operating system that the server runs. Widely used examples include Microsoft’s Internet Information Services (IIS) on Windows and Apache on UNIX-like systems.
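As a rough illustration of how a URL breaks down into protocol, address and path, here is a minimal Python sketch using the standard library. The function names split_url and resolve and the DEFAULT_PORTS table are made up for this example; the table lists only the conventional defaults (80 for HTTP, 21 for the FTP control connection), and a real browser does a good deal more than this.

```python
import socket
from urllib.parse import urlsplit

# Conventional default ports; a server may be configured to listen elsewhere.
# (Illustrative table, not an exhaustive list of IANA assignments.)
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def split_url(url):
    """Return (protocol, host, port, path) for a URL string."""
    parts = urlsplit(url)
    # Use the port written in the URL if there is one, else the default.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    # An empty path means the site's root page.
    path = parts.path or "/"
    return parts.scheme, parts.hostname, port, path


def resolve(host):
    """Ask DNS for the IPv4 address of a host name (needs network access)."""
    return socket.gethostbyname(host)
```

Calling split_url("http://www.blahblah.com/about.asp") gives the protocol "http", the host "www.blahblah.com", port 80 and the path "/about.asp". The resolve step is kept separate because it depends on a working network and a real domain name.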

After the browser, working with DNS, has established the “residence” of the web site, it sends a request to the web server for the page. The specific page is determined by whatever comes after the domain name in the web address, that is, after the .com or .net, whichever the case may be. So if the URL looks like http://www.blahblah.com/about.asp, the path is the part that reads “/about.asp”, which is the specific page the user wants to see. If the page is available, the server finds the requested file, runs any scripts it calls for and, if necessary, exchanges cookies (small pieces of information that a server sends to a browser and the browser sends back on later requests, e.g. to remember that a user has logged in to a page that requires a username and password). It then sends the page, usually as HTML, to the browser, which formats it into a readable, viewable one. If the page contains images, the browser sends additional requests for those files so that they too appear on the screen. In practice, it is common for a single web page to take five or more requests to the server before it is complete. If the page does not exist or for some reason cannot be accessed, the server instead sends back an error message, such as the “404 Not Found” that so many of us are familiar with.
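The request-and-response exchange can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any real server is written: the PAGES dictionary stands in for files on disk, the names handle_request and build_response are invented for the example, and only the GET method and three status codes are handled.

```python
# Hypothetical in-memory "site": path -> HTML body.
# A real server would read these files from disk and might run scripts.
PAGES = {
    "/": "<html><body>Home</body></html>",
    "/about.asp": "<html><body>About us</body></html>",
}


def build_response(status, reason, body):
    """Assemble a raw HTTP/1.1 response: status line, headers, blank line, body."""
    payload = body.encode("utf-8")
    lines = [
        f"HTTP/1.1 {status} {reason}",
        "Content-Type: text/html; charset=utf-8",
        f"Content-Length: {len(payload)}",
        "",
        "",
    ]
    return "\r\n".join(lines) + body


def handle_request(raw_request):
    """Turn a raw HTTP request string into a raw HTTP response string."""
    # The first line looks like: GET /about.asp HTTP/1.1
    request_line = raw_request.split("\r\n", 1)[0]
    try:
        method, path, _version = request_line.split(" ")
    except ValueError:
        return build_response(400, "Bad Request", "<h1>400 Bad Request</h1>")
    if method != "GET":
        return build_response(405, "Method Not Allowed", "<h1>405 Method Not Allowed</h1>")
    body = PAGES.get(path)
    if body is None:
        # The familiar error message for a page that does not exist.
        return build_response(404, "Not Found", "<h1>404 Not Found</h1>")
    return build_response(200, "OK", body)
```

Passing in the text "GET /about.asp HTTP/1.1" followed by a Host header returns a response that starts with "HTTP/1.1 200 OK", while a request for a path that is not in PAGES comes back as a 404.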

That, in a nutshell, covers most of it, unless one is an IT student. Some might ask after that tutorial: which is the most popular web server? The web server market is virtually a duopoly between the Apache HTTP Server, or simply Apache, and Microsoft’s Internet Information Services (IIS). Other up-and-coming web servers worth mentioning are lighttpd and Google Web Server, but these are small fish here. Apache once held a near-stranglehold on the server market, with about 70 percent of the market share as of November 2005. Although it still holds a significant lead over its main competitor, it has seen its share drop to about 50 percent as of December 2007.

Comparing the two web servers isn’t easy given the vast array of features they have, and to an untrained eye the exercise would seem like comparing a hybrid engine to a diesel one, because the two systems operate quite differently. Apache, unlike IIS, is open source, though the Free Software Foundation deems the Apache License incompatible with version 2 of the GNU General Public License. Apache definitely has the edge in popularity as well as a rich tradition, and it is credited with playing a major role in the rise of the World Wide Web. Apache supports a wide range of features, including several programming languages through modules and interfaces such as mod_perl, mod_python, Tcl and PHP, and it is modular in structure, which lets users choose the modules appropriate to their requirements. Virtual hosting allows different websites to be served from a single installation of Apache. Apache 2.0 is available on many platforms, including Windows. IIS, on the other hand, limits the extent to which one can customize its functionality. IIS is available only for Windows, and IIS 6.0, an earlier version, supported only Windows Server 2003. However, IIS 6.0 was a major improvement on IIS 5.0, which appeared helpless against internet worms such as Code Red and Nimda. IIS 6.0 was installed in a “locked down” mode by default, and this helped curb worm attacks. No major attacks were reported after IIS 6.0 was introduced.
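The idea behind virtual hosting, where one server installation answers for several websites, can be pictured as a lookup on the host name that the browser sends with each request. The Python sketch below is only an illustration of that idea and not Apache’s actual mechanism, which is set up in its own configuration files; the host names, directory paths and the function name document_root_for are all hypothetical.

```python
# Hypothetical mapping of host names to content directories on ONE server.
VIRTUAL_HOSTS = {
    "www.example-one.test": "/var/www/one",
    "www.example-two.test": "/var/www/two",
}

# Hypothetical fallback used when the host name is unknown or missing.
DEFAULT_ROOT = "/var/www/default"


def document_root_for(headers):
    """Pick the content directory from the request's Host header."""
    host = headers.get("Host", "")
    # Drop an optional ":port" suffix and ignore letter case.
    hostname = host.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_ROOT)
```

Two requests that arrive at the same machine, one with the header Host: www.example-one.test and the other with Host: www.example-two.test, would therefore be served from different directories.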

The latest version of IIS is IIS 7.0, which rewrote almost everything from the earlier versions and made some bold changes in modularity. In this version only the binaries that are needed get installed, which reduces the web server’s exposure to attack. The other strong point of IIS 7.0 is how easily it scales out, thanks to a simple configuration system based on distributed XML files, which makes deployment in large-scale web hosting facilities easier and quicker.

When it comes to choosing the right web server, the debate about which of the two is ideal will go on and on. Microsoft’s IIS certainly finds favour with the Fortune 500 companies, the majority of which use it, whereas many top internet companies seem to think that Apache is the way to go. Each of the two rivals offers compelling reasons for its use. Apache’s distributed configuration feature is called .htaccess, a powerful tool that allows the configuration of a site to be overridden using a text file in the content directory. But alas, using the feature may cause problems, and Apache’s own documentation recommends avoiding it wherever the main server configuration can be used instead. IIS also supports distributed configuration, in its web.config format. Here is where the difference lies: if you override the default document for a site in IIS, the setting is stored in the web.config file by default. Clearly, there are considerations one should make depending on one’s circumstances, and the choice does not always depend on price. For instance, if you choose a system that is free but unfamiliar to you or your staff, you may see overheads go up in the form of training costs. In future you may also need to hire experts to update or configure the system, further driving up the cost of what was otherwise “free” software. If one runs a cost-conscious operation but has the right support, then Apache would be the natural choice. For most small enterprises with no money to hire support but enough to buy proprietary applications, IIS would be the choice. IIS is almost free if one buys the Windows operating system. Apache is available as a free download and comes with many Linux distributions. It’s not all about IIS or Apache, though; as the article hinted earlier, there are other web server options.
Sun Java System Web Server can also be downloaded without paying a dime, but only for development, testing and staging. The list price for production use of Sun Java System Web Server is about $1,500 per CPU. There is also the Zeus Technology web server, available for roughly the same price as the Sun Java System but for two CPUs. It might, however, require an administrator who is not faint-hearted, and anyone wishing to deploy on Windows will have to change plans, as the Zeus server does not support it. Choosing the right web server is really a balancing act between your needs, your administrative capabilities and your organization’s skills.

From the foregoing one can spot some of the weaknesses of each web server. For instance, IIS is available only for Windows, and until Microsoft released its latest version only Apache was modular, which limited how far IIS could be customized. With Apache, on the other hand, one might be confronted with high maintenance and training bills. Thus none of these web servers is perfect; each has its strong points as well as its weak ones.

The other topic for discussion in this issue is the web application server. Web application servers are often confused with web servers, but these workhorses are distinguished from web servers by their heavy use of server-side content (content that is generated by the server) and their constant integration with database engines. In other words, they are middleware that connects software applications with web servers. On closer scrutiny, the definition of the term “application server” has evolved. In its strict sense, an application server manages the connection between a client and server-side applications. Nowadays the term has come to cover such things as development tools, business intelligence tools, data integration tools, e-commerce and personalization services. Here is how they work. When a user, through a browser, requests a file of the kind that the web application server handles, the web server relays the request to the web application server, which processes it and sends the results back to the web server. The web server then returns the results to the browser. The application server thus relieves the web server of some of its work. Look at it this way: requests to a single web server might run into thousands or even hundreds of thousands, which could bring the web server to a virtual standstill. By doing all this “donkeywork”, the web application server gives web developers time to concentrate on building more interactive and data-rich websites, with functionality such as generating data for Flash applications and running e-commerce sites. Today there are over 20 web application servers. They are commonly referred to simply as application servers, perhaps to reflect that they are also used in internal networks, and according to industry watchers many of them nowadays use Java technology, apart from traditional lone rangers such as Microsoft’s Windows Server 2003.
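The relay between a web server and an application server can be modelled as one function handing work to another. The Python sketch below is a simplified picture under stated assumptions: both “servers” are plain functions in one program, the “/app/” prefix is an arbitrary rule chosen for the example, and the PRODUCTS dictionary stands in for a database. In practice the two would usually be separate programs talking over a network or a gateway interface.

```python
# Hypothetical static content the web server can return by itself.
STATIC_FILES = {"/index.html": "<html><body>Welcome</body></html>"}

# Hypothetical stand-in for a database the application server talks to.
PRODUCTS = {"1": "Blue widget", "2": "Red widget"}


def application_server(path, query):
    """Generate a page on the fly (server-side content)."""
    if path == "/app/product":
        name = PRODUCTS.get(query.get("id", ""))
        if name is None:
            return 404, "<html><body>No such product</body></html>"
        return 200, f"<html><body>Product: {name}</body></html>"
    return 404, "<html><body>Unknown application page</body></html>"


def web_server(path, query=None):
    """Answer static requests directly; relay dynamic ones to the application server."""
    if path in STATIC_FILES:
        return 200, STATIC_FILES[path]
    if path.startswith("/app/"):
        # Relay the request, then pass the result back to the browser unchanged.
        return application_server(path, query or {})
    return 404, "<html><body>Not Found</body></html>"
```

A call such as web_server("/index.html") is answered by the web server alone, whereas web_server("/app/product", {"id": "1"}) is passed along to application_server, which builds the page from the product data and hands it back.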

Java application servers fall into two classes: the JSP and servlet runner, and the full J2EE server. There are also products that don’t fit into these two classifications, which are normally known for their frameworks or languages. They include Macromedia’s ColdFusion and Apple’s WebObjects. The recurring theme in all of these is interoperability, even as each vendor maintains its own server products. When it comes to price, the gulf can be as wide as the Grand Canyon. There are application servers that cost nothing and there are those that cost thousands of dollars per CPU. Between these two extremes, it is possible to find application servers that are reasonably priced, especially for small businesses. But whether one chooses free goodies or spends stacks of cash on these products, one thing remains certain: there is enough choice to leave one spoilt for it. This issue will attempt to compare the various application servers available in the market; the exercise can never be regarded as definitive, but it’s a good start.

Apache Tomcat 5.x is a servlet runner and doesn’t support many of the modern features of commercial products. It is an open source web application server and is especially favored by small businesses. Caucho Resin is another application server, but with a price tag of about $1,000 per server it is not for the faint-hearted, although it performs better than most servlet engines in major areas. Then there is Sun Java System Application Server, formerly known as the Sun-Netscape Alliance’s iPlanet. It offers a compatible platform for developing and deploying Java web services, naturally. It comes in different editions: Platform Edition 8 is free, whereas Standard Edition 8 and Enterprise Edition 7 cost about $2,000 and $10,000 per CPU respectively. Zope 2.7 is another popular application server. Written in Python, it is designed for creating content management systems. To its credit, and despite being open source, Zope is a powerful application platform. Macromedia’s ColdFusion is also a formidable application server. Its applications can be compiled to Java and deployed on J2EE servers. It has the further advantages of being easy to manage and integrating well with Macromedia’s development tools.

