Performance Analysis of the Apache Web Server

 

By:

Ann Chen

Bozidar Dangubic

PROJECT OBJECTIVE

The objective of this project is to analyze the performance of the Apache web server using WebStone as the workload generator. The web server will be installed on a Pentium III 500 MHz computer running the Linux operating system. The workload generator will be installed on three separate workstations, all Pentium III 500 MHz computers also running Linux. WebStone can mimic several different clients requesting documents at once, and we will use this ability to generate heavy workloads of the kind a real-world system might experience.

In this environment (one server and three clients), the Apache web server will be started as a process. Each of the three client machines will then mimic more than one client generating requests at the same time. The data will be collected and stored in log files. Several other applications (HTTPAnalyzer, LogMaster, etc.) will be running at the same time to help analyze the data after the simulation is completed.

The data collected will be used to build a model of the system. At the system level, the system will be modeled as a finite population, variable service rate performance model. We chose this model because real-world web servers have a finite population (a finite number of simultaneous connections) and a variable service rate (each request is processed concurrently with other processes, so the service rate varies). At the component level, the system will be modeled as a multiple-class closed model in which classes are formed on the basis of file size. Using the log analysis software and system performance monitors, we will gather workload characterization parameters, which will later be used to compute the various performance measures.

After measurement is completed and the model is verified, we will perform workload forecasting on the system based on the model developed. We will then simulate and measure again and compare the data to verify that the predictions were accurate.
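
To make the system-level model concrete, the following is a minimal sketch of a finite population, variable service rate queue solved as a birth-death process. It assumes each idle client issues requests at a constant rate and that the server's completion rate depends only on the number of requests present. The function names, the saturating service-rate curve, and all numeric values are hypothetical and are not measurements from this project.

```python
from typing import Callable, Dict


def finite_population_model(
    num_clients: int,
    request_rate: float,
    service_rate: Callable[[int], float],
) -> Dict[str, float]:
    """Solve a birth-death queue with finite population and load-dependent service.

    State k is the number of requests at the server (0..num_clients).
    Arrivals in state k occur at rate (num_clients - k) * request_rate;
    completions in state k occur at rate service_rate(k).
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")

    # Unnormalized state probabilities: p_k = p_(k-1) * birth(k-1) / death(k).
    weights = [1.0]
    for k in range(1, num_clients + 1):
        birth = (num_clients - (k - 1)) * request_rate
        weights.append(weights[-1] * birth / service_rate(k))

    total = sum(weights)
    probs = [w / total for w in weights]

    # Throughput is the average completion rate over all non-empty states.
    throughput = sum(p * service_rate(k) for k, p in enumerate(probs) if k > 0)
    mean_at_server = sum(k * p for k, p in enumerate(probs))
    response_time = mean_at_server / throughput  # Little's law

    return {
        "throughput": throughput,
        "mean_requests_at_server": mean_at_server,
        "response_time": response_time,
        "probability_idle": probs[0],
    }


def example_service_rate(k: int) -> float:
    # Hypothetical curve: the rate grows with concurrency up to 4, then saturates.
    return 50.0 * min(k, 4)


# Hypothetical workload: 30 emulated clients, each issuing 2 requests/s when idle.
print(finite_population_model(30, 2.0, example_service_rate))
```

In practice the service-rate curve would be estimated from the measured throughput at each concurrency level rather than assumed.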

SYSTEM SPECIFICATIONS

As stated in the Project Objective, the Apache web server will be installed on a Compaq Pentium III computer with a processor speed of 500 MHz. This machine will act as the server in the system. There are also three Compaq Pentium III computers with the same processor speed, which will serve as the clients. All four computers run the Linux operating system, and each has 126MB of main memory and 9GB of hard drive space. They are located in the computer lab in the Engineering Hall and have fabricated IP addresses; hence they are not connected to other machines in the engineering network and are currently used only for experiments.

MEASUREMENT METHODOLOGY

The Apache web server will be started as a process on the server machine. The WebStone workload generator, which will be installed on the three client computers, will then mimic more than one client on each machine, thus generating a sufficiently robust workload. The HTTP log will be used to collect data about the workload presented to the system. In addition to the HTTP log, which is generated automatically by the web server, we plan to use the following Linux operating system performance and monitoring tools:

In addition to the Linux operating system performance monitoring tools, we might also use some third-party software to analyze the HTTP log file. Two applications in particular, HTTPAnalyzer and Log Master, could prove very useful in analyzing the data. However, even if they are used, they will not take part in the measurement of the system workload.
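
As an illustration of the kind of HTTP log analysis described above, the following is a minimal sketch that summarizes an access log. It assumes the server writes the log in Apache's Common Log Format; the helper names and sample lines are hypothetical. Because the log has one-second timestamp resolution, the sketch treats any observation window shorter than one second as one second.

```python
import re
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)'
)


def parse_line(line: str) -> Optional[Tuple[datetime, int, int]]:
    """Return (timestamp, status, bytes_sent), or None for an unparsable line."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(4), "%d/%b/%Y:%H:%M:%S %z")
    status = int(match.group(6))
    size = 0 if match.group(7) == "-" else int(match.group(7))
    return timestamp, status, size


def summarize_log(lines: Iterable[str]) -> Dict[str, float]:
    """Compute arrival rate, throughput, and error rate over the logged interval."""
    records = [rec for rec in map(parse_line, lines) if rec is not None]
    if not records:
        raise ValueError("no parsable log lines")
    times = [rec[0] for rec in records]
    # Assumption: use at least a one-second window to avoid dividing by zero.
    duration = max((max(times) - min(times)).total_seconds(), 1.0)
    total_bytes = sum(rec[2] for rec in records)
    errors = sum(1 for rec in records if rec[1] >= 400)
    return {
        "requests_per_second": len(records) / duration,
        "bits_per_second": total_bytes * 8 / duration,
        "errors_per_second": errors / duration,
    }


# Hypothetical sample lines, not taken from the actual experiment.
sample = [
    '10.0.0.2 - - [10/Oct/2000:13:55:36 -0700] "GET /a.html HTTP/1.0" 200 1000',
    '10.0.0.3 - - [10/Oct/2000:13:55:38 -0700] "GET /b.html HTTP/1.0" 200 3000',
]
print(summarize_log(sample))
```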

WORKLOAD CHARACTERIZATION METHODOLOGY

We plan to construct an operational analysis model, a system-level model, and a component-level model. At the system level, we will use the finite population, variable service rate model, and at the component level, we will use the multiple-class closed model, as explained in the Project Description document. The classes will be formed according to file size. Since we plan to perform our study from the server standpoint, we are interested in measuring the following parameters of the system.

  1. The arrival rate (requests per second)
  2. System throughput in HTTP operations per second or in bits per second, depending on the type of workload
  3. Latency at the server (response time) in seconds per transaction
  4. Errors per second
  5. The size of the file retrieved
  6. Component-wise measurements, e.g., utilization of the CPU and utilization of the disk
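
The measurements listed above can be turned into model parameters with standard operational laws. The following is a minimal sketch, assuming file-size classes are defined by fixed byte boundaries and that service demands are derived with the Service Demand Law (demand = utilization / throughput). The class boundaries and the numbers in the usage lines are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def size_class(size_bytes: int, boundaries: Sequence[int]) -> int:
    """Return the index of the file-size class for one request.

    boundaries must be sorted in increasing order. Class 0 holds sizes below
    boundaries[0]; class i holds sizes from boundaries[i-1] up to (but not
    including) boundaries[i]; the last class holds everything larger.
    """
    return bisect_right(boundaries, size_bytes)


def service_demand(utilization: float, throughput: float) -> float:
    """Service Demand Law: seconds of device time consumed per request."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return utilization / throughput


# Hypothetical boundaries: small (< 1 KB), medium (< 100 KB), large.
boundaries = [1024, 102400]
print(size_class(500, boundaries), size_class(50000, boundaries))

# Hypothetical measurement: CPU 60% busy while serving 30 requests/s.
print(service_demand(0.60, 30.0))
```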

We want to predict the system performance when the arrival rate increases and when the requests involve large files. We also want to predict the performance of the system when components of the system are upgraded.
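
One way such predictions could be obtained from a multiple-class closed model is exact Mean Value Analysis (MVA). The following is a minimal sketch, assuming load-independent queueing devices (for example, one CPU and one disk) and known per-class service demands. The function name and the demand values are hypothetical; a component upgrade could be explored by scaling the corresponding row of demands.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence


def multiclass_mva(
    demands: Sequence[Sequence[float]],
    population: Sequence[int],
    think_times: Optional[Sequence[float]] = None,
) -> Dict[str, List[float]]:
    """Exact MVA for a closed network; demands[i][r] is class r's demand at device i."""
    num_devices = len(demands)
    num_classes = len(population)
    think = list(think_times) if think_times is not None else [0.0] * num_classes

    zero = (0,) * num_classes
    queue = {zero: [0.0] * num_devices}  # mean queue length per device
    throughput = [0.0] * num_classes
    response = [0.0] * num_classes

    # Visit population vectors in order of total population, ending at the target.
    vectors = sorted(product(*(range(n + 1) for n in population)), key=sum)
    for n in vectors:
        if n == zero:
            continue
        residence = [[0.0] * num_classes for _ in range(num_devices)]
        throughput = [0.0] * num_classes
        response = [0.0] * num_classes
        for r in range(num_classes):
            if n[r] == 0:
                continue
            one_less = n[:r] + (n[r] - 1,) + n[r + 1:]
            for i in range(num_devices):
                # An arriving request sees the queue of the network with one
                # fewer customer of its own class.
                residence[i][r] = demands[i][r] * (1.0 + queue[one_less][i])
            response[r] = sum(residence[i][r] for i in range(num_devices))
            throughput[r] = n[r] / (think[r] + response[r])
        queue[n] = [
            sum(throughput[r] * residence[i][r] for r in range(num_classes))
            for i in range(num_devices)
        ]

    return {
        "throughput": throughput,
        "response_time": response,
        "queue_length": queue[tuple(population)],
    }


# Hypothetical demands in seconds: rows are CPU and disk,
# columns are the small-file and large-file classes.
demands = [[0.02, 0.05], [0.03, 0.10]]
print(multiclass_mva(demands, population=[6, 2], think_times=[1.0, 1.0]))
```

Exact MVA enumerates every population vector, so it is practical only for a small number of classes and modest populations.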

PERFORMANCE PREDICTION METHODOLOGY

We want to predict system performance measures, such as system throughput and the response time of requests, as system parameters such as the number of connections per second and the number of clients the server services change. We will investigate the impact of changes in these parameters on overall server performance. The data collected will enable us to use one or all of three forecasting techniques (regression, moving average, and exponential smoothing), which are all described in the book.
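
For reference, the three forecasting techniques named above can be sketched in a few lines each. This is a minimal sketch, assuming equally spaced observations and a one-step-ahead forecast by default. The function names, the window size, the smoothing constant, and the sample series are hypothetical.

```python
from typing import Sequence


def linear_regression_forecast(values: Sequence[float], steps_ahead: int = 1) -> float:
    """Fit a least-squares line to (0, y0), (1, y1), ... and extrapolate."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)


def moving_average_forecast(values: Sequence[float], window: int) -> float:
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return sum(values[-window:]) / window


def exponential_smoothing_forecast(values: Sequence[float], alpha: float) -> float:
    """Forecast the next value with f(t+1) = f(t) + alpha * (y(t) - f(t))."""
    if not values:
        raise ValueError("need at least one observation")
    forecast = values[0]  # initialize with the first observation
    for observed in values[1:]:
        forecast = forecast + alpha * (observed - forecast)
    return forecast


# Hypothetical series: requests per second observed in successive intervals.
arrivals = [10.0, 20.0, 30.0, 40.0]
print(linear_regression_forecast(arrivals))
print(moving_average_forecast(arrivals, window=2))
print(exponential_smoothing_forecast(arrivals, alpha=0.5))
```

Regression suits series with a clear trend, while the moving average and exponential smoothing suit series that fluctuate around a roughly stable level.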

VALIDATION PROCEDURE

After we build a model from the data collected on the system and perform system prediction using one of the forecasting techniques described in the book, we intend to measure the system again using the data from the forecasting model. This validation procedure will give us accurate information on the correctness of the forecasting methodology. We will also try to prove analytically that our model and prediction data are valid.
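
The comparison between predicted and re-measured values could be expressed as a relative error per metric. The following is a minimal sketch; the 10% acceptance tolerance, the metric names, and the numbers are assumptions chosen for illustration and are not criteria stated in this proposal.

```python
from typing import Dict


def relative_error(predicted: float, measured: float) -> float:
    """Return |predicted - measured| / |measured|."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(predicted - measured) / abs(measured)


def validate(
    predictions: Dict[str, float],
    measurements: Dict[str, float],
    tolerance: float = 0.10,  # hypothetical acceptance threshold
) -> Dict[str, bool]:
    """Flag each metric as acceptable when its relative error is within tolerance."""
    return {
        name: relative_error(predictions[name], measurements[name]) <= tolerance
        for name in predictions
    }


# Hypothetical values: throughput in requests/s, response time in seconds.
predicted = {"throughput": 110.0, "response_time": 0.30}
measured = {"throughput": 100.0, "response_time": 0.20}
print(validate(predicted, measured))
```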

COST MODEL

There will be no cost model associated with this project.

LINKS TO IMPORTANT SITES AND RESOURCES

- Apache Web Page

- WebStone benchmark description and download

- Web Capacity Planning class web site