Web Capacity Planning
Midterm Questions

Guidelines:

Except where stated otherwise, you may not share information for this assignment. Work out the answers to these questions on your own, using the text and supporting materials for the class. Turn in your answers at the start of class on Monday, June 28.

1. (5 points)

A web server was monitored for 1 hour. During this period 7200 HTTP requests were processed. A software monitor measured the following values for the utilizations of the CPU and two disks: 30% for the CPU, 35% for disk 1, and 40% for disk 2.

  1. Calculate the service demands at the CPU and the two disks.
  2. What is the minimum value for the response time?
  3. What would the new value of the CPU service demand be if the CPU were replaced by one four times faster?
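As a reminder of the operational laws involved, here is a minimal sketch of the Service Demand Law (D_i = U_i / X0). The numbers and device names are illustrative assumptions, not the values given in the question.

```python
def service_demand(utilization, throughput):
    """Service Demand Law: D_i = U_i / X0, in seconds per request."""
    return utilization / throughput

# Illustrative values only, not those of the question.
requests_completed = 3600            # requests observed
observation_period = 1800.0          # seconds
throughput = requests_completed / observation_period   # X0 = 2 requests/sec

utilizations = {"cpu": 0.50, "disk": 0.20}
demands = {name: service_demand(u, throughput) for name, u in utilizations.items()}

# With no queueing at all, a request still needs the sum of its demands.
min_response_time = sum(demands.values())

# A device k times faster divides that device's demand by k (here k = 2).
faster_cpu_demand = demands["cpu"] / 2.0
```

The lower bound on response time follows because a request must receive its full service demand at every device even when it never waits in a queue.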

2. (15 points)

The disk of a web server receives requests at a rate of 25 requests/sec. An analysis of a trace of the disk accesses shows that 10% of the requests are for random blocks and 90% are for runs of sequential blocks. The block size is 2048 bytes. The average run length for this workload is 10 requests. The disk rotates at 7,200 RPM, has an average seek time for random requests of 9 msec, and has a transfer rate of 20 MB/sec. The controller time is 0.1 msec.

  1. Calculate the average disk service time by approximating the disk utilization using Little's Law.
  2. Use the iterative approach discussed in Chapter 3 and the Excel workbook, ServTime.XLS, to refine the service time value for the sequential workload.
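For orientation only, the sketch below covers the random-request component of a disk service time: seek, average rotational latency (taken as half a revolution), transfer, and controller time. The parameter values are illustrative assumptions, the function name is hypothetical, and the run-based formula for sequential requests from Chapter 3 is not reproduced here.

```python
def random_access_time_ms(seek_ms, rpm, block_bytes, transfer_mb_per_sec, controller_ms):
    """Seek + average rotational latency + transfer + controller time, in msec."""
    rotational_latency_ms = 0.5 * 60_000.0 / rpm    # half a revolution, in msec
    # Assumes 1 MB = 10^6 bytes; use 2**20 if the text defines MB that way.
    transfer_ms = block_bytes / (transfer_mb_per_sec * 1e6) * 1000.0
    return seek_ms + rotational_latency_ms + transfer_ms + controller_ms

# Illustrative disk, not the one described in the question.
t = random_access_time_ms(seek_ms=5.0, rpm=10_000, block_bytes=4096,
                          transfer_mb_per_sec=40.0, controller_ms=0.2)
```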

3. (5 points)

A transaction processing system was monitored for 1 hour. During this period, 5400 transactions were processed. What was the utilization of a disk that has an average service time of 30 msec and is visited 3 times on average by every transaction?
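A minimal sketch of the Utilization Law combined with the Forced Flow Law (U_i = X0 × V_i × S_i) follows. The figures are illustrative assumptions rather than the ones in the question.

```python
def utilization(system_throughput, visits, service_time):
    """U_i = X0 * V_i * S_i, where X0 * V_i is the device throughput."""
    return system_throughput * visits * service_time

# Illustrative values only: 1800 transactions in 600 seconds.
x0 = 1800 / 600.0                     # 3 transactions/sec
u_disk = utilization(x0, visits=4, service_time=0.010)   # service time in seconds
```

Keeping every time in seconds (or every time in msec) avoids the most common unit slip in this kind of calculation.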

4. (5 points)

The average delay experienced by a packet when traversing a computer network is 100 msec. Packets cross the network at an average rate of 128 packets/sec. What is the average number of packets in transit in the network?
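Little's Law (N = X × R) relates the average population of a system to its throughput and residence time. A small sketch with illustrative numbers, not those of the question:

```python
def littles_law_population(throughput, residence_time):
    """Little's Law: N = X * R (average number of items in the system)."""
    return throughput * residence_time

# Illustrative values only: 50 packets/sec, 200 msec average delay.
packets_in_transit = littles_law_population(throughput=50.0, residence_time=0.200)
```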

5. (5 points)

An NFS file server was monitored for 60 minutes, during which 7200 requests were completed. The disk utilization was measured to be 30%. The average service time at this disk is 30 msec per file operation request. What is the average number of accesses to this disk per file request?

6. (10 points)

Consider the following data collected from a system:

Job   CPU demand   Disk demand   Printer demand
  1   3 sec        15 sec        45 sec
  2   2 sec        24 sec        1 min
  3   1 sec        20 sec        1 min 30 sec
  4   1 sec        1 sec         0 sec
  5   6 sec        8 sec         0 sec
  6   4 sec        18 sec        35 sec
  7   2 sec        40 sec        3 min
  8   5 sec        5 sec         0 sec
  9   1.5 sec      30 sec        1 min 15 sec
 10   7 sec        9 sec         0 sec

Convert this data to a form usable in a clustering algorithm. Be sure to justify any scaling technique used or outlier values removed.
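One common preparation step is to express every demand in the same unit and then rescale each column to [0, 1]. The sketch below shows that idea on hypothetical measurements. Min-max scaling is only one of several reasonable choices, and the helper names `to_seconds` and `min_max_scale` are assumptions made for illustration.

```python
import re

def to_seconds(text):
    """Convert a duration such as '1 min 30 sec' or '1.5 sec' to seconds."""
    total = 0.0
    for value, unit in re.findall(r"([\d.]+)\s*(min|sec)", text):
        total += float(value) * (60.0 if unit == "min" else 1.0)
    return total

def min_max_scale(values):
    """Linearly map values onto [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical measurements, not the table in the question.
printer = [to_seconds(t) for t in ["0 sec", "30 sec", "2 min", "1 min 30 sec"]]
scaled = min_max_scale(printer)
```

Min-max scaling is sensitive to extreme values, which is one reason to examine outliers before choosing a scaling technique.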

7. (15 points)

The following scaled data has been collected from a sample system:

Job   CPU demand   Disk demand
  1   0.25         0.55
  2   0.1          0.6
  3   0.5          0.6
  4   0.85         0.7
  5   0.15         0.4
  6   0.3          0.15
  7   0.65         0.75
  8   0.2          0.45
  9   0.45         0.1
 10   0.8          0.9

Using the K-means algorithm, find the number of clusters that offers the best trade-off of complexity and accuracy. Show your work.
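The sketch below is a plain K-means loop plus a within-cluster sum of squares, which is one common way to compare cluster counts. It runs on four hypothetical points, not the table above. Seeding the centroids with the first k points is an arbitrary assumption, and the variant presented in the text may differ in its details.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain k-means; the first k points seed the centroids (an arbitrary choice)."""
    centroids = [points[i] for i in range(k)]
    assignment = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break                      # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignment

def within_cluster_ss(points, centroids, assignment):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignment))

# Hypothetical points, not the data in the question.
pts = [(0.1, 0.1), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
results = {k: kmeans(pts, k) for k in (1, 2, 3)}
wss = {k: within_cluster_ss(pts, c, a) for k, (c, a) in results.items()}
```

Tabulating `wss` against k shows how much accuracy each additional cluster buys. The point where the improvement flattens is one candidate for the trade-off the question asks about.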

8. (15 points)

Using the minimum spanning tree algorithm, cluster the data from the previous problem into the "best" number of clusters found there. Compare the resulting clusters to those you obtained using K-means clustering. Show your work.
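The sketch below shows the graph-theoretic form of minimum spanning tree clustering: Kruskal's algorithm on the complete distance graph, stopped once k components remain. This is an assumption about the formulation. The procedure in the text may differ, for example by merging clusters on centroid distances, so treat it as an illustration and not as the required method. The points are hypothetical.

```python
import math
from itertools import combinations

def mst_clusters(points, k):
    """Cluster with Kruskal's algorithm, stopping when k components remain.

    Equivalent to building a minimum spanning tree and deleting its
    k - 1 longest edges.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        combinations(range(len(points)), 2),
        key=lambda e: math.dist(points[e[0]], points[e[1]]),
    )
    components = len(points)
    for a, b in edges:
        if components == k:
            break
        root_a, root_b = find(a), find(b)
        if root_a != root_b:                 # edge joins two different components
            parent[root_b] = root_a
            components -= 1
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]

# Hypothetical points, not the data in the question.
pts = [(0.1, 0.1), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
clusters = mst_clusters(pts, 2)
```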

9. (25 points)

For this exercise you may use the HTTP log file from the text CD, the log file collected from a local web server over a 24-hour period, or another HTTP log file that you have access to. Analyze the log file to obtain the following:

  1. Calculate the burstiness parameters (a, b) for the log file. You can use the program provided on the text CD to help.
  2. Graph the data in the log file using time for the X axis and bytes for the Y axis, similar to the graph in Figure 6.8. Construct at least three graphs with different time slots. (You may want to try slots of 5 minutes, 1 minute, and 10 seconds, but other slot sizes may work just as well or better.) Make observations about the graphs.
  3. Using any reasonable clustering algorithm, characterize the workload in the log file into clusters, similar to Table 6.19 from p. 149 in the text.

For this exercise you can consult each other on the use of tools or code to perform the analysis, but each of you must do your own work. You may use Excel or another spreadsheet program, or Unix utilities such as awk and gnuplot to help you. Turn in a printout of any code that you write and an explanation of any tools or utilities that you use.
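As an example of the kind of tooling you might write, the sketch below sums transferred bytes into fixed-width time slots, which is the series needed for the graphs. It assumes the log is in Common Log Format. The sample lines are made up, and the function name `bytes_per_slot` is hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes Common Log Format; adjust the pattern for other layouts.
LINE = re.compile(r'\[([^\]]+)\] "[^"]*" \d{3} (\d+|-)')

def bytes_per_slot(lines, slot_seconds):
    """Sum transferred bytes into time slots keyed by slot start (epoch seconds)."""
    totals = Counter()
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue                          # skip malformed lines
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        size = 0 if match.group(2) == "-" else int(match.group(2))
        slot = int(stamp.timestamp()) // slot_seconds * slot_seconds
        totals[slot] += size
    return dict(sorted(totals.items()))

# Made-up sample lines for illustration.
sample = [
    'h1 - - [15/Jun/1999:10:00:05 +0000] "GET /a.html HTTP/1.0" 200 1000',
    'h2 - - [15/Jun/1999:10:00:40 +0000] "GET /b.gif HTTP/1.0" 200 500',
    'h1 - - [15/Jun/1999:10:01:10 +0000] "GET /c.html HTTP/1.0" 304 -',
]
per_minute = bytes_per_slot(sample, 60)
```

Calling the function again with a different `slot_seconds` (for example 300 or 10) yields the other series to plot, for instance with gnuplot or a spreadsheet.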


Last updated June 15, 1999.