Guidelines:
Except where described otherwise, for this assignment you may not share information. You are to work out the answers to these questions on your own, using the text and supporting materials for the class. This should be turned in at the start of class on Monday, June 28.
A web server was monitored for 1 hour. During this period 7200 HTTP requests were processed. A software monitor measured the following values for the utilizations of the CPU and two disks: 30% for the CPU, 35% for disk 1, and 40% for disk 2.
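The measurements above are enough to apply the Service Demand Law, D_i = U_i / X0, where X0 is the system throughput. The following is a minimal sketch, assuming that law is the intended tool; the function name and the device labels are hypothetical.

```python
def service_demands(completions, period_sec, utilizations):
    """Service Demand Law: D_i = U_i / X0, with X0 = completions / period."""
    throughput = completions / period_sec  # requests per second
    return {device: u / throughput for device, u in utilizations.items()}


if __name__ == "__main__":
    # Values taken from the problem statement; device labels are illustrative.
    demands = service_demands(7200, 3600, {"cpu": 0.30, "disk1": 0.35, "disk2": 0.40})
    for device, demand in demands.items():
        print(f"{device}: {demand:.3f} sec per request")
```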
The disk of a web server receives requests at a rate of 25 requests/sec. An analysis of a trace of the disk accesses shows that 10% of the requests are for random blocks and 90% are for runs of sequential blocks. The block size is 2048 bytes. The average run length for this workload is 10 requests. The disk rotates at 7,200 RPM, has an average seek time for random requests of 9 msec, and has a transfer rate of 20 MB/sec. The controller time is 0.1 msec.
ServTime.XLS, to refine the service time value for the sequential workload.
A transaction processing system was monitored for 1 hour. During this period, 5400 transactions were processed. What was the utilization of a disk that has an average service time equal to 30 msec and that is visited 3 times on the average by every transaction?
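One way to approach this is the Utilization Law combined with the Forced Flow Law: U = X0 · V · S, where X0 is the system throughput, V the visits per transaction, and S the service time per visit. A minimal sketch, with a hypothetical function name:

```python
def disk_utilization(completions, period_sec, visits, service_time_sec):
    """U = X0 * V * S, where X0 = completions / period."""
    throughput = completions / period_sec       # transactions per second
    device_throughput = throughput * visits     # disk visits per second
    return device_throughput * service_time_sec


if __name__ == "__main__":
    # Values from the problem statement: 5400 transactions in 1 hour,
    # 3 visits per transaction, 30 msec per visit.
    print(f"{disk_utilization(5400, 3600, 3, 0.030):.1%}")
```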
The average delay experienced by a packet when traversing a computer network is 100 msec. The average number of packets that cross the network is 128 packets/sec. What is the average number of packets in transit in the network?
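This is a direct application of Little's Law, N = X · R, treating the network as a black box with throughput X and residence time R. A minimal sketch, with a hypothetical function name:

```python
def littles_law_population(throughput_per_sec, residence_time_sec):
    """Little's Law: average number in system N = X * R."""
    return throughput_per_sec * residence_time_sec


if __name__ == "__main__":
    # Values from the problem statement: 128 packets/sec, 100 msec delay.
    print(littles_law_population(128, 0.100), "packets in transit on average")
```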
An NFS file server was monitored for 60 minutes, during which 7200 requests were completed. The disk utilization was measured to be 30%. The average service time at this disk is 30 msec per file operation request. What is the average number of accesses to this disk per file request?
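Rearranging the same relation U = X0 · V · S gives the visit count, V = U / (X0 · S). A minimal sketch, assuming the 30 msec figure is the service time per disk access; the function name is hypothetical.

```python
def visits_per_request(utilization, completions, period_sec, service_time_sec):
    """Solve U = X0 * V * S for V, with X0 = completions / period."""
    throughput = completions / period_sec  # file requests per second
    return utilization / (throughput * service_time_sec)


if __name__ == "__main__":
    # Values from the problem statement: 30% utilization, 7200 requests
    # in 60 minutes, 30 msec service time.
    print(visits_per_request(0.30, 7200, 3600, 0.030), "disk accesses per request")
```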
Consider the following data collected from a system:
| Job | CPU demand | Disk demand | Printer demand |
|---|---|---|---|
| 1 | 3 sec | 15 sec | 45 sec |
| 2 | 2 sec | 24 sec | 1 min |
| 3 | 1 sec | 20 sec | 1 min 30 sec |
| 4 | 1 sec | 1 sec | 0 sec |
| 5 | 6 sec | 8 sec | 0 sec |
| 6 | 4 sec | 18 sec | 35 sec |
| 7 | 2 sec | 40 sec | 3 min |
| 8 | 5 sec | 5 sec | 0 sec |
| 9 | 1.5 sec | 30 sec | 1 min 15 sec |
| 10 | 7 sec | 9 sec | 0 sec |
Convert this data to a form usable in a clustering algorithm. Be sure to justify any scaling technique used or outlier values removed.
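A first step is to put every entry in the same unit and then scale each column. The sketch below assumes min-max scaling to [0, 1], which is only one of several possible choices; it does not decide which rows, if any, are outliers. The helper names are hypothetical.

```python
import re


def to_seconds(text):
    """Convert strings such as '1 min 30 sec' or '1.5 sec' to seconds."""
    total = 0.0
    for value, unit in re.findall(r"([\d.]+)\s*(min|sec)", text):
        total += float(value) * (60 if unit == "min" else 1)
    return total


def min_max_scale(column):
    """Map a column linearly onto [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


if __name__ == "__main__":
    # Printer demand column, copied from the table above.
    printer = ["45 sec", "1 min", "1 min 30 sec", "0 sec", "0 sec",
               "35 sec", "3 min", "0 sec", "1 min 15 sec", "0 sec"]
    seconds = [to_seconds(p) for p in printer]
    print(seconds)
    print(min_max_scale(seconds))
```

Min-max scaling is sensitive to extreme values, so a single large entry compresses the rest of the column; that is one reason the choice of scaling and any outlier removal needs to be justified.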
The following scaled data has been collected from a sample system:
| Job | CPU demand | Disk demand |
|---|---|---|
| 1 | .25 | .55 |
| 2 | .1 | .6 |
| 3 | .5 | .6 |
| 4 | .85 | .7 |
| 5 | .15 | .4 |
| 6 | .3 | .15 |
| 7 | .65 | .75 |
| 8 | .2 | .45 |
| 9 | .45 | .1 |
| 10 | .8 | .9 |
Using the K-means algorithm, find the number of clusters that offers the best trade-off of complexity and accuracy. Show your work.
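The trade-off can be explored by running K-means for several values of k and comparing the within-cluster sum of squared distances (SSE). The sketch below is a plain Lloyd-style iteration and makes one loud assumption: the initial centroids are simply the first k points, so other seedings may converge to different clusters. The names `kmeans`, `dist2`, and `POINTS` are hypothetical.

```python
# (CPU demand, Disk demand) pairs copied from the table above.
POINTS = [(.25, .55), (.1, .6), (.5, .6), (.85, .7), (.15, .4),
          (.3, .15), (.65, .75), (.2, .45), (.45, .1), (.8, .9)]


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Return (labels, centroids, sse) for k clusters."""
    centroids = [points[i] for i in range(k)]  # assumption: seed with first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    sse = sum(dist2(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


if __name__ == "__main__":
    for k in range(1, 6):
        labels, _, sse = kmeans(POINTS, k)
        print(k, round(sse, 4), labels)
```

Plotting SSE against k and looking for the point where additional clusters stop reducing it substantially is one common way to argue for a particular k.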
Using the minimum spanning tree algorithm, cluster the data from the previous problem into the "best" number of clusters from above. Compare the resulting clusters to those you obtained using K-means clustering. Show your work.
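The sketch below takes the graph-theoretic reading of minimum spanning tree clustering: build the tree with Kruskal's algorithm over Euclidean distances and stop once k components remain, which is equivalent to cutting the k-1 longest tree edges. This is an assumption; the course text may define the method differently (for example, by merging clusters according to centroid distance), in which case the results can differ. The names `mst_clusters` and `POINTS` are hypothetical.

```python
from itertools import combinations
from math import dist

# (CPU demand, Disk demand) pairs copied from the scaled-data table.
POINTS = [(.25, .55), (.1, .6), (.5, .6), (.85, .7), (.15, .4),
          (.3, .15), (.65, .75), (.2, .45), (.45, .1), (.8, .9)]


def mst_clusters(points, k):
    """Label points by running Kruskal's algorithm until k components remain."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(combinations(range(n), 2),
                   key=lambda e: dist(points[e[0]], points[e[1]]))
    components = n
    for a, b in edges:
        if components == k:
            break
        ra, rb = find(a), find(b)
        if ra != rb:  # the edge joins two different components
            parent[ra] = rb
            components -= 1

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


if __name__ == "__main__":
    print(mst_clusters(POINTS, 3))
```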
For this exercise you may use the HTTP log file from the text CD, the log file collected from a local web server over a 24-hour period, or another HTTP log file that you have access to. Analyze the log file to obtain the following:
awk and gnuplot to help you.
Turn in a printout of any code that you write and an explanation of any tools or utilities that you use.
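As a starting point for the log analysis, the sketch below parses log lines and counts requests per hour of the day. It assumes the log is in Common Log Format; other formats need a different pattern. The function names and the sample line are illustrative only, and the same extraction could equally be done with awk.

```python
import re
from collections import Counter

# Assumed Common Log Format:
# host ident user [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    host, timestamp, request, status, size = match.groups()
    return {
        "host": host,
        "hour": int(timestamp.split(":")[1]),  # dd/Mon/yyyy:HH:MM:SS zone
        "request": request,
        "status": int(status),
        "bytes": 0 if size == "-" else int(size),  # "-" means no body sent
    }


def requests_per_hour(lines):
    """Count successfully parsed requests by hour of day (0-23)."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[record["hour"]] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample line for illustration only.
    sample = ['127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
              '"GET /index.html HTTP/1.0" 200 2326']
    print(requests_per_hour(sample))
```

Writing the hourly counts to a two-column text file gives input that gnuplot can plot directly.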