Computer Science/Computer Architecture

[Computer Architecture] Performance

LeeJaeJun 2023. 12. 30. 23:38
728x90
반응형

Computer Performance -> Time

  • Response Time(latency): time to do the task
    • The time from the program start to end.
    • How long does it take for my jot to run?
    • How long does it take to execute a job?
    • How long must I wait for the database qeury?
  • Throughput: tasks per unit time
    • How much work do you get done at any given time?
    • How many jobs can the machine run at once?
    • What is the average execution rate?
    • How much work is getting done?
  • Replacing the processor with a faster version -> generally, both of response time and throughput will be improved.
  • Adding more processors -> It doesn't affect latency, but throughput will be improved.

 

Execution Time

  • Elapsed Time
    • counts everything (dish and memory accesses, I/O etc.)
    • a useful number, but often not good for comparison purposes.
  • CPU time
    • doesn't count I/O or time spent running other programs. (This is because cpu is not used while accessing the disk)
    • can be broken up into system time, and user time. (The time when the OS called by a system call uses cput is system time)
  • Our focus: User CPU Time
    • time spent executing the lines of code that are in out program

 

Definition

  • Perfomance = 1 / Execution time
  • X is n times faster than Y  -> performanceX / performanceY = n
  • Execution time = seconds / programs = (cycles / program) * (seconds / program) = (CPU clock cycles / program) * clock cycle time = (CPU clock cycles / program) * (1 / clcok rate)
  • Clock ticks indicate when to start activities.
  • Clock cycle time = time between ticks = seconds per cycle
  • Clock rate (frequency) = cycles per seconds (1Hz = 1cycle/sec)
  • CPU clock cycles / program = Instructions / program * Clock cycles Per Instruction(CPI)
  • Execution time = Seconds / Program = (Instruction / program) * (Cycles / Instruction) * (Seconds / Cycle)
  • CPI = average # of clock cycles per instruction = Clock cycles / Instruction count (IC)
    • Affected by the cost for each instruction type(ISA) and the frequency of different instruction(Inustrction Mix, program)

 

How many cycles are required for a program?

  • Could assume that # of cycles = # of instructions (This is because, in genenral, one insturction ends within one cycle. Accurately, this assumption is incorrect, different instruction take different amount of time on different machine)
    • Multiplication takes more time than addition.
    • Floating point operations take longer than integer ones.
    • Accessing memoery takes more time than accessing registers.
  • Changing the cycle time often change the number of cycles required for various instructions. (This is because, in general, one innovation ends within one cycle)

 

MIPS(Millions of Instruction Per Second)

  • MIPS = Instruction Count / (Time * 10^6) = Clock Rate / (CPI * 10^6)
  • How many instructions can you perform in a second.
  • Higher MIPS generally perform better.
  • Flaws in using MIPS
    • Machines with different instruction sets?
      • 1 struction for 1 sec VS 10 instruction for 1 sec
      • Doing 10 instructions per second might look better. However, it may be that you performed one powerful instruction per second and a simple 10 instruction per second. Is it a better machine just because they do lots of instructions? For example, when 10 times of adding and 1 time of multiplying 10 times are the same, it is not unconditionally better to do 10 times of adding.
    • Programs with different insturction mixes?
      • It depends on how different instructions are mixed, as in the example of addition and multiplication above
  • MIPS is also used for marketing, but it is not related to actual performance.

 

Performance

  • # of cycles to execute program?
  • # of instructions in prgoram?
  • # of cycles per second?
  • average # of cycles per instruction?

Common pitfall: thinking one of the variables is indicative of performance when it really isnt.

 

Understanding Program Performance

Hardware of software component Affects what? How?
Algorithm Instruction count, possibly CPI The algorithm determines the number of source program instructions executed and hence the number of processor instructions executed. The algorithm may also affect the CPI, by favoring slower or faster instructions.
For example, if the algorithm uses more divies. It will tend to have a higher CPI
Programming language Instruction count, CPI The programming language certainly affects the instruction count, since statements in the language are translated to processor instructions, which determine instruction count. The language may also affect the CPI because of its features: for example, a language with heavy support for data abstraction (e.g Java) will require indirect calls, which will use higher CPI instructions.
Compiler Instruction cout, CPI The efficiency of the compiler affects both the instruction count and average cycles per instruction, since the compiler determines the translation of the source language instructions into computer instructions. The compiler's role can be very complex and affect the CPI in complex ways.
Instruction Set Architecture Instruction count, clock rate, CPI The instruction set architecture affects all three aspects of CPU performance, since it affects the instructions needed for a function, the cost in cycles of each instruction, and teh overall clock rate of the processor.

 

Benchmarks

How do you measure and report performance?

  • Performance best determined by running a real application.
    • Use programs typical of expected workload
    • Or, typical of expected class of applications. e.g) compilers/editors, scientific applications, graphics etc.
    • Difficult to find how to speed up benchmarks (i,e. analysis too difficult)
      • If a benchmark is too simple, you can analysze it and specialize in that benchmark to pretend to improve performance.
      • So, you have to make it hard not to do that.
  • Small benchmarks
    • nice for architects and designers(esp. in the early stages)
    • easy to standardize
    • easy to implement
    • can be aubsed
  • Report: reproducibility
    • It should be reproducible. In other words, when running the same program on the same computer, the same result should be obtained.
    • The benchmark contains not only information about the program, but also what input to give to the program.
    • This is because the time(result value) varies depending on the input.
    • Therefore, information such as input set and envirionmental options for running the program must all be included in the benchmark to be reproduced.
  • Instead of reporting only one program, you should test various programs and then report whether the grades are good or not.
  • Normally, when you report, you report with a relative value. With the use of the arithmetic mean, the result can change completely depending on what is based on. Therefore, Geometric mean is used when reporting relative values.
    • For an extreme example, geometric mean also has problems
    • Drawback1: does not track execution time.
    • Drawback2: rewards all improvements equally.
      • It evaluate that both the reduction of 100 seconds to 50 seconds and the reduction of 1 second to 0.5 seconds are equally reduced to 1/2.

 

SPEC(System Performance Evaluation Cooperative)

  • 5 companies have agreed on a set of real programs and inputs(Apollo, HP, DEC, MIPS, Sun) in 1988
  • SPEC89: normalized to a VAX-11/780
  • SPEC2000: normalized to a Sun Ultra5_10 300MHz
  • SPEC2006
    • 12 integer benchmarks(CINT2006)
    • 17 floating-point benchmark(CFP2006)
  • If you create benchmarks with different programs, it's hard to compare performance one-on-one, so you collect them all and standardize them into one benchmark.
  • SPECratio
    • For each SPEC benchmark program p(1<=P<=N), measure Tp.
    • Compute geometric mean
    • But
      • It is hard to say that the workload included in the SPEC ratio is the same as the workload to be used in the design.
      • As mentioned earlier, there may be a problem with Geometric mean itself.

 

Power

power = Capacitive load * Voltage^2 * Frequency switched

  • There is a capacitor in the transistors in the processor that stores electric charges.
  • The more complex the hardware(the mor transistors you put in), the more capacitive loads.
  • Currently, the processor consumes too much power, causing a lot of heat and problems. Therefore, it is being upgraded to reduce power by reducing voltage while maintaining the clock no longer.

 

Amdahl's Law

  • Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)
  • Make common case fast
  • Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much we have to improve the spped of multiplication if we want the program to run 4 times faster?
728x90
반응형