Measures of Effectiveness & Personnel Evaluation Related to Software

In order to further the following objectives:
– Preventing racial and other forms of economically irrelevant discrimination
– Advancing meritocracy, and through it the advancement of nations
– Grounding law in actual facts rather than non-specific speculation, thereby removing waste and accelerating economic progress

I will briefly, for the generalist audience, lay out my understanding of measures of effectiveness and their consequences for personnel evaluation in software-related industries. I am going to tell you:
– What “Software Work” Is
– The Relationship of Innovation to Software Work
– How Software Can Be Made Better/Faster/Cheaper
– How Software Work Can Be Usefully Measured
– What We Can Say About the Effectiveness of Software People
– How We Should Execute Personnel Actions Related to Software People

I will talk about some of the allied disciplines like mathematics, electrical engineering, and so forth, but simply to illustrate the concepts involved.

In this regard, there are several key points that I want you to remember:
– Most of “software” work is not about software at all, but about knowledge discovery, transmission, and validation. This is the cause of most of the problems you hear about software. Most of the work is not about building a bridge, but about getting people to agree on how to cross the river, and then trying to figure out how well you actually crossed the river, given that you just came up with a bunch of synthetic measures of what you really wanted, instead of a repeatable template.
– The high cost of people capable of general software work and the project-based nature of much software work, combined with generally ineffective means of deep quality inspection and the high cost of testing many of these systems, mean that we cannot usually assess, within economically feasible bounds, individual performance at a level of detail beyond competence to perform a specific function and to produce work that meets the general character of what was wanted.
– Software as such has been well defined for more than half a century. There is nothing innovative about the practice of software, or even some of the allied engineering disciplines. The innovation comes from trying to do something new that needs software to support it, and so the software happens to do something new, even though its method of construction was entirely orthodox. Only novel algorithms can qualify as a major advance in any socially beneficial sense.

– What “Software Work” Is

In conventional electrical computing systems, software is an assembly of electrically distinct data points, conceptually known as “bits” of information. (Non-conventional systems work mostly the same way.) When a computer starts, or when it loads a program, these bits are pushed into the electrical lines of a microprocessor, which interprets the bits as instructions. For example, the bits for “do nothing” may be 00, while the bits for “add” may be 01. Along with the bits that are interpreted as instructions, the computer pushes bits into other lines that are interpreted as data; e.g. the letter A might be 01100001, and the number 1 might be 00000001. The processor then routes this data through hardware circuitry that performs the function of the instruction; adding 00000001 and 00000000 yields 00000001 in conventional arithmetic. These results are then pushed back to memory or disk for further computation, or they may be pushed to the screen for you to look at, or they may be pushed over the network to command some device.
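
A minimal sketch in Python may make the idea concrete. It assumes a made-up two-opcode machine in which, as in the example above, 00 means “do nothing” and 01 means “add”; the program format and the single register are invented purely for illustration.

    # Hypothetical two-opcode machine: 00 = do nothing, 01 = add the operand.
    # Program format and register layout are invented for illustration only.
    def run(program):
        accumulator = 0
        for opcode, operand in program:
            if opcode == 0b00:      # "do nothing"
                continue
            elif opcode == 0b01:    # "add" the 8-bit operand
                accumulator = (accumulator + operand) & 0xFF
        return accumulator

    # 00000001 plus 00000000 leaves 00000001, as in the text.
    print(run([(0b01, 0b00000001), (0b01, 0b00000000)]))   # prints 1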

The process of the final assembly and delivery of software is just assembling a collection of these instruction bits into a file on the hard disk that, when loaded, will perform the desired function on the hardware it’s running on, with the data it’s supplied. To re-state it in systematic terms:
– Identify the functions to perform
– Identify the hardware and computing platform (to include software not modified as part of the effort)
– Write the instructions to implement the functions on the computing platform
– Verify that the software does in fact perform the desired function

Even at this most basic level of analysis, you can already see that the physical writing of instructions, what we think of as “programming”, is only one of four tasks. Industry professionals and scholars suggest that the actual writing of instructions tends to take something on the order of one-sixth of the development cycle of the software; it is nowhere near the majority of the time. Hence, most of the time spent on software work is not even spent writing software.

Identification of the functions to perform is just another way of saying the general value identification and high-level process selection problem. In software, people call this “requirements definition” and use lots of constructs to define what the process inputs and outputs should be. I won’t waste your time on how exactly this is done, because you understand from the creation and destruction of companies in the free market that figuring out what people want, and the best way to deliver it, are tremendously difficult unsolved problems. However, bear in mind that because of this difficulty, there are almost certainly going to be a number of errors or omissions in this process that will have to be fixed later. If the problem is simple, the software developers may be able to do this themselves; otherwise, they will have to rely on swingmen or subject matter experts to interpret it for them, and then have to load all of that information into their brains in order to write and test the code.

Identifying the hardware and computing platform, to include critical software dependencies, is the phase often termed, ambiguously and non-exclusively, “systems engineering”. Doing this process efficiently requires recognizing the economic and performance tradeoffs between buying complicated/sophisticated hardware and the burden or limitations it puts on the software. Consequently, even though the majority of the functions might be in software, the outcome of the effort critically depends on a good selection of the hardware with which the software interacts. If the effort is simple, the software developers may be able to perform the market study, negotiation, performance evaluation, and product selection themselves; otherwise, they will have to employ hardware engineers to assist them in selecting and working with the products.

Writing the software itself is usually fairly straightforward; the programmers select a programming language, mentally map the instructions they intend to implement into that language, then type the programming-language instructions into a text file. Then they run a program called a compiler that maps the programming-language instructions into the actual bits required to run on the processor, and packages everything up nicely. However, sometimes this gets complicated if extremely high performance relative to the selected hardware is required. In this case, the programmers may have to use low-level instructions (sometimes writing machine code directly), multi-threaded and multi-process organization of program execution, and machine-specific algorithm optimization. All of these activities add cost and time to the effort.
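
For a feel of what the compiler step does, Python’s built-in dis module will print the lower-level instructions its own compiler produces. This is bytecode rather than processor machine code, but the translation from statements to instructions is analogous.

    # The compiler idea in miniature: source text translated into lower-level
    # instructions. dis is part of Python's standard library; the output is
    # bytecode, not machine code, but the mapping is the same kind of thing.
    import dis

    def add_one(x):
        return x + 1

    dis.dis(add_one)   # prints instructions such as LOAD_FAST and a binary-add operation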

Finally, the code within the overall system has to be tested. The cost and steps involved in this process vary widely based on the time to market, the cost incurred per defect, and the desired level of quality. For simple tasks like data migration, it may be good enough to run the program against the data and, if the results pass spot checks, call it done. If the software is controlling a complicated satellite, which can’t be manually serviced, has a high associated cost of failure, and for which the costs of updating the software may be prohibitive, the economics may dictate software unit testing, subsystem testing, and full-system testing, with human inspections of all code, automated inspections of all code, and testing of all possible execution paths. This is pretty expensive and time-consuming. If the programmers are involved in this effort, then they have to understand how the system as a whole is supposed to work, and how to operate it.
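
As a sketch of the cheapest of those levels, here is what a unit test looks like using Python’s built-in unittest module; the function and the cases are hypothetical, and the satellite-class example above would need subsystem and full-system tests on top of this.

    # Hypothetical example of the cheapest testing level discussed above:
    # a unit test of one small function.
    import unittest

    def clamp(value, low, high):
        """Limit value to the range [low, high]."""
        return max(low, min(value, high))

    class ClampTest(unittest.TestCase):
        def test_inside_range(self):
            self.assertEqual(clamp(5, 0, 10), 5)

        def test_below_range(self):
            self.assertEqual(clamp(-3, 0, 10), 0)

        def test_above_range(self):
            self.assertEqual(clamp(99, 0, 10), 10)

    if __name__ == "__main__":
        unittest.main()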

Part of the testing is defect identification and removal; in the case of software, this involves something called “debugging”, or identification of the defect’s root cause. This step can be simple if the software is in a high-level language and has no real performance optimization. However, if the software is performance-critical, in a low-level language, implementing complex behavior, with highly optimized and therefore hard-to-understand code, this may take a very long time, particularly if the programmer wasn’t the original writer of the code. The key problem is that because computers execute so many instructions, and their order of execution is unpredictable, the instructions that were actually executed are not recorded by default. Furthermore, even in a logging mode, performance-critical systems can’t log all their instructions and still run fast enough for their original behavior to be replicated. Hence, measuring what the software is doing is an engineering process based upon the performance impact to the code under test; consequently, it is usually done using manual engineering analysis and labor, which makes this process expensive. In some cases, defect removal is so expensive that it is easier to re-write or re-structure code to reduce the number of defects, instead of trying to figure out the root causes and then fix them. (As an aside, consider: if the programmer can’t even inspect all the executed instructions, how can they claim to know what the software is really doing? How can anyone claim to know what it’s really doing?)
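
To see why recording everything that executed is costly, consider this sketch using Python’s sys.settrace, which logs every line executed. Line-level tracing stands in here for the even costlier instruction-level case, and the slowdown while tracing is exactly why production systems rarely run this way.

    # Record every line executed; the event list grows huge even for a tiny
    # loop, and the program runs far slower while tracing.
    import sys

    executed = []

    def tracer(frame, event, arg):
        if event == "line":
            executed.append((frame.f_code.co_name, frame.f_lineno))
        return tracer

    def busy_work(n):
        total = 0
        for i in range(n):
            total += i
        return total

    sys.settrace(tracer)
    busy_work(1000)
    sys.settrace(None)
    print(len(executed), "line events for one small loop")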

Hence, a typical software developer HAS to understand:
– How the software processes implement the system functionality
– How to translate the hardware requirements and system functionality into code
– How the system works well enough to be able to identify and fix defects

Thus, we started with the statement that this was software work; but actually, you see that the developer has to understand the system, its hardware, and how to operate it, in addition to direct manipulation of the software code. This means that most developers have to spend a substantial portion of their time on general knowledge and not on their specialized area of expertise. Since they are not experts in these other areas, they will be slower and make more mistakes.

– The Relationship of Innovation to Software Work

The core concepts in digital software theory were systematized back in the 1930s-1960s. The key aspects were:

– Mapping a system inputs-outputs specification (“truth table”) to a gate layout or instruction sequence automatically, though not in a performance-optimized way; attaining optimal performance is still very difficult for large circuits (a small sketch of the basic mapping follows this list)
– Understanding of computing complexity and computational power; what a general purpose computer is and what is feasible to compute on it
– The basic concepts of algorithm development and run-time performance analysis (although by no means have all useful algorithms been discovered)
– The basic toolbox of engineering approaches to software problems: compilers, inter-thread cooperation, data structures like stacks and trees, and the basic algorithms and problem-solving approaches such as greedy algorithms. Though no one usually mentions it, this toolbox also included trade-offs of abstraction and computing assurance, such as memory protection vs. raw computing performance
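
As promised above, here is a minimal, deliberately non-optimized Python sketch of the truth-table-to-logic mapping: for every input row whose output is 1, emit an AND term, then OR the terms together. It always works, but the result is nowhere near minimal for large circuits.

    # Build a (non-minimal) sum-of-products expression from a truth table.
    from itertools import product

    def sum_of_products(names, truth_fn):
        terms = []
        for row in product([0, 1], repeat=len(names)):
            if truth_fn(*row):
                literals = [n if bit else "not " + n for n, bit in zip(names, row)]
                terms.append("(" + " and ".join(literals) + ")")
        return " or ".join(terms) if terms else "0"

    # Example: exclusive-or of two inputs.
    print(sum_of_products(["a", "b"], lambda a, b: a ^ b))
    # (not a and b) or (a and not b)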

These subjects are considered the basis of formal computer science, allied electrical engineering, and information processing education and research. They are the foundations of competent practice of software work.

People often abuse the terms “innovation” and “technology” to mean literally anything. Anything someone does with the slightest bit of variation is “innovation”; any machine of any sort is its own “technology”. This abusive practice obscures what is economically useful about innovation: finding a new process or way of doing things that has substantial, tangible economic benefits, which is sometimes referred to as “new technology”. In reviewing what had already been created through the 1960s, you can see that any circuit can be realized at some performance level using the techniques that existed at that time. 99% of what software programmers actually do is applying engineering trade-offs to these basic concepts. 1% or less of programming work concerns creating new algorithms that greatly increase speed over these simple implementations. An even smaller fraction of that is creating algorithms that solve problems that couldn’t be solved before. The real advances in computing over the last 40 years came from the electrical engineering, mechanical, material, and manufacturing innovations that enabled massive increases in the speed of instruction processing and the density of information storage and retrieval.

Thus, while much of software work could be considered useful in some way, from the perspective of identifying economically useful innovation, and not just an engineering instance, very little software work meets this criterion.

– How Software Can Be Made Better/Faster/Cheaper

In the previous section, I outlined that, through competent performance of work, any system behavior can be realized at some default level of performance. However, software is sometimes completed with exceptional quality, development speed, or low cost. There are books such as Rapid Development by Steve McConnell that go into great detail on how to optimize the performance of software projects, particularly from a project management perspective; I will not repeat that material here. Rather, I am going to review the core drivers of performance and exceptional success on projects, and, for each one, I’m going to explain why these things don’t always (or ever) happen. This will allow you to directly understand what the evaluation of software developer performance is truly based upon.

– Improve the system specification, so that you don’t miss requirements that require more effort to add later, don’t implement functionality that’s not really needed, and don’t incorrectly state a requirement, which will require rework when the error is discovered during testing. The software industry is trying to move towards this using “agile” and “spiral” development, but its success at actually implementing this is limited. The fundamental problems are that even the end-users may not know what they want, and that the communication of what they do know is imperfect. Other than finding a way for one or two end-users to become true experts, the best way to handle this problem is to move the developers as close to the end-users as possible, up to and including sitting in the same rooms and actually having the developers do the end-users’ jobs. People don’t do this because they have families and friends near their current location, because moving next to the end-user may put them in a war/hot zone, because they switch projects on the order of once a year, and because the moving costs are considerable.
– Find a new algorithm that improves performance or accuracy/precision. However, this requires expertise in the area, usually lots of mathematical or scientific knowledge and specialization, and a lot of time. Hence, it’s not viable, and in some cases not even possible, for most projects.
– Find a cleverer way of doing things that avoids having to write modules to handle certain behaviors. This requires understanding how the system is supposed to work, how it can work, what the specification is, what the customer values, and then the understanding of the possible implementation alternatives. Naturally this implies a significant amount of expertise and time spent on the project.
– Re-use code. This often doesn’t happen because of licensing costs and license provisions that make it impossible to use the already-developed code. There is also a significant amount of work involved in taking a module initially developed for one purpose and then adapting it to a new one, along with fixing defects that are now a problem in the new system. If the code base is large, bringing a lot of other code in can reduce performance. For smaller modules, it isn’t worth it.
– Re-use whole programs. Same deal with re-using code.
– Simplify your code and/or development approach to reduce the number of conditional branches, so that testing is cheaper (see the sketch after this list). This is fairly easy to do, and applies in a number of situations, but less-skilled programmers don’t understand this concept and prefer designs that add branches, indirection, and abstraction.
– Don’t write abstract code with modularization and flexibility; just write code that takes the input and processes it to the output. This doesn’t work in all situations, because the specification may call for configurability, portability, and so forth. Also, programmers have been socialized to prefer modularization and flexibility, so they won’t like it when you order them to do this. Finally, sometimes programs get so big and complicated that abstraction becomes the only way for humans to reason about their operation; this is probably, but not always, a systems engineering screwup.
– Reduce task-switching within role. Easier said than done, since younger developers need mentoring, new people on the project need guidance, managers need information, and organizations need to train and verify compliance. Plus, you will suboptimize your leadership pipeline, which is sometimes a critical resource.
– Write code in a language better suited to the task. This applies to higher-level languages and frameworks, like SQL or Ruby on Rails when writing web applications, but also to lower-level languages, like writing device-specific and timing-critical code in C. This requires experienced software people who understand the trade space.
– Write code in a language your organization knows, and for which you can find qualified workers. This doesn’t happen because programmers like to write new programming languages for no good reason; there are a dozen or so languages in widespread use, and maybe hundreds more used in specialized situations. Consequently, the fragmented landscape and your own organization’s knowledge base tend to prevent this type of efficiency.
– Limit the programming languages in use in your organization. Same problem as previously.
– Write the correct level of optimization for code the first time. This is difficult because efficiently understanding the performance problems, in a specific application processing a specific set of workloads, usually requires detailed profiling, which requires you to have written the code already. Experienced software developers can help less-experienced ones choose appropriate algorithms and approaches to the problem to avoid obvious mistakes.
– Use more qualified/trained/experienced personnel. This is true, but is functionally equivalent to saying “work harder”, so it has all of those limitations.
– Use more specialized personnel and limit the number of generalists/swingmen. Since only one-sixth of software effort is devoted to actual heads-down coding, this approach doesn’t quite scale in relation to the staffing levels in a typical organization. If companies better shared their specialized talent, or if many more programmers worked as consultants, then this might work. However, there are considerable obstacles to this relating to the business factors such as project planning. Furthermore, employing this approach requires much higher-quality documentation and communication in order to prevent code from being developed that doesn’t meet the system’s specification. Most organizations don’t run with enough slack to allow that to occur. Furthermore, the extra documentation and communication are additional costs that have to be balanced against the specialization benefits.
– Manage software as operations instead of projects, to avoid task-switching and generalization costs. Doesn’t happen because once the software has most of its features implemented, maintaining it requires far fewer people, with different skill sets, to run than it did to develop the initial feature set. Further, the business value of software experiences the same marginal decline in utility with additional effort and features as with other products. Additionally, software needs are not anticipated far enough ahead to lengthen development duration to permit treating it as an operation; instead, a larger staff must be used to complete the task by the typical deadlines.
– Use good tools. Unfortunately, good tools are scarce compared to the average tools actually in use – and most of the work in the project is not spent on coding anyway, but rather on human interaction.
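
Here is the sketch promised in the branch-reduction item above, with a hypothetical example: both functions map a status code to a label, but the branchy version has a separate path to test for every case, while the table-driven version has essentially one path plus a data table to review.

    # Hypothetical illustration of the branch-reduction point.
    def label_branchy(status):
        if status == 200:
            return "ok"
        elif status == 404:
            return "not found"
        elif status == 500:
            return "server error"
        else:
            return "unknown"

    LABELS = {200: "ok", 404: "not found", 500: "server error"}

    def label_table(status):
        return LABELS.get(status, "unknown")

    assert label_branchy(404) == label_table(404) == "not found"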

An excellent question: “can a machine be taught/machine-learn to do all of these things better?” For certain tasks, yes, once the requirements have been unambiguously and completely specified, in a language that, at that level of precision, is not that much different in difficulty and semantics from just writing the code. However, as noted above, most of the work is not actually programming. Even if we say that a computer can do all of the programming work itself (1/6 of the work), that it writes defect-free code (eliminating 1/4 of the work, say), and that it has some other intangible benefits, then this perfect end-state paradigm only gets rid of around 40% of the total work (1/6 + 1/4 = 5/12, or roughly 40%). By contrast, the most efficient techniques for reducing business expenditures on software involve whole-program re-use and batch processing, which can deliver cost savings of 90% over greenfield/from-scratch development.

– How Software Work Can Be Usefully Measured

There are many not-useful measures, such as the number of lines of computer code written, because better/faster/cheaper approaches often result in writing fewer lines of code.

The best measure of effectiveness in software is whether the software parts of the system (the software process) close the actual real-life value gap (i.e. whether the software is valid). This is not the same as the value gap that the effort was scoped to, or that people thought it was supposed to be. You can write software that does exactly what it was supposed to do, but that’s only a strong predictor of actual real-life value, not its equivalent. However, the validity measure of effectiveness may, and often does, take years to actually measure. Consequently, it is not useful for intermediate software process control.

In software, the verified (not valid, just that it meets spec) end-effectiveness is almost always measurable via test; put the inputs in, read the outputs, compare. However, because software implements complex behaviors, and because its failure modes are fairly subtle and complex themselves, the cost of brute-force testing is extremely high; it is often computationally infeasible given the number of inputs and combinations of inputs that can be tested.
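
In its simplest form, “put the inputs in, read the outputs, compare” looks like the sketch below; the routine and the expected values are hypothetical. The last line shows the combinatorial problem: a routine with just three 32-bit integer inputs already has 2^96 possible input combinations, far more than any test campaign can execute.

    # Compare actual outputs against expected outputs for a few sample inputs,
    # then count how many inputs exhaustive testing would actually require.
    def largest_of_three(a, b, c):
        return max(a, b, c)

    def check(fn, cases):
        return all(fn(*inputs) == expected for inputs, expected in cases)

    spot_checks = [((1, 2, 3), 3), ((7, 7, 7), 7), ((-1, 0, -5), 0)]
    print(check(largest_of_three, spot_checks))   # True, for these three samples
    print(2 ** (32 * 3))                          # combinations a brute-force test would need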

Reviewing the stereotypical measures of effectiveness in relation to software:

– Full process validation: Very accurate as long as the specification is valid. Extremely costly or infeasible.
– Risk-based sub-process testing: Very accurate; this is usually termed “unit test” in software. It often helps with the infeasibility of full-process testing, but is still fairly costly.
– End-product testing: Very accurate as long as the specification is valid. Extremely costly or infeasible. The selection of end-product tests is usually done through statistical sampling; this sampling is usually very imprecise. Some people place additional constraints, like all-branch testing, to help improve the performance of the measurement.
– Sample testing: N/A; software is trivially reproducible, so every copy is identical.
– Model testing: Not useful in most situations, because writing a model doesn’t help you understand the software better than working with the software directly. However, with certain algorithms, testing them as a model, or incorporating the proposed approach into models early in development, can identify issues that would be extremely difficult to measure or catch in the end-system or operating environment.
– End-user reporting: Works well, but often takes a long time to get feedback.
– Data mining/statistical analysis of indirect factors that predict reliability: Not useful, since you have to find a statistically valid sampling of defects first in order to give the analysis anything, and the analysis is just going to tell you that you might have defects. Consider also: without knowing the distribution of defects, how would you know what a statistically valid sampling of defects, one which reflects the true distribution, is? Normal distributions and other conventional statistical methods are based on continuous models of reality that don’t match the uneven and localized distributions of defects in large software programs. Furthermore, with the exception of multi-threaded code, the software itself is usually extremely reliable; it will almost always fail in the same way when given the same inputs.
– Expert opinion: If the expert spends a lot of time evaluating the code and the hours expended on it, this can be accurate in assessing the general reliability of the code. If the expert doesn’t spend much time on it, then this is going to be an extremely coarse measure of whether the code is minimally acceptable or not.
– Non-expert opinion: Basically useless, except in relation to measures of pleasing the user.
– No measurement at all: Obviously useless, but surprisingly common due to the costs of useful testing.

There are also a few research approaches, such as formal methods and symbolic execution, that have the potential to greatly improve the cost profile of thorough testing. However, the current techniques in use don’t work on large programs due to various scaling issues. Furthermore, if a program is complicated, a report of all possible execution paths may give millions of possible result combinations, which are not feasible for manual verification.
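
A quick way to feel the scale of that path problem: with n independent two-way branches, a program has up to 2^n execution paths, so even a modest branch count exceeds what anyone can review by hand.

    # Path explosion in one loop: 2**n paths for n independent two-way branches.
    for n in (10, 20, 40):
        print(n, "branches ->", 2 ** n, "possible paths")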

To summarize, measuring the operating characteristics of the software with high reliability costs a lot and takes a lot of time. Measuring with low fidelity, and huge uncertainty about the un-measured aspects, costs a lot less. Expert review, if the person is truly an expert, can determine the general reliability of the code at a cost roughly between the two extremes of testing. From a quantitative perspective, for a program of typical size, around 100,000 lines of code, I would give the following rough estimates of the 2013 cost and schedule for these approaches:

– End-product test, low-density sample: $100,000, 3 months
– Expert review: with four people, $250,000, 3 months
– End-product test, all possible inputs: $10,000,000, 1 year

These numbers vary greatly based upon the input and output specification, as well as the internal construction of the software, which may demand more rigorous testing modes. However, you can see that full testing is just not economically viable, and that reliable expert testing isn’t fast or cheap. Even low-fidelity testing costs sizable sums of money because qualified developers usually have to be involved, and because user interfaces should be tested manually whenever they are initially developed or undergo significant modification. (User interfaces are validated against how a human uses them, after all.)

The accuracy of code to specification and to real-life value gap is not the only thing that can be measured. Actual productivity in generating a specification, correct selection of hardware and software environment for development, speed of mapping specification to code in the environment, and speed/quality of testing are all related to the productivity of software workers.

Productivity in generating a specification is difficult to measure for the following reasons:
– Re-doing the work with the same customer may not even be physically possible. For sure, it is economically ineffective, because you do the same work twice, which is the risk you seek to avoid.
– As with software, the amount of information in the specification may or may not be an accurate measure of the actual effectiveness of the specification in terms of modeling the problem statement or in communicating the required outputs.
– Since the inception of the specification is at the beginning of the development, it has the same problem of long feedback time to validation in production environments.
– Other than the end-user and subject matter experts double-checking the specification, there is no way of measuring the effectiveness of the specification prior to implementation, other than building a simulation or model with the same outputs, but lower performance, than the final product. Then, you have to find a way of reliably testing this model, which, as you have read, is fairly expensive.

Correct selection of hardware and software environment cannot be verified against a “known-good” set of outputs, since this is based on ever-changing technical capabilities and business environments. The only way to know the hardware was the best value is to re-do the analysis. The only way to know if the infrastructure software will work is if you write code that exercises it – but you don’t have the code in hand to fully exercise the infrastructure software until you actually implement most of the code. The hardware performance can usually be simulated to a certain extent, but again, at considerable additional expense. Even if you undertake these expenses, if the hardware or software fails, what does that mean? Does it mean that the initial selection was done improperly? Does it mean that the vendor just screwed up, or does it mean that the premises of the business relationship are invalid? If you are gambling to try and hit a tough schedule date, then if the selection failed, did you make the wrong gamble? If you used a prediction technique to try and estimate the logistical or manufacturing performance, is your technique demonstrated to be suboptimal based on one failure? Two? Three?
The ultimate measurement for this systems engineering aspect therefore is not the end result, but rather the process that was followed – but how is this process validated, and known to be optimal when applied to your situation? Very hard problem; in many ways, this is the same problem as the general statement of how you want to run your business.

Speed of mapping specification to code in the environment can be measured when normalized against the results of testing, i.e. how many function points were completed, how fast they run, and with how many defects. However, this requires you to assign multiple developers to the same task in order to derive an accurate measurement for that task. You can try to compare different developers on different tasks, with the resulting loss in accuracy and precision. You can consult well-known software engineering estimation models for how long it should take; then you have to try to map the fudge factors onto your specific problem. Then you have to put together enough samples to make the test a valid indication of general performance, as opposed to one heavily skewed by the specific problem, subject matter area, part of the code, programming language, project, and other factors that significantly impact coding speed. Since there are often 10:1 or 3:1 differences in programming productivity between developers, with enough measurement and normalization a roughly accurate conclusion can be reached about performance. However, due to the large number of variables that must be controlled in order to yield reliable results, that’s not an affordable measure of effectiveness for everyday work; it can really only be used to make hire/fire/assignment/training decisions when developers greatly differ in this area.

I mention the testing again simply because the amount and efficiency of programmer testing naturally influence how much time they spend coding, and the quality of their work products.

– What We Can Say About the Effectiveness of Software People

From the above, you already know the answer: not a whole lot, at least not within usual economic constraints. The only attributes of the end-product that are roughly affordable to measure on a regular basis – that is, measuring without having to re-do the work all the time, and effectively “buying the risk” of underperformance – are the rough conformance to specification and the rough quality of the code as measured by defect rates through testing and expert review. This is the industry definition of “competent developer” – can they write code that does the job, isn’t riddled with defects, and didn’t take them a horrendous amount of time to write?

Without large gaps in performance when doing similar tasks, it’s hard to say which developers are faster or slower at this in the actual environment, vs. during synthetic tests. This is particularly true when your developers are specialized – as most of them are, whether it’s by skill set, development environment, mix of tasks and responsibilities, or by project type or area. In this case, you know which ones are going to perform faster at given tasks; how are you going to normalize/adjust for that?

How good are your coding speed tests? Are they big enough and long enough in duration to identify productivity gaps in specification through design through testing, or is it a toy-program coding contest that measures less important aspects of raw coding speed? How much are you willing to pay to know? To know within a reasonable error bound?

As for expert review…are your experts calibrated? Do you know what they can find, or how often? Or are they your normal developers, who are moonlighting from their normal work, and just looking for coding standards violations and sub-optimal aesthetics? I’ve suffered through my share of that – and I’ve had several situations where something passed a code review and turned out to be totally inadequate. Do you ever check your experts from time to time?

Sample testing can be performed on the total work output, and is good for ratting out incompetence, but how often are you sampling? Is that enough to determine the technique a person uses during trade study or specification elicitation? Is that enough to determine their efficiency at a given task within a 50% bound? What is it worth to you?

Now, there is one thing that is easy and relatively quick to measure – competence. When someone is incompetent, you’ll know because the work didn’t get done or isn’t getting done in the case of longer tasks. There are many technical sub-specialties where people have expertise, and this is easy to measure or to recognize in practice. Demonstrated knowledge can be a decent proxy of this, if the actual skill required to implement systems based on that knowledge isn’t high. For example, demonstrated knowledge about the problem domain and about estimating theory and technique is a good predictor of skill in estimating, but isn’t very predictive of skill in interpersonal relations.

So is there a possibility to measure exceptional, vs. competent, performance in role, at the time that it happens, and not simply expertise or high competence in challenging technical areas? If a person develops an industry-recognized new advanced algorithm, yes. If they succeed where other generally competent people have already failed, yes. If a person produces double or triple the output of functions for a long period of time over other developers who spend the same amount of time on these types of tasks, then yes. (Of course, you have to be measuring functionality output, but this is economically feasible via cursory manual inspection if your measurement threshold is on the order of 3x productivity.) If they write the same amount of functions as other developers, but write an order of magnitude fewer defects, yes. Again, these are all very gross measures, that can be detected informally, or in the normal course of business. All of these make an assumption of comparable tasking either to the industry, or to peers in the organization, which assumes that you are measuring the level of output, AND tracking the types of tasks that people work on in enough detail to separate how much time is spent on each of these low-level functions. In other words, this type of measurement is impractical for generalists/swingmen/tech managers who split their time amongst many efforts, and amongst many different types of tasks in those efforts.

To summarize:

– Your people are incompetent if they can’t get their work done at anywhere near the same rate as your average line developer.
– Your people are competent if they get their jobs done roughly on time, and their code mostly works.
– Your people are exceptional if they develop new algorithms, succeed in situations where others have already tried and failed, if they can do work that others can’t, if they are functional or recognized experts in an area, or, as specialists dedicated to a specific type of work, if they blow away their peers in work output and/or quality.
– Your people are exceptionally dedicated if they work a lot of hours; but that’s not the same as productivity, or anything tangible you can measure.
Otherwise, unless you’re spending huge amounts of money, I doubt you know or can predict your staff productivity to any higher level of accuracy or precision.

For sure, there’s no satisfaction in having only vague ideas about any performance differences in the middle of the pack, but what is it worth to you to know?

– How We Should Execute Personnel Actions Related to Software People

During the hiring decision, through the initial resume screen, and then through the interviews, follow-up, and reference checks, you attempt to assess their prior actions in order to build a mini-history of what that person has demonstrated to this point. The goal is to bin them into the general categories mentioned above, or, for less experienced hires, to find predictors that correctly place their future performance into the above categories. Then you can make your decision based on the cutoff of people that you are willing to bring into your organization at that point in time.

Similar considerations follow for layoff cuts.

As for how you should pay them, well, that is only loosely correlated to your assessment of their abilities. You know you have to keep your key talent and the experts who make the projects go, so, up until their pay becomes exorbitant, you’re going to pay them whatever you have to. Because of spin-up costs and general efficiency, you’re going to keep your competent people on board as long as their salaries don’t get way out of line from what you can get out of a new hire after a year, or an experienced hire off the street. You pay your exceptional people whatever they are worth to you, which might not be anything more, depending on whether you are using their talents or not. If your exceptionally dedicated people are also competent and producing work at a rate in proportion to their hours, you pay them. Anybody you want to promote into management, you pay, so you don’t lose the training you put into them. In short, since you’re firing all the incompetent ones anyway, you’re going to pay.

Since you often don’t know the performance differences between your people with enough accuracy to stand by your decisions, you’re going to follow the market scale for a generic person with the same experience and education, since that’s what it takes to keep people. Even if salaries get cheaper, it’s still better to lay off and/or cut pay than to hire from outside, because you know what you have in house; the exception is if you really do want fresh ideas and perspectives, and are willing to pay for the disruption.