Category Methodologies

Response 2 (Performance of Armies)

In an exchange with one of readers, he mentioned that about the possibility to quantifiably access the performances of armies and produce a ranking from best to worst. The exchange is here:

The Dupuy Institute Air Model Historical Data Study

We have done some work on this, and are the people who have done the most extensive published work on this. Swedish researcher Niklas Zetterling in his book Normandy 1944: German Military Organization, Combat Power and Organizational Effectiveness also addresses this subject, as he has elsewhere, for example, an article in The International TNDM Newsletter, volume I, No. 6, pages 21-23 called “CEV Calculations in Italy, 1943.” It is here: http://www.dupuyinstitute.org/tdipub4.htm

When it came to measuring the differences in performance of armies, Martin van Creveld referenced Trevor Dupuy in his book Fighting Power: German and U.S. Army Performance, 1939-1945, pages 4-8.

What Trevor Dupuy has done is compare the performances of both overall forces and individual divisions based upon his Quantified Judgment Model (QJM). This was done in his book Numbers, Predictions and War: The Use of History to Evaluate and Predict the Outcome of Armed Conflict. I bring the readers attention to pages ix, 62-63, Chapter 7: Behavioral Variables in World War II (pages 95-110), Chapter 9: Reliably Representing the Arab-Israeli Wars (pages 118-139), and in particular page 135, and pages 163-165. It was also discussed in Understanding War: History and Theory of Combat, Chapter Ten: Relative Combat Effectiveness (pages 105-123).

I ended up dedicating four chapters in my book War by Numbers: Understanding Conventional Combat to the same issue. One of the problems with Trevor Dupuy’s approach is that you had to accept his combat model as a valid measurement of unit performance. This was a reach for many people, especially those who did not like his conclusions to start with. I choose to simply use the combined statistical comparisons of dozens of division-level engagements, which I think makes the case fairly convincingly without adding a construct to manipulate the data. If someone has a disagreement with my statistical compilations and the results and conclusions from it, I have yet to hear them. I would recommend looking at Chapter 4: Human Factors (pages 16-18), Chapter 5: Measuring Human Factors in Combat: Italy 1943-1944 (pages 19-31), Chapter 6: Measuring Human Factors in Combat: Ardennes and Kursk (pages 32-48), and Chapter 7: Measuring Human Factors in Combat: Modern Wars (pages 49-59).

Now, I did end up discussing Trevor Dupuy’s model in Chapter 19: Validation of the TNDM and showing the results of the historical validations we have done of his model, but the model was not otherwise used in any of the analysis done in the book.

But….what we (Dupuy and I) have done is a comparison between forces that opposed each other. It is a measurement of combat value relative to each other. It is not an absolute measurement that can be compared to other armies in different times and places. Trevor Dupuy toyed with this on page 165 of NPW, but this could only be done by assuming that combat effectiveness of the U.S. Army in WWII was the same as the Israeli Army in 1973.

Anyhow, it is probably impossible to come up with a valid performance measurement that would allow you to rank an army from best to worse. It is possible to come up with a comparative performance measurement of armies that have faced each other. This, I believe we have done, using different methodologies and different historical databases. I do believe it would be possible to then determine what the different factors are that make up this difference. I do believe it would be possible to assign values or weights to those factors. I believe this would be very useful to know, in light of the potential training and organizational value of this knowledge.

‘Love’s Tables’: U.S. War Department Casualty Estimation in World War II

The same friend of TDI who asked about ‘Evett’s Rates,” the British casualty estimation methodology during World War II, also mentioned that the work of Albert G. Love III was now available on-line. Rick Atkinson also referenced “Love’s Tables” in The Guns At Last Light.

In 1931, Lieutenant Colonel (later Brigadier General) Love, then a Medical Corps physician in the U.S. Army Medical Field Services School, published a study of American casualty data in the recent Great War, titled “War Casualties.”[1] This study was likely the source for tables used for casualty estimation by the U.S. Army through 1944.[2]

Love, who had no advanced math or statistical training, undertook his study with the support of the Army Surgeon General, Merritte W. Ireland, and initial assistance from Dr. Lowell J. Reed, a professor of biostatistics at John Hopkins University. Love’s posting in the Surgeon General’s Office afforded him access to an array of casualty data collected from the records of the American Expeditionary Forces in France, as well as data from annual Surgeon General reports dating back to 1819, the official medical history of the U.S. Civil War, and U.S. general population statistics.

Love’s research was likely the basis for rate tables for calculating casualties that first appeared in the 1932 edition of the War Department’s Staff Officer’s Field Manual.[3]

Battle Casualties, including Killed, in Percent of Unit Strength, Staff Officer’s Field Manual (1932).

The 1932 Staff Officer’s Field Manual estimation methodology reflected Love’s sophisticated understanding of the factors influencing combat casualty rates. It showed that both the resistance and combat strength (and all of the factors that comprised it) of the enemy, as well as the equipment and state of training and discipline of the friendly troops had to be taken into consideration. The text accompanying the tables pointed out that loss rates in small units could be quite high and variable over time, and that larger formations took fewer casualties as a fraction of overall strength, and that their rates tended to become more constant over time. Casualties were not distributed evenly, but concentrated most heavily among the combat arms, and in the front-line infantry in particular. Attackers usually suffered higher loss rates than defenders. Other factors to be accounted for included the character of the terrain, the relative amount of artillery on each side, and the employment of gas.

The 1941 iteration of the Staff Officer’s Field Manual, now designated Field Manual (FM) 101-10[4], provided two methods for estimating battle casualties. It included the original 1932 Battle Casualties table, but the associated text no longer included the section outlining factors to be considered in calculating loss rates. This passage was moved to a note appended to a new table showing the distribution of casualties among the combat arms.

Rather confusingly, FM 101-10 (1941) presented a second table, Estimated Daily Losses in Campaign of Personnel, Dead and Evacuated, Per 1,000 of Actual Strength. It included rates for front line regiments and divisions, corps and army units, reserves, and attached cavalry. The rates were broken down by posture and tactical mission.

Estimated Daily Losses in Campaign of Personnel, Dead and Evacuated, Per 1,000 of Actual Strength, FM 101-10 (1941)

The source for this table is unknown, nor the method by which it was derived. No explanatory text accompanied it, but a footnote stated that “this table is intended primarily for use in school work and in field exercises.” The rates in it were weighted toward the upper range of the figures provided in the 1932 Battle Casualties table.

The October 1943 edition of FM 101-10 contained no significant changes from the 1941 version, except for the caveat that the 1932 Battle Casualties table “may or may not prove correct when applied to the present conflict.”

The October 1944 version of FM 101-10 incorporated data obtained from World War II experience.[5] While it also noted that the 1932 Battle Casualties table might not be applicable, the experiences of the U.S. II Corps in North Africa and one division in Italy were found to be in agreement with the table’s division and corps loss rates.

FM 101-10 (1944) included another new table, Estimate of Battle Losses for a Front-Line Division (in % of Actual Strength), meaning that it now provided three distinct methods for estimating battle casualties.

Estimate of Battle Losses for a Front-Line Division (in % of Actual Strength), FM 101-10 (1944)

Like the 1941 Estimated Daily Losses in Campaign table, the sources for this new table were not provided, and the text contained no guidance as to how or when it should be used. The rates it contained fell roughly within the span for daily rates for severe (6-8%) to maximum (12%) combat listed in the 1932 Battle Casualty table, but would produce vastly higher overall rates if applied consistently, much higher than the 1932 table’s 1% daily average.

FM 101-10 (1944) included a table showing the distribution of losses by branch for the theater based on experience to that date, except for combat in the Philippine Islands. The new chart was used in conjunction with the 1944 Estimate of Battle Losses for a Front-Line Division table to determine daily casualty distribution.

Distribution of Battle Losses–Theater of Operations, FM 101-10 (1944)

The final World War II version of FM 101-10 issued in August 1945[6] contained no new casualty rate tables, nor any revisions to the existing figures. It did finally effectively invalidate the 1932 Battle Casualties table by noting that “the following table has been developed from American experience in active operations and, of course, may not be applicable to a particular situation.” (original emphasis)

NOTES

[1] Albert G. Love, War Casualties, The Army Medical Bulletin, No. 24, (Carlisle Barracks, PA: 1931)

[2] This post is adapted from TDI, Casualty Estimation Methodologies Study, Interim Report (May 2005) (Altarum) (pp. 314-317).

[3] U.S. War Department, Staff Officer’s Field Manual, Part Two: Technical and Logistical Data (Government Printing Office, Washington, D.C., 1932)

[4] U.S. War Department, FM 101-10, Staff Officer’s Field Manual: Organization, Technical and Logistical Data (Washington, D.C., June 15, 1941)

[5] U.S. War Department, FM 101-10, Staff Officer’s Field Manual: Organization, Technical and Logistical Data (Washington, D.C., October 12, 1944)

[6] U.S. War Department, FM 101-10 Staff Officer’s Field Manual: Organization, Technical and Logistical Data (Washington, D.C., August 1, 1945)

‘Evett’s Rates’: British War Office Wastage Tables

Stretcher bearers of the East Surrey Regiment, with a Churchill tank of the North Irish Horse in the background, during the attack on Longstop Hill, Tunisia, 23 April 1943. [Imperial War Museum/Wikimedia]

A friend of TDI queried us recently about a reference in Rick Atkinson’s The Guns at Last Light: The War in Western Europe, 1944-1945 to a British casualty estimation methodology known as “Evett’s Rates.” There are few references to Evett’s Rates online, but as it happens, TDI did find out some details about them for a study on casualty estimation. [1]

British Army staff officers during World War II and the 1950s used a set of look-up tables which listed expected monthly losses in percentage of strength for various arms under various combat conditions. The origin of the tables is not known, but they were officially updated twice, in 1942 by a committee chaired by Major General Evett, and in 1951-1955 by the Army Operations Research Group (AORG).[2]

The methodology was based on staff predictions of one of three levels of operational activity, “Intense,” “Normal,” and “Quiet.” These could be applied to an entire theater, or to individual divisions. The three levels were defined the same way for both the Evett Committee and AORG rates:

The rates were broken down by arm and rank, and included battle and nonbattle casualties.

Rates of Personnel Wastage Including Both Battle and Non-battle Casualties According to the Evett Committee of 1942. (Percent per 30 days).

The Evett Committee rates were criticized during and after the war. After British forces suffered twice the anticipated casualties at Anzio, the British 21st Army Group applied a “double intense rate” which was twice the Evett Committee figure and intended to apply to assaults. When this led to overestimates of casualties in Normandy, the double intense rate was discarded.

From 1951 to 1955, AORG undertook a study of casualty rates in World War II. Its analysis was based on casualty data from the following campaigns:

  • Northwest Europe, 1944
    • 6-30 June – Beachhead offensive
    • 1 July-1 September – Containment and breakout
    • 1 October-30 December – Semi-static phase
    • 9 February to 6 May – Rhine crossing and final phase
  • Italy, 1944
    • January to December – Fighting a relatively equal enemy in difficult country. Warfare often static.
    • January to February (Anzio) – Beachhead held against severe and well-conducted enemy counter-attacks.
  • North Africa, 1943
    • 14 March-13 May – final assault
  • Northwest Europe, 1940
    • 10 May-2 June – Withdrawal of BEF
  • Burma, 1944-45

From the first four cases, the AORG study calculated two sets of battle casualty rates as percentage of strength per 30 days. “Overall” rates included KIA, WIA, C/MIA. “Apparent rates” included these categories but subtracted troops returning to duty. AORG recommended that “overall” rates be used for the first three months of a campaign.

The Burma campaign data was evaluated differently. The analysts defined a “force wastage” category which included KIA, C/MIA, evacuees from outside the force operating area and base hospitals, and DNBI deaths. “Dead wastage” included KIA, C/MIA, DNBI dead, and those discharged from the Army as a result of injuries.

The AORG study concluded that the Evett Committee underestimated intense loss rates for infantry and armor during periods of very hard fighting and overestimated casualty rates for other arms. It recommended that if only one brigade in a division was engaged, two-thirds of the intense rate should be applied, if two brigades were engaged the intense rate should be applied, and if all brigades were engaged then the intense rate should be doubled. It also recommended that 2% extra casualties per month should be added to all the rates for all activities should the forces encounter heavy enemy air activity.[1]

The AORG study rates were as follows:

Recommended AORG Rates of Personnel Wastage. (Percent per 30 days).

If anyone has further details on the origins and activities of the Evett Committee and AORG, we would be very interested in finding out more on this subject.

NOTES

[1] This post is adapted from The Dupuy Institute, Casualty Estimation Methodologies Study, Interim Report (May 2005) (Altarum) (pp. 51-53).

[2] Rowland Goodman and Hugh Richardson. “Casualty Estimation in Open and Guerrilla Warfare.” (London: Directorate of Science (Land), U.K. Ministry of Defence, June 1995.), Appendix A.

Comparing the RAND Version of the 3:1 Rule to Real-World Data

Chuliengcheng. In a glorious death eternal life. (Battle of Yalu River, 1904) [Wikimedia Commons]

[The article below is reprinted from the Winter 2010 edition of The International TNDM Newsletter.]

Comparing the RAND Version of the 3:1 Rule to Real-World Data
Christopher A. Lawrence

For this test, The Dupuy Institute took advan­tage of two of its existing databases for the DuWar suite of databases. The first is the Battles Database (BaDB), which covers 243 battles from 1600 to 1900. The sec­ond is the Division-level Engagement Database (DLEDB), which covers 675 division-level engagements from 1904 to 1991.

The first was chosen to provide a historical con­text for the 3:1 rule of thumb. The second was chosen so as to examine how this rule applies to modern com­bat data.

We decided that this should be tested to the RAND version of the 3:1 rule as documented by RAND in 1992 and used in JICM [Joint Integrated Contingency Model] (with SFS [Situational Force Scoring]) and other mod­els. This rule, as presented by RAND, states: “[T]he famous ‘3:1 rule,’ according to which the attacker and defender suffer equal fractional loss rates at a 3:1 force ratio if the battle is in mixed terrain and the defender enjoys ‘prepared’ defenses…”

Therefore, we selected out all those engage­ments from these two databases that ranged from force ratios of 2.5 to 1 to 3.5 to 1 (inclusive). It was then a simple matter to map those to a chart that looked at attackers losses compared to defender losses. In the case of the pre-1904 cases, even with a large database (243 cases), there were only 12 cases of combat in that range, hardly statistically significant. That was because most of the combat was at odds ratios in the range of .50-to-1 to 2.00-to-one.

The count of number of engagements by odds in the pre-1904 cases:

As the database is one of battles, then usually these are only joined at reasonably favorable odds, as shown by the fact that 88 percent of the battles occur between 0.40 and 2.50 to 1 odds. The twelve pre-1904 cases in the range of 2.50 to 3.50 are shown in Table 1.

If the RAND version of the 3:1 rule was valid, one would expect that the “Percent per Day Loss Ratio” (the last column) would hover around 1.00, as this is the ratio of attacker percent loss rate to the defender per­cent loss rate. As it is, 9 of the 12 data points are notice­ably below 1 (below 0.40 or a 1 to 2.50 exchange rate). This leaves only three cases (25%) with an exchange rate that would support such a “rule.”

If we look at the simple ratio of actual losses (vice percent losses), then the numbers comes much closer to parity, but this is not the RAND interpreta­tion of the 3:1 rule. Six of the twelve numbers “hover” around an even exchange ratio, with six other sets of data being widely off that central point. “Hover” for the rest of this discussion means that the exchange ratio ranges from 0.50-to-1 to 2.00-to 1.

Still, this is early modern linear combat, and is not always representative of modern war. Instead, we will examine 634 cases in the Division-level Database (which consists of 675 cases) where we have worked out the force ratios. While this database covers from 1904 to 1991, most of the cases are from WWII (1939- 1945). Just to compare:

As such, 87% of the cases are from WWII data and 10% of the cases are from post-WWII data. The engagements without force ratios are those that we are still working on as The Dupuy Institute is always ex­panding the DLEDB as a matter of routine. The specific cases, where the force ratios are between 2.50 and 3.50 to 1 (inclusive) are shown in Table 2:

This is a total of 98 engagements at force ratios of 2.50 to 3.50 to 1. It is 15 percent of the 634 engage­ments for which we had force ratios. With this fairly significant representation of the overall population, we are still getting no indication that the 3:1 rule, as RAND postulates it applies to casualties, does indeed fit the data at all. Of the 98 engagements, only 19 of them demonstrate a percent per day loss ratio (casualty exchange ratio) between 0.50-to-1 and 2-to-1. This is only 19 percent of the engagements at roughly 3:1 force ratio. There were 72 percent (71 cases) of those engage­ments at lower figures (below 0.50-to-1) and only 8 percent (cases) are at a higher exchange ratio. The data clearly was not clustered around the area from 0.50-to- 1 to 2-to-1 range, but was well to the left (lower) of it.

Looking just at straight exchange ratios, we do get a better fit, with 31 percent (30 cases) of the figure ranging between 0.50 to 1 and 2 to 1. Still, this fig­ure exchange might not be the norm with 45 percent (44 cases) lower and 24 percent (24 cases) higher. By definition, this fit is 1/3rd the losses for the attacker as postulated in the RAND version of the 3:1 rule. This is effectively an order of magnitude difference, and it clearly does not represent the norm or the center case.

The percent per day loss exchange ratio ranges from 0.00 to 5.71. The data tends to be clustered at the lower values, so the high values are very much outliers. The highest percent exchange ratio is 5.71, the second highest is 4.41, the third highest is 2.92. At the other end of the spectrum, there are four cases where no losses were suffered by one side and seven where the exchange ratio was .01 or less. Ignoring the “N/A” (no losses suffered by one side) and the two high “outliers (5.71 and 4.41), leaves a range of values from 0.00 to 2.92 across 92 cases. With an even dis­tribution across that range, one would expect that 51 percent of them would be in the range of 0.50-to-1 and 2.00-to-1. With only 19 percent of the cases being in that range, one is left to conclude that there is no clear correlation here. In fact, it clearly is the opposite effect, which is that there is a negative relationship. Not only is the RAND construct unsupported, it is clearly and soundly contradicted with this data. Furthermore, the RAND construct is theoretically a worse predictor of casualty rates than if one randomly selected a value for the percentile exchange rates between the range of 0 and 2.92. We do believe this data is appropriate and ac­curate for such a test.

As there are only 19 cases of 3:1 attacks fall­ing in the even percentile exchange rate range, then we should probably look at these cases for a moment:

One will note, in these 19 cases, that the aver­age attacker casualties are way out of line with the av­erage for the entire data set (3.20 versus 1.39 or 3.20 versus 0.63 with pre-1943 and Soviet-doctrine attack­ers removed). The reverse is the case for the defenders (3.12 versus 6.08 or 3.12 versus 5.83 with pre-1943 and Soviet-doctrine attackers removed). Of course, of the 19 cases, 2 are pre-1943 cases and 7 are cases of Soviet-doctrine attackers (in fact, 8 of the 14 cases of the So­viet-doctrine attackers are in this selection of 19 cases). This leaves 10 other cases from the Mediterranean and ETO (Northwest Europe 1944). These are clearly the unusual cases, outliers, etc. While the RAND 3:1 rule may be applicable for the Soviet-doctrine offensives (as it applies to 8 of the 14 such cases we have), it does not appear to be applicable to anything else. By the same token, it also does not appear to apply to virtually any cases of post-WWII combat. This all strongly argues that not only is the RAND construct not proven, but it is indeed clearly not correct.

The fact that this construct also appears in So­viet literature, but nowhere else in US literature, indi­cates that this is indeed where the rule was drawn from. One must consider the original scenarios run for the RSAC [RAND Strategy Assessment Center] wargame were “Fulda Gap” and Korean War scenarios. As such, they were regularly conducting bat­tles with Soviet attackers versus Allied defenders. It would appear that the 3:1 rule that they used more closely reflected the experiences of the Soviet attackers in WWII than anything else. Therefore, it may have been a fine representation for those scenarios as long as there was no US counterattacking or US offensives (and assuming that the Soviet Army of the 1980s performed at the same level as in did in the 1940s).

There was a clear relative performance difference between the Soviet Army and the German Army in World War II (see our Capture Rate Study Phase I & II and Measuring Human Factors in Combat for a detailed analysis of this).[1] It was roughly in the order of a 3-to-1-casualty exchange ratio. Therefore, it is not surprising that Soviet writers would create analytical tables based upon an equal percentage exchange of losses when attacking at 3:1. What is surprising, is that such a table would be used in the US to represent US forces now. This is clearly not a correct application.

Therefore, RAND’s SFS, as currently con­structed, is calibrated to, and should only be used to represent, a Soviet-doctrine attack on first world forces where the Soviet-style attacker is clearly not properly trained and where the degree of performance difference is similar to that between the Germans and Soviets in 1942-44. It should not be used for US counterattacks, US attacks, or for any forces of roughly comparable ability (regardless of whether Soviet-style doctrine or not). Furthermore, it should not be used for US attacks against forces of inferior training, motivation and co­hesiveness. If it is, then any such tables should be ex­pected to produce incorrect results, with attacker losses being far too high relative to the defender. In effect, the tables unrealistically penalize the attacker.

As JICM with SFS is now being used for a wide variety of scenarios, then it should not be used at all until this fundamental error is corrected, even if that use is only for training. With combat tables keyed to a result that is clearly off by an order of magnitude, then the danger of negative training is high.

NOTES

[1] Capture Rate Study Phases I and II Final Report (The Dupuy Institute, March 6, 2000) (2 Vols.) and Measuring Human Fac­tors in Combat—Part of the Enemy Prisoner of War Capture Rate Study (The Dupuy Institute, August 31, 2000). Both of these reports are available through our web site.

TDI Friday Read: The Lanchester Equations

Frederick W. Lanchester (1868-1946), British engineer and author of the Lanchester combat attrition equations. [Lanchester.com]

Today’s edition of TDI Friday Read addresses the Lanchester equations and their use in U.S. combat models and simulations. In 1916, British engineer Frederick W. Lanchester published a set of calculations he had derived for determining the results of attrition in combat. Lanchester intended them to be applied as an abstract conceptualization of aerial combat, stating that he did not believe they were applicable to ground combat.

Due to their elegant simplicity, U.S. military operations researchers nevertheless began incorporating the Lanchester equations into their land warfare computer combat models and simulations in the 1950s and 60s. The equations are the basis for many models and simulations used throughout the U.S. defense community today.

The problem with using Lanchester’s equations is that, despite numerous efforts, no one has been able to demonstrate that they accurately represent real-world combat.

Lanchester equations have been weighed….

Really…..Lanchester?

Trevor Dupuy was critical of combat models based on the Lanchester equations because they cannot account for the role behavioral and moral (i.e. human) factors play in combat.

Human Factors In Warfare: Interaction Of Variable Factors

He was also critical of models and simulations that had not been tested to see whether they could reliably represent real-world combat experience. In the modeling and simulation community, this sort of testing is known as validation.

Military History and Validation of Combat Models

The use of unvalidated concepts, like the Lanchester equations, and unvalidated combat models and simulations persists. Critics have dubbed this the “base of sand” problem, and it continues to affect not only models and simulations, but all abstract theories of combat, including those represented in military doctrine.

https://dupuyinstitute.dreamhosters.com/2017/04/10/wargaming-multi-domain-battle-the-base-of-sand-problem/

Combat Power vs Combat Power

In my last post, I ended up comparing Combat Effectiveness Value (CEV) to Combat Power. CEV is only part of combat power. In Trevor Dupuy’s formulation, Combat Power is P = (S x V x CEV)

This means that combat power (P) is the product of force strength, including weapon effects (S), operational and environmental factors (V) and human factors (CEV).

From his list of 73 variables on page 33 of Numbers, Predictions and War (NPW), the operational and environmental factors include terrain factors, weather factors, season factors, air superiority factors, posture factors, mobility effects, vulnerability factors, tactical air effects, other combat processes (including surprise), and the intangible factors (which are included in his CEV).

Again, it turns into a much longer laundry list of variables than we have from ADP 3.0.

What Makes Up Combat Power?

Trevor Dupuy used in his models and theoretical work the concept of the Combat Effectiveness Value. The combat multiplier consisted of:

  1. Morale,
  2. training,
  3. experience,
  4. leadership,
  5. motivation,
  6. cohesion,
  7. intelligence (including interpretation),
  8. momentum,
  9. initiative,
  10. doctrine,
  11. the effects of surprise,
  12. logistical systems,
  13. organizational habits,
  14. and even cultural differences.
  15. (generalship)

See War by Numbers, page 17 and Numbers, Predictions and War, page 33. To this list, I have added a fifteenth item: “generalship,” which I consider something different than leadership. As I stated in my footnote on pages 17 & 348 of War by Numbers:

“Leadership” is this sense represents the training and capabilities of the non-commission and commissioned officers throughout the unit, which is going to be fairly consistent in an army from unit to unit. This can be a fairly consistent positive or negative influence on a unit. On the other hand, “generalship” represents the guy at the top of the unit, making the decisions. This is widely variable; with the history of warfare populated with brilliant generals, a large number of perfectly competent ones, and a surprisingly large number of less than competent ones. Within in army, no matter the degree and competence of the officer corps, or the rigor of their training, poor generals show up, and sometimes, brilliant generals show up with no military training (like the politician turned general Julius Caesar).

 

Anyhow, looking at the previous blog post by Shawn, the U.S. Army states that “combat power” consists of eight elements:

  1. Leadership,
  2. information,
  3. mission command,
  4. movement and maneuver
  5. intelligence
  6. fires,
  7. sustainment,
  8. and protection.

I am not going to debate strengths and weaknesses of these two lists, but I do note that there are only two items on both lists (leadership and intelligence). I prefer the 15 point list.

How Does the U.S. Army Calculate Combat Power? ¯\_(ツ)_/¯

The constituents of combat power as described in current U.S. military doctrine. [The Lightning Press]

One of the fundamental concepts of U.S. warfighting doctrine is combat power. The current U.S. Army definition is “the total means of destructive, constructive, and information capabilities that a military unit or formation can apply at a given time. (ADRP 3-0).” It is the construct commanders and staffs are taught to use to assess the relative effectiveness of combat forces and is woven deeply throughout all aspects of U.S. operational thinking.

To execute operations, commanders conceptualize capabilities in terms of combat power. Combat power has eight elements: leadership, information, mission command, movement and maneuver, intelligence, fires, sustainment, and protection. The Army collectively describes the last six elements as the warfighting functions. Commanders apply combat power through the warfighting functions using leadership and information. [ADP 3-0, Operations]

Yet, there is no formal method in U.S. doctrine for estimating combat power. The existing process is intentionally subjective and largely left up to judgment. This is problematic, given that assessing the relative combat power of friendly and opposing forces on the battlefield is the first step in Course of Action (COA) development, which is at the heart of the U.S. Military Decision-Making Process (MDMP). Estimates of combat power also figure heavily in determining the outcomes of wargames evaluating proposed COAs.

The Existing Process

The Army’s current approach to combat power estimation is outlined in Field Manual (FM) 6-0 Commander and Staff Organization and Operations (2014). Planners are instructed to “make a rough estimate of force ratios of maneuver units two levels below their echelon.” They are then directed to “compare friendly strengths against enemy weaknesses, and vice versa, for each element of combat power.” It is “by analyzing force ratios and determining and comparing each force’s strengths and weaknesses as a function of combat power” that planners gain insight into tactical and operational capabilities, perspectives, vulnerabilities, and required resources.

That is it. Planners are told that “although the process uses some numerical relationships, the estimate is largely subjective. Assessing combat power requires assessing both tangible and intangible factors, such as morale and levels of training.” There is no guidance as to how to determine force ratios [numbers of troops or weapons systems?]. Nor is there any description of how to relate force calculations to combat power. Should force strengths be used somehow to determine a combat power value? Who knows? No additional doctrinal or planning references are provided.

Planners then use these subjective combat power assessments as they shape potential COAs and test them through wargaming. Although explicitly warned not to “develop and recommend COAs based solely on mathematical analysis of force ratios,” they are invited at this stage to consult a table of “minimum historical planning ratios as a starting point.” The table is clearly derived from the ubiquitous 3-1 rule of combat. Contrary to what FM 6-0 claims, neither the 3-1 rule nor the table have a clear historical provenance or any sort of empirical substantiation. There is no proven validity to any of the values cited. It is not even clear whether the “historical planning ratios” apply to manpower, firepower, or combat power.

During this phase, planners are advised to account for “factors that are difficult to gauge, such as impact of past engagements, quality of leaders, morale, maintenance of equipment, and time in position. Levels of electronic warfare support, fire support, close air support, civilian support, and many other factors also affect arraying forces.” FM 6-0 offers no detail as to how these factors should be measured or applied, however.

FM 6-0 also addresses combat power assessment for stability and civil support operations through troop-to-task analysis. Force requirements are to be based on an estimate of troop density, a “ratio of security forces (including host-nation military and police forces as well as foreign counterinsurgents) to inhabitants.” The manual advises that most “most density recommendations fall within a range of 20 to 25 counterinsurgents for every 1,000 residents in an area of operations. A ratio of twenty counterinsurgents per 1,000 residents is often considered the minimum troop density required for effective counterinsurgency operations.”

While FM 6-0 acknowledges that “as with any fixed ratio, such calculations strongly depend on the situation,” it does not mention that any references to force level requirements, tie-down ratios, or troop density were stripped from both Joint and Army counterinsurgency manuals in 2013 and 2014. Yet, this construct lingers on in official staff planning doctrine. (Recent research challenged the validity of the troop density construct but the Defense Department has yet to fund any follow-on work on the subject.)

The Army Has Known About The Problem For A Long Time

The Army has tried several solutions to the problem of combat power estimation over the years. In the early 1970s, the U.S. Army Center for Army Analysis (CAA; known then as the U.S. Army Concepts & Analysis Agency) developed the Weighted Equipment Indices/Weighted Unit Value (WEI/WUV or “wee‑wuv”) methodology for calculating the relative firepower of different combat units. While WEI/WUV’s were soon adopted throughout the Defense Department, the subjective nature of the method gradually led it to be abandoned for official use.

In the 1980s and 1990s, the U.S. Army Command & General Staff College (CGSC) published the ST 100-9 and ST 100-3 student workbooks that contained tables of planning factors that became the informal basis for calculating combat power in staff practice. The STs were revised regularly and then adapted into spreadsheet format in the late 1990s. The 1999 iteration employed WEI/WEVs as the basis for calculating firepower scores used to estimate force ratios. CGSC stopped updating the STs in the early 2000s, as the Army focused on irregular warfare.

With the recently renewed focus on conventional conflict, Army staff planners are starting to realize that their planning factors are out of date. In an attempt to fill this gap, CGSC developed a new spreadsheet tool in 2012 called the Correlation of Forces (COF) calculator. It apparently drew upon analysis done by the U.S. Army Training and Doctrine Command Analysis Center (TRAC) in 2004 to establish new combat unit firepower scores. (TRAC’s methodology is not clear, but if it is based on this 2007 ISMOR presentation, the scores are derived from runs by an unspecified combat model modified by factors derived from the Army’s unit readiness methodology. If described accurately, this would not be an improvement over WEI/WUVs.)

The COF calculator continues to use the 3-1 force ratio tables. It also incorporates a table for estimating combat losses based on force ratios (this despite ample empirical historical analysis showing that there is no correlation between force ratios and casualty rates).

While the COF calculator is not yet an official doctrinal product, CGSC plans to add Marine Corps forces to it for use as a joint planning tool and to incorporate it into the Army’s Command Post of the Future (CPOF). TRAC is developing a stand-alone version for use by force developers.

The incorporation of unsubstantiated and unvalidated concepts into Army doctrine has been a long standing problem. In 1976, Huba Wass de Czege, then an Army major, took both “loosely structured and unscientific analysis” based on intuition and experience and simple counts of gross numbers to task as insufficient “for a clear and rigorous understanding of combat power in a modern context.” He proposed replacing it with a analytical framework for analyzing combat power that accounted for both measurable and intangible factors. Adopting a scrupulous method and language would overcome the simplistic tactical analysis then being taught. While some of the essence of Wass de Czege’s approach has found its way into doctrinal thinking, his criticism of the lack of objective and thorough analysis continues to echo (here, here, and here, for example).

Despite dissatisfaction with the existing methods, little has changed. The problem with this should be self-evident, but I will give the U.S. Naval War College the final word here:

Fundamentally, all of our approaches to force-on-force analysis are underpinned by theories of combat that include both how combat works and what matters most in determining the outcomes of engagements, battles, campaigns, and wars. The various analytical methods we use can shed light on the performance of the force alternatives only to the extent our theories of combat are valid. If our theories are flawed, our analytical results are likely to be equally wrong.

TDI Friday Read: The Validity Of The 3-1 Rule Of Combat

Canadian soldiers going “over the top” during the First World War. [History.com]

Today’s edition of TDI Friday Read addresses the question of force ratios in combat. How many troops are needed to successfully attack or defend on the battlefield? There is a long-standing rule of thumb that holds that an attacker requires a 3-1 preponderance over a defender in combat in order to win. The aphorism is so widely accepted that few have questioned whether it is actually true or not.

Trevor Dupuy challenged the validity of the 3-1 rule on empirical grounds. He could find no historical substantiation to support it. In fact, his research on the question of force ratios suggested that there was a limit to the value of numerical preponderance on the battlefield.

Trevor Dupuy and the 3-1 Rule

Human Factors In Warfare: Diminishing Returns In Combat

TDI President Chris Lawrence has also challenged the 3-1 rule in his own work on the subject.

Force Ratios in Conventional Combat

The 3-to-1 Rule in Histories

Aussie OR

Comparing Force Ratios to Casualty Exchange Ratios

The validity of the 3-1 rule is no mere academic question. It underpins a great deal of U.S. military policy and warfighting doctrine. Yet, the only time the matter was seriously debated was in the 1980s with reference to the problem of defending Western Europe against the threat of Soviet military invasion.

The Great 3-1 Rule Debate

It is probably long past due to seriously challenge the validity and usefulness of the 3-1 rule again.

Validating Trevor Dupuy’s Combat Models

[The article below is reprinted from Winter 2010 edition of The International TNDM Newsletter.]

A Summation of QJM/TNDM Validation Efforts

By Christopher A. Lawrence

There have been six or seven different validation tests conducted of the QJM (Quantified Judgment Model) and the TNDM (Tactical Numerical Deterministic Model). As the changes to these two models are evolutionary in nature but do not fundamentally change the nature of the models, the whole series of validation tests across both models is worth noting. To date, this is the only model we are aware of that has been through multiple validations. We are not aware of any DOD [Department of Defense] combat model that has undergone more than one validation effort. Most of the DOD combat models in use have not undergone any validation.

The Two Original Validations of the QJM

After its initial development using a 60-engagement WWII database, the QJM was tested in 1973 by application of its relationships and factors to a validation database of 21 World War II engagements in Northwest Europe in 1944 and 1945. The original model proved to be 95% accurate in explaining the outcomes of these additional engagements. Overall accuracy in predicting the results of the 81 engagements in the developmental and validation databases was 93%.[1]

During the same period the QJM was converted from a static model that only predicted success or failure to one capable of also predicting attrition and movement. This was accomplished by adding variables and modifying factor values. The original QJM structure was not changed in this process. The addition of movement and attrition as outputs allowed the model to be used dynamically in successive “snapshot” iterations of the same engagement.

From 1973 to 1979 the QJM’s formulae, procedures, and variable factor values were tested against the results of all of the 52 significant engagements of the 1967 and 1973 Arab-Israeli Wars (19 from the former, 33 from the latter). The QJM was able to replicate all of those engagements with an accuracy of more than 90%?[2]

In 1979 the improved QJM was revalidated by application to 66 engagements. These included 35 from the original 81 engagements (the “development database”), and 31 new engagements. The new engagements included five from World War II and 26 from the 1973 Middle East War. This new validation test considered four outputs: success/failure, movement rates, personnel casualties, and tank losses. The QJM predicted success/failure correctly for about 85% of the engagements. It predicted movement rates with an error of 15% and personnel attrition with an error of 40% or less. While the error rate for tank losses was about 80%, it was discovered that the model consistently underestimated tank losses because input data included all kinds of armored vehicles, but output data losses included only numbers of tanks.[3]

This completed the original validations efforts of the QJM. The data used for the validations, and parts of the results of the validation, were published, but no formal validation report was issued. The validation was conducted in-house by Colonel Dupuy’s organization, HERO [Historical Evaluation Research Organization]. The data used were mostly from division-level engagements, although they included some corps- and brigade-level actions. We count these as two separate validation efforts.

The Development of the TNDM and Desert Storm

In 1990 Col. Dupuy, with the collaborative assistance of Dr. James G. Taylor (author of Lanchester Models of Warfare [vol. 1] [vol. 2], published by the Operations Research Society of America, Arlington, Virginia, in 1983) introduced a significant modification: the representation of the passage of time in the model. Instead of resorting to successive “snapshots,” the introduction of Taylor’s differential equation technique permitted the representation of time as a continuous flow. While this new approach required substantial changes to the software, the relationship of the model to historical experience was unchanged.[4] This revision of the model also included the substitution of formulae for some of its tables so that there was a continuous flow of values across the individual points in the tables. It also included some adjustment to the values and tables in the QJM. Finally, it incorporated a revised OLI [Operational Lethality Index] calculation methodology for modem armor (mobile fighting machines) to take into account all the factors that influence modern tank warfare.[5] The model was reprogrammed in Turbo PASCAL (the original had been written in BASIC). The new model was called the TNDM (Tactical Numerical Deterministic Model).

Building on its foundation of historical validation and proven attrition methodology, in December 1990, HERO used the TNDM to predict the outcome of, and losses from, the impending Operation DESERT STORM.[6] It was the most accurate (lowest) public estimate of U.S. war casualties provided before the war. It differed from most other public estimates by an order of magnitude.

Also, in 1990, Trevor Dupuy published an abbreviated form of the TNDM in the book Attrition: Forecasting Battle Casualties and Equipment Losses in Modern War. A brief validation exercise using 12 battles from 1805 to 1973 was published in this book.[7] This version was used for creation of M-COAT[8] and was also separately tested by a student (Lieutenant Gozel) at the Naval Postgraduate School in 2000.[9] This version did not have the firepower scoring system, and as such neither M-COAT, Lieutenant Gozel’s test, nor Colonel Dupuy’s 12-battle validation included the OLI methodology that is in the primary version of the TNDM.

For counting purposes, I consider the Gulf War the third validation of the model. In the end, for any model, the proof is in the pudding. Can the model be used as a predictive tool or not? If not, then there is probably a fundamental flaw or two in the model. Still the validation of the TNDM was somewhat second-hand, in the sense that the closely-related previous model, the QJM, was validated in the 1970s to 200 World War II and 1967 and 1973 Arab-Israeli War battles, but the TNDM had not been. Clearly, something further needed to be done.

The Battalion-Level Validation of the TNDM

Under the guidance of Christopher A. Lawrence, The Dupuy Institute undertook a battalion-level validation of the TNDM in late 1996. This effort tested the model against 76 engagements from World War I, World War II, and the post-1945 world including Vietnam, the Arab-Israeli Wars, the Falklands War, Angola, Nicaragua, etc. This effort was thoroughly documented in The International TNDM Newsletter.[10] This effort was probably one of the more independent and better-documented validations of a casualty estimation methodology that has ever been conducted to date, in that:

  • The data was independently assembled (assembled for other purposes before the validation) by a number of different historians.
  • There were no calibration runs or adjustments made to the model before the test.
  • The data included a wide range of material from different conflicts and times (from 1918 to 1983).
  • The validation runs were conducted independently (Susan Rich conducted the validation runs, while Christopher A. Lawrence evaluated them).
  • The results of the validation were fully published.
  • The people conducting the validation were independent, in the sense that:

a) there was no contract, management, or agency requesting the validation;
b) none of the validators had previously been involved in designing the model, and had only very limited experience in using it; and
c) the original model designer was not able to oversee or influence the validation.[11]

The validation was not truly independent, as the model tested was a commercial product of The Dupuy Institute, and the person conducting the test was an employee of the Institute. On the other hand, this was an independent effort in the sense that the effort was employee-initiated and not requested or reviewed by the management of the Institute. Furthermore, the results were published.

The TNDM was also given a limited validation test back to its original WWII data around 1997 by Niklas Zetterling of the Swedish War College, who retested the model to about 15 or so Italian campaign engagements. This effort included a complete review of the historical data used for the validation back to their primarily sources, and details were published in The International TNDM Newsletter.[12]

There has been one other effort to correlate outputs from QJM/TNDM-inspired formulae to historical data using the Ardennes and Kursk campaign-level (i.e., division-level) databases.[13] This effort did not use the complete model, but only selective pieces of it, and achieved various degrees of “goodness of fit.” While the model is hypothetically designed for use from squad level to army group level, to date no validation has been attempted below battalion level, or above division level. At this time, the TNDM also needs to be revalidated back to its original WWII and Arab-Israeli War data, as it has evolved since the original validation effort.

The Corps- and Division-level Validations of the TNDM

Having now having done one extensive battalion-level validation of the model and published the results in our newsletters, Volume 1, issues 5 and 6, we were then presented an opportunity in 2006 to conduct two more validations of the model. These are discussed in depth in two articles of this issue of the newsletter.

These validations were again conducted using historical data, 24 days of corps-level combat and 25 cases of division-level combat drawn from the Battle of Kursk during 4-15 July 1943. It was conducted using an independently-researched data collection (although the research was conducted by The Dupuy Institute), using a different person to conduct the model runs (although that person was an employee of the Institute) and using another person to compile the results (also an employee of the Institute). To summarize the results of this validation (the historical figure is listed first followed by the predicted result):

There was one other effort that was done as part of work we did for the Army Medical Department (AMEDD). This is fully explained in our report Casualty Estimation Methodologies Study: The Interim Report dated 25 July 2005. In this case, we tested six different casualty estimation methodologies to 22 cases. These consisted of 12 division-level cases from the Italian Campaign (4 where the attack failed, 4 where the attacker advanced, and 4 Where the defender was penetrated) and 10 cases from the Battle of Kursk (2 cases Where the attack failed, 4 where the attacker advanced and 4 where the defender was penetrated). These 22 cases were randomly selected from our earlier 628 case version of the DLEDB (Division-level Engagement Database; it now has 752 cases). Again, the TNDM performed as well as or better than any of the other casualty estimation methodologies tested. As this validation effort was using the Italian engagements previously used for validation (although some had been revised due to additional research) and three of the Kursk engagements that were later used for our division-level validation, then it is debatable whether one would want to call this a seventh validation effort. Still, it was done as above with one person assembling the historical data and another person conducting the model runs. This effort was conducted a year before the corps and division-level validation conducted above and influenced it to the extent that we chose a higher CEV (Combat Effectiveness Value) for the later validation. A CEV of 2.5 was used for the Soviets for this test, vice the CEV of 3.0 that was used for the later tests.

Summation

The QJM has been validated at least twice. The TNDM has been tested or validated at least four times, once to an upcoming, imminent war, once to battalion-level data from 1918 to 1989, once to division-level data from 1943 and once to corps-level data from 1943. These last four validation efforts have been published and described in depth. The model continues, regardless of which validation is examined, to accurately predict outcomes and make reasonable predictions of advance rates, loss rates and armor loss rates. This is regardless of level of combat (battalion, division or corps), historic period (WWI, WWII or modem), the situation of the combats, or the nationalities involved (American, German, Soviet, Israeli, various Arab armies, etc.). As the QJM, the model was effectively validated to around 200 World War II and 1967 and 1973 Arab-Israeli War battles. As the TNDM, the model was validated to 125 corps-, division-, and battalion-level engagements from 1918 to 1989 and used as a predictive model for the 1991 Gulf War. This is the most extensive and systematic validation effort yet done for any combat model. The model has been tested and re-tested. It has been tested across multiple levels of combat and in a wide range of environments. It has been tested where human factors are lopsided, and where human factors are roughly equal. It has been independently spot-checked several times by others outside of the Institute. It is hard to say what more can be done to establish its validity and accuracy.

NOTES

[1] It is unclear what these percentages, quoted from Dupuy in the TNDM General Theoretical Description, specify. We suspect it is a measurement of the model’s ability to predict winner and loser. No validation report based on this effort was ever published. Also, the validation figures seem to reflect the results after any corrections made to the model based upon these tests. It does appear that the division-level validation was “incremental.” We do not know if the earlier validation tests were tested back to the earlier data, but we have reason to suspect not.

[2] The original QJM validation data was first published in the Combat Data Subscription Service Supplement, vol. 1, no. 3 (Dunn Loring VA: HERO, Summer 1975). (HERO Report #50) That effort used data from 1943 through 1973.

[3] HERO published its QJM validation database in The QJM Data Base (3 volumes) Fairfax VA: HERO, 1985 (HERO Report #100).

[4] The Dupuy Institute, The Tactical Numerical Deterministic Model (TNDM): A General and Theoretical Description, McLean VA: The Dupuy Institute, October 1994.

[5] This had the unfortunate effect of undervaluing WWII-era armor by about 75% relative to other WWII weapons when modeling WWII engagements. This left The Dupuy Institute with the compromise methodology of using the old OLI method for calculating armor (Mobile Fighting Machines) when doing WWII engagements and using the new OLI method for calculating armor when doing modem engagements

[6] Testimony of Col. T. N. Dupuy, USA, Ret, Before the House Armed Services Committee, 13 Dec 1990. The Dupuy Institute File I-30, “Iraqi Invasion of Kuwait.”

[7] Trevor N. Dupuy, Attrition: Forecasting Battle Casualties and Equipment Losses in Modern War (HERO Books, Fairfax, VA, 1990), 123-4.

[8] M-COAT is the Medical Course of Action Tool created by Major Bruce Shahbaz. It is a spreadsheet model based upon the elements of the TNDM provided in Dupuy’s Attrition (op. cit.) It used a scoring system derived from elsewhere in the U.S. Army. As such, it is a simplified form of the TNDM with a different weapon scoring system.

[9] See Gözel, Ramazan. “Fitting Firepower Score Models to the Battle of Kursk Data,” NPGS Thesis. Monterey CA: Naval Postgraduate School.

[10] Lawrence, Christopher A. “Validation of the TNDM at Battalion Level.” The International TNDM Newsletter, vol. 1, no. 2 (October 1996); Bongard, Dave “The 76 Battalion-Level Engagements.” The International TNDM Newsletter, vol. 1, no. 4 (February 1997); Lawrence, Christopher A. “The First Test of the TNDM Battalion-Level Validations: Predicting the Winner” and “The Second Test of the TNDM Battalion-Level Validations: Predicting Casualties,” The International TNDM Newsletter, vol. 1 no. 5 (April 1997); and Lawrence, Christopher A. “Use of Armor in the 76 Battalion-Level Engagements,” and “The Second Test of the Battalion-Level Validation: Predicting Casualties Final Scorecard.” The International TNDM Newsletter, vol. 1, no. 6 (June 1997).

[11] Trevor N. Dupuy passed away in July 1995, and the validation was conducted in 1996 and 1997.

[12] Zetterling, Niklas. “CEV Calculations in Italy, 1943,” The International TNDM Newsletter, vol. 1, no. 6. McLean VA: The Dupuy Institute, June 1997. See also Research Plan, The Dupuy Institute Report E-3, McLean VA: The Dupuy Institute, 7 Oct 1998.

[13] See Gözel, “Fitting Firepower Score Models to the Battle of Kursk Data.”