Tag: Base of Sand problem

Breakpoints in U.S. Army Doctrine

U.S. Army prisoners of war captured by German forces during the Battle of the Bulge in 1944. [Wikipedia]

One of the least studied aspects of combat is battle termination. Why do units in combat stop attacking or defending? Shifts in combat posture (attack, defend, delay, withdrawal) are usually voluntary, directed by a commander, but they can also be involuntary, as a result of direct or indirect enemy action. Why do involuntary changes in combat posture, known as breakpoints, occur?

As Chris pointed out in a previous post, the topic of breakpoints has only been addressed by two known studies since 1954. Most existing military combat models and wargames address breakpoints in at least a cursory way, usually through some calculation based on personnel casualties. Both breakpoint studies suggest, however, that involuntary changes in posture are seldom related to casualties alone.

Current U.S. Army doctrine addresses changes in combat posture through discussions of culmination points in the attack, and transitions from attack to defense, defense to counterattack, and defense to retrograde. But these all pertain to voluntary changes, not breakpoints.

Army doctrinal literature has little to say about breakpoints, either in the context of friendly forces or potential enemy combatants. The little it does say relates to the effects of fire on enemy forces and is based on personnel and materiel attrition.

According to ADRP 1-02 Terms and Military Symbols, an enemy combat unit is considered suppressed after suffering 3% personnel casualties or materiel losses, neutralized by 10% losses, and destroyed upon sustaining 30% losses. The sources and methodology for deriving these figures are unknown, although these specific terms and numbers have been part of Army doctrine for decades.
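The doctrinal thresholds amount to a simple step function on a unit's loss fraction. A minimal sketch, assuming each threshold is an inclusive lower bound (losses at or above 3% count as suppressed) and that a single loss fraction stands in for either personnel casualties or materiel losses; the function name and the label for losses under 3% are hypothetical, not doctrinal terms.

```python
def fire_effect(loss_fraction: float) -> str:
    """Map a unit's loss fraction (0.0-1.0) to the ADRP 1-02 effect of fire.

    Thresholds are the 3% / 10% / 30% figures quoted above, treated here
    as inclusive lower bounds (an assumption, not stated in doctrine).
    """
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0.0 and 1.0")
    if loss_fraction >= 0.30:
        return "destroyed"
    if loss_fraction >= 0.10:
        return "neutralized"
    if loss_fraction >= 0.03:
        return "suppressed"
    return "below doctrinal threshold"  # hypothetical label, not an ADRP term


for losses in (0.02, 0.05, 0.12, 0.35):
    print(f"{losses:.0%} losses -> {fire_effect(losses)}")
```

The sketch captures only the attrition-based logic the doctrine states; it has no input for the non-casualty factors that the breakpoint studies suggest matter.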

The joint U.S. Army and U.S. Marine Corps vision of future land combat foresees battlefields that are highly lethal and demanding on human endurance. How will such a future operational environment affect combat performance? Past experience undoubtedly offers useful insights but there seems to be little interest in seeking out such knowledge.

Trevor Dupuy criticized the U.S. military in the 1980s for its lack of understanding of the phenomenon of suppression and other effects of fire on the battlefield, and for its apparent lack of interest in studying it. Not much appears to have changed since then.

Perla On Dupuy

Dr. Peter Perla, noted defense researcher, wargame designer and expert, and author of the seminal The Art of Wargaming: A Guide for Professionals and Hobbyists, gave the keynote address at the 2017 Connections Wargaming Conference last August. His speech, which served as his valedictory address on the occasion of his retirement from government service, addressed the predictive power of wargaming. In it, Perla recalled a conversation he once had with Trevor Dupuy in the early 1990s:

Like most good stories, this one has a beginning, a middle, and an end. I have sort of jumped in at the middle. So let’s go back to the beginning.

As it happens, that beginning came during one of the very first Connections. It may even have been the first one. This thread is one of those vivid memories we all have of certain events in life. In my case, it is a short conversation I had with Trevor Dupuy.

I remember the setting well. We were in front of the entrance to the O Club at Maxwell. It was kind of dark, but I can’t recall if it was in the morning before the club opened for our next session, or the evening, before a dinner. Trevor and I were chatting and he said something about wargaming being predictive. I still recall what I said.

“Good grief, Trevor, we can’t even predict the outcome of a Super Bowl game much less that of a battle!” He seemed taken by surprise that I felt that way, and he replied, “Well, if that is true, what are we doing? What’s the point?”

I had my usual stock answers. We wargame to develop insights, to identify issues, and to raise questions. We certainly don’t wargame to predict what will happen in a battle or a war. I was pretty dogmatic in those days. Thank goodness I’m not that way any more!

The question of prediction did not go away, however.

For the rest of Perla’s speech, see here. For a wonderful summary of the entire 2017 Connections Wargaming conference, see here.

 

Comparing the RAND Version of the 3:1 Rule to Real-World Data

Chuliengcheng. In a glorious death eternal life. (Battle of Yalu River, 1904) [Wikimedia Commons]

[The article below is reprinted from the Winter 2010 edition of The International TNDM Newsletter.]

Comparing the RAND Version of the 3:1 Rule to Real-World Data
Christopher A. Lawrence

For this test, The Dupuy Institute took advantage of two of the existing databases in its DuWar suite. The first is the Battles Database (BaDB), which covers 243 battles from 1600 to 1900. The second is the Division-level Engagement Database (DLEDB), which covers 675 division-level engagements from 1904 to 1991.

The first was chosen to provide a historical context for the 3:1 rule of thumb. The second was chosen to examine how this rule applies to modern combat data.

We decided to test the data against the RAND version of the 3:1 rule, as documented by RAND in 1992 and used in JICM [Joint Integrated Contingency Model] (with SFS [Situational Force Scoring]) and other models. This rule, as presented by RAND, states: “[T]he famous ‘3:1 rule,’ according to which the attacker and defender suffer equal fractional loss rates at a 3:1 force ratio if the battle is in mixed terrain and the defender enjoys ‘prepared’ defenses…”

Therefore, we selected all engagements from these two databases with force ratios from 2.5 to 1 to 3.5 to 1 (inclusive). It was then a simple matter to map those to a chart comparing attacker losses to defender losses. For the pre-1904 cases, even with a large database (243 cases), there were only 12 cases of combat in that range, hardly a statistically significant sample. That was because most of the combat was at odds ratios in the range of 0.50-to-1 to 2.00-to-1.

The number of engagements by odds ratio in the pre-1904 cases:

Because the database is one of battles, which are usually only joined at reasonably favorable odds, 88 percent of the battles occur between 0.40 and 2.50 to 1 odds. The twelve pre-1904 cases in the range of 2.50 to 3.50 are shown in Table 1.

If the RAND version of the 3:1 rule were valid, one would expect the “Percent per Day Loss Ratio” (the last column) to hover around 1.00, as this is the ratio of the attacker percent loss rate to the defender percent loss rate. As it is, 9 of the 12 data points are noticeably below 1 (below 0.40, or a 1 to 2.50 exchange rate). This leaves only three cases (25%) with an exchange rate that would support such a “rule.”

If we look at the simple ratio of actual losses (vice percent losses), the numbers come much closer to parity, but this is not the RAND interpretation of the 3:1 rule. Six of the twelve numbers “hover” around an even exchange ratio, while the other six are widely off that central point. “Hover” for the rest of this discussion means that the exchange ratio ranges from 0.50-to-1 to 2.00-to-1.
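The selection rule and the two exchange measures can be sketched in a few lines. This is a minimal illustration, not TDI's actual procedure: the `Engagement` record, its field names, and the sample numbers are hypothetical. Because both sides fight for the same number of days, the per-day terms cancel, so the percent per day loss ratio reduces to the attacker's fractional loss divided by the defender's.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Engagement:
    # Hypothetical record layout; not the BaDB/DLEDB schema.
    attacker_strength: float
    defender_strength: float
    attacker_losses: float
    defender_losses: float

    @property
    def force_ratio(self) -> float:
        return self.attacker_strength / self.defender_strength

    def percent_loss_ratio(self) -> Optional[float]:
        """Attacker percent loss divided by defender percent loss (days cancel)."""
        if self.attacker_losses == 0 or self.defender_losses == 0:
            return None  # "N/A": one side suffered no losses
        attacker_pct = self.attacker_losses / self.attacker_strength
        defender_pct = self.defender_losses / self.defender_strength
        return attacker_pct / defender_pct

    def straight_loss_ratio(self) -> Optional[float]:
        """Attacker losses divided by defender losses (vice percent losses)."""
        if self.attacker_losses == 0 or self.defender_losses == 0:
            return None
        return self.attacker_losses / self.defender_losses


def near_three_to_one(e: Engagement) -> bool:
    # Force ratios from 2.5 to 1 to 3.5 to 1, inclusive.
    return 2.5 <= e.force_ratio <= 3.5


def classify(ratio: Optional[float]) -> str:
    # "Hover" means an exchange ratio from 0.50-to-1 to 2.00-to-1.
    if ratio is None:
        return "n/a"
    if ratio < 0.5:
        return "below"
    if ratio <= 2.0:
        return "hover"
    return "above"


# Hypothetical engagement: 30,000 attackers vs 10,000 defenders.
e = Engagement(30_000, 10_000, 600, 1_000)
print(near_three_to_one(e), classify(e.percent_loss_ratio()), classify(e.straight_loss_ratio()))
```

With these made-up figures the same engagement falls below the band on percent losses but hovers on straight losses, which is the distinction drawn above between the RAND interpretation and the simple ratio of actual losses.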

Still, this is early modern linear combat, and it is not always representative of modern war. Instead, we will examine the 634 cases in the Division-level Engagement Database (which consists of 675 cases) for which we have worked out the force ratios. While this database covers 1904 to 1991, most of the cases are from WWII (1939-1945). For comparison:

In all, 87% of the cases are from WWII and 10% are from the post-WWII period. The engagements without force ratios are those we are still working on, as The Dupuy Institute is always expanding the DLEDB as a matter of routine. The specific cases where the force ratios are between 2.50 and 3.50 to 1 (inclusive) are shown in Table 2:

This is a total of 98 engagements at force ratios of 2.50 to 3.50 to 1, or 15 percent of the 634 engagements for which we had force ratios. Even with this fairly significant representation of the overall population, we get no indication that the 3:1 rule, as RAND postulates it for casualties, fits the data at all. Of the 98 engagements, only 19 demonstrate a percent per day loss ratio (casualty exchange ratio) between 0.50-to-1 and 2-to-1. This is only 19 percent of the engagements at roughly a 3:1 force ratio. There were 72 percent (71 cases) of those engagements at lower figures (below 0.50-to-1) and only 8 percent (8 cases) at a higher exchange ratio. The data clearly were not clustered in the 0.50-to-1 to 2-to-1 range, but well to the left (lower) of it.

Looking just at straight exchange ratios, we do get a better fit, with 31 percent (30 cases) of the figures ranging between 0.50 to 1 and 2 to 1. Still, an even exchange is not necessarily the norm, with 45 percent (44 cases) lower and 24 percent (24 cases) higher. By definition, an even exchange means attacker losses of one-third of those postulated in the RAND version of the 3:1 rule. This is effectively an order of magnitude difference, and it clearly does not represent the norm or the center case.
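The one-third figure follows from the RAND wording quoted earlier. Writing S_a and S_d for attacker and defender strength and L_a and L_d for their losses (notation introduced here for illustration only), equal fractional loss rates at a 3:1 force ratio imply a 3-to-1 exchange of absolute losses:

```latex
\frac{L_a}{S_a} = \frac{L_d}{S_d}, \qquad S_a = 3S_d
\;\;\Longrightarrow\;\; L_a = \frac{S_a}{S_d}\,L_d = 3L_d
```

So an observed straight exchange ratio of about 1 corresponds to attacker losses of roughly one-third of what the RAND rule postulates.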

The percent per day loss exchange ratio ranges from 0.00 to 5.71. The data tend to be clustered at the lower values, so the high values are very much outliers. The highest percent exchange ratio is 5.71, the second highest is 4.41, and the third highest is 2.92. At the other end of the spectrum, there are four cases where no losses were suffered by one side and seven where the exchange ratio was 0.01 or less. Ignoring the “N/A” cases (no losses suffered by one side) and the two high “outliers” (5.71 and 4.41) leaves a range of values from 0.00 to 2.92 across 92 cases. With an even distribution across that range, one would expect 51 percent of them to fall between 0.50-to-1 and 2.00-to-1. With only 19 percent of the cases in that range, one is left to conclude that there is no clear correlation here. In fact, the data point to the opposite effect: a negative relationship. Not only is the RAND construct unsupported, it is clearly and soundly contradicted by this data. Furthermore, the RAND construct is theoretically a worse predictor of casualty rates than randomly selecting a value for the percent exchange ratio between 0 and 2.92. We do believe this data is appropriate and accurate for such a test.
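The 51 percent expectation is the width of the hover band divided by the width of the observed range, assuming values were spread evenly across it. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def uniform_share(band_lo: float, band_hi: float, range_lo: float, range_hi: float) -> float:
    """Fraction of a uniform distribution on [range_lo, range_hi] lying in [band_lo, band_hi]."""
    overlap = max(0.0, min(band_hi, range_hi) - max(band_lo, range_lo))
    return overlap / (range_hi - range_lo)


# "Hover" band 0.50-2.00 within the observed range 0.00-2.92.
expected = uniform_share(0.50, 2.00, 0.00, 2.92)
print(f"expected share under an even spread: {expected:.1%}")
```

The result is 1.50 / 2.92, or about 51.4 percent.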

As only 19 of the 3:1 attacks fall in the even percent exchange ratio range, we should probably look at these cases for a moment:

One will note, in these 19 cases, that the average attacker casualties are way out of line with the average for the entire data set (3.20 versus 1.39, or 3.20 versus 0.63 with pre-1943 and Soviet-doctrine attackers removed). The reverse is the case for the defenders (3.12 versus 6.08, or 3.12 versus 5.83 with pre-1943 and Soviet-doctrine attackers removed). Of course, of the 19 cases, 2 are pre-1943 cases and 7 are cases of Soviet-doctrine attackers (in fact, 8 of the 14 cases of Soviet-doctrine attackers are in this selection of 19 cases). This leaves 10 other cases from the Mediterranean and the ETO (Northwest Europe 1944). These are clearly the unusual cases, outliers, etc. While the RAND 3:1 rule may be applicable to Soviet-doctrine offensives (as it applies to 8 of the 14 such cases we have), it does not appear to be applicable to anything else. By the same token, it also does not appear to apply to virtually any cases of post-WWII combat. This all strongly argues that not only is the RAND construct not proven, but it is indeed clearly not correct.

The fact that this construct appears in Soviet literature, but nowhere else in US literature, indicates that this is where the rule was drawn from. One must consider that the original scenarios run for the RSAC [RAND Strategy Assessment Center] wargame were “Fulda Gap” and Korean War scenarios. As such, they regularly involved battles between Soviet attackers and Allied defenders. It would appear that the 3:1 rule they used more closely reflected the experience of Soviet attackers in WWII than anything else. Therefore, it may have been a fine representation for those scenarios, as long as there were no US counterattacks or US offensives (and assuming that the Soviet Army of the 1980s performed at the same level as it did in the 1940s).

There was a clear relative performance difference between the Soviet Army and the German Army in World War II (see our Capture Rate Study Phases I & II and Measuring Human Factors in Combat for a detailed analysis).[1] It was roughly on the order of a 3-to-1 casualty exchange ratio. Therefore, it is not surprising that Soviet writers would create analytical tables based upon an equal percentage exchange of losses when attacking at 3:1. What is surprising is that such a table would now be used in the US to represent US forces. This is clearly not a correct application.

Therefore, RAND’s SFS, as currently constructed, is calibrated to, and should only be used to represent, a Soviet-doctrine attack on first-world forces where the Soviet-style attacker is clearly not properly trained and where the degree of performance difference is similar to that between the Germans and Soviets in 1942-44. It should not be used for US counterattacks, US attacks, or for any forces of roughly comparable ability (whether or not they use Soviet-style doctrine). Furthermore, it should not be used for US attacks against forces of inferior training, motivation and cohesiveness. If it is, such tables should be expected to produce incorrect results, with attacker losses far too high relative to the defender. In effect, the tables unrealistically penalize the attacker.

Because JICM with SFS is now being used for a wide variety of scenarios, it should not be used at all until this fundamental error is corrected, even if that use is only for training. With combat tables keyed to a result that is clearly off by an order of magnitude, the danger of negative training is high.

NOTES

[1] Capture Rate Study Phases I and II Final Report (The Dupuy Institute, March 6, 2000) (2 Vols.) and Measuring Human Factors in Combat: Part of the Enemy Prisoner of War Capture Rate Study (The Dupuy Institute, August 31, 2000). Both of these reports are available through our web site.

Spotted In The New Books Section Of The U.S. Naval Academy Library…

Christopher A. Lawrence, War by Numbers: Understanding Conventional Combat (Lincoln, NE: Potomac Books, 2017) 390 pages, $39.95

War by Numbers assesses the nature of conventional warfare through the analysis of historical combat. Christopher A. Lawrence (President and Executive Director of The Dupuy Institute) establishes what we know about conventional combat and why we know it. By demonstrating the impact a variety of factors have on combat he moves such analysis beyond the work of Carl von Clausewitz and into modern data and interpretation.

Using vast data sets, Lawrence examines force ratios, the human factor in case studies from World War II and beyond, the combat value of superior situational awareness, and the effects of dispersion, among other elements. Lawrence challenges existing interpretations of conventional warfare and shows how such combat should be conducted in the future, simultaneously broadening our understanding of what it means to fight wars by the numbers.

The book is available in paperback directly from Potomac Books and in paperback and Kindle from Amazon.

TDI Friday Read: How Do We Know What We Know About War?

The late, great Carl Sagan.

Today’s edition of TDI Friday Read asks the question, how do we know if the theories and concepts we use to understand and explain war and warfare accurately depict reality? There is certainly no shortage of explanatory theories available, starting with Sun Tzu in the 6th century BCE and running to the present. As I have mentioned before, all combat models and simulations are theories about how combat works. Military doctrine is also a functional theory of warfare. But how do we know if any of these theories are actually true?

Well, one simple way to find out whether a particular theory is valid is to use it to predict the outcome of the phenomenon it purports to explain. Testing theory through prediction is a fundamental aspect of the philosophy of science. If a theory is accurate, it should be able to produce a reasonably accurate prediction of future behavior.

In his 2016 article, “Can We Predict Politics? Toward What End?” Michael D. Ward, a Professor of Political Science at Duke University, made the case for a robust effort to use prediction as a way of evaluating the thicket of theory populating security and strategic studies. Dropping invalid theories and concepts is important, but there is probably more value in figuring out how and why they are wrong.

Screw Theory! We Need More Prediction in Security Studies!

Trevor Dupuy and TDI publicly put their theories to the test in the form of combat casualty estimates for the 1991 Gulf War, the U.S. intervention in Bosnia, and the Iraqi insurgency. How well did they do?

Predictions

Dupuy himself argued passionately for independent testing of combat models against real-world data, a process known as validation. This is actually seldom done in the U.S. military operations research community.

Military History and Validation of Combat Models

However, TDI has done validation testing of Dupuy’s Quantified Judgment Model (QJM) and Tactical Numerical Deterministic Model (TNDM). The results are available for all to judge.

Validating Trevor Dupuy’s Combat Models

I will conclude this post on a dissenting note. Trevor Dupuy spent decades arguing for more rigor in the development of combat models and analysis, with only modest success. In fact, he encountered significant skepticism and resistance to his ideas and proposals. To this day, the U.S. Defense Department seems relatively uninterested in evidence-based research on this subject. Why?

David Wilkinson, Editor-in-Chief of the Oxford Review, wrote a fascinating blog post looking at why practitioners seem to have little actual interest in evidence-based practice.

Why evidence-based practice probably isn’t worth it…

His argument:

The problem with evidence based practice is that outside of areas like health care and aviation/technology is that most people in organisations don’t care about having research evidence for almost anything they do. That doesn’t mean they are not interesting in research but they are just not that interested in using the research to change how they do things – period.

His explanation for why this is and what might be done to remedy the situation is quite interesting.

Happy Holidays to all!

TDI Friday Read: The Lanchester Equations

Frederick W. Lanchester (1868-1946), British engineer and author of the Lanchester combat attrition equations. [Lanchester.com]

Today’s edition of TDI Friday Read addresses the Lanchester equations and their use in U.S. combat models and simulations. In 1916, British engineer Frederick W. Lanchester published a set of calculations he had derived for determining the results of attrition in combat. Lanchester intended them to be applied as an abstract conceptualization of aerial combat, stating that he did not believe they were applicable to ground combat.
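The post does not spell the equations out. For reference, here is a minimal sketch of their best-known form, Lanchester's square law for aimed fire, in which each side's loss rate is proportional to the size of the opposing force: dA/dt = -beta*B and dB/dt = -alpha*A. The function, the forward-Euler integration, and the sample parameters are illustrative assumptions, not any particular U.S. model's implementation.

```python
def lanchester_square(a0: float, b0: float, alpha: float, beta: float,
                      dt: float = 0.01, t_max: float = 100.0):
    """Forward-Euler integration of Lanchester's square law.

    alpha: kills per unit time per A soldier against B
    beta:  kills per unit time per B soldier against A
    Runs until one side is eliminated or t_max is reached.
    """
    a, b, t = float(a0), float(b0), 0.0
    while a > 0 and b > 0 and t < t_max:
        # Right-hand side uses the old a and b, so both updates are simultaneous.
        a, b = a - beta * b * dt, b - alpha * a * dt
        t += dt
    return max(a, 0.0), max(b, 0.0), t


# Equal per-soldier effectiveness, 2:1 numerical advantage for side A.
survivors_a, survivors_b, duration = lanchester_square(1000, 500, 0.01, 0.01)
print(f"A: {survivors_a:.0f}, B: {survivors_b:.0f}, t = {duration:.1f}")
```

Because alpha*A^2 - beta*B^2 is conserved under the square law, side A finishes with about sqrt(1000^2 - 500^2), roughly 866 men. The outcome is driven purely by numbers and kill rates, with no term for the human factors discussed below.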

Due to their elegant simplicity, U.S. military operations researchers nevertheless began incorporating the Lanchester equations into their land warfare computer combat models and simulations in the 1950s and 60s. The equations are the basis for many models and simulations used throughout the U.S. defense community today.

The problem with using Lanchester’s equations is that, despite numerous efforts, no one has been able to demonstrate that they accurately represent real-world combat.

Lanchester equations have been weighed….

Really…..Lanchester?

Trevor Dupuy was critical of combat models based on the Lanchester equations because they cannot account for the role behavioral and moral (i.e. human) factors play in combat.

Human Factors In Warfare: Interaction Of Variable Factors

He was also critical of models and simulations that had not been tested to see whether they could reliably represent real-world combat experience. In the modeling and simulation community, this sort of testing is known as validation.

Military History and Validation of Combat Models

The use of unvalidated concepts, like the Lanchester equations, and unvalidated combat models and simulations persists. Critics have dubbed this the “base of sand” problem, and it continues to affect not only models and simulations, but all abstract theories of combat, including those represented in military doctrine.

https://dupuyinstitute.dreamhosters.com/2017/04/10/wargaming-multi-domain-battle-the-base-of-sand-problem/

How Does the U.S. Army Calculate Combat Power? ¯\_(ツ)_/¯

The constituents of combat power as described in current U.S. military doctrine. [The Lightning Press]

One of the fundamental concepts of U.S. warfighting doctrine is combat power. The current U.S. Army definition is “the total means of destructive, constructive, and information capabilities that a military unit or formation can apply at a given time” (ADRP 3-0). It is the construct commanders and staffs are taught to use to assess the relative effectiveness of combat forces and is woven deeply throughout all aspects of U.S. operational thinking.

To execute operations, commanders conceptualize capabilities in terms of combat power. Combat power has eight elements: leadership, information, mission command, movement and maneuver, intelligence, fires, sustainment, and protection. The Army collectively describes the last six elements as the warfighting functions. Commanders apply combat power through the warfighting functions using leadership and information. [ADP 3-0, Operations]

Yet, there is no formal method in U.S. doctrine for estimating combat power. The existing process is intentionally subjective and largely left up to judgment. This is problematic, given that assessing the relative combat power of friendly and opposing forces on the battlefield is the first step in Course of Action (COA) development, which is at the heart of the U.S. Military Decision-Making Process (MDMP). Estimates of combat power also figure heavily in determining the outcomes of wargames evaluating proposed COAs.

The Existing Process

The Army’s current approach to combat power estimation is outlined in Field Manual (FM) 6-0 Commander and Staff Organization and Operations (2014). Planners are instructed to “make a rough estimate of force ratios of maneuver units two levels below their echelon.” They are then directed to “compare friendly strengths against enemy weaknesses, and vice versa, for each element of combat power.” It is “by analyzing force ratios and determining and comparing each force’s strengths and weaknesses as a function of combat power” that planners gain insight into tactical and operational capabilities, perspectives, vulnerabilities, and required resources.

That is it. Planners are told that “although the process uses some numerical relationships, the estimate is largely subjective. Assessing combat power requires assessing both tangible and intangible factors, such as morale and levels of training.” There is no guidance as to how to determine force ratios [numbers of troops or weapons systems?]. Nor is there any description of how to relate force calculations to combat power. Should force strengths be used somehow to determine a combat power value? Who knows? No additional doctrinal or planning references are provided.

Planners then use these subjective combat power assessments as they shape potential COAs and test them through wargaming. Although explicitly warned not to “develop and recommend COAs based solely on mathematical analysis of force ratios,” they are invited at this stage to consult a table of “minimum historical planning ratios as a starting point.” The table is clearly derived from the ubiquitous 3-1 rule of combat. Contrary to what FM 6-0 claims, neither the 3-1 rule nor the table has a clear historical provenance or any sort of empirical substantiation. There is no proven validity to any of the values cited. It is not even clear whether the “historical planning ratios” apply to manpower, firepower, or combat power.

During this phase, planners are advised to account for “factors that are difficult to gauge, such as impact of past engagements, quality of leaders, morale, maintenance of equipment, and time in position. Levels of electronic warfare support, fire support, close air support, civilian support, and many other factors also affect arraying forces.” FM 6-0 offers no detail as to how these factors should be measured or applied, however.

FM 6-0 also addresses combat power assessment for stability and civil support operations through troop-to-task analysis. Force requirements are to be based on an estimate of troop density, a “ratio of security forces (including host-nation military and police forces as well as foreign counterinsurgents) to inhabitants.” The manual advises that “most density recommendations fall within a range of 20 to 25 counterinsurgents for every 1,000 residents in an area of operations. A ratio of twenty counterinsurgents per 1,000 residents is often considered the minimum troop density required for effective counterinsurgency operations.”
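The troop-density guideline is a single proportion. A minimal sketch of the arithmetic, using a hypothetical area of operations with 500,000 residents (the population figure and the function name are illustrative only):

```python
import math


def counterinsurgents_required(residents: int, per_thousand: float = 20.0) -> int:
    """Security forces needed at a given density per 1,000 residents, rounded up."""
    return math.ceil(residents * per_thousand / 1000)


population = 500_000  # hypothetical area of operations
low = counterinsurgents_required(population)         # 20 per 1,000, the cited minimum
high = counterinsurgents_required(population, 25.0)  # upper end of the 20-25 range
print(f"{low:,} to {high:,} counterinsurgents")
```

The output scales linearly with population and density; nothing in the calculation reflects the situation-dependence that FM 6-0 itself acknowledges.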

While FM 6-0 acknowledges that “as with any fixed ratio, such calculations strongly depend on the situation,” it does not mention that any references to force level requirements, tie-down ratios, or troop density were stripped from both Joint and Army counterinsurgency manuals in 2013 and 2014. Yet, this construct lingers on in official staff planning doctrine. (Recent research challenged the validity of the troop density construct but the Defense Department has yet to fund any follow-on work on the subject.)

The Army Has Known About The Problem For A Long Time

The Army has tried several solutions to the problem of combat power estimation over the years. In the early 1970s, the U.S. Army Center for Army Analysis (CAA; known then as the U.S. Army Concepts & Analysis Agency) developed the Weighted Equipment Indices/Weighted Unit Value (WEI/WUV or “wee‑wuv”) methodology for calculating the relative firepower of different combat units. While WEI/WUVs were soon adopted throughout the Defense Department, the subjective nature of the method gradually led it to be abandoned for official use.
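In general terms, WEI/WUV is a weighted sum: each weapon type receives an index value relative to a reference weapon in its category (the WEI), indexed holdings are summed by category, and each category total is multiplied by a category weight to yield the unit's WUV. A minimal sketch of that structure follows; every equipment name, index, and weight in it is a hypothetical placeholder, not a CAA value. The result depends entirely on those assigned weights, which is where the subjectivity enters.

```python
from typing import Dict


def weighted_unit_value(
    holdings: Dict[str, Dict[str, int]],  # category -> {weapon type: count}
    wei: Dict[str, float],                # weapon type -> equipment index
    category_weight: Dict[str, float],    # category -> category weight
) -> float:
    """Sum over categories of category weight x sum(count x WEI)."""
    total = 0.0
    for category, items in holdings.items():
        indexed = sum(count * wei[weapon] for weapon, count in items.items())
        total += category_weight[category] * indexed
    return total


# Hypothetical placeholder values for illustration only.
wei = {"tank_a": 1.0, "tank_b": 0.75, "ifv_a": 1.0}
weights = {"tanks": 100.0, "ifvs": 50.0}
blue = weighted_unit_value({"tanks": {"tank_a": 10}, "ifvs": {"ifv_a": 20}}, wei, weights)
red = weighted_unit_value({"tanks": {"tank_b": 30}}, wei, weights)
print(f"Blue WUV {blue:.0f}, Red WUV {red:.0f}, Red:Blue ratio {red / blue:.2f}")
```

A force ratio computed this way is a ratio of firepower scores, not of manpower or of combat power in the doctrinal sense.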

In the 1980s and 1990s, the U.S. Army Command & General Staff College (CGSC) published the ST 100-9 and ST 100-3 student workbooks that contained tables of planning factors that became the informal basis for calculating combat power in staff practice. The STs were revised regularly and then adapted into spreadsheet format in the late 1990s. The 1999 iteration employed WEI/WUVs as the basis for calculating firepower scores used to estimate force ratios. CGSC stopped updating the STs in the early 2000s, as the Army focused on irregular warfare.

With the recently renewed focus on conventional conflict, Army staff planners are starting to realize that their planning factors are out of date. In an attempt to fill this gap, CGSC developed a new spreadsheet tool in 2012 called the Correlation of Forces (COF) calculator. It apparently drew upon analysis done by the U.S. Army Training and Doctrine Command Analysis Center (TRAC) in 2004 to establish new combat unit firepower scores. (TRAC’s methodology is not clear, but if it is based on this 2007 ISMOR presentation, the scores are derived from runs by an unspecified combat model modified by factors derived from the Army’s unit readiness methodology. If described accurately, this would not be an improvement over WEI/WUVs.)

The COF calculator continues to use the 3-1 force ratio tables. It also incorporates a table for estimating combat losses based on force ratios (this despite ample empirical historical analysis showing that there is no correlation between force ratios and casualty rates).

While the COF calculator is not yet an official doctrinal product, CGSC plans to add Marine Corps forces to it for use as a joint planning tool and to incorporate it into the Army’s Command Post of the Future (CPOF). TRAC is developing a stand-alone version for use by force developers.

The incorporation of unsubstantiated and unvalidated concepts into Army doctrine has been a long-standing problem. In 1976, Huba Wass de Czege, then an Army major, took both “loosely structured and unscientific analysis” based on intuition and experience and simple counts of gross numbers to task as insufficient “for a clear and rigorous understanding of combat power in a modern context.” He proposed replacing them with an analytical framework for analyzing combat power that accounted for both measurable and intangible factors. Adopting a scrupulous method and language would overcome the simplistic tactical analysis then being taught. While some of the essence of Wass de Czege’s approach has found its way into doctrinal thinking, his criticism of the lack of objective and thorough analysis continues to echo (here, here, and here, for example).

Despite dissatisfaction with the existing methods, little has changed. The problem with this should be self-evident, but I will give the U.S. Naval War College the final word here:

Fundamentally, all of our approaches to force-on-force analysis are underpinned by theories of combat that include both how combat works and what matters most in determining the outcomes of engagements, battles, campaigns, and wars. The various analytical methods we use can shed light on the performance of the force alternatives only to the extent our theories of combat are valid. If our theories are flawed, our analytical results are likely to be equally wrong.

TDI Friday Read: The Validity Of The 3-1 Rule Of Combat

Canadian soldiers going “over the top” during the First World War. [History.com]

Today’s edition of TDI Friday Read addresses the question of force ratios in combat. How many troops are needed to successfully attack or defend on the battlefield? There is a long-standing rule of thumb that holds that an attacker requires a 3-1 preponderance over a defender in combat in order to win. The aphorism is so widely accepted that few have questioned whether it is actually true or not.

Trevor Dupuy challenged the validity of the 3-1 rule on empirical grounds. He could find no historical substantiation to support it. In fact, his research on the question of force ratios suggested that there was a limit to the value of numerical preponderance on the battlefield.

Trevor Dupuy and the 3-1 Rule

Human Factors In Warfare: Diminishing Returns In Combat

TDI President Chris Lawrence has also challenged the 3-1 rule in his own work on the subject.

Force Ratios in Conventional Combat

The 3-to-1 Rule in Histories

Aussie OR

Comparing Force Ratios to Casualty Exchange Ratios

The validity of the 3-1 rule is no mere academic question. It underpins a great deal of U.S. military policy and warfighting doctrine. Yet, the only time the matter was seriously debated was in the 1980s with reference to the problem of defending Western Europe against the threat of Soviet military invasion.

The Great 3-1 Rule Debate

It is probably long past due to seriously challenge the validity and usefulness of the 3-1 rule again.

Validating Trevor Dupuy’s Combat Models

[The article below is reprinted from the Winter 2010 edition of The International TNDM Newsletter.]

A Summation of QJM/TNDM Validation Efforts

By Christopher A. Lawrence

There have been six or seven different validation tests conducted of the QJM (Quantified Judgment Model) and the TNDM (Tactical Numerical Deterministic Model). Because the changes between these two models are evolutionary and do not fundamentally alter their nature, the whole series of validation tests across both models is worth noting. To date, this is the only model we are aware of that has been through multiple validations. We are not aware of any DOD [Department of Defense] combat model that has undergone more than one validation effort. Most of the DOD combat models in use have not undergone any validation.

The Two Original Validations of the QJM

After its initial development using a 60-engagement WWII database, the QJM was tested in 1973 by application of its relationships and factors to a validation database of 21 World War II engagements in Northwest Europe in 1944 and 1945. The original model proved to be 95% accurate in explaining the outcomes of these additional engagements. Overall accuracy in predicting the results of the 81 engagements in the developmental and validation databases was 93%.[1]

During the same period the QJM was converted from a static model that only predicted success or failure to one capable of also predicting attrition and movement. This was accomplished by adding variables and modifying factor values. The original QJM structure was not changed in this process. The addition of movement and attrition as outputs allowed the model to be used dynamically in successive “snapshot” iterations of the same engagement.

From 1973 to 1979 the QJM’s formulae, procedures, and variable factor values were tested against the results of all of the 52 significant engagements of the 1967 and 1973 Arab-Israeli Wars (19 from the former, 33 from the latter). The QJM was able to replicate all of those engagements with an accuracy of more than 90%.[2]

In 1979 the improved QJM was revalidated by application to 66 engagements. These included 35 from the original 81 engagements (the “development database”), and 31 new engagements. The new engagements included five from World War II and 26 from the 1973 Middle East War. This new validation test considered four outputs: success/failure, movement rates, personnel casualties, and tank losses. The QJM predicted success/failure correctly for about 85% of the engagements. It predicted movement rates with an error of 15% and personnel attrition with an error of 40% or less. While the error rate for tank losses was about 80%, it was discovered that the model consistently underestimated tank losses because input data included all kinds of armored vehicles, but output data losses included only numbers of tanks.[3]
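The 1979 test scored four kinds of output. The source does not say how the percentage errors were computed, so the sketch below assumes a simple hit rate for success/failure and a mean absolute percentage error for continuous outputs such as movement rates and casualties. The `Engagement` record, its field names, and the sample numbers are hypothetical, invented only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Engagement:
    """Historical vs. model-predicted outputs for one engagement.

    Hypothetical record layout for this sketch; not HERO's data format.
    """
    hist_attacker_won: bool
    pred_attacker_won: bool
    hist_advance: float      # e.g. km/day; must be nonzero for percent error
    pred_advance: float
    hist_casualties: float   # must be nonzero for percent error
    pred_casualties: float


def outcome_accuracy(cases: list[Engagement]) -> float:
    """Share of engagements where the predicted winner matches history."""
    hits = sum(c.hist_attacker_won == c.pred_attacker_won for c in cases)
    return hits / len(cases)


def mean_abs_pct_error(observed: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error (an assumed metric), in percent."""
    errors = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Invented sample data, for illustration only.
cases = [
    Engagement(True, True, 4.0, 4.6, 1000, 1400),
    Engagement(True, False, 2.0, 1.7, 800, 560),
    Engagement(False, False, 1.0, 1.1, 1200, 1200),
    Engagement(True, True, 5.0, 5.0, 500, 700),
]

accuracy = outcome_accuracy(cases)
advance_err = mean_abs_pct_error([c.hist_advance for c in cases],
                                 [c.pred_advance for c in cases])
casualty_err = mean_abs_pct_error([c.hist_casualties for c in cases],
                                  [c.pred_casualties for c in cases])
print(f"outcome accuracy {accuracy:.0%}, advance error {advance_err:.1f}%, "
      f"casualty error {casualty_err:.1f}%")
```

In this invented sample, three of the four predicted winners match history. Note that a systematic input/output mismatch, such as the armored-vehicle versus tank issue described above, would inflate the percentage error regardless of how it is computed.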

This completed the original validation efforts of the QJM. The data used for the validations, and parts of the results of the validation, were published, but no formal validation report was issued. The validation was conducted in-house by Colonel Dupuy’s organization, HERO [Historical Evaluation Research Organization]. The data used were mostly from division-level engagements, although they included some corps- and brigade-level actions. We count these as two separate validation efforts.

The Development of the TNDM and Desert Storm

In 1990 Col. Dupuy, with the collaborative assistance of Dr. James G. Taylor (author of Lanchester Models of Warfare [vol. 1] [vol. 2], published by the Operations Research Society of America, Arlington, Virginia, in 1983) introduced a significant modification: the representation of the passage of time in the model. Instead of resorting to successive “snapshots,” the introduction of Taylor’s differential equation technique permitted the representation of time as a continuous flow. While this new approach required substantial changes to the software, the relationship of the model to historical experience was unchanged.[4] This revision of the model also included the substitution of formulae for some of its tables so that there was a continuous flow of values across the individual points in the tables. It also included some adjustment to the values and tables in the QJM. Finally, it incorporated a revised OLI [Operational Lethality Index] calculation methodology for modern armor (mobile fighting machines) to take into account all the factors that influence modern tank warfare.[5] The model was reprogrammed in Turbo PASCAL (the original had been written in BASIC). The new model was called the TNDM (Tactical Numerical Deterministic Model).
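The difference between successive “snapshots” and a continuous flow of time can be illustrated with a generic Lanchester square-law model. This is only an analogy: the TNDM’s own attrition equations are not reproduced here, and the function names, coefficients, and force sizes below are hypothetical. A coarse time step stands in for a few discrete snapshots; a fine step approaches the closed-form continuous solution.

```python
import math


def lanchester_euler(a0, b0, alpha, beta, t_end, dt):
    """Lanchester square law integrated in fixed time steps (forward Euler).

    dA/dt = -beta * B   (B's fire attrits A)
    dB/dt = -alpha * A  (A's fire attrits B)
    A coarse dt behaves like a few discrete 'snapshots'; a fine dt
    approaches the continuous-time solution.
    """
    a, b = float(a0), float(b0)
    for _ in range(round(t_end / dt)):
        # Both sides fire simultaneously, so the right-hand side uses old a, b.
        a, b = max(a - beta * b * dt, 0.0), max(b - alpha * a * dt, 0.0)
        if a == 0.0 or b == 0.0:
            break  # one side has been annihilated
    return a, b


def lanchester_exact(a0, b0, alpha, beta, t):
    """Closed-form square-law solution, valid while both sides are > 0."""
    k = math.sqrt(alpha * beta)
    a = a0 * math.cosh(k * t) - b0 * math.sqrt(beta / alpha) * math.sinh(k * t)
    b = b0 * math.cosh(k * t) - a0 * math.sqrt(alpha / beta) * math.sinh(k * t)
    return a, b


# Illustrative parameters (invented): 1,000 vs. 800, equal effectiveness.
params = dict(a0=1000, b0=800, alpha=0.1, beta=0.1)
exact = lanchester_exact(t=5.0, **params)
coarse = lanchester_euler(t_end=5.0, dt=1.0, **params)   # five "snapshots"
fine = lanchester_euler(t_end=5.0, dt=0.001, **params)   # near-continuous
for label, (a, b) in [("exact", exact), ("coarse", coarse), ("fine", fine)]:
    print(f"{label:>6}: A = {a:7.1f}, B = {b:7.1f}")
```

With these invented numbers the five-snapshot run drifts noticeably from the closed-form result, while the fine-step run stays close to it; the closed form is used here only as a reference point.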

Building on its foundation of historical validation and proven attrition methodology, in December 1990, HERO used the TNDM to predict the outcome of, and losses from, the impending Operation DESERT STORM.[6] It was the most accurate (lowest) public estimate of U.S. war casualties provided before the war. It differed from most other public estimates by an order of magnitude.

Also, in 1990, Trevor Dupuy published an abbreviated form of the TNDM in the book Attrition: Forecasting Battle Casualties and Equipment Losses in Modern War. A brief validation exercise using 12 battles from 1805 to 1973 was published in this book.[7] This version was used for creation of M-COAT[8] and was also separately tested by a student (Lieutenant Gozel) at the Naval Postgraduate School in 2000.[9] This version did not have the firepower scoring system, and as such neither M-COAT, Lieutenant Gozel’s test, nor Colonel Dupuy’s 12-battle validation included the OLI methodology that is in the primary version of the TNDM.

For counting purposes, I consider the Gulf War the third validation of the model. In the end, for any model, the proof is in the pudding. Can the model be used as a predictive tool or not? If not, then there is probably a fundamental flaw or two in the model. Still, the validation of the TNDM was somewhat second-hand, in the sense that the closely-related previous model, the QJM, was validated in the 1970s to 200 World War II and 1967 and 1973 Arab-Israeli War battles, but the TNDM had not been. Clearly, something further needed to be done.

The Battalion-Level Validation of the TNDM

Under the guidance of Christopher A. Lawrence, The Dupuy Institute undertook a battalion-level validation of the TNDM in late 1996. This effort tested the model against 76 engagements from World War I, World War II, and the post-1945 world including Vietnam, the Arab-Israeli Wars, the Falklands War, Angola, Nicaragua, etc. This effort was thoroughly documented in The International TNDM Newsletter.[10] This effort was probably one of the more independent and better-documented validations of a casualty estimation methodology that has ever been conducted to date, in that:

  • The data was independently assembled (assembled for other purposes before the validation) by a number of different historians.
  • There were no calibration runs or adjustments made to the model before the test.
  • The data included a wide range of material from different conflicts and times (from 1918 to 1983).
  • The validation runs were conducted independently (Susan Rich conducted the validation runs, while Christopher A. Lawrence evaluated them).
  • The results of the validation were fully published.
  • The people conducting the validation were independent, in the sense that:

a) there was no contract, management, or agency requesting the validation;
b) none of the validators had previously been involved in designing the model, and had only very limited experience in using it; and
c) the original model designer was not able to oversee or influence the validation.[11]

The validation was not truly independent, as the model tested was a commercial product of The Dupuy Institute, and the person conducting the test was an employee of the Institute. On the other hand, this was an independent effort in the sense that the effort was employee-initiated and not requested or reviewed by the management of the Institute. Furthermore, the results were published.

The TNDM was also given a limited validation test back to its original WWII data around 1997 by Niklas Zetterling of the Swedish War College, who retested the model against about 15 Italian campaign engagements. This effort included a complete review of the historical data used for the validation back to their primary sources, and details were published in The International TNDM Newsletter.[12]

There has been one other effort to correlate outputs from QJM/TNDM-inspired formulae to historical data using the Ardennes and Kursk campaign-level (i.e., division-level) databases.[13] This effort did not use the complete model, but only selective pieces of it, and achieved various degrees of “goodness of fit.” While the model is hypothetically designed for use from squad level to army group level, to date no validation has been attempted below battalion level, or above division level. At this time, the TNDM also needs to be revalidated back to its original WWII and Arab-Israeli War data, as it has evolved since the original validation effort.

The Corps- and Division-level Validations of the TNDM

Having now done one extensive battalion-level validation of the model and published the results in our newsletters, Volume 1, issues 5 and 6, we were then presented an opportunity in 2006 to conduct two more validations of the model. These are discussed in depth in two articles of this issue of the newsletter.

These validations were again conducted using historical data: 24 days of corps-level combat and 25 cases of division-level combat drawn from the Battle of Kursk during 4-15 July 1943. They were conducted using an independently-researched data collection (although the research was conducted by The Dupuy Institute), using a different person to conduct the model runs (although that person was an employee of the Institute) and using another person to compile the results (also an employee of the Institute). To summarize the results of this validation (the historical figure is listed first followed by the predicted result):

There was one other effort that was done as part of work we did for the Army Medical Department (AMEDD). This is fully explained in our report Casualty Estimation Methodologies Study: The Interim Report dated 25 July 2005. In this case, we tested six different casualty estimation methodologies to 22 cases. These consisted of 12 division-level cases from the Italian Campaign (4 where the attack failed, 4 where the attacker advanced, and 4 where the defender was penetrated) and 10 cases from the Battle of Kursk (2 cases where the attack failed, 4 where the attacker advanced and 4 where the defender was penetrated). These 22 cases were randomly selected from our earlier 628-case version of the DLEDB (Division-level Engagement Database; it now has 752 cases). Again, the TNDM performed as well as or better than any of the other casualty estimation methodologies tested. As this validation effort was using the Italian engagements previously used for validation (although some had been revised due to additional research) and three of the Kursk engagements that were later used for our division-level validation, then it is debatable whether one would want to call this a seventh validation effort. Still, it was done as above with one person assembling the historical data and another person conducting the model runs. This effort was conducted a year before the corps- and division-level validation described above and influenced it to the extent that we chose a higher CEV (Combat Effectiveness Value) for the later validation. A CEV of 2.5 was used for the Soviets for this test, vice the CEV of 3.0 that was used for the later tests.

Summation

The QJM has been validated at least twice. The TNDM has been tested or validated at least four times, once to an upcoming, imminent war, once to battalion-level data from 1918 to 1989, once to division-level data from 1943 and once to corps-level data from 1943. These last four validation efforts have been published and described in depth. The model continues, regardless of which validation is examined, to accurately predict outcomes and make reasonable predictions of advance rates, loss rates and armor loss rates. This is regardless of level of combat (battalion, division or corps), historic period (WWI, WWII or modern), the situation of the combats, or the nationalities involved (American, German, Soviet, Israeli, various Arab armies, etc.). As the QJM, the model was effectively validated to around 200 World War II and 1967 and 1973 Arab-Israeli War battles. As the TNDM, the model was validated to 125 corps-, division-, and battalion-level engagements from 1918 to 1989 and used as a predictive model for the 1991 Gulf War. This is the most extensive and systematic validation effort yet done for any combat model. The model has been tested and re-tested. It has been tested across multiple levels of combat and in a wide range of environments. It has been tested where human factors are lopsided, and where human factors are roughly equal. It has been independently spot-checked several times by others outside of the Institute. It is hard to say what more can be done to establish its validity and accuracy.

NOTES

[1] It is unclear what these percentages, quoted from Dupuy in the TNDM General Theoretical Description, specify. We suspect it is a measurement of the model’s ability to predict winner and loser. No validation report based on this effort was ever published. Also, the validation figures seem to reflect the results after any corrections made to the model based upon these tests. It does appear that the division-level validation was “incremental.” We do not know if the earlier validation tests were tested back to the earlier data, but we have reason to suspect not.

[2] The original QJM validation data was first published in the Combat Data Subscription Service Supplement, vol. 1, no. 3 (Dunn Loring VA: HERO, Summer 1975). (HERO Report #50) That effort used data from 1943 through 1973.

[3] HERO published its QJM validation database in The QJM Data Base, 3 volumes (Fairfax VA: HERO, 1985) (HERO Report #100).

[4] The Dupuy Institute, The Tactical Numerical Deterministic Model (TNDM): A General and Theoretical Description, McLean VA: The Dupuy Institute, October 1994.

[5] This had the unfortunate effect of undervaluing WWII-era armor by about 75% relative to other WWII weapons when modeling WWII engagements. This left The Dupuy Institute with the compromise methodology of using the old OLI method for calculating armor (Mobile Fighting Machines) when doing WWII engagements and using the new OLI method for calculating armor when doing modern engagements.

[6] Testimony of Col. T. N. Dupuy, USA, Ret, Before the House Armed Services Committee, 13 Dec 1990. The Dupuy Institute File I-30, “Iraqi Invasion of Kuwait.”

[7] Trevor N. Dupuy, Attrition: Forecasting Battle Casualties and Equipment Losses in Modern War (HERO Books, Fairfax, VA, 1990), 123-4.

[8] M-COAT is the Medical Course of Action Tool created by Major Bruce Shahbaz. It is a spreadsheet model based upon the elements of the TNDM provided in Dupuy’s Attrition (op. cit.) It used a scoring system derived from elsewhere in the U.S. Army. As such, it is a simplified form of the TNDM with a different weapon scoring system.

[9] See Gözel, Ramazan. “Fitting Firepower Score Models to the Battle of Kursk Data,” NPGS Thesis. Monterey CA: Naval Postgraduate School.

[10] Lawrence, Christopher A. “Validation of the TNDM at Battalion Level.” The International TNDM Newsletter, vol. 1, no. 2 (October 1996); Bongard, Dave “The 76 Battalion-Level Engagements.” The International TNDM Newsletter, vol. 1, no. 4 (February 1997); Lawrence, Christopher A. “The First Test of the TNDM Battalion-Level Validations: Predicting the Winner” and “The Second Test of the TNDM Battalion-Level Validations: Predicting Casualties,” The International TNDM Newsletter, vol. 1 no. 5 (April 1997); and Lawrence, Christopher A. “Use of Armor in the 76 Battalion-Level Engagements,” and “The Second Test of the Battalion-Level Validation: Predicting Casualties Final Scorecard.” The International TNDM Newsletter, vol. 1, no. 6 (June 1997).

[11] Trevor N. Dupuy passed away in July 1995, and the validation was conducted in 1996 and 1997.

[12] Zetterling, Niklas. “CEV Calculations in Italy, 1943,” The International TNDM Newsletter, vol. 1, no. 6. McLean VA: The Dupuy Institute, June 1997. See also Research Plan, The Dupuy Institute Report E-3, McLean VA: The Dupuy Institute, 7 Oct 1998.

[13] See Gözel, “Fitting Firepower Score Models to the Battle of Kursk Data.”

Military History and Validation of Combat Models

Soldiers from Britain’s Royal Artillery train in a “virtual world” during Exercise Steel Sabre, 2015 [Sgt Si Longworth RLC (Phot)/MOD]


A Presentation at MORS Mini-Symposium on Validation, 16 Oct 1990

By Trevor N. Dupuy

In the operations research community there is some confusion as to the respective meanings of the words “validation” and “verification.” My definition of validation is as follows:

“To confirm or prove that the output or outputs of a model are consistent with the real-world functioning or operation of the process, procedure, or activity which the model is intended to represent or replicate.”

In this paper the word “validation” with respect to combat models is assumed to mean assurance that a model realistically and reliably represents the real world of combat. Or, in other words, given a set of inputs which reflect the anticipated forces and weapons in a combat encounter between two opponents under a given set of circumstances, the model is validated if we can demonstrate that its outputs are likely to represent what would actually happen in a real-world encounter between these forces under those circumstances.

Thus, in this paper, the word “validation” has nothing to do with the correctness of computer code, or the apparent internal consistency or logic of relationships of model components, or with the soundness of the mathematical relationships or algorithms, or with satisfying the military judgment or experience of one individual.

True validation of combat models is not possible without testing them against modern historical combat experience. And so, in my opinion, a model is validated only when it will consistently replicate a number of military history battle outcomes in terms of: (a) Success-failure; (b) Attrition rates; and (c) Advance rates.

“Why,” you may ask, “use imprecise, doubtful, and outdated history to validate a modern, scientific process? Field tests, experiments, and field exercises can provide data that is often instrumented, and certainly more reliable than any historical data.”

I recognize that military history is imprecise; it is only an approximate, often biased and/or distorted, and frequently inconsistent reflection of what actually happened on historical battlefields. Records are contradictory. I also recognize that there is an element of chance or randomness in human combat which can produce different results in otherwise apparently identical circumstances. I further recognize that history is retrospective, telling us only what has happened in the past. It cannot predict, if only because combat in the future will be fought with different weapons and equipment than were used in historical combat.

Despite these undoubted problems, military history provides more, and more accurate, information about the real world of combat, and how human beings behave and perform under varying circumstances of combat, than is possible to derive or compile from any other source. Despite some discrepancies, patterns are unmistakable and consistent. There is always a logical explanation for any individual deviations from the patterns. Historical examples that are inconsistent, or that are counter-intuitive, must be viewed with suspicion as possibly being poor or false history.

Of course absolute prediction of a future event is practically impossible, although not necessarily so theoretically. Any speculations which we make from tests or experiments must have some basis in terms of projections from past experience.

Training or demonstration exercises, proving ground tests, field experiments, all lack the one most pervasive and most important component of combat: Fear in a lethal environment. There is no way in peacetime, or non-battlefield, exercises, tests, or experiments to be sure that the results are consistent with what would have been the behavior or performance of individuals or units or formations facing hostile firepower on a real battlefield.

We know from the writings of the ancients that have survived to this day (for instance, Sun Tze, pronounced “Sun Dzuh,” and Thucydides) that human nature has not changed since the dawn of history. The human factor, the way in which humans respond to stimuli or circumstances, is the most important basis for speculation and prediction. What about the “scientific” approach of those who insist that we can have no confidence in the accuracy or reliability of historical data, that it is therefore unscientific, and therefore that it should be ignored? These people insist that only “scientific” data should be used in modeling.

In fact, every model is based upon fundamental assumptions that are intuitive and unprovable. The first step in the creation of a model is a step away from scientific reality in seeking a basis for an unreal representation of a real phenomenon. I have shown that the unreality is perpetuated when we use other imitations of reality as the basis for representing reality. History is less than perfect, but to ignore it, and to use only data that is bound to be wrong, assures that we will not be able to represent human behavior in real combat.

At the risk of repetition, and even of protesting too much, let me assure you that I am well aware of the shortcomings of military history:

The record which is available to us, which is history, only approximately reflects what actually happened. It is incomplete. It is often biased, it is often distorted. Even when it is accurate, it may be reflecting chance rather than normal processes. It is neither precise nor consistent. But, it provides more, and more accurate, information on the real world of battle than is available from the most thoroughly documented field exercises, proving ground tests, or laboratory or field experiments.

Military history is imperfect. At best it reflects the actions and interactions of unpredictable human beings. We must always realize that a single historical example can be misleading for either of two reasons: (1) The data may be inaccurate, or (2) The data may be accurate, but untypical.

Nevertheless, history is indispensable. I repeat that the most pervasive characteristic of combat is fear in a lethal environment. For all of its imperfections, military history and only military history represents what happens under the environmental condition of fear.

Unfortunately, and somewhat unfairly, the reported findings of S.L.A. Marshall about human behavior in combat, which he reported in Men Against Fire, have been recently discounted by revisionist historians who assert that he never could have physically performed the research on which the book’s findings were supposedly based. This has raised doubts about Marshall’s assertion that 85% of infantry soldiers didn’t fire their weapons in combat in World War II. That dramatic and surprising assertion was first challenged in a New Zealand study which found, on the basis of painstaking interviews, that most New Zealanders fired their weapons in combat. Thus, either Americans were different from New Zealanders, or Marshall was wrong. And now American historians have demonstrated that Marshall had had neither the time nor the opportunity to conduct his battlefield interviews which he claimed were the basis for his findings.

I knew Marshall moderately well. I was fully as aware of his weaknesses as of his strengths. He was not a historian. I deplored the imprecision and lack of documentation in Men Against Fire. But the revisionist historians have underestimated the shrewd journalistic assessment capability of “SLAM” Marshall. His observations may not have been scientifically precise, but they were generally sound, and his assessment has been shared by many American infantry officers whose judgments I also respect. As to the New Zealand study, how many people will, after the war, admit that they didn’t fire their weapons?

Perhaps most important, however, in judging the assessments of SLAM Marshall, is a recent study by a highly-respected British operations research analyst, David Rowland. Using impeccable OR methods Rowland has demonstrated that Marshall’s assessment of the inefficient performance, or non-performance, of most soldiers in combat was essentially correct. An unclassified version of Rowland’s study, “Assessments of Combat Degradation,” appeared in the June 1986 issue of the Royal United Services Institution Journal.

Rowland was led to his investigations by the fact that soldier performance in field training exercises, using the British version of MILES technology, was not consistent with historical experience. Even after allowances for degradation from theoretical proving ground capability of weapons, defensive rifle fire almost invariably stopped any attack in these field trials. But history showed that attacks were, in fact, usually successful. He therefore began a study in which he made both imaginative and scientific use of historical data from over 100 small unit battles in the Boer War and the two World Wars. He demonstrated that when troops are under fire in actual combat, there is an additional degradation of performance by a factor ranging between 7 and 10. A degradation virtually of an order of magnitude! And this, mind you, on top of a comparable built-in degradation to allow for the difference between field conditions and proving ground conditions.

Not only does Rowland’s study corroborate SLAM Marshall’s observations, it showed conclusively that field exercises, training competitions and demonstrations, give results so different from real battlefield performance as to render them useless for validation purposes.

Which brings us back to military history. For all of the imprecision, internal contradictions, and inaccuracies inherent in historical data, at worst the deviations are generally far less than a factor of 2.0. This is at least four times more reliable than field test or exercise results.

I do not believe that history can ever repeat itself. The conditions of an event at one time can never be precisely duplicated later. But, bolstered by the Rowland study, I am confident that history paraphrases itself.

If large bodies of historical data are compiled, the patterns are clear and unmistakable, even if slightly fuzzy around the edges. Behavior in accordance with this pattern is therefore typical. As we have already agreed, sometimes behavior can be different from the pattern, but we know that it is untypical, and we can then seek for the reason, which invariably can be discovered.

This permits what I call an actuarial approach to data analysis. We can never predict precisely what will happen under any circumstances. But the actuarial approach, with ample data, provides confidence that the patterns reveal what is to happen under those circumstances, even if the actual results in individual instances vary to some extent from this “norm” (to use the Soviet military historical expression).

It is relatively easy to take into account the differences in performance resulting from new weapons and equipment. The characteristics of the historical weapons and the current (or projected) weapons can be readily compared, and adjustments made accordingly in the validation procedure.

In the early 1960s an effort was made at SHAPE Headquarters to test the ATLAS Model against World War II data for the German invasion of Western Europe in May, 1940. The first excursion had the Allies ending up on the Rhine River. This was apparently quite reasonable: the Allies substantially outnumbered the Germans, they had more tanks, and their tanks were better. However, despite these Allied advantages, the actual events in 1940 had not matched what ATLAS was now predicting. So the analysts did a little “fine tuning” (a splendid term for fudging). After the so-called adjustments, they tried again, and ran another excursion. This time the model had the Allies ending up in Berlin. The analysts (may the Lord forgive them!) were quite satisfied with the ability of ATLAS to represent modern combat. (Or at least they said so.) Their official conclusion was that the historical example was worthless, since weapons and equipment had changed so much in the preceding 20 years!

As I demonstrated in my book, Options of Command, the problem was that the model was unable to represent the German strategy, or to reflect the relative combat effectiveness of the opponents. The analysts should have reached a different conclusion. ATLAS had failed validation because a model that cannot with reasonable faithfulness and consistency replicate historical combat experience, certainly will be unable validly to reflect current or future combat.

How, then, do we account for what I have said about the fuzziness of patterns, and the fact that individual historical examples may not fit the patterns? I will give you my rules of thumb:

  1. The battle outcome should reflect historical success-failure experience about four times out of five.
  2. For attrition rates, the model average of five historical scenarios should be consistent with the historical average within a factor of about 1.5.
  3. For the advance rates, the model average of five historical scenarios should be consistent with the historical average within a factor of about 1.5.
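These three rules of thumb can be expressed as simple pass/fail checks. The sketch below is one reading of them, not a procedure Dupuy specified: it assumes a hard 0.8 threshold for “about four times out of five” and reads “within a factor of about 1.5” as a model-to-historical ratio between 1/1.5 and 1.5. The function names and the five scenarios are hypothetical.

```python
from statistics import mean


def outcome_rule(matches: list[bool], threshold: float = 0.8) -> bool:
    """Rule 1: model reproduces historical success/failure about 4 times in 5.

    Treating 'about four out of five' as a hard 0.8 threshold is an assumption.
    """
    return sum(matches) / len(matches) >= threshold


def within_factor(model: list[float], historical: list[float],
                  factor: float = 1.5) -> bool:
    """Rules 2 and 3: model average within ~factor of the historical average.

    'Within a factor of 1.5' is read here as 1/1.5 <= ratio <= 1.5.
    """
    ratio = mean(model) / mean(historical)
    return 1.0 / factor <= ratio <= factor


# Five invented scenarios, for illustration only.
matches = [True, True, True, False, True]
model_attrition = [2.0, 3.1, 1.5, 4.0, 2.4]   # e.g. % casualties per day
hist_attrition = [1.8, 2.5, 2.0, 3.0, 2.2]
model_advance = [3.0, 1.0, 5.0, 2.0, 4.0]     # e.g. km per day
hist_advance = [1.5, 0.5, 2.5, 1.0, 2.0]

print("success-failure:", outcome_rule(matches))
print("attrition:", within_factor(model_attrition, hist_attrition))
print("advance:", within_factor(model_advance, hist_advance))
```

With these invented figures the success-failure and attrition checks pass, while the advance-rate check fails because the model average is twice the historical average. Comparing averages rather than individual cases reflects the actuarial approach described above: single scenarios may deviate, but the averages should not.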

Just as the heavens are the laboratory of the astronomer, so military history is the laboratory of the soldier and the military operations research analyst. The scientific basis for both astronomy and military science is the recording of the movements and relationships of bodies, and then analysis of those movements. (In the one case the bodies are heavenly, in the other they are very terrestrial.)

I repeat: Military history is the laboratory of the soldier. Failure of the analyst to use this laboratory will doom him to live with the scientific equivalent of Ptolemaic astronomy, whereas he could use the evidence available in his laboratory to progress to the military science equivalent of Copernican astronomy.