Tag: Quantified Judgement Model (QJM)

Validating A Combat Model (Part VI)

Advancing Germans halted by the 2nd Battalion, Fifth Marines, June 3, 1918. From Les Mares farm, 2 1/2 miles west of Belleau Wood, the Germans attacked the American lines through the wheat fields. From a painting by Harvey Dunn. [U.S. Navy]

[The article below is reprinted from the April 1997 edition of The International TNDM Newsletter.]

The First Test of the TNDM Battalion-Level Validations: Predicting the Winners
by Christopher A. Lawrence

CASE STUDIES: WHERE AND WHY THE MODEL FAILED CORRECT PREDICTIONS

World War I (12 cases):

Yvonne-Odette (Night)—On the first prediction, the model selected the defender as the winner, with the attacker making no advance. The force ratio was 0.5 to 1. The historical results also show the attacker making no advance, but rate the attacker's mission accomplishment score as 6 while the defender is rated 4. Therefore, this battle was scored as a draw.

On the second run, the Germans (Sturmgruppe Grethe) were assigned a CEV of 1.9 relative to the US 9th Infantry Regiment. This produced a draw with no advance.

This appears to be a result that was corrected by assigning the CEV to the side that would be expected to have that advantage. There is also a problem in defining who is the winner.

Hill 142—On the first prediction the defending Germans won, whereas in the real world the attacking Marines won. The Marines are recorded as having a higher CEV in a number of battles, so when this correction is put in, the Marines win with a CEV of 1.5. This appears to be a case where the side that would be expected to have the higher CEV needed that CEV input into the combat run to replicate historical results.

Note that while many people would expect the Germans to have the higher CEV, at this juncture in WWI the German regular army was becoming demoralized, while the US Army was highly motivated, trained and fresh. While I did not initially expect to see a superior CEV for the US Marines, when I did see it I was not surprised. I also was not surprised to note that the US Army had a lower CEV than the Marine Corps or that the German Sturmgruppe Grethe had a higher CEV than the US side. As shown in the charts below, the US Marines' CEV is usually higher than the German CEV for the engagements of Belleau Wood, although this result is not very consistent in value. But this higher value does track with Marine Corps legend. I personally do not have sufficient expertise on WWI to confirm or deny the validity of the legend.

West Wood I—On the first prediction the model rated the battle a draw with minimal advance (0.265 km) for the attacker, whereas historically the attackers were stopped cold with a bloody repulse. The second run, with a very high German CEV of 2.3, had the Germans stopping the attackers with a bloody repulse. The results are not easily explainable.

Bouresches I (Night)—On the first prediction the model recorded an attacker victory with an advance of 0.5 kilometer. Historically, the battle was a draw with an attacker advance of one kilometer. The attacker's mission accomplishment score was 5, while the defender's was 6. Historically, this battle could also have been considered an attacker victory. A second run with the German CEV increased to 1.5 records it as a draw with no advance. This appears to be a problem in defining who is the winner.

West Wood II—On the first run, the model predicted a draw with an advance of 0.3 kilometers. Historically, the attackers won and advanced 1.6 kilometers. A second run with a US CEV of 1.4 produced a clear attacker victory. This appears to be a case where the side that would be expected to have the higher CEV needed that CEV input into the combat run.

North Woods I—On the first prediction, the model records the defender winning, while historically the attacker won. A second run with a US CEV of 1.5 produced a clear attacker victory. This appears to be a case where the side that would be expected to have the higher CEV needed that CEV input into the combat run.

Chaudun—On the first prediction, the model predicted the defender winning when historically, the attacker clearly won. A second run with an outrageously high US CEV of 2.5 produced a clear attacker victory. The results are not easily explainable.

Medeah Farm—On the first prediction, the model recorded the defender as winning when historically the attacker won with high casualties. The battle consists of a small number of German defenders with lots of artillery defending against a large number of US attackers with little artillery. On the second run, even with a US CEV of 1.6, the German defender won. The model was unable to select a CEV that would get a correct final result yet reflect the correct casualties. The model is clearly having a problem with this engagement.

Exermont—On the first prediction, the model recorded the defender as winning when historically the attacker did, with both the attacker's and the defender's mission accomplishment scores being rated at 5. The model did rate the defender's casualties too high, so when it calculated what the CEV should be, it gave the defender a higher CEV so that it could bring down the defender's losses relative to the attacker's. Otherwise, this is a normal battle. The second prediction was no better. The model is clearly having a problem with this engagement due to the low defender casualties.

Mayache Ravine—The model predicted the winner (the attacker) correctly on the first run, with the attacker having an opposed advance of 0.8 kilometer. Historically, the attacker had an opposed rate of advance of 1.3 kilometers. Both sides had a mission accomplishment score of 5. The problem is that the model predicted higher defender casualties than attacker casualties, while in the actual battle the defender had lower casualties than the attacker. On the second run, therefore, the model put in a German CEV of 1.5, which resulted in a draw with the attacker advancing 0.3 kilometers. This brought the casualty estimates more in line, but turned a successful win/loss prediction into one that was "off by one." The model is clearly having a problem with this engagement due to the low defender casualties.

La Neuville—The model also predicted the winner (the attacker) correctly here, with the attacker advancing 0.5 kilometer. In the historical battle they advanced 1.6 kilometers. But again, the model predicted lower attacker losses than the defender losses, while in the actual battle the defender losses were much lower than the attacker losses. So, again on the second run, the model gave the defender (the Germans) a CEV of 1.4, which turned an accurate win/loss prediction into an inaccurate one. It still didn’t do a very good job on the casualties. The model is clearly having a problem with this engagement due to the low defender casualties.

Hill 252—On the first run, the model predicts a draw with a distance advanced of 0.2 kilometers, while the real battle was an attacker victory with an advance of 2.9 kilometers. The model's casualty predictions are quite good. On the second run, the model correctly predicted an attacker win with a US CEV of 1.5. The distance advanced increases to 0.6 kilometer, while the casualty prediction degrades noticeably. The model is having some problems with this engagement that are not really explainable, but the results are not far off the mark.

Next: WWII Cases

Validating A Combat Model (Part V)

[The article below is reprinted from the April 1997 edition of The International TNDM Newsletter.]

The First Test of the TNDM Battalion-Level Validations: Predicting the Winners
by Christopher A. Lawrence

Part II

CONCLUSIONS:

WWI (12 cases):

For the WWI battles, the nature of the prediction problems is summarized as:

CONCLUSION: In the case of the WWI runs, five of the problem engagements were due either to confusion in defining a winner or to a clear CEV existing for a side that should have been predictable. Seven out of the 23 runs have some problems, with three of those problems resolving themselves by assigning a CEV value to a side that may not have deserved it. One (Medeah Farm) was just off any way you look at it, and three suffered problems because historically the defenders (Germans) suffered surprisingly low losses. Two had the battle outcome predicted correctly on the first run, and then had the outcome incorrectly predicted after a CEV was assigned.

With 5 to 7 clear failures (depending on how you count them), this leads one to conclude that the TNDM can be relied upon to predict the winner in a WWI battalion-level battle in about 70% of the cases.

WWII (8 cases):

For the WWII battles, the nature of the prediction problems is summarized as:

CONCLUSION: In the case of the WWII runs, three of the problem engagements were due either to confusion in defining a winner or to a clear CEV existing for a side that should have been predictable. Four out of the 23 runs suffered a problem because historically the defenders (Germans) suffered surprisingly low losses, and one case simply assigned a possibly unjustifiable CEV. This led to the battle outcome being predicted correctly on the first run, then incorrectly predicted after a CEV was assigned.

With 3 to 5 clear failures, one can conclude that the TNDM can be relied upon to predict the winner in a WWII battalion-level battle in about 80% of the cases.

Modern (8 cases):

For the post-WWII battles, the nature of the prediction problems is summarized as:

CONCLUSION: In the case of the modern runs, only one result was a problem. In the other seven cases, when the force with superior training is given a reasonable CEV (usually around 2), then the correct outcome is achieved. With only one clear failure, one can conclude that the TNDM can be relied upon to predict the winner in a modern battalion-level battle in over 90% of the cases.

FINAL CONCLUSIONS: In this article, the predictive ability of the model was examined only for its ability to predict the winner/loser. We did not look at the accuracy of the casualty predictions or the accuracy of the rates of advance. That will be done in the next two articles. Nonetheless, we could not help but notice some trends.

First and foremost, while the model was expected to be a reasonably good predictor of WWII combat, it did even better for modern combat. It was noticeably weaker for WWI combat. In the case of the WWI data, all attrition figures were multiplied by 4 ahead of time because we knew that there would be a fit problem otherwise.

This would strongly imply that there were more significant changes to warfare between 1918 and 1939 than between 1939 and 1989.

Secondly, the model is a pretty good predictor of winner and loser in WWII and modern cases. Overall, the model predicted the winner in 68% of the cases on the first run and in 84% of the cases in the run incorporating CEV. While its predictive powers were not perfect, there were 13 cases where it just wasn’t getting a good result (17%). Over half of these were from WWI, only one from the modern period.

In some of these battles it was pretty obvious who was going to win. Therefore, the model needed to do better than 50% to even be considered. Historically, in 51 out of 76 cases (67%), the larger side in the battle was the winner. One could predict the winner/loser with a reasonable degree of success just by looking at that rule. But the percentage of the time the larger side won varied widely with the period. In WWI the larger side won 74% of the time. In WWII it was 87%. In the modern period it was a counter-intuitive 47% of the time, yet the model was best at selecting the winner in the modern period.

The model's ability to predict WWI battles is still questionable. It obviously does a pretty good job with WWII battles and appears to be doing an excellent job in the modern period. We suspect that the difference in prediction rates between WWII and the modern period is caused by the selection of battles, not by any inherent ability of the model.

RECOMMENDED CHANGES: While it is too early to settle upon a model improvement program, just looking at the problems of winning and losing, and the ancillary data to that, leads me to three corrections:

  1. Adjust for times of less than 24 hours. Create a formula so that battles of six hours in length do not produce 1/4 the casualties of a 24-hour battle, but something greater than that (possibly the square root of time; see the sketch after this list). This adjustment should affect both casualties and advance rates.
  2. Adjust advance rates for smaller units to account for the fact that smaller units move faster than larger units.
  3. Adjust for fanaticism to account for those armies that continue to fight after most people would have accepted the result, driving up casualties for both sides.
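The following is a minimal sketch (in Python, not part of the original article) of the duration adjustment suggested in item 1; the square-root exponent is taken from the "possibly the square root of time" suggestion above and is illustrative only, not a validated TNDM rule.

```python
import math

# Illustrative duration scaling: instead of a 6-hour battle producing 1/4 the
# casualties of a 24-hour battle (linear in time), scale by the square root of
# the time fraction, as suggested above. The exponent is an assumption.

def duration_factor(hours: float, linear: bool = False) -> float:
    fraction = hours / 24.0
    return fraction if linear else math.sqrt(fraction)

print(duration_factor(6, linear=True))  # 0.25 (current linear treatment)
print(duration_factor(6))               # 0.5  (square-root adjustment)
```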

Next Part III: Case Studies

Validating A Combat Model (Part IV)

[The article below is reprinted from the April 1997 edition of The International TNDM Newsletter.]

The First Test of the TNDM Battalion-Level Validations: Predicting the Winners
by Christopher A. Lawrence

Part I

In the basic concept of the TNDM battalion-level validation, we decided to collect data from battles from three periods: WWI, WWII, and post-WWII. We then made a TNDM run for each battle exactly as the battle was laid out, with both sides having the same CEV [Combat Effectiveness Value]. The results of that run indicated what the CEV should have been for the battle, and we then made a second run using that CEV. That was all we did. We wanted to make sure that there was no “tweaking” of the model for the validation, so we stuck rigidly to this procedure. We then evaluated each run for its fit in three areas:

  1. Predicting the winner/loser
  2. Predicting the casualties
  3. Predicting the advance rate

We did end up changing two engagements around. We had a similar situation with one WWII engagement (Tenaru River) and one modern period engagement (Bir Gifgafa), where the defender received reinforcements part-way through the battle and counterattacked. In both cases we decided to run them as two separate battles (adding two more battles to our database), with the conditions from the first engagement, plus the reinforcements, being the starting strength for the second engagement. Based on our previous experience with running Goose Green, for all the Falkland Islands battles we counted the Milans and Carl Gustavs as infantry weapons. That is the only "tweaking" we did that affected the battle outcome in the model. We also put in a casualty multiplier of 4 for WWI engagements, but that is discussed in the article on casualties.

This is the analysis of the first test, predicting the winner/loser. Basically, if the attacker won historically, we assigned the battle a value of 1, a draw was 0, and a defender win was -1. The TNDM results summary has a column called "winner" which records either an attacker win, a draw, or a defender win. We compared these two results. If they were the same, this is a "correct" result. If they are "off by one," this means the model predicted an attacker win or loss where the actual result was a draw, or the model predicted a draw where the actual result was a win or loss. If they are "off by two," then the model simply missed and predicted the wrong winner.
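To make the comparison concrete, here is a minimal sketch (in Python, not part of the original study) of that scoring scheme; the function name and example values are purely illustrative.

```python
# Outcomes are coded as described above: attacker win = 1, draw = 0, defender win = -1.

def score_prediction(predicted: int, historical: int) -> str:
    """Compare the model's predicted outcome with the historical outcome."""
    difference = abs(predicted - historical)
    if difference == 0:
        return "correct"       # model and history agree
    if difference == 1:
        return "off by one"    # a draw confused with a win or a loss
    return "off by two"        # the model picked the wrong winner

# Example: the model predicts a defender win (-1) where the attacker actually won (+1).
print(score_prediction(predicted=-1, historical=1))  # -> "off by two"
```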

The results are (the envelope please….):

It is hard to determine what separates good predictability from bad. Obviously, the initial WWI prediction of 57% right is not very good, while the modern second-run result of 97% is quite good. What I would really like to do is compare these outputs to some other model (like TACWAR) to see if they get a closer fit. I have reason to believe that they will not do better.

Most cases in which the model was "off by 1" were easily correctable by accounting for the different personnel capabilities of the armies. Therefore, to see where the model really failed, let's just look at where it simply got the wrong winner:

The TNDM is not designed or tested for WWI battles. It is basically designed to predict combat between 1939 and the present. The total percentages without the WWI data in it are:

Overall, based upon this data I would be willing to claim that the model can predict the correct winner 75% of the time without accounting for human factors and 90% of the time if it does.

CEVs: Quite simply, a user of the TNDM must develop a CEV to get a good prediction. In this particular case, the CEVs were developed from the first run. This means that in the second run, the numbers have been juggled (by changing the CEV) to get a better result. This would make this effort meaningless if the CEVs were not fairly consistent over several engagements for one side versus the other. Therefore, they are listed below in broad groupings so that the reader can determine if the CEVs appear to be basically valid or are simply being used as a "tweak."

Now, let’s look where it went wrong. The following battles were not predicted correctly:

There are 19 night engagements in the database: five from WWI, three from WWII, and 11 modern. We looked at whether the missed predictions were clustered among the night engagements, and that did not seem to be the case. Unable to find a pattern, we examined each engagement to see what the problem was. See the attachments at the end of this article for details.

We did obtain CEVs that showed some consistency. These are shown below. The Marines in World War I recorded the following CEVs in these WWI battles:

Compare those figures to the performance of the US Army:

In the above two and in all following cases, the italicized battles are the ones with which we had prediction problems.

For comparison purposes, the following CEVs were recorded in the World War II battles between the US and Japan:

For comparison purposes, the following CEVs were recorded in Operation Veritable:

These are the other engagements versus Germans for which CEVs were recorded:

For comparison purposes, the following CEVs were recorded in the post-WWII battles between Vietnamese forces and their opponents:

Note that the Americans have an average CEV advantage of 1.6 over the NVA (only three cases) while having a 1.8 advantage over the VC (six cases).

For comparison purposes, the following CEVs were recorded in the battles between the British and the Argentines:

Next: Part II: Conclusions

Validating A Combat Model (Part III)

[The article below is reprinted from the April 1997 edition of The International TNDM Newsletter.]

Numerical Adjustment of CEV Results: Averages and Means
by Christopher A. Lawrence and David L. Bongard

As part of the battalion-level validation effort, we made two runs with the model for each test case—one without CEV [Combat Effectiveness Value] incorporated and one with the CEV incorporated. The printout of a TNDM [Tactical Numerical Deterministic Model] run has three CEV figures for each side: CEVt, CEVl, and CEVad. CEVt shows the CEV as calculated on the basis of battlefield results as a ratio of the performance of side a versus side b. It measures performance based upon three factors: mission accomplishment, advance, and casualty effectiveness. CEVt is calculated according to the following formula:

P′ = Refined Combat Power Ratio (sum of the modified OLIs). The ′ in P′ indicates that this ratio has been "refined" (modified) by two behavioral values already: the factor for Surprise and the Set Piece Factor.

CEVd = 1/CEVa (the reciprocal)

In effect the formula is relative results multiplied by the modified combat power ratio. This is basically the formulation that was used for the QJM [Quantified Judgement Model].

In the TNDM Manual, there is an alternate CEV method based upon comparative effective lethality. This methodology has the advantage that the user doesn't have to evaluate mission accomplishment on a ten-point scale. The CEVl is calculated according to the following formula:

In effect, CEVt is a measurement of the difference in results predicted by the model from actual historical results based upon assessment for three different factors (mission success, advance rates, and casualties), while CEVl is a measurement of the difference in predicted casualties from actual casualties. The CEVt and the CEVl of the defender is the reciprocal of the one for the attacker.

Now the problem comes in when one creates the CEVad, which is the average of the two CEVs above. I simply do not know why it was decided to create an alternate CEV calculation from the old QJM method, and then average the two, but this is what is currently being done in the model. This averaging results in a revised CEV for the attacker and for the defender that are not reciprocals of each other, unless the CEVt and the CEVl were the same. We even have some cases where both sides had a CEVad of greater than one. Also, by averaging the two, we have heavily weighted casualty effectiveness relative to mission accomplishment and advance.

What was done in these cases (again based more on TDI tradition or habit, and not on any specific rule) was:

(1.) If CEVad are reciprocals, then use as is.

(2.) If one CEV is greater than one while the other is less than one, then add the higher CEV to the reciprocal of the lower CEV (1/x) and divide by two. This result is the CEV for the superior force, and its reciprocal is the CEV for the inferior force.

(3.) If both CEVs are above one, then we divide the larger CEVad value by the smaller, and use the result as the superior force's CEV.

In the case of (3.) above, this methodology usually results in a slightly higher CEV for the attacking side than if we used the reciprocal-averaging method (usually 0.1 or 0.2 higher). While the mathematical and logical consistency of the procedure bothered me, the logic for the different procedure in (3.) was that the model was clearly having a problem with predicting the engagement to start with, but that in most cases when this happened before (meaning before the validation), a higher CEV usually produced a better fit than a lower one. As this is what was done before, I accepted it as is, especially if one looks at the example of Medeah Farm. If one averages the reciprocal with the US's CEV of 8.065, one would get a CEV of 4.13. By the methodology in (3.), one comes up with a more reasonable US CEV of 1.58.
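A minimal sketch (in Python, my own reading of the three rules above, not TDI's code) may make the procedure easier to follow. The function name and tolerance are mine; the Medeah Farm defender-side value of roughly 5.10 is back-calculated from the 8.065 and 4.13 figures quoted above, so treat it as illustrative.

```python
# Combine the attacker's and defender's CEVad values into a single CEV for the
# superior force, following the three ad hoc rules described in the text.

def combine_cevad(cev_a: float, cev_b: float, tol: float = 1e-6) -> float:
    """Return the CEV of the superior force; the inferior force gets its reciprocal."""
    # Rule 1: the two values are already reciprocals of each other -- use as is.
    if abs(cev_a * cev_b - 1.0) < tol:
        return max(cev_a, cev_b)
    # Rule 2: one value above one, the other below one -- average the higher
    # value with the reciprocal of the lower value.
    if (cev_a - 1.0) * (cev_b - 1.0) < 0:
        high, low = max(cev_a, cev_b), min(cev_a, cev_b)
        return (high + 1.0 / low) / 2.0
    # Rule 3: both values above one -- divide the larger by the smaller.
    return max(cev_a, cev_b) / min(cev_a, cev_b)

# Medeah Farm illustration: US CEVad of 8.065 (from the text) against an assumed
# defender CEVad of about 5.10 yields roughly the 1.58 quoted above.
print(round(combine_cevad(8.065, 5.10), 2))  # -> 1.58
```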

The interesting aspect is that the TNDM rules manual explains how CEVt, CEVl and CEVad are calculated, but never is it explained which CEVad (attacker or defender) should be used. This is the first explanation of this process, and was based upon the “traditions” used at TDI. There is a strong argument to merge the two CEVs into one formulation. I am open to another methodology for calculating CEV. I am not satisfied with how CEV is calculated in the TNDM and intend to look into this further. Expect another article on this subject in the next issue.

Validating A Combat Model (Part II)

[The article below is reprinted from the October 1996 edition of The International TNDM Newsletter.]

Validation of the TNDM at Battalion Level
by Christopher A. Lawrence

The original QJM (Quantified Judgement Model) was created and validated using primarily division-level engagements from WWII and the 1967 and 1973 Mid-East Wars. For a number of reasons, we are now using the TNDM (Tactical Numerical Deterministic Model) for analyzing lower-level engagements. We expect, with the changed environment in the world, this trend to continue.

The model, while designed to handle battalion-level engagements, was never validated for those size engagements. There were only 16 engagements in the original QJM Database with less than 5,000 people on one side, and only one with less than 2,000 people on a side. The sixteen smallest engagements are:

While it is not unusual in the operations research community to use unvalidated models of combat, it is a very poor practice. As TDI is starting to use this model for battalion-level engagements, it is time it was formally validated for that use. A model that is validated at one level of combat is not validated to represent sizes, types, and forms of combat against which it has not been tested. TDI is undertaking a battalion-level validation effort for the TNDM. We intend to publish the material used and the results of the validation in The International TNDM Newsletter. As part of this battalion-level validation we will also be looking at a number of company-level engagements. Right now, my intention is simply to throw all the engagements into the same hopper and see what comes out.

By battalion-level, I mean any operation consisting of the equivalent of two or fewer reinforced battalions on one side. Three or more battalions imply a regiment- or brigade-level operation. A battalion in combat can range widely in strength, but a battalion usually does not have an authorized strength in excess of 900. Therefore, the upper limit for a battalion-level engagement is 2,000 people, while its lower limit can easily go below 500 people. Only one engagement in the original QJM Database fits that definition of a battalion-level engagement. HERO, DMSI, TND & Associates, and TDI (all companies founded by Trevor N. Dupuy) examined a number of small engagements over the years. HERO assembled 23 WWI engagements for the Land Warfare Database (LWDB); TDI has done 15 WWII small unit actions for the Suppression contract, and Dave Bongard has assembled four others from that period for the Pacific; DMSI did 14 battalion-level engagements from Vietnam for a study on low-intensity conflict 10 years ago; Dave Bongard has been independently looking into the Falkland Islands War and other post-WWII sources to locate 10 more engagements; and we have three engagements that Trevor N. Dupuy did for South Africa. We added two other World War II engagements and the three smallest engagements from the list to the left (those marked with an asterisk). This gives us a list of 74 additional engagements that can be used to test the TNDM.

The smallest of these engagements is 220 people on both sides (100 vs. 120), while the largest engagement on this list is 5,336 versus 3,270, or 8,679 vs. 725. These 74 engagements consist of 23 engagements from WWI, 22 from WWII, and 29 post-1945 engagements. There are three engagements where both sides have over 3,000 men and three more where both sides are above 2,000 men. In the other 68 engagements, at least one side is below 2,000, while in 50 of the engagements, both sides are below 2,000.

This leaves the following force sizes to be tested:

These engagements have been “randomly” selected in the sense that the researchers grabbed whatever had been done and whatever else was conveniently available. It is not a proper random selection, in the sense that every war in this century was analyzed and a representative number of engagements was taken from each conflict. This is not practical, so we settle for less than perfect data selection.

Furthermore, as many of these conflicts are with countries that do not have open archives (and in many cases have limited unit records), some of the opposing forces' strengths and losses had to be estimated. This is especially true with the Vietnam engagements. It is hoped that the errors in estimation deviate equally on both sides of the norm, but there is no way of knowing that until countries like the People's Republic of China and Vietnam open up their archives for free independent research.

TDI intends to continue to look for battalion-level and smaller engagements for analysis, and may add to this data base over time. If some of our readers have any other data assembled, we would be interested in seeing it. In the next issue we will publish the preliminary results of our validation.

Note that in the above table, for World War II, German, Japanese, and Axis forces are listed in italics, while US, British, and Allied forces are listed in regular typeface. Also, in the VERITABLE engagements, the 5/7th Gordons' action continues the assault of the 7th Black Watch, and the 9th Cameronians assumed the attack begun by the 2d Gordon Highlanders.

Tu-Vu is described in some detail in Fall’s Street Without Joy (pp. 51-53). The remaining Indochina/SE Asia engagements listed here are drawn from a QJM-based analysis of low-intensity operations (HERO Report 124, Feb 1988).

The coding for source and validation status, on the extreme right of each engagement line in the D Cas column, is as follows:

  • n indicates an engagement which has not been employed for validation, but for which good data exists for both sides (35 total).
  • Q indicates an engagement which was part of the original QJM database (3 total).
  • Q+ indicates an engagement which was analyzed as part of the QJM low-intensity combat study in 1988 (14 total).
  • T indicates an engagement analyzed with the TNDM (20 total).

Validating A Combat Model

The question of validating combat models—“To confirm or prove that the output or outputs of a model are consistent with the real-world functioning or operation of the process, procedure, or activity which the model is intended to represent or replicate”—as Trevor Dupuy put it, has taken up a lot of space on the TDI blog this year. What this discussion did not address is what an effort to validate a combat model actually looks like. This will be the first in a series of posts that will do exactly that.

Under the guidance of Christopher A. Lawrence, TDI undertook a battalion-level validation of Dupuy’s Tactical Numerical Deterministic Model (TNDM) in late 1996. This effort tested the model against 76 engagements from World War I, World War II, and the post-1945 world including Vietnam, the Arab-Israeli Wars, the Falklands War, Angola, Nicaragua, etc. It was probably one of the more independent and better-documented validations of a casualty estimation methodology that has ever been conducted to date, in that:

  • The data was independently assembled (assembled for other purposes before the validation) by a number of different historians.
  • There were no calibration runs or adjustments made to the model before the test.
  • The data included a wide range of material from different conflicts and times (from 1918 to 1983).
  • The validation runs were conducted independently (Susan Rich conducted the validation runs, while Christopher A. Lawrence evaluated them).
  • The results of the validation were fully published.
  • The people conducting the validation were independent, in the sense that:

a) there was no contract, management, or agency requesting the validation;
b) none of the validators had previously been involved in designing the model, and had only very limited experience in using it; and
c) the original model designer was not able to oversee or influence the validation. (Dupuy passed away in July 1995 and the validation was conducted in 1996 and 1997.)

The validation was not truly independent, as the model tested was a commercial product of TDI, and the person conducting the test was an employee of the Institute. On the other hand, this was an independent effort in the sense that the effort was employee-initiated and not requested or reviewed by the management of the Institute.

Descriptions and outcomes of this validation effort were first reported in The International TNDM Newsletter. Chris Lawrence also addressed validation of the TNDM in Chapter 19 of War by Numbers (2017).

More on the QJM/TNDM Italian Battles

Troops of the U.S. 36th Infantry Division advance inland on Red Beach, Salerno, Italy, 1943. [ibiblio/U.S. Center for Military History]

[The article below is reprinted from the December 1998 edition of The International TNDM Newsletter.]

More on the QJM/TNDM Italian Battles
by Richard C. Anderson, Jr.

In regard to Niklas Zetterling’s article and Christopher Lawrence’s response (Newsletter Volume 1, Number 6) [and Christopher Lawrence’s 2018 addendum] I would like to add a few observations of my own. Recently I have had occasion to revisit the Allied and German records for Italy in general and for the Battle of Salerno in particular. What I found is relevant in both an analytical and an historical sense.

The Salerno Order of Battle

The first and most evident observation that I was able to make of the Allied and German Order of Battle for the Salerno engagements was that it was incorrect. The following observations all relate to the table found on page 25 of Volume 1, Number 6.

The divisional totals are misleading. The U.S. had one infantry division (the 36th) and two-thirds of a second (the 45th, minus the 180th RCT [Regimental Combat Team] and one battalion of the 157th Infantry) available during the major stages of the battle (9-15 September 1943). The 82nd Airborne Division was represented solely by elements of two parachute infantry regiments that were dropped as emergency reinforcements on 13-14 September. The British 7th Armored Division did not begin to arrive until 15-16 September and was not fully closed in the beachhead until 18-19 September.

The German situation was more complicated. Only a single panzer division, the 16th, under the command of the LXXVI Panzer Corps was present on 9 September. On 10 September elements of the Hermann Goring Parachute Panzer Division, with elements of the 15th Panzergrenadier Division under tactical command, began arriving from the vicinity of Naples. Major elements of the Herman Goring Division (with its subordinated elements of the 15th Panzergrenadier Division) were in place and had relieved elements of the 16th Panzer Division opposing the British beaches by 11 September. At the same time the 29th Panzergrenadier Division began arriving from Calabria and took up positions opposite the U.S. 36th Division in and south of Altavilla, again relieving elements of the 16th Panzer Division. By 11-12 September the German forces in the northern sector of the beachhead were under the command of the XIV Panzer Corps (Herman Goring Division (-), elements of the 15th Panzergrenadier Division and elements of the 3rd Panzergrenadier Division), while the LXXVI Panzer Corps commanded the 16th Panzer Division, 29th Panzergrenadier Division, and elements of the 26th Panzer Division. Unfortunately for the Germans, the 16th Panzer Division's zone was split by the boundary between the XIV and LXXVI Corps, both of which appear to have had operational control over different elements of the division. Needless to say, the German command and control problems in this action were tremendous.[1]

The artillery totals given in the table are almost inexplicable. The number of SP [self-propelled] 75mm howitzers is a bit fuzzy, inasmuch as this was a non-standardized weapon on a half-track chassis. It was allocated to the infantry regimental cannon company (6 tubes) and was also issued to tank and tank destroyer battalions as a stopgap until purpose-designed systems could be brought into production. The 105mm SP was also present, on a half-track chassis in the regimental cannon company (2 tubes) and on a full-track chassis in the armored field artillery battalion (18 tubes). The towed 105mm artillery was present in the five field artillery battalions of the 36th and 45th divisions and in a single non-divisional battalion assigned to the VI Corps. The 155mm howitzers were only present in the two divisional field artillery battalions; the general support artillery assigned to the VI Corps, the 36th Field Artillery Regiment, did not arrive until 16 September. No 155mm gun battalions landed in Italy until October 1943. The U.S. artillery figures should approximately be as follows:

75mm Howitzer (SP)
  2 per infantry battalion: 28
  6 per tank battalion: 12
  Total: 40

105mm Howitzer (SP)
  2 per infantry regiment: 10
  1 armored FA battalion[2]: 18
  5 divisional FA battalions: 60
  1 non-divisional FA battalion: 12
  Total: 100

155mm Howitzer
  2 divisional FA battalions: 24

3″ Tank Destroyer
  3 battalions: 108

Thus, the U.S. artillery strength is approximately 272 versus 525 as given in the chart.

The British artillery figures are also suspect. Each of the British divisions present, the 46th and 56th, had three regiments (battalions in U.S. parlance) of 25-pounder gun-howitzers for a total of 72 per division. There is no evidence of the presence of the British 3-inch howitzer, except possibly on a tank chassis in the support tank role attached to the tank troop headquarters of the armor regiment (battalion) attached to the X Corps (possibly 8 tubes). The X Corps had a single medium regiment (battalion) attached with either 4.5 inch guns or 5.5 inch gun-howitzers or a mixture of the two (16 tubes). The British did not have any 7.2 inch howitzers or 155mm guns at Salerno. I do not know where the figure for British 75mm howitzers is from, although it is possible that some may have been present with the corps armored car regiment.

Thus the British artillery strength is approximately 168 versus 321 as given in the chart.

The German artillery types are highly suspect. As Niklas Zetterling deduced, there was no German corps or army artillery present at Salerno. Neither the XIV nor the LXXVI Corps had Heeres (army) artillery attached. The two battalions of the 71st Nebelwerfer Regiment and one battery of 170mm guns (previously attached to the 15th Panzergrenadier Division) were all out of action, refurbishing and replenishing equipment in the vicinity of Naples. However, U.S. intelligence sources located 42 Italian coastal gun positions, including three 149mm (not 132mm) railway guns, defending the beaches. These positions were taken over by German personnel on the night before the invasion. That they fired at all in the circumstances is a comment on the professionalism of the German Army. The remaining German artillery available was with the divisional elements that arrived to defend against the invasion forces. The following artillery strengths are known for the German forces at Salerno:

16th Panzer Division (as of 3 September):

14 75mm infantry support howitzers
11 150mm SP infantry support howitzers
10 105mm howitzers
8 105mm SP howitzers
4 105mm guns
8 150mm howitzers
5 150mm SP howitzers
5 88mm AA guns

26th Panzer Division (as of 12 September):

15 75mm infantry support howitzers
12 150mm infantry support howitzers
6 105mm SP howitzers
12 105mm howitzers
10 150mm SP howitzers
4 150mm howitzers

Herman Goring Parachute Panzer Division (as of 13 September):

6-8 75mm infantry support howitzers
8 150mm infantry support howitzers
24 105mm howitzers
12 105mm SP howitzers
4 105mm guns
8 150mm howitzers
6 150mm SP howitzers
6 150mm multiple rocket launchers
12 88mm AA guns

29th Panzergrenadier Division

106 artillery pieces (types unknown)

15th Panzergrenadier Division (elements):

10-12 105mm howitzers

3d Panzergrenadier Division

6 150mm infantry support howitzers

Non-divisional:

501st Army Flak Battalion (probably 20mm and 37mm AA only)
I/49th Flak Battalion (probably 8 88mm AA guns)

Thus, German artillery strength is about 342 tubes versus 394 as given in the chart.[3]

Armor strengths are equally suspect for both the Allied and German forces. It should be noted however, that the original QJM database considered wheeled armored cars to be the equivalent of a light tank.

Only two U.S. armor battalions were assigned to the initial invasion force, with a total of 108 medium and 34 light tanks. The British X Corps had a single armor regiment (battalion) assigned with approximately 67 medium and 10 light tanks. Thus, the Allies had some 175 medium tanks versus 488 as given in the chart and 44 light tanks versus 236 (including an unknown number of armored cars) as given in the chart.

German armor strength was as follows (operational/in repair as of the date given):

16th Panzer Division (8 September):

7/0 Panzer III flamethrower tanks
12/0 Panzer IV short
86/6 Panzer IV long
37/3 assault guns

29th Panzergrenadier Division (1 September):

32/5 assault guns
17/4 SP antitank
3/0 Panzer III

26th Panzer Division (5 September):

11/? assault guns
10/? Panzer III

Herman Goering Parachute Panzer Division (7 September):

5/? Panzer IV short
11/? Panzer IV long
5/? Panzer III long
1/? Panzer III 75mm
21/? assault guns
3/? SP antitank

15th Panzergrenadier Division (8 September):

6/? Panzer IV long
18/? assault guns

Total 285/18 medium tanks, SP anti-tank, and assault guns. This number actually agrees very well with the 290 medium tanks given in the chart. I have not looked closely at the number of German armored cars but suspect that it is fairly close to that given in the charts.
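As a quick check (not from the original article), the figures listed above can be summed to confirm the stated total of 285 operational and 18 in repair; entries with an unknown repair count ("?") are treated as zero here.

```python
# Sum the operational and in-repair counts of medium tanks, SP antitank, and
# assault guns listed in the text for each German division at Salerno.

operational = {
    "16th Pz Div": 7 + 12 + 86 + 37,
    "29th PzGren Div": 32 + 17 + 3,
    "26th Pz Div": 11 + 10,
    "HG Para Pz Div": 5 + 11 + 5 + 1 + 21 + 3,
    "15th PzGren Div": 6 + 18,
}
in_repair = {"16th Pz Div": 6 + 3, "29th PzGren Div": 5 + 4}

print(sum(operational.values()), sum(in_repair.values()))  # -> 285 18
```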

In general it appears that the original QJM Database got the numbers of major items of equipment right for the Germans, even if it flubbed on the details. On the other hand, the numbers and details are highly suspect for the Allied major items of equipment. Just as a first order “guestimate” I would say that this probably reduces the German CEV to some extent; however, missing from the formula is the Allied naval gunfire support which, although negligible in impact in the initial stages of the battle, had a strong influence on the later stages of the battle.

Hopefully, with a little more research and time, we will be able to go back and revalidate these engagements. In the meantime I hope that this has clarified some of the questions raised about the Italian QJM Database.

NOTES

[1] Exacerbating the German command and control problems was the fact that the Tenth Army, which was in overall command of the XIV Panzer Corps and LXXVI Panzer Corps, had only been in existence for about six weeks. The army’s signal regiment was only partly organized and its quartermaster services were almost nonexistent.

[2] Arrived 13 September, 1 battery in action 13-15 September.

[3] However, the number given for the 29th Panzergrenadier Division appears to be suspiciously high and is not well defined. Hopefully further research may clarify the status of this division.

Response To “CEV Calculations in Italy, 1943”

German infantry defending against the allied landing at Anzio pass a damaged “Elefant” tank destroyer, March 1944. [Wikimedia/Bundesarchiv]

[The article below is reprinted from the August 1997 edition of The International TNDM Newsletter. It was written in response to an article by Mr. Zetterling originally published in the June 1997 edition of The International TNDM Newsletter.]

Response to Niklas Zetterling’s Article
by Christopher A. Lawrence

Mr. Zetterling is currently a professor at the Swedish War College and previously worked at the Swedish National Defense Research Establishment. As I have been having an ongoing dialogue with Prof. Zetterling on the Battle of Kursk, I have had the opportunity to witness his approach to researching historical data and the depth of his research. I would recommend that all of our readers take a look at his recent article in the Journal of Slavic Military Studies entitled "Loss Rates on the Eastern Front during World War II." Mr. Zetterling does his German research directly from the Captured German Military Records by purchasing the rolls of microfilm from the US National Archives. He is using the same German data sources that we are. Let me attempt to address his comments section by section:

The Database on Italy 1943-44:

Unfortunately, the Italian combat data was one of the early HERO research projects, with the results first published in 1971. I do not know who worked on it nor the specifics of how it was done. There are references to the Captured German Records, but significantly, they only reference division files for these battles. While I have not had the time to review Prof. Zetterling's review of the original research, I do know that some of our researchers have complained about parts of the Italian data. From what I've seen, it looks like the original HERO researchers didn't look into the Corps and Army files, and assumed what the attached Corps artillery strengths were. Sloppy research is embarrassing, although it does occur, especially when working under severe financial constraints (for example, our Battalion-level Operations Database). If the research is sloppy or hurried, or done from secondary sources, then hopefully the errors are random, and will effectively counterbalance each other, and not change the results of the analysis. If the errors are all in one direction, then this will produce a biased result.

I have no basis to believe that Prof. Zetterling's criticism is wrong, and do have many reasons to believe that it is correct. Until I can take the time to go through the Corps and Army files, I intend to operate under the assumption that Prof. Zetterling's corrections are good. At some point I will need to go back through the Italian Campaign data, correct it, and update the Land Warfare Database. I did compare Prof. Zetterling's list of battles with what was declared to be the forces involved in the battle (according to the Combat Data Subscription Service) and they show the following attached artillery:

It is clear that the battles were based on the assumption that there was Corps-level German artillery. A strength comparison between the two sides is displayed in the chart on the next page.

The Result Formula:

CEV is calculated from three factors. Therefore a consistent 20% error in casualties will result in something less than a 20% error in CEV. The mission effectiveness factor is indeed very "fuzzy," and there is simply no systematic method or guidance in its application. Sometimes, it is not based upon the assigned mission of the unit, but its perceived mission based upon the analyst's interpretation. But, while I have the same problems with the mission accomplishment scores as Mr. Zetterling, I do not have a good replacement. Considering the nature of warfare, I would hate to create CEVs without it. Of course, Trevor Dupuy was experimenting with creating CEVs just from casualty effectiveness, and by averaging his two CEV scores (CEVt and CEVl) he heavily weighted the CEV calculation for the TNDM towards measuring primarily casualty effectiveness (see the article in issue 5 of the Newsletter, "Numerical Adjustment of CEV Results: Averages and Means"). At this point, I would like to produce a new, single formula for CEV to replace the current two and their averaging methodology. I am open to suggestions for this.

Supply Situation:

The difference in ammunition usage rates between the German and US Armies is one of the reasons why adding a logistics module is high on my list of model corrections. This was discussed in Issue 2 of the Newsletter, "Developing a Logistics Model for the TNDM." As Mr. Zetterling points out, "It is unlikely that an increase in artillery ammunition expenditure will result in a proportional increase in combat power. Rather it is more likely that there is some kind of diminished return with increased expenditure." This parallels what I expressed in point 12 of that article: "It is suspected that this increase [in OLIs] will not be linear."

The CEV does include “logistics.” So in effect, if one had a good logistics module, the difference in logistics would be accounted for, and the Germans (after logistics is taken into account) may indeed have a higher CEV.

General Problems with Non-Divisional Units Tooth-to-Tail Ratio

Point taken. The engagements used to test the TNDM have been gathered over a period of over 25 years, by different researchers and controlled by different management. What is counted, when, and where does change from one group of engagements to the next. While I do think this has not had a significant effect on the model outcomes, it is "sloppy" and needs to be addressed.

The Effects of Defensive Posture

This is a very good point. If the budget was available, my first step in “redesigning” the TNDM would be to try to measure the effects of terrain on combat through the use of a large LWDB-type database and regression analysis. I have always felt that with enough engagements, one could produce reliable values for these figures based upon something other than judgement. Prof. Zetterling’s proposed methodology is also a good approach, easier to do, and more likely to get a conclusive result. I intend to add this to my list of model improvements.

Conclusions

There is one other problem with the Italian data that Prof. Zetterling did not address: the Germans and the Allies had different reporting systems for casualties. Quite simply, the Germans did not report as casualties those people who were lightly wounded and were treated and returned to duty from the divisional aid station. The United States and England did. This shows up when one compares the wounded-to-killed ratios of the various armies, with the Germans usually having in the range of 3 to 4 wounded for every one killed, while the Allies tend to have 4 to 5 wounded for every one killed. Basically, when comparing the two reports, the Germans "undercount" their casualties by around 17 to 20%. Therefore, one probably needs to use a multiplier of 20 to 25% to match the two casualty systems. This was not taken into account in any of the work HERO did.
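The arithmetic behind those percentages can be illustrated with assumed round numbers (in Python, not data from any specific engagement): if the German reports show roughly 3.5 wounded per man killed while the same losses counted the Allied way would show roughly 4.5, the undercount and the correcting multiplier fall in the ranges quoted above.

```python
# Illustrative reporting-system comparison; the 3.5 and 4.5 ratios are assumptions
# drawn from the 3-4 and 4-5 wounded-to-killed ranges mentioned in the text.

killed = 100
reported_wounded = 3.5 * killed   # German-style count (lightly wounded excluded)
full_wounded = 4.5 * killed       # Allied-style count of the same losses

reported_total = killed + reported_wounded   # 450
full_total = killed + full_wounded           # 550

undercount = 1 - reported_total / full_total  # about 0.18, i.e. the 17-20% range
multiplier = full_total / reported_total      # about 1.22, i.e. the 20-25% range
print(round(undercount, 2), round(multiplier, 2))
```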

Because Trevor Dupuy used three factors for measuring his CEV, this error certainly resulted in a slightly higher CEV for the Germans than should have been the case, but not a 20% increase. As Prof. Zetterling points out, the correction of the count of artillery pieces should result in a higher CEV than Col. Dupuy calculated. Finally, if Col. Dupuy overrated the value of defensive terrain, then this may result in the German CEV being slightly lower.

As you may have noted in my list of improvements (Issue 2, "Planned Improvements to the TNDM"), I did list "revalidating" the model to the QJM Database. [NOTE: a summary of the QJM/TNDM validation efforts can be found here.] As part of that revalidation process, we would need to review the data used in the validation database first, account for the casualty differences in the reporting systems, and determine if the model indeed overrates the effect of terrain on defense.

CEV Calculations in Italy, 1943

Tip of the Avalanche by Keith Rocco. Soldiers from the U.S. 36th Infantry Division landing at Salerno, Italy, September 1943.

[The article below is reprinted from the June 1997 edition of The International TNDM Newsletter. Chris Lawrence's response from the August 1997 edition of The International TNDM Newsletter will be posted on Friday.]

CEV Calculations in Italy, 1943
by Niklas Zetterling

Perhaps one of the most debated results of the TNDM (and its predecessors) is the conclusion that the German ground forces on average enjoyed a measurable qualitative superiority over their US and British opponents. This was largely the result of calculations on situations in Italy in 1943-44, even though further engagements have been added since the results were first presented. The calculated German superiority over the Red Army, despite the much smaller number of engagements, has not aroused as much opposition. Similarly, the calculated Israeli effectiveness superiority over its enemies seems to have surprised few.

However, there are objections to the calculations on the engagements in Italy 1943. These concern primarily the database, but there are also some questions to be raised against the way some of the calculations have been made, which may possibly have consequences for the TNDM.

Here it is suggested that the German CEV [combat effectiveness value] superiority was higher than originally calculated. There are a number of flaws in the original calculations, each of which will be discussed separately below. With the exception of one issue, all of them, if corrected, tend to give a higher German CEV.

The Database on Italy 1943-44

According to the database the German divisions had considerable fire support from GHQ artillery units. This is the only possible conclusion from the fact that several pieces of the types 15cm gun, 17cm gun, 21cm gun, and 15cm and 21cm Nebelwerfer are included in the data for individual engagements. These types of guns were almost exclusively confined to GHQ units. An example from the database is provided by the three engagements Port of Salerno, Amphitheater, and Sele-Calore Corridor. These took place simultaneously (9-11 September 1943), with the German 16th Pz Div on the Axis side in all of them (no other division is included in the battles). Judging from the manpower figures, it seems to have been assumed that the division participated with one quarter of its strength in each of the two former battles and half its strength in the latter. According to the database, the numbers of guns were:

15cm gun 28
17cm gun 12
21cm gun 12
15cm NbW 27
21cm NbW 21

This would indicate that the 16th Pz Div was supported by the equivalent of more than five non-divisional artillery battalions. For the German army this is a suspiciously high number; usually there was something more like one GHQ artillery battalion for each division, or even less. Research in the German Military Archives confirmed that the number of GHQ artillery units was far less than indicated in the HERO database. Among the useful documents found was a map showing the dispositions of 10th Army artillery units. This showed clearly that there was only one non-divisional artillery unit south of Rome at the time of the Salerno landings, the III/71 Nebelwerfer Battalion. The 557th Artillery Battalion (17cm gun) was also present, but it was included in the artillery regiment (33rd Artillery Regiment) of the 15th Panzergrenadier Division during the second half of 1943. Thus the number of German artillery pieces in these engagements is exaggerated to an extent that cannot be considered insignificant. Since OLI values for artillery usually constitute a significant share of the total OLI of a force in the TNDM, errors in artillery strength cannot be dismissed easily.
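A rough check of the "more than five battalion equivalents" statement (in Python, not part of the original article): the tubes-per-battalion figures used here are my assumptions (a nominal 12 gun tubes per German GHQ artillery battalion and 18 launchers per Nebelwerfer battalion), not database values.

```python
# Sum the GHQ-type pieces listed in the database excerpt above and convert them
# to rough battalion equivalents under the assumed battalion sizes.

gun_tubes = 28 + 12 + 12        # 15cm, 17cm, and 21cm guns
nbw_tubes = 27 + 21             # 15cm and 21cm Nebelwerfer
ghq_pieces = gun_tubes + nbw_tubes              # 100 pieces in total
battalion_equivalents = gun_tubes / 12 + nbw_tubes / 18

print(ghq_pieces, round(battalion_equivalents, 1))  # -> 100 pieces, about 7 battalions
```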

While the example above is but one, further archival research has shown that the same kind of error occurs in all the engagements in September and October 1943. It has not been possible to check the engagements later in 1943, but a pattern can be recognized. The ratio between the numbers of various types of GHQ artillery pieces does not change much from battle to battle. It seems that when the database was developed, the researchers worked with the assumption that the German corps and army organizations had organic artillery, and this assumption may have been used as a "rule of thumb." This is wrong, however; only artillery staffs and command and control units were included in the corps and army organizations, not firing units. Consequently we have a systematic error, which cannot be corrected without changing the contents of the database. It is worth emphasizing that we are discussing an exaggeration of German artillery strength of about 100%, which certainly is significant. Comparing the available archival records with the database also reveals errors in the numbers of tanks and antitank guns, but these are much smaller than the errors in artillery strength. Again, these errors always inflate the German strength in those engagements I have been able to check against archival records. These errors tend to inflate German numerical strength, which of course affects CEV calculations. But there are further objections to the CEV calculations.

The Result Formula

The “result formula” weighs together three factors: casualties inflicted, distance advanced, and mission accomplishment. It seems that the first two do not raise many objections, even though the relative weight of them may always be subject to argumentation.

The third factor, mission accomplishment, is more dubious however. At first glance it may seem natural to include such a factor. After all, a combat unit is supposed to accomplish the missions given to it. However, whether a unit accomplishes its mission or not depends both on its own qualities and on the realism of the mission assigned. Thus the mission accomplishment factor may reflect the qualities of the combat unit as well as those of the higher HQs and the general strategic situation. As an example, the Rapido crossing by the U.S. 36th Infantry Division can serve. The division did not accomplish its mission, but whether the mission was realistic, given the circumstances, is dubious. Similarly, many German units probably, in many situations, received unrealistic missions, particularly during the last two years of the war (when most of the engagements in the database were fought). A more extreme example of situations in which unrealistic missions were given is the battle in Belorussia, June-July 1944, where German units were regularly given impossible missions. Possibly it is a general trend that the side which is fighting at a strategic disadvantage is more prone to give its combat units unrealistic missions.

On the other hand, it is quite clear that the mission assigned may well affect both casualty rates and advance rates. If, for example, the defender has a withdrawal mission, the advance may become greater than if the mission had been to defend resolutely. This need not, however, be handled by including a mission factor in a result formula.

I have made some tentative runs with the TNDM, testing various CEV values to see which value produced an outcome, in terms of casualties and ground gained, as near as possible to the historical result. The results of these runs are very preliminary, but the tendency is that higher German CEVs produce outcomes closer to the historical results, particularly concerning combat.
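The procedure behind these tentative runs can be sketched as a simple search over candidate CEV values. The code below is only a schematic of that idea; the callable run_tndm, the distance measure, and the candidate values are hypothetical placeholders, not part of the TNDM itself.

```python
# Schematic only: try a range of candidate CEV values and keep the one whose
# predicted casualties and advance lie closest to the historical outcome.
# `run_tndm(cev)` is a hypothetical callable standing in for a model run and
# is assumed to return (casualties, advance_km).

def best_cev(run_tndm, historical_casualties, historical_advance_km, candidates):
    def distance(cev):
        casualties, advance_km = run_tndm(cev)
        # Normalized absolute errors, summed; guard against division by zero.
        return (abs(casualties - historical_casualties) / max(historical_casualties, 1)
                + abs(advance_km - historical_advance_km) / max(historical_advance_km, 0.1))
    return min(candidates, key=distance)

# Example usage with made-up numbers:
# cev = best_cev(run_engagement, 1250, 2.0, [1.0 + 0.1 * i for i in range(11)])
```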

Supply Situation

According to scattered information available in published literature, U.S. artillery fired more shells per gun per day than did German artillery. In Normandy, US 155mm M1 howitzers fired 28.4 rounds per day during July, while consumption in August was lower, at 18 rounds per day. For the 105mm M2 howitzer the corresponding figures were 40.8 and 27.4. This can be compared to a German OKH study which, based on experiences in Russia 1941-43, suggested that consumption of 105mm howitzer ammunition was about 13-22 rounds per gun per day, depending on the strength of the opposition encountered. For the 150mm howitzer the figures were 12-15.

While these figures should not be taken too literally, since they are not from primary sources and also reflect conditions in different theaters, they do at least indicate that it cannot be taken for granted that ammunition expenditure is proportional to the number of gun barrels. There are, in fact, further indications that Allied ammunition expenditure was greater than the German: several German reports from Normandy express astonishment at the Allied ammunition expenditure.

It is unlikely that an increase in artillery ammunition expenditure results in a proportional increase in combat power. Rather, it is more likely that there is some kind of diminishing return with increased expenditure.
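One simple way to express such diminishing returns, offered purely as an illustration and not as a description of how the TNDM or QJM actually treats ammunition, is to let the effective artillery contribution grow less than linearly with expenditure:

$$
\text{effect}(E) \propto \left(\frac{E}{E_0}\right)^{\alpha}, \qquad 0 < \alpha < 1,
$$

where E is rounds per gun per day, E_0 is some reference expenditure, and the exponent controls how quickly additional shells lose marginal value. Under any such assumption, a doubling of expenditure raises the artillery contribution by well under a factor of two.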

General Problems with Non-Divisional Units

A division usually (but not necessarily) includes various support services, such as maintenance, supply, and medical services. Non-divisional combat units must rely to a greater extent on corps and army assets for such support. This makes it complicated to include such units, since when entering, for example, manpower strength and truck strength in the TNDM, it is difficult to assess their contribution to the overall numbers.

Furthermore, the amount of such forces was not equal on the German and Allied sides. In general the Allied divisional slice was far greater than the German. In Normandy on 25 July 1944 the US forces had 812,000 men on the Continent, while the number of divisions was 18 (including the 5th Armored, which was in the process of landing on the 25th). This gives a divisional slice of about 45,000 men. By comparison, the German 7th Army mustered 16 divisions and 231,000 men on 1 June 1944, giving a slice of 14,437 men per division. The main explanation for the difference is the non-divisional combat units and the logistical organization needed to support them. In general, non-divisional combat units are composed of powerful but supply-consuming types like armor, artillery, antitank, and antiaircraft. Thus their contribution to combat power, and their strain on the logistical apparatus, is considerable. However, I do not believe that the supporting units’ manpower and vehicles have been included in TNDM calculations.
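The divisional-slice figures quoted above are simply theater strength divided by the number of divisions:

$$
\frac{812{,}000\ \text{men}}{18\ \text{divisions}} \approx 45{,}000\ \text{men per division},
\qquad
\frac{231{,}000\ \text{men}}{16\ \text{divisions}} \approx 14{,}400\ \text{men per division},
$$

a ratio of roughly three to one in favor of the Allied slice.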

There are, however, further problems with non-divisional units. While the whereabouts of tank and tank destroyer units can usually be established with sufficient certainty, artillery can be much harder to pin down to a specific divisional engagement. This is of course a greater problem when the geographical extent of a battle is small.

Tooth-to-Tail Ratio

The lack of organic support elements in non-divisional combat units was discussed above. One effect of this is to create a force with more OLI per man, since the unit’s “tail” belongs to some other part of the military organization.

In the TNDM there is a mobility formula which tends to favor units with many weapons and vehicles relative to the number of men. This became apparent when I was performing a great number of TNDM runs on engagements between Swedish brigades and Soviet regiments. The Soviet regiments usually contained rather few men but still had many AFVs, artillery tubes, AT weapons, etc., and the mobility formula in the TNDM favors such units. However, I do not think this reflects any phenomenon in the real world. The Soviet penchant for lean combat units, with supply, maintenance, and other services provided by higher echelons, is not a more effective solution in general, but was perhaps better suited to the particular constraints the Soviets faced when forming units, training men, and so on. In effect these services existed in the Soviet army too, but not formally within the combat units.
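A notional comparison may make the point clearer. The formula, weights, and unit figures below are my own illustration, not the TNDM mobility formula or real orders of battle; they merely show how any index that divides weapons and vehicles by manpower will reward a lean unit whose “tail” sits at a higher echelon.

```python
# Notional illustration only: NOT the TNDM mobility formula.
# Any index of the form (weighted hardware) / (men) favors a unit whose
# support services are carried on someone else's books.

def hardware_per_man(men, afvs, guns, trucks):
    """Toy index: weighted hardware count divided by personnel strength."""
    hardware = 10 * afvs + 5 * guns + trucks   # arbitrary illustrative weights
    return hardware / men

# Hypothetical figures, chosen only for illustration.
lean_regiment = hardware_per_man(men=2200, afvs=95, guns=18, trucks=150)
brigade_with_tail = hardware_per_man(men=5500, afvs=110, guns=24, trucks=700)

print(f"lean regiment index:       {lean_regiment:.2f}")   # about 0.54
print(f"brigade with organic tail: {brigade_with_tail:.2f}")  # about 0.35
# The leaner unit scores higher per man, even though its support echelon
# still exists elsewhere in the army and still consumes men and vehicles.
```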

This problem is to some extent reminiscent of how density is calculated (a problem discussed by Chris Lawrence in a recent issue of the Newsletter). It is comparatively easy to define the frontal limit of the deployment area of a force, and relatively easy to define the lateral limits too. It is, however, much more difficult to say where the rear limit of a force is located.

When forces are entered in the TNDM, a rear limit is, perhaps unintentionally, drawn. But if the combat unit includes its support units, the rear limit is pushed farther back than for a force whose combat units are well separated from their support units.

To what extent this affects the CEV calculations is unclear. Using the original database values, the German forces are perhaps credited with too much combat strength when the large number of GHQ artillery units is included; on the other hand, if the GHQ artillery units are excluded, the opposite may be true.

The Effects of Defensive Posture

The posture factors are difficult to analyze in isolation, since the advantages of a defensive position are not captured by them alone; such effects are also embedded in the terrain factors.

It seems that the numerical values for these factors were assigned on the basis of professional judgement. However, when the QJM was developed, the developers do not appear to have assumed a German CEV superiority; rather, that superiority seems to have been discovered later. It is possible that the professional judgement was about as far off on the issue of posture effects as it was on CEV. Since the British and American forces were predominantly on the offensive, while the Germans mainly defended themselves, a German CEV superiority may, at least partly, be hidden in posture factors that are too high for the defense.

When using corrected input data for the 20 engagements in Italy in September-October 1943, there is a tendency for the German CEV to be higher when they attack. Such a tendency is also discernible in the engagements presented in Hitler’s Last Gamble, Appendix H, even though the number of engagements in the latter case is very small.

As it stands now this is no more than a hypothesis; it will take an analysis of a greater number of engagements to confirm it. However, if such an analysis is done, it must be done using several sets of data. German and Allied attacks must be analyzed separately, and preferably the data should be separated further into sets for each relevant terrain type. Since the effects of defensive posture are intertwined with terrain factors, it is quite possible that the factors are correct for certain terrain types while wrong for others. It may also be that the factors differ for various opponents (due to differences in training, doctrine, etc.), or depending on whether the forces are predominantly composed of armor units or mainly of infantry.

One further problem with the effects of defensive posture is that they are probably strongly affected by the density of forces. The main effect of high density is probably the inability to employ all the forces involved effectively. Thus this factor may not influence the outcome except when the density is comparatively high. However, what can be regarded as “high” probably depends heavily on terrain, road net quality, and the cross-country mobility of the forces.

Conclusions

While the TNDM has been criticized here, it is also fitting to praise the model. The very fact that it can be criticized in this way is a testimony to its openness. In a sense a model is also a theory and, to use Popperian terminology, the TNDM is highly testable.

It should also be emphasized that the greatest errors are probably those in the database. As previously stated, I can only conclude safely that the data on the engagements in Italy in 1943 are wrong; later engagements have not yet been checked against archival documents. Overall the errors do not imply a dramatic change in the CEV values: rather than the original figure of 1.2-1.3, the Germans seem to have had (in Italy in 1943) a superiority on the order of 1.4-1.5.

During September and October 1943, almost all the German divisions in southern Italy were mechanized or parachute divisions, which may have contributed to a higher German CEV. Thus it is not certain that the conclusions arrived at here are valid for German forces in general. This factor should not be exaggerated, however, since many of the German divisions in Italy were either newly raised (e.g., 26th Panzer Division) or rebuilt after the Stalingrad disaster (16th Panzer Division plus 3rd and 29th Panzergrenadier Divisions) or the Tunisian debacle (15th Panzergrenadier Division).

Scoring Weapons And Aggregation In Trevor Dupuy’s Combat Models

[The article below is reprinted from the October 1997 edition of The International TNDM Newsletter.]

Consistent Scoring of Weapons and Aggregation of Forces:
The Cornerstone of Dupuy’s Quantitative Analysis of Historical Land Battles
by
James G. Taylor, PhD,
Dept. of Operations Research, Naval Postgraduate School

Introduction

Col. Trevor N. Dupuy was an American original, especially as regards the quantitative study of warfare. As with many prophets, he was not entirely appreciated in his own land, particularly its Military Operations Research (OR) community. However, after becoming rather familiar with the details of his mathematical modeling of ground combat based on historical data, I became aware of the basic scientific soundness of his approach. Unfortunately, his documentation of methodology was not always accepted by others, many of whom appeared to confuse lack of mathematical sophistication in his documentation with lack of scientific validity of his basic methodology.

The purpose of this brief paper is to review the salient points of Dupuy’s methodology from a systems perspective, i.e., to view his methodology as a system, functioning as an organic whole to capture the essence of past combat experience (with an eye towards extrapolation into the future). The advantage of this perspective is that it immediately leads to the conclusion that if one wants to use some functional relationship derived from Dupuy’s work, then one should also use his methodologies for scoring weapons, aggregating forces, and adjusting for operational circumstances, since this consistency is the only guarantee of being able to reproduce historical results and to project them into the future.

Implications of this systems perspective on Dupuy’s work for current DOD models will be discussed. In particular, the Military OR community has developed quantitative methods for imputing values to weapon systems based on their attrition capability against opposing forces and on force interactions.[1] One such approach is the so-called antipotential-potential method[2] used in TACWAR[3] to score weapons. However, one should not expect such scores to provide valid casualty estimates when combined with historically derived functional relationships such as the so-called ATLAS casualty-rate curves[4] used in TACWAR, because a different “yardstick” (i.e. measuring system for estimating the relative combat potential of opposing forces) was used to develop those curves.

Overview of Dupuy’s Approach

This section briefly outlines the salient features of Dupuy’s approach to the quantitative analysis and modeling of ground combat as embodied in his Tactical Numerical Deterministic Model (TNDM) and its predecessor, the Quantified Judgment Model (QJM). The interested reader can find details in Dupuy [1979] (see also Dupuy [1985][5], [1987], [1990]). Here we view Dupuy’s methodology from a systems approach, which seeks to discern its various components and their interactions and to view these components as an organic whole. Essentially, Dupuy’s approach involves the development of functional relationships from historical combat data (see Fig. 1) and then the use of these functional relationships to model future combat (see Fig. 2).

At the heart of Dupuy’s method is the investigation of historical battles, comparing the relationship of inputs (as quantified by relative combat power, denoted as Pa/Pd for that of the attacker relative to that of the defender in Fig. 1) (e.g. see Dupuy [1979, pp. 59-64]) to outputs (as quantified by extent of mission accomplishment, casualty effectiveness, and territorial effectiveness; see Fig. 2) (e.g. see Dupuy [1979, pp. 47-50]). The salient point is that within this scheme, the main input[6] (i.e. relative combat power) to a historical battle is a derived quantity. It is computed from formulas that involve three essential aspects: (1) the scoring of weapons (e.g. see Dupuy [1979, Chapter 2 and also Appendix A]), (2) aggregation methodology for a force (e.g. see Dupuy [1979, pp. 43-46 and 202-203]), and (3) situational-adjustment methodology for determining the relative combat power of opposing forces (e.g. see Dupuy [1979, pp. 46-47 and 203-204]). In the force-aggregation step the effects on weapons of Dupuy’s environmental variables and one operational variable (air superiority) are considered[7], while in the situation-adjustment step the effects on forces of his behavioral variables[8] (aggregated into a single factor called the relative combat effectiveness value (CEV)) and also the other operational variables are considered (Dupuy [1987, pp. 86-89]).
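In compressed and deliberately simplified notation, the structure just described can be written as

$$
S = \sum_{\text{weapon categories}} \bigl(\mathrm{OLI} \times \text{numbers} \times \text{environmental factors}\bigr),
\qquad
P = S \times \text{(operational factors)} \times \mathrm{CEV},
$$

with the derived input to a historical engagement being the ratio $P_a / P_d$. This is only a schematic of the aggregation and situational-adjustment steps; the actual formulas, weapon categories, and factor values are those given in Dupuy [1979] and [1987].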

Figure 1.

Moreover, any functional relationships developed by Dupuy depend (unless shown otherwise) on his computational system for derived quantities, namely OLIs, force strengths, and relative combat power. Thus, Dupuy’s results depend in an essential manner on his overall computational system described immediately above. Consequently, any such functional relationship (e.g. a casualty-rate curve) directly or indirectly derived from Dupuy’s work should still use his computational methodology for determination of independent-variable values.

Fig. 1 also reveals another important aspect of Dupuy’s work: the development of reliable data on historical battles. Military judgment plays an essential role in the development of such historical data for a variety of reasons. Dupuy was essentially the only source of new secondary historical data developed from primary sources (see McQuie [1970] for further details). These primary sources are well known to be both incomplete and inconsistent, so military judgment must be used to fill in the many gaps and reconcile observed inconsistencies. Moreover, military judgment also generates the working hypotheses for model development (e.g. identification of significant variables).

At the heart of Dupuy’s quantitative investigation of historical battles and subsequent model development is his own weapons-scoring methodology, which slowly evolved out of study efforts by the Historical Evaluation and Research Organization (HERO) and its successor organizations (cf. HERO [1967] and compare with Dupuy [1979]). Early HERO [1967, pp. 7-8] work revealed that what one would today call weapons scores developed by other organizations were so poorly documented that HERO had to create its own methodology for assessing the relative lethality of weapons, which eventually evolved into Dupuy’s Operational Lethality Indices (OLIs). Dupuy realized that his method was arbitrary (as indeed is its counterpart, called the operational definition, in formal scientific work), but felt that this would be ameliorated if the weapons-scoring methodology were consistently applied to historical battles. Unfortunately, this point is not clearly stated in Dupuy’s formal writings, although it was clearly (and compellingly) made by him in numerous briefings that this author heard over the years.

Figure 2.

In other words, from a systems perspective, the functional relationships developed by Colonel Dupuy are part of an analysis system that includes this weapons-scoring methodology consistently applied (see Fig. 1 again). The derived functional relationships do not stand alone (unless further empirical analysis shows them to hold for any weapons-scoring methodology), but function in concert with his computational procedures. Another essential part of this system is Dupuy’s aggregation methodology, which combines numbers, environmental circumstances, and weapons scores to compute the strength (S) of a military force. A key innovation by Colonel Dupuy [1979, pp. 202-203] was to use a nonlinear (more precisely, a piecewise-linear) model for certain elements of force strength. This innovation precluded military absurdities such as air firepower being fully substitutable for ground firepower, antitank weapons being fully effective when armor targets are lacking, etc. The final part of this computational system is Dupuy’s situational-adjustment methodology, which combines the effects of operational circumstances with force strengths to determine relative combat power, e.g. Pa/Pd.
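The general idea of that piecewise-linear treatment can be sketched as follows. This is an illustration of the concept only; the cap, the slope, and the example numbers are mine, not Dupuy’s actual aggregation formulas or constants.

```python
# Illustration of a piecewise-linear cap: a weapon category's score is
# credited in full only up to a ceiling tied to the opposing force, and at a
# reduced rate beyond it. Constants here are arbitrary, for demonstration.

def capped_contribution(category_oli: float, ceiling: float, excess_rate: float = 0.5) -> float:
    """Piecewise-linear credit: full value up to `ceiling`, reduced slope above it."""
    if category_oli <= ceiling:
        return category_oli
    return ceiling + excess_rate * (category_oli - ceiling)

# Hypothetical example: antitank OLI credited against a ceiling set by the
# enemy's armor OLI, so antitank strength is not fully effective when armor
# targets are scarce.
enemy_armor_oli = 400.0
own_antitank_oli = 1000.0
credited = capped_contribution(own_antitank_oli, ceiling=enemy_armor_oli)
print(credited)  # 400 + 0.5 * 600 = 700.0, rather than the full 1000
```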

To recapitulate, the determination of an Operational Lethality Index (OLI) for a weapon involves the combination of weapon lethality, quantified in terms of a Theoretical Lethality Index (TLI) (e.g. see Dupuy [1987, p. 84]), and troop dispersion[9] (e.g. see Dupuy [1987, pp. 84-85]). Weapons scores (i.e. the OLIs) are then combined with numbers (own side and enemy) and combat-environment factors to yield force strength. Six[10] different categories of weapons are aggregated, with nonlinear (i.e. piecewise-linear) models being used for the following three categories: antitank, air defense, and air firepower (i.e. close air support). Operational variables, e.g. mobility, posture, surprise, etc. (Dupuy [1987, p. 87]), and behavioral variables (quantified as a relative combat effectiveness value (CEV)) are then applied to force strength to determine a side’s combat-power potential.
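As I read Dupuy [1987, pp. 84-85], the weapon-scoring step can be summarized as

$$
\mathrm{OLI} = \frac{\mathrm{TLI}}{D},
$$

where D is the dispersion factor for the historical period in question. Since the same period value of D applies to both sides (see note [9] below), it scales the scores without affecting the relative combat power Pa/Pd.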

Requirement for Consistent Scoring of Weapons, Force Aggregation, and Situational Adjustment for Operational Circumstances

The salient point to be gleaned from Figs. 1 and 2 is that the same (or at least consistent) weapons-scoring, aggregation, and situational-adjustment methodologies must be used both for developing functional relationships and for playing them to model future combat. The corresponding computational methods function as a system (an organic whole) for determining relative combat power, e.g. Pa/Pd. For the development of functional relationships from historical data, a force ratio (the relative combat power of the two opposing sides, e.g. the attacker’s combat power divided by that of the defender, Pa/Pd) is computed (i.e. it is a derived quantity) as the independent variable, with observed combat outcome being the dependent variable. Thus, as discussed above, this force ratio depends on the methodologies for scoring weapons, aggregating force strengths, and adjusting a force’s combat power for the operational circumstances of the engagement. It is a priori not clear that different scoring, aggregation, and situational-adjustment methodologies will lead to similar derived values. If such different computational procedures are used, these derived values should be recomputed and the corresponding functional relationships rederived and replotted.
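The dependency can be made explicit in a short sketch. Everything below is hypothetical scaffolding (the data layout, the fitted curve form, and the function names are mine); it only illustrates that the fitted relationship is tied to whichever scoring and aggregation methods produced the force ratios.

```python
# Sketch of the consistency requirement: a casualty-rate curve fitted against
# force ratios computed under one scoring/aggregation methodology is only
# meaningful when applied to force ratios computed the same way.

import numpy as np

def force_ratio(engagement, score_weapon, aggregate):
    """Derived quantity: attacker combat power over defender combat power
    under a given weapon-scoring and force-aggregation methodology."""
    pa = aggregate(engagement["attacker"], score_weapon)
    pd = aggregate(engagement["defender"], score_weapon)
    return pa / pd

def fit_casualty_curve(engagements, score_weapon, aggregate):
    """Fit a simple log-linear relation between force ratio and the attacker's
    observed fractional casualties (purely illustrative curve form)."""
    x = np.log([force_ratio(e, score_weapon, aggregate) for e in engagements])
    y = np.array([e["attacker_casualty_fraction"] for e in engagements])
    slope, intercept = np.polyfit(x, y, 1)
    return lambda ratio: intercept + slope * np.log(ratio)

# Entering this curve with force ratios derived under a *different* scoring or
# aggregation method (e.g. an antipotential-potential score) severs the link
# to the historical data; the ratios must be recomputed and the curve refit.
```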

Users of the Tactical Numerical Deterministic Model (TNDM) (or, for that matter, its predecessor, the Quantified Judgment Model (QJM)) need not worry about this point, because it was apparently meticulously observed by Colonel Dupuy in all his work. However, portions of his work have found their way into a surprisingly large number of DOD models (usually not explicitly acknowledged), and the context and range of validity of the historical results have been largely ignored by others. The need to recalibrate the historical data and the corresponding functional relationships has not been considered in applying Dupuy’s results to some important current DOD models.

Implications for Current DOD Models

A number of important current DOD models (namely, TACWAR and JICM, discussed below) make use of some of Dupuy’s historical results without recalibrating functional relationships such as loss rates and rates of advance as a function of some force ratio (e.g. Pa/Pd). As discussed above, it is not clear that such a procedure will capture the essence of past combat experience. Moreover, in calculating losses, Dupuy first determines personnel losses (expressed as a percent loss of personnel strength, i.e., number of combatants on a side) and then calculates equipment losses as a function of this casualty rate (e.g., see Dupuy [1971, pp. 219-223], also [1990, Chapters 5 through 7][11]). These latter functional relationships are apparently not observed in the models discussed below. In fact, only Dupuy (going back to Dupuy [1979][12]) takes personnel losses to depend on a force ratio and other pertinent variables, with materiel losses being taken as derivative from this casualty rate.

For example, TACWAR determines personnel losses[13] by computing a force ratio and then consulting an appropriate casualty-rate curve (referred to as empirical data), much in the same fashion as ATLAS did[14]. However, such a force ratio is computed using a linear model with weapon values determined by the so-called antipotential-potential method[15]. Unfortunately, this procedure may not be consistent with how the empirical data (i.e. the casualty-rate curves) was developed. Further research is required to demonstrate that valid casualty estimates are obtained when different weapon scoring, aggregation, and situational-adjustment methodologies are used to develop casualty-rate curves from historical data and to use them to assess losses in aggregated combat models. Furthermore, TACWAR does not use Dupuy’s model for equipment losses (see above), although it does purport, as just noted above, to use “historical data” (e.g., see Kerlin et al. [1975, p. 22]) to compute personnel losses as a function (among other things) of a force ratio (given by a linear relationship), involving close air support values in a way never used by Dupuy. Although their force-ratio determination methodology does have logical and mathematical merit, it is not the way that the historical data was developed.
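The ordering described above can be shown schematically. The two functions below are illustrative stand-ins only, not Dupuy’s published curves or TACWAR’s actual data: the point is simply that personnel losses are read from a casualty-rate relationship entered with the force ratio, and materiel losses are then derived from that personnel casualty rate.

```python
# Schematic of the loss-calculation ordering (illustrative functions only).

def personnel_casualty_rate(force_ratio: float) -> float:
    """Stand-in for a historically derived attacker casualty-rate curve."""
    return max(0.002, 0.03 / force_ratio)   # hypothetical shape

def equipment_loss_fraction(casualty_rate: float, factor: float = 4.0) -> float:
    """Stand-in for Dupuy-style materiel losses driven by the casualty rate."""
    return min(1.0, factor * casualty_rate)  # hypothetical proportionality

rate = personnel_casualty_rate(force_ratio=1.5)   # e.g. Pa/Pd = 1.5
tanks = equipment_loss_fraction(rate)
print(f"personnel: {rate:.1%} per day, tanks: {tanks:.1%} per day")
```

A model that instead computes materiel losses independently of the personnel casualty rate is departing from the structure of Dupuy’s historically derived relationships, whatever the merits of its own approach.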

Moreover, RAND (Allen [1992]) has more recently developed what is called the situational force scoring (SFS) methodology for calculating force ratios in large-scale, aggregated-force combat situations to determine loss and movement rates. Here, SFS refers essentially to a force-aggregation and situation-adjustment methodology, which has many conceptual elements in common with Dupuy’s methodology (except, most notably, extensive testing against historical data, especially documentation of such efforts). SFS was originally developed for the RSAS[16] and is today used in JICM[17]. It also apparently uses a weapon-scoring system developed at RAND[18]. It purports (no documentation given, only a citation of unpublished work) to be consistent with historical data (including the ATLAS casualty-rate curves) (Allen [1992, p. 41]), but again no consideration is given to recalibration of historical results for different weapon-scoring, force-aggregation, and situational-adjustment methodologies. SFS emphasizes adjusting force strengths according to the operational circumstances (the “situation”) of the engagement (including surprise), and contains many innovative ideas (but in some major ways has little connection with the previous work of others[19]). The resulting model contains many more details than historical combat data would support. It is also a methodology that differs in many essential ways from that used previously by any investigator. In particular, it is doubtful that it develops force ratios in a manner consistent with Dupuy’s work.

Final Comments

Use of (sophisticated) mathematics for modeling past historical combat (and extrapolating it into the future for planning purposes) is no reason for ignoring Dupuy’s work. One would think that the current Military OR community would try to understand Dupuy’s work before trying to improve and extend it. In particular, Colonel Dupuy’s various computational procedures (including constants) must be considered as an organic whole (i.e. a system) supporting the development of functional relationships. If one ignores this computational system and simply tries to use some isolated aspect, the result may be interesting and even logically sound, but it probably lacks any scientific validity.

REFERENCES

P. Allen, “Situational Force Scoring: Accounting for Combined Arms Effects in Aggregate Combat Models,” N-3423-NA, The RAND Corporation, Santa Monica, CA, 1992.

L. B. Anderson, “A Briefing on Anti-Potential Potential (The Eigen-value Method for Computing Weapon Values),” WP-2, Project 23-31, Institute for Defense Analyses, Arlington, VA, March 1974.

B. W. Bennett, et al, “RSAS 4.6 Summary,” N-3534-NA, The RAND Corporation, Santa Monica, CA, 1992.

B. W. Bennett, A. M. Bullock, D. B. Fox, C. M. Jones, J. Schrader, R. Weissler, and B. A. Wilson, “JICM 1.0 Summary,” MR-383-NA, The RAND Corporation, Santa Monica, CA, 1994.

P. K. Davis and J. A. Winnefeld, “The RAND Strategic Assessment Center: An Overview and Interim Conclusions About Utility and Development Options,” R-2945-DNA, The RAND Corporation, Santa Monica, CA, March 1983.

T. N. Dupuy, Numbers, Predictions and War: Using History to Evaluate Combat Factors and Predict the Outcome of Battles, The Bobbs-Merrill Company, Indianapolis/New York, 1979.

T. N. Dupuy, Numbers, Predictions and War, Revised Edition, HERO Books, Fairfax, VA, 1985.

T.N. Dupuy, Understanding War: History and Theory of Combat, Paragon House Publishers, New York, 1987.

T. N. Dupuy, Attrition: Forecasting Battle Casualties and Equipment Losses in Modern War, HERO Books, Fairfax, VA, 1990.

General Research Corporation (GRC), “A Hierarchy of Combat Analysis Models,” McLean, VA, January 1973.

Historical Evaluation and Research Organization (HERO), “Average Casualty Rates for War Games, Based on Historical Data,” 3 Volumes in 1, Dunn Loring, VA, February 1967.

E. P. Kerlin and R. H. Cole, “ATLAS: A Tactical, Logistical, and Air Simulation: Documentation and User’s Guide,” RAC-TP-338, Research Analysis Corporation, McLean, VA, April 1969 (AD 850 355).

E. P. Kerlin, L. A. Schmidt, A. J. Rolfe, M. J. Hutzler, and D. L. Moody, “The IDA Tactical Warfare Model: A Theater-Level Model of Conventional, Nuclear, and Chemical Warfare, Volume II: Detailed Description,” R-211, Institute for Defense Analyses, Arlington, VA, October 1975 (AD B009 692L).

R. McQuie, “Military History and Mathematical Analysis,” Military Review 50, No. 5, 8-17 (1970).

S.M. Robinson, “Shadow Prices for Measures of Effectiveness, I: Linear Model,” Operations Research 41, 518-535 (1993).

J. G. Taylor, Lanchester Models of Warfare, Vols. I & II, Operations Research Society of America, Alexandria, VA, 1983. (a)

J.G. Taylor, “A Lanchester-Type Aggregated-Force Model of Conventional Ground Combat,” Naval Research Logistics Quarterly 30, 237-260 (1983). (b)

NOTES

[1] For example, see Taylor [1983a, Section 7.18], which contains a number of examples. The basic references given there may be more accessible through Robinson [1993].

[2] This term was apparently coined by L. B. Anderson [1974] (see also Kerlin et al. [1975, Chapter I, Section D.3]).

[3] The Tactical Warfare (TACWAR) model is a theater-level, joint-warfare, computer-based combat model that is currently used for decision support by the Joint Staff and essentially all CINC staffs. It was developed by the Institute for Defense Analyses in the mid-1970s (see Kerlin et al. [1975]), was originally referred to as TACNUC, and has been continually upgraded up to (and including) the present day.

[4] For example, see Kerlin and Cole [1969], GRC [1973, Fig. 6-6], or Taylor [1983b, Fig. 5] (also Taylor [1983a, Section 7.13]).

[5] The only apparent difference between Dupuy [1979] and Dupuy [1985] is the addition of an appendix (Appendix C “Modified Quantified Judgment Analysis of the Bekaa Valley Battle”) to the end of the latter (pp. 241-251). Hence, the page content is apparently the same for these two books for pp. 1-239.

[6] Technically speaking, one also has the engagement type and possibly several other descriptors (denoted in Fig. 1 as reduced list of operational circumstances) as other inputs to a historical battle.

[7] In Dupuy [1979, e.g. pp. 43-46] only environmental variables are mentioned, although basically the same formulas underlie both Dupuy [1979] and Dupuy [1987]. For simplicity, Fig. 1 and 2 follow this usage and employ the term “environmental circumstances.”

[8] In Dupuy [1979, e.g. pp. 46-47] only operational variables are mentioned, although basically the same formulas underlie both Dupuy [1979] and Dupuy [1987]. For simplicity, Fig. 1 and 2 follow this usage and employ the term “operational circumstances.”

[9] Chris Lawrence has kindly brought to my attention that since the same value for troop dispersion from an historical period (e.g. see Dupuy [1987, p. 84]) is used for both the attacker and also the defender, troop dispersion does not actually affect the determination of relative combat power Pa/Pd.

[10] Eight different weapon types are considered, with three being classified as infantry weapons (e.g. see Dupuy [1979, pp. 43-44], [1981, pp. 85-86]).

[11] Chris Lawrence has kindly informed me that Dupuy’s work on relating equipment losses to personnel losses goes back to the early 1970s and even earlier (e.g. see HERO [1966]). Moreover, Dupuy’s [1992] book Future Wars gives some additional empirical evidence concerning the dependence of equipment losses on casualty rates.

[12] But actually going back much earlier as pointed out in the previous footnote.

[13] See Kerlin et al. [1975, Chapter I, Section D.l].

[14] See Footnote 4 above.

[15] See Kerlin et al. [1975, Chapter I, Section D.3]; see also Footnotes 1 and 2 above.

[16] The RAND Strategy Assessment System (RSAS) is a multi-theater aggregated combat model developed at RAND in the early 1980s (for further details see Davis and Winnefeld [1983] and Bennett et al. [1992]). It evolved into the Joint Integrated Contingency Model (JICM), which is a post-Cold War redesign of the RSAS (starting in FY92).

[17] The Joint Integrated Contingency Model (JICM) is a game-structured computer-based combat model of major regional contingencies and higher-level conflicts, covering strategic mobility, regional conventional and nuclear warfare in multiple theaters, naval warfare, and strategic nuclear warfare (for further details, see Bennett et al. [1994]).

[18] RAND apparently replaced one weapon-scoring system by another (e.g. see Allen [1992, pp. 9, 15, and 87-89]) without making any other changes in their SFS system.

[19] For example, Dupuy’s early HERO work (e.g. see Dupuy [1967]), reworks of these results by the Research Analysis Corporation (RAC) (e.g. see RAC [1973, Fig. 6-6]), and Dupuy’s later work (e.g. see Dupuy [1979]) all considered daily fractional casualties for both the attacker and the defender as basic casualty-outcome descriptors (see also Taylor [1983b]). However, RAND does not do this, but considers the defender’s loss rate and a casualty exchange ratio as the basic casualty-production descriptors (Allen [1992, pp. 41-42]). The great value of using the former set of descriptors (i.e. attacker and defender fractional loss rates) is that not only is casualty assessment more straightforward (especially the development of functional relationships from historical data), but qualitative model behavior is also readily deduced (see Taylor [1983b] for further details).