Response 2 (Performance of Armies)

In an exchange with one of our readers, he raised the possibility of quantifiably assessing the performance of armies and producing a ranking from best to worst. The exchange is here:

The Dupuy Institute Air Model Historical Data Study

We have done some work on this, and are the people who have done the most extensive published work on it. Swedish researcher Niklas Zetterling also addresses this subject in his book Normandy 1944: German Military Organization, Combat Power and Organizational Effectiveness, as he has elsewhere, for example in an article in The International TNDM Newsletter, volume I, No. 6, pages 21-23, called “CEV Calculations in Italy, 1943.” It is here: http://www.dupuyinstitute.org/tdipub4.htm

When it came to measuring the differences in performance of armies, Martin van Creveld referenced Trevor Dupuy in his book Fighting Power: German and U.S. Army Performance, 1939-1945, pages 4-8.

What Trevor Dupuy has done is compare the performances of both overall forces and individual divisions based upon his Quantified Judgment Model (QJM). This was done in his book Numbers, Predictions and War: The Use of History to Evaluate and Predict the Outcome of Armed Conflict. I bring the reader's attention to pages ix, 62-63, Chapter 7: Behavioral Variables in World War II (pages 95-110), Chapter 9: Reliably Representing the Arab-Israeli Wars (pages 118-139), and in particular page 135, and pages 163-165. It was also discussed in Understanding War: History and Theory of Combat, Chapter Ten: Relative Combat Effectiveness (pages 105-123).

I ended up dedicating four chapters in my book War by Numbers: Understanding Conventional Combat to the same issue. One of the problems with Trevor Dupuy’s approach is that you had to accept his combat model as a valid measurement of unit performance. This was a reach for many people, especially those who did not like his conclusions to start with. I chose to simply use the combined statistical comparisons of dozens of division-level engagements, which I think makes the case fairly convincingly without adding a construct to manipulate the data. If someone has a disagreement with my statistical compilations and the results and conclusions drawn from them, I have yet to hear it. I would recommend looking at Chapter 4: Human Factors (pages 16-18), Chapter 5: Measuring Human Factors in Combat: Italy 1943-1944 (pages 19-31), Chapter 6: Measuring Human Factors in Combat: Ardennes and Kursk (pages 32-48), and Chapter 7: Measuring Human Factors in Combat: Modern Wars (pages 49-59).
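
To make the idea of a plain statistical comparison concrete, here is a minimal sketch in Python. It is not the procedure used in War by Numbers. The record layout, the function name `summarize_by_attacker`, the choice of the two ratios, and every number (for two placeholder forces, "Side A" and "Side B") are assumptions made up purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Engagement:
    """One division-level engagement. All values below are hypothetical."""
    attacker: str
    defender: str
    attacker_strength: int
    defender_strength: int
    attacker_losses: int
    defender_losses: int


def summarize_by_attacker(engagements):
    """Return {attacker: (mean force ratio, mean casualty exchange ratio)}.

    Force ratio    = attacker strength / defender strength.
    Exchange ratio = attacker losses / defender losses (lower favors the attacker).
    Assumes the defender took at least one casualty in every engagement.
    """
    force_ratios = defaultdict(list)
    exchange_ratios = defaultdict(list)
    for e in engagements:
        force_ratios[e.attacker].append(e.attacker_strength / e.defender_strength)
        exchange_ratios[e.attacker].append(e.attacker_losses / e.defender_losses)
    return {
        side: (mean(force_ratios[side]), mean(exchange_ratios[side]))
        for side in force_ratios
    }


# Made-up engagements between two placeholder forces.
engagements = [
    Engagement("Side A", "Side B", 15000, 10000, 300, 600),
    Engagement("Side A", "Side B", 12000, 12000, 200, 400),
    Engagement("Side B", "Side A", 20000, 10000, 900, 300),
    Engagement("Side B", "Side A", 18000, 12000, 600, 300),
]

summary = summarize_by_attacker(engagements)
for side, (force_ratio, exchange_ratio) in summary.items():
    print(f"{side} attacking: force ratio {force_ratio:.2f}, "
          f"exchange ratio {exchange_ratio:.2f}")
```

In this toy data, Side A attacks at lower odds yet loses fewer men than it inflicts, while Side B attacks at higher odds and loses more. A pattern like that, repeated across many engagements, is the kind of relative difference such a comparison is meant to surface. Averaging ratios is only one crude summary, and it says nothing about armies that never fought each other.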

Now, I did end up discussing Trevor Dupuy’s model in Chapter 19: Validation of the TNDM and showing the results of the historical validations we have done of his model, but the model was not otherwise used in any of the analysis done in the book.

But….what we (Dupuy and I) have done is a comparison between forces that opposed each other. It is a measurement of combat value relative to each other. It is not an absolute measurement that can be compared to other armies in different times and places. Trevor Dupuy toyed with this on page 165 of NPW (Numbers, Predictions and War), but this could only be done by assuming that the combat effectiveness of the U.S. Army in WWII was the same as that of the Israeli Army in 1973.

Anyhow, it is probably impossible to come up with a valid performance measurement that would allow you to rank an army from best to worst. It is possible to come up with a comparative performance measurement of armies that have faced each other. This, I believe, we have done, using different methodologies and different historical databases. I do believe it would be possible to then determine what the different factors are that make up this difference. I do believe it would be possible to assign values or weights to those factors. I believe this would be very useful to know, in light of the potential training and organizational value of this knowledge.

Christopher A. Lawrence

Christopher A. Lawrence is a professional historian and military analyst. He is the Executive Director and President of The Dupuy Institute, an organization dedicated to scholarly research and objective analysis of historical data related to armed conflict and the resolution of armed conflict. The Dupuy Institute provides independent, historically-based analyses of lessons learned from modern military experience.

Mr. Lawrence was the program manager for the Ardennes Campaign Simulation Data Base, the Kursk Data Base, the Modern Insurgency Spread Sheets and for a number of other smaller combat data bases. He has participated in casualty estimation studies (including estimates for Bosnia and Iraq) and studies of air campaign modeling, enemy prisoner of war capture rates, medium weight armor, urban warfare, situational awareness, counterinsurgency and other subjects for the U.S. Army, the Defense Department, the Joint Staff and the U.S. Air Force. He has also directed a number of studies related to the military impact of banning antipersonnel mines for the Joint Staff, Los Alamos National Laboratories and the Vietnam Veterans of America Foundation.

His published works include papers and monographs for the Congressional Office of Technology Assessment and the Vietnam Veterans of America Foundation, in addition to over 40 articles written for limited-distribution newsletters and over 60 analytical reports prepared for the Defense Department. He is the author of Kursk: The Battle of Prokhorovka (Aberdeen Books, Sheridan, CO., 2015), America’s Modern Wars: Understanding Iraq, Afghanistan and Vietnam (Casemate Publishers, Philadelphia & Oxford, 2015), War by Numbers: Understanding Conventional Combat (Potomac Books, Lincoln, NE., 2017) and The Battle of Prokhorovka (Stackpole Books, Guilford, CT., 2019).

Mr. Lawrence lives in northern Virginia, near Washington, D.C., with his wife and son.


8 Comments

  1. “Anyhow, it is probably impossible to come up with a valid performance measurement that would allow you to rank an army from best to worst”

    I am working on that. I guess it is rather dilettantish and a very tedious process, but I am building on total warmaking potential vs. relative tactical effectiveness. China has the largest warmaking potential on earth (defense); the IDF, on the other hand, still scores very high (eff). It's actually not that difficult. “Human material”, political decisions and alliances, ius ad bellum, forfeits, espionage, cultural spheres, martial spirit/ideology, geostrategy/resources: such things complicate accurate predictions. But militarization levels define the general effectiveness of the armed forces, and development levels decide how well the individual soldier can be armed and trained, as well as the effectiveness of the institutions and the education of the staff. Population size determines the ability to bleed, but it stands in relation to overall economic power and military outlays, which then translate into mobilization. Per capita GDP defines material density; high-tech industry and traditions increase material quality. Anyway, our modern, post-industrialized nations cannot carry on wars to the extent they used to. In the past a nation like Britain could occupy 2/3 of the world with limited manpower, while nowadays it would be almost impossible and overly expensive to even hold Western Africa. A nation's ability to put up a fight has increased in the globalized world (and with the advent of computers, mobile phones and the internet). A collapse would occur quicker than at WW1 levels.

    People are usually raised with the belief that their nation's forces are superior, which is a byproduct of nationalism, and thus Dupuy’s figures are rarely accepted.
    Was Dupuy serious about his methodology or was it just more “spontaneous” I wonder?

    There is also another way of doing it, a very simple method I utilized: a time-based function focused on total ammo expenditure over a period with comparable situations, together with the engagements' respective loss rates and the forces' casualty infliction. After normalization the values were indeed close to the listed CEVs.

    What always interested me were the Axis allies, especially Romania vis-à-vis the Soviets in Case Blue (I did not encounter a lot of accurate data on their performance in those operations), as well as the RKKA's performance over the years, specifically the initial performance in 1941 vs. mid-1944.

  2. Thank you again for the prompt reply; you have given me a lot to think about. I think I need to get a copy of “War by Numbers” to further understand your rationale and models. I agree that it is probably impossible to come up with a valid performance measurement that would allow ranking armies from best to worst. However, and coming back to my original question, is it of scientific value to measure the perception of professionals of armies' performances?
    For instance, in the past and for various reasons (the adoption by many Western historians of the German perspective of WW2, the Cold War, etc.) there was a clear tendency to underestimate the Red Army's performance, although this view was not always justified, especially later in the war. Or the British would be considered rather conservative and lacking in imagination, whereas the Germans or the Americans were seen as more willing to accept new ideas on a tactical level. Or the Italian army was considered, due to various factors, to have performed rather poorly, and was sometimes used as a scapegoat for German failures (the Romanians too, for that matter). Is this still the case?

  3. “However, and coming back to my original question, is it of scientific value to measure the perception of professionals of armies' performances?”

    I don’t think so….and let me give you a story as to why.

    In the late 1970s, Trevor Dupuy wrote NPW and Genius for War, discussing the advantages the German Army had relative to the United States Army in World War II (the same theme that Creveld picked up). Over the next two decades, four books came out with the theme that this was not the case, some with very caustic views of Trevor Dupuy’s and others' work. They were all written by members of the U.S. Army, who had all graduated from U.S. Army CGSC (Command and General Staff College), and one author eventually rose to become the Chief of U.S. Military History. As is clear from my book War by Numbers, I do not believe there is much validity to their arguments, and this was also discussed in depth in Niklas Zetterling’s book on Normandy (Zetterling is Swedish). Their counter-arguments were not based upon any significant collection of data, yet they were still professionals (including Gulf War veterans). So, if you polled Dupuy, Creveld, Zetterling or me you would get one opinion; if you polled the other four, you would get the opposite opinion.

    Quite simply, nationalist bias, prejudices and other such issues are so predominant in these discussions that I don’t think there is much value in polling opinion. For example, Dupuy also pointed out that the Israelis were better than the Arab armies they faced, yet no one took him to task over that issue. It was only the claim that the Germans were better than the Americans that got people up in arms.

    As for Red Army performance, I would direct your attention to my Kursk book. In there I put in over 100 engagement sheets showing the strength and losses for each side for each roughly division-sized engagement in the Belgorod Offensive from 4-18 July 1943. These could be entered into a spreadsheet and used to independently compare the performance differences between the German and the Soviet armies. I did some of this in Chapter 6 of War by Numbers.

  4. It is also important, I think, to draw a distinction between Dupuy’s concept of relative combat effectiveness and the larger, broader inquiry by scholars of international relations/political science and security and strategic studies into the subject of military effectiveness.

    Dupuy’s concept applies specifically to the phenomenon of military combat, that is, the force-on-force engagement between two organized combat elements. It is also relative in the sense that his measurements of combat effectiveness were most directly applicable between two sides in a particular conflict, i.e. Germany vs. the Western Allies in World Wars I and II; Germany vs. the Soviet Union in World War II; and the Israelis vs. various Arab forces in the 1967 and 1973 wars.

    Dupuy was clearly interested in the determinants of why the forces of one country would consistently outperform those of another in military combat and he explored the possibility that combat effectiveness might be a quality more broadly applicable. As Chris has argued above, however, it is not clear that combat effectiveness is, in fact, an objective quality that transcends relative comparison between direct combatants.

    The study of military effectiveness has become an academic sub-discipline in its own right and encompasses assessment of the entire range of national military power, beyond the battlefield effectiveness of combat forces (though that is explored as well). Many of Dupuy’s critics conflated his concept of combat effectiveness with claims that it purported to be a broader measure of military effectiveness. Effectiveness in combat (specifically ground combat) would only be one of many constituents of a broader measure of national military effectiveness. There is a large, rich literature on the subject, but a good place to start would be the three-volume study edited by Allan Millett and Williamson Murray, _Military Effectiveness_ (Unwin Hyman: Boston, 1988).

    • Actually they went even below the belt and directly accused him of being a “prussophile” and having some sort of agenda. On the other side of the spectrum, the counterarguments of certain individuals like Ambrose or Mansoor usually revolved around “the GI was better because it is simply so, because he was better at improvising”….

  5. You have all raised valid points that I think require much discussion. Let me give you my perspective, and please note that I am not a professional historian or even in the military; I am in academia. When I was drafted into the army about 20 years ago I served in Leopard 1 MBTs. Nowadays the Greek army has far more modern MBTs, like the Leo 2A4 and Leo 2HEL, yet if you check the equipment lists you see only a total number of MBTs, which disregards the huge differences in battle capability and performance of the material. Therefore, when I read that Greece currently has a huge number of MBTs (around 1700), I am confused, as most are M48A5s, M60A3s and Leo 1A5s without any upgrade or any chance of surviving in a modern war. You know what happened to Iraqi armor when it met US Abrams. Yet this is the case for all kinds of equipment; the quality factor is usually not taken correctly into account, if it is taken into account at all, in the studies I am aware of. I repeat that I am not familiar with your work or the other literature you mentioned (it is not easily available here); I stumbled upon your webpage while browsing the internet.
    The way I see it, it would be interesting to check whether modern Americans, Europeans and (why not) Russians have the same perception of their armies' performances in WW2. This is a topic 75 years old, so nationalist bias, etc., may be less than it would be for a modern topic (Syria or Ukraine, for instance). The experts can maybe assess the quality value better than raw quantitative data can. Therefore even assessing the difference of opinion between modern Americans, Europeans, etc. may be of some value, in order to compare the rankings produced. Multiple criteria decision making (MCDM) models can assist with this, as they offer full group decision making capabilities and include both quantitative and qualitative data (usually in the form of a Likert scale). Not all experts need to have the same views; how can you say who is right or not? Regardless, group PROMETHEE, for instance, can tackle this. Each expert can weigh the criteria on his/her own and apply the rest of the model parameters the way he/she sees fit; the results are then aggregated into a final complete ranking (for each country). I have worked on MCDM for years and the methods have been applied to numerous domains, but not, to my knowledge, to the military.
    I just ordered “War by Numbers”; I need to read it through.
    Once again, thank you all for your feedback.
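
As a rough illustration of the group approach described in the comment above, here is a minimal Python sketch of PROMETHEE II net flows computed separately for each expert and then averaged. It is a simplification under stated assumptions, not the full group PROMETHEE procedure: it uses only the simplest ("usual") preference function, it combines experts by a plain average of their net flows, and the function names, the two criteria, the Likert scores, the weights and the placeholder armies are all invented for the example.

```python
def net_flows(scores, weights):
    """PROMETHEE II net outranking flows with the 'usual' preference function.

    scores:  {alternative: [score per criterion]}, higher is better.
    weights: [weight per criterion]; normalized here so they sum to 1.
    """
    total = sum(weights)
    w = [x / total for x in weights]
    names = list(scores)
    n = len(names)

    def pref(a, b):
        # Weighted share of criteria on which a strictly beats b.
        return sum(wj for wj, ga, gb in zip(w, scores[a], scores[b]) if ga > gb)

    flows = {}
    for a in names:
        positive = sum(pref(a, b) for b in names if b != a) / (n - 1)
        negative = sum(pref(b, a) for b in names if b != a) / (n - 1)
        flows[a] = positive - negative
    return flows


def group_ranking(expert_inputs):
    """Average the experts' net flows (an assumed aggregation rule), rank best first."""
    per_expert = [net_flows(scores, weights) for scores, weights in expert_inputs]
    names = list(per_expert[0])
    combined = {a: sum(f[a] for f in per_expert) / len(per_expert) for a in names}
    return sorted(combined, key=combined.get, reverse=True), combined


# Hypothetical 1-5 Likert scores on two made-up criteria: [training, equipment].
expert_1 = ({"Army X": [5, 3], "Army Y": [3, 4], "Army Z": [2, 2]}, [2, 1])
expert_2 = ({"Army X": [4, 2], "Army Y": [4, 5], "Army Z": [1, 3]}, [1, 1])

ranking, combined = group_ranking([expert_1, expert_2])
print(ranking)
```

Each expert supplies his or her own scores and criterion weights, so disagreement between experts is kept in the inputs instead of being argued away beforehand. The output is only a ranking of aggregated opinion. It does not address the objection raised earlier in this thread that such opinions may reflect national bias more than data.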

  6. Dr. Papathanasiou,

    Thank you for your interest. There have been many attempts made to weight the value of weapon systems, including so-called “static measures” like firepower scores and combat power scores, and so-called “dynamic measures” like comparisons of SSPKs (single-shot probability of kill) for weapons versus targets. Obviously, for a tank vs. tank fight, an SSPK is a very useful measurement. The problem is that the value of a tank on the battlefield is more than just its ability to kill other tanks. There is a history of discussion on methodologies that started in the late 1950s, when the first combat models were being developed (Carmonette). As one can probably tell from perusing this site, these issues are far from being resolved.
