A synopsis for those whose eyes have glazed over

Perhaps you could give a synopsis of it for those of us whose eyes glaze over when reading technical papers like this.

I’m no expert but I did take the holiday weekend to digest the paper and make some notes. Here is my synopsis, for what it is worth. Feel free to make corrections and additions.

Basics of cladistics

Organisms are classified according to a hierarchical system (taxonomy) first established by Carl Linnaeus. This system has traditionally been based on physical features, or morphology. With the advent of biotechnology there has been an ongoing effort to augment or replace morphology with DNA as the basis for taxonomy. These new approaches have taken on the name of cladistics. It is currently as much art as science but trending toward the latter.

This study examines the state of this art with respect to roses and compares three different DNA technologies in an effort to decide which approach is best, while also giving useful taxonomies.

In all three methods DNA is processed with enzymes to create small fragments. The sequences of nucleic acids (the ‘genetic code’) are then compared using one of the three methods. For each of the three methods a taxonomy (phylogenetic tree) is given. These methods are:

  1. Unweighted Pair Group Method with Arithmetic mean, or UPGMA. This method yields a tree called a phenogram, which is based essentially on similarities among the gene sequences. UPGMA is generally regarded as the weakest of the three methods. The phenogram is given as figure 1.

  2. Parsimony. This method assumes that the simplest (most parsimonious) evolutionary path is the correct one. From among many parsimony techniques the authors chose Wagner Parsimony to include in their study, giving a number of reasons for this choice. Wagner Parsimony yields a “most parsimonious tree”, or MPT. In this study the authors actually made eight MPTs, but published only one in their paper, which they said is the most parsimonious of the trees that they calculated. The MPT is given as figure 2.

  3. Bayesian Inference. This technique is based on complex algorithms that use only Bayesian statistics (and so, no assumptions about the data) to generate a phylogenetic tree. The tree has no formal name; the authors simply call it “tree from the Bayesian analysis” and it is given as figure 3.
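For anyone curious what the first method actually does, here is a minimal sketch of UPGMA clustering. It is not the authors' software, and the labels A to D and their distances are made up purely for illustration. The idea is simply: repeatedly join the two closest groups, then set the new group's distance to everything else as the average of the old distances, weighted by group size.

```python
from itertools import combinations

def upgma(names, dist):
    """Return a nested-tuple tree built by UPGMA.

    names: list of labels
    dist:  dict mapping frozenset({a, b}) -> distance between a and b
    """
    sizes = {name: 1 for name in names}   # cluster -> number of leaves in it
    d = dict(dist)                        # working copy of the distances
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # New distance = size-weighted arithmetic mean of the old ones.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    return next(iter(sizes))

# Hypothetical distances between four made-up samples.
toy = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6,
    frozenset(("A", "D")): 10, frozenset(("B", "C")): 6,
    frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
print(upgma(["A", "B", "C", "D"], toy))
```

In this toy case A and B are joined first because they are the most similar, then C joins them, then D. Note that the method only looks at overall similarity, which is why it yields a phenogram rather than a tree based on an explicit model of evolution.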

Each tree has a caption, but each caption is presented two pages down from its respective tree, due to the stylistic rules of publication for the American Journal of Botany. Thus, you will need to scroll about to view the caption for a given tree. It is annoying, but the captions are not so informative to us anyway.

The authors discuss differences among the three trees, but for our purposes they are pretty nearly the same. The authors believe that the Bayesian technique is probably best, so use the third tree if you trust their judgement.

For a general discussion on these methods see

It is also important to know that a given entry on a tree represents a single plant. Some species and some named hybrids are represented by more than one plant and so appear more than once on a tree. The major source of the plants used in this study was the Wageningen Botanical Garden, although some plants - especially hybrids - were obtained elsewhere from sources not identified.

How these trees help rose breeders

Probably the most important point from the study is this:

“Apparently, the cultivated gene pool [i.e. the various _hybrid_ roses used in the study] is relatively isolated from the wild gene pool [i.e. the true species roses]. This relatively isolated position is in line with the common practice of breeding new varieties by interbreeding of existing varieties, with only a limited use of wild germplasm. Our results show that the segregation of cultivated and wild accessions is not complete because several wild species cluster with the varieties. This incomplete segregation is also apparent in the results of Scariot et al. (2006) and may be a direct indication of a limited use of wild germplasm in the cultivated gene pool.”

Basically, some wild species have been used more than others and overall there is plenty of room left for breeders to incorporate species into the modern cultivars.

The species which have been used most often are those which cluster closely to the hybrid cultivars in the trees. The ones with the least contribution are those farthest away from the hybrids. This is no doubt old news to rose breeders but the trees do show which species contributed most to a given plant and so can help inform breeding strategies.

The trees can also help by application of the principle that the more closely related two individual plants are, the more likely that a cross can be successfully made.

The basic rule to use when interpreting the trees is the obvious one - the closer two plants are to each other the more closely related they are. The trees vary from each other slightly, but mostly they are very similar. Where they differ, it is typically because one method shows one plant as having evolved from the other, while another method shows the two as ‘sisters’ which have evolved from a common ancestor.

The authors mention, too, that in a number of instances there appears to be a ‘missing’ common ancestor, a relationship termed “polyphyletic” or “paraphyletic”. The major example cited is the subgroup Pimpinellifoliae, which (they claim) includes R. foetida, R. sericea, R. persica, R. roxburghii, R. hugonis and R. spinosissima, and they debate the pros and cons of lumping these in one group as opposed to splitting some out into separate subgenera. They explicitly say that R. persica (hulthemia) deserves to be demoted, and is not actually a subgenus as many taxonomists have previously claimed.

You will also see the term ‘clade’ used. “A clade is a taxonomic group (such as one of organisms) comprising a single common ancestor and all the descendants of that ancestor” (from Wikipedia).

So roses with a polyphyletic relationship are by definition not a clade. But for purposes of informed breeding, and with the level of resolution these particular trees provide, such distinctions are essentially hair-splitting.

For a better explanation see


Hybridization muddies the water

You will notice that some plants, particularly R. multiflora, show up in several different places on the tree. This is because the authors sometimes used several different cultivars of that plant in their study, and these happened to have genes different enough to fall out in different places on the tree. This should not be surprising, especially for R. multiflora, given its wide use as a rootstock and its broad geographic distribution in nature, both of which increase the likelihood that it has been hybridized, deliberately or not. The genes which have mixed in cause a particular cultivar to fall closer to some modern ancestors than to the true wild type or species. One should not forget that rose breeding had been going on for over a thousand years in China before any of the Asian roses made it to western Europe.

Indeed, hybridization is mentioned by the authors as the big problem in establishing phylogenetic relationships in roses. At the same time, these trees do give clues about the dominant genetic contributions made in their hybridization, and breeders can use these clues in planning their breeding programs.

Unfortunately, and for no stated reason, two of the most important species of Rosa were not included in this study: R. chinensis and R. gigantea. It is easy enough to guess why these were omitted. The authors are attempting to establish a baseline tree to be used for further study, and to allow discussion about which of the methods is best suited for the purpose. Some China roses, especially the teas, have been so heavily hybridized into modern cultivars that further work is going to be needed before the techniques have sufficient resolving power to discriminate among them. This is evidenced by the thicket assigned as “Rosa” on the trees, which includes most of the hybrid cultivars.

In addition to the generalizations above, there are some minor specifics in the paper that deserve attention.

You will come across mention of “satellite DNA” in several places in the study, particularly in relation to the Canina group. It has recently been shown that R. canina has not only the usual complement of normal chromosomes in the nucleus, but also a group of DNA fragments that do not form a chromosome and which are present as a separate little cluster (during meiosis) in the nucleus. This group of DNA fragments is what is meant by satellite DNA. It seems so far to be unique to R. canina and its offspring and, as can be imagined, is a source of potential confusion to the taxonomists, who have not so far figured out what it means or how it came about.

You will see the term ‘bootstrap’ in the study with respect to the first two trees. The numbers at some branches on these two trees are the bootstrap values, which have to do with the calculated validity of the branches presented.
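To give a feel for where a bootstrap number comes from, here is a very simplified sketch. It is not the procedure used in the paper: real bootstrapping rebuilds the whole tree for every resampled data set, whereas this toy version only asks whether two samples stay grouped together, apart from a third, when the DNA characters are resampled with replacement. The sample names and the 0/1 fragment profiles are invented.

```python
import random

def mismatch(x, y, cols):
    """Count the resampled positions at which two profiles differ."""
    return sum(x[i] != y[i] for i in cols)

def bootstrap_support(profiles, pair, outsider, replicates=1000, seed=0):
    """Percent of resampled data sets in which `pair` is closer to each
    other than either member is to `outsider`."""
    rng = random.Random(seed)
    a, b = (profiles[name] for name in pair)
    c = profiles[outsider]
    n = len(a)
    hits = 0
    for _ in range(replicates):
        # Draw n character positions with replacement.
        cols = [rng.randrange(n) for _ in range(n)]
        if mismatch(a, b, cols) < min(mismatch(a, c, cols), mismatch(b, c, cols)):
            hits += 1
    return 100 * hits / replicates

# Hypothetical presence/absence (1/0) profiles for three made-up samples.
toy_profiles = {
    "sample1": [1, 1, 0, 1, 0, 0, 1, 1, 0, 1],
    "sample2": [1, 1, 0, 1, 1, 0, 1, 0, 0, 1],
    "sample3": [0, 1, 1, 0, 1, 1, 0, 0, 1, 1],
}
print(bootstrap_support(toy_profiles, ("sample1", "sample2"), "sample3"))
```

A value near 100 means the grouping survives almost any reshuffling of the evidence; a low value means the grouping depends on just a few characters and should be treated with suspicion.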

In the Bayesian (third) tree there are numbers given at some branches in the tree. These numbers are analogous to bootstrap values but represent the formal statistical probability that the branch is real (and not an artifact of the method used). A probability index of 0.8 would therefore mean that there is about an 80% probability of the branch actually existing. Where no probability index is given, this means that the calculated probability was below some critical threshold and so is in relative doubt. You will notice a lot of branches that were detected but which have probabilities that are questionable. The authors are frank about this and much of the results section discusses the merit of particular branches.

The importance of these numbers is that the authors claim the Bayesian probabilities are better than the bootstrapping values, and use this to support their contention that the Bayesian (third) tree is therefore most representative of the relationships among the roses tested.

You will note at the bottom of each tree there are three plants from the genus Rubus. These are the blackberry, European dewberry and the wineberry. They are included in the analysis as experimental controls. The idea is that because they are known to not be species of the genus Rosa they had better fall into their own genus in the computed results.

The authors discussed briefly, but rejected as immature, a method employing network analysis (as opposed to the three other methods used in this study). My personal opinion is that when this network methodology matures it may well become a very useful tool for breeders because it will allow us to have non-hierarchical, multi-dimensional maps that show the precise genetic relationship of every cultivar with respect to every other. In this way a tree becomes a network that represents not only evolutionary data but also hybridization relationships. Imagine the usefulness of a map that allows the breeder to pick a cultivar and then see its nearest genetic neighbors, and their nearest genetic neighbors, and so on. This is the likely direction that Bayesian techniques are heading toward.

Ooooo, thank you Don, this is my kind of reading! (Must stop drooling…)


Yes, thank you Don,

This is great. You spent quite a lot of time on it. Now it makes much more sense.


Don, thanks for your explanation of the article. I like this kind of research so it’s nice that you make it more accessible.

Two notes about the article:

I have been browsing roses for two years in the Wageningen Botanical Garden and I noticed some plants are mislabelled, especially the hybrids. I saw last summer they did a job on re-identifying the roses and they put new labels on them. However, I hope they compared each rose with herbarium specimens to verify. They probably did, but I can’t be sure, since the list is not in the PDF versions of the article. I can’t check if they mentioned their protocol for choosing and verifying the identities of the plants they used.

There may also be another reason for not including R. chinensis and R. gigantea into the tree: they are not available in the Botanical Garden.


Rob, Paul, Fa,

My pleasure. It is an important paper and the phylogenetic tree will be a standard reference until someone else refines it.

I hope they compared each rose with herbarium specimens to verify.

That assumes the herbaria got the taxonomy right. I’m finding this is mostly not the case as I try to track down omeiensis cultivars, quite a mess.

Down the pike we will see morphology take a back seat to genetics for taxonomy purposes anyway.

they are not available in the Botanical Garden

I thought so at first but they state that they went out of their way to obtain cultivars that were not in their collection. My guess is that they ran the algorithms with chinensis and gigantea included and the results were so muddled that they had to take them out. Maybe we will soon know; I have sent a note to the corresponding author and that was one of my questions.



Thank you for the quite accessible explanations.

As for the ability to sort species at the Wageningen Botanical Garden, one can suppose they have some, since they are studying the Caninae, which are probably the least understood rose group.

Hi Rob

I think you can ask them.

Some forty years ago, as a French apprentice greenhouse vegetable grower, I went to Wageningen and, unannounced, met the world’s top lettuce specialist, who spent a full hour showing me the work he was doing with his staff.

I have talked with many authors of papers the world over. It is easy and quite instructive.

Most scientists do like to explain their work and research, even to would-be growers or hybridizers.

One last word: it is advisable that we do not all phone Wageningen. The second caller could be welcome but the third…???

So Rob if you could be kind enough to report…

Dr. Wim Koopman, the corresponding author for this paper on phylogenetic trees, graciously responded to an inquiry I made of him about his paper. A very nice response, and oddly enough he is a lettuce researcher too.

He answers the question raised by Rob about authentication of the cultivars used in the study. It seems to be “all of the above”:

“We originally included more species in our work, but later found out that some of the material was either identified incorrectly, or seemed to have hybrid influences of other species. These doubtful entries were omitted from further research, and that’s why some species (including R. chinensis and R. gigantea) are missing. Also, some species were just not available to us at the time we collected samples.”

He expressed his own sentiment that the tree can be used to predict cross compatibility, though he had reservations about interference by ploidy levels. He said some of his colleagues disagree about being able to predict compatibility, but that he has himself used similar trees to do so in his work with lettuce.

He explained in detail how to read genetic distance between species on the three trees, which I will outline here with an example. I give the method for the tree in figure 3. (Figure 2 is read the same way, but figure 1 is read differently and if you would like an explanation let me know.)

Most closely related species do appear next to each other on the tree.

The lengths of the vertical lines on the tree are meaningless with respect to genetic distance. However, the lengths of the horizontal lines are a direct reflection of genetic distance.

To calculate genetic distance between two cultivars, measure the length of all horizontal lines on the direct path between them. Add to this the length of the horizontal ‘stub’ that marks the uppermost branch between them. Ignore the length of the vertical lines.
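If you would rather let a computer do the adding, the counting rule just described can be sketched as follows. This is only an illustration of the rule as I have stated it: the little tree, its node names and its branch lengths are all invented, and each entry records a node's parent and the horizontal length of the line leading to that node.

```python
# Hypothetical toy tree: node -> (parent, horizontal length of its branch).
TREE = {
    "root": (None, 0),
    "X": ("root", 40),   # internal branch point
    "A": ("X", 100),
    "Y": ("X", 30),      # internal branch point
    "B": ("Y", 50),
    "C": ("root", 200),
}

def path_to_root(node):
    """List the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = TREE[node][0]
    return path

def tree_distance(a, b):
    """Sum the horizontal lines on the path between a and b, plus the
    'stub' leading into their uppermost shared branch point."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    on_b = set(up_b)
    # First node above a that also lies above b is the uppermost branch point.
    common = next(n for n in up_a if n in on_b)
    total = 0
    for path in (up_a, up_b):
        for n in path:
            if n == common:
                break
            total += TREE[n][1]   # horizontal segments only
    return total + TREE[common][1]  # add the stub

print(tree_distance("A", "B"))  # 100 + (50 + 30) + stub 40 = 220
print(tree_distance("A", "C"))  # (100 + 40) + 200 + stub 0 = 340
```

Vertical lines never enter the sum, which matches the point that they carry no information about genetic distance.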

For the distance between R. hugonis and R. blanda, there are eight segments to measure plus the stub. They are shown in red in the example below (hopefully there are no red-colorblind readers). This can be easily done with photo editing software, just measure the pixels.

R. hugonis to R. blanda: 762 pixels.

To compare three cultivars you could just make the raw distance measurements between each and subtract, and this would be a valid metric. But I think expressing it as the percentage difference is more informative.

R. hugonis to R. sericea: 634 pixels.

Thus, hugonis is about 17% more closely related to sericea than it is to blanda [ (762-634)/762 = 0.168 ]. All other things being equal, it should be easier to cross hugonis with sericea than with blanda. So maybe it is no coincidence that most of the documented F1 hybrids with hugonis are with sericea (at least two of which were spontaneous), and there are none with blanda.

R. hugonis to R. persica: 453 pixels.

So hugonis is 29% more closely related to persica than it is to sericea [ (634-453)/634 = 0.285 ]. Theoretically it should be easier to get crosses of hugonis with persica than with sericea, yet there are none. However, Xerxes’ papa was Canary Bird and a grandpa was hugonis. Perhaps there should be more work done in this direction, especially since hugonis opens a path to the spinosissima, and from there into modern roses.
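The two percentages worked out above can be reproduced with a few lines of code, which is handy if you measure more pairs from the snapshot. The pixel values are the ones I measured; the function name is just something I made up for the illustration.

```python
def percent_closer(far_pixels, near_pixels):
    """Fraction by which the nearer plant is closer than the farther one."""
    return (far_pixels - near_pixels) / far_pixels

# Measured distances from R. hugonis, in pixels, on the figure 3 tree.
hugonis_to = {"blanda": 762, "sericea": 634, "persica": 453}

# sericea versus blanda
print(round(percent_closer(hugonis_to["blanda"], hugonis_to["sericea"]), 3))   # 0.168
# persica versus sericea
print(round(percent_closer(hugonis_to["sericea"], hugonis_to["persica"]), 3))  # 0.285
```

Because this is a ratio of two measurements taken from the same image, it does not matter what size the picture is, as long as both distances are measured on the same copy.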

The wild card in all this, of course, is the missing chinensis data. It would be good to have gigantea, ecae, xanthina and a few others too. I asked about future work in this direction but he has none in the pipeline. However, his colleague Dr. Rene Smulders of Plant Research International has been and will be working on other rose projects.

I have made a snapshot of the tree if anyone wants to play with it. You can download it from

Link: holeman.org/images/rosatree.jpg

"“We originally included more species in our work, but later found out that some of the material was either identified incorrectly, or seemed to have hybrid influences of other species.”

Yeah, I bet Rosa californica can have woodsii, pisocarpa, acicularis, nutkana, gymnocarpa in them. Also, I theorize that some strains of Rosa californica are actually naturalized hybrids with Rosa rubiginosa and Rosa canina from when they were introduced as rootstocks a long, long time ago. I bring this up because I have seen a huge variation in Rosa californica from region to region. So much so that it doesn’t seem probable that it is 100% environmental selection.

Where do you think bracteata, longicuspis, and clinophylla fit in here?

Where do you think bracteata, longicuspis, and clinophylla fit in here?

One of the phylogenetic trees I have shows clinophylla and bracteata as being grouped together under ‘brac’ and next to Sericea.

I have no data on longicuspis but, being a Synstylae, it is probably next to wichuraiana and multiflora.

One of the problems with these trees is that they are two-dimensional and hierarchical. The reality is that, because of hybridization, such models are inherently inaccurate. We really have to await multidimensional network models to give us truly accurate metrics of relatedness.