
Constituting Core collections of Germplasm using morphological descriptors

R. Balakrishnan, Sugarcane Breeding Institute, Coimbatore 641 007.


Presentation Transcript


  1. Constituting Core collections of Germplasm using morphological descriptors
     R. Balakrishnan, Sugarcane Breeding Institute, Coimbatore 641 007

  2. Major issues involved in the management of a large gene bank (germplasm collection) are:
     • Year-to-year maintenance of large germplasm collections requires enormous amounts of land, time, labour and other resources
     • Use of the collection is limited by a lack of knowledge of how genetic diversity is distributed in it
     • Users are not fully aware of the variation in the collection that could benefit their breeding programmes or enrich their research projects
     • It is difficult to decide whether gaps exist or whether new material has to be added to the collection

  3. How to overcome the problems in utilizing large germplasm collections?
     • By short-listing the field-evaluated germplasm and earmarking a set of accessions holding promise for one or more traits, called a working collection (Harlan, 1972)
     • By adopting the concept of the core collection (Frankel, 1984) to identify such limited sets for effective use of the collection

  4. What's a core collection?
     A core collection, or core subset, is a sub-sample of the base collection (about 5–20% of its size). It is sampled so as to represent the genetic variability available in the base collection to the maximum possible extent, with minimum duplication (or redundancy).

  5. Scientific basis for setting up a core collection (Brown, 1989)
     • The first reason is based on statistical sampling considerations, which essentially assume that breeders, through crossing and selection, could recover desirable alleles from the core collection when required; hence, in principle, they need access to only one copy of such alleles
     • The second reason relates to the genetic structure of plant populations in general and of germplasm collections in particular
     • The third reason relates to easier management of, better access to and better exploitation of germplasm collections

  6. Advantages of core collections
     • For breeders, a core collection is a logical first step in screening the collection for desirable alleles
     • Setting up a core collection helps in understanding the quality of the base collection itself, as it elucidates the contents, diversity and duplication in the base collection
     • It helps in deciding the quantity of seed stocks to be conserved: smaller seed stocks for the reserve collection (non-core set) and larger seed stocks for the core collection
     • Because the core is small, the time and resources needed to evaluate a new trait are reduced, which allows more characters to be evaluated and more sophisticated techniques such as molecular markers to be used

  7. Core collection scenario
     Reports of the Global Survey on core collections by the International Plant Genetic Resources Institute (IPGRI) indicate that at least 63 core collections covering 51 crop species have been formed across the world.

  8. Procedures for constituting a core collection
     • Use compiled passport data and evaluation data on qualitative and quantitative traits from germplasm catalogues
     • Constitute appropriate groups wherever possible
     • Use a suitable sampling procedure to select the entries for the core collection
     • Verify and validate the selected core collection
     • In some instances, core collections have been assembled from a combination of morphological data and biochemical and molecular markers

  9. Statistical / sampling methods for constituting a core collection
     • Simple random sampling (no evaluation data on morphological descriptors needed)
     • Stratified random sampling, i.e. first discovering some structure in the base collection by forming groups through:
       – stratification on the basis of the geographical origin of the accessions (passport data needed)
       – stratification on the basis of multivariate cluster analysis (evaluation data on morphological descriptors required)
       – stratification on the basis of a combination of geographical origin and cluster analysis, or other schemes applicable to the crop species (both passport and evaluation data required)
     • Purposive or directed sampling aimed at maximizing the diversity in the core, using principal component scores (Noirot et al., 1996) or an information measure (Balakrishnan, 2002)

  10. How to decide the optimum number of entries in the core collection?
     • By studying the relative efficiencies of different stratified sampling procedures through simulation, i.e. by estimating the sampling variance of a diversity measure for varying sample sizes
     • Normally, the sampling variance of a pooled Shannon Diversity Index (SDI) of the descriptors is a useful criterion
     • Since the sampling variance of the SDI cannot be estimated through formulas, we resort to simulation or bootstrap procedures to estimate it, and choose as the core collection size the one beyond which there is no appreciable reduction in the sampling variance
     • The best stratification method is the one that, even for a smaller sample size, gives a high value of diversity with minimum sampling variance of the diversity measure
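
The simulation approach described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's program: the base collection is hypothetical random data, "pooled SDI" is assumed to be the sum of the per-descriptor Shannon indices, and repeated simple random subsampling (without replacement) stands in for the simulation/bootstrap step. All function names are illustrative.

```python
import math
import random
from collections import Counter
from statistics import mean, pvariance


def sdi(states):
    """Shannon diversity index, -sum(p ln p), of one descriptor's observed states."""
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in Counter(states).values())


def pooled_sdi(accessions):
    """ASSUMPTION: pooled SDI is taken as the sum of the per-descriptor SDIs."""
    n_desc = len(accessions[0])
    return sum(sdi([acc[d] for acc in accessions]) for d in range(n_desc))


def sdi_sampling_variance(base, size, reps=200, seed=1):
    """Mean and variance of pooled SDI over repeated random subsets of one size."""
    rng = random.Random(seed)
    values = [pooled_sdi(rng.sample(base, size)) for _ in range(reps)]
    return mean(values), pvariance(values)


# Hypothetical base collection: 690 accessions scored for three qualitative descriptors
gen = random.Random(0)
base = [(gen.choice("AB"), gen.choice("XYZ"), gen.choice("PQRST")) for _ in range(690)]

for size in (70, 100, 140, 170, 210):
    m, v = sdi_sampling_variance(base, size)
    print(f"n={size:4d}  mean pooled SDI={m:.4f}  variance={v:.6f}")
```

In use, one would look for the size beyond which the printed variance stops falling appreciably, and repeat the run with stratified rather than simple random draws to compare stratification schemes.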

  11. Constituting a core subset by stratified random sampling
     • Having decided the size of the core collection, say 10% of the base collection, allocate accessions at random from each group to the core subset using one of the following methods:
       – Proportional to frequency method (P strategy): when group diversity is proportional to group size
       – Proportional to logarithm of frequency method (L strategy): when group sizes differ widely
       – Constant frequency method (C strategy): when diversity is concentrated in smaller groups
       – Proportional to diversity method: when sampling depends on some measure of the diversity of each group
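
The four allocation rules can be sketched as follows. This is a minimal sketch, assuming largest-remainder rounding and capping of any group at its own size (neither is specified in the slides). It uses the group sizes of the geographical-origin grouping of S. officinarum given in slide 12; taking the group pooled SDI as the "diversity" weight for the last rule is an assumption, and the accession IDs are hypothetical.

```python
import math
import random


def allocate(sizes, n, weights):
    """Split n core places among groups in proportion to the given weights.

    Largest-remainder rounding; a group whose share would exceed its size is
    capped at its size and the surplus is re-spread over the other groups.
    """
    alloc = {}
    open_w = dict(weights)
    remaining = n
    while open_w:
        total_w = sum(open_w.values())
        raw = {g: remaining * w / total_w for g, w in open_w.items()}
        capped = [g for g, r in raw.items() if r > sizes[g]]
        if not capped:
            floors = {g: math.floor(r) for g, r in raw.items()}
            left = remaining - sum(floors.values())
            by_remainder = sorted(raw, key=lambda g: raw[g] - floors[g], reverse=True)
            for g in by_remainder[:left]:
                floors[g] += 1
            alloc.update(floors)
            break
        for g in capped:
            alloc[g] = sizes[g]
            remaining -= sizes[g]
            del open_w[g]
    return alloc


def stratified_sample(groups, alloc, seed=0):
    """Draw alloc[g] accessions at random (without replacement) from each group."""
    rng = random.Random(seed)
    return [acc for g, k in alloc.items() for acc in rng.sample(groups[g], k)]


sizes = {"New Guinea": 364, "Indonesia": 68, "New Caledonia": 36, "Fiji": 18,
         "India": 22, "Hawaii & Mauritius": 9, "Other regions": 173}
group_sdi = {"New Guinea": 31.01, "Indonesia": 29.61, "New Caledonia": 27.82, "Fiji": 26.17,
             "India": 27.57, "Hawaii & Mauritius": 23.65, "Other regions": 31.19}
strategies = {
    "P": dict(sizes),                                  # proportional to group size
    "L": {g: math.log(s) for g, s in sizes.items()},   # proportional to ln(group size)
    "C": {g: 1.0 for g in sizes},                      # constant per group
    "D": group_sdi,                                    # ASSUMPTION: diversity = group pooled SDI
}
core_size = 70  # about 10% of the 690 accessions

for name, w in strategies.items():
    print(name, allocate(sizes, core_size, w))

groups = {g: [f"{g}-{i}" for i in range(s)] for g, s in sizes.items()}  # hypothetical IDs
core = stratified_sample(groups, allocate(sizes, core_size, strategies["L"]))
print(len(core))
```

Under the C strategy the seven groups would each get 10 places, but the Hawaii & Mauritius group has only 9 accessions, which is why the sketch caps and redistributes.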

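  12. Method of grouping: geographical origin
      Grouping of S. officinarum accessions for stratified sampling of the core set
      _____________________________________________________________________
      Group ID   Accessions from       No. of accessions   Group pooled SDI
      _____________________________________________________________________
      1          New Guinea            364                 31.01
      2          Indonesia             68                  29.61
      3          New Caledonia         36                  27.82
      4          Fiji                  18                  26.17
      5          India                 22                  27.57
      6 & 7      Hawaii & Mauritius    9                   23.65
      8          Other regions         173                 31.19
      _____________________________________________________________________
      Between-group : within-group diversity component = 58 : 42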

  13. Method of grouping: cluster analysis + major sources within clusters
      _____________________________________________________________________________
      Group ID   Accessions from               No. of accessions   Group pooled SDI
      _____________________________________________________________________________
      1          New Guinea in cluster 1       165                 30.50
      2          Indonesia in cluster 1        32                  31.65
      3          New Caledonia in cluster 1    22                  32.13
      4          Other regions in cluster 1    61                  33.18
      5          Cluster 2                     73                  30.95
      6          Cluster 3                     38                  31.31
      7          New Guinea in cluster 4       169                 30.09
      8          Indonesia in cluster 4        25                  29.79
      9          Others in cluster 4           83                  29.79
      10         Cluster 5                     22                  34.87
      _____________________________________________________________________________
      Between-group : within-group diversity component = 68 : 32

  14. Method of grouping: LEAV index
      Diversity groups on the basis of the LEAV index of the accessions
      ______________________________________________________________________________
      Group ID   LEAV range    No. of accessions   Group pooled SDI   Group mean LEAV
      ______________________________________________________________________________
      1          22.0–26.0     39                  20.04              24.90
      2          26.0–27.5     63                  23.98              26.70
      3          27.5–28.5     66                  26.25              28.00
      4          28.5–29.5     58                  27.61              29.00
      5          29.5–30.5     79                  28.80              30.00
      6          30.5–31.5     66                  29.78              30.90
      7          31.5–32.5     58                  30.66              31.90
      8          32.5–33.5     54                  31.81              32.90
      9          33.5–35.0     64                  32.93              34.30
      10         35.0–36.5     42                  33.55              35.80
      11         36.5–39.5     52                  34.96              37.90
      12         39.5–47.5     49                  36.68              41.90
      ______________________________________________________________________________
      Between-group : within-group diversity component = 71 : 29

  15. Purposive sampling methods
     • Principal Component Analysis method of Noirot et al. (1996)
     • In this method, a principal component analysis is carried out on the quantitative trait data of the base collection
     • The contribution of the i-th accession to the total variance of the system is computed as
       $$P_i = \sum_{j=1}^{t} y_{ij}^2$$
       where y_ij is the component score of the i-th accession on the j-th principal component and t is the number of principal components extracted
     • Then, for each accession in the base collection, its relative contribution (%) to the total GSS (Generalized Sum of Squares) is computed as
       $$CR_i = \frac{100\,P_i}{p \times t}$$
       where p is the number of accessions and t the number of traits; p × t is called the Generalized Sum of Squares, or GSS. (With the traits standardized to unit variance and all t components retained, the P_i sum to approximately p × t over the whole collection.)

  16. PCA method (contd.)
     • The accessions in the base collection are then arranged in descending order of their contribution to the GSS, and the cumulative contribution of successive accessions to the GSS is computed
     • A logistic model of the form
       $$\ln\frac{y}{A-y} = a + bn, \quad\text{i.e.}\quad y = \frac{A}{1 + e^{-(a+bn)}}$$
       is fitted to the cumulative values, where y is the cumulative contribution of the first n accessions and A is its upper asymptote
     • The rate of progress for this model is
       $$\frac{dy}{dn} = \frac{b\,y\,(A-y)}{A}$$
     • Either a fixed percentage (say 5–10%) of the top accessions is selected to form the core set, or the top accessions are included in the core set up to the point at which the rate of increase in their contribution to the GSS starts declining (see the figure in the next slide)
     • This method is useful for reducing redundancy in the core set
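
The ranking steps of the PCS method can be sketched with NumPy. This is a minimal sketch under stated assumptions: the traits are standardized with the population standard deviation and all t components are retained (so that p × t is exactly the total of the squared scores), the data are random and hypothetical, and only the fixed-percentage selection rule is shown; fitting the logistic curve is left out. The function name is illustrative.

```python
import numpy as np


def pcs_ranking(X):
    """Rank accessions by their % contribution (CR_i) to the generalized sum of squares.

    X is a (p accessions x t traits) array. Traits are standardized with the
    population SD, so with all t components kept the squared component scores
    sum to p * t over the whole table.
    """
    X = np.asarray(X, dtype=float)
    p, t = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Component scores Y = Z V, with V taken from the SVD of Z (all t components)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    Y = Z @ vt.T
    P = (Y ** 2).sum(axis=1)            # P_i = sum_j y_ij^2
    CR = 100.0 * P / (p * t)            # CR_i, % of GSS = p * t
    order = np.argsort(-CR)             # accessions in descending order of CR_i
    cumulative = np.cumsum(CR[order])   # cumulative contribution to the GSS
    return order, CR, cumulative


# Hypothetical data: 200 accessions x 5 quantitative traits
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * [1.0, 2.0, 0.5, 3.0, 1.5] + [10, 20, 5, 30, 15]
order, CR, cumulative = pcs_ranking(X)

core_fraction = 0.10                    # fixed-percentage rule (5-10% in the slide)
core = order[: int(round(core_fraction * len(order)))]
print(len(core), round(float(cumulative[len(core) - 1]), 2))
```

A trait with zero variance would make the standardization divide by zero, so such traits should be dropped before the analysis.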

  17. PCS method (contd.)
      [Figure referred to in slide 16; not recoverable from the transcript]

  18. Purposive sampling methods (contd.)
     • The second method is similar to the PCS method, but here each accession is ranked by an information measure, the Length of Encoded Attribute Value (LEAV), and the top-ranked accessions are included in the core set (Balakrishnan, 2002)
     • LEAV is evaluated using concepts from information theory (Shannon, 1948; Wallace and Boulton, 1968)
     • Each entry in the base collection is assigned a score by combining the evaluation data on a number of characters, whether qualitative or quantitative
     • LEAV can be treated as a diversity measure that indicates how far each individual lies from the centroid of all individuals
     • It can be used to group the accessions in a way similar to cluster analysis (but using a divisive algorithm to cluster the accessions)
     • A typical example of computing LEAV is illustrated in the next slide

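  19. Computing the LEAV index of an accession (* marks the accession's state for each descriptor)
      ______________________________________________________
      Descriptor            State              Freq    -ln(p)
      ______________________________________________________
      Weather marks         Present *          146     0.2367
                            Absent             39      1.5422
      Ivory marks           Present            153     0.1942
                            Absent *           32      1.7546
      Bud germpore          Apical             119     0.4490
                            Sub-Apical *       46      1.3917
                            Median             20      2.1919
      Geographical origin   New Guinea         62      1.1196
                            Indonesia *        26      1.9622
                            New Caledonia      19      2.2670
                            India              15      2.4901
                            Fiji               13      2.6236
                            Hawaii             6       3.3168
                            Mauritius          3       3.8764
                            Unknown origin     41      1.5250
      ______________________________________________________
      LEAV = 0.2367 + 1.7546 + 1.3917 + 1.9622 = 5.3452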
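
The LEAV computation above can be reproduced in a few lines of Python. This is a minimal sketch, assuming (as for the starred states in the worked example) that each state's cost is the negative natural log of its relative frequency among the 185 accessions; the function and variable names are hypothetical. Summing unrounded terms gives 5.3453; the slide's 5.3452 is the sum of the four-decimal values.

```python
import math

# Descriptor state frequencies from the worked example (185 accessions per descriptor)
freq = {
    "Weather marks": {"Present": 146, "Absent": 39},
    "Ivory marks": {"Present": 153, "Absent": 32},
    "Bud germpore": {"Apical": 119, "Sub-Apical": 46, "Median": 20},
    "Geographical origin": {"New Guinea": 62, "Indonesia": 26, "New Caledonia": 19,
                            "India": 15, "Fiji": 13, "Hawaii": 6, "Mauritius": 3,
                            "Unknown origin": 41},
}


def leav(accession, freq):
    """LEAV = sum over descriptors of -ln(relative frequency of the accession's state)."""
    total = 0.0
    for descriptor, state in accession.items():
        counts = freq[descriptor]
        total += -math.log(counts[state] / sum(counts.values()))
    return total


acc = {"Weather marks": "Present", "Ivory marks": "Absent",
       "Bud germpore": "Sub-Apical", "Geographical origin": "Indonesia"}
print(round(leav(acc, freq), 4))   # 5.3453
```

Rare states contribute large terms, so accessions carrying many uncommon states get high LEAV values, which is what makes the index usable for ranking and grouping.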

  20. Grouping of the accessions on the basis of the LEAV index
     1. The computed LEAV index values can be arranged as a frequency distribution and the entries divided into L strata with stratum boundaries x_1, x_2, …, x_(L−1).
     2. An optimum stratification can be obtained by choosing the boundaries so that the pooled within-stratum variance of the LEAV index is minimized. The stratum boundaries are fixed using the Dalenius formula (Jarque, 1981) through an interactive computer program:
       $$x_h = \frac{1}{2}\left[\frac{\int_{x_{h-1}}^{x_h} x\,f(x)\,dx}{\int_{x_{h-1}}^{x_h} f(x)\,dx} + \frac{\int_{x_h}^{x_{h+1}} x\,f(x)\,dx}{\int_{x_h}^{x_{h+1}} f(x)\,dx}\right]$$
       i.e. each boundary x_h lies midway between the means of the two strata it separates.
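
One common way to solve this condition numerically is to iterate it directly: start from some boundaries, compute the stratum means, reset each boundary to the midpoint of the two adjacent means, and repeat until nothing changes. The sketch below does this on empirical LEAV values, with sums replacing the integrals. It is an illustration under that assumption, not the program referred to above; the equal-frequency starting rule and the function name are arbitrary choices.

```python
from statistics import mean


def dalenius_boundaries(values, L, max_iter=100, tol=1e-9):
    """Iterate x_h = (mean of stratum h + mean of stratum h+1) / 2 until stable."""
    if L < 2:
        return []
    xs = sorted(values)
    n = len(xs)
    bounds = [xs[(h * n) // L] for h in range(1, L)]   # equal-frequency start
    for _ in range(max_iter):
        edges = [float("-inf")] + bounds + [float("inf")]
        strata = [[x for x in xs if edges[h] <= x < edges[h + 1]] for h in range(L)]
        if any(not s for s in strata):
            break                                       # empty stratum: stop here
        means = [mean(s) for s in strata]
        new = [(means[h] + means[h + 1]) / 2 for h in range(L - 1)]
        converged = max(abs(a - b) for a, b in zip(new, bounds)) < tol
        bounds = new
        if converged:
            break
    return bounds


leav_values = [21, 22, 23, 24, 25, 41, 42, 43, 44, 45]   # hypothetical LEAV indices
print(dalenius_boundaries(leav_values, L=2))             # [33.0]
```

This fixed-point iteration only reaches a local optimum, so in practice it is worth trying a few starting boundaries and keeping the one with the smallest pooled within-stratum variance.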

  21. Advantages of using LEAV for clustering
     • Multivariate cluster analysis of quantitative traits by hierarchical methods becomes complicated and unwieldy when the number of accessions is large
     • In general, a small proportion of accessions (say about 100 entries) is selected at random from the main set and clustered, and the remaining entries are then assigned to the already formed clusters (k-means clustering)
     • Cluster composition may differ depending upon the initial selection
     • In most cases, qualitative traits tend to be left out of these methods, although there are procedures for rescaling qualitative attributes to quantitative values

  22. Advantages of LEAV for clustering (contd.)
     • LEAV is very easy to compute and can include all evaluation data (including passport data), whether qualitative or quantitative
     • Class intervals of quantitative data can be treated as attributes and hence used in the same way as qualitative attribute values
     • All accessions (even thousands) can be included in one step, and a divisive algorithm can be used to form the accessions into diversity groups; these groups can then be used for stratified random sampling to constitute the core collection
     • See the references cited in the lecture notes for further details

  23. Verification of the core subsets constituted through various methods
     • By evaluating the level of diversity retained in the core subsets
     • By evaluating the retention of association among closely related traits in the core subsets, through correlation and factor analysis
     • By evaluating the redundancy levels in the core subsets, using the empirical distribution of a similarity index

  24. Measures of diversity
     • Quantitative traits
       – Range
       – Standard deviation (SD)
       – Coefficient of variation (CV)
     • Qualitative traits
       – Shannon-Weaver Diversity Index
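
The retention percentages reported for the core subsets (for example, retention % of range and of CV) compare each measure in the core with the base collection. A minimal sketch, assuming retention % is simply the core value divided by the base value times 100 and using the sample standard deviation; the trait values and function names are hypothetical.

```python
from statistics import mean, stdev


def summary(values):
    """Range, SD and CV (%) of one quantitative trait."""
    sd = stdev(values)
    return {"range": max(values) - min(values), "sd": sd, "cv": 100 * sd / mean(values)}


def retention(core, base):
    """ASSUMED definition: retention % = value in the core / value in the base x 100."""
    c, b = summary(core), summary(base)
    return {k: 100 * c[k] / b[k] for k in b}


base = [2.5, 3.8, 4.6, 4.9, 5.0, 5.2, 5.6, 6.1, 7.0, 9.6]   # hypothetical leaf widths (cm)
core = [2.5, 4.6, 5.6, 7.0, 9.6]                             # hypothetical core subset
print({k: round(v, 1) for k, v in retention(core, base).items()})
```

Values above 100% for CV are expected when a core keeps the extremes but drops many accessions near the mean; for the leaf-width profile in slide 43, for instance, the same ratio gives 22.75 / 17.33 × 100 ≈ 131%.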

  25. Descriptor state frequencies of INTERNODE SHAPE
      _________________________________________________
      State              Absolute      Relative
                         frequency     frequency (%)
      _________________________________________________
      CYLINDRICAL        306           44.35
      TUMESCENT          45            6.52
      BOBBIN             93            13.48
      CONOIDAL           193           27.97
      OBCONOIDAL         13            1.88
      CONCAVE CONVEX     40            5.80
      Total              690           100.00
      _________________________________________________
      Shannon-Weaver Diversity Index (SDI)   : 1.4050
      Std. error of SDI                      : 0.0314
      Standardized value of SDI              : 0.7842
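
The SDI values in this table can be checked with a short Python function. A minimal sketch using natural logarithms, which reproduces the index and its standardized value above (the standardized value divides by the natural log of the number of states, as defined in the footnote to the next table); the standard error is not computed here, and the function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon-Weaver index H = -sum(p_i ln p_i) from state frequencies."""
    counts = list(counts)
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def standardized_shannon(counts):
    """H divided by ln(number of states), giving a value between 0 and 1."""
    counts = list(counts)
    return shannon(counts) / math.log(len(counts))


internode_shape = {"Cylindrical": 306, "Tumescent": 45, "Bobbin": 93,
                   "Conoidal": 193, "Obconoidal": 13, "Concave convex": 40}
h = shannon(internode_shape.values())
hs = standardized_shannon(internode_shape.values())
print(f"SDI = {h:.4f}, standardized SDI = {hs:.4f}")   # SDI = 1.4050, standardized SDI = 0.7842
```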

  26. Diversity as measured by the Shannon Diversity Index (SDI) for qualitative descriptors in the whole collection of S. officinarum
      ____________________________________________________________
      Descriptor ($)                       SDI      Standardized SDI (#)
      ____________________________________________________________
      1.  Ivory marks (2)                  0.245    0.353
      2.  Weather marks (2)                0.188    0.271
      3.  Internode shape (6)              1.405    0.784
      4.  Internode alignment (2)          0.482    0.695
      5.  Internode wax (5)                1.263    0.785
      6.  Growth cracks (2)                0.672    0.970
      7.  Stripes on cane (2)              0.427    0.615
      8.  Bud shape (11)                   1.696    0.707
      9.  Bud germpore (3)                 0.512    0.466
      10. Bud groove (3)                   0.926    0.843
      11. Growth ring swelling (3)         0.748    0.681
      12. Leaf upper surface (2)           0.192    0.277
      13. Leaf carriage (3)                0.859    0.782
      14. Sheath prickles (5)              1.403    0.872
      15. Sheath clasping (2)              0.627    0.905
      16. Ligule shape (12)                2.180    0.877
      17. Ligular process symmetry (2)     0.492    0.710
      ____________________________________________________________
      $: figures in parentheses are the number of descriptor states
      #: standardized SDI = SDI / ln(number of descriptor states); it ranges from 0 to 1

  27. Mean diversity and its sampling variance for the core subsets drawn from the whole collection of S. officinarum through simple random sampling and stratified random sampling using different stratification procedures
      Grouping: (A) 10 groups based on geographical distribution within major clusters; (B) 8 groups based on geographical origin only; (C) 12 groups based on the LEAV index
      Mean = mean pooled SDI; Var = variance of pooled SDI
      _________________________________________________________________
      Sample   (A)              (B)              (C)
      size     Mean    Var      Mean    Var      Mean    Var
      _________________________________________________________________
      1. Simple random sampling (common to all three methods of grouping)
      70       30.90   0.3132
      100      31.25   0.1285
      140      31.39   0.1364
      170      31.54   0.1009
      210      31.57   0.0680
      2. Frequency proportion method
      70       30.92   0.1849   30.77   0.2023   30.89   0.0181
      100      31.29   0.1177   31.11   0.1432   31.22   0.0100
      140      31.38   0.0999   31.42   0.1117   31.33   0.0050
      170      31.50   0.0787   31.49   0.0724   31.48   0.0037
      210      31.53   0.0574   31.56   0.0798   31.50   0.0028
      _________________________________________________________________

  28. Mean diversity and its sampling variance for core subsets of S. officinarum (contd.)
      Grouping: (A) 10 groups based on geographical distribution within major clusters; (B) 8 groups based on geographical origin only; (C) 12 groups based on the LEAV index
      Mean = mean pooled SDI; Var = variance of pooled SDI
      _________________________________________________________________
      Sample   (A)              (B)              (C)
      size     Mean    Var      Mean    Var      Mean    Var
      _________________________________________________________________
      3. Square root proportion method
      70       31.27   0.2673   30.87   0.2454   31.06   0.0166
      100      31.65   0.1361   31.14   0.1757   31.35   0.0094
      140      31.84   0.0761   31.40   0.1083   31.52   0.0063
      170      31.94   0.0662   31.45   0.0891   31.62   0.0039
      210      31.96   0.0463   31.57   0.0636   31.68   0.0025
      4. Log frequency method
      70       31.48   0.2318   30.79   0.2486   31.21   0.0187
      100      31.83   0.1037   31.10   0.1163   31.42   0.0096
      140      31.95   0.0717   31.36   0.1275   31.70   0.0067
      170      32.13   0.0680   31.37   0.0720   31.73   0.0033
      210      32.16   0.0359   31.60   0.0540   31.86   0.0025
      _________________________________________________________________

  29. Mean diversity and its sampling variance for core subsets of S. officinarum (contd.)
      Grouping: (A) 10 groups based on geographical distribution within major clusters; (B) 8 groups based on geographical origin only; (C) 12 groups based on the LEAV index
      Mean = mean pooled SDI; Var = variance of pooled SDI
      _________________________________________________________________
      Sample   (A)              (B)              (C)
      size     Mean    Var      Mean    Var      Mean    Var
      _________________________________________________________________
      5. Diversity proportional method
      70       31.06   0.2566   30.87   0.2085   31.50   0.0236
      100      31.31   0.1495   31.21   0.1840   31.87   0.0115
      140      31.58   0.1103   31.39   0.1280   31.97   0.0057
      170      31.58   0.0764   31.50   0.1030   32.14   0.0037
      210      31.68   0.0495   31.57   0.0660   32.14   0.0030
      6. Equal frequency method
      70       31.71   0.2287   not considered   31.19   0.0185
      100      32.02   0.1354   not considered   31.43   0.0078
      140      32.26   0.0746   not considered   31.65   0.0055
      170      32.29   0.0522   not considered   31.73   0.0052
      210      32.42   0.0453   not considered   31.80   0.0017
      _________________________________________________________________

  30. Verification of core subsets selected through purposive sampling in S. officinarum (core subset size = 20%, 140 accessions)
      ______________________________________________________________________________
      Method of    Grouping    Allocation   Retention % of                   Total retention
      sampling     criterion   strategy     Range   CV      GSS     SDI      of diversity
      ______________________________________________________________________________
      PCS-method   NIL         NIL          99.7    137.3   43.33   104.47   73.90
      PCS-method   Cluster     P            99.8    135.0   42.01   102.24   72.12
      PCS-method   Cluster     L            98.9    137.6   41.86   105.49   73.67
      PCS-method   Cluster     C            98.8    136.0   41.04   107.61   74.32
      PCS-method   Origin      P            99.1    136.8   42.50   104.55   73.53
      PCS-method   Origin      L            98.1    132.3   39.22   108.89   74.05
      LEAV index   NIL         NIL          97.7    134.8   34.69   124.36   79.53
      LEAV index   Cluster     P            95.0    126.9   33.50   126.12   79.81
      LEAV index   Cluster     L            96.1    129.7   33.54   127.14   80.34
      LEAV index   Cluster     C            98.9    128.9   33.46   126.54   80.00
      LEAV index   Origin      P            97.7    132.8   34.79   129.62   82.21
      LEAV index   Origin      L            96.6    126.6   31.79   126.65   79.22
      ______________________________________________________________________________
      Total retention of diversity is the mean of the GSS and SDI retention percentages.

  31. General considerations
     • Methods to determine the size of the core collection need to be optimized
     • Cluster analysis of a large base collection is cumbersome
     • There should be scope for users to pre-determine the extent of diversity or variation they would like to have in the core collection for various traits
     • How to include accessions with missing data for one or more traits needs consideration
     • The gene bank curator's personal knowledge of the collection is also essential in selecting accessions for the core subset

  32. Evaluation of the retention of association among quantitative traits
     • Through factor analysis, the major factors can be identified and their loadings on the individual traits evaluated in both the base collection and the core subset
     • A comparison of the factor loadings can then be used to infer whether the association among the traits in the base collection is retained in the core subset

  33. Retention of association among groups of quantitative traits in core subsets obtained through random and purposive sampling in S. officinarum (factor loadings)
      _______________________________________________________________________________
      Variable      Whole        Sample size = 15% (100)     Sample size = 20% (140)
                    collection   SRS      PCS      LEAV      SRS      PCS      LEAV
      _______________________________________________________________________________
      Factor 1
      Sucrose       0.965        0.945    0.974    0.981     0.957    0.968    0.976
      Brix-300      0.905        0.905    0.909    0.927     0.905    0.884    0.921
      Purity        0.854        0.827    0.862    0.899     0.852    0.842    0.885
      Brix-200      0.604        0.718    0.633    0.791     0.687    0.587    0.771
      % variance    32.0         36.1     30.1     33.8      33.9     30.7     33.6
      Factor 2
      Stk. girth    0.853        0.747    0.816    0.887     0.746    0.8538   0.886
      Stk. wt       0.832        ….       0.819    0.870     ….       0.8898   0.867
      Leaf wid      0.705        0.735    ….       0.810     0.735    0.6566   0.807
      Leaf lng      0.497        0.584    0.582    0.644     0.657    ….       0.622
      NMC           -0.570       -0.726   -0.654   -0.636    -0.639   ….       -0.629
      % variance    23.5         21.5     22.6     30.6      23.8     24.6     29.1
      _______________________________________________________________________________
      SRS = simple random sampling; PCS = PCS-method based; LEAV = LEAV-index based; …. = no value given

  34. Evaluation of redundancy in the core
     1. A measure of similarity between any two accessions in the core collection is computed from the available information on quantitative and qualitative traits. The similarity index is computed likewise for all possible pairs of accessions (for N accessions there are N(N−1)/2 such coefficients).
     2. The empirical distribution of the similarity coefficient is then tabulated.
     3. The relative frequencies of accession pairs in different ranges of the similarity coefficient (e.g. 0–0.5, least similar; 0.5–0.7, moderately similar; above 0.7, highly similar) can then be analyzed.
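
The slides do not say which similarity index was used. As an illustration only, the sketch below assumes a Gower-type index (exact match for qualitative traits, 1 − |difference| / range for quantitative traits, averaged over traits) and tabulates the percentage of the N(N−1)/2 accession pairs falling in each 0.1-wide class, the classes used in the table that follows. The accession data and function names are hypothetical.

```python
from itertools import combinations


def gower_similarity(a, b, kinds, ranges):
    """Mean per-trait similarity in [0, 1] (Gower-type; an assumed choice of index)."""
    scores = []
    for x, y, kind, r in zip(a, b, kinds, ranges):
        if kind == "qual":
            scores.append(1.0 if x == y else 0.0)
        else:
            scores.append(1.0 - abs(x - y) / r if r else 1.0)
    return sum(scores) / len(scores)


def similarity_distribution(accessions, kinds, n_classes=10):
    """% of all N(N-1)/2 accession pairs in similarity classes 0-0.1, ..., 0.9-1.0."""
    ranges = []
    for j, kind in enumerate(kinds):
        col = [acc[j] for acc in accessions]
        ranges.append(max(col) - min(col) if kind == "quant" else None)
    counts = [0] * n_classes
    n_pairs = 0
    for a, b in combinations(accessions, 2):
        s = gower_similarity(a, b, kinds, ranges)
        counts[min(int(s * n_classes), n_classes - 1)] += 1   # s = 1.0 joins the top class
        n_pairs += 1
    return [100.0 * c / n_pairs for c in counts], n_pairs


# Hypothetical core: (ivory marks, leaf width in cm)
core = [("Present", 5.0), ("Absent", 3.0), ("Present", 4.0), ("Present", 5.0)]
pct, n_pairs = similarity_distribution(core, kinds=("qual", "quant"))
print(n_pairs, [round(v, 2) for v in pct])
```

A less redundant core shifts this distribution toward the lower classes, which is the pattern used to compare sampling methods.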

  35. Distribution of the similarity coefficient among all possible pairs of accessions in S. officinarum (as relative % of the total number of accession pairs)
      ____________________________________________________________________________
      Range of        Core subset of 15% size based on    Core subset of 20% size based on
      similarity      Random     PCS-      LEAV           Random     PCS-      LEAV
      coefficient     sampling   method    index          sampling   method    index
      ____________________________________________________________________________
      0.0 – 0.1       0.00       0.00      0.00           0.00       0.00      0.00
      0.1 – 0.2       0.00       0.00      0.00           0.00       0.00      0.00
      0.2 – 0.3       0.00       0.00      0.00           0.00       0.00      0.00
      0.3 – 0.4       0.19       0.40      1.29           0.11       0.26      0.79
      0.4 – 0.5       4.47       8.42      19.54          3.77       6.75      15.72
      0.5 – 0.6       27.52      35.69     49.15          25.86      33.09     48.44
      0.6 – 0.7       45.77      41.27     25.72          47.10      43.54     30.06
      0.7 – 0.8       20.13      12.83     4.06           21.07      15.10     4.68
      0.8 – 0.9       1.90       1.37      0.22           2.05       1.24      0.29
      0.9 – 1.0       0.02       0.02      0.02           0.04       0.02      0.02
      ____________________________________________________________________________

  36. Constituting core subsets with pre-assigned frequency distributions for several traits simultaneously
     Core subsets constituted through simple or stratified random sampling generally represent the diversity in the base collection for a reasonable core size, but do not satisfy specific users' needs. Core subsets constituted through purposive sampling using the PCS method or the LEAV index method, on the other hand, are expected to yield a higher level of diversity than random sampling, but the pattern of variation for the individual traits in the core collection cannot be predicted.

  37. So, can the user of a genetic resource decide the pattern of diversity to be represented in the core collection?
     As an example, an attribute (say, spiny leaves) may be present in 10% and absent in 90% of the accessions in the base collection. This corresponds to a standardized SDI of 0.47 (on a 0–1 scale) for this descriptor. Can the user obtain a core subset in which, say, the attribute is present in 70% and absent in 30% of the accessions (a standardized SDI of 0.88)? This implies an entirely different distribution of the descriptor states in the core subset, with more even relative frequencies of the states.
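
The two standardized SDI values quoted above follow directly from the definition used in this presentation (SDI divided by the natural log of the number of states, here s = 2):

```latex
H = -\sum_{i=1}^{s} p_i \ln p_i, \qquad H_s = \frac{H}{\ln s}

\text{Base collection } (0.1,\ 0.9):\quad H = -(0.1\ln 0.1 + 0.9\ln 0.9) = 0.2303 + 0.0948 = 0.3251,\quad H_s = \frac{0.3251}{0.6931} \approx 0.47

\text{Core subset } (0.7,\ 0.3):\quad H = -(0.7\ln 0.7 + 0.3\ln 0.3) = 0.2497 + 0.3612 = 0.6109,\quad H_s = \frac{0.6109}{0.6931} \approx 0.88
```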

  38. If the relative frequencies of the attribute states of only one descriptor are pre-determined, drawing such a core set is quite simple. However, when the frequency distributions of several descriptors (qualitative or quantitative) are to be pre-assigned simultaneously for a core subset of a given size, conventional sampling strategies are not adequate. Hence a new technique for constituting such a user-defined core subset, with a pre-assigned core size and frequency distributions for several traits, has been proposed by Balakrishnan (2002).

  39. Basic strategies in delineating a core subset with pre-determined density distributions for several traits
     • Decide a suitable core collection size (and hence the non-core group size)
     • Decide the pattern of variation to be realized in the core subset (and hence in the non-core group) for each of the selected qualitative and quantitative traits
     • If the accessions in the base collection are already stratified on some criterion (say, geographical origin), the user may also decide the pattern of allocation of accessions from the diversity groups to the core subset
     • Compute a pair of diversity indices for each accession in the base collection, one for the core subset and the other for the non-core group, based on the joint density distributions
     • An accession whose diversity index for the core group is less than or equal to its index for the non-core group is allocated to the core subset; the process is repeated until all accessions are screened

  40. Pre-assigning the frequency density of qualitative traits
     • To illustrate the method, two frequency transformations can be used to pre-assign the frequency density of the qualitative attributes: the square-root-proportion and log-frequency methods. Both transformations spread the frequency proportions of the states of a qualitative trait more evenly in the core subset, and hence give higher SDI values there.

  41. An example of fixing the descriptor state frequencies of a core subset by the square-root proportion method
      _____________________________________________________________________________________
      Bud germpore   Frequency in      Relative freq.   Relative freq.    Frequency in     Relative freq.
      state          whole coll. (@)   in whole coll.   fixed for core    non-core group   in non-core group
                                       = p_i            = q_i
      _____________________________________________________________________________________
      Apical         581               0.8420           0.6341            462              0.9149
      Sub-Apical     89                0.1290           0.2482            43               0.0851
      Median         20                0.0290           0.1177            0                0.0000
      Standardized SDI                 0.4657           0.7992                             0.2782
      _____________________________________________________________________________________
      @: size of the whole collection = 690; size of the core collection = 185; size of the non-core group = 505
      The square-root proportion frequency transformation, with s the number of descriptor states, is
      $$q_i = \frac{\sqrt{p_i}}{\sum_{j=1}^{s} \sqrt{p_j}}$$

  42. An example of fixing the descriptor state frequencies of the core subset by the log-frequency method
      _____________________________________________________________________________________________
      Location of spines     Freq. in base       Relative freq.   Relative freq.    Freq. in         Relative freq.
      on OIB                 collection = N_i    in base coll.    fixed for core    non-core group   in non-core group
                                                                  = q_i
      _____________________________________________________________________________________________
      None                   133                 0.0409           0.2101            45               0.0158
      Tip only               62                  0.0191           0.1476            0                0.0000
      Tip & few basal        88                  0.0271           0.1923            7                0.0026
      Tip & few apical       6                   0.0018           0.0143            0                0.0000
      Tip & all along the    2961                0.9111           0.4357            2778             0.9816
      margin
      Standardized SDI                           0.2489           0.8437                             0.0676
      _____________________________________________________________________________________________
      Size of the main collection = 3250; size of the core = 420; size of the non-core group = 2830
      The log-frequency transformation, with s the number of descriptor states, is
      $$q_i = \frac{\ln N_i}{\sum_{j=1}^{s} \ln N_j}$$
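
Both transformations are easy to compute. A minimal sketch in Python (function names are hypothetical) that reproduces the q_i fixed for bud germpore and computes raw log-frequency targets for the spine descriptor. For "Tip only" and "Tip & few apical" the raw targets (about 74 and 32 of 420) exceed the 62 and 6 accessions available; the values fixed in the table above (0.1476 = 62/420 and 0.0143 = 6/420) correspond to taking every available accession in those states, which suggests the raw targets were capped at availability.

```python
import math


def sqrt_proportion(counts):
    """Square-root proportion: q_i = sqrt(p_i) / sum_j sqrt(p_j)."""
    n = sum(counts)
    roots = [math.sqrt(c / n) for c in counts]
    total = sum(roots)
    return [r / total for r in roots]


def log_frequency(counts):
    """Log-frequency: q_i = ln(N_i) / sum_j ln(N_j); a state with N_i = 1 gets 0."""
    logs = [math.log(c) for c in counts]
    total = sum(logs)
    return [v / total for v in logs]


# Bud germpore (apical, sub-apical, median) in the 690-accession collection
q_bud = sqrt_proportion([581, 89, 20])
print([round(q, 4) for q in q_bud])   # [0.6341, 0.2482, 0.1177]

# Location of spines on OIB in the 3250-accession collection, core size 420
spines = {"None": 133, "Tip only": 62, "Tip & few basal": 88,
          "Tip & few apical": 6, "Tip & all along the margin": 2961}
q_spines = log_frequency(list(spines.values()))
for (state, available), q in zip(spines.items(), q_spines):
    target = q * 420
    flag = "exceeds availability" if target > available else ""
    print(f"{state:28s} q={q:.4f} target={target:6.1f} available={available} {flag}")
```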

  43. Example of a pre-assigned frequency profile of a quantitative descriptor in the core subset and the non-core group
      ____________________________________________________________________
      Leaf width (cm)   Whole collection     Core subset         Non-core group
                        Abs.     Rel.        Abs.     Rel.       Abs.     Rel.
      ____________________________________________________________________
      2.50 – 3.50       27       0.039       17       0.092      10       0.020
      3.50 – 4.50       179      0.259       39       0.211      140      0.277
      4.50 – 5.50       312      0.452       67       0.362      245      0.485
      5.50 – 6.50       137      0.199       34       0.184      103      0.204
      6.50 – 7.50       31       0.045       24       0.129      7        0.014
      > 7.50            4        0.006       4        0.022      0        0.000
      Total size        690      1.000       185      1.000      505      1.000
      ____________________________________________________________________
      Mean              5.03                 5.16                4.98
      Std. deviation    0.872                1.173               0.724
      CV%               17.33                22.75               14.55
      Range             2.50 – 9.60          2.50 – 9.60         3.00 – 7.30
      Std. SDI          0.740                0.879               0.726
      ____________________________________________________________________

  44. Allocating an accession with a set of measurement and attribute values to the core subset with pre-assigned frequency densities
     • For each accession k in the base collection, compute the LEAV index based on the pre-assigned frequency densities of the traits under consideration fixed for the proposed core subset, and add to it the length of the information code corresponding to the core size, ln(S/n_1), where S is the base collection size and n_1 is the core size. This is denoted F(k,1).
     • Compute the corresponding LEAV index for each accession based on the frequency densities of the non-core group, and add to it the length of the information code corresponding to the non-core group size, ln(S/n_2), where n_2 is the non-core group size. This is denoted F(k,2).
     • If F(k,1) ≤ F(k,2), the accession is allocated to the core subset; otherwise it is allocated to the non-core group.
     • Continue until all accessions are screened.

  45. Descriptor state frequencies and code lengths c[m,d,t] for the core and non-core groups
      ________________________________________________________________________________
      Descriptor            State             Freq.    Freq.       c[m,d,t]    c[m,d,t]
                                              core     non-core    core        non-core
      ________________________________________________________________________________
      Weather marks         Present           146      498         0.2407      0.0159
                            Absent            39       7           1.5422      4.1491
      Ivory marks           Present           153      505         0.1942      0.0020
                            Absent            32       0           1.7346      6.2285
      Bud germpore          Apical            119      462         0.4490      0.0928
                            Sub-Apical        46       43          1.3863      2.4463
                            Median            20       0           2.1919      6.2305
      Geographical origin   New Guinea        62       302         1.1196      0.5265
                            Indonesia         26       42          1.9669      2.4791
                            New Caledonia     19       17          2.2670      3.3499
                            India             15       7           2.4901      4.1608
                            Fiji              13       5           2.6236      4.4485
                            Hawaii            6        0           3.3168      6.2403
                            Mauritius         3        0           3.8764      6.2403
                            Unknown origin    41       132         1.5250      1.3499
      ________________________________________________________________________________
      F[k,1] = ln(690/185) + 0.2407 + 1.7346 + 1.3863 = 4.6779
      F[k,2] = ln(690/505) + 0.0159 + 6.2285 + 2.4463 = 9.0028
      Since F[k,1] < F[k,2], this accession is allocated to the core subset.
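
The allocation rule can be sketched as follows, assuming the per-state code lengths c[m,d,t] have already been derived from the pre-assigned core frequencies and the non-core frequencies; here they are simply copied from the table above for the three descriptors used in its worked example. The function names are hypothetical. The code reproduces F[k,1] = 4.6779 and F[k,2] = 9.0028.

```python
import math


def message_length(accession, cost, S, n):
    """ln(S/n) for the group label plus the LEAV of the accession's states in that group."""
    return math.log(S / n) + sum(cost[(d, s)] for d, s in accession.items())


def assign_group(accession, core_cost, noncore_cost, S, n1, n2):
    """Return ('core' or 'non-core', F(k,1), F(k,2)); ties go to the core."""
    f1 = message_length(accession, core_cost, S, n1)
    f2 = message_length(accession, noncore_cost, S, n2)
    return ("core" if f1 <= f2 else "non-core"), f1, f2


# c[m,d,t] values copied from the table above (only the states used in the example)
core_cost = {("Weather marks", "Present"): 0.2407, ("Ivory marks", "Absent"): 1.7346,
             ("Bud germpore", "Sub-Apical"): 1.3863}
noncore_cost = {("Weather marks", "Present"): 0.0159, ("Ivory marks", "Absent"): 6.2285,
                ("Bud germpore", "Sub-Apical"): 2.4463}

accession_k = {"Weather marks": "Present", "Ivory marks": "Absent",
               "Bud germpore": "Sub-Apical"}
group, f1, f2 = assign_group(accession_k, core_cost, noncore_cost, S=690, n1=185, n2=505)
print(group, round(f1, 4), round(f2, 4))   # core 4.6779 9.0028
```

Screening the whole base collection is then a loop that calls assign_group for every accession, with the cost tables extended to all descriptor states.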

  46. Thank You
