Sensory Evaluation: Applications and Opportunities in the Product Development Process Herbert Stone 2007

1. 1 Sensory Evaluation: Applications and Opportunities in the Product Development Process Herbert Stone (2007) Introduction Sensory evaluation is a science that measures, analyses and interprets the responses of people to products as perceived by the senses. For decades sensory scientists have researched and developed methods to capture the reactions of people to various kinds of stimuli and better understand the perceptual process, while others have used sensory information to identify successful consumer products.

2. 2 The stimulus can be as simple as a purified chemical used to study perception or it can be a more complex mixture used in the manufacture of a food, a beverage or a cosmetic intended for sale to a consumer. While some sensory professionals continue to study basic processes, many others apply their knowledge to the evaluation of consumer products, contributing product sensory information to a brand's market strategy.

3. 3 The articles in this issue are focused primarily on this latter topic, to demonstrate the application of sensory resources to the product development process along with discussion about the potential uses and misuses of the obtained sensory information. Before describing some examples, it is useful to provide a brief review of the science behind sensory evaluation.

4. 4 This is essential because there still is a lack of familiarity with the basic principles of the science and specifically how physiology, psychology, experimental design and statistics impact the process. While this may be surprising to some, it is evident when new ideas are presented and adopted without appreciation for whether they have a scientific basis. A recent example is the idea that one should field a discrimination test using a large number of consumers (as many as 100 has been mentioned) based solely on product use.

5. 5 Empirically it has been observed that about 30% of consumers who are high-frequency consumers of a particular food type cannot discriminate differences at better than chance. For most consumers the discrimination test is a behavioral challenge, i.e. consumers have to learn how to take such a test. Following the aforementioned plan, keeping in mind that consumers are not all skilled and have no prior test experience,

6. 6 there is a high likelihood of not finding a difference when, in fact, there is (sometimes referred to as β risk), an error that has serious business consequences. Proposals such as these to avoid the sensory qualifying process reflect a lack of knowledge of the science and of familiarity with the sensory literature. Sensory professionals have a responsibility to assess new ideas and methods for their applicability to their company's products before considering changes to their practices, hence the need for a brief review of the science.
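The risk of missing a real difference can be illustrated with a minimal Python sketch. Apart from the 30% share of non-discriminators quoted above, every number here is an illustrative assumption rather than a value from the article: a paired test (chance = 0.5) fielded with 100 participants, a one-sided significance level of 0.05, and qualified subjects who choose correctly 65% of the time.

```python
from math import comb


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n, chance, alpha):
    """Smallest number of correct answers that is significant at level alpha."""
    for k in range(n + 2):
        if upper_tail(n, k, chance) <= alpha:
            return k


def power(n, chance, alpha, p_correct):
    """Probability of declaring a difference when the true hit rate is p_correct."""
    return upper_tail(n, critical_count(n, chance, alpha), p_correct)


N, CHANCE, ALPHA = 100, 0.5, 0.05   # hypothetical test plan
P_DISCRIMINATOR = 0.65              # hypothetical hit rate of a qualified subject
SHARE_NON_DISCRIMINATORS = 0.30     # share quoted in the text

# A randomly recruited participant is a guesser 30% of the time.
p_mixed = ((1 - SHARE_NON_DISCRIMINATORS) * P_DISCRIMINATOR
           + SHARE_NON_DISCRIMINATORS * CHANCE)

power_qualified = power(N, CHANCE, ALPHA, P_DISCRIMINATOR)
power_mixed = power(N, CHANCE, ALPHA, p_mixed)

print(f"power with qualified subjects only: {power_qualified:.3f}")
print(f"power with unscreened consumers:    {power_mixed:.3f}")
print(f"beta risk with unscreened consumers: {1 - power_mixed:.3f}")
```

Under these assumptions the unscreened panel has the lower power, so its β risk is higher for the same number of participants.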

7. 7 Sensory Science As stated at the outset, knowledge of the science of sensory evaluation must be an integral part of the test planning process. One must be clear as to objectives, the availability of qualified subjects, the method selected, and the appropriate design and analyses. In sensory analytical tests, repeated trials are often ignored without appreciation for the loss of information/conclusions derived from the results.

8. 8 Confidence in a decision based on the obtained data is essential. A test with no repeated trials is no different than completing a single chemical analysis of a product and presenting it as a valid representation of the chemical composition of that product. Sensory evaluation relies on a limited set of resources to function effectively, and these include subjects, methods, facilities and capabilities in experimental design and analysis.

9. 9 To be effective in providing actionable product knowledge, one must have the resources available and know how to use them. For a more extensive discussion see Sidel and Stone (2006) or Stone and Sidel (2004). Subjects People are an essential part of the evaluation process. This is "good" because there are many people available but "bad" because people are different from each other in many ways, and especially in their sensory skills.

10. 10 In addition, because all of us have senses, there is a tendency to assume we are all experts, especially those involved in developing a product or part of a development team. This leads to situations in which decision makers dissatisfied with a sensory test result reinterpret the results to meet their expectations, assuming they are better qualified than the consumer. Regardless of the type of test, subjects must be qualified to participate.

11. 11 The qualifications are two-fold: the first is being an average or above-average user of the product type, and the second is empirical, i.e. the subject must demonstrate that he/she can discriminate differences at better than chance with the product category being tested. This screening/qualifying process takes 3 or 4 sessions (~60-90 minutes each) and about 30 to 40 trials before one can be confident that a specific individual is qualified.
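The better-than-chance criterion can be sketched as a one-sided binomial check on a subject's screening record. This is only an illustration: the chance level of 0.5 (as in paired or duo-trio trials) and the 0.05 significance level are assumptions, and the article does not state the exact pass mark it uses.

```python
from math import comb


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct_to_qualify(n_trials, chance=0.5, alpha=0.05):
    """Fewest correct trials for which pure guessing is unlikely (<= alpha)."""
    for k in range(n_trials + 2):
        if upper_tail(n_trials, k, chance) <= alpha:
            return k


# Hypothetical screening lengths taken from the "30 to 40 trials" range.
for n_trials in (30, 40):
    k = min_correct_to_qualify(n_trials)
    print(f"{n_trials} trials: at least {k} correct "
          f"(guessing probability {upper_tail(n_trials, k, 0.5):.4f})")
```

With only a handful of trials the pass mark would sit close to a perfect score, which is one way to see why a few dozen trials are needed before an individual can be qualified with confidence.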

12. 12 As mentioned, about 30% of those who volunteer fail to meet the better-than-chance qualifying criterion. Once qualified, performance is monitored based on test results and serves as a basis for participation in future tests. This is not unlike the demographic and product usage profile that qualifies a consumer to participate in larger scale marketing tests. These latter recruiting criteria have been shown to increase sensitivity and provide results that enable a decision to be reached with greater confidence.

13. 13 Regardless of the test type, failure to follow qualifying procedures increases the risk of concluding there is no difference when, in fact, there is (commonly referred to as β risk). Testing errors lead to decisions that are not correct and this leads to a loss of confidence in sensory and sensory professionals. While space limitations preclude a more comprehensive discussion about this topic, the aforementioned references will be helpful to the interested reader.

14. 14 Methods There are two basic categories of sensory methods: analytical and affective. They are separate types of methods because they answer different questions. On occasion, one encounters requests to combine a difference with a preference test without appreciating the difficulties in analysis and interpretation of the results. Such requests should be rejected vigorously.

15. 15 Analytical methods They include discrimination (or difference) and descriptive methods. The former methods answer the question: were the differences between products perceived and at what confidence level? The latter method identifies the specific types of differences and their magnitudes.

16. 16 A. Discrimination methods include: paired, duo-trio, triangle, and dual standard, to name the most frequently used methods. Each offers advantages and disadvantages, but no one of them is more sensitive than the other. The choice of method is determined by the objective and the type of product, i.e. a product with a lingering effect will reduce sensitivity and one should select a method that limits the number of samplings.

17. 17 The paired test requires the fewest number of samplings (1 pair), followed by the duo-trio (2 pairs), the triangle (3 pairs) and the dual standard (4 pairs). All too often the selection is based on other criteria that reduce test sensitivity and increase β risk. A common practice is to select the triangle test based on the mistaken belief that it is more sensitive because p=0.333 vs 0.50 for the duo-trio and paired tests. The problem with this approach is that the test is a behavioral/perceptual test, not a statistical test.

18. 18 If sensitivity were based solely on the probability value, then a four sample test would be more sensitive, p=0.25, etc. No evidence has been published that supports the sensitivity of the triangle test. Further to this point, sensory fatigue and test complexity directly impact sensitivity, especially for foods and beverages.

19. 19 A second requirement is that the design of a test must incorporate repeated trials to further increase test sensitivity and enhance the likelihood that the difference will be detected. For discrimination testing, the recommended practice is for each subject to provide two judgments, i.e. to evaluate a set of products and make a decision, and after a specified rest interval (usually 2-3 minutes) the process is repeated. This yields two judgments for a subject and enables testing of sensitivity and reliability within and across the subjects, by serving order, etc. which further enhances confidence in the decision.

20. 20 B. Descriptive Analysis methods include: Flavor Profile, Texture Profile, Spectrum Analysis, and QDA. These methods provide product descriptions with numerical measures of strength. All these methods are described in the literature as are many other methods.

21. 21 All the others are combinations of existing methods, claiming to make use of the best of each. Since there are fundamental differences between them, such claims should be viewed with caution. Whatever method is chosen, the sensory staff must understand the process sufficiently to reach an informed decision. There are two basic issues to consider: the first is whether the subjects develop a language to describe the products or the sensory staff trains the subjects to respond to specific attributes, and second, how the responses are analysed. A detailed discussion of this topic can be found in Stone and Sidel (1994).

22. 22 Descriptive analysis is probably the most powerful and useful of sensory methods. It identifies all of a product's perceived characteristics along with measures of the strengths of those characteristics. It provides a "signature" or a "fingerprint" of a product. When that information is combined with physical, chemical measures, preferences and imagery, it becomes an integral part of a company's product market strategy. It is especially useful to know what product attributes best fit the imagery and which formulation optimises those attributes.

23. 23 C. There are two types of affective methods, the paired preference and the 9-point hedonic scale. Examples of each method can be found in the cited literature. From a sensory perspective, the 9-point hedonic scale is more useful because it provides a measure of liking for each product,

24. 24 the magnitude of the difference in liking among the products and enables use of parametric statistics such as the analysis of variance to identify significant product differences. Overall it is a more efficient methodology enabling one to test multiple products vs multiple paired comparisons. However, if one wants to know which of two products is preferred, then the paired comparison is more appropriate.
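As a rough sketch of how 9-point hedonic ratings feed an analysis of variance, the Python below computes the product F ratio for a design in which every subject rates every product (subjects treated as blocks). The ratings are invented for illustration, and the choice of a randomized-block layout is an assumption; the article does not specify the design.

```python
def hedonic_anova(scores):
    """Two-way ANOVA without replication; scores[i][j] = subject i, product j."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    product_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subject_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_products = n * sum((m - grand) ** 2 for m in product_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_error = ss_total - ss_products - ss_subjects

    df_products, df_error = k - 1, (n - 1) * (k - 1)
    f_ratio = (ss_products / df_products) / (ss_error / df_error)
    return product_means, f_ratio, df_products, df_error


# Hypothetical ratings (1 = dislike extremely, 9 = like extremely):
# six subjects (rows) by three products (columns).
ratings = [
    [7, 5, 6],
    [8, 6, 6],
    [6, 5, 7],
    [9, 6, 7],
    [7, 4, 6],
    [8, 6, 5],
]

means, f_ratio, df1, df2 = hedonic_anova(ratings)
print("mean liking per product:", [round(m, 2) for m in means])
print(f"F({df1}, {df2}) = {f_ratio:.2f}")
```

The F ratio would then be compared with the critical value for the stated degrees of freedom. The product means give the liking level for each product and the size of the differences among them, which a paired preference count alone does not provide.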

25. 25 Method selection should be determined primarily by the objectives for a test. Other factors that impact choice are the time line, availability of qualified subjects, and, of course, availability of the product. With these resources, sensory professionals can make a more significant contribution to a company's business and especially the product development process.

26. 26 To be most effective, however, sensory professionals need to be organized and be a part of the development team from the beginning. Sensory professionals have to be able to provide actionable product information and manage it effectively vs simply reporting a statistically significant difference. In this next section I will discuss the ways in which sensory can contribute in a meaningful way.

27. 27 ASTM E1885 - 04 Standard Test Method for Sensory Analysis - Triangle Test Significance and Use This test method is effective for the following test objectives: 5.1.1 To determine whether a perceivable difference results or a perceivable difference does not result, for example, when a change is made in ingredients, processing, packaging, handling or storage; or

28. 28 5.1.2 To select, train and monitor assessors. This test method itself does not change whether the purpose of the triangle test is to determine that two products are perceivably different versus that the products are not perceivably different. Only the selected values of pd, α, and β change. If the objective of the test is to determine if there is a perceivable difference between two products, then the value selected for α is typically smaller than the value selected for β.

29. 29 If the objective is to determine if the two products are sufficiently similar to be used interchangeably, then the value selected for β is typically smaller than the value selected for α and the value of pd is selected to define "sufficiently similar." 1. Scope 1.1 This test method covers a procedure for determining whether a perceptible sensory difference exists between samples of two products.

30. 30 1.2 This test method applies whether a difference may exist in a single sensory attribute or in several. 1.3 This test method is applicable when the nature of the difference between the samples is unknown. It does not determine the size or the direction of the difference. The attribute(s) responsible for the difference are not identified. 1.4 Compared to the duo-trio test, the triangle test can achieve an equivalent level of statistical significance with fewer assessors. For details on how the triangle test compares to other three-sample tests, see Refs (1), (2), (3) and (4).

31. 31 1.5 This test method is applicable only if the products are homogeneous. If two samples of the same product can often be distinguished, then another method, for example, descriptive analysis, may be more appropriate. 1.6 This test method is applicable only when the products do not cause excessive sensory fatigue, carryover or adaptation. 1.7 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.
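The roles of pd, α and β in the ASTM excerpt above can be sketched as a search for the smallest triangle-test panel that meets both risks. This is a minimal illustration, not the standard's own procedure or tables: it assumes the usual guessing model in which a proportion pd of assessors truly detect the difference and the rest guess with probability 1/3, and the function name `triangle_plan` is hypothetical.

```python
from math import comb


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def triangle_plan(alpha, beta, pd, max_n=500):
    """Smallest (n, x): declare a difference when at least x of n are correct."""
    chance = 1 / 3
    # Assumed model: detectors always correct, everyone else guesses.
    p_alt = pd + (1 - pd) * chance
    for n in range(3, max_n + 1):
        # Critical count that keeps the false-alarm risk at or below alpha.
        x = next(k for k in range(n + 2) if upper_tail(n, k, chance) <= alpha)
        # Risk of missing the difference when a share pd can detect it.
        achieved_beta = 1 - upper_tail(n, x, p_alt)
        if achieved_beta <= beta:
            return n, x
    return None


# Difference objective: alpha chosen smaller than beta.
print("difference test:", triangle_plan(alpha=0.05, beta=0.20, pd=0.5))
# Similarity objective: beta chosen smaller than alpha.
print("similarity test:", triangle_plan(alpha=0.20, beta=0.05, pd=0.5))
```

Because binomial power does not rise smoothly with n, the smallest n found this way can differ from the panel sizes printed in published tables, so the output should be read as indicative only.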

32. 32 What is new in sensory analysis International Journal of Food Science and Technology 1998, 33, 7-18 (Piggott et al) Summary *Sensory methods can be loosely separated into two groups: discriminant methods and descriptive methods. *Simple models of difference tests rest on a number of assumptions, and not only are they not very good at showing that samples are the same, they are not good at detecting small differences.

33. 33 Summary *Quantitative Descriptive Analysis was developed from the Flavor Profile Method, and used an interval scale with emphasis on statistical evaluation of results. *A variation of descriptive analysis is Free-Choice Profiling, where data are normally examined by generalized Procrustes analysis.

34. 34 Summary *Initial suspicion of the results has been overcome by more rigorous testing of their reliability. *Time-intensity measurement is a special case of descriptive analysis, where a single characteristic is tracked as it changes over a period of time. Time-intensity has only relatively recently achieved wide application, and there have been rather few methodological studies.

35. 35 Introduction *Flavour research is concerned with developing improved methods of characterising and measuring overall flavour quality and individual attributes, with studying the influence on flavour of changes in food materials and procedures at all stages in the food chain to protect established standards of flavour quality.

36. 36 Introduction *This review is concerned with sensory aspects of flavour research, acknowledged by Land in the passage quoted as being of central importance; flavour cannot be measured by instruments, but results from an interaction of food and consumer.

37. 37 Introduction *The discipline of sensory analysis uses scientific principles drawn from food science, physiology, psychology and statistics. Its purpose is to elicit objective responses to the properties of foods, as perceived by the senses of sight, smell, taste, touch and hearing.

38. 38 Introduction *Sensory techniques must meet the requirements of all measurement methods, in that they must be accurate, precise and valid, but they must also be related to consumer perceptions and preferences. *Food without a consumer does not have a flavour. Thus a major goal for any sensory analysis programme must be to understand the importance of sensory characteristics and the role they play in consumer acceptance.

39. 39 Introduction *Sensory methods can be loosely separated into two groups: discriminant methods and descriptive methods. *The purpose of discrimination testing is simply to indicate whether the products being tested are perceived to be different. *Descriptive methods are more akin to chemical analysis, i.e. they aim to identify and measure the composition of products, or to determine the presence or intensity of a particular characteristic.

40. 40 Discrimination tests *There are several variations of discrimination tests, including the A-Not A (ISO 8588:1987), Paired Comparison (ISO 5495:1983), Duo-Trio (ISO 10399:1991) and Triangle tests (ISO 4120:1983), though other forms of test can be used. *In the Paired Comparison test, individuals are presented with two products and are asked to indicate which product has more of a designated characteristic such as fruitiness or sweetness.

41. 41 Discrimination tests the reason for carrying out the test (to provide reassurance that a product has not changed), but the most usual execution (a small number of assessors) and *As a pure difference test, the Duo-Trio is the simplest. In this case, the assessors are asked which of two products is more similar to a third product identified as a reference.

42. 42 Discrimination tests *In the Triangle test, the assessors are requested to identify which two of three products are most alike, or to indicate which product is most different from the other two. *The Triangle test is superficially similar to the three-alternative forced choice procedure (3-AFC), and both can be generalized, the Triangle test to a series of tests of the form 'pick m out of n samples' and the 3-AFC to the multiple-alternative forced choice (m-AFC).

43. 43 Discrimination tests *Some statistical aspects of testing observed differences for significance are discussed in, and tables of significant results have been available for many years. *The usual test model is to assume that an assessor who cannot identify the correct sample guesses without any bias, and thus if the difference between the two products is not perceptible, all possible answers should be chosen with equal frequency.

44. 44 Discrimination tests *This provides the rationale for the so called 50% chance level tests (Paired Comparison and Duo-Trio) where there are two possible answers, and the 33% tests (Triangle test and 3AFC) where there are three possible answers. *The observed distribution of answers is tested for a significant departure from random, using the binomial distribution. If the distribution does depart from random, at the chosen level of significance, the samples are said to be different.
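The binomial check described above can be sketched in a few lines of Python. The chance levels come from the text (50% for Paired Comparison and Duo-Trio, 33% for Triangle and 3-AFC); the panel counts and the 0.05 significance level are hypothetical.

```python
from math import comb

# Guessing probabilities for the tests named in the text.
CHANCE = {"paired": 1 / 2, "duo-trio": 1 / 2, "triangle": 1 / 3, "3-AFC": 1 / 3}


def p_value(correct, n, chance):
    """One-sided P(at least `correct` right answers out of n) under pure guessing."""
    return sum(comb(n, i) * chance**i * (1 - chance) ** (n - i)
               for i in range(correct, n + 1))


def is_different(method, correct, n, alpha=0.05):
    """True when the result departs from random at the chosen significance level."""
    return p_value(correct, n, CHANCE[method]) <= alpha


# Hypothetical outcome: 24 judgments, 16 of them correct.
for method in ("duo-trio", "triangle"):
    p = p_value(16, 24, CHANCE[method])
    print(f"{method}: p = {p:.4f}, different = {is_different(method, 16, 24)}")
```

The same count of correct answers gives a smaller p-value under the 33% chance level than under the 50% level. That is a statement about the arithmetic only; it says nothing about how well assessors actually perform each task.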

45. 45 Discrimination tests *The number of tests required is usually achieved by using a panel of assessors, who carry out the test one or more times each. *However, this model of the difference test rests on a number of assumptions which will not be met. The initial problem lies in the response bias, the criterion used by an assessor to make a decision.

46. 46 Discrimination tests *If correctly executed, all common discrimination tests (both dyadic and triadic) eliminate response bias as a factor by forcing a choice between samples. *However, a slight change in the instructions or in an assessor's interpretation, to allow the test to require the assessor to decide whether a difference is present, will reintroduce a response bias.

47. 47 Discrimination tests *The basic assumption underlying the common interpretation of the tests is that the probability of making a correct choice is the same for all assessors. *On this assumption, the use of the binomial distribution for testing the significance of a result is logical.

48. 48 Discrimination tests *The ISO standards for Triangle and Duo-Trio tests are at the time of writing being revised to include assessments of similarity. *Not only is the Triangle test not very good at showing that samples are the same, but it is also not good at unequivocally detecting a small difference. That is, it has low power. The power of the test can be greatly increased by changing format to that of the 3-AFC.

49. 49 Discrimination tests *The disadvantage, and the main reason for the Triangle test's popularity, is that a 3-AFC format requires that the nature of the difference is known and specified in advance. This can be achieved by the use of 'warm-up' samples, which are tasted back and forth until the assessors have identified the difference.

50. 50 Discrimination tests The real test can then be run as a 3-AFC. The same effect can be achieved by running the Triangle test as a preference test, effectively a 3-AFC for liking, but only if there is a real difference in preference. If the sensations happen to be drawn from the distributions in such a way that the odd sample in a Triangle is actually closer to one of the pair than is the other of the pair, a 3-AFC test (choose the strongest sample) could be correct, whereas a Triangle test (choose the odd sample) would be wrong.

51. 51 Discrimination tests Confirmation of this demonstrated that it is not the identification of the difference which gives improved performance in the 3-AFC, but the strategies adopted by the assessors when faced with different questions. It is also to be expected that a 'choose one out of four' test would be better performed as a 4-AFC than as a Triangle analogue.
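The argument about decision strategies can be explored with a small Monte Carlo sketch. It assumes a simple one-dimensional Thurstonian model that the slides do not spell out: each sample produces a sensation drawn from a unit-variance normal distribution, the two products' means differ by d', the Triangle assessor treats the two closest sensations as the matching pair, and the 3-AFC assessor picks the strongest sensation. The function name and parameter values are hypothetical.

```python
import random


def simulate(d_prime, trials=20000, seed=1):
    """Return (triangle, three_afc) proportions of correct answers."""
    rng = random.Random(seed)
    triangle_hits = afc_hits = 0
    for _ in range(trials):
        # Two samples of the weaker product and one of the stronger product.
        a1 = rng.gauss(0.0, 1.0)
        a2 = rng.gauss(0.0, 1.0)
        b = rng.gauss(d_prime, 1.0)

        # 3-AFC strategy: choose the strongest sensation.
        if b > a1 and b > a2:
            afc_hits += 1

        # Triangle strategy: the closest pair is judged "the same", so the
        # answer is correct only when a1 and a2 are closer to each other
        # than either is to b.
        if abs(a1 - a2) < abs(a1 - b) and abs(a1 - a2) < abs(a2 - b):
            triangle_hits += 1
    return triangle_hits / trials, afc_hits / trials


for d in (0.0, 1.0, 2.0):
    tri, afc = simulate(d)
    print(f"d' = {d:.1f}: triangle {tri:.3f}, 3-AFC {afc:.3f}")
```

In this sketch both tasks use identical sensations, so any gap between the two proportions comes only from the decision rule, which is the point made above about assessor strategies.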

52. 52 Discrimination tests Applications of Thurstonian modelling have mostly been with model systems with only one variable. The multivariate case (as is likely to apply when real foods are tasted in a QC system) is more complex, and the proportion of correct choices (in Triangle and Duo-Trio tests) depends on the contribution of each dimension to a constant delta (d') and a number of other factors. Thus, in this case, there is no simple relationship between the proportion of correct choices and the sensory difference.

53. 53 Descriptive methods Differences observed in either of the above tests, whether statistically significant or not, can justify descriptive analysis to provide further information that may identify the reasons for the differences found.

54. 54 Descriptive analysis is a term generally used to describe a method by which identification, quantification and description of the sensory attributes of food by trained human subjects are obtained.

55. 55 Descriptive methods The four methods which are generally considered as descriptive analysis techniques are the Flavor Profile Method (FPM), the Texture Profile Method (TPM), Quantitative Descriptive Analysis (QDA) and the Spectrum Method (SM).

56. 56 The origins of the technique lie in the FPM, developed to complement existing formal and informal sensory techniques for the expanding food industry. It is the most time- and effort-intensive sensory method, ultimately geared towards producing a trained panel of assessors to act as an analytical instrument.

57. 57 Descriptive methods The TPM that complements FPM did not set out to describe textural properties of food; its specific goal was to improve the interpretability of the relationship between rheological principles and popular nomenclature.

58. 58 TPM is similar to FPM in that the panel is highly trained, and a consensus is reached for all attributes for all products, although, with the introduction of line scales, statistical methods have subsequently been applied.

59. 59 Descriptive methods Some of the initial work on descriptive techniques following FPM and TPM was carried out by Land and his colleagues. Their initial aim was to remove the subjectivity of flavour description by creating a glossary of commonly used odour stimuli.

60. 60 This comprehensive study provided a detailed account of two experiments using both experienced and inexperienced panellists to characterize a range of odour stimuli with the aid of a list of odor descriptors.

61. 61 Descriptive methods These experiments demonstrated that some odor stimuli, such as ammonia, are well characterized, whilst others are not as well characterized. In the latter case, the perceptions of assessors vary, depending in part on their experience and on the stimuli involved. This work formed a very early example of, and laid the foundations for, a technique which was to become a routine procedure used in research and quality control throughout the world.

62. 62 Descriptive methods Quantitative Descriptive Analysis was developed with the intention of including strategies to account for behavioural aspects of perception. However, subject selection and training were still considered to be very important.

63. 63 The use of an interval scale and emphasis on statistical evaluation of results allowed for some behavioural effects to be accommodated. The data are averaged across the panel as a whole, as opposed to the group discussing the results to reach a consensus.

64. 64 Descriptive methods The Spectrum method uses ideas drawn from FPM and QDA. It provides detailed characterization of a product's sensory categories and, as with FPM, reference products are chosen to provide attribute-intensity references. Within the methods collectively known as descriptive analysis techniques, the differences worth noting are the level of training given to the assessors, and the final results provided by the methods.

65. 65 Originally, both FPM and TPM provided a consensus view of the products being profiled through panel discussions. QDA and the Spectrum method were designed to take the responses from each assessor and to calculate an average for the panel.

66. 66 A variation of descriptive analysis is Free Choice Profiling (FCP). Unlike QDA and the Spectrum method, the data cannot be averaged across a panel, as each descriptor has a unique meaning to a particular assessor.

67. 67 Time-intensity scaling Individual differences might not be simply random variation to be averaged out, but sometimes represent real variations between individuals. Saliva flow rates, for example, affect perceptions of bitterness, astringency and other sensations. In texture, differences appear to be due to differences in chewing behaviour.

68. 68 Time-intensity scaling Attempts to improve summarizing of T-I data from a panel have included the use of Principal Components Analysis (PCA) as a means of reducing a set of T-I curves to a principal curve. This method still assumes that an average of some kind is meaningful; it is simply an attempt to produce a better one.
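One way to picture the principal-curve idea is the rough Python sketch below, which extracts the first principal component of a set of T-I curves by power iteration. It is an illustration under stated assumptions, not the procedure used in the studies summarized here: all assessors are assumed to be sampled at the same time points, the curves are centered on the panel mean, and the intensity values are invented.

```python
import math


def first_principal_curve(curves, iterations=200):
    """Return (mean_curve, principal_curve) for curves[assessor][time_point]."""
    n, t = len(curves), len(curves[0])
    mean_curve = [sum(c[j] for c in curves) / n for j in range(t)]
    centered = [[c[j] - mean_curve[j] for j in range(t)] for c in curves]

    # Power iteration on X^T X, where X is the centered data matrix.
    # Starting from a vector of ones is an arbitrary choice and would fail
    # if that vector were exactly orthogonal to the leading component.
    v = [1.0] * t
    for _ in range(iterations):
        scores = [sum(row[j] * v[j] for j in range(t)) for row in centered]
        w = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(t)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variation between assessors
        v = [x / norm for x in w]
    return mean_curve, v


# Hypothetical T-I curves: three assessors, five time points each.
ti_curves = [
    [0.0, 2.0, 5.0, 3.0, 1.0],
    [0.0, 3.0, 8.0, 5.0, 1.0],
    [0.0, 4.0, 9.0, 7.0, 3.0],
]

mean_curve, principal = first_principal_curve(ti_curves)
print("panel mean curve:", [round(x, 2) for x in mean_curve])
print("principal curve: ", [round(x, 3) for x in principal])
```

The principal curve shows the main pattern along which assessors deviate from the panel mean. As the text notes, it is still a summary that presumes some kind of average is meaningful.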
