Measurement in Sport and Exercise Psychology
Edited by Gershon Tenenbaum, Robert C. Eklund and Akihito Kamata
568 Pages
Measurement in Sport and Exercise Psychology provides a complete analysis of the tools and methods used in sport and exercise psychology research. Each chapter of this accessible text presents key measurement variables and concepts, including their definitions; an evaluation of the measurement constructs and tools available; and an explanation of any controversies in each topic. The text includes access to an online resource that presents 14 measurement instruments in their entirety. This resource also contains additional web links to many other measurement instruments.
Drawing on their experience as leading researchers in the field, editors Tenenbaum, Eklund, and Kamata have selected a team of recognized scholars to bring both breadth and depth to this essential resource. By thoroughly examining each measurement tool, Measurement in Sport and Exercise Psychology assists readers in determining strengths and limitations of each tool and discovering which tools are best suited to their research projects. Readers will also gain critical knowledge to expand the field by recognizing opportunities for new methods of measurement and evaluation.
The text begins with a historical review of measurement in sport and exercise psychology followed by a comprehensive description of theories and measurement issues. It provides detailed information regarding ethical and cultural issues inherent in the selection of specific testing protocols as well as issues in interpreting meta-analysis. This is followed by discussion of the commonly used constructs and inventories in three areas: cognition, perception, and motivation measurement; emotion (affect) and coping measurement; and social and behavioral measurement.
Recommendations for researchers and practitioners included at the end of each chapter provide starting points for considering ways to incorporate chapter content into research projects and professional practice. Tables located at the end of each chapter summarize key information for quick reference and provide online sources, when available, so that readers can access each measurement tool. Original source information is provided for those tools not available online.
Measurement in Sport and Exercise Psychology assists readers in evaluating the effectiveness of specific measurement tools. As the most complete and up-to-date directory of tools and inventories in the field of sport and exercise, this text offers a thorough explanation of considerations, controversies, recommendations, and locations for accessing these measurement tools.
Chapter 1. Introduction to Measurement in Sport and Exercise Psychology
Gershon Tenenbaum, Robert Eklund, and Akihito Kamata
Concepts, Items, and Responses
Steps in Designing Measures
Assigning Meaning to Measures
Introspection and Measurement: Reliability and Validity
Conclusion
Chapter 2. Measurement Practice in Sport and Exercise Psychology: A Historical, Comparative, and Psychometric View
Weimo Zhu
Key Developments in Educational and Psychological Measurement
Progress and Status of Measurement in Sport and Exercise Psychology
Conclusion
Acknowledgements
Part I. Measurement Basics, Methods, and Issues
Measurement Basics
Chapter 3. Reliability
Brandon K. Vaughn, Hwa-Young Lee, and Akihito Kamata
The Theory of Reliability
Estimating the Reliability Coefficient
Standard Error of Measurement
Evaluating the Magnitudes of Reliability Coefficients
Improving Reliability
Relationship to Validity
Reliability for Multidimensional Instruments
Misconceptions and Misuses of Reliability
Conclusion
Chapter 4. Conceptualizing Validity
Brandon K. Vaughn and Sarah R. Daniel
Validity in the Premodern Era
Collecting Evidence of Validity
Validity in the Modern Era
Issues of Validity in Research Designs
Conclusion
Chapter 5. Validating Scores from New Assessments: A Comparison of Classical Test Theory and Item Response Theory
Yaakov Petscher and Christopher Schatschneider
Level of Analysis
Item Difficulty
Item Discrimination
Item Response Theory Parameter Invariance
Constructing the Assessment
Sample Size
Conclusion
Chapter 6. Factorial Invariance: Tools and Concepts for Strengthening Research
Ryne Estabrook
Factorial Invariance
Configural Invariance
Metric Invariance
Alternative Approaches
Fitting Invariance Models
Ordinal Data
Conclusion
Acknowledgments
Appendix A: Coding Example of Mplus
Appendix B: Coding Example of OpenMx
Chapter 7. Modeling Change Over Time
Kevin J. Grimm and Nilam Ram
Sample Data
Analysis
Latent Growth Curve Modeling
Conclusion
Acknowledgments
Appendix
Chapter 8. Rasch Modeling in Sport
Bernd Strauss, Dirk Busch, and Gershon Tenenbaum
The Basic Idea of the Rasch Model
An Example for the Use of the Ordinal Rasch Model
Extensions and Generalizations of Rasch Modeling
The Use of the Mixed Rasch Model: An Example
Probabilistic Test Models in Sport Psychology and Exercise Sports
Conclusion
Measurement Methods
Chapter 9. Idiosyncratic Measures in Sport
William A. Edmonds, Michael B. Johnson, Gershon Tenenbaum, and Akihito Kamata
Theoretical and Conceptual Framework
Eight-Step Idiosyncratic Approach
Conclusion
Chapter 10. Dynamic Assessment in Sport
Thomas Schack
Dynamic Assessment
Dynamic Assessment Concept and Procedures
Dynamic Assessment of Motor Learning Potential
Further Areas for Applying Dynamic Assessment in Sport Psychology
Conclusion
Acknowledgments
Chapter 11. Verbal Reports of Cognitive Processes
David Eccles
Validity of Verbal Reports of Cognitive Processes
Methods Used in Studies of Psychological Skill Use With Regard to the Verbal Report Framework Proposed by Ericsson and Simon (1980)
Summary of Methods Used in Studies of Psychological Skill Use With Regard to the Verbal Report Framework
Concerns Over Using Verbal Report Methods
Conclusion
Acknowledgment
Chapter 12. Making Sense of Words and Stories in Qualitative Research: Some Strategies for Consideration
Brett Smith and Andrew Sparkes
Analysing the Whats: Content
Analysing the Hows: Performative Narrative Analysis
Showing the Whats and Hows: Creative Analytic Practices
Conclusion
Acknowledgments
Measurement Issues
Chapter 13. Developmentally Informed Measurement in Sport and Exercise Psychology Research
Alan L. Smith, Travis E. Dorsch, and Eva V. Monsma
Cognitive Abilities and Structures
Social Development
Biological Maturation
Change in Multiple Domains
Conclusion
Chapter 14. Cultural Sport Psychology: Special Measurement Considerations
Tatiana V. Ryba, Robert J. Schinke, and Natalia B. Stambulova
Assumptions and Principles of Cultural Sport Psychology
Measuring Culture
Conclusion
Chapter 15. Synthesizing Measurement Outcomes through Meta-Analysis
Betsy J. Becker and Soyeon Ahn
What is Meta-Analysis?
Meta-Analysis in Sport and Exercise Psychology
Measurement Issues in Meta-Analysis
Conclusion
Chapter 16. Ethics: Assessment and Measurement in Sport and Exercise Psychology
Jack C. Watson, Edward F. Etzel, and Justine Vosloo
Ethics and Ethics Codes
Use of Technology
Billing for Services
Cultural Issues
Conclusion
Appendix: Codes of Ethics for Related Organizations
Part II. Cognition, Perception, and Motivation Measurement
Cognition Measurement
Chapter 17. Cognitive Measures Related to Exercise and Physical Activity
Jennifer L. Etnier
Theoretical Framework
Limitations and Sources of Confusion
Primary Measurement Tools
Example Studies
Recommendations for Researchers and Practitioners
Chapter 18. Anticipation and Decision Making: Skills, Methods, and Measures
Andrew M. Williams and Bruce Abernethy
Anticipation in Sport: Capturing Performance
Decision Making in Sport: Capturing Performance
Anticipation and Decision Making: Identifying Causal Mechanisms Using Process Measures of Performance
Recommendations for Researchers and Practitioners
Chapter 19. Measuring Mental Representations
Thomas Schack
Mental Representations
Cognitive Representation and Performance: Perspectives and Methods
Mental Representations: A Theoretical Framework
Measurement of Mental Representations
Measuring Mental Representations in Sport
Measuring Mental Representations in Sport: Insight From Empirical Studies
Recommendations for Researchers and Practitioners
Self-Perception Measurement
Chapter 20. Physical Self-Concept
Herbert W. Marsh and Jacqueline H.S. Cheng
Construct Definition of Physical Self-Concept
Dimensions and Sources of Confusion: Self-Esteem Versus Self-Concept and Self-Efficacy
Tools to Measure the Physical Self
Examples from the Literature
Recommendations for Researchers and Practitioners
Chapter 21. Exercise and Self-Perception Constructs
Catherine Sabiston, James R. Whitehead, and Robert C. Eklund
Self-Esteem and Self-Concept
Exercise Identity
Physical Activity Self-Definitions
Exerciser Self-Schemata
Possible Selves
Dimensions and Sources of Confusion
Recommendations for Researchers and Practitioners
Chapter 22. Exercise-Related Self-Efficacy
Edward McAuley, Siobhan M. White, Emily L. Mailey, and Thomas R. Wojcicki
Self-Efficacy and Social Cognitive Theory
Primary Self-Efficacy Measures
Evidence for Support: Examples From the Literature
Further Issues
Recommendations for Researchers and Practitioners
Acknowledgments
Chapter 23. Self-Efficacy and Collective-Efficacy
Lori Dithurbide and Deborah L. Feltz
Definitions
Theoretical and Conceptual Framework
Sources of Collective Efficacy Information
Dimensions and Sources of Confusion in Self-Efficacy and Collective Efficacy
Guidelines for Constructing Self-Efficacy and Collective Efficacy Scales
Examples From the Literature
Recommendations for Researchers and Practitioners
Chapter 24. Effort Perception
Selen Razon, Jasmin Hutchinson, and Gershon Tenenbaum
A Historical Perspective on Perceived Exertion
Modern Psychophysics
Models of Psychobiological Responses to Exercise
Measurement of Perceived Effort
Recommendations for Researchers and Practitioners
Motivation Measurement
Chapter 25. Intrinsic and Extrinsic Motivation in Sport and Exercise
Robert J. Vallerand, Eric D. Donahue, and Marc-Andre K. Lafreniere
Defining Intrinsic and Extrinsic Motivation
The Nature of Intrinsic and Extrinsic Motivation
Multidimensional View of Intrinsic and Extrinsic Motivation
Intrinsic and Extrinsic Motivation at Different Levels of Generality
Evaluation of Measures of Intrinsic and Extrinsic Motivation in Sport and Exercise
Recommendations for Researchers and Practitioners
Chapter 26. Exercise Motivation
Philip M. Wilson
Key Concepts and Theoretical Frameworks
Theory and Measurement
Exercise Motivation Instruments
Recommendations for Researchers and Practitioners
Acknowledgments
Chapter 27. Achievement Motivation Processes
David E. Conroy and Amanda L. Hyde
History of Achievement Motivation Theories
Review of Achievement Motive Measures
Review of Achievement Goal Measures
Other Measures
Recommendations for Researchers and Practitioners
Acknowledgments
Part III. Emotion, Affect, and Coping Measurement
Chapter 28. Affect, Mood, and Emotion
Panteleimon Ekkekakis
Choosing a Measure: A Three-Step Process
Understanding the Differences Between Affect, Emotion, and Mood
Hierarchical Structure of the Affective Domain: An Integrative Framework
Review of Specific Measures
Recommendations for Researchers and Practitioners
Chapter 29. Emotional Reactivity
Christopher M. Janelle and Kelly M. Naugle
Definitions and Dimensions of the Variable Construct
Dimensions and Sources of Confusion
Theoretical and Conceptual Frameworks
Overview of Emotion Measures
Recommendations for Researchers and Practitioners
Acknowledgments
Chapter 30. Flow
Susan Jackson and Robert C. Eklund
Theoretical Framework
Flow Dimensions
Sources of Confusion in the Flow Construct
Measurement Tools
Examples From the Literature
Recommendations for Researchers and Practitioners
Chapter 31. Burnout
Robert C. Eklund, Tom Raedeke, Alan L. Smith, and Scott Cresswell
Conceptualizing Athlete Burnout as a Syndrome
Sources of Confusion About Athlete Burnout
Burnout Measurement Tools for Athletes
Sample Studies Using the ABQ From the Literature
Recommendations for Researchers and Practitioners
Chapter 32. Bayesian Approach to Measuring Competitive Crisis
Michael Bar-Eli and Gershon Tenenbaum
Bayesian Notions in Psychology: An Approach to Judgment and Decision Making
Theory of Psychological Performance Crisis
Bayes’ Theorem: A Measurement Tool for Developing the Individual Performance Psychological Crisis
Recommendations to Researchers and Practitioners
Chapter 33. Psychological Skills
Robert Weinberg and Samuel Forlenza
History and Theoretical Foundations
Issues and Limitations in the Measurement and Assessment of Psychological Skills
Psychological Skill Assessment and Measurement
Individual Assessments of Psychological Skills
Recommendations for Researchers and Practitioners
Chapter 34. Coping in Sport and Exercise
Ronnie Lidor, Peter R.E. Crocker, and Amber D. Mosewich
Coping Concept and Definition
Instruments and Questionnaires Assessing Coping Skills
Preperformance Coping Strategies: The Case of Self-Paced Tasks
Recommendations for Researchers and Practitioners
Part IV. Social and Behavioral Measurement
Chapter 35. Cohesion
Albert V. Carron, Mark A. Eys, and Luc J. Martin
Definitions of Cohesion
Conceptual Framework for Cohesion
Sources of Confusion
Questionnaires for Assessing Cohesion
Overview of Questionnaire Use
Recommendations for Researchers and Practitioners
Chapter 36. Sequential Analysis of Team Communications and Effects on Team Performance
Allan Jeong
Introduction to Team Communications
Seven-Step Procedure for Sequentially Analyzing Team Communications
Recommendations for Researchers and Practitioners
Chapter 37. Models and Measurement of Leadership in Sport
Packianathan Chelladurai
Theoretical Frameworks of Leadership in Sport
Sources of Confusion
Measures of Leadership
Measures of Decision Style
Measurement of Autonomy-Supportive Behavior
Confusion in Purposes of Sport Participation
Recommendations for Researchers and Practitioners
Chapter 38. Moral Behavior
Maria Kavussanu and Ian D. Boardley
Definitions of the Construct
Theoretical and Conceptual Framework
Dimensions and Sources of Confusion
Main Tools for Measuring the Variables
Examples From the Literature
Recommendations for Researchers and Practitioners
Chapter 39. Behavioral Measurement in Exercise Psychology
Claudio R. Nigg, Patricia J. Jordan, and Angela Atkins
Concept Definitions
Conceptual Issues
Tools for Measuring Physical Activity
Recommendations for Researchers and Practitioners
Gershon Tenenbaum, PhD, is a professor of educational psychology at Florida State University in Tallahassee, where he teaches courses on measurement in sport and exercise. He previously served as the director of the Center of Research and Sport Medicine at the Wingate Institute in Israel and was the coordinator of the sport psychology program at the University of Southern Queensland in Australia.
Tenenbaum’s research on measurement and statistical methods in sport and exercise psychology has been widely published; he has authored over 300 articles and book chapters in leading peer-refereed journals in psychology, sport and exercise psychology, sports medicine, and the sport sciences. In addition, he has edited and written several handbooks and books, including the Handbook of Sport Psychology, Third Edition (with Robert Eklund), Case Studies in Applied Psychophysiology: Neurofeedback and Biofeedback Treatments for Advances in Human Performance (with William Edmonds), The Cultural Turn in Sport and Exercise Psychology (with Tatiana Ryba and Robert Schinke), Brain and Body in Sport and Exercise: Biofeedback Applications in Performance Enhancement (with Boris Blumenstein and Michael Bar-Eli), The Practice of Sport Psychology, and Research Methodology in Sport and Exercise Sciences: Quantitative and Qualitative Methods (with Marcy Driscoll).
Tenenbaum was the president of the International Society of Sport Psychology (ISSP) and a fellow of both the National Academy of Kinesiology (NAK) and the Association for Applied Sport Psychology (AASP). Tenenbaum was the editor of the International Journal of Sport Psychology and the International Journal of Sport and Exercise Psychology. Each year, he organizes several sessions and symposia on measurement issues at conferences in the United States and abroad.
In 2011, Tenenbaum received the Scientific Award for Scientific Achievement from the American Psychological Association (APA) Division 47 (Sport and Exercise Psychology division). In 2005, he was awarded the Benjamin S. Bloom Professorship from Florida State University and the Presidential Award from the International Society of Sport Psychology. In 2002, he was named a Distinguished Sport Science Scholar Lecturer in sport and exercise psychology at the University of Utah. He was also the recipient of the International Society of Sport Psychology Honor Award in 1997. Tenenbaum holds a doctorate in measurement and statistics from the University of Chicago. He resides in Tallahassee and enjoys traveling to conferences throughout the world, visiting his homeland of Israel, and watching competitive sport.
Robert C. Eklund, PhD, is a professor of sport psychology in the department of educational psychology and learning systems at Florida State University in Tallahassee, where he was recently named the Mode L. Stone Distinguished Professor of Sport Psychology. He earned his doctoral degree in exercise and sport science with a specialization in sport and exercise psychology from the University of North Carolina at Greensboro. He is a fellow of both the American College of Sports Medicine (ACSM) and the National Academy of Kinesiology (NAK).
Eklund has published over 60 articles in refereed journals; coedited (with Gershon Tenenbaum) the prestigious Handbook of Sport Psychology, Third Edition; coauthored two measurement manuals; and authored or coauthored 12 book chapters in the area of sport and exercise psychology. Eklund has presented his research and participated as a keynote lecturer and invited colloquia participant at numerous conferences worldwide.
Eklund is the current editor in chief of the Journal of Sport and Exercise Psychology and has served in that capacity since January 2003. He has also served as associate editor for the Journal of Applied Sport Psychology and psychology section editor for Research Quarterly for Exercise and Sport. In addition to providing editorial review services for a range of scholarly journals, Eklund currently serves as an editorial board member for The Sport Psychologist; Sport, Exercise, and Performance Psychology; Pamukkale Journal of Sport Sciences; and Hacettepe Journal of Sport Sciences. In the past, he has served on the editorial boards for the Journal of Sport and Exercise Psychology and the Journal of Applied Sport Psychology.
Eklund resides in Tallahassee with his wife, Colleen, and two sons, Garth and Kieran. He enjoys their sport involvement immensely as well as their interest in spending sunny afternoons fishing at the beach.
Akihito Kamata, PhD, is a professor of psychometrics and educational measurement in the department of educational methodology, policy, and leadership at the University of Oregon. Before joining the University of Oregon in 2009, he was on faculty at Florida State University for 11 years, where he also served as the chair of the department of educational psychology and learning systems.
Kamata's primary research interest is implementation of item-level test data analysis methodology through item response theory modeling, multilevel modeling, and structural equation modeling. Kamata has done pioneering work on multilevel item response theory modeling, which is represented by his 2001 publication in the Journal of Educational Measurement, a special issue on multilevel measurement modeling in the Journal of Applied Measurement in 2005, and several book chapters on the topic, including a chapter in the Handbook of Advanced Multilevel Analysis (2011). He has other publications on psychometrics, measurement theory, and applied measurement, including articles in the Journal of Educational Measurement, Applied Psychological Measurement, Structural Equation Modeling, and Psychometrika.
"This is a welcome contribution to the field of sport and exercise psychology. The measurement and evaluation tools introduced and expanded upon are based on past and current research practices and have been validated in the context of the field's most respected scientists."
—Doody's Book Review (5 star review)
“…the text distinguishes itself from others within the domain and provides a valuable and needed contribution.”
—The Sport Psychologist (December 2012)
Evaluation of Measures of Intrinsic and Extrinsic Motivation in Sport and Exercise
In this section, a critical review of the different measures used to assess intrinsic and extrinsic motivation in sport and exercise research is conducted. Certain criteria have guided the selection of the measures presented in this section. First, we have selected measures that are fully developed instruments that have gone through extensive validation steps. Second, we have chosen scales that have been used in research, published or unpublished, during the past 10 years. Scales that have not been used during that time frame are considered obsolete and are not reviewed. Finally, in light of recent theoretical developments and because of space limitations, we have focused on motivation scales that assess intrinsic and extrinsic motivation independently of determinants and outcomes, while focusing on the perceived reasons for behavior. Our earlier discussion of the definitions of intrinsic and extrinsic motivation makes it possible to classify the different measures. The measures can vary in terms of the level of generality (situational versus contextual) and the area (sport versus exercise). This classification appears in table 25.1. Table 25.2 (see p. 291) provides additional information on the concept, dimensions, publication source, and availability of each scale. As can be seen, seven measures are reviewed. For each one, we present (a) a description of the instrument, (b) the conceptual and theoretical rationale underlying its development, (c) the available evidence concerning its psychometric properties (e.g., factorial validity, reliability, and construct validity), and (d) a broad assessment of the strengths and weaknesses associated with each measure.
Measures Used in Sport
In this section, we review the SMS (Brière et al., 1995; Pelletier et al., 1995), the Sport Motivation Scale-6 (SMS-6; Mallett, Kawabata, Newcombe, Otero-Forero, & Jackson, 2007), the Behavioral Regulation in Sport Questionnaire (BRSQ; Lonsdale, Hodge, & Rose, 2008), the Pictorial Motivation Scale (PMS; Reid, Vallerand, Poulin, Crocker, & Farrell, 2009), and the SIMS (Guay et al., 2000).
Sport Motivation Scale
The SMS was developed (Brière et al., 1995; Pelletier et al., 1995) to assess contextual intrinsic and extrinsic motivation from a multidimensional perspective, as well as amotivation. The SMS has been the most frequently used motivation measure in sport, employed with a variety of athletes (recreational to elite), age groups (adolescent to senior), and cultures (e.g., Canada, United States, United Kingdom, Bulgaria, Australia, Spain, and New Zealand). In fact, the SMS has been translated and validated in several languages (see Pelletier & Sarrazin, 2007). The SMS is based on SDT (Deci & Ryan, 1985) and is made up of seven subscales assessing amotivation; external, introjected, and identified regulation; and intrinsic motivation to know, to experience stimulation, and to accomplish. In line with SDT, motivation is assessed as the perceived reasons for participation, or the why of behavior. At the beginning of the scale, participants are asked, “In general, why do you practice your sport?” The items represent the perceived reasons for engaging in the activity, thus reflecting the different types of motivation.
The original scale was developed in French as L'Échelle de Motivation dans les Sports (Brière, Vallerand, Blais, & Pelletier, 1995) and was validated in three steps. The first step involved generating a pool of items explaining various reasons for sport participation through interviews with French Canadian athletes (aged 17-20 y). These reasons were then used to formulate items for the seven subscales of the French SMS. In the second step, a committee of experts evaluated the content validity of the items and eliminated those that were thought to be inadequate. Another sample of athletes from various sports completed the scale. Results from an exploratory factor analysis (EFA) provided support for a seven-factor structure with 4 items per subscale; this second step thus resulted in a 28-item scale. In the third and final step, two additional studies were conducted to further validate the scale. These studies included approximately 500 individuals, most of whom were involved in recreational sports. Results from confirmatory factor analyses (CFA) and correlational analyses confirmed the seven-factor structure, the subscale internal consistency (ranging from .65-.96), and moderate to high indexes of temporal stability (ranging from .54-.82) over 1 month. Furthermore, inspection of correlations among the seven SMS subscales provided support for the simplex pattern proposed by SDT. Results of correlations also showed that (in line with SDT) the most self-determined forms of motivation (intrinsic motivation and identified regulation) were related more strongly to determinants such as autonomy support from coaches and feelings of competence than to other forms of motivation (external and introjected regulation) and amotivation. Similar results were obtained with motivational outcomes such as positive affect, concentration, and intentions to pursue engagement in sport. In sum, adequate construct validity was obtained for the French form of the SMS.
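The internal consistency coefficients reported above can be reproduced for any item-level data with the standard Cronbach's alpha formula. The sketch below, in Python with NumPy, computes alpha for a hypothetical 4-item, 7-point subscale; the simulated responses are illustrative only, not SMS data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 7-point Likert responses to one 4-item subscale: each
# simulated respondent's answers share a latent component plus item noise.
rng = np.random.default_rng(42)
n, k = 300, 4
latent = rng.normal(size=(n, 1))
raw = 4 + latent + rng.normal(scale=1.0, size=(n, k))
items = np.clip(np.round(raw), 1, 7)
alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.2f}")
```

Running the same function on real subscale responses is how values such as the .65 to .96 range reported for the French SMS are obtained.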
The translation of the French SMS into English involved back-translation and committee procedures as suggested by Vallerand (1989). Pelletier and colleagues (1995) conducted two studies involving college athletes from various sports in order to assess the psychometric properties of the English form of the SMS. Results from CFA with a sample of 593 Canadian university athletes revealed adequate fit indices for the hypothesized seven-factor model (Adjusted Goodness-of-Fit Index and Normed Fit Index both > .90; Root Mean Square Residual < .08), and correlations with determinants and outcomes supported the simplex model. Moreover, internal consistency above .70 was obtained on all of the subscales except the identified subscale (.63). Test-retest correlations were acceptable and very similar to those obtained with the French SMS, as was the scale’s construct validity.
Since 1995, the SMS has been used extensively in sport psychology research. The seven-factor structure has been supported repeatedly (e.g., Doganis, 2000; Gillet, Vallerand, & Rosnet, 2009; Li & Harmer, 1996; Shaw, Ostrow, & Beckstead, 2005; Standage, Duda, & Ntoumanis, 2003). In addition, Hu and Bentler (1999) obtained support for a five-factor model by combining the three types of intrinsic motivation into one factor. Similar results were obtained by Gillet and colleagues (2009) with the French SMS. However, some studies have not supported the seven-factor model (Hodge, Allen, & Smellie, 2008; Mallett, Kawabata, & Newcombe, 2007; Mallett, Kawabata, Newcombe, & Otero-Forero, 2007; Martens & Webber, 2002). Why is there such a discrepancy between these two sets of studies? One possibility lies in the populations from which the different samples were drawn. Specifically, the SMS was validated using adolescent and young adult athletes rather than older athletes. Because of this specific focus, some of the items may reflect a participation rather than an elite orientation, which is more in line with the younger population. For instance, an identified regulation item reads, “Because sport is one of the best ways to maintain good relationships with my friends.” Such an item seems more relevant for a younger population. An older, high-level athlete may disagree with this item but still display a high level of identified regulation for a sport (just not for relationship reasons). Future research using the SMS with different age groups and proficiency levels is needed to clarify this issue.
Although the internal consistency of the SMS has generally shown adequate values, some values below .70 have been found. This is especially the case for the identified regulation subscale (Brière et al., 1995; Kingston, Horrocks, & Hanton, 2006; Li & Harmer, 1996; Pelletier et al., 1995), although lower values (below .70) have also been obtained for the introjected regulation (McNeill & Wang, 2005; Perreault & Vallerand, 2007; Riemer, Fink, & Fitzgerald, 2002; Standage, Duda, & Ntoumanis, 2003), external regulation (Standage, Duda, & Ntoumanis, 2003), and amotivation (Standage, Duda, & Ntoumanis, 2003) subscales. However, very few values below .60 have been obtained. It should be noted that a Cronbach alpha of .60 with only 4 items is acceptable because, as noted by Cronbach (1951), the coefficient alpha underestimates the internal consistency of scales with a small number of items; the number of items enters the formula directly. For instance, given the same average interitem correlation, a coefficient alpha of .56 on a 3-item scale is equivalent to an alpha of .81 on an 8-item scale!
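The dependence of coefficient alpha on scale length can be made concrete with the standardized-alpha formula, alpha = k * r_bar / (1 + (k - 1) * r_bar), where r_bar is the average interitem correlation. The sketch below uses an illustrative r_bar of .30; the exact figures depend on the formula variant used, but at a fixed r_bar, alpha clearly grows with the number of items:

```python
def standardized_alpha(k: int, r_bar: float) -> float:
    """Standardized Cronbach's alpha from the average inter-item correlation."""
    return k * r_bar / (1 + (k - 1) * r_bar)

# Same average inter-item correlation, different scale lengths
# (r_bar = .30 is purely illustrative, not an SMS estimate)
for k in (3, 4, 8):
    print(f"{k} items -> alpha = {standardized_alpha(k, 0.30):.2f}")
```

This is why a modest alpha on a 4-item SMS subscale is not directly comparable to the same alpha on a longer instrument.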
In line with the original work of Ryan and Connell (1989) and the initial SMS validation procedures (Brière et al., 1995; Pelletier et al., 1995), construct validity has been assessed by other authors in two fashions: (1) with the simplex pattern of correlations among the subscales and (2) with correlations between motivational factors and their determinants and consequences. We do not have space to review all studies. However, overall, there is overwhelming support for the construct validity of the SMS both in French and English. For instance, in addition to finding support for the simplex pattern, Pelletier and Sarrazin (2007) concluded in their review of the evidence that the SMS has been used with success to predict a great variety of specific outcomes and consequences (such as burnout, exercise dependence among endurance athletes, fear of failing, adaptive coping skills, perceptions of constraints, flow, vitality and well-being, sporting behavior orientations, aggression, and performance) in a manner that is consistent with SDT. These findings provide strong support for the construct validity of the SMS.
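The simplex pattern described above can be checked mechanically: order the subscales along the self-determination continuum and verify that the average between-subscale correlation shrinks as continuum distance grows. A minimal sketch, using an illustrative correlation matrix rather than published SMS values:

```python
import numpy as np

# Hypothetical correlations for subscales ordered along the continuum
# (amotivation -> external -> introjected -> identified -> intrinsic);
# values are illustrative, not taken from any published SMS study.
order = ["AMOT", "EXT", "INTRO", "IDENT", "IM"]
r = np.array([
    [1.00,  .45,  .25,  .05, -.15],
    [ .45, 1.00,  .40,  .20,  .00],
    [ .25,  .40, 1.00,  .45,  .25],
    [ .05,  .20,  .45, 1.00,  .50],
    [-.15,  .00,  .25,  .50, 1.00],
])

def mean_corr_by_distance(r: np.ndarray) -> dict:
    """Average correlation between subscale pairs at each continuum distance."""
    out = {}
    k = r.shape[0]
    for i in range(k):
        for j in range(i + 1, k):
            out.setdefault(j - i, []).append(r[i, j])
    return {d: float(np.mean(v)) for d, v in out.items()}

by_dist = mean_corr_by_distance(r)
# A simplex pattern implies correlations shrink as continuum distance grows
print(all(by_dist[d] > by_dist[d + 1] for d in range(1, len(order) - 1)))
```

In practice, the matrix would be computed from observed subscale scores, and a formal test (rather than this eyeball check) would be used to evaluate the pattern.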
In sum, the SMS has some positive features. First, it is a multidimensional instrument that assesses different types of intrinsic and extrinsic motivation as well as amotivation. Second, the scale focuses on the why of behavior, and thus items are not confounded with determinants and consequences. Finally, it has some excellent psychometric properties. Nevertheless, some limitations should be underscored. First, although internal consistency levels have been acceptable overall, some subscales, especially the identified regulation subscale, have yielded relatively low coefficient alphas at times. Second, the SMS does not assess integrated regulation. Third, the seven-factor structure has not always been supported by CFAs. According to Pelletier, Vallerand, and Sarrazin (2007), this may be explained by a host of factors, including differences in sample sizes, variations in the way the instrument is administered, or other characteristics specific to the context of the study. However, as already indicated, it is also possible that the SMS is better suited for a younger, nonelite athlete population. Clearly, future research on this issue is in order.
Sport Motivation Scale-6
Mallett, Kawabata, Newcombe, and Otero-Forero (2007) developed another version of the SMS, the SMS-6. This scale has the same underlying rationale as the original SMS but was designed to improve the original version by including an integrated regulation subscale and by attempting to resolve some of the inconsistencies in the factor structure and some of the relatively low internal consistency values (below .70). The SMS-6 comprises 24 items, 4 for each of six subscales: amotivation; external, introjected, identified, and integrated regulation; and general intrinsic motivation. Mallett and colleagues developed 5 items for the integrated regulation subscale as well as 7 other items (4 of which were kept in the final scale) to replace some items in the original SMS. Two samples were used to validate the SMS-6. Sample 1 comprised 501 first-year university students participating in competitive sport at least twice per week and 113 elite athletes representing Australia at the international level (for a total of 614 participants). Sample 1 was used to derive a factor structure that included the SMS items as well as the reformulated and integrated regulation items. Sample 2 comprised 557 university students who engaged in a variety of sports or physical activities twice per week. The second sample was used to confirm the structure of the SMS-6. Participants also completed the Dispositional Flow Scale (DFS).
Results of a CFA with the SMS-6 (with sample 2) supported the factor structure as well as the internal consistency values (all above .70). Concerning the construct validity of the SMS-6, Mallett and colleagues (2007) reported a rather weak simplex pattern of correlations among the subscales. More specifically, external regulation correlated highly with intrinsic motivation (r = .54), while the correlation between identified regulation and intrinsic motivation was very high (r = .91) and exceeded the one between integrated regulation and intrinsic motivation (r = .75). The construct validity of the SMS-6 was not fully supported, as some of the correlations involving the SMS and flow were not as expected by SDT. For instance, the distinctions among integrated regulation, identified regulation, and intrinsic motivation were not always clear. Furthermore, external regulation showed positive and sometimes strong correlations with flow, contrary to hypotheses derived from SDT.
In sum, the SMS-6 has some positive features. First, it contains an integrated regulation subscale. Furthermore, the addition of 4 new items may make the SMS more acceptable for older and more experienced athletes. Second, Mallett and colleagues (2007) presented results supporting the validity of a variation of the SMS-6, the SMS-8, which contains the same items as the SMS-6 but assesses the three types of intrinsic motivation rather than general intrinsic motivation. The SMS-6 also shows some limitations. First, Mallett and colleagues proposed 7 new items to replace those that were presumably problematic in the original SMS. However, only 4 of these items made it into the final version. Thus, the SMS-6 retained much of the original SMS. Second, even some of the new items appear problematic and may not assess the desired construct (see Pelletier et al., 2007). For instance, a new amotivation item (“I don't seem to be enjoying my sport as much as I previously did”) seems to reflect a decrease in intrinsic motivation rather than amotivation. Finally, results from Mallett and colleagues (2007) demonstrated that the integrated regulation subscale may lack discriminant validity, yielding correlations with flow highly similar to those of identified regulation and intrinsic motivation.
Behavioral Regulation in Sport Questionnaire
Lonsdale and colleagues (2008) developed the BRSQ to create an alternative measure of elite sport motivation as conceptualized by SDT. However, in contrast to Mallett, Kawabata, Newcombe, and Otero-Forero (2007), these authors used a completely new pool of items developed by SDT experts and competitive athletes. There are two versions of the BRSQ. The BRSQ-8 contains 32 items assessing integrated, identified, introjected, and external regulation; amotivation; and the three forms of intrinsic motivation (knowledge, experience stimulation, and accomplishment) identified by Vallerand (1997). The BRSQ-6 contains the same items but assesses general intrinsic motivation rather than all three types of intrinsic motivation, for a total of 24 items.
Lonsdale and colleagues (2008) conducted a series of three studies to validate the scale. In the first study, the factorial validity and the internal consistency were assessed with 382 New Zealand elite athletes. Results from a CFA on the 32 items supported the factor structure of the BRSQ. Specifically, fit indexes were acceptable and all items loaded significantly on the appropriate factors (loadings ranged from .58 to .91). Finally, internal consistency of the eight subscales, measured with the Cronbach alpha, showed high values ranging from .71 to .91. Additionally, 1 wk test-retest reliability was tested with 34 competitive adult athletes. Test-retest coefficients for all subscales supported the temporal reliability (values ranged from .73 to .90).
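Cronbach's coefficient alpha, the internal consistency index reported throughout these validation studies, can be computed directly from item responses as alpha = k/(k − 1) × (1 − Σ item variances / total-score variance). A minimal sketch with invented 7-point Likert responses (these data are purely illustrative, not from the BRSQ):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) array:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # sample variance per item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical 7-point Likert responses from six athletes
# to one 4-item subscale
responses = [
    [6, 7, 6, 7],
    [5, 5, 6, 5],
    [7, 6, 7, 7],
    [3, 4, 3, 4],
    [4, 4, 5, 4],
    [6, 5, 6, 6],
]
print(round(cronbach_alpha(responses), 2))  # 0.96 for these invented data
```

In practice, alpha would of course be computed over a full sample; validated statistical packages report it alongside item-level diagnostics.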
In a second study with 343 athletes from New Zealand, the results of a CFA on the BRSQ-8 once more supported the factor structure as well as the internal consistency of the subscales. Lonsdale and colleagues (2008) also showed that the factor structure of the BRSQ-6 model fit the data very well and that subscale coefficient alphas all exceeded .78. Moreover, the construct validity of the BRSQ-6 was assessed by testing for a simplex pattern of correlations among the six subscales. While some relationships were in line with predictions (e.g., amotivation was negatively related to intrinsic motivation), there was a lack of discrimination between some subscales. More specifically, there was no difference between external and introjected regulation scores in terms of their relationships with amotivation. A similar lack of discrimination was evident with the identified and integrated regulation subscales, which had comparably high correlations with intrinsic motivation. These results with the simplex pattern were replicated in a third study conducted with nonelite athletes. In this third study, Lonsdale and colleagues also assessed the relationships between the BRSQ-6 and indexes of burnout (Lemyre, Treasure, & Roberts, 2006; Raedeke & Smith, 2001) and flow (Jackson & Eklund, 2002). Overall, results supported hypotheses in line with SDT. Specifically, amotivation and external and introjected regulation showed negative correlations with flow and positive correlations with burnout. The opposite pattern of correlations was found for the self-determined subscales (intrinsic motivation and identified and integrated regulation). However, there was a lack of discrimination between integrated regulation and general intrinsic motivation. Results of another study on burnout (Lonsdale, Hodge, & Rose, 2009) replicated these findings. Thus, overall, the support for the construct validity of the BRSQ-6 appears to be mixed.
It should be underscored that the BRSQ has some positive features. First, the scale is designed in such a way that the researcher can decide to use a multidimensional (BRSQ-8) or unitary (BRSQ-6) conceptualization of intrinsic motivation. Second, the scale is rather short, with 4 items per subscale. Finally, it assesses integrated regulation. At the same time, the BRSQ also displays some limitations. First, additional research is needed on the construct validity of the scale. Whereas there is support for distinguishing the self-determined subscales (intrinsic motivation and identified and integrated regulation) from the non-self-determined subscales (external and introjected regulation), the finer discrimination within each category appears to be lacking. Such evidence is crucial, and future research is needed in order to show that this scale does indeed assess the SDT constructs rather than two broad sets of subscales tapping self-determined versus non-self-determined motivation. Second, this scale is designed specifically for older participants in competitive sport; it remains to be seen if the BRSQ can be used with younger participants, for whom the integrated regulation subscale may not have full meaning. Finally, research is needed to test the temporal stability of the scale over a time frame longer than 1 week.
Pictorial Motivation Scale
The PMS was designed to measure intrinsic and extrinsic motivation for sport and exercise in people with an intellectual disability. It assesses participants' reasons for engaging in sport and exercise. The scale's main characteristic is the set of drawings depicting each of its 20 items. There are 5 items (pictures) for each of four subscales: intrinsic motivation, self-determined extrinsic motivation (a mixture of integrated and identified regulation), non-self-determined extrinsic motivation (a mixture of introjected and external regulation), and amotivation. The pictures are used to help participants with cognitive difficulties understand the motivational concept depicted in each item.
The original scale was developed in French (Reid, Poulin, & Vallerand, 1994). Results of a study with 62 participants supported the internal consistency, temporal stability, and construct validity, as exemplified by the presence of a simplex pattern among the four subscales. However, the amotivation subscale had poor reliability (α = .52). The French version was later translated into English (Reid et al., 2009) according to the back-translation and committee procedures outlined in Vallerand (1989). Then, 6 new items were generated for the less reliable amotivation subscale. Participants in the Special Olympics (n = 160) completed the English version. Results of the CFA confirmed the four-factor structure of the PMS. Furthermore, the internal consistency values (Cronbach alphas) ranged from .60 to .71. Finally, the construct validity was assessed by testing for a simplex pattern of correlations among the four subscales. The intercorrelations among latent variables from the CFA provided support for the simplex pattern.
Results from a study conducted with the English version of the PMS involving 80 high school students with mild intellectual disability provided support for the internal consistency, temporal stability (over 3 wk), and construct validity of the PMS. Construct validity was supported with respect to the simplex pattern of correlations among the PMS subscales as well as correlations between the PMS subscales and motivational antecedents (skill and perceived competence) and outcomes (perceived effort) as rated by the physical education teacher. Finally, the internal consistency of each subscale was tested without the pictorial dimension with a subset of 47 high school students with mild intellectual disability. Results indicated poor internal consistency for all subscales except intrinsic motivation (.91 for intrinsic motivation, .27 for self-determined extrinsic motivation, .20 for non-self-determined extrinsic motivation, and .60 for amotivation). This finding suggests that the scale is not reliable without the drawings.
The preliminary findings with the English version of the PMS are encouraging. Furthermore, this scale is the only one geared toward individuals with intellectual disability. The use of drawings to depict the various items makes this scale unique in the field. Nevertheless, the PMS shows some limitations. First, the scale does not differentiate among all forms of intrinsic (knowledge, stimulation, and accomplishment) or extrinsic (integrated, identified, introjected, and external regulation) motivation. Second, construct validity was tested with only a limited number of variables. Third, it is not known if the scale is usable with children who have severe forms of intellectual disability. Clearly, additional research is needed on the reliability and validity of the PMS.
Situational Motivation Scale
The SIMS is one of the few scales to assess intrinsic and extrinsic motivation and amotivation at the situational level (Guay et al., 2000). The SIMS is a multidimensional tool that measures four types of motivation: intrinsic motivation, identified regulation, external regulation, and amotivation. The SIMS is made up of 16 items (4 items per subscale) and asks this question: “Why are you currently engaged in this activity?” The items represent potential reasons for task engagement. The scale is worded in such a way that it can be used in most situations (sport and nonsport).
Five studies were reported in the original article. In study 1, the original scale was developed by a committee of experts and completed by 195 French Canadian college students. Results of an EFA revealed a four-factor structure with the final 16 items loading on their respective factors. In study 2, a CFA confirmed the factor structure as well as its invariance across gender. Across the five studies, the internal consistency values of the subscales were acceptable, ranging from .62 to .95 (see Guay et al., 2000). Moreover, across all studies, support was obtained for the construct validity of the SIMS through correlations in line with the simplex pattern among the subscales as well as between the SIMS subscales and motivational determinants and consequences. Perhaps of greater interest for the present discussion were the results of study 4, which showed that some subscales (intrinsic motivation and identified regulation) were sensitive enough to detect changes in motivation that took place during two games of a basketball tournament.
Other researchers have also obtained support for the psychometric properties of the SIMS. First, all studies reported acceptable internal consistency values for each subscale (Blanchard, Mask, Vallerand, de la Sablonnière, & Provencher, 2007; Conroy, Coatsworth, & Kaye, 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, Duda, & Prusak, 2003). The coefficient alpha values of all but the amotivation subscale (α = .58) in the Conroy and colleagues study were above .60. Second, support for the factorial validity of the SIMS was obtained through CFAs with one qualification. Whereas the CFA results with the 16 items yielded acceptable fit indexes, removal of 1 item (Jaakkola, Liukkonen, Laakso, & Ommundsen, 2008) and even 2 items (Gillet, Berjot, & Paty, 2009; Standage, Treasure, et al., 2003) yielded better fit indexes. Moreover, Standage, Treasure, and colleagues (2003) conducted multisample CFAs and showed that the pattern of factor loadings was largely invariant across four different samples.
Construct validity of the SIMS was also assessed in several studies (Blanchard et al., 2007; Conroy et al., 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, et al., 2003). In addition to supporting the simplex pattern among the SIMS subscales and between the SIMS subscales and need satisfaction (study 2 of Blanchard and colleagues, 2007), results also supported the postulate from the HMIEM (Vallerand, 1997) for the top-down effect, in which contextual sport motivation was found to predict situational sport motivation (studies 1 and 2 of Blanchard et al., 2007; Jaakkola et al., 2008; Ntoumanis & Blaymires, 2003). Specifically, the more self-determined the motivation was found to be in a specific context (in this case, sport), the more self-determined the motivation was found to be in a given situation. Furthermore, Blanchard and colleagues (2007, studies 1 and 2) found support for another postulate from the HMIEM that suggests that over time, situational motivation in the realm of sport (basketball) has recursive effects on contextual motivation. The more that situational motivation is self-determined, the more that contextual motivation becomes self-determined over time. Finally, Jaakkola and coworkers (2008) demonstrated that, as predicted by the HMIEM, situational self-determined motivation was better than contextual motivation in predicting the situational intensity (as assessed by HR) displayed by students in a physical education class. Overall, these findings provide strong support for the reliability and factorial and construct validity of the SIMS.
The SIMS has several positive features, one of them being that it is the only scale to assess intrinsic and extrinsic motivation and amotivation at the situational level. Furthermore, it does so using only 16 items. Nevertheless, it also has some weaknesses. First, the SIMS does not assess the different types of intrinsic motivation and integrated and introjected regulation, because it was designed to be short. Second, while the factor structure has been supported, it is not clear if some items should be replaced (Gillet, Berjot, et al., 2009; Jaakkola et al., 2008; Standage, Treasure, et al., 2003). Third, research so far has not assessed the validity of the scale with high-performance athletes. Thus, additional research is needed to further test the psychometric properties of the SIMS in sport.
Ethics codes imperative in conducting research
Ethics Codes: Their Nature, Purposes, and Application
Ethics codes typically comprise principles and standards. Ethical principles are broad-spectrum statements that summarize and reflect the values of the parent organization or governing body. These general and aspirational statements set the underlying tone for the more specific codes and guide the work-related ethical decision making of professionals. In contrast, ethical standards specify both proscribed and prescribed member behaviors. While not always black and white, these standards serve as a more clear-cut and enforceable guide for professional behavior.
Members should apply both the aspirational principles and the enforceable standards to shape their thinking and behavior in work settings. Ideally, members self-monitor their own behavior. In an effort to remain ethical, professionals are encouraged to consult with colleagues about ethically challenging situations and to provide constructive feedback when they witness possibly unethical behavior in others.
Assessment and Measurement
A central question addressed in this chapter is this: What are assessment and measurement? Sundberg (1977) defines assessment as the processes used “for developing impressions and images, making decisions and checking hypotheses about another person's pattern of characteristics that determines his or her behavior in interaction with the environment” (p. 21). The assessment process involves collecting and assembling a broad range of objective and subjective information about persons or groups to develop impressions about them; identify their needs; predict how they might think, feel, and behave in future situations; and select and apply interventions based on the content and dependability of that information. Professionals may use multiple assessment methods, including observations of behavior, symptom checklists, surveys and questionnaires, structured and unstructured interview materials, and standardized tests (Bennett et al., 2006). Gardner and Moore (2006) emphasize using a triad of psychological assessment strategies in the practice of clinical sport psychology: (1) initial interviews, (2) behavioral observation, and (3) psychological testing. The nature and assumptions underlying assessment approaches are usually grounded in the theoretical orientation of the professional (Andersen, 2002).
In contrast, measurement can mean many things to many people. It is one of the most common words in the English language and can be used as both a noun and a verb (Lorge, 1967). For the purposes of this chapter, measurement is viewed as an extension of assessment processes. It can be thought of more narrowly as the process of collecting information about psychological characteristics of interest (e.g., attitudes, behaviors, state experiences) using one or more methods or tools (such as those mentioned earlier) to monitor change or the effects of interventions or treatments postassessment. For example, an educational sport psychology consultant might administer a measure of team cohesion over the course of a competitive season to see how team members perceive their relationships. Another consultant might conduct a preseason baseline screening assessment of cognitive functioning in hockey players and then reevaluate players who incur a mild traumatic brain injury (i.e., concussion) later in the season.
In this chapter, the terms measurement and assessment are used interchangeably. Furthermore, these terms are used to describe the decisions and opinions made by professionals regarding clients with whom they work. As such, measurement and assessment techniques include all methods of gathering information about clients, such as (a) psychological, educational, and neurological tests; (b) data gathered during clinical interviewing; (c) information gathered from significant others (e.g., family members, teachers, friends); (d) direct and indirect observation; and (e) interactions with people via teletherapy (e.g., Internet, phone; Fisher, 2009).
Competence and Education
In order to excel in our professional duties and do well for those we serve, teach, study, and otherwise interact with, we must know what to do and how to do it in a capable manner. The ethics codes mentioned earlier identify the necessity of being knowledgeable and capable in our work. For example, the APA ethical standards provide guidance for organization members in this area, including information about (a) competence limitations, (b) maintaining competence, (c) making sound professional and scientific judgments, (d) delegating work responsibilities to others, (e) engaging in activities in emergencies, and (f) impairment (APA, 2002). Competence in professional behaviors is a personal matter that is frequently challenged. It is the responsibility of professionals to know their limitations and to recognize that their knowledge and skills change and require constant upgrading. The APA ethics code also emphasizes the importance of making sound work-related decisions based on scientific knowledge and appropriate discipline-specific practice. This portion of the APA code cautions professionals to be careful when delegating work to others, describes how a professional is responsible for others' work, and explains the necessity of avoiding multiple relationships with those to whom work is delegated. The APA standards note that we can occasionally be thrown into situations in which our competence is stretched; in such cases we need to be very careful, seek supervision if available, and end such work as soon as possible.
Measurement Referral Questions and Appropriateness of Instruments
When selecting assessment instruments, the professional must consider the referral questions that prompted this process (Fisher, 2009; Smith, 1976). The instruments selected should reflect these referral questions and utilize assessment strategies that have appropriate validity and reliability. For example, if a professional is interested in measuring state anxiety for research purposes, an appropriate assessment may be the Competitive State Anxiety Inventory-2 (CSAI-2; Martens, Burton, Vealey, Bump, & Smith, 1990) as opposed to the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970), which measures both trait and state anxiety. When selecting the assessment, the professional should be aware of limitations or biases regarding cultural sensitivity (see the later section on cultural issues); gender considerations (Etzel, Yura, & Perna, 1998); and age, language, or disability factors that may influence the psychometric qualities of the assessment differently from the way they influenced the normative groups used for the development and validation of the instrument (APA, 2002; Fisher, 2009). It is also important to consider the method of delivery. For example, paper-and-pencil assessments may not have been validated for online use (see the later section on technology), and instruments with elevated reading levels may not be appropriate for certain age or developmental groups. Therefore, the professional should always verify the assessment's validity and reliability when a modified assessment method or group is used (Fisher, 2009). Furthermore, the professional should also attempt to conduct in-person assessments when possible, as a great deal of information can be learned about clients from the way in which they present themselves during the assessment process. This information can affect the richness of the assessment data.
It is also important for professionals to be aware of, and competent in using, appropriate psychometric strategies for establishing the validity and reliability of the instruments they use (AERA, APA, & NCME, 1999). All instruments have unique psychometric properties that affect how they should be administered and interpreted. When validity and reliability issues are not taken into consideration, it is possible to choose and utilize instruments to assess factors that they were not designed to assess. Furthermore, practitioners should be well aware of other psychometric properties, such as content and criterion validity and the standard error of measurement, that may affect how results are interpreted and used. The ethical practitioner needs to be aware of psychometric issues in order to choose appropriate instruments with regard to the referral questions, client characteristics, assessment strategies, and environmental factors.
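One of the psychometric properties just mentioned, the standard error of measurement, follows directly from classical test theory: SEM = SD × sqrt(1 − r), where r is the reliability coefficient. The brief sketch below, using hypothetical values rather than figures from any published instrument, shows how the SEM sets a confidence band around an observed score:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

def score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% confidence band around an observed score."""
    e = sem(sd, reliability)
    return observed - z * e, observed + z * e

# Hypothetical subscale: SD = 4.0, alpha = .84
print(round(sem(4.0, 0.84), 2))        # 4.0 * sqrt(.16) = 1.6
lo, hi = score_band(25, 4.0, 0.84)     # band around an observed score of 25
print(round(lo, 2), round(hi, 2))
```

The practical point is that a less reliable instrument produces a wider band, which is why low subscale alphas (as seen in several scales reviewed earlier) should temper the interpretation of individual scores.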
Consent and Assent
As discussed earlier, the ethical principles for sport and exercise psychology emphasize doing no harm to the client and respecting the individual's rights and dignity (AASP, 1996; APA, 2002). The test taker's right to privacy and confidentiality applies here as well, and the professional should take all necessary precautions to maintain the confidentiality and privacy of the client. To protect the test taker, informed consent must be obtained at the start of the relationship (e.g., research, consultation, therapy). Beyond the informed consent process and before formal assessment, the client or participant should be informed of all pertinent information regarding the assessment process. This information includes (a) the nature and purpose of assessment; (b) any applicable fees; (c) potential involvement of third parties such as a coach, athletic trainer, or manager; (d) limits of privacy and confidentiality (as discussed in the next section); and (e) the timeline for the process and potential feedback (Fisher, 2009). This information should be presented in a clear and understandable manner. Furthermore, this information should be agreed to by the test taker, who thereby gives informed consent. Test takers should engage in assessment of their own free will and must be given the option to withdraw participation without consequences (APA standard 3.10). All necessary information about assessment procedures and findings should be provided in a language or level appropriate for the participant. Furthermore, it is unethical to compel or coerce individuals to take part in measurement and assessment for research or practice purposes.
Privacy and Confidentiality and Release of Information
Typically, the ethical standards of organizations with ties to sport psychology (APA ethical standard 4.01 and the AASP) suggest that professionals should not reveal information about clients, test takers, or others without their signed approval to release information or a legal requirement to do so. Such legal situations may include (a) a test taker who indicates possible self-harm or harm to others (i.e., suicide or homicide), (b) a test taker whose results are subpoenaed by the court, or (c) a test taker who is a minor, in which case the parent or guardian may have access to the data (Etzel et al., 1998). If the test taker or, in the case of a minor, the parent or guardian provides explicit written permission, the specific information identified by the client may be released to the identified parties. Unless these circumstances are met, information from the test taker may not be disclosed to anyone (e.g., coaches, management, parents, administration, athletic trainers, and so on).
In situations where the assessment is requested by a third party (e.g., coaches, management, the court), this third party may also request results from the assessment. It is important for the professional to establish a priori who is the “real client” (Ogilvie, 1979) and to have the ability to control access to the results. Etzel and colleagues (1998) suggest that information about the assessment should be shared only with one predetermined person, unless a release of information form has been completed. Therefore, when engaging in assessments, the professional should set clear boundaries and avoid dual relationships, thereby identifying who is being served (APA standard 4.02a). Another complication of these situations is the role of trust. If athletes or test takers suspect the test results will be used without their permission in decisions regarding performance or other aspects of participation, they may be less likely to respond honestly, thus affecting the validity of the results (see the section on demand characteristics).
Raw Data and Data Storage
Raw data, such as the test taker's responses to items, along with the professional's notes and final reports, should be stored in locked file cabinets inside the professional's office or in password-protected computer files (Fisher, 2009). Other methods to ensure confidentiality may include limiting access to records to only those people who need to know this information and have been trained to handle and understand it, deidentifying records using code numbers, and appropriately disposing of identifiable records (Fisher, 2009). A good data maintenance policy is to keep data for a minimum of 7 y after the last service delivery date or 3 y after a minor reaches the age of 18 (whichever is later), as is recommended by the APA record-keeping guidelines (APA, 2002; Fisher, 2009). Raw data and the instruments used for assessment purposes should not be released to third parties unless a release of information form has been completed and the third party is competently trained to use such information.
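The retention rule described above reduces to simple date arithmetic: keep records until 7 years after the last service date or, for a client who was a minor, until 3 years after the client turns 18, whichever is later. A sketch of that computation (the function names and dates are illustrative, not part of any guideline):

```python
from datetime import date

def add_years(d, years):
    """Shift a date by whole years (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def retention_until(last_service, birth_date=None):
    """Earliest allowable disposal date under the policy described above:
    7 years after the last service date or, when a birth date is supplied
    for a client who was a minor, 3 years after the client turns 18,
    whichever is later."""
    keep_until = add_years(last_service, 7)
    if birth_date is not None:
        keep_until = max(keep_until, add_years(birth_date, 18 + 3))
    return keep_until

# Adult client, last seen 2020-06-01 -> keep until 2027-06-01
print(retention_until(date(2020, 6, 1)))
# Client born 2008-03-15, last seen 2020-06-01:
# 7 y rule gives 2027-06-01; the 18 + 3 y rule gives 2029-03-15 and governs
print(retention_until(date(2020, 6, 1), birth_date=date(2008, 3, 15)))
```

Any real retention schedule should of course follow the applicable guidelines and jurisdictional law rather than a sketch like this one.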
Results Discussion
Test feedback and results discussion should be provided in the form of a carefully constructed report using clear language that fully explains the assessment results. Labels and jargon should be eliminated to increase readability. Information necessary to the purpose of the test should be included, and unnecessary and unrelated information should be avoided (APA, 2002; Fisher, 2009). Additionally, as recommended by the APA (APA, 2002), interpretations should take into consideration the participant's gender, race, ethnicity, age, national origin, sexual orientation, religion, disability, language, and socioeconomic status. Participants should receive assessment information and feedback related to their performance on the assessment and should be informed of ways in which they could personally use the test results or how this information may be used by a third party (only if written permission was given to release such information). The information released to the participant should take the form of a verbal or written report, presented in such a way that it does not cause harm to the test taker (Etzel et al., 1998). However, information such as numerical scores or specific responses should not be released to individuals not qualified to interpret such information (Fisher, 2009; Tranel, 1995).
Demand Characteristics
In the sport context, several groups of individuals may be interested in the assessment results of athletes. Interested parties may include coaches, managers, teams, students, or administrators. However, the possibility that a third party will review the test results may increase socially desirable responding and result in invalid and unreliable information. Therefore, undue pressure to complete an instrument or battery should be considered a contextual factor.
Another potentially undesirable effect of a third party viewing the test taker's results may be assessment anxiety. The APA standards state that if a test taker is observed to be anxious or reports feeling anxious, this feeling should be taken into account and noted as a limitation in the interpretation of test data (APA, 2002). Assessment anxiety may be exaggerated in situations where a third party may have access to results. These situations may also lead to faking good or faking bad on the part of respondents who are concerned about how the results may be used. This must also be considered when evaluating the results.
Supervision of Subordinates
In some cases, professionals may hire and train subordinates to help with assessment and measurement tasks. These subordinates may administer, score, and even interpret the results of measurement and assessment. Standard 2.05 of the APA ethics code (APA, 2002) states that professionals utilizing employees, supervisees, or research and teaching assistants for such purposes should take reasonable precautions to ensure that (a) subordinates do not face potentially harmful multiple relationships with the client that could affect their objectivity, (b) subordinates are competently trained to perform the delegated task on their own or with supervision, and (c) subordinates are supervised for competent service delivery. Therefore, when using subordinates to help with tasks such as administration, scoring, or interpretation, the professional assumes primary responsibility and liability to ensure that the services are being provided competently. The professional needs to ensure that subordinates are well trained with all potential instruments. To do so, the professional must provide appropriate training, experience, and supervision as well as continue to check the subordinates' work to ensure its quality. As with licensed professionals, not all subordinates have the same competencies with regard to all instruments.
Tools to Measure the Physical Self
Reflecting the general historical trends in self-concept research, self-concept instruments used in early sport and exercise research focused on global self-esteem (Marsh, 1997, 2002). Indeed, in a 1974 review, Wylie concluded that at the time most self-concept instruments focused on global self-concept or self-esteem rather than on specific domains such as PSC. However, following the research of Shavelson and colleagues (1976), a number of multidimensional self-concept instruments containing one or more PSC scales were developed. Although several of the instruments reviewed by Shavelson and colleagues (1976) contained items relating to physical skills and elements of physical appearance, none provided a clearly interpretable measure of PSC. From a practical perspective, these older instruments appear to be of little value for sport and exercise psychologists. The major exception, perhaps, is the Physical Estimation and Attraction Scale (PEAS; Sonstroem, 1978, 1997), along with the theoretical model on which it is based. This instrument was designed to measure two global components: estimation (competency) and attraction. While the PEAS may not be the instrument of choice today, it has historical significance: research with it incorporated many of the features of the construct validity approach advocated in this chapter, it was heuristic, and it provided an important basis for subsequent research.
In a subsequent 1989 review, Wylie identified several multidimensional self-concept instruments measuring one or more components of PSC that can be differentiated from other specific domains of self-concept and general self-concept. Included in the list were the three SDQ instruments already discussed. Wylie also evaluated Harter's (1985) Self-Perception Profile for Children, which contains two PSC scales (athletic competence and physical appearance). Other multidimensional instruments containing physical scales that were not reviewed by Wylie include the Self-Rating Scale (Fleming & Courtney, 1984), which measures physical ability and physical appearance; the Song and Hattie Test (Hattie, 1992), which measures physical appearance; and the Multidimensional Self-Concept Scale (Bracken, 1996), which has a physical scale that includes physical competence, physical appearance, physical fitness, and health. The Tennessee Self-Concept Scale (Fitts, 1965) is a multidimensional self-concept instrument that also purports to measure PSC. In their review and empirical evaluation of this instrument, Marsh and Richards (1988) found distinguishable physical components reflecting health, neat appearance, physical attractiveness, and physical fitness that were incorporated into a single PSC score. This detailed breakdown of the Tennessee physical scale was supported by relationships with the SDQ physical ability and physical appearance scales in an MTMM study comparing responses to the two instruments. Because each of the clusters based on responses to the Tennessee instrument is represented by only a few items, it is not appropriate to use the instrument to measure these distinct components of PSC. Marsh and Richards argued that PSC measures that combine and confound a wide range of differentiable physical components—such as those based on the Tennessee Self-Concept Scale—should be interpreted cautiously (see similar comments by Fox & Corbin, 1989).
In summary, although multidimensional self-concept instruments based on Shavelson and colleagues' (1976) model provided good support for the construct validity of the physical ability and appearance scales (e.g., Marsh, 2002; Marsh & Peart, 1988), they left unanswered the question of whether PSC is more differentiated than can be explained in terms of one (physical ability) or two (ability, appearance) physical scales. Subsequent PSC instruments were developed specifically to address the issue of the multidimensionality of PSC.
Physical Self-Perception Profile
The Physical Self-Perception Profile (PSPP; Fox, 1990; Fox & Corbin, 1989) is a 30-item inventory that consists of four specific scales and one general physical self-worth factor. The PSPP was developed to document the physical self-perceptions of college students. It was designed to reflect the advances made by Harter (1985) and Shavelson and colleagues (1976) in identifying the physical self as an important construct to measure in its own right and to reflect the hierarchical, multidimensional nature of the physical self. A qualitative approach was used to reveal dimensions of physical self-esteem salient to the population sampled (Fox & Corbin, 1989). The PSPP consists of five 6-item scales of sport (perceived sport competence), body (perceived bodily attractiveness), strength (perceived physical strength and muscular development), condition (perceived level of physical conditioning and exercise), and physical self-worth. Fox (1990) recommended that the 10-item Rosenberg Self-Esteem Scale (Rosenberg, 1965) be used alongside the PSPP to provide a global measure. Fox (1990) reported factor analyses indicating that each item loads most highly on the factor that it is designed to measure and that individual scale reliabilities are in the .80s.
The PSPP research demonstrates (a) good reliability (coefficient alpha of .80-.95; Fox, 1990; Page, Ashford, Fox, & Biddle, 1993; Sonstroem, Speliotis, & Fava, 1992); (b) good test-retest stability over the short term (rs of .74-.89; Fox, 1990); (c) a well-defined, replicable factor structure as shown by CFA (Fox & Corbin, 1989; Sonstroem, Harlow, & Josephs, 1994); (d) convergent and discriminant validity in studies showing PSPP relationships with external criteria such as exercise behaviors, mental adjustment variables, and health complaints (Fox & Corbin, 1989; Sonstroem & Potts, 1996); and (e) applicability for an older adult population (Sonstroem et al., 1994). However, correlations among the PSPP scales are consistently so high (.65-.89 when disattenuated for measurement error; Marsh, Richards, Johnson, Roche, & Tremayne, 1994) that they detract from the instrument's ability to differentiate among the different PSC factors it purports to measure.
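The "disattenuated" correlations mentioned above come from the classical correction for attenuation, which divides an observed correlation by the square root of the product of the two scales' reliability estimates. A minimal sketch, with illustrative numbers that are not taken from the studies cited:

```python
from math import sqrt

def disattenuate(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation: estimate the correlation
    between true scores from an observed correlation and the two scales'
    reliability estimates (e.g., coefficient alpha)."""
    return r_observed / sqrt(rel_x * rel_y)

# Two scales with reliabilities of .85 and .80 and an observed correlation
# of .60 yield a disattenuated correlation of about .73.
print(round(disattenuate(0.60, 0.85, 0.80), 2))  # 0.73
```

Because measurement error suppresses observed correlations, correcting for it always raises the estimate, which is why disattenuated correlations among PSPP factors are so strikingly high.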
Subsequently, a version of the PSPP for children and adolescents was developed and validated—the Children and Youth Physical Self-Perception Profile (CY-PSPP; Eklund, Whitehead, & Welk, 1997; Whitehead, 1995). Like the PSPP, the CY-PSPP is a 30-item inventory consisting of the same five 6-item scales. The CY-PSPP is a substantially revised version of the PSPP that is most appropriately thought of as a different instrument. The CY-PSPP body, strength, and conditioning subscales are based on minor adaptations of the PSPP to make them more suitable for children. However, the global self-worth (self-esteem) and sport scales are completely different. The PSPP did not have a self-esteem scale of its own but included 6 items adapted from the Rosenberg Self-Esteem Scale. On the CY-PSPP, global self-esteem and sport scales from the PSPP were dropped and replaced with corresponding scales from Harter's (1985) Self-Perception Profile for Children. Correlations among factors remained high (e.g., physical self-worth with attractive body adequacy = .8). Eklund and colleagues (1997) suggested that these results are consistent with the developmental patterns among children, as differentiation in self-concept is less defined at younger ages (Harter, 1985). CFAs have supported the instrument's factor structure, with both the CFI (comparative fit index) and NNFI (non-normed fit index) indexes exceeding the .90 criterion for good model fit (Eklund et al., 1997). Moderate correlations (r = .39-.45) with external criteria such as physical activity and physical fitness have demonstrated its convergent and discriminant validity (Welk & Eklund, 2005). 
The CY-PSPP has been validated with adolescents (Jones, Polman, & Peters, 2009; Welk, Corbin, & Lewis, 1995; Whitehead, 1995) and younger children (Welk, Corbin, Dowell, & Harris, 1997) and has been translated into other languages and validated in those translations (Aşçı, Eklund, Whitehead, Kirazci, & Koca, 2005; Raustorp, Ståhle, Gudasic, Kinnunen, & Mattsson, 2005; Raustorp, Mattsson, Svensson, & Ståhle, 2006).
Both the PSPP and CY-PSPP use a nonstandard response format based on Harter (1985), in which each item consists of a matched pair of statements, one negative and one positive (e.g., “Some people feel that they are not very good when it comes to sports” but “Others feel that they are really good at just about every sport”). Each item consists of two contrasting descriptions, and respondents are asked which description is most like them and whether the description they select is “Sort of true of me” or “Really true of me.” Responses are scored on a scale of 1 to 4, with 1 representing a “Really true of me” response to the negative statement and 4 representing a “Really true of me” response to the positive statement. Although this response format is designed to reduce the influence of social desirability, Wylie's (1989) review of Harter's original instruments provided little or no support for this suggestion, and Marsh and colleagues (1994) suggested that there were substantial method effects associated with the nonstandard response scale. This format has also been shown to be confusing, particularly for children (Eiser, Eiser, & Haversmans, 1995), and even for adults (Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994), unless special care is taken to explain the response scale. Following the suggestion of Marsh and colleagues (1994) that confusion over the structured alternative response scale could be overcome by more detailed instructions at the outset, researchers implementing the CY-PSPP used large illustrations for a sample item (Whitehead, 1995). Wichstrom (1995) found that responses for this format were psychometrically stronger when based on typical Likert responses rather than the structured alternative format, but Welk and colleagues (1997) suggested that the nonstandard response scale on the CY-PSPP worked better than Likert responses did.
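The 1-to-4 scoring rule for this structured-alternative format can be made concrete with a small sketch. The function name and arguments here are illustrative, not taken from the PSPP manual:

```python
def score_structured_alternative(chose_positive: bool, really_true: bool) -> int:
    """Score one structured-alternative item on the 1-4 scale described above.

    1 = "Really true of me" for the negative statement
    2 = "Sort of true of me" for the negative statement
    3 = "Sort of true of me" for the positive statement
    4 = "Really true of me" for the positive statement
    """
    if chose_positive:
        return 4 if really_true else 3
    return 1 if really_true else 2

# Respondent picks the positive statement, marks "Sort of true of me":
print(score_structured_alternative(True, False))   # 3
# Respondent picks the negative statement, marks "Really true of me":
print(score_structured_alternative(False, True))   # 1
```

Note that the two decisions (which statement, and how strongly) are collapsed into a single score, which is part of why respondents unfamiliar with the format find it confusing.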
In summary, the PSPP and the CY-PSPP are established instruments that have been translated into several languages and used with a range of populations. However, the response format and the high correlations among factors in both instruments may limit their usefulness in some settings. The CY-PSPP is a substantially revised version of the PSPP specifically developed for children. The CY-PSPP should be used instead of the PSPP for child and adolescent samples, and it might even be stronger than the original PSPP for adult samples.
Subsequent to the completion of this chapter, Lindwall and colleagues (2011) published a revised version of the PSPP (PSPP-R). They reviewed critiques of the PSPP response scale such as those noted here (e.g., Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994) and acknowledged that “the idiosyncratic alternative response format has been difficult to understand for some participants” (pp. 310-311). In recognition of these problems, the idiosyncratic response scale that has been such a salient feature of the PSPP was dropped altogether and replaced with a 4-point Likert response scale using only positively worded items. Lindwall and colleagues (2011) demonstrated the appropriateness of the revised PSPP-R based on a large sample (N = 1,831) of participants from four countries (Sweden, Great Britain, Portugal, and Turkey). However, they did not indicate whether the PSPP-R supersedes the PSPP or is merely an alternative to it, nor did they discuss the implications for other instruments using similar idiosyncratic response scales (e.g., PSPP-related instruments such as the CY-PSPP or Harter's instruments more generally).
Physical Self-Inventory
The Physical Self-Inventory (PSI) is a French adaptation of the PSPP that was originally developed for use with Francophone adults (Ninot, Delignières, & Fortes, 2000). In two preliminary studies, Ninot and colleagues used the nonstandard response scale from the PSPP. However, consistent with previous research (Marsh et al., 1994), they reported that this response scale was problematic. In a third study, the authors used a 6-point Likert response scale; factor analysis results were reasonable, but reliability coefficients were not completely satisfactory. Next the authors replaced the PSPP global physical items with items from the SDQ physical scale and the PSPP global self-esteem items with items from Coopersmith (1967). The final PSI consists of 25 items measuring six PSC factors (four specific and two global, as with the PSPP) and has satisfactory psychometric properties that have been confirmed in subsequent French studies of adults (Masse, Jung, & Pfister, 2001; Stephan, Bilard, Ninot, & Delignières, 2003; Stephan & Maïano, 2007).
Maïano and coworkers (2008) subsequently constructed a short form of the PSI for use with adolescents. They found that not all items from the adult PSI worked with adolescents, but they were able to construct 18-item (PSI-SF, 3 items per scale) and 12-item (PSI-VSF, 2 items per scale) versions that had good psychometric properties. In particular, the measurement and hierarchical structures were consistent with proposals by Fox and Corbin (1989) and were fully invariant across gender. Maïano and coworkers also noted that PSI-SF responses showed very high test-retest stability. Comparison of the PSI-SF and PSI-VSF demonstrated that the measurement model, mean structure, structural parameters, and criterion-related validity were equivalent across samples and versions. Nevertheless, the authors noted a serious limitation that all versions of the PSI share with the PSPP: very high correlations among the six PSC factors (correlations among latent factors) that, according to the authors, bring “into question the real independence of some of the models' sub-dimensions, and by extension their discriminant validity, a finding that has already been observed by Marsh (2002; Marsh et al., 2006) on analyses of the PSPP” (Maïano et al., 2008, p. 844). However, Maïano and colleagues also noted that because they used a traditional Likert response scale, the high correlations apparently were not due to the structured alternative format used in the PSPP. In summary, the short and very short forms of the PSI in particular have made a potentially important contribution to applied research. However, further research is needed to more fully evaluate the robustness of support for construct validity and application in non-French-speaking settings.
Richards Physical Self-Concept Scale
The Richards Physical Self-Concept Scale (RPSCS; Marsh et al., 1994; Richards, 1988) is a 35-item instrument designed to measure six specific components of PSC (body build, appearance, health, physical competence, strength, action) and one general physical satisfaction factor. Each item is a simple declarative statement, and subjects respond on an 8-point true-false scale. Extensive research in Australia (e.g., Marsh et al., 1994; Richards, 1988) has indicated that RPSCS responses have good psychometric properties. The factor structure is very robust, generalizing well over ages from 8 to 80 y and over gender.
RPSCS research has demonstrated (a) good reliability (coefficient alpha of .79-.93; Marsh et al., 1994; Richards & Marsh, 2005); (b) good test-retest stability over the short term (rs of .77-.90 over 3 wk; Richards, 1988); (c) a well-defined, replicable factor structure as shown by CFA (Marsh et al., 1994; Richards, 2004); (d) a factor structure that is invariant across gender, as shown by multiple-group CFA (Richards, 2004), and across a wide age range; (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (Marsh et al., 1994; Richards & Marsh, 2005); and (f) applicability for participants aged 8 to 60 y and for both genders (Marsh et al., 1994; Richards, 1988, 2004; Richards & Marsh, 2005). In summary, the RPSCS is regarded as a valid, reliable, and structurally sound instrument that has been tested across both genders and a wide range of ages. The applicability across such a wide range of ages is a particular strength.
Physical Self-Description Questionnaire
Extending Fleishman's (1964) classic research on the structure of physical fitness, the Physical Self-Description Questionnaire (PSDQ) scales reflect some of the original SDQ scales and parallel physical fitness components identified in a CFA of physical fitness measures (Marsh, 1993). The PSDQ consists of nine specific components of PSC (strength, body fat, activity, endurance and fitness, sport competence, coordination, health, appearance, and flexibility), a global physical scale, and a global self-esteem scale. Each of the 70 PSDQ items is a simple declarative statement, and individuals respond on a 6-point true-false scale. The PSDQ is designed for adolescents but is also appropriate for older participants.
PSDQ research has demonstrated (a) good reliability (median coefficient alpha of .92) across the 11 scales (Marsh, 1996b; Marsh et al., 1994); (b) good test-retest stability over the short term (median r = .83 over 3 mo) and longer term (median r = .69 over 14 mo; Marsh, 1996b); (c) a well-defined, replicable factor structure as shown by CFA (Marsh, 1996b; Marsh et al., 1994); (d) a factor structure that is invariant over gender as shown by multiple-group CFA (Marsh et al., 1994); (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (see Marsh et al., 1994); (f) convergent and discriminant validity as shown by PSDQ relationships with external criteria (e.g., measures of body composition, physical activity, endurance, strength, and flexibility; see Marsh, 1996a, 1997); and (g) applicability for participants aged 12 to 18 y (or older) and for elite athletes and nonathletes (Marsh, Hey, Roche, & Perry, 1997; Marsh, Perry, Horsely, & Roche, 1995). In summary, the PSDQ is a psychometrically strong instrument.
Marsh, Martin, and Jackson (2010) recently presented a new short form of the PSDQ (PSDQ-S). This short form balances brevity and psychometric quality in relation to established guidelines for evaluating short forms (e.g., Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Smith, McCarthy, & Anderson, 2000) and the construct validity approach that is the basis of PSDQ research. Based on the PSDQ normative archive, 40 of the 70 items were selected and evaluated in a new cross-validation sample (N = 708 Australian adolescents). To test the generalizability of results, the authors considered four additional samples: Australian adolescent elite athletes (n = 349), Spanish adolescents (n = 986), Israeli university students (n = 395), and Australian senior citizens (n = 760). Reliabilities for the 40 PSDQ-S items were consistently high in the cross-validation sample (.81-.94; median = .89) and senior sample (.81-.94; median = .91), and reliabilities in the cross-validation sample were higher than those in comparable groups completing the 70-item PSDQ. The PSDQ-S factor structure in the cross-validation sample was well defined and highly similar to that based on the archive sample as well as to those based on the other four groups. Study 1, using a missing-by-design variation of multigroup invariance tests, showed that factor structures were invariant across the 40-item PSDQ-S and the 70-item PSDQ. Study 2 demonstrated factorial invariance of responses over 1 y (test-retest correlations of .57-.90; median = .77) and good support for convergent and discriminant validity over time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to responses on the PSPP and PSC instruments. The four studies reported by Marsh and coworkers demonstrated new, evolving strategies for the construction and evaluation of short forms that support the PSDQ-S.
The authors concluded that the strong support for the psychometric properties and construct validity of the widely used PSDQ instrument generalizes very well to the PSDQ-S.
Elite Athlete Self-Description Questionnaire
The PSC instruments discussed thus far may be suitable for elite athletes (e.g., Marsh et al., 1995). There may, however, be other components of PSC that are particularly relevant for elite athletes, and thus the Elite Athlete Self-Description Questionnaire (EASDQ; Marsh, Hey, Roche, et al., 1997; Marsh, Hey, Johnson, & Perry, 1997) was developed to address these components. For the EASDQ, it was hypothesized that overall performance by elite athletes is a function of skill level, body suitability, aerobic fitness, anaerobic fitness, and mental competence. Thus Marsh and colleagues developed the EASDQ to measure these five components plus overall performance (six factors in all). For each scale, they developed a pool of items that sport psychologists at the Australian Institute of Sport evaluated for their suitability for elite athletes. Pilot studies were conducted to select the best items to represent each factor. A compromise between brevity and psychometric soundness was achieved, with acceptable levels of reliability (e.g., all scales having reliability estimates of at least .8) based on short scales (4-6 items per scale).
EASDQ research demonstrates (a) adequate reliability (median coefficient alpha of .85) across the six scales (Marsh, Hey, Johnson, et al., 1997); (b) a well-defined, replicable factor structure as shown by CFA (Marsh, Hey, Johnson, et al., 1997; Marsh, Hey, Roche, et al., 1997); (c) applicability for elite athletes aged 12 y or older (Marsh, Hey, Roche, et al., 1997); and (d) predictive validity as shown by its ability to predict swimming performances in world championships after controlling for previous personal best performances (Marsh & Perry, 2005). In summary, the EASDQ is a reliable and valid instrument for elite athletes aged 12 y and older. More research is needed, however, to relate EASDQ responses to external validity criteria such as those used in PSDQ research and to criteria that are more specific to elite athletes (e.g., actual performance in competition).
Evaluation of Measures of Intrinsic and Extrinsic Motivation in Sport and Exercise
In this section, a critical review of the different measures used to assess intrinsic and extrinsic motivation in sport and exercise research is conducted. Certain criteria have guided the selection of the measures presented in this section. First, we have selected measures that are fully developed instruments that have gone through extensive validation steps. Second, we have chosen scales that have been used in research, published or unpublished, during the past 10 years. Scales that have not been used during that time frame are considered obsolete and are not reviewed. Finally, in light of recent theoretical developments and because of space limitations, we have focused on motivation scales that assess intrinsic and extrinsic motivation independently of determinants and outcomes, while focusing on the perceived reasons for behavior. Our earlier discussion of the definitions of intrinsic and extrinsic motivation makes it possible to classify the different measures. The measures can vary in terms of level of generality (situational versus contextual) and area (sport versus exercise). This classification appears in table 25.1. Table 25.2 (see p. 291) provides additional information on each scale's underlying concept, dimensions, publication source, and availability. As can be seen, seven measures are reviewed. For each one, we present (a) a description of the instrument, (b) the conceptual and theoretical rationale underlying its development, (c) the available evidence concerning its psychometric properties (e.g., factorial validity, reliability, and construct validity), and (d) a broad assessment of the strengths and weaknesses associated with the measure.
Measures Used in Sport
In this section, we review the SMS (Brière et al., 1995; Pelletier et al., 1995), the Sport Motivation Scale-6 (SMS-6; Mallett, Kawabata, Newcombe, Otero-Rorero, & Jackson, 2007), the Behavioral Regulation in Sport Questionnaire (BRSQ; Lonsdale, Hodge, & Rose, 2008), the Pictorial Motivation Scale (PMS; Reid, Vallerand, Poulin, Crocker, & Farrell, 2009), and the SIMS (Guay et al., 2000).
Sport Motivation Scale
The SMS was developed (Brière et al., 1995; Pelletier et al., 1995) in order to assess contextual intrinsic and extrinsic motivation from a multidimensional perspective, as well as amotivation. The SMS has been the most often used motivation measure in sport, being employed with a variety of athletes (recreational to elite), age groups (adolescent to senior), and cultures (e.g., Canada, United States, United Kingdom, Bulgaria, Australia, Spain, and New Zealand). In fact, the SMS has been translated and validated in several languages (see Pelletier & Sarrazin, 2007). The SMS is based on SDT (Deci & Ryan, 1985) and is made up of seven subscales assessing amotivation; external, introjected, and identified regulation; and intrinsic motivation to know, to experience stimulation, and to accomplish. In line with SDT, motivation is assessed as the perceived reasons for participation, or the why of behavior. At the beginning of the scale, participants are asked, “In general, why do you practice your sport?” The items represent the perceived reasons for engaging in the activity, thus reflecting the different types of motivation.
The original scale was developed in French as L'Échelle de Motivation dans les Sports (Brière, Vallerand, Blais, & Pelletier, 1995) and was validated in three steps. The first step involved generating a pool of items describing various reasons for sport participation through interviews with French Canadian athletes (aged 17-20 y). These reasons were then used to formulate items for the seven subscales of the French SMS. In the second step, a committee of experts evaluated the content validity of the items and eliminated those that were judged inadequate. Another sample of athletes from various sports then completed the scale. Results from an exploratory factor analysis (EFA) provided support for a seven-factor structure with 4 items per subscale; this second step thus resulted in a 28-item scale. In the third and final step, two additional studies were conducted to further validate the scale. These studies included approximately 500 individuals, most of whom were involved in recreational sports. Results from confirmatory factor analyses (CFA) and correlational analyses confirmed the seven-factor structure, the subscale internal consistency (.65-.96), and moderate to high indexes of temporal stability (.54-.82) over 1 mo. Furthermore, inspection of correlations among the seven SMS subscales provided support for the simplex pattern proposed by SDT. In line with SDT, the results also showed that the most self-determined forms of motivation (intrinsic motivation and identified regulation) were related more strongly than the less self-determined forms (external and introjected regulation) and amotivation to determinants such as autonomy support from coaches and feelings of competence. Similar results were obtained with motivational outcomes such as positive affect, concentration, and intentions to pursue engagement in sport. In sum, adequate construct validity was obtained for the French form of the SMS.
The translation of the French SMS into English involved back-translation and committee procedures as suggested by Vallerand (1989). Pelletier and colleagues (1995) conducted two studies involving college athletes from various sports in order to assess the psychometric properties of the English form of the SMS. Results from CFA with a sample of 593 Canadian university athletes revealed adequate fit indices for the hypothesized seven-factor model (Adjusted Goodness-of-Fit Index and Normed Fit Index both > .90; Root Mean Square Residual < .08), and correlations with determinants and outcomes supported the simplex model. Moreover, internal consistency above .70 was obtained on all of the subscales except the identified regulation subscale (.63). Test-retest correlations were acceptable and very similar to those obtained with the French SMS, as was the scale's construct validity.
Since 1995, the SMS has been used extensively in sport psychology research. The seven-factor structure has been supported repeatedly (e.g., Doganis, 2000; Gillet, Vallerand, & Rosnet, 2009; Li & Harmer, 1996; Shaw, Ostrow, & Beckstead, 2005; Standage, Duda, & Ntoumanis, 2003). In addition, Hu and Bentler (1999) obtained support for a five-factor model by combining the three types of intrinsic motivation into one factor. Similar results were obtained by Gillet and colleagues (2009) with the French SMS. However, some studies have not supported the seven-factor model (Hodge, Allen, & Smellie, 2008; Mallett, Kawabata, & Newcombe, 2007; Mallett, Kawabata, Newcombe, & Otero-Rorero, 2007; Martens & Webber, 2002). Why is there such a discrepancy between these two sets of studies? One possibility lies in the populations from which the different samples were taken. Specifically, the SMS was validated using adolescent and young adult athletes and not older athletes. Because of this specific focus, some of the items may reflect a participation rather than an elite orientation, which is more in line with the younger population. For instance, an identified regulation item reads, “Because sport is one of the best ways to maintain good relationships with my friends.” Such an item seems more relevant for a younger population. An older, high-level athlete may disagree with this item but still display a high level of identified regulation for a sport (but not for relationship reasons). Future research using the SMS with different age groups and proficiency levels is needed to clarify this issue.
Although the internal consistency of the SMS has generally been adequate, some values below .70 have been found. This is especially the case for the identified regulation subscale (Brière et al., 1995; Kingston, Horrocks, & Hanton, 2006; Li & Harmer, 1996; Pelletier et al., 1995), although values below .70 have also been obtained with the introjected (McNeill & Wang, 2005; Perreault & Vallerand, 2007; Riemer, Fink, & Fitzgerald, 2002; Standage, Duda, & Ntoumanis, 2003), external regulation, and amotivation subscales (Standage, Duda, & Ntoumanis, 2003). However, very few instances of values below .60 have been obtained. It should be noted that a Cronbach alpha of .60 with only 4 items is acceptable because, as noted by Cronbach (1951), the coefficient alpha underestimates the internal consistency of scales with a small number of items. This is because the number of items enters directly into the alpha formula. For instance, given the same average interitem correlation, a coefficient alpha of .56 on a 3-item scale is equivalent to an alpha of .81 on an 8-item scale!
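The item-count effect just described can be sketched with the standardized form of coefficient alpha, which expresses reliability as a function of the number of items and their mean interitem correlation. This is an illustrative computation, not one taken from the chapter's sources; the interitem correlation of .30 is an assumed value, and the exact alpha pairings depend on that assumption:

```python
def standardized_alpha(k, r_bar):
    """Standardized Cronbach alpha for k items with mean interitem correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)

# Holding the mean interitem correlation fixed (here an assumed r_bar = .30),
# alpha rises purely as a function of scale length.
for k in (3, 4, 8, 10):
    print(k, round(standardized_alpha(k, 0.30), 2))  # 3 0.56, 4 0.63, 8 0.77, 10 0.81
```

The point for scale evaluation is that a modest alpha on a short subscale can reflect the same average interitem correlation as a much higher alpha on a longer one.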
In line with the original work of Ryan and Connell (1989) and the initial SMS validation procedures (Brière et al., 1995; Pelletier et al., 1995), construct validity has been assessed by other authors in two fashions: (1) with the simplex pattern of correlations among the subscales and (2) with correlations between motivational factors and their determinants and consequences. We do not have space to review all studies. However, overall, there is overwhelming support for the construct validity of the SMS both in French and English. For instance, in addition to finding support for the simplex pattern, Pelletier and Sarrazin (2007) concluded in their review of the evidence that the SMS has been used with success to predict a great variety of specific outcomes and consequences (such as burnout, exercise dependence among endurance athletes, fear of failing, adaptive coping skills, perceptions of constraints, flow, vitality and well-being, sporting behavior orientations, aggression, and performance) in a manner that is consistent with SDT. These findings provide strong support for the construct validity of the SMS.
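As an illustration of what a simplex pattern looks like in practice, the following sketch checks whether correlations among subscales ordered along the self-determination continuum decline as the subscales grow farther apart. The correlation matrix is invented for illustration and is not drawn from any SMS validation study:

```python
# Hypothetical correlation matrix for subscales ordered along the
# self-determination continuum: intrinsic motivation, identified,
# introjected, and external regulation, and amotivation.
# All values are invented for illustration.
subscales = ["IM", "IDEN", "INTRO", "EXT", "AMOT"]
corr = [
    [1.00, 0.60, 0.35, 0.10, -0.30],
    [0.60, 1.00, 0.45, 0.20, -0.15],
    [0.35, 0.45, 1.00, 0.50, 0.05],
    [0.50 if False else 0.10, 0.20, 0.50, 1.00, 0.25],
    [-0.30, -0.15, 0.05, 0.25, 1.00],
]

def follows_simplex(corr):
    """A simplex pattern holds if, within each row, correlations decrease
    monotonically as the column subscale moves farther from the row subscale."""
    n = len(corr)
    for i in range(n):
        # moving right, away from the diagonal
        for j in range(i + 1, n - 1):
            if corr[i][j] < corr[i][j + 1]:
                return False
        # moving left, away from the diagonal
        for j in range(i - 1, 0, -1):
            if corr[i][j] < corr[i][j - 1]:
                return False
    return True

print(follows_simplex(corr))  # True for this illustrative matrix
```

Adjacent subscales (e.g., intrinsic motivation and identified regulation) correlate most strongly, while subscales at opposite ends of the continuum (intrinsic motivation and amotivation) correlate negatively, which is the ordering the validation studies test for.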
In sum, the SMS has some positive features. First, it is a multidimensional instrument that assesses different types of intrinsic and extrinsic motivation as well as amotivation. Second, the scale focuses on the why of behavior and thus items are not confounded with determinants and consequences. Finally, it has some excellent psychometric properties. Nevertheless, some limitations should be underscored. First, although internal consistency levels have been acceptable overall, some subscales, especially the identified regulation subscale, have yielded relatively low coefficient alphas at times. Second, the SMS does not assess integrated regulation. Third, the seven-factor structure has not always been supported by CFAs. According to Pelletier, Vallerand, and Sarrazin (2007), this may be explained by a host of factors, including differences in sample sizes, variations in the way the instrument is administrated, or some other characteristics specific to the context of the study. However, as already indicated, it is also possible that the SMS is better suited for a younger, nonelite athlete population. Clearly, future research on this issue is in order.
Sport Motivation Scale-6
Mallett, Kawabata, Newcombe, and Otero-Forero (2007) developed another version of the SMS, the SMS-6. This scale has the same underlying rationale as the original SMS but was designed to improve the original version by including an integrated regulation subscale and by attempting to resolve some of the inconsistencies in the factor structure and some of the relatively low internal consistency values (below .70). The SMS-6 comprises 24 items, 4 for each of the six subscales, which include amotivation; external, introjected, identified, and integrated regulation; and general intrinsic motivation. Mallett, Kawabata, Newcombe, and Otero-Forero (2007) developed 5 items for the integrated regulation subscale as well as 7 other items (4 of which were kept in the final scale) to replace some items in the original SMS. Two samples were used to validate the SMS-6. Sample 1 was composed of 501 first-year university students participating in competitive sport at least twice per week and 113 elite athletes representing Australia at the international level (for a total of 614 participants). Sample 1 was used to derive a factor structure that included the SMS items as well as the reformulated and integrated regulation items. Sample 2 was composed of 557 university students who were engaged in a variety of sports or physical activities twice per week. The second sample was used to confirm the structure of the SMS-6. Participants also completed the Dispositional Flow Scale (DFS).
Results of a CFA with the SMS-6 (with sample 2) provided support for the factor structure as well as for the internal consistency values (all above .70). Concerning the construct validity of the SMS-6, Mallett, Kawabata, Newcombe, and Otero-Forero (2007) reported a rather weak simplex pattern of correlations among the subscales. More specifically, external regulation correlated highly with intrinsic motivation (r = .54), while the correlation between identified regulation and intrinsic motivation was very high (r = .91) and was higher than the one between integrated regulation and intrinsic motivation (r = .75). The construct validity of the SMS-6 was not fully supported, as some of the correlations involving the SMS and flow were not as expected by SDT. For instance, the distinctions among integrated regulation, identified regulation, and intrinsic motivation were not always clear. Furthermore, external regulation revealed some positive and sometimes strong correlations with flow, contrary to hypotheses derived from SDT.
In sum, the SMS-6 contains some nice features. First, it contains an integrated regulation subscale. Furthermore, the addition of 4 new items may make the SMS more acceptable for older and more experienced athletes. Second, Mallett, Kawabata, Newcombe, and Otero-Forero (2007) presented results supporting the validity of a variation of the SMS-6, the SMS-8. The SMS-8 contains the same items as the SMS-6 but assesses the three types of intrinsic motivation rather than general intrinsic motivation. The SMS-6 also shows some limitations. First, Mallett, Kawabata, Newcombe, and Otero-Forero (2007) proposed 7 new items to replace those that were presumably problematic in the original SMS. However, only 4 of these items made it to the final version. Thus, it appears that the SMS-6 retained much of the original SMS. Second, even some of the new items appear problematic and may not assess the desired construct (see Pelletier et al., 2007). For instance, a new amotivation item (“I don't seem to be enjoying my sport as much as I previously did”) seems to reflect a decrease in intrinsic motivation rather than amotivation. Finally, results from Mallett, Kawabata, Newcombe, and Otero-Forero (2007) demonstrated that the integrated regulation subscale may lack discriminant validity, yielding correlations with flow highly similar to those of identified regulation and intrinsic motivation.
Behavioral Regulation in Sport Questionnaire
Lonsdale and colleagues (2008) developed the BRSQ to create an alternative measure of elite sport motivation as conceptualized by SDT. However, in contrast to Mallett, Kawabata, Newcombe, and Otero-Forero (2007), these authors used a completely new pool of items developed by SDT experts and competitive athletes. There are two versions of the BRSQ. The BRSQ-8 contains 32 items assessing integrated, identified, introjected, and external regulation; amotivation; and the three forms of intrinsic motivation (knowledge, experience stimulation, and accomplishment) identified by Vallerand (1997). The BRSQ-6 contains the same items but assesses general intrinsic motivation rather than all three types of intrinsic motivation, for a total of 24 items.
Lonsdale and colleagues (2008) conducted a series of three studies to validate the scale. In the first study, the factorial validity and the internal consistency were assessed with 382 New Zealand elite athletes. Results from a CFA on the 32 items supported the factor structure of the BRSQ. Specifically, fit indexes were acceptable and all items loaded significantly on the appropriate factors (loadings ranged from .58 to .91). Finally, internal consistency of the eight subscales, measured with the Cronbach alpha, showed high values ranging from .71 to .91. Additionally, 1 wk test-retest reliability was tested with 34 competitive adult athletes. Test-retest coefficients for all subscales supported the temporal reliability (values ranged from .73 to .90).
In a second study with 343 athletes from New Zealand, the results of a CFA on the BRSQ-8 supported once more the factor structure as well as the subscale internal consistency. Lonsdale and colleagues (2008) also showed that the factor structure of the BRSQ-6 model fit the data very well and that subscale coefficient alphas all exceeded .78. Moreover, the construct validity of the BRSQ-6 was assessed by testing for a simplex pattern of correlations among the six subscales. While some relationships were in line with predictions (e.g., amotivation was negatively related to intrinsic motivation), there was a lack of discrimination between some subscales. More specifically, there was no difference between external and introjected regulation scores in terms of their relationships with amotivation. A similar pattern was evident with the identified and integrated regulation subscales, which both had similar high correlations with intrinsic motivation. These results with the simplex pattern were replicated in a third study conducted with nonelite athletes. In this third study, Lonsdale and colleagues also assessed the relationships between the BRSQ-6 and indexes of burnout (Lemyre, Treasure, & Roberts, 2006; Raedeke & Smith, 2001) and flow (Jackson & Eklund, 2002). Overall, results supported hypotheses in line with SDT. Specifically, amotivation and external and introjected regulation showed negative correlations with flow and positive correlations with burnout. The opposite pattern of correlations was found for the self-determined subscales (intrinsic motivation and identified and integrated regulation). However, there was a lack of discrimination between integrated regulation and general intrinsic motivation. Results of another study on burnout (Lonsdale, Hodge, & Rose, 2009) replicated these findings. Thus, overall, the support for the construct validity of the BRSQ-6 appears to be mixed.
It should be underscored that the BRSQ has some nice features. First, the scale is designed in such a way that the researcher can decide to use a multidimensional (BRSQ-8) or unitary (BRSQ-6) conceptualization of intrinsic motivation. Second, the scale is rather short, with 4 items per subscale. Finally, it assesses integrated regulation. At the same time, the BRSQ also displays some limitations. First, additional research is needed on the construct validity of the scale. Whereas there is support for distinguishing the self-determined subscales (intrinsic motivation and identified and integrated regulation) from the non-self-determined subscales (external and introjected regulation), the finer discrimination within each category appears to be lacking. Such evidence is crucial, and future research is needed in order to show that this scale does indeed assess the SDT constructs rather than two broad sets of subscales tapping self-determined versus non-self-determined motivation. Second, this scale is designed specifically for older participants in competitive sport; it remains to be seen if the BRSQ can be used with younger participants, for whom the integrated regulation subscale may not have full meaning. Finally, research is needed to test the temporal stability of the scale over a time frame longer than 1 wk.
Pictorial Motivation Scale
The PMS was designed to measure intrinsic and extrinsic motivation for sport and exercise in people with an intellectual disability. It assesses participants' reasons for engaging in sport and exercise. The scale's main characteristic is the set of drawings depicting each of the 20 items. There are 5 items (pictures) for each of four subscales: intrinsic motivation, self-determined extrinsic motivation (a mixture of integrated and identified regulation), non-self-determined extrinsic motivation (a mixture of introjected and external regulation), and amotivation. The pictures are used to help participants with cognitive difficulties grasp the motivational concept depicted in each item.
The original scale was developed in French (Reid, Poulin, & Vallerand, 1994). Results of a study with 62 participants supported the internal consistency, temporal stability, and construct validity, as exemplified by the presence of a simplex pattern among the four subscales. However, the amotivation subscale had poor reliability (α = .52). The French version (Reid et al., 2009) was translated into English according to the back-translation and committee procedures outlined in Vallerand (1989). Then, 6 new items were generated for the less reliable amotivation subscale. Participants in the Special Olympics (n = 160) completed the English version. Results of the CFA confirmed the four-factor structure of the PMS. Furthermore, the internal consistency (Cronbach alphas) ranged from .60 to .71. Finally, the construct validity was assessed by testing for a simplex pattern of correlations among the four subscales. The intercorrelations among latent variables from the CFA provided support for the simplex pattern.
Results from a study conducted with the English version of the PMS involving 80 high school students with mild intellectual disability provided support for the internal consistency, temporal stability (over 3 wk), and construct validity of the PMS with respect to the simplex pattern of correlations among the PMS subscales as well as correlations between the PMS subscales and motivational antecedents (skill and perceived competence) and outcomes (perceived effort) as rated by the physical education teacher. Finally, the internal consistency of each subscale was tested without the pictorial dimension with a subset of 47 high school students with mild intellectual disability. Results indicated poor internal consistency for most subscales (.91 for intrinsic motivation, .27 for self-determined extrinsic motivation, .20 for non-self-determined extrinsic motivation, and .60 for amotivation). This finding suggests that the scale is not reliable without the drawings.
The preliminary findings with the English version of the PMS are encouraging. Furthermore, this scale is the only one geared toward individuals with intellectual disability. The use of drawings to depict the various items makes this scale unique in the field. Nevertheless, the PMS shows some limitations. First, the scale does not differentiate among all forms of intrinsic (knowledge, stimulation, and accomplishment) or extrinsic (integrated, identified, introjected, and external regulation) motivation. Second, construct validity was tested with only a limited number of variables. Third, it is not known if the scale is usable with children who have severe forms of intellectual disability. Clearly, additional research is needed on the reliability and validity of the PMS.
Situational Motivation Scale
The SIMS is one of the few scales to assess intrinsic and extrinsic motivation and amotivation at the situational level (Guay et al., 2000). The SIMS is a multidimensional tool that measures four types of motivation: intrinsic motivation, identified regulation, external regulation, and amotivation. The SIMS is made up of 16 items (4 items per subscale) and asks this question: “Why are you currently engaged in this activity?” The items represent potential reasons for task engagement. The scale is worded in such a way that it can be used in most situations (sport and nonsport).
Five studies were reported in the original article. In study 1, the original scale was developed by a committee of experts and completed by 195 French Canadian college students. Results of an EFA revealed a four-factor structure with the final 16 items loading on their respective factor. In study 2, a CFA confirmed the factor structure as well as its invariance across gender. Across the five studies, the internal consistency values of the subscales were acceptable, ranging from .62 to .95 (see Guay et al., 2000). Moreover, across all studies, support was obtained for the construct validity of the SIMS through results from correlations in line with the simplex pattern among the subscales as well as between the SIMS subscales and motivational determinants and consequences. Perhaps of greater interest for the present discussion were the results of study 4, which showed that some subscales (intrinsic motivation and identified regulation) were sensitive enough to detect changes in motivation that took place during two games of a basketball tournament.
Other researchers have also obtained support for the psychometric properties of the SIMS. First, all studies reported acceptable internal consistency values for each subscale (Blanchard, Mask, Vallerand, de la Sablonnière, & Provencher, 2007; Conroy, Coatsworth, & Kaye, 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, Duda, & Prusak, 2003). The coefficient alpha values of all but the amotivation subscale (α = .58) in the Conroy and colleagues study were above .60. Second, support for the factorial validity of the SIMS was obtained through CFAs with one qualification. Whereas the CFA results with the 16 items yielded acceptable fit indexes, removal of 1 item (Jaakkola, Liukkonen, Laakso, & Ommundsen, 2008) and even 2 items (Gillet, Berjot, & Paty, 2009; Standage, Treasure, et al., 2003) yielded better fit indexes. Moreover, Standage, Treasure, and colleagues (2003) conducted multisample CFAs and showed that the pattern of factor loadings was largely invariant across four different samples.
Construct validity of the SIMS was also assessed in several studies (Blanchard et al., 2007; Conroy et al., 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, et al., 2003). In addition to supporting the simplex pattern among the SIMS subscales and between the SIMS subscales and need satisfaction (study 2 of Blanchard and colleagues, 2007), results also supported the postulate from the HMIEM (Vallerand, 1997) for the top-down effect, in which contextual sport motivation was found to predict situational sport motivation (studies 1 and 2 of Blanchard et al., 2007; Jaakkola et al., 2008; Ntoumanis & Blaymires, 2003). Specifically, the more self-determined the motivation was found to be in a specific context (in this case, sport), the more self-determined the motivation was found to be in a given situation. Furthermore, Blanchard and colleagues (2007, studies 1 and 2) found support for another postulate from the HMIEM that suggests that over time, situational motivation in the realm of sport (basketball) has recursive effects on contextual motivation. The more that situational motivation is self-determined, the more that contextual motivation becomes self-determined over time. Finally, Jaakkola and coworkers (2008) demonstrated that, as predicted by the HMIEM, situational self-determined motivation was better than contextual motivation in predicting the situational intensity (as assessed by HR) displayed by students in a physical education class. Overall, these findings provide strong support for the reliability and factorial and construct validity of the SIMS.
The SIMS has several positive features, one of them being that it is the only scale to assess intrinsic and extrinsic motivation and amotivation at the situational level. Furthermore, it does so using only 16 items. Nevertheless, it also has some weaknesses. First, the SIMS does not assess the different types of intrinsic motivation and integrated and introjected regulation, because it was designed to be short. Second, while the factor structure has been supported, it is not clear if some items should be replaced (Gillet, Berjot, et al., 2009; Jaakkola et al., 2008; Standage, Treasure, et al., 2003). Third, research so far has not assessed the validity of the scale with high-performance athletes. Thus, additional research is needed to further test the psychometric properties of the SIMS in sport.
Ethics codes imperative in conducting research
Ethics Codes: Their Nature, Purposes, and Application
Ethics codes typically comprise principles and standards. Ethical principles are broad-spectrum statements that summarize and reflect the values of the parent organization or governing body. These general and aspirational statements set the underlying tone for the more specific codes and guide the work-related ethical decision making of professionals. In contrast, ethical standards specify both proscribed and prescribed member behaviors. While not always black and white, these standards serve as a more clear-cut and enforceable guide for professional behavior.
Members should apply both the aspirational principles and enforceable standards to shape their thinking and behavior in work settings. Ideally, members self-monitor their own behavior. In an effort to remain ethical, professionals are encouraged to consult with colleagues about ethically challenging situations and to provide constructive feedback about possibly unethical behavior they perceive in others.
Assessment and Measurement
A central question to be addressed in this chapter is this: What are assessment and measurement? Sundberg (1977) defines assessment as the processes used “for developing impressions and images, making decisions and checking hypotheses about another person's pattern of characteristics that determines his or her behavior in interaction with the environment” (p. 21). The assessment process involves collecting and assembling a broad range of objective and subjective information about persons or groups to develop impressions about them; identify their needs; predict how they might think, feel, and behave in future situations; and select and apply interventions based on the content and dependability of that information. Professionals may use multiple assessment methods that include observations of behavior, symptom checklists, surveys and questionnaires, structured and unstructured interview materials, and standardized tests (Bennett et al., 2006). Gardner and Moore (2006) emphasize using a triad of psychological assessment strategies in the practice of clinical sport psychology: (1) initial interviews, (2) behavioral observation, and (3) psychological testing. The nature and assumptions underlying assessment approaches are usually grounded in the theoretical orientation of the professional (Andersen, 2002).
In contrast, measurement can mean many things to many people. It is one of the most common words in the English language and can be used as both a noun and a verb (Lorge, 1967). For the purposes of this chapter, measurement is viewed as an extension of assessment processes. It can be thought of more narrowly as the process of collecting information about psychological characteristics of interest (e.g., attitudes, behaviors, state experiences) using one or more methods or tools (such as those mentioned earlier) to monitor change, the effect of intervention, or treatments postassessment. For example, an educational sport psychology consultant might administer a measure of team cohesion over the course of a competitive season to see how team members perceive their relationships. Another consultant might conduct a preseason baseline screening assessment of cognitive functioning in hockey players and then reevaluate players who incur a mild traumatic brain injury (i.e., concussion) later in the season.
In this chapter, the terms measurement and assessment are used interchangeably. Furthermore, these terms are used to describe the decisions and opinions made by professionals regarding clients with whom they work. As such, measurement and assessment techniques include all methods of gathering information about clients, such as (a) psychological, educational, and neurological tests; (b) data gathered during clinical interviewing; (c) information gathered from significant others (e.g., family members, teachers, friends); (d) direct and indirect observation; and (e) interactions with people via teletherapy (e.g., Internet, phone; Fisher, 2009).
Competence and Education
In order to excel in our professional duties and do well for those we serve, teach, study, and otherwise interact with, we must know what to do and how to do it in a capable manner. The ethics codes mentioned earlier identify the necessity of being knowledgeable and capable in our work. For example, the APA ethical standards provide guidance for organization members in this area, including information about (a) competence limitations, (b) maintaining competence, (c) making sound professional and scientific judgments, (d) delegating work responsibilities to others, (e) engaging in activities in emergencies, and (f) impairment (APA, 2002). Competence in professional behaviors is a personal matter that is frequently challenged. It is the responsibility of professionals to know their limitations and to recognize that their knowledge and skills change and require constant upgrading. The APA ethics code also emphasizes the importance of making sound work-related decisions based on scientific knowledge and appropriate discipline-specific practice. This portion of the APA code cautions professionals to be careful when delegating work to others, describes how a professional is responsible for others' work, and explains the necessity of avoiding multiple relationships with those to whom work is delegated. The APA standards note that we can occasionally be thrown into situations in which our competence is stretched; in such cases we need to be very careful, seek supervision if available, and conclude such work as soon as possible.
Measurement Referral Questions and Appropriateness of Instruments
When selecting assessment instruments, the professional must consider the referral questions that prompted this process (Fisher, 2009; Smith, 1976). The instruments selected should reflect these referral questions and utilize assessment strategies that have appropriate validity and reliability. For example, if a professional is interested in measuring state anxiety for research purposes, an appropriate assessment may be the Competitive State Anxiety Inventory-2 (CSAI-2; Martens, Burton, Vealey, Bump, & Smith, 1990) as opposed to the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970), which measures both trait and state anxiety. When selecting the assessment, the professional should be aware of limitations or biases regarding cultural sensitivity (see the later section on cultural issues); gender considerations (Etzel, Yura, & Perna, 1998); and age, language, or disability factors that may influence the psychometric qualities of the assessment differently from the way they influenced the normative groups used for the development and validation of the instrument (APA, 2002; Fisher, 2009). It is also important to consider the method of delivery. For example, paper-and-pencil assessments may not have been validated for online use (see the later section on technology), and instruments with elevated reading levels may not be appropriate for certain age or developmental groups. Therefore, the professional should always verify the assessment's validity and reliability when a modified assessment method or group is used (Fisher, 2009). Furthermore, the professional should also attempt to conduct in-person assessments when possible, as a great deal of information can be learned about clients from the way in which they present themselves during the assessment process. This information can add to the richness of the assessment data.
It is also important for professionals to be aware of, and competent in using, appropriate psychometric strategies for establishing the validity and reliability of the instruments they use (AERA, APA, & NCME, 1999). All instruments have unique psychometric properties that affect how they should be administered and interpreted. When validity and reliability issues are not taken into consideration, it is possible to choose and utilize instruments to assess factors that they were not designed to assess. Furthermore, practitioners should be well aware of other psychometric properties, such as content and criterion validity and the standard error of measurement, that may affect how results are interpreted and used. The ethical practitioner needs to be aware of psychometric issues in order to choose appropriate instruments with regard to the referral questions, client characteristics, assessment strategies, and environmental factors.
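As a minimal illustration of why the standard error of measurement matters for interpretation, the following sketch computes SEM from a scale's standard deviation and reliability coefficient; the SD and alpha values here are hypothetical, not drawn from any instrument discussed in this chapter:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical subscale with SD = 5.0 and reliability (alpha) = .84.
# SEM = 5.0 * sqrt(0.16) = 2.0, so a 95% confidence band around an
# observed score spans roughly +/- 1.96 * SEM, or about 3.9 points.
print(round(sem(5.0, 0.84), 2))  # 2.0
```

A practitioner reporting an individual's observed score of, say, 30 on such a subscale should therefore treat scores between roughly 26 and 34 as statistically indistinguishable from it.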
Consent and Assent
As discussed earlier, the ethical principles for sport and exercise psychology emphasize doing no harm to the client and respecting the individual's rights and dignity (AASP, 1996; APA, 2002). The test taker's right to privacy and confidentiality applies here as well, and the professional should take all necessary precautions to maintain the confidentiality and privacy of the client. To protect the test taker, informed consent must be obtained at the start of the relationship (e.g., research, consultation, therapy). Beyond the informed consent process and before formal assessment, the client or participant should be informed of all pertinent information regarding the assessment process. This information includes (a) the nature and purpose of assessment; (b) any applicable fees; (c) potential involvement of third parties such as a coach, athletic trainer, or manager; (d) limits of privacy and confidentiality (as discussed in the next section); and (e) the timeline for the process and potential feedback (Fisher, 2009). This information should be presented in a clear and understandable manner. Furthermore, this information should be agreed to by the test taker, who thereby gives informed consent. Test takers should engage in assessment of their own free will and must be given the option to withdraw participation without consequences (APA standard 3.10). All necessary information about assessment procedures and findings should be provided in a language or at a level appropriate for the participant. Furthermore, it is unethical to require or coerce individuals to take part in measurement and assessment for research or practice purposes.
Privacy and Confidentiality and Release of Information
Typically, the ethical standards of organizations with ties to sport psychology (APA ethical standard 4.01 and the AASP code) suggest that professionals should not reveal information about clients, test takers, or others without their signed approval to release information or a legal requirement to do so. These legal situations may include (a) a test taker who indicates possible self-harm or harm to others (i.e., suicide or homicide), (b) a test taker whose results are subpoenaed by the court, or (c) a test taker who is a minor, in which case the parent or guardian may have access to the data (Etzel et al., 1998). If the test taker or, in the case of a minor, the parent or guardian provides explicit written permission, the specific information identified by the client may be released to the identified parties. Unless these circumstances are met, information from the test taker may not be disclosed to anyone (e.g., coaches, management, parents, administration, athletic trainers, and so on).
In situations where the assessment is requested by a third party (e.g., coaches, management, the court), this third party may also request results from the assessment. It is important for the professional to establish a priori who is the “real client” (Ogilvie, 1979) and to have the ability to control access to the results. Etzel and colleagues (1998) suggest that information about the assessment should be shared only with one predetermined person, unless a release of information form has been completed. Therefore, when engaging in assessments, the professional should set clear boundaries and avoid dual relationships, thereby identifying who is being served (APA standard 4.02a). Another complication of these situations is the role of trust. If athletes or test takers suspect the test results will be used without their permission in decisions regarding performance or other aspects of participation, they may be less likely to respond honestly, thus affecting the validity of the results (see the section on demand characteristics).
Raw Data and Data Storage
Raw data such as the test taker's responses to items, as well as the professional's notes and final reports, should be stored in locked file cabinets inside the professional's office or in password-protected computer files (Fisher, 2009). Other methods to ensure confidentiality may include limiting access to records to only those people who have a need to know this information and have been trained to handle and understand it, deidentifying records using code numbers, and appropriately disposing of identifiable records (Fisher, 2009). A good policy for data maintenance is that data should be kept for a minimum of 7 y after the last service delivery date or 3 y after a minor reaches the age of 18 (whichever is later), as is recommended by the APA record-keeping guidelines (APA, 2002; Fisher, 2009). Raw data and the instruments used for assessment purposes should not be released to third parties unless a release of information form has been completed and the third party is competently trained to use such information.
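The "whichever is later" retention rule can be captured as simple date arithmetic. The sketch below is an illustrative helper, not code issued by the APA; the function name and the handling of leap-day anniversaries are assumptions made for the example.

```python
from datetime import date
from typing import Optional

# Hypothetical helper illustrating the retention rule described above: keep
# records at least 7 years after the last service date or, for a client who
# was a minor, until 3 years after the client turns 18, whichever is later.
def earliest_disposal_date(last_service: date,
                           birth_date: Optional[date] = None) -> date:
    def add_years(d: date, years: int) -> date:
        try:
            return d.replace(year=d.year + years)
        except ValueError:  # Feb 29 anniversary falling in a non-leap year
            return d.replace(year=d.year + years, day=28)

    keep_until = add_years(last_service, 7)  # 7 y after last service delivery
    if birth_date is not None:               # client was a minor
        keep_until = max(keep_until, add_years(birth_date, 18 + 3))
    return keep_until
```

For example, a minor born March 15, 2010, and last seen June 1, 2020, would have records kept until March 15, 2031 (3 y after turning 18), the later of the two dates.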
Results Discussion
Test feedback and results discussion should be provided in the form of a carefully constructed report using clear language that fully explains the assessment results. Labels and jargon should be eliminated to increase readability. Information necessary to the purpose of the test should be included, and the inclusion of unnecessary and unrelated information should be avoided (APA, 2002; Fisher, 2009). Additionally, as recommended by the APA (APA, 2002), interpretations should take into consideration the participant's gender, race, ethnicity, age, national origin, sexual orientation, religion, disability, language, or socioeconomic status. Participants should receive assessment information and feedback related to their performance on the assessment and should be informed of ways in which they could personally use the test results or how this information may be used by a third party (only if written permission was given to release such information). The information released to the participant should take the form of a verbal or written report, presented in such a way that it does not cause harm to the test taker (Etzel et al., 1998). However, information such as numerical scores or specific responses should not be released to individuals not qualified to interpret such information (Fisher, 2009; Tranel, 1995).
Demand Characteristics
In the sport context, several groups of individuals may be interested in the assessment results of athletes. Interested parties may include coaches, managers, teams, students, or administrators. However, the possibility that a third party will review the test results may increase socially desirable responding and yield invalid and unreliable information. Therefore, undue pressure to complete an instrument or battery should be considered as a contextual factor.
Another potentially undesirable effect of a third party viewing the test taker's results may be assessment anxiety. The APA standards state that if a test taker is observed to be anxious or reports feeling anxious, this feeling should be taken into account and become a limitation in the interpretation of test data (APA, 2002). Assessment anxiety may be exaggerated in situations where a third party may have access to results. These situations may also lead to faking good or faking bad on the part of respondents who are concerned about how the results may be used. This must also be considered when evaluating the results.
Supervision of Subordinates
In some cases, professionals may hire and train subordinates to help with assessment and measurement tasks. These subordinates may administer, score, and even interpret the results of measurement and assessment. Standard 2.05 of the APA ethics code (APA, 2002) states that professionals utilizing employees, supervisees, or research and teaching assistants for such purposes should take reasonable precautions to put subordinates in situations where (a) they do not face possibly harmful multiple relationships with the client that could affect their objectivity, (b) they are competently trained to perform the delegated task on their own or with supervision, or (c) they are supervised for competent service delivery. Therefore, when using subordinates to help with tasks such as administration, scoring, or interpretation, the professional assumes primary responsibility and liability to ensure that the services are being provided competently. The professional needs to ensure that subordinates are well trained with all potential instruments. To do so, the professional must provide appropriate training, experience, and supervision as well as continue to check the subordinates' work to ensure its quality. As with licensed professionals, not all subordinates have the same competencies with regard to all instruments.
Tools to Measure the Physical Self
Reflecting the general historical trends in self-concept research, self-concept instruments used in early sport and exercise research focused on global self-esteem (Marsh, 1997, 2002). Thus, in a 1974 review, Wylie concluded that at the time most self-concept instruments focused on global self-concept or self-esteem rather than specific domains such as PSC. However, following the research of Shavelson and colleagues (1976), a number of multidimensional self-concept instruments containing one or more PSC scales were developed. Although several of the instruments reviewed by Shavelson and colleagues (1976) contained items relating to physical skills and elements of physical appearance, none provided a clearly interpretable measure of PSC. From a practical perspective, these older instruments appear to be of little value for sport and exercise psychologists. The major exception, perhaps, is the Physical Estimation and Attraction Scale (PEAS; Sonstroem, 1978, 1997), along with the theoretical model on which it is based. This instrument was designed to measure two global components: estimation (competency) and attraction. While the PEAS may not be the instrument of choice today, it has a historical significance in that its research incorporated many of the features of the construct validity approach advocated in this chapter, it was heuristic, and it provided an important basis for subsequent research.
In a subsequent 1989 review, Wylie identified several multidimensional self-concept instruments measuring one or more components of PSC that can be differentiated from other specific domains of self-concept and general self-concept. Included in the list were the three SDQ instruments already discussed. Wylie also evaluated Harter's (1985) Self-Perception Profile for Children, which contains two PSC scales (athletic competence and physical appearance). Other multidimensional instruments containing physical scales that were not reviewed by Wylie include the Self-Rating Scale (Fleming & Courtney, 1984), which measures physical ability and physical appearance; the Song and Hattie Test (Hattie, 1992), which measures physical appearance; and the Multidimensional Self-Concept Scale (Bracken, 1996), which has a physical scale that includes physical competence, physical appearance, physical fitness, and health. The Tennessee Self-Concept Scale (Fitts, 1965) is a multidimensional self-concept instrument that also purports to measure PSC. In their review and empirical evaluation of this instrument, Marsh and Richards (1988) found distinguishable physical components reflecting health, neat appearance, physical attractiveness, and physical fitness that were incorporated into a single PSC score. This detailed breakdown of the Tennessee physical scale was supported by relationships with the SDQ physical ability and physical appearance scales in an MTMM study comparing responses to the two instruments. Because each of the clusters based on responses to the Tennessee instrument is represented by only a few items, it is not appropriate to use the instrument to measure these distinct components of PSC. Marsh and Richards argued that PSC measures that combine and confound a wide range of differentiable physical components—such as those based on the Tennessee Self-Concept Scale—should be interpreted cautiously (see similar comments by Fox & Corbin, 1989).
In summary, although multidimensional self-concept instruments based on Shavelson and colleagues' (1976) model provided good support for the construct validity of the physical ability and appearance scales (e.g., Marsh, 2002; Marsh & Peart, 1988), they left unanswered the question of whether PSC is more differentiated than can be explained in terms of one (physical ability) or two (ability, appearance) physical scales. Subsequent PSC instruments were developed specifically to address the issue of the multidimensionality of PSC.
Physical Self-Perception Profile
The Physical Self-Perception Profile (PSPP; Fox, 1990; Fox & Corbin, 1989) is a 30-item inventory that consists of four specific scales and one general physical self-worth factor. The PSPP was developed to document the physical self-perceptions of college students. It was designed to reflect the advances made by Harter (1985) and Shavelson and colleagues (1976) in identifying the physical self as an important construct to measure in its own right and to reflect the hierarchical, multidimensional nature of the physical self. A qualitative approach was used to reveal dimensions of physical self-esteem salient to the population sampled (Fox & Corbin, 1989). The PSPP consists of five 6-item scales of sport (perceived sport competence), body (perceived bodily attractiveness), strength (perceived physical strength and muscular development), condition (perceived level of physical conditioning and exercise), and physical self-worth. Fox (1990) recommended that the 10-item Rosenberg Self-Esteem Scale (Rosenberg, 1965) be used alongside the PSPP to provide a global measure. Fox (1990) reported factor analyses indicating that each item loads most highly on the factor that it is designed to measure and that individual scale reliabilities are in the .80s.
The PSPP research demonstrates (a) good reliability (coefficient alpha of .80-.95; Fox, 1990; Page, Ashford, Fox, & Biddle, 1993; Sonstroem, Speliotis, & Fava, 1992); (b) good test-retest stability over the short term (rs of .74-.89; Fox, 1990); (c) a well-defined, replicable factor structure as shown by CFA (Fox & Corbin, 1989; Sonstroem, Harlow, & Josephs, 1994); (d) convergent and discriminant validity in studies showing PSPP relationships with external criteria such as exercise behaviors, mental adjustment variables, and health complaints (Fox & Corbin, 1989; Sonstroem & Potts, 1996); and (e) applicability for an older adult population (Sonstroem et al., 1994). However, correlations among the PSPP scales are consistently so high (.65-.89 when disattenuated for measurement error; Marsh, Richards, Johnson, Roche, & Tremayne, 1994) that they detract from the instrument's ability to differentiate among the different PSC factors it purports to measure.
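The "disattenuated" correlations cited above come from the standard correction for attenuation, in which an observed correlation between two scales is divided by the square root of the product of their reliabilities. A minimal sketch follows; the numerical values are invented for illustration, not taken from the PSPP data.

```python
import math

# Correction for attenuation: estimates the correlation between two true
# scores from the observed correlation and each scale's reliability.
def disattenuate(r_observed: float, rel_x: float, rel_y: float) -> float:
    return r_observed / math.sqrt(rel_x * rel_y)

# e.g., an observed inter-scale correlation of .60 with alphas of .85 and .88
r_true = disattenuate(0.60, 0.85, 0.88)  # ≈ .69
```

With perfectly reliable scales (reliabilities of 1.0), the correction leaves the observed correlation unchanged; the lower the reliabilities, the larger the upward adjustment.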
Subsequently, a version of the PSPP for children and adolescents was developed and validated—the Children and Youth Physical Self-Perception Profile (CY-PSPP; Eklund, Whitehead, & Welk, 1997; Whitehead, 1995). Like the PSPP, the CY-PSPP is a 30-item inventory consisting of the same five 6-item scales. The CY-PSPP is a substantially revised version of the PSPP that is most appropriately thought of as a different instrument. The CY-PSPP body, strength, and conditioning subscales are based on minor adaptations of the PSPP to make them more suitable for children. However, the global self-worth (self-esteem) and sport scales are completely different. The PSPP did not have a self-esteem scale of its own but included 6 items adapted from the Rosenberg Self-Esteem Scale. On the CY-PSPP, global self-esteem and sport scales from the PSPP were dropped and replaced with corresponding scales from Harter's (1985) Self-Perception Profile for Children. Correlations among factors remained high (e.g., physical self-worth with attractive body adequacy = .8). Eklund and colleagues (1997) suggested that these results are consistent with the developmental patterns among children, as differentiation in self-concept is less defined at younger ages (Harter, 1985). CFAs have supported the instrument's factor structure, with both the CFI (comparative fit index) and NNFI (non-normed fit index) indexes exceeding the .90 criterion for good model fit (Eklund et al., 1997). Moderate correlations (r = .39-.45) with external criteria such as physical activity and physical fitness have demonstrated its convergent and discriminant validity (Welk & Eklund, 2005). 
The CY-PSPP has been validated with adolescents (Jones, Polman, & Peters, 2009; Welk, Corbin, & Lewis, 1995; Whitehead, 1995) and younger children (Welk, Corbin, Dowell, & Harris, 1997) and has been validated and translated into other languages (Aşçı, Eklund, Whitehead, Kirazci, & Koca, 2005; Raustorp, Ståhle, Gudasic, Kinnunen, & Mattsson, 2005; Raustorp, Mattsson, Svensson, & Ståhle, 2006).
Both the PSPP and CY-PSPP use a nonstandard response format based on Harter (1985), in which each item consists of a matched pair of contrasting statements, one negative and one positive (e.g., “Some people feel that they are not very good when it comes to sports” but “Others feel that they are really good at just about every sport”). Respondents are asked which description is most like them and whether the description they select is “Sort of true of me” or “Really true of me.” Responses are scored on a scale of 1 to 4, with 1 representing a “Really true of me” response to the negative statement and 4 representing a “Really true of me” response to the positive statement. Although this response format is designed to reduce the influence of social desirability, Wylie's (1989) review of Harter's original instruments provided little or no support for this suggestion, and Marsh and colleagues (1994) suggested that there were substantial method effects associated with the nonstandard response scale. This format has also been shown to be confusing, particularly for children (Eiser, Eiser, & Haversmans, 1995), and even for adults (Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994), unless special care is taken to explain the response scale. Using the suggestion of Marsh and colleagues (1994) that confusion over the structured alternative response scale could be overcome by more detailed instructions at the outset, researchers implementing the CY-PSPP used large illustrations for a sample item (Whitehead, 1995). Wichstrom (1995) found that responses for this format were psychometrically stronger when based on typical Likert responses rather than the structured alternative format, but Welk and colleagues (1997) suggested that the nonstandard response scale on the CY-PSPP worked better than Likert responses did.
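The 1-to-4 scoring of the structured alternative format described above can be made concrete with a small sketch. The function name and boolean encoding are assumptions made for illustration; only the resulting 1-4 values follow from the text.

```python
# Illustrative scoring of Harter's structured alternative format: the
# respondent first chooses the negative or the positive statement, then
# rates the chosen statement "sort of true" or "really true" of them.
def score_item(chose_positive: bool, really_true: bool) -> int:
    if chose_positive:
        return 4 if really_true else 3   # positive statement endorsed
    return 1 if really_true else 2       # negative statement endorsed

# "Really true of me" on the negative statement scores 1 (lowest);
# "Really true of me" on the positive statement scores 4 (highest).
```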
In summary, the PSPP and the CY-PSPP are established instruments that have been translated into several languages and have been used with a range of populations. However, the response format and the high correlations among factors in both instruments may limit their usefulness in some settings. The CY-PSPP is a substantially revised version of the PSPP specifically developed for children. Although the CY-PSPP should be used instead of the PSPP for child and adolescent samples, it may even be stronger than the original PSPP for adult samples.
Subsequent to the completion of this chapter, Lindwall and colleagues (2011) published a revised version of the PSPP (PSPP-R). They reviewed critiques of the PSPP response scale such as those noted here (e.g., Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994) and acknowledged that “the idiosyncratic alternative response format has been difficult to understand for some participants” (pp. 310-311). In recognition of these problems, the idiosyncratic response scale that has been such a salient feature of the PSPP was dropped altogether and replaced with a 4-point Likert response using only positively worded items. Lindwall and colleagues (2011) demonstrated the appropriateness of the revised PSPP-R based on a large sample (N = 1,831) of participants from four countries (Sweden, Great Britain, Portugal, and Turkey). However, they did not indicate whether the PSPP-R supersedes the PSPP or is merely an alternative to it. Nor did they discuss the implications for other instruments using similar idiosyncratic response scales (e.g., PSPP-related instruments such as the CY-PSPP or Harter's instruments more generally).
Physical Self-Inventory
The Physical Self-Inventory (PSI) is a French adaptation of the PSPP that was originally developed for use with Francophone adults (Ninot, Delignières, & Fortes, 2000). In two preliminary studies, Ninot and colleagues used the nonstandard response scale from the PSPP. However, consistent with previous research (Marsh et al., 1994), they reported that this response scale was problematic. In a third study, the authors used a 6-point Likert response scale; factor analysis results were reasonable, but reliability coefficients were not completely satisfactory. Next the authors replaced the PSPP global physical items with items from the SDQ physical scale and the PSPP global self-esteem items with items from Coopersmith (1967). The final PSI consists of 25 items measuring six PSC factors (four specific and two global, as with the PSPP) and has satisfactory psychometric properties that have been confirmed in subsequent French studies of adults (Masse, Jung, & Pfister, 2001; Stephan, Bilard, Ninot, & Delignières, 2003; Stephan & Maïano, 2007).
Maïano and coworkers (2008) subsequently constructed a short form of the PSI for use with adolescents. They found that not all items from the adult PSI worked with adolescents, but they were able to construct 18-item (PSI-SF, 3 items per scale) and 12-item (PSI-VSF, 2 items per scale) versions that had good psychometric properties. In particular, the measurement and hierarchical structures were consistent with proposals by Fox and Corbin (1989) and were fully invariant across gender. Maïano and coworkers also noted that PSI-SF responses showed very high test-retest stability. Comparison of the PSI-SF and PSI-VSF demonstrated that the measurement model, mean structure, structural parameters, and criterion-related validity were equivalent across samples and versions. Nevertheless, the authors noted a serious limitation that all versions of the PSI share with the PSPP: very high correlations among the six PSC factors (correlations among latent factors) that, according to the authors, bring “into question the real independence of some of the models' sub-dimensions, and by extension their discriminant validity, a finding that has already been observed by Marsh (2002; Marsh et al., 2006) on analyses of the PSPP” (Maïano et al., 2008, p. 844). However, Maïano and colleagues also noted that because they used a traditional Likert response scale, the high correlations apparently were not due to the structured alternative format used in the PSPP. In summary, the PSI, and particularly its short and very short forms, has made a potentially important contribution to applied research. However, further research is needed to evaluate more fully the robustness of support for construct validity and application in non-French-speaking settings.
Richards Physical Self-Concept Scale
The Richards Physical Self-Concept Scale (RPSCS; Marsh et al., 1994; Richards, 1988) is a 35-item instrument designed to measure six specific components of PSC (body build, appearance, health, physical competence, strength, action) and one general physical satisfaction factor. Each item is a simple declarative statement, and subjects respond on an 8-point true-false scale. Extensive research in Australia (e.g., Marsh et al., 1994; Richards, 1988) has indicated that RPSCS responses have good psychometric properties. The factor structure is very robust, generalizing well over ages from 8 to 80 y and over gender.
RPSCS research has demonstrated (a) good reliability (coefficient alpha of .79-.93; Marsh et al., 1994; Richards & Marsh, 2005); (b) good test-retest stability over the short term (rs of .77-.90 over 3 wk; Richards, 1988); (c) a well-defined, replicable factor structure as shown by CFA (Marsh et al., 1994; Richards, 2004); (d) a factor structure that is invariant across gender, as shown by multiple-group CFA (Richards, 2004), and across a wide age range; (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (Marsh et al., 1994; Richards & Marsh, 2005); and (f) applicability for participants aged 8 to 60 y and for both genders (Marsh et al., 1994; Richards, 1988, 2004; Richards & Marsh, 2005). In summary, the RPSCS is regarded as a valid, reliable, and structurally sound instrument that has been tested across both genders and a broad age span. The applicability across such a wide range of ages is a particular strength.
Physical Self-Description Questionnaire
Extending Fleishman's (1964) classic research on the structure of physical fitness, the Physical Self-Description Questionnaire (PSDQ) scales reflect some of the original SDQ scales and parallel physical fitness components identified in a CFA of physical fitness measures (Marsh, 1993). The PSDQ consists of nine specific components of PSC (strength, body fat, activity, endurance and fitness, sport competence, coordination, health, appearance, and flexibility), a global physical scale, and a global self-esteem scale. Each of the 70 PSDQ items is a simple declarative statement, and individuals respond on a 6-point true-false scale. The PSDQ is designed for adolescents but is also appropriate for older participants.
PSDQ research has demonstrated (a) good reliability (median coefficient alpha of .92) across the 11 scales (Marsh, 1996b; Marsh et al., 1994); (b) good test-retest stability over the short term (median r = .83 over 3 mo) and longer term (median r = .69 over 14 mo; Marsh, 1996b); (c) a well-defined, replicable factor structure as shown by CFA (Marsh, 1996b; Marsh et al., 1994); (d) a factor structure that is invariant over gender as shown by multiple-group CFA (Marsh et al., 1994); (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (see Marsh et al., 1994); (f) convergent and discriminant validity as shown by PSDQ relationships with external criteria (e.g., measures of body composition, physical activity, endurance, strength, and flexibility; see Marsh, 1996a, 1997); and (g) applicability for participants aged 12 to 18 y (or older) and for elite athletes and nonathletes (Marsh, Hey, Roche, & Perry, 1997; Marsh, Perry, Horsely, & Roche, 1995). In summary, the PSDQ is a psychometrically strong instrument.
Marsh, Martin, and Jackson (2010) recently presented a new short form of the PSDQ (PSDQ-S). This short form balances brevity and psychometric quality in relation to established guidelines for evaluating short forms (e.g., Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Smith, McCarthy, & Anderson, 2000) with the construct validity approach that is the basis of PSDQ research. Based on the PSDQ normative archive, 40 of 70 items were selected and evaluated in a new cross-validation sample (N = 708 Australian adolescents). To test the generalizability of results, the authors considered four additional samples: Australian adolescent elite athletes (n = 349), Spanish adolescents (n = 986), Israeli university students (n = 395), and Australian senior citizens (n = 760). Reliabilities for the 40 PSDQ-S items were consistently high in the cross-validation sample (.81-.94; median = .89) and senior sample (.81-.94; median = .91), and reliabilities in the cross-validation sample were higher than those in comparable groups completing the 70-item PSDQ. The PSDQ-S factor structure in the cross-validation sample was well defined and highly similar to that based on the archive sample as well as to those based on the other four groups. Study 1, using a missing-by-design variation of multigroup invariance tests, showed that the factor structure was invariant across the 40-item PSDQ-S and the 70-item PSDQ. Study 2 demonstrated factorial invariance of responses over 1 y (test-retest correlations of .57-.90; median = .77) and good support for convergent and discriminant validity over time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to responses on the PSPP and PSC instruments. The four studies reported by Marsh and coworkers demonstrated new, evolving strategies for the construction and evaluation of short forms that support the PSDQ-S.
The authors concluded that the strong support for the psychometric properties and construct validity of the widely used PSDQ instrument generalizes very well to the PSDQ-S.
Elite Athlete Self-Description Questionnaire
The PSC instruments discussed thus far may be suitable for elite athletes (e.g., Marsh et al., 1995). There may, however, be other components to PSC that are particularly relevant for elite athletes, and thus the Elite Athlete Self-Description Questionnaire (EASDQ; Marsh, Hey, Roche, et al., 1997; Marsh, Hey, Johnson, & Perry, 1997) was developed to address these other components. For the EASDQ, it was hypothesized that overall performance by elite athletes is a function of skill level, body suitability, aerobic fitness, anaerobic fitness, and mental competence. Thus Marsh and colleagues developed the EASDQ to measure these five components along with overall performance, six factors in all. For each scale, they developed a pool of items that sport psychologists at the Australian Institute of Sport evaluated for their suitability for elite athletes. Pilot studies were conducted to select the best items to represent each factor. A compromise between brevity and psychometric soundness was achieved, with acceptable levels of reliability (e.g., all scales having reliability estimates of at least .8) based on short scales (4-6 items per scale).
EASDQ research demonstrates (a) adequate reliability (median coefficient alpha of .85) across the six scales (Marsh, Hey, Johnson, et al., 1997); (b) a well-defined, replicable factor structure as shown by CFA (Marsh, Hey, Johnson, et al., 1997; Marsh, Hey, Roche, et al., 1997); (c) applicability for elite athletes aged 12 y or older (Marsh, Hey, Roche, et al., 1997); and (d) predictive validity as shown by its ability to predict swimming performances in world championships after controlling for previous personal best performances (Marsh & Perry, 2005). In summary, the EASDQ is a reliable and valid instrument for elite athletes aged 12 and older. More research is needed, however, to relate EASDQ responses to external validity criteria such as those used in PSDQ research and to criteria that are more specific to elite athletes (e.g., actual performance in competition).
Evaluation of Measures of Intrinsic and Extrinsic Motivation in Sport and Exercise
In this section, a critical review of the different measures used to assess intrinsic and extrinsic motivation in sport and exercise research is conducted. Certain criteria have guided the selection of the measures presented in this section. First, we have selected measures that are fully developed instruments that have gone through extensive validation steps. Second, we have chosen scales that have been used in research, published or unpublished, during the past 10 years. Scales that have not been used during that time frame are considered to be obsolete and are not reviewed. Finally, in light of recent theoretical development and because of space limitations, we have focused on motivation scales that assess intrinsic and extrinsic motivation independently of determinants and outcomes, while focusing on the perceived reasons for behavior. Our earlier discussion on the definitions of intrinsic and extrinsic motivation makes it possible to classify the different measures. The measures can vary in terms of the level of generality (situational versus contextual level) and the area (sport versus exercise). This classification appears in table 25.1. Table 25.2 (see p. 291) provides additional information on each scale's underlying concept, dimensions, publication source, and availability. As can be seen, seven measures are reviewed. For each one, we present (a) a description of the instrument, (b) the conceptual and theoretical rationale underlying its scale development, (c) the available evidence concerning its psychometric properties (e.g., factorial validity, reliability, and construct validity), and (d) a broad assessment of the strengths and weaknesses associated with each measure.
Measures Used in Sport
In this section, we review the SMS (Brière et al., 1995; Pelletier et al., 1995), the Sport Motivation Scale-6 (SMS-6; Mallett, Kawabata, Newcombe, Otero-Forero, & Jackson, 2007), the Behavioral Regulation in Sport Questionnaire (BRSQ; Lonsdale, Hodge, & Rose, 2008), the Pictorial Motivation Scale (PMS; Reid, Vallerand, Poulin, Crocker, & Farrell, 2009), and the SIMS (Guay et al., 2000).
Sport Motivation Scale
The SMS was developed (Brière et al., 1995; Pelletier et al., 1995) in order to assess contextual intrinsic and extrinsic motivation from a multidimensional perspective, as well as amotivation. The SMS has been the most often used motivation measure in sport, being employed with a variety of athletes (recreational to elite), age groups (adolescent to senior), and cultures (e.g., Canada, United States, United Kingdom, Bulgaria, Australia, Spain, and New Zealand). In fact, the SMS has been translated and validated in several languages (see Pelletier & Sarrazin, 2007). The SMS is based on SDT (Deci & Ryan, 1985) and is made up of seven subscales assessing amotivation; external, introjected, and identified regulation; and intrinsic motivation to know, to experience stimulation, and to accomplish. In line with SDT, motivation is assessed as the perceived reasons for participation, or the why of behavior. At the beginning of the scale, participants are asked, “In general, why do you practice your sport?” The items represent the perceived reasons for engaging in the activity, thus reflecting the different types of motivation.
The original scale was developed in French as L'Échelle de Motivation dans les Sports (Brière, Vallerand, Blais, & Pelletier, 1995) and was validated in three steps. The first step involved generating a pool of items explaining various reasons for sport participation through interviews with French Canadian athletes (aged 17-20 y). These reasons were then used to formulate items for the seven subscales of the French SMS. In the second step, a committee of experts evaluated the content validity of the items and eliminated those that were thought to be inadequate. Another sample of athletes from various sports completed the scale. Results from an exploratory factor analysis (EFA) provided support for a seven-factor structure with 4 items per subscale; this second step thus resulted in a 28-item scale. In the third and final step, two additional studies were conducted to further validate the scale. These studies included approximately 500 individuals, most of whom were involved in recreational sports. Results from confirmatory factor analyses (CFA) and correlational analyses confirmed the seven-factor structure, the subscale internal consistency (ranging from .65-.96), and moderate to high indexes of temporal stability (ranging from .54-.82) over 1 month. Furthermore, inspection of correlations among the seven SMS subscales provided support for the simplex pattern proposed by SDT. Results of correlations also showed that, in line with SDT, the most self-determined forms of motivation (intrinsic motivation and identified regulation) were related more strongly to determinants such as autonomy support from coaches and feelings of competence than were the less self-determined forms of motivation (external and introjected regulation) and amotivation. Similar results were obtained with motivational outcomes such as positive affect, concentration, and intentions to pursue engagement in sport. In sum, adequate construct validity was obtained for the French form of the SMS.
The translation of the French SMS into English involved back-translation and committee procedures as suggested by Vallerand (1989). Pelletier and colleagues (1995) conducted two studies involving college athletes from various sports in order to assess the psychometric properties of the English form of the SMS. Results from CFA with a sample of 593 Canadian university athletes revealed adequate fit indices for the hypothesized seven-factor model (Adjusted Goodness of Fit Index and Normed Fit Index both > .90; Root Mean Square Residual < .08), and correlations with determinants and outcomes supported the simplex model. Moreover, internal consistency above .70 was obtained on all of the subscales except the identified subscale (.63). Test-retest correlations were acceptable and very similar to those obtained with the French SMS, as was the scale construct validity.
Since 1995, the SMS has been used extensively in sport psychology research. The seven-factor structure has been supported repeatedly (e.g., Doganis, 2000; Gillet, Vallerand, & Rosnet, 2009; Li & Harmer, 1996; Shaw, Ostrow, & Beckstead, 2005; Standage, Duda, & Ntoumanis, 2003). In addition, Hu and Bentler (1999) obtained support for a five-factor model by combining the three types of intrinsic motivation into one factor. Similar results were obtained by Gillet and colleagues (2009) with the French SMS. However, some studies have not supported the seven-factor model (Hodge, Allen, & Smellie, 2008; Mallett, Kawabata, & Newcombe, 2007; Mallett, Kawabata, Newcombe, & Otero-Rorero, 2007; Martens & Webber, 2002). Why is there such a discrepancy between these two sets of studies? One possibility lies in the populations from which the different samples were taken. Specifically, the SMS was validated using adolescent and young adult athletes and not older athletes. Because of this specific focus, some of the items may reflect a participation rather than an elite orientation, which is more in line with the younger population. For instance, an identified regulation item reads, “Because sport is one of the best ways to maintain good relationships with my friends.” Such an item seems more relevant for a younger population. An older, high-level athlete may disagree with this item but still display a high level of identified regulation for a sport (but not for relationship reasons). Future research using the SMS with different age groups and proficiency levels is needed to clarify this issue.
Although the internal consistency of the SMS has generally been adequate, some values below .70 have been reported. This is especially the case for the identified regulation subscale (Brière et al., 1995; Kingston, Horrocks, & Hanton, 2006; Li & Harmer, 1996; Pelletier et al., 1995), although values below .70 have also been obtained for the introjected (McNeill & Wang, 2005; Perreault & Vallerand, 2007; Riemer, Fink, & Fitzgerald, 2002; Standage, Duda, & Ntoumanis, 2003), external regulation, and amotivation subscales (the latter two in Standage, Duda, & Ntoumanis, 2003). However, very few values below .60 have been obtained. It should be noted that a Cronbach alpha of .60 with only 4 items is acceptable because, as noted by Cronbach (1951), the coefficient alpha underestimates the internal consistency of scales with few items; the number of items enters the formula directly. For instance, given the same average interitem correlation, a 3-item scale coefficient alpha value of .56 is equivalent to an alpha value of .81 on an 8-item scale!
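The effect of scale length on coefficient alpha can be sketched with the standardized-alpha (Spearman-Brown) formula, alpha = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ is the average interitem correlation. The snippet below is illustrative only; the interitem correlation of .30 is an assumed value, and exact equivalences (such as the .56-to-.81 example above) depend on the figures and rounding used.

```python
def standardized_alpha(k, r_bar):
    """Cronbach's alpha for a k-item scale whose items share an
    average interitem correlation of r_bar (standardized form)."""
    return k * r_bar / (1 + (k - 1) * r_bar)

# Holding the average interitem correlation fixed at an assumed .30,
# alpha climbs steeply as items are added:
print(round(standardized_alpha(3, 0.30), 2))  # 0.56
print(round(standardized_alpha(8, 0.30), 2))  # 0.77
```

The point is that a "low" alpha on a 4-item subscale can reflect the small item count as much as weak item homogeneity.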
In line with the original work of Ryan and Connell (1989) and the initial SMS validation procedures (Brière et al., 1995; Pelletier et al., 1995), construct validity has been assessed by other authors in two ways: (1) with the simplex pattern of correlations among the subscales and (2) with correlations between motivational factors and their determinants and consequences. Space precludes a review of all studies; overall, however, there is overwhelming support for the construct validity of the SMS in both French and English. For instance, in addition to finding support for the simplex pattern, Pelletier and Sarrazin (2007) concluded in their review of the evidence that the SMS has been used with success to predict a great variety of specific outcomes and consequences (such as burnout, exercise dependence among endurance athletes, fear of failing, adaptive coping skills, perceptions of constraints, flow, vitality and well-being, sporting behavior orientations, aggression, and performance) in a manner that is consistent with SDT. These findings provide strong support for the construct validity of the SMS.
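The simplex check itself is mechanical: when subscales are ordered along the self-determination continuum, each subscale should correlate most strongly with its neighbors and progressively less with more distant subscales. A minimal sketch, using a purely hypothetical correlation matrix (not drawn from any published SMS data; amotivation is omitted for brevity):

```python
# Hypothetical correlations among four subscales ordered along the
# continuum: intrinsic, identified, introjected, external regulation.
R = [
    [1.00, 0.55, 0.20, -0.10],
    [0.55, 1.00, 0.35,  0.05],
    [0.20, 0.35, 1.00,  0.45],
    [-0.10, 0.05, 0.45, 1.00],
]

def is_simplex(R):
    """Check that, within each row, correlations decline monotonically
    as subscales grow more distant on the continuum."""
    n = len(R)
    for i in range(n):
        right = [R[i][j] for j in range(i + 1, n)]       # neighbors to the right
        left = [R[i][j] for j in range(i - 1, -1, -1)]   # neighbors to the left
        for seq in (right, left):
            if any(seq[k] < seq[k + 1] for k in range(len(seq) - 1)):
                return False
    return True

print(is_simplex(R))  # True
```

In practice, of course, researchers inspect the ordering of observed (or latent) correlations rather than demanding strict monotonicity, but the expected pattern is the one encoded here.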
In sum, the SMS has some positive features. First, it is a multidimensional instrument that assesses different types of intrinsic and extrinsic motivation as well as amotivation. Second, the scale focuses on the why of behavior, and thus items are not confounded with determinants and consequences. Finally, it has some excellent psychometric properties. Nevertheless, some limitations should be underscored. First, although internal consistency levels have been acceptable overall, some subscales, especially the identified regulation subscale, have yielded relatively low coefficient alphas at times. Second, the SMS does not assess integrated regulation. Third, the seven-factor structure has not always been supported by CFAs. According to Pelletier, Vallerand, and Sarrazin (2007), this may be explained by a host of factors, including differences in sample sizes, variations in the way the instrument is administered, or other characteristics specific to the context of the study. However, as already indicated, it is also possible that the SMS is better suited for a younger, nonelite athlete population. Clearly, future research on this issue is in order.
Sport Motivation Scale-6
Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed another version of the SMS, the SMS-6. This scale has the same underlying rationale as the original SMS but was designed to improve on it by including an integrated regulation subscale and by addressing some of the inconsistencies in the factor structure and some of the relatively low internal consistency values (below .70). The SMS-6 comprises 24 items, 4 for each of the six subscales, which include amotivation; external, introjected, identified, and integrated regulation; and general intrinsic motivation. Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed 5 items for the integrated regulation subscale as well as 7 other items (4 of which were kept in the final scale) to replace some items in the original SMS. Two samples were used to validate the SMS-6. Sample 1 was composed of 501 first-year university students participating in competitive sport at least twice per week and 113 elite athletes representing Australia at the international level (for a total of 614 participants). Sample 1 was used to derive a factor structure that included the SMS items as well as the reformulated and integrated regulation items. Sample 2 was composed of 557 university students who were engaged in a variety of sports or physical activities twice per week. The second sample was used to confirm the structure of the SMS-6. Participants also completed the Dispositional Flow Scale (DFS).
Results of a CFA with the SMS-6 (with sample 2) provided support for the factor structure as well as for the internal consistency values (all above .70). Concerning the construct validity of the SMS-6, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) reported a rather weak simplex pattern of correlations among the subscales. More specifically, external regulation correlated highly with intrinsic motivation (r = .54), while the correlation between identified regulation and intrinsic motivation was very high (r = .91) and was higher than the one between integrated regulation and intrinsic motivation (r = .75). The construct validity of the SMS-6 was not fully supported, as some of the correlations involving the SMS and flow were not as expected by SDT. For instance, the distinctions among integrated regulation, identified regulation, and intrinsic motivation were not always clear. Furthermore, external regulation revealed some positive and sometimes strong correlations with flow, contrary to hypotheses derived from SDT.
In sum, the SMS-6 contains some attractive features. First, it contains an integrated regulation subscale. Furthermore, the addition of 4 new items may make the SMS more acceptable for older and more experienced athletes. Second, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) presented results supporting the validity of a variation of the SMS-6, the SMS-8. The SMS-8 contains the same items as the SMS-6 but assesses the three types of intrinsic motivation rather than general intrinsic motivation. The SMS-6 also shows some limitations. First, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) proposed 7 new items to replace those that were presumably problematic in the original SMS. However, only 4 of these items made it into the final version; thus, the SMS-6 retains much of the original SMS. Second, even some of the new items appear problematic and may not assess the desired construct (see Pelletier et al., 2007). For instance, a new amotivation item ("I don't seem to be enjoying my sport as much as I previously did") seems to reflect a decrease in intrinsic motivation rather than amotivation. Finally, results from Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) demonstrated that the integrated regulation subscale may lack discriminant validity, yielding correlations with flow that are highly similar to those of identified regulation and intrinsic motivation.
Behavioral Regulation in Sport Questionnaire
Lonsdale and colleagues (2008) developed the BRSQ to create an alternative measure of elite sport motivation as conceptualized by SDT. However, in contrast to Mallett, Kawabata, Newcombe, and Otero-Rorero (2007), these authors used a completely new pool of items developed by SDT experts and competitive athletes. There are two versions of the BRSQ. The BRSQ-8 contains 32 items assessing integrated, identified, introjected, and external regulation; amotivation; and the three forms of intrinsic motivation (knowledge, experience stimulation, and accomplishment) identified by Vallerand (1997). The BRSQ-6 contains the same items but assesses general intrinsic motivation rather than all three types of intrinsic motivation, for a total of 24 items.
Lonsdale and colleagues (2008) conducted a series of three studies to validate the scale. In the first study, the factorial validity and the internal consistency were assessed with 382 New Zealand elite athletes. Results from a CFA on the 32 items supported the factor structure of the BRSQ. Specifically, fit indexes were acceptable, and all items loaded significantly on the appropriate factors (loadings ranged from .58-.91). Finally, internal consistency of the eight subscales, measured with the Cronbach alpha, showed high values ranging from .71 to .91. Additionally, 1 wk test-retest reliability was examined with 34 competitive adult athletes. Test-retest coefficients for all subscales supported the temporal reliability (values ranged from .73-.90).
In a second study with 343 athletes from New Zealand, the results of a CFA on the BRSQ-8 supported once more the factor structure as well as the subscale internal consistency. Lonsdale and colleagues (2008) also showed that the factor structure of the BRSQ-6 model fit the data very well and that subscale coefficient alphas all exceeded .78. Moreover, the construct validity of the BRSQ-6 was assessed by testing for a simplex pattern of correlations among the six subscales. While some relationships were in line with predictions (e.g., amotivation was negatively related to intrinsic motivation), there was a lack of discrimination between some subscales. More specifically, there was no difference between external and introjected regulation scores in terms of their relationships with amotivation. A similar pattern was evident with the identified and integrated regulation subscales, which both had similar high correlations with intrinsic motivation. These results with the simplex pattern were replicated in a third study conducted with nonelite athletes. In this third study, Lonsdale and colleagues also assessed the relationships between the BRSQ-6 and indexes of burnout (Lemyre, Treasure, & Roberts, 2006; Raedeke & Smith, 2001) and flow (Jackson & Eklund, 2002). Overall, results supported hypotheses in line with SDT. Specifically, amotivation and external and introjected regulation showed negative correlations with flow and positive correlations with burnout. The opposite pattern of correlations was found for the self-determined subscales (intrinsic motivation and identified and integrated regulation). However, there was a lack of discrimination between integrated regulation and general intrinsic motivation. Results of another study on burnout (Lonsdale, Hodge, & Rose, 2009) replicated these findings. Thus, overall, the support for the construct validity of the BRSQ-6 appears to be mixed.
It should be underscored that the BRSQ has some attractive features. First, the scale is designed in such a way that the researcher can choose either a multidimensional (BRSQ-8) or a unitary (BRSQ-6) conceptualization of intrinsic motivation. Second, the scale is rather short, with 4 items per subscale. Finally, it assesses integrated regulation. At the same time, the BRSQ also displays some limitations. First, additional research is needed on the construct validity of the scale. Whereas there is support for distinguishing the self-determined subscales (intrinsic motivation and identified and integrated regulation) from the non-self-determined subscales (external and introjected regulation), the finer discrimination within each category appears to be lacking. Such evidence is crucial, and future research is needed to show that this scale does indeed assess the SDT constructs rather than two broad sets of subscales tapping self-determined versus non-self-determined motivation. Second, this scale is designed specifically for older participants in competitive sport; it remains to be seen if the BRSQ can be used with younger participants, for whom the integrated regulation subscale may not have full meaning. Finally, research is needed to test the temporal stability of the scale over a time frame longer than 1 week.
Pictorial Motivation Scale
The PMS was designed to measure intrinsic and extrinsic motivation for sport and exercise in people with an intellectual disability. It assesses participants' reasons for engaging in sport and exercise. The scale's distinguishing characteristic is its use of drawings depicting each of the 20 items. There are 5 items (pictures) for each of four subscales: intrinsic motivation, self-determined extrinsic motivation (a mixture of integrated and identified regulation), non-self-determined extrinsic motivation (a mixture of introjected and external regulation), and amotivation. The pictures help participants with cognitive difficulties understand the motivational concept depicted in each item.
The original scale was developed in French (Reid, Poulin, & Vallerand, 1994). Results of a study with 62 participants supported the internal consistency, temporal stability, and construct validity, as exemplified by the presence of a simplex pattern among the four subscales. However, the amotivation subscale had poor reliability (α = .52). The French version was then translated into English (Reid et al., 2009) according to the back-translation and committee procedures outlined by Vallerand (1989), and 6 new items were generated for the less reliable amotivation subscale. Participants in the Special Olympics (n = 160) completed the English version. Results of the CFA confirmed the four-factor structure of the PMS. Furthermore, the internal consistency values (Cronbach alphas) ranged from .60 to .71. Finally, the construct validity was assessed by testing for a simplex pattern of correlations among the four subscales. The intercorrelations among latent variables from the CFA provided support for the simplex pattern.
Results from a study conducted with the English version of the PMS involving 80 high school students with mild intellectual disability provided support for the internal consistency, temporal stability (over 3 wk), and construct validity of the PMS with respect to the simplex pattern of correlations among the PMS subscales as well as correlations between the PMS subscales and motivational antecedents (skill and perceived competence) and outcomes (perceived effort) as rated by the physical education teacher. Finally, the internal consistency of each subscale was tested without the pictorial dimension with a subset of 47 high school students with mild intellectual disability. Results indicated that, with the exception of intrinsic motivation (.91), internal consistency was poor (.27 for self-determined extrinsic motivation, .20 for non-self-determined extrinsic motivation, and .60 for amotivation). This finding suggests that the scale is not reliable without the drawings.
The preliminary findings with the English version of the PMS are encouraging. Furthermore, this scale is the only one designed for individuals with intellectual disability, and its use of drawings to depict the various items makes it unique in the field. Nevertheless, the PMS shows some limitations. First, the scale does not differentiate among all forms of intrinsic (knowledge, stimulation, and accomplishment) or extrinsic (integrated, identified, introjected, and external regulation) motivation. Second, construct validity was tested with only a limited number of variables. Third, it is not known if the scale is usable with children who have severe forms of intellectual disability. Clearly, additional research is needed on the reliability and validity of the PMS.
Situational Motivation Scale
The SIMS is one of the few scales to assess intrinsic and extrinsic motivation and amotivation at the situational level (Guay et al., 2000). The SIMS is a multidimensional tool that measures four types of motivation: intrinsic motivation, identified regulation, external regulation, and amotivation. The SIMS is made up of 16 items (4 items per subscale) and asks this question: “Why are you currently engaged in this activity?” The items represent potential reasons for task engagement. The scale is worded in such a way that it can be used in most situations (sport and nonsport).
Five studies were reported in the original article. In study 1, the original scale was developed by a committee of experts and completed by 195 French Canadian college students. Results of an EFA revealed a four-factor structure with the final 16 items loading on their respective factor. In study 2, a CFA confirmed the factor structure as well as its invariance across gender. Across the five studies, the internal consistency values of the subscales were acceptable, ranging from .62 to .95 (see Guay et al., 2000). Moreover, across all studies, support was obtained for the construct validity of the SIMS through results from correlations in line with the simplex pattern among the subscales as well as between the SIMS subscales and motivational determinants and consequences. Perhaps of greater interest for the present discussion were the results of study 4, which showed that some subscales (intrinsic motivation and identified regulation) were sensitive enough to detect changes in motivation that took place during two games of a basketball tournament.
Other researchers have also obtained support for the psychometric properties of the SIMS. First, all studies reported acceptable internal consistency values for each subscale (Blanchard, Mask, Vallerand, de la Sablonnière, & Provencher, 2007; Conroy, Coatsworth, & Kaye, 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, Duda, & Prusak, 2003). All coefficient alpha values were above .60, with the exception of the amotivation subscale in the Conroy and colleagues study (α = .58). Second, support for the factorial validity of the SIMS was obtained through CFAs, with one qualification: whereas the CFA results with the 16 items yielded acceptable fit indexes, removal of 1 item (Jaakkola, Liukkonen, Laakso, & Ommundsen, 2008) and even 2 items (Gillet, Berjot, & Paty, 2009; Standage, Treasure, et al., 2003) yielded better fit indexes. Moreover, Standage, Treasure, and colleagues (2003) conducted multisample CFAs and showed that the pattern of factor loadings was largely invariant across four different samples.
Construct validity of the SIMS was also assessed in several studies (Blanchard et al., 2007; Conroy et al., 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, et al., 2003). In addition to supporting the simplex pattern among the SIMS subscales and between the SIMS subscales and need satisfaction (study 2 of Blanchard and colleagues, 2007), results also supported the postulate from the HMIEM (Vallerand, 1997) for the top-down effect, in which contextual sport motivation was found to predict situational sport motivation (studies 1 and 2 of Blanchard et al., 2007; Jaakkola et al., 2008; Ntoumanis & Blaymires, 2003). Specifically, the more self-determined the motivation was found to be in a specific context (in this case, sport), the more self-determined the motivation was found to be in a given situation. Furthermore, Blanchard and colleagues (2007, studies 1 and 2) found support for another postulate from the HMIEM that suggests that over time, situational motivation in the realm of sport (basketball) has recursive effects on contextual motivation. The more that situational motivation is self-determined, the more that contextual motivation becomes self-determined over time. Finally, Jaakkola and coworkers (2008) demonstrated that, as predicted by the HMIEM, situational self-determined motivation was better than contextual motivation in predicting the situational intensity (as assessed by heart rate) displayed by students in a physical education class. Overall, these findings provide strong support for the reliability and factorial and construct validity of the SIMS.
The SIMS has several positive features, one of them being that it is the only scale to assess intrinsic and extrinsic motivation and amotivation at the situational level. Furthermore, it does so using only 16 items. Nevertheless, it also has some weaknesses. First, the SIMS does not assess the different types of intrinsic motivation and integrated and introjected regulation, because it was designed to be short. Second, while the factor structure has been supported, it is not clear if some items should be replaced (Gillet, Berjot, et al., 2009; Jaakkola et al., 2008; Standage, Treasure, et al., 2003). Third, research so far has not assessed the validity of the scale with high-performance athletes. Thus, additional research is needed to further test the psychometric properties of the SIMS in sport.
Ethics codes imperative in conducting research
Ethics Codes: Their Nature, Purposes, and Application
Ethics codes typically comprise principles and standards. Ethical principles are broad-spectrum statements that summarize and reflect the values of the parent organization or governing body. These general, aspirational statements set the underlying tone for the more specific codes and guide the work-related ethical decision making of professionals. In contrast, ethical standards specify both proscribed and prescribed member behaviors. While not always black and white, these standards serve as a more clear-cut and enforceable guide for professional behavior.
Members should apply both the aspirational principles and the enforceable standards to shape their thinking and behavior in work settings. Ideally, members self-monitor their own behavior. In an effort to remain ethical, professionals are encouraged to consult with colleagues about ethically challenging situations and to provide constructive feedback when they perceive possibly unethical behavior in others.
Assessment and Measurement
A central question addressed in this chapter is this: What are assessment and measurement? Sundberg (1977) defines assessment as the processes used “for developing impressions and images, making decisions and checking hypotheses about another person's pattern of characteristics that determines his or her behavior in interaction with the environment” (p. 21). The assessment process involves collecting and assembling a broad range of objective and subjective information about persons or groups to develop impressions about them; identify their needs; predict how they might think, feel, and behave in future situations; and select and apply interventions based on the content and dependability of that information. Professionals may use multiple assessment methods that include observations of behavior, symptom checklists, surveys and questionnaires, structured and unstructured interview materials, and standardized tests (Bennett et al., 2006). Gardner and Moore (2006) emphasize using a triad of psychological assessment strategies in the practice of clinical sport psychology: (1) initial interviews, (2) behavioral observation, and (3) psychological testing. The nature and assumptions underlying assessment approaches are usually grounded in the theoretical orientation of the professional (Andersen, 2002).
In contrast, measurement can mean many things to many people. It is one of the most common words in the English language and can be used as both a noun and a verb (Lorge, 1967). For the purposes of this chapter, measurement is viewed as an extension of assessment processes. It can be thought of more narrowly as the process of collecting information about psychological characteristics of interest (e.g., attitudes, behaviors, state experiences) using one or more methods or tools (such as those mentioned earlier) to monitor change and the effects of interventions or treatments after the initial assessment. For example, an educational sport psychology consultant might administer a measure of team cohesion over the course of a competitive season to see how team members perceive their relationships. Another consultant might conduct a preseason baseline screening assessment of cognitive functioning in hockey players and then reevaluate players who incur a mild traumatic brain injury (i.e., concussion) later in the season.
In this chapter, the terms measurement and assessment are used interchangeably. Furthermore, these terms are used to describe the decisions and opinions made by professionals regarding clients with whom they work. As such, measurement and assessment techniques include all methods of gathering information about clients, such as (a) psychological, educational, and neurological tests; (b) data gathered during clinical interviewing; (c) information gathered from significant others (e.g., family members, teachers, friends); (d) direct and indirect observation; and (e) interactions with people via teletherapy (e.g., Internet, phone; Fisher, 2009).
Competence and Education
In order to excel in our professional duties and do well for those we serve, teach, study, and otherwise interact with, we must know what to do and how to do it in a capable manner. The ethics codes mentioned earlier identify the necessity of being knowledgeable and capable in our work. For example, the APA ethical standards provide guidance for organization members in this area, including information about (a) competence limitations, (b) maintaining competence, (c) making sound professional and scientific judgments, (d) delegating work responsibilities to others, (e) engaging in activities in emergencies, and (f) impairment (APA, 2002). Competence in professional behaviors is a personal matter that is frequently challenged; professionals are responsible for knowing their limitations and for recognizing that their knowledge and skills require constant upgrading. The APA ethics code also emphasizes the importance of making sound work-related decisions based on scientific knowledge and appropriate discipline-specific practice. This portion of the APA code cautions professionals to be careful when delegating work to others, describes how a professional is responsible for others' work, and explains the necessity of avoiding multiple relationships with those to whom work is delegated. The APA standards note that we can occasionally be thrown into situations in which our competence is stretched; in such cases we need to be very careful, seek supervision if available, and end such work as soon as possible.
Measurement Referral Questions and Appropriateness of Instruments
When selecting assessment instruments, the professional must consider the referral questions that prompted this process (Fisher, 2009; Smith, 1976). The instruments selected should reflect these referral questions and utilize assessment strategies that have appropriate validity and reliability. For example, if a professional is interested in measuring state anxiety for research purposes, an appropriate assessment may be the Competitive State Anxiety Inventory-2 (CSAI-2; Martens, Burton, Vealey, Bump, & Smith, 1990) as opposed to the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970), which measures both trait and state anxiety. When selecting the assessment, the professional should be aware of limitations or biases regarding cultural sensitivity (see the later section on cultural issues); gender considerations (Etzel, Yura, & Perna, 1998); and age, language, or disability factors that may influence the psychometric qualities of the assessment differently from the way they influenced the normative groups used for the development and validation of the instrument (APA, 2002; Fisher, 2009). It is also important to consider the method of delivery. For example, assessments based on paper and pencil may not have been validated for online use (see the later section on technology), and instruments with elevated reading levels may not be appropriate for certain age or developmental groups. Therefore, the professional should always verify the assessment's validity and reliability when a modified assessment method or group is used (Fisher, 2009). Furthermore, the professional should also attempt to conduct in-person assessments when possible, as a great deal of information can be learned about clients from the way in which they present themselves during the assessment process. This information can affect the richness of the assessment data.
It is also important for professionals to be aware of and competent in the psychometric strategies used to establish the validity and reliability of the instruments they use (AERA, APA, & NCME, 1999). All instruments have unique psychometric properties that affect how they should be administered and interpreted. When validity and reliability issues are not taken into consideration, it is possible to choose and utilize instruments to assess factors that they were not designed to assess. Furthermore, practitioners should be well aware of other psychometric properties, such as content and criterion validity and the standard error of measurement, that may affect how results are interpreted and used. The ethical practitioner needs to be aware of psychometric issues in order to choose appropriate instruments with regard to the referral questions, client characteristics, assessment strategies, and environmental factors.
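The standard error of measurement mentioned above is what ties a reliability coefficient to the interpretation of an individual score. A minimal sketch, using hypothetical numbers rather than values from any instrument discussed in this chapter:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability): the expected spread of observed
    scores around a test taker's true score."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical scale: SD = 10, reliability (e.g., coefficient alpha) = .90
sem = standard_error_of_measurement(10.0, 0.90)

# Approximate 95% band around an observed score of 50
lower, upper = 50 - 1.96 * sem, 50 + 1.96 * sem
```

With these numbers the SEM is about 3.2, so an observed score of 50 corresponds to a plausible true-score range of roughly 44 to 56, which illustrates why small score differences on modestly reliable instruments should not be overinterpreted.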
Consent and Assent
As discussed earlier, the ethical principles for sport and exercise psychology emphasize doing no harm to the client and respecting the individual's rights and dignity (AASP, 1996; APA, 2002). The test taker's right to privacy and confidentiality applies here as well, and the professional should take all necessary precautions to maintain the confidentiality and privacy of the client. To protect the test taker, informed consent must be obtained at the start of the relationship (e.g., research, consultation, therapy). Beyond the informed consent process and before formal assessment, the client or participant should be informed of all pertinent information regarding the assessment process. This information includes (a) the nature and purpose of assessment; (b) any applicable fees; (c) potential involvement of third parties such as a coach, athletic trainer, or manager; (d) limits of privacy and confidentiality (as discussed in the next section); and (e) the timeline for the process and potential feedback (Fisher, 2009). This information should be presented in a clear and understandable manner, and the test taker should agree to it, thereby giving informed consent. Test takers should engage in assessment of their own free will and must be given the option to withdraw participation without consequences (APA standard 3.10). All necessary information about assessment procedures and findings should be provided in a language and at a level appropriate for the participant. Furthermore, it is unethical to require or coerce individuals to take part in measurement and assessment for research or practice purposes.
Privacy and Confidentiality and Release of Information
Typically, the ethical standards of organizations with ties to sport psychology (APA ethical standard 4.01 and the AASP) indicate that professionals should not reveal information about clients, test takers, or others without either a signed release of information or a legal requirement to disclose. These legal situations may include (a) a test taker who indicates possible self-harm or harm to others (i.e., suicide or homicide), (b) a test taker whose results are subpoenaed by the court, or (c) a test taker who is a minor, in which case the parent or guardian may have access to the data (Etzel et al., 1998). If the test taker or, in the case of a minor, the parent or guardian provides explicit written permission, the specific information identified by the client may be released to the identified parties. Unless these circumstances are met, information from the test taker may not be disclosed to anyone (e.g., coaches, management, parents, administration, athletic trainers, and so on).
In situations where the assessment is requested by a third party (e.g., coaches, management, the court), this third party may also request results from the assessment. It is important for the professional to establish a priori who is the “real client” (Ogilvie, 1979) and to have the ability to control access to the results. Etzel and colleagues (1998) suggest that information about the assessment should be shared only with one predetermined person, unless a release of information form has been completed. Therefore, when engaging in assessments, the professional should set clear boundaries and avoid dual relationships, thereby identifying who is being served (APA standard 4.02a). Another complication of these situations is the role of trust. If athletes or test takers suspect the test results will be used without their permission in decisions regarding performance or other aspects of participation, they may be less likely to respond honestly, thus affecting the validity of the results (see the section on demand characteristics).
Raw Data and Data Storage
Raw data such as the test taker's responses to items, including the professional's notes and final reports, should be stored in locked file cabinets inside the professional's office or in password-protected computer files (Fisher, 2009). Other methods to ensure confidentiality may include limiting access to records to only those people who have a need to know this information and have been trained to handle and understand it, deidentifying records using code numbers, and appropriately disposing of identifiable records (Fisher, 2009). A good policy for data maintenance is that data should be kept for a minimum of 7 y after the last service delivery date or 3 y after a minor reaches the age of 18 (whichever is later), as is recommended by the APA record-keeping guidelines (APA, 2002; Fisher, 2009). Raw data and the instruments used for assessment purposes should not be released to third parties unless a release of information form has been completed and the third party is trained competently to use such information.
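The retention rule described above (a minimum of 7 years after the last service delivery date, or 3 years after a minor turns 18, whichever is later) can be sketched as a small helper. This is illustrative only; the function names and the simplified date arithmetic are my own, and actual retention obligations depend on the applicable guidelines and local law:

```python
from datetime import date
from typing import Optional

def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, clamping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def earliest_disposal_date(last_service: date,
                           birth_date: Optional[date] = None) -> date:
    """Earliest date on which records may be disposed of under the
    7-year / 3-years-past-majority rule described in the text."""
    seven_after_service = add_years(last_service, 7)
    if birth_date is None:  # adult client: only the 7-year rule applies
        return seven_after_service
    three_past_majority = add_years(birth_date, 18 + 3)
    return max(seven_after_service, three_past_majority)
```

For example, for a client born January 1, 2000, and last seen June 1, 2010, the minor-related term (January 1, 2021) falls later than the 7-year term (June 1, 2017), so the records must be kept until 2021.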
Results Discussion
Test feedback and results discussion should be provided in the form of a carefully constructed report using clear language that fully explains the assessment results. Labels and jargon should be eliminated to increase readability. Information necessary to the purpose of the test should be included, and the inclusion of unnecessary and unrelated information should be avoided (APA, 2002; Fisher, 2009). Additionally, as recommended by the APA (APA, 2002), interpretations should take into consideration the participant's gender, race, ethnicity, age, national origin, sexual orientation, religion, disability, language, or socioeconomic status. Participants should receive assessment information and feedback related to their performance on the assessment and should be informed of ways in which they could personally use the test results or how this information may be used by a third party (only if written permission was given to release such information). The information released to the participant should be presented in a verbal or written report, in such a way that it does not cause harm to the test taker (Etzel et al., 1998). However, information such as numerical scores or specific responses should not be released to individuals not qualified to interpret such information (Fisher, 2009; Tranel, 1995).
Demand Characteristics
In the sport context, several groups of individuals may be interested in the assessment results of athletes. Interested parties may include coaches, managers, teams, students, or administrators. However, the possibility that a third party will review the test results may increase socially desirable responding and thus yield invalid and unreliable information. Therefore, undue pressure to complete an instrument or battery should be considered as a contextual factor.
Another potentially undesirable effect of a third party viewing the test taker's results may be assessment anxiety. The APA standards state that if a test taker is observed to be anxious or reports feeling anxious, this feeling should be taken into account and become a limitation in the interpretation of test data (APA, 2002). Assessment anxiety may be exaggerated in situations where a third party may have access to results. These situations may also lead to faking good or faking bad on the part of respondents who are concerned about how the results may be used. This must also be considered when evaluating the results.
Supervision of Subordinates
In some cases, professionals may hire and train subordinates to help with assessment and measurement tasks. These subordinates may administer, score, and even interpret the results of measurement and assessment. Standard 2.05 of the APA ethics code (APA, 2002) states that professionals utilizing employees, supervisees, or research and teaching assistants for such purposes should take reasonable precautions to put subordinates in situations where (a) they do not face possibly harmful multiple relationships with the client that could affect their objectivity, (b) they are competently trained to perform the delegated task on their own or with supervision, or (c) they are supervised for competent service delivery. Therefore, when using subordinates to help with tasks such as administration, scoring, or interpretation, the professional assumes primary responsibility and liability to ensure that the services are being provided competently. The professional needs to ensure that subordinates are well trained with all potential instruments. To do so, the professional must provide appropriate training, experience, and supervision as well as continue to check the subordinates' work to ensure its quality. As with licensed professionals, not all subordinates have the same competencies with regard to all instruments.
Tools to Measure the Physical Self
Reflecting the general historical trends in self-concept research, self-concept instruments used in early sport and exercise research focused on global self-esteem (Marsh, 1997, 2002). Indeed, in a 1974 review, Wylie concluded that most self-concept instruments of the time focused on global self-concept or self-esteem rather than on specific domains such as PSC. However, following the research of Shavelson and colleagues (1976), a number of multidimensional self-concept instruments containing one or more PSC scales were developed. Although several of the instruments reviewed by Shavelson and colleagues (1976) contained items relating to physical skills and elements of physical appearance, none provided a clearly interpretable measure of PSC. From a practical perspective, these older instruments appear to be of little value for sport and exercise psychologists. The major exception, perhaps, is the Physical Estimation and Attraction Scale (PEAS; Sonstroem, 1978, 1997), along with the theoretical model on which it is based. This instrument was designed to measure two global components: estimation (competency) and attraction. Although the PEAS may not be the instrument of choice today, it has historical significance: research with it incorporated many of the features of the construct validity approach advocated in this chapter, it was heuristic, and it provided an important basis for subsequent research.
In a subsequent 1989 review, Wylie identified several multidimensional self-concept instruments measuring one or more components of PSC that can be differentiated from other specific domains of self-concept and general self-concept. Included in the list were the three SDQ instruments already discussed. Wylie also evaluated Harter's (1985) Self-Perception Profile for Children, which contains two PSC scales (athletic competence and physical appearance). Other multidimensional instruments containing physical scales that were not reviewed by Wylie include the Self-Rating Scale (Fleming & Courtney, 1984), which measures physical ability and physical appearance; the Song and Hattie Test (Hattie, 1992), which measures physical appearance; and the Multidimensional Self-Concept Scale (Bracken, 1996), which has a physical scale that includes physical competence, physical appearance, physical fitness, and health. The Tennessee Self-Concept Scale (Fitts, 1965) is a multidimensional self-concept instrument that also purports to measure PSC. In their review and empirical evaluation of this instrument, Marsh and Richards (1988) found distinguishable physical components reflecting health, neat appearance, physical attractiveness, and physical fitness that were incorporated into a single PSC score. This detailed breakdown of the Tennessee physical scale was supported by relationships with the SDQ physical ability and physical appearance scales in an MTMM study comparing responses to the two instruments. Because each of the clusters based on responses to the Tennessee instrument is represented by only a few items, it is not appropriate to use the instrument to measure these distinct components of PSC. Marsh and Richards argued that PSC measures that combine and confound a wide range of differentiable physical components—such as those based on the Tennessee Self-Concept Scale—should be interpreted cautiously (see similar comments by Fox & Corbin, 1989).
In summary, although multidimensional self-concept instruments based on Shavelson and colleagues' (1976) model provided good support for the construct validity of the physical ability and appearance scales (e.g., Marsh, 2002; Marsh & Peart, 1988), they left unanswered the question of whether PSC is more differentiated than can be explained in terms of one (physical ability) or two (ability, appearance) physical scales. Subsequent PSC instruments were developed specifically to address the issue of the multidimensionality of PSC.
Physical Self-Perception Profile
The Physical Self-Perception Profile (PSPP; Fox, 1990; Fox & Corbin, 1989) is a 30-item inventory that consists of four specific scales and one general physical self-worth factor. The PSPP was developed to document the physical self-perceptions of college students. It was designed to reflect the advances made by Harter (1985) and Shavelson and colleagues (1976) in identifying the physical self as an important construct to measure in its own right and to reflect the hierarchical, multidimensional nature of the physical self. A qualitative approach was used to reveal dimensions of physical self-esteem salient to the population sampled (Fox & Corbin, 1989). The PSPP consists of five 6-item scales of sport (perceived sport competence), body (perceived bodily attractiveness), strength (perceived physical strength and muscular development), condition (perceived level of physical conditioning and exercise), and physical self-worth. Fox (1990) recommended that the 10-item Rosenberg Self-Esteem Scale (Rosenberg, 1965) be used alongside the PSPP to provide a global measure. Fox (1990) reported factor analyses indicating that each item loads most highly on the factor that it is designed to measure and that individual scale reliabilities are in the .80s.
The PSPP research demonstrates (a) good reliability (coefficient alpha of .80-.95; Fox, 1990; Page, Ashford, Fox, & Biddle, 1993; Sonstroem, Speliotis, & Fava, 1992); (b) good test-retest stability over the short term (rs of .74-.89; Fox, 1990); (c) a well-defined, replicable factor structure as shown by CFA (Fox & Corbin, 1989; Sonstroem, Harlow, & Josephs, 1994); (d) convergent and discriminant validity in studies showing PSPP relationships with external criteria such as exercise behaviors, mental adjustment variables, and health complaints (Fox & Corbin, 1989; Sonstroem & Potts, 1996); and (e) applicability for an older adult population (Sonstroem et al., 1994). However, correlations among the PSPP scales are consistently so high (.65-.89 when disattenuated for measurement error; Marsh, Richards, Johnson, Roche, & Tremayne, 1994) that they detract from the instrument's ability to differentiate among the different PSC factors it purports to measure.
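The disattenuated correlations cited above come from the standard correction for attenuation, which estimates what two scales would correlate if both were measured without error. A minimal sketch with hypothetical numbers (not the PSPP values):

```python
import math

def disattenuated_correlation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Correction for attenuation: r_true = r_xy / sqrt(rel_x * rel_y),
    where rel_x and rel_y are the reliabilities of the two scales."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical: observed r = .60 between scales with alphas of .85 and .80
r_true = disattenuated_correlation(0.60, 0.85, 0.80)
```

An observed correlation of .60 between two modestly reliable scales thus corresponds to a latent correlation near .73, which is why disattenuated values as high as those reported for the PSPP raise concerns about discriminant validity.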
Subsequently, a version of the PSPP for children and adolescents was developed and validated—the Children and Youth Physical Self-Perception Profile (CY-PSPP; Eklund, Whitehead, & Welk, 1997; Whitehead, 1995). Like the PSPP, the CY-PSPP is a 30-item inventory consisting of the same five 6-item scales. The CY-PSPP is a substantially revised version of the PSPP that is most appropriately thought of as a different instrument. The CY-PSPP body, strength, and conditioning subscales are based on minor adaptations of the PSPP to make them more suitable for children. However, the global self-worth (self-esteem) and sport scales are completely different. The PSPP did not have a self-esteem scale of its own but included 6 items adapted from the Rosenberg Self-Esteem Scale. On the CY-PSPP, global self-esteem and sport scales from the PSPP were dropped and replaced with corresponding scales from Harter's (1985) Self-Perception Profile for Children. Correlations among factors remained high (e.g., physical self-worth with attractive body adequacy = .8). Eklund and colleagues (1997) suggested that these results are consistent with the developmental patterns among children, as differentiation in self-concept is less defined at younger ages (Harter, 1985). CFAs have supported the instrument's factor structure, with both the CFI (comparative fit index) and NNFI (non-normed fit index) indexes exceeding the .90 criterion for good model fit (Eklund et al., 1997). Moderate correlations (r = .39-.45) with external criteria such as physical activity and physical fitness have demonstrated its convergent and discriminant validity (Welk & Eklund, 2005). 
The CY-PSPP has been validated with adolescents (Jones, Polman, & Peters, 2009; Welk, Corbin, & Lewis, 1995; Whitehead, 1995) and younger children (Welk, Corbin, Dowell, & Harris, 1997) and has been translated into and validated in other languages (Aşçı, Eklund, Whitehead, Kirazci, & Koca, 2005; Raustorp, Ståhle, Gudasic, Kinnunen, & Mattsson, 2005; Raustorp, Mattsson, Svensson, & Ståhle, 2006).
Both the PSPP and CY-PSPP use a nonstandard response format based on Harter (1985), in which each item consists of a matched pair of contrasting statements, one negative and one positive (e.g., “Some people feel that they are not very good when it comes to sports” but “Others feel that they are really good at just about every sport”). Respondents are asked which description is most like them and whether the description they select is “Sort of true of me” or “Really true of me.” Responses are scored on a scale of 1 to 4, with 1 representing a “Really true of me” response to the negative statement and 4 representing a “Really true of me” response to the positive statement. Although this response format is designed to reduce the influence of social desirability, Wylie's (1989) review of Harter's original instruments provided little or no support for this suggestion, and Marsh and colleagues (1994) suggested that there were substantial method effects associated with the nonstandard response scale. This format has also been shown to be confusing, particularly for children (Eiser, Eiser, & Haversmans, 1995), and even for adults (Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994), unless special care is taken to explain the response scale. Following the suggestion of Marsh and colleagues (1994) that confusion over the structured alternative response scale could be overcome by more detailed instructions at the outset, researchers implementing the CY-PSPP used large illustrations for a sample item (Whitehead, 1995). Wichstrom (1995) found that responses for this format were psychometrically stronger when based on typical Likert responses rather than the structured alternative format, but Welk and colleagues (1997) suggested that the nonstandard response scale on the CY-PSPP worked better than Likert responses did.
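The 1-to-4 scoring just described can be made concrete with a small helper. The function name and boolean encoding are my own; the mapping itself follows the scheme in the text:

```python
def score_structured_alternative(chose_positive: bool, really_true: bool) -> int:
    """Score one structured-alternative item on the 1-4 scale:
    1 = 'Really true of me' for the negative statement,
    2 = 'Sort of true of me' for the negative statement,
    3 = 'Sort of true of me' for the positive statement,
    4 = 'Really true of me' for the positive statement."""
    if chose_positive:
        return 4 if really_true else 3
    return 1 if really_true else 2
```

Each item thus requires two decisions from the respondent (which statement is most like them, then how strongly), which is precisely where the confusion noted by Eiser and colleagues (1995) tends to arise.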
In summary, the PSPP and the CY-PSPP are established instruments that have been translated into several languages and have been used with a range of populations. However, the response format and the high correlations among factors in both instruments may limit their usefulness in some settings. The CY-PSPP is a substantially revised version of the PSPP specifically developed for children. Although the CY-PSPP should be used instead of the PSPP for child and adolescent samples, it may even be stronger than the original PSPP for adult samples.
Subsequent to the completion of this chapter, Lindwall and colleagues (2011) published a revised version of the PSPP (PSPP-R). They reviewed critiques of the PSPP response scale such as those noted here (e.g., Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994) and acknowledged that “the idiosyncratic alternative response format has been difficult to understand for some participants” (pp. 310-311). In recognition of these problems, the idiosyncratic response scale that has been such a salient feature of the PSPP was dropped altogether and replaced with a 4-point Likert response using only positively worded items. Lindwall and colleagues (2011) demonstrated the appropriateness of the revised PSPP-R based on a large sample (N = 1,831) of participants from four countries (Sweden, Great Britain, Portugal, and Turkey). However, they did not indicate whether the PSPP-R supersedes the PSPP or is merely an alternative to it, nor did they discuss the implications for other instruments using similar idiosyncratic response scales (e.g., PSPP-related instruments such as the CY-PSPP or Harter's instruments more generally).
Physical Self-Inventory
The Physical Self-Inventory (PSI) is a French adaptation of the PSPP that was originally developed for use with Francophone adults (Ninot, Delignières, & Fortes, 2000). In two preliminary studies, Ninot and colleagues used the nonstandard response scale from the PSPP. However, consistent with previous research (Marsh et al., 1994), they reported that this response scale was problematic. In a third study, the authors used a 6-point Likert response scale; factor analysis results were reasonable, but reliability coefficients were not completely satisfactory. Next the authors replaced the PSPP global physical items with items from the SDQ physical scale and the PSPP global self-esteem items with items from Coopersmith (1967). The final PSI consists of 25 items measuring six PSC factors (four specific and two global, as with the PSPP) and has satisfactory psychometric properties that have been confirmed in subsequent French studies of adults (Masse, Jung, & Pfister, 2001; Stephan, Bilard, Ninot, & Delignières, 2003; Stephan & Maïano, 2007).
Maïano and coworkers (2008) subsequently constructed a short form of the PSI for use with adolescents. They found that not all items from the adult PSI worked with adolescents, but they were able to construct 18-item (PSI-SF, 3 items per scale) and 12-item (PSI-VSF, 2 items per scale) versions that had good psychometric properties. In particular, the measurement and hierarchical structures were consistent with proposals by Fox and Corbin (1989) and were fully invariant across gender. Maïano and coworkers also noted that PSI-SF responses showed very high test-retest stability. Comparison of the PSI-SF and PSI-VSF demonstrated that the measurement model, mean structure, structural parameters, and criterion-related validity were equivalent across samples and versions. Nevertheless, the authors noted a serious limitation that all versions of the PSI share with the PSPP: very high correlations among the six PSC latent factors, which, according to the authors, bring “into question the real independence of some of the models' sub-dimensions, and by extension their discriminant validity, a finding that has already been observed by Marsh (2002; Marsh et al., 2006) on analyses of the PSPP” (Maïano et al., 2008, p. 844). However, Maïano and colleagues also noted that because they used a traditional Likert response scale, the high correlations apparently were not due to the structured alternative format used in the PSPP. In summary, the PSI, particularly its short and very short forms, has made a potentially important contribution to applied research. However, further research is needed to evaluate more fully the robustness of support for construct validity and application in non-French-speaking settings.
Richards Physical Self-Concept Scale
The Richards Physical Self-Concept Scale (RPSCS; Marsh et al., 1994; Richards, 1988) is a 35-item instrument designed to measure six specific components of PSC (body build, appearance, health, physical competence, strength, action) and one general physical satisfaction factor. Each item is a simple declarative statement, and subjects respond on an 8-point true-false scale. Extensive research in Australia (e.g., Marsh et al., 1994; Richards, 1988) has indicated that RPSCS responses have good psychometric properties. The factor structure is very robust, generalizing well over ages from 8 to 80 y and over gender.
RPSCS research has demonstrated (a) good reliability (coefficient alpha of .79-.93; Marsh et al., 1994; Richards & Marsh, 2005); (b) good test-retest stability over the short term (rs of .77-.90 over 3 wk; Richards, 1988); (c) a well-defined, replicable factor structure as shown by CFA (Marsh et al., 1994; Richards, 2004); (d) a factor structure that is invariant across gender, as shown by multiple-group CFA (Richards, 2004), and across a wide age range; (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (Marsh et al., 1994; Richards & Marsh, 2005); and (f) applicability for participants aged 8 to 60 y and for both genders (Marsh et al., 1994; Richards, 1988, 2004; Richards & Marsh, 2005). In summary, the RPSCS is regarded as a valid, reliable, and structurally sound instrument that has been tested across both genders and a wide population of ages. The applicability across such a wide range of ages is a particular strength.
Physical Self-Description Questionnaire
Extending Fleishman's (1964) classic research on the structure of physical fitness, the Physical Self-Description Questionnaire (PSDQ) scales reflect some of the original SDQ scales and parallel physical fitness components identified in a CFA of physical fitness measures (Marsh, 1993). The PSDQ consists of nine specific components of PSC (strength, body fat, activity, endurance and fitness, sport competence, coordination, health, appearance, and flexibility), a global physical scale, and a global self-esteem scale. Each of the 70 PSDQ items is a simple declarative statement, and individuals respond on a 6-point true-false scale. The PSDQ is designed for adolescents but is also appropriate for older participants.
PSDQ research has demonstrated (a) good reliability (median coefficient alpha of .92) across the 11 scales (Marsh, 1996b; Marsh et al., 1994); (b) good test-retest stability over the short term (median r = .83 over 3 mo) and longer term (median r = .69 over 14 mo; Marsh, 1996b); (c) a well-defined, replicable factor structure as shown by CFA (Marsh, 1996b; Marsh et al., 1994); (d) a factor structure that is invariant over gender as shown by multiple-group CFA (Marsh et al., 1994); (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (see Marsh et al., 1994); (f) convergent and discriminant validity as shown by PSDQ relationships with external criteria (e.g., measures of body composition, physical activity, endurance, strength, and flexibility; see Marsh, 1996a, 1997); and (g) applicability for participants aged 12 to 18 y (or older) and for elite athletes and nonathletes (Marsh, Hey, Roche, & Perry, 1997; Marsh, Perry, Horsely, & Roche, 1995). In summary, the PSDQ is a psychometrically strong instrument.
Marsh, Martin, and Jackson (2010) recently presented a new short form of the PSDQ (PSDQ-S). This short form balances brevity and psychometric quality in relation to established guidelines for evaluating short forms (e.g., Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Smith, McCarthy, & Anderson, 2000) with the construct validity approach that is the basis of PSDQ research. Based on the PSDQ normative archive, 40 of 70 items were selected and evaluated in a new cross-validation sample (N = 708 Australian adolescents). To test the generalizability of results, the authors considered four additional samples: Australian adolescent elite athletes (n = 349), Spanish adolescents (n = 986), Israeli university students (n = 395), and Australian senior citizens (n = 760). Reliabilities for the 40 PSDQ-S items were consistently high in the cross-validation sample (.81-.94; median = .89) and senior sample (.81-.94; median = .91), and reliabilities in the cross-validation sample were higher than those in comparable groups completing the 70-item PSDQ. The PSDQ-S factor structure in the cross-validation sample was well defined and highly similar to that based on the archive sample as well as to those based on the other four groups. Study 1, using a missing-by-design variation of multigroup invariance tests, showed that the factor structure was invariant across the 40-item PSDQ-S and the 70-item PSDQ. Study 2 demonstrated factorial invariance of responses over 1 y (test-retest correlations of .57-.90; median = .77) and good support for convergent and discriminant validity over time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to responses on the PSPP and PSC instruments. The four studies reported by Marsh and coworkers demonstrated new, evolving strategies for the construction and evaluation of short forms that support the PSDQ-S.
The authors concluded that the strong support for the psychometric properties and construct validity of the widely used PSDQ instrument generalizes very well to the PSDQ-S.
Elite Athlete Self-Description Questionnaire
The PSC instruments discussed thus far may be suitable for elite athletes (e.g., Marsh et al., 1995). There may, however, be other components to PSC that are particularly relevant for elite athletes, and thus the Elite Athlete Self-Description Questionnaire (EASDQ; Marsh, Hey, Roche, et al., 1997; Marsh, Hey, Johnson, & Perry, 1997) was developed to address these other components. For the EASDQ, it was hypothesized that overall performance by elite athletes is a function of skill level, body suitability, aerobic fitness, anaerobic fitness, and mental competence. Thus Marsh and colleagues developed the EASDQ to measure these five components plus overall performance, six factors in all. For each scale, they developed a pool of items that sport psychologists at the Australian Institute of Sport evaluated for their suitability for elite athletes. Pilot studies were conducted to select the best items to represent each factor. A compromise between brevity and psychometric soundness was achieved, with acceptable levels of reliability (e.g., all scales having reliability estimates of at least .8) based on short scales (4-6 items per scale).
EASDQ research demonstrates (a) adequate reliability (median coefficient alpha of .85) across the six scales (Marsh, Hey, Johnson, et al., 1997); (b) a well-defined, replicable factor structure as shown by CFA (Marsh, Hey, Johnson, et al., 1997; Marsh, Hey, Roche, et al., 1997); (c) applicability for elite athletes aged 12 y or older (Marsh, Hey, Roche, et al., 1997); and (d) predictive validity as shown by its ability to predict swimming performances in world championships after controlling for previous personal best performances (Marsh & Perry, 2005). In summary, the EASDQ is a reliable and valid instrument for elite athletes of all ages. More research is needed, however, to relate EASDQ responses to external validity criteria such as those used in PSDQ research and to criteria that are more specific to elite athletes (e.g., actual performance in competition).
Learn more about Measurement in Sport and Exercise Psychology.
Evaluation of Measures of Intrinsic and Extrinsic Motivation in Sport and Exercise
In this section, a critical review of the different measures used to assess intrinsic and extrinsic motivation in sport and exercise research is conducted. Certain criteria have guided the selection of the measures presented in this section. First, we have selected measures that are fully developed instruments that have gone through extensive validation steps. Second, we have chosen scales that have been used in research, published or unpublished, during the past 10 years. Scales that have not been used during that time frame are considered to be obsolete and are not reviewed. Finally, in light of recent theoretical developments and because of space limitations, we have focused on motivation scales that assess intrinsic and extrinsic motivation independently of determinants and outcomes, while focusing on the perceived reasons for behavior. Our earlier discussion on the definitions of intrinsic and extrinsic motivation makes it possible to classify the different measures. The measures can vary in terms of the level of generality (situational versus contextual level) and the area (sport versus exercise). This classification appears in table 25.1. Table 25.2 (see p. 291) provides additional information on the concept, dimensions, and publication source of each scale, as well as where the scale can be obtained. As can be seen, seven measures are reviewed. For each one, we present (a) a description of the instrument, (b) the conceptual and theoretical rationale underlying its scale development, (c) the available evidence concerning its psychometric properties (e.g., factorial validity, reliability, and construct validity), and (d) a broad assessment of the strengths and weaknesses associated with each measure.
Measures Used in Sport
In this section, we review the SMS (Brière et al., 1995; Pelletier et al., 1995), the Sport Motivation Scale-6 (SMS-6; Mallett, Kawabata, Newcombe, Otero-Rorero, & Jackson, 2007), the Behavioral Regulation in Sport Questionnaire (BRSQ; Lonsdale, Hodge, & Rose, 2008), the Pictorial Motivation Scale (PMS; Reid, Vallerand, Poulin, Crocker, & Farrell, 2009), and the SIMS (Guay et al., 2000).
Sport Motivation Scale
The SMS was developed (Brière et al., 1995; Pelletier et al., 1995) in order to assess contextual intrinsic and extrinsic motivation from a multidimensional perspective, as well as amotivation. The SMS has been the most often used motivation measure in sport, being employed with a variety of athletes (recreational to elite), age groups (adolescent to senior), and cultures (e.g., Canada, United States, United Kingdom, Bulgaria, Australia, Spain, and New Zealand). In fact, the SMS has been translated and validated in several languages (see Pelletier & Sarrazin, 2007). The SMS is based on SDT (Deci & Ryan, 1985) and is made up of seven subscales assessing amotivation; external, introjected, and identified regulation; and intrinsic motivation to know, to experience stimulation, and to accomplish. In line with SDT, motivation is assessed as the perceived reasons for participation, or the why of behavior. At the beginning of the scale, participants are asked, “In general, why do you practice your sport?” The items represent the perceived reasons for engaging in the activity, thus reflecting the different types of motivation.
The original scale was developed in French as L'Échelle de Motivation dans les Sports (Brière, Vallerand, Blais, & Pelletier, 1995) and was validated in three steps. The first step involved generating a pool of items explaining various reasons for sport participation through interviews with French Canadian athletes (aged 17-20 y). These reasons were then used to formulate items for the seven subscales of the French SMS. In the second step, a committee of experts evaluated the content validity of the items and eliminated those that were thought to be inadequate. Another sample of athletes from various sports completed the scale. Results from an exploratory factor analysis (EFA) provided support for a seven-factor structure with 4 items per subscale; this second step thus resulted in a 28-item scale. In the third and final step, two additional studies were conducted to further validate the scale. These studies included approximately 500 individuals, most of whom were involved in recreational sports. Results from confirmatory factor analyses (CFA) and correlational analyses confirmed the seven-factor structure, the subscale internal consistency (ranging from .65-.96), and moderate to high indexes of temporal stability (ranging from .54-.82) over 1 month. Furthermore, inspection of correlations among the seven SMS subscales provided support for the simplex pattern proposed by SDT. Results of correlations also showed that (in line with SDT) the most self-determined forms of motivation (intrinsic motivation and identified regulation) were related more strongly to determinants such as autonomy support from coaches and feelings of competence than to other forms of motivation (external and introjected regulation) and amotivation. Similar results were obtained with motivational outcomes such as positive affect, concentration, and intentions to pursue engagement in sport. In sum, adequate construct validity was obtained for the French form of the SMS.
The translation of the French SMS into English involved back-translation and committee procedures as suggested by Vallerand (1989). Pelletier and colleagues (1995) conducted two studies involving college athletes from various sports in order to assess the psychometric properties of the English form of the SMS. Results from CFA with a sample of 593 Canadian university athletes revealed adequate fit indices for the hypothesized seven-factor model (the Adjusted Goodness of Fit Index and the Normed Fit Index were both > .90, and the Root Mean Square Residual was < .08), and correlations with determinants and outcomes supported the simplex model. Moreover, internal consistency above .70 was obtained on all of the subscales except the identified subscale (.63). Test-retest correlations were acceptable and very similar to those obtained with the French SMS, as was the scale construct validity.
Since 1995, the SMS has been used extensively in sport psychology research. The seven-factor structure has been supported repeatedly (e.g., Doganis, 2000; Gillet, Vallerand, & Rosnet, 2009; Li & Harmer, 1996; Shaw, Ostrow, & Beckstead, 2005; Standage, Duda, & Ntoumanis, 2003). In addition, Hu and Bentler (1999) obtained support for a five-factor model by combining the three types of intrinsic motivation into one factor. Similar results were obtained by Gillet and colleagues (2009) with the French SMS. However, some studies have not supported the seven-factor model (Hodge, Allen, & Smellie, 2008; Mallett, Kawabata, & Newcombe, 2007; Mallett, Kawabata, Newcombe, & Otero-Rorero, 2007; Martens & Webber, 2002). Why is there such a discrepancy between these two sets of studies? One possibility lies in the populations from which the different samples were taken. Specifically, the SMS was validated using adolescent and young adult athletes and not older athletes. Because of this specific focus, some of the items may reflect a participation rather than an elite orientation, which is more in line with the younger population. For instance, an identified regulation item reads, “Because sport is one of the best ways to maintain good relationships with my friends.” Such an item seems more relevant for a younger population. An older, high-level athlete may disagree with this item but still display a high level of identified regulation for a sport (but not for relationship reasons). Future research using the SMS with different age groups and proficiency levels is needed to clarify this issue.
Although the internal consistency of the SMS has generally shown adequate values, some values below .70 have been found. This is especially the case for the identified regulation subscale (Brière et al., 1995; Kingston, Horrocks, & Hanton, 2006; Li & Harmer, 1996; Pelletier et al., 1995), although some lower values (below .70) have also been obtained with the introjected (McNeill & Wang, 2005; Perreault & Vallerand, 2007; Riemer, Fink, & Fitzgerald, 2002; Standage, Duda, & Ntoumanis, 2003), external regulation (Standage, Duda, & Ntoumanis, 2003), and amotivation subscales (Standage, Duda, & Ntoumanis, 2003). However, very few instances of values below .60 have been obtained. It should be noted that a Cronbach alpha of .60 with only 4 items is acceptable because, as noted by Cronbach (1951), the coefficient alpha underestimates the internal consistency of scales with a low number of items. This is because the number of items enters directly into the alpha formula. For instance, given the same average interitem correlation, a 3-item scale with a coefficient alpha of .56 corresponds to an alpha of approximately .77 on an 8-item scale!
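The dependence of alpha on scale length can be made concrete with the standardized form of coefficient alpha, alpha = k·r / (1 + (k − 1)·r), where k is the number of items and r is the average interitem correlation. The sketch below uses an illustrative r of .30; this is an assumed value for demonstration, not a statistic from any study cited here:

```python
def standardized_alpha(k, r_bar):
    """Standardized Cronbach's alpha for k items with average
    interitem correlation r_bar."""
    return k * r_bar / (1 + (k - 1) * r_bar)

# Holding the average interitem correlation fixed (r_bar = 0.30, an
# assumed illustrative value), alpha rises sharply as items are added.
for k in (3, 4, 8):
    print(k, round(standardized_alpha(k, 0.30), 2))  # 3 0.56, 4 0.63, 8 0.77
```

This is why a modest alpha on a 3- or 4-item subscale need not indicate weak interitem relationships.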
In line with the original work of Ryan and Connell (1989) and the initial SMS validation procedures (Brière et al., 1995; Pelletier et al., 1995), construct validity has been assessed by other authors in two ways: (1) through the simplex pattern of correlations among the subscales and (2) through correlations between motivational factors and their determinants and consequences. We do not have space to review all studies. However, overall, there is overwhelming support for the construct validity of the SMS in both French and English. For instance, in addition to finding support for the simplex pattern, Pelletier and Sarrazin (2007) concluded in their review of the evidence that the SMS has been used successfully to predict a great variety of specific outcomes and consequences (such as burnout, exercise dependence among endurance athletes, fear of failing, adaptive coping skills, perceptions of constraints, flow, vitality and well-being, sporting behavior orientations, aggression, and performance) in a manner that is consistent with SDT. These findings provide strong support for the construct validity of the SMS.
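The first of these checks, the simplex pattern, amounts to a simple computation: order the subscale scores along the self-determination continuum and verify that subscales adjacent on the continuum correlate more positively than distant ones. The sketch below uses fabricated toy scores (and only a subset of subscales), not data from any study cited here:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy subscale scores for six participants, ordered along the
# self-determination continuum (subset of subscales for brevity).
scores = {
    "amotivation": [5, 4, 5, 2, 1, 2],
    "external":    [4, 4, 5, 3, 2, 2],
    "identified":  [2, 2, 1, 4, 5, 4],
    "intrinsic":   [1, 2, 1, 5, 5, 4],
}

# Simplex pattern: adjacent constructs (identified/intrinsic) should
# correlate more positively than distant ones (amotivation/intrinsic).
r_adjacent = pearson(scores["identified"], scores["intrinsic"])
r_distant = pearson(scores["amotivation"], scores["intrinsic"])
print(round(r_adjacent, 2), round(r_distant, 2))  # 0.95 -0.98
```

In real validation work these correlations come from full samples and all subscales, but the ordering logic is the same.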
In sum, the SMS has some positive features. First, it is a multidimensional instrument that assesses different types of intrinsic and extrinsic motivation as well as amotivation. Second, the scale focuses on the why of behavior, and thus items are not confounded with determinants and consequences. Finally, it has some excellent psychometric properties. Nevertheless, some limitations should be underscored. First, although internal consistency levels have been acceptable overall, some subscales, especially the identified regulation subscale, have at times yielded relatively low coefficient alphas. Second, the SMS does not assess integrated regulation. Third, the seven-factor structure has not always been supported by CFAs. According to Pelletier, Vallerand, and Sarrazin (2007), this may be explained by a host of factors, including differences in sample sizes, variations in the way the instrument is administered, or other characteristics specific to the context of the study. However, as already indicated, it is also possible that the SMS is better suited to a younger, nonelite athlete population. Clearly, future research on this issue is in order.
Sport Motivation Scale-6
Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed another version of the SMS, the SMS-6. This scale has the same underlying rationale as the original SMS but was designed to improve on the original version by including an integrated regulation subscale and by attempting to resolve some of the inconsistencies in the factor structure and some of the relatively low internal consistency values (below .70). The SMS-6 comprises 24 items, 4 for each of the six subscales, which include amotivation; external, introjected, identified, and integrated regulation; and general intrinsic motivation. Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed 5 items for the integrated regulation subscale as well as 7 other items (4 of which were kept in the final scale) to replace some items in the original SMS. Two samples were used to validate the SMS-6. Sample 1 was composed of 501 first-year university students participating in competitive sport at least twice per week and 113 elite athletes representing Australia at the international level (for a total of 614 participants). Sample 1 was used to derive a factor structure that included the SMS items as well as the reformulated and integrated regulation items. Sample 2 was composed of 557 university students who were engaged in a variety of sports or physical activities twice per week. The second sample was used to confirm the structure of the SMS-6. Participants also completed the Dispositional Flow Scale (DFS).
Results of a CFA with the SMS-6 (with sample 2) provided support for the factor structure as well as for the internal consistency values (all above .70). Concerning the construct validity of the SMS-6, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) reported a rather weak simplex pattern of correlations among the subscales. More specifically, external regulation correlated highly with intrinsic motivation (r = .54), while the correlation between identified regulation and intrinsic motivation was very high (r = .91) and was higher than the one between integrated regulation and intrinsic motivation (r = .75). The construct validity of the SMS-6 was not fully supported, as some of the correlations involving the SMS and flow were not as expected by SDT. For instance, the distinctions among integrated regulation, identified regulation, and intrinsic motivation were not always clear. Furthermore, external regulation revealed some positive and sometimes strong correlations with flow, contrary to hypotheses derived from SDT.
In sum, the SMS-6 contains some nice features. First, it contains an integrated regulation subscale. Furthermore, the addition of 4 new items may make the SMS more acceptable for older and more experienced athletes. Second, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) presented results supporting the validity of a variation of the SMS-6, the SMS-8. The SMS-8 contains the same items as the SMS-6 but assesses the three types of intrinsic motivation rather than general intrinsic motivation. The SMS-6 also shows some limitations. First, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) proposed 7 new items to replace those that were presumably problematic in the original SMS. However, only 4 of these items made it into the final version. Thus, it appears that the SMS-6 retained much of the original SMS. Second, even some of the new items appear problematic and may not assess the desired construct (see Pelletier et al., 2007). For instance, a new amotivation item (“I don't seem to be enjoying my sport as much as I previously did”) seems to reflect a decrease in intrinsic motivation rather than amotivation. Finally, results from Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) demonstrated that the integrated regulation subscale may lack discriminant validity, yielding correlations with flow highly similar to those of identified regulation and intrinsic motivation.
Behavioral Regulation in Sport Questionnaire
Lonsdale and colleagues (2008) developed the BRSQ to create an alternative measure of elite sport motivation as conceptualized by SDT. However, in contrast to Mallett, Kawabata, Newcombe, and Otero-Rorero (2007), these authors used a completely new pool of items developed by SDT experts and competitive athletes. There are two versions of the BRSQ. The BRSQ-8 contains 32 items assessing integrated, identified, introjected, and external regulation; amotivation; and the three forms of intrinsic motivation (knowledge, experience stimulation, and accomplishment) identified by Vallerand (1997). The BRSQ-6 contains the same items but assesses general intrinsic motivation rather than all three types of intrinsic motivation, for a total of 24 items.
Lonsdale and colleagues (2008) conducted a series of three studies to validate the scale. In the first study, the factorial validity and the internal consistency were assessed with 382 New Zealand elite athletes. Results from a CFA on the 32 items supported the factor structure of the BRSQ. Specifically, fit indexes were acceptable and all items loaded significantly on the appropriate factors (loadings ranged from .58-.91). Finally, internal consistency of the eight subscales, measured with the Cronbach alpha, showed high values ranging from .71 to .91. Additionally, 1 wk test-retest reliability was examined with 34 competitive adult athletes. Test-retest coefficients for all subscales supported the temporal reliability (values ranged from .73-.90).
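Cronbach alpha values such as those reported above are computed as alpha = (k/(k − 1))·(1 − sum of item variances / variance of total scores). A minimal, self-contained sketch follows; the item responses are fabricated toy data for illustration, not BRSQ data:

```python
def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of k lists, each holding one
    item's scores across the same n respondents."""
    k = len(items)
    n = len(items[0])

    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Total score per respondent across all k items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(variance(it) for it in items) / variance(totals))

# Four toy items answered by six respondents on a 1-7 scale.
items = [
    [4, 5, 3, 6, 2, 5],
    [4, 6, 3, 5, 2, 4],
    [3, 5, 4, 6, 1, 5],
    [5, 6, 3, 5, 2, 4],
]
print(round(cronbach_alpha(items), 2))  # 0.95
```

Because the toy items covary strongly, alpha is high; uncorrelated items would drive the ratio of item variance to total-score variance toward 1 and alpha toward 0.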
In a second study with 343 athletes from New Zealand, the results of a CFA on the BRSQ-8 supported once more the factor structure as well as the subscale internal consistency. Lonsdale and colleagues (2008) also showed that the factor structure of the BRSQ-6 model fit the data very well and that subscale coefficient alphas all exceeded .78. Moreover, the construct validity of the BRSQ-6 was assessed by testing for a simplex pattern of correlations among the six subscales. While some relationships were in line with predictions (e.g., amotivation was negatively related to intrinsic motivation), there was a lack of discrimination between some subscales. More specifically, there was no difference between external and introjected regulation scores in terms of their relationships with amotivation. A similar pattern was evident with the identified and integrated regulation subscales, which both had similar high correlations with intrinsic motivation. These results with the simplex pattern were replicated in a third study conducted with nonelite athletes. In this third study, Lonsdale and colleagues also assessed the relationships between the BRSQ-6 and indexes of burnout (Lemyre, Treasure, & Roberts, 2006; Raedeke & Smith, 2001) and flow (Jackson & Eklund, 2002). Overall, results supported hypotheses in line with SDT. Specifically, amotivation and external and introjected regulation showed negative correlations with flow and positive correlations with burnout. The opposite pattern of correlations was found for the self-determined subscales (intrinsic motivation and identified and integrated regulation). However, there was a lack of discrimination between integrated regulation and general intrinsic motivation. Results of another study on burnout (Lonsdale, Hodge, & Rose, 2009) replicated these findings. Thus, overall, the support for the construct validity of the BRSQ-6 appears to be mixed.
It should be underscored that the BRSQ has some nice features. First, the scale is designed in such a way that the researcher can decide to use a multidimensional (BRSQ-8) or unitary (BRSQ-6) conceptualization of intrinsic motivation. Second, the scale is rather short, with 4 items per subscale. Finally, it assesses integrated regulation. At the same time, the BRSQ also displays some limitations. First, additional research is needed on the construct validity of the scale. Whereas there is support for distinguishing the self-determined subscales (intrinsic motivation and identified and integrated regulation) from the non-self-determined subscales (external and introjected regulation), the finer discrimination within each category appears to be lacking. Such evidence is crucial, and future research is needed in order to show that this scale does indeed assess the SDT constructs rather than two broad sets of subscales tapping self-determined versus non-self-determined motivation. Second, this scale is designed specifically for older participants in competitive sport; it remains to be seen if the BRSQ can be used with younger participants, for whom the integrated regulation subscale may not have full meaning. Finally, research is needed to test the temporal stability of the scale over a time frame longer than 1 week.
Pictorial Motivation Scale
The PMS was designed to measure intrinsic and extrinsic motivation for sport and exercise in people with an intellectual disability. It assesses participants' reasons for engaging in sport and exercise. The scale's main characteristic is the use of drawings depicting each of the 20 items. There are 5 items (pictures) for each of four subscales: intrinsic motivation, self-determined extrinsic motivation (a mixture of integrated and identified regulation), non-self-determined extrinsic motivation (a mixture of introjected and external regulation), and amotivation. The pictures are used to help participants with cognitive difficulties and to represent the motivational concept depicted in each item.
The original scale was developed in French (Reid, Poulin, & Vallerand, 1994). Results of a study with 62 participants supported the internal consistency, temporal stability, and construct validity, as exemplified by the presence of a simplex pattern among the four subscales. However, the amotivation subscale had poor reliability (α = .52). The French version (Reid et al., 2009) was translated into English according to the back-translation and committee procedures outlined in Vallerand (1989). Then, 6 new items were generated for the less reliable amotivation subscale. Participants in the Special Olympics (n = 160) completed the English version. Results of the CFA confirmed the four-factor structure of the PMS. Furthermore, the internal consistency (Cronbach alphas) ranged from .60 to .71. Finally, the construct validity was assessed by testing for a simplex pattern of correlations among the four subscales. The intercorrelations among latent variables from the CFA provided support for the simplex pattern.
Results from a study conducted with the English version of the PMS involving 80 high school students with mild intellectual disability provided support for the internal consistency, temporal stability (over 3 wk), and construct validity of the PMS with respect to the simplex pattern of correlations among the PMS subscales as well as correlations between the PMS subscales and motivational antecedents (skill and perceived competence) and outcomes (perceived effort) as rated by the physical education teacher. Finally, the internal consistency of each subscale was tested without the pictorial dimension with a subset of 47 high school students with mild intellectual disability. Results indicated poor internal consistency for most subscales (.91 for intrinsic motivation but only .27 for self-determined extrinsic motivation, .20 for non-self-determined extrinsic motivation, and .60 for amotivation). This finding suggests that the scale is not reliable without the drawings.
The preliminary findings with the English version of the PMS are encouraging. Furthermore, this scale is the only one geared toward individuals with intellectual disability. The use of drawings to depict the various items makes this scale unique in the field. Nevertheless, the PMS shows some limitations. First, the scale does not differentiate among all forms of intrinsic (knowledge, stimulation, and accomplishment) or extrinsic (integrated, identified, introjected, and external regulation) motivation. Second, construct validity was tested with only a limited number of variables. Third, it is not known if the scale is usable with children who have severe forms of intellectual disability. Clearly, additional research is needed on the reliability and validity of the PMS.
Situational Motivation Scale
The SIMS is one of the few scales to assess intrinsic and extrinsic motivation and amotivation at the situational level (Guay et al., 2000). The SIMS is a multidimensional tool that measures four types of motivation: intrinsic motivation, identified regulation, external regulation, and amotivation. The SIMS is made up of 16 items (4 items per subscale) and asks this question: “Why are you currently engaged in this activity?” The items represent potential reasons for task engagement. The scale is worded in such a way that it can be used in most situations (sport and nonsport).
Five studies were reported in the original article. In study 1, the original scale was developed by a committee of experts and completed by 195 French Canadian college students. Results of an EFA revealed a four-factor structure with the final 16 items loading on their respective factor. In study 2, a CFA confirmed the factor structure as well as its invariance across gender. Across the five studies, the internal consistency values of the subscales were acceptable, ranging from .62 to .95 (see Guay et al., 2000). Moreover, across all studies, support was obtained for the construct validity of the SIMS through results from correlations in line with the simplex pattern among the subscales as well as between the SIMS subscales and motivational determinants and consequences. Perhaps of greater interest for the present discussion were the results of study 4, which showed that some subscales (intrinsic motivation and identified regulation) were sensitive enough to detect changes in motivation that took place during two games of a basketball tournament.
Other researchers have also obtained support for the psychometric properties of the SIMS. First, all studies reported acceptable internal consistency values for each subscale (Blanchard, Mask, Vallerand, de la Sablonnière, & Provencher, 2007; Conroy, Coatsworth, & Kaye, 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, Duda, & Prusak, 2003). The coefficient alpha values of all but the amotivation subscale (α = .58) in the Conroy and colleagues study were above .60. Second, support for the factorial validity of the SIMS was obtained through CFAs with one qualification. Whereas the CFA results with the 16 items yielded acceptable fit indexes, removal of 1 item (Jaakkola, Liukkonen, Laakso, & Ommundsen, 2008) and even 2 items (Gillet, Berjot, & Paty, 2009; Standage, Treasure, et al., 2003) yielded better fit indexes. Moreover, Standage, Treasure, and colleagues (2003) conducted multisample CFAs and showed that the pattern of factor loadings was largely invariant across four different samples.
Construct validity of the SIMS was also assessed in several studies (Blanchard et al., 2007; Conroy et al., 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, et al., 2003). In addition to supporting the simplex pattern among the SIMS subscales and between the SIMS subscales and need satisfaction (study 2 of Blanchard and colleagues, 2007), results also supported the postulate of the hierarchical model of intrinsic and extrinsic motivation (HMIEM; Vallerand, 1997) concerning the top-down effect, in which contextual sport motivation was found to predict situational sport motivation (studies 1 and 2 of Blanchard et al., 2007; Jaakkola et al., 2008; Ntoumanis & Blaymires, 2003). Specifically, the more self-determined the motivation was found to be in a specific context (in this case, sport), the more self-determined the motivation was found to be in a given situation. Furthermore, Blanchard and colleagues (2007, studies 1 and 2) found support for another postulate of the HMIEM, which suggests that over time, situational motivation in the realm of sport (basketball) has recursive effects on contextual motivation. The more that situational motivation is self-determined, the more that contextual motivation becomes self-determined over time. Finally, Jaakkola and coworkers (2008) demonstrated that, as predicted by the HMIEM, situational self-determined motivation was better than contextual motivation at predicting the situational intensity (as assessed by heart rate) displayed by students in a physical education class. Overall, these findings provide strong support for the reliability and the factorial and construct validity of the SIMS.
The SIMS has several positive features, one of them being that it is the only scale to assess intrinsic and extrinsic motivation and amotivation at the situational level. Furthermore, it does so using only 16 items. Nevertheless, it also has some weaknesses. First, the SIMS does not assess the different types of intrinsic motivation and integrated and introjected regulation, because it was designed to be short. Second, while the factor structure has been supported, it is not clear if some items should be replaced (Gillet, Berjot, et al., 2009; Jaakkola et al., 2008; Standage, Treasure, et al., 2003). Third, research so far has not assessed the validity of the scale with high-performance athletes. Thus, additional research is needed to further test the psychometric properties of the SIMS in sport.
Ethics Codes: Their Nature, Purposes, and Application
Ethics codes typically comprise principles and standards. Ethical principles are broad-spectrum statements that summarize and reflect the values of the parent organization or governing body. These general and aspirational statements set the underlying tone for the more specific codes and guide the work-related ethical decision making of professionals. In contrast, ethical standards specify both proscribed and prescribed member behaviors. While not always black and white, these standards serve as a more clear-cut and enforceable guide for professional behavior.
Members should apply both the aspirational principles and the enforceable standards to shape their thinking and behavior in work settings. Ideally, members self-monitor their own behavior. In an effort to remain ethical, professionals are encouraged to consult with colleagues about ethically challenging situations and to provide constructive feedback when they perceive possibly unethical behavior in others.
Assessment and Measurement
A central question to be addressed in this chapter is this: What are assessment and measurement? Sundberg (1977) defines assessment as the processes used “for developing impressions and images, making decisions and checking hypotheses about another person's pattern of characteristics that determines his or her behavior in interaction with the environment” (p. 21). The assessment process involves collecting and assembling a broad range of objective and subjective information about persons or groups to develop impressions about them; identify their needs; predict how they might think, feel, and behave in future situations; and select and apply interventions based on the content and dependability of that information. Professionals may use multiple assessment methods, including observations of behavior, symptom checklists, surveys and questionnaires, structured and unstructured interview materials, and standardized tests (Bennett et al., 2006). Gardner and Moore (2006) emphasize using a triad of psychological assessment strategies in the practice of clinical sport psychology: (1) initial interviews, (2) behavioral observation, and (3) psychological testing. The nature of and assumptions underlying assessment approaches are usually grounded in the theoretical orientation of the professional (Andersen, 2002).
In contrast, measurement can mean many things to many people. It is one of the most common words in the English language and can be used as both a noun and a verb (Lorge, 1967). For the purposes of this chapter, measurement is viewed as an extension of assessment processes. It can be thought of more narrowly as the process of collecting information about psychological characteristics of interest (e.g., attitudes, behaviors, state experiences) using one or more methods or tools (such as those mentioned earlier) to monitor change and the effects of interventions or treatments postassessment. For example, an educational sport psychology consultant might administer a measure of team cohesion over the course of a competitive season to see how team members perceive their relationships. Another consultant might conduct a preseason baseline screening assessment of cognitive functioning in hockey players and then reevaluate players who incur a mild traumatic brain injury (i.e., concussion) later in the season.
In this chapter, the terms measurement and assessment are used interchangeably. Furthermore, these terms are used to describe the decisions and opinions made by professionals regarding clients with whom they work. As such, measurement and assessment techniques include all methods of gathering information about clients, such as (a) psychological, educational, and neurological tests; (b) data gathered during clinical interviewing; (c) information gathered from significant others (e.g., family members, teachers, friends); (d) direct and indirect observation; and (e) interactions with people via teletherapy (e.g., Internet, phone; Fisher, 2009).
Competence and Education
In order to excel in our professional duties and do well for those we serve, teach, study, and otherwise interact with, we must know what to do and how to do it in a capable manner. The ethics codes mentioned earlier identify the necessity of being knowledgeable and capable in our work. For example, the APA ethical standards provide guidance for organization members in this area, including information about (a) competence limitations, (b) maintaining competence, (c) making sound professional and scientific judgments, (d) delegating work responsibilities to others, (e) engaging in activities in emergencies, and (f) impairment (APA, 2002). Competence in professional behaviors is a personal matter that is frequently challenged. It is the responsibility of professionals to know their limitations and to recognize that their knowledge and skills change and require constant upgrading. The APA ethics code also emphasizes the importance of making sound work-related decisions based on scientific knowledge and appropriate discipline-specific practice. This portion of the APA code cautions professionals to be careful when delegating work to others, describes how a professional is responsible for others' work, and explains the necessity of avoiding multiple relationships with those to whom work is delegated. The APA standards note that we can occasionally be thrown into situations in which our competence is stretched; in such cases we need to be very careful, seek supervision if available, and end such work as soon as possible.
Measurement Referral Questions and Appropriateness of Instruments
When selecting assessment instruments, the professional must consider the referral questions that prompted this process (Fisher, 2009; Smith, 1976). The instruments selected should reflect these referral questions and utilize assessment strategies that have appropriate validity and reliability. For example, if a professional is interested in measuring state anxiety for research purposes, an appropriate assessment may be the Competitive State Anxiety Inventory-2 (CSAI-2; Martens, Burton, Vealey, Bump, & Smith, 1990) as opposed to the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970), which measures both trait and state anxiety. When selecting the assessment, the professional should be aware of limitations or biases regarding cultural sensitivity (see the later section on cultural issues); gender considerations (Etzel, Yura, & Perna, 1998); and age, language, or disability factors that may influence the psychometric qualities of the assessment differently from the way they influenced the normative groups used for the development and validation of the instrument (APA, 2002; Fisher, 2009). It is also important to consider the method of delivery. For example, assessments based on paper and pencil may not have been validated for online use (see the later section on technology), and instruments with elevated reading levels may not be appropriate for certain age or developmental groups. Therefore, the professional should always verify the assessment's validity and reliability when a modified assessment method or group is used (Fisher, 2009). Furthermore, the professional should also attempt to conduct in-person assessments when possible, as a great deal of information can be learned about clients from the way in which they present themselves during the assessment process. This information can affect the richness of the assessment data.
It is also important for professionals to be aware of and competent to assess and use appropriate psychometric strategies for establishing validity and reliability of the instruments they use (AERA, APA, & NCME, 1999). All instruments have unique psychometric properties that affect how they should be administered and interpreted. When validity and reliability issues are not taken into consideration, it is possible to choose and utilize instruments to assess factors that they were not designed to assess. Furthermore, practitioners should be well aware of other psychometric properties such as content and criterion validity and standard error of measurement that may affect how results are interpreted and used. The ethical practitioner needs to be aware of psychometric issues in order to choose appropriate instruments with regard to the referral questions, client characteristics, assessment strategies, and environmental factors.
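As one concrete illustration of the psychometric properties mentioned above, the standard error of measurement (SEM) converts a scale's standard deviation and reliability into an expected band around an observed score, which is one way reliability affects how results are interpreted. The following is a minimal sketch with hypothetical values (the function names and the example scale statistics are my own, not drawn from any instrument discussed here):

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r_xx): the expected spread of observed
    scores around a test taker's true score."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_band(observed: float, sd: float, reliability: float,
                    z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% band around an observed score (z = 1.96)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical scale: SD = 10, reliability (coefficient alpha) = .84
sem = standard_error_of_measurement(10.0, 0.84)   # 10 * sqrt(.16) = 4.0
low, high = confidence_band(50.0, 10.0, 0.84)     # roughly 42.2 to 57.8
```

A band this wide around a single observed score underlines why decisions about individual clients should not rest on one administration of a moderately reliable instrument.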
Consent and Assent
As discussed earlier, the ethical principles for sport and exercise psychology emphasize doing no harm to the client and respecting the individual's rights and dignity (AASP, 1996; APA, 2002). The test taker's right to privacy and confidentiality applies here as well, and the professional should take all necessary precautions to maintain the confidentiality and privacy of the client. To protect the test taker, informed consent must be obtained at the start of the relationship (e.g., research, consultation, therapy). Beyond the informed consent process and before formal assessment, the client or participant should be informed of all pertinent information regarding the assessment process. This information includes (a) the nature and purpose of assessment; (b) any applicable fees; (c) potential involvement of third parties such as a coach, athletic trainer, or manager; (d) limits of privacy and confidentiality (as discussed in the next section); and (e) the timeline for the process and potential feedback (Fisher, 2009). This information should be presented in a clear and understandable manner. Furthermore, this information should be agreed to by the test taker, who thereby gives informed consent. Test takers should engage in assessment of their own free will and must be given the option to withdraw participation without consequences (APA standard 3.10). All necessary information about assessment procedures and findings should be provided in a language or at a level appropriate for the participant. Furthermore, it is unethical to require or coerce individuals to take part in measurement and assessment for research or practice purposes.
Privacy and Confidentiality and Release of Information
Typically, the ethical standards of organizations with ties to sport psychology (APA ethical standard 4.01 and the AASP) suggest that professionals should not reveal information about clients, test takers, or others without their signed approval to release information or a legal requirement to do so. These legal situations may include (a) a test taker who indicates possible self-harm or harm to others (i.e., suicide or homicide), (b) a test taker whose results are subpoenaed by the court, or (c) a test taker who is a minor, in which case the parent or guardian may have access to the data (Etzel et al., 1998). If the test taker or, in the case of a minor, the parent or guardian provides explicit written permission, the specific information identified by the client may be released to the identified parties. Unless these circumstances are met, information from the test taker may not be disclosed to anyone (e.g., coaches, management, parents, administration, athletic trainers, and so on).
In situations where the assessment is requested by a third party (e.g., coaches, management, the court), this third party may also request results from the assessment. It is important for the professional to establish a priori who is the “real client” (Ogilvie, 1979) and to have the ability to control access to the results. Etzel and colleagues (1998) suggest that information about the assessment should be shared only with one predetermined person, unless a release of information form has been completed. Therefore, when engaging in assessments, the professional should set clear boundaries and avoid dual relationships, thereby identifying who is being served (APA standard 4.02a). Another complication of these situations is the role of trust. If athletes or test takers suspect the test results will be used without their permission in decisions regarding performance or other aspects of participation, they may be less likely to respond honestly, thus affecting the validity of the results (see the section on demand characteristics).
Raw Data and Data Storage
Raw data such as the test taker's responses to items, including the professional's notes and final reports, should be stored in locked file cabinets inside the professional's office or in password-protected computer files (Fisher, 2009). Other methods to ensure confidentiality may include limiting access to records to only those people who have a need to know this information and have been trained to handle and understand it, deidentifying records using code numbers, and appropriately disposing of identifiable records (Fisher, 2009). A good policy for data maintenance is that data should be kept for a minimum of 7 y after the last service delivery date or 3 y after a minor reaches the age of 18 (whichever is later), as is recommended by the APA record-keeping guidelines (APA, 2002; Fisher, 2009). Raw data and the instruments used for assessment purposes should not be released to third parties unless a release of information form has been completed and the third party is trained competently to use such information.
Results Discussion
Test feedback and results discussion should be provided in the form of a carefully constructed report using clear language that fully explains the assessment results. Labels and jargon should be eliminated to increase readability. Information necessary to the purpose of the test should be included, and the inclusion of unnecessary and unrelated information should be avoided (APA, 2002; Fisher, 2009). Additionally, as recommended by the APA (APA, 2002), interpretations should take into consideration the participant's gender, race, ethnicity, age, national origin, sexual orientation, religion, disability, language, or socioeconomic status. Participants should receive assessment information and feedback related to their performance on the assessment and should be informed of ways in which they could personally use the test results or how this information may be used by a third party (only if written permission was given to release such information). The information released to the participant should be delivered in a verbal or written report and presented in such a way that it does not cause harm to the test taker (Etzel et al., 1998). However, information such as numerical scores or specific responses should not be released to individuals not qualified to interpret such information (Fisher, 2009; Tranel, 1995).
Demand Characteristics
In the sport context, several groups of individuals may be interested in the assessment results of athletes. Interested parties may include coaches, managers, teams, students, or administrators. However, the potential for a third party to review the test results may increase socially desirable responding and yield invalid and unreliable information. Therefore, undue pressure to complete an instrument or battery should be considered a contextual factor.
Another potentially undesirable effect of a third party viewing the test taker's results may be assessment anxiety. The APA standards state that if a test taker is observed to be anxious or reports feeling anxious, this feeling should be taken into account and become a limitation in the interpretation of test data (APA, 2002). Assessment anxiety may be exaggerated in situations where a third party may have access to results. These situations may also lead to faking good or faking bad on the part of respondents who are concerned about how the results may be used. This must also be considered when evaluating the results.
Supervision of Subordinates
In some cases, professionals may hire and train subordinates to help with assessment and measurement tasks. These subordinates may administer, score, and even interpret the results of measurement and assessment. Standard 2.05 of the APA ethics code (APA, 2002) states that professionals utilizing employees, supervisees, or research and teaching assistants for such purposes should take reasonable precautions to put subordinates in situations where (a) they do not face possibly harmful multiple relationships with the client that could affect their objectivity, (b) they are competently trained to perform the delegated task on their own or with supervision, or (c) they are supervised for competent service delivery. Therefore, when using subordinates to help with tasks such as administration, scoring, or interpretation, the professional assumes primary responsibility and liability to ensure that the services are being provided competently. The professional needs to ensure that subordinates are well trained with all potential instruments. To do so, the professional must provide appropriate training, experience, and supervision as well as continue to check the subordinates' work to ensure its quality. As with licensed professionals, not all subordinates have the same competencies with regard to all instruments.
Tools to Measure the Physical Self
Reflecting the general historical trends in self-concept research, self-concept instruments used in early sport and exercise research focused on global self-esteem (Marsh, 1997, 2002). However, following the research of Shavelson and colleagues (1976), a number of multidimensional self-concept instruments containing one or more PSC scales were developed. Indeed, in a 1974 review, Wylie concluded that at the time most self-concept instruments focused on global self-concept or self-esteem rather than specific domains such as PSC. Although several of the instruments reviewed by Shavelson and colleagues (1976) contained items relating to physical skills and elements of physical appearance, none provided a clearly interpretable measure of PSC. From a practical perspective, these older instruments appear to be of little value for sport and exercise psychologists. The major exception, perhaps, is the Physical Estimation and Attraction Scale (PEAS; Sonstroem, 1978, 1997), along with the theoretical model on which it is based. This instrument was designed to measure two global components: estimation (competency) and attraction. While the PEAS may not be the instrument of choice today, it has historical significance in that its research incorporated many of the features of the construct validity approach advocated in this chapter, it was heuristic, and it provided an important basis for subsequent research.
In a subsequent 1989 review, Wylie identified several multidimensional self-concept instruments measuring one or more components of PSC that can be differentiated from other specific domains of self-concept and general self-concept. Included in the list were the three SDQ instruments already discussed. Wylie also evaluated Harter's (1985) Self-Perception Profile for Children, which contains two PSC scales (athletic competence and physical appearance). Other multidimensional instruments containing physical scales that were not reviewed by Wylie include the Self-Rating Scale (Fleming & Courtney, 1984), which measures physical ability and physical appearance; the Song and Hattie Test (Hattie, 1992), which measures physical appearance; and the Multidimensional Self-Concept Scale (Bracken, 1996), which has a physical scale that includes physical competence, physical appearance, physical fitness, and health. The Tennessee Self-Concept Scale (Fitts, 1965) is a multidimensional self-concept instrument that also purports to measure PSC. In their review and empirical evaluation of this instrument, Marsh and Richards (1988) found distinguishable physical components reflecting health, neat appearance, physical attractiveness, and physical fitness that were incorporated into a single PSC score. This detailed breakdown of the Tennessee physical scale was supported by relationships with the SDQ physical ability and physical appearance scales in an MTMM study comparing responses to the two instruments. Because each of the clusters based on responses to the Tennessee instrument is represented by only a few items, it is not appropriate to use the instrument to measure these distinct components of PSC. Marsh and Richards argued that PSC measures that combine and confound a wide range of differentiable physical components—such as those based on the Tennessee Self-Concept Scale—should be interpreted cautiously (see similar comments by Fox & Corbin, 1989).
In summary, although multidimensional self-concept instruments based on Shavelson and colleagues' (1976) model provided good support for the construct validity of the physical ability and appearance scales (e.g., Marsh, 2002; Marsh & Peart, 1988), they left unanswered the question of whether PSC is more differentiated than can be explained in terms of one (physical ability) or two (ability, appearance) physical scales. Subsequent PSC instruments were developed specifically to address the issue of the multidimensionality of PSC.
Physical Self-Perception Profile
The Physical Self-Perception Profile (PSPP; Fox, 1990; Fox & Corbin, 1989) is a 30-item inventory that consists of four specific scales and one general physical self-worth factor. The PSPP was developed to document the physical self-perceptions of college students. It was designed to reflect the advances made by Harter (1985) and Shavelson and colleagues (1976) in identifying the physical self as an important construct to measure in its own right and to reflect the hierarchical, multidimensional nature of the physical self. A qualitative approach was used to reveal dimensions of physical self-esteem salient to the population sampled (Fox & Corbin, 1989). The PSPP consists of five 6-item scales of sport (perceived sport competence), body (perceived bodily attractiveness), strength (perceived physical strength and muscular development), condition (perceived level of physical conditioning and exercise), and physical self-worth. Fox (1990) recommended that the 10-item Rosenberg Self-Esteem Scale (Rosenberg, 1965) be used alongside the PSPP to provide a global measure. Fox (1990) reported factor analyses indicating that each item loads most highly on the factor that it is designed to measure and that individual scale reliabilities are in the .80s.
The PSPP research demonstrates (a) good reliability (coefficient alpha of .80-.95; Fox, 1990; Page, Ashford, Fox, & Biddle, 1993; Sonstroem, Speliotis, & Fava, 1992); (b) good test-retest stability over the short term (rs of .74-.89; Fox, 1990); (c) a well-defined, replicable factor structure as shown by CFA (Fox & Corbin, 1989; Sonstroem, Harlow, & Josephs, 1994); (d) convergent and discriminant validity in studies showing PSPP relationships with external criteria such as exercise behaviors, mental adjustment variables, and health complaints (Fox & Corbin, 1989; Sonstroem & Potts, 1996); and (e) applicability for an older adult population (Sonstroem et al., 1994). However, correlations among the PSPP scales are consistently so high (.65-.89 when disattenuated for measurement error; Marsh, Richards, Johnson, Roche, & Tremayne, 1994) that they detract from the instrument's ability to differentiate among the different PSC factors it purports to measure.
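The disattenuated correlations reported above come from the classical correction for attenuation, which divides an observed correlation by the square root of the product of the two scales' reliabilities to estimate the correlation between true scores. A minimal sketch (the numeric values below are hypothetical, not taken from the PSPP literature):

```python
import math

def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Correction for attenuation: estimated true-score correlation,
    r_xy / sqrt(r_xx * r_yy), where r_xx and r_yy are the two
    scales' reliabilities (e.g., coefficient alpha)."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical: observed inter-scale r = .60, alphas of .85 and .88
r_true = disattenuate(0.60, 0.85, 0.88)  # about .69
```

Because measurement error can only lower an observed correlation, disattenuated values such as the .65 to .89 range cited above represent the scales' overlap at the latent level, which is why such high values threaten discriminant validity.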
Subsequently, a version of the PSPP for children and adolescents was developed and validated—the Children and Youth Physical Self-Perception Profile (CY-PSPP; Eklund, Whitehead, & Welk, 1997; Whitehead, 1995). Like the PSPP, the CY-PSPP is a 30-item inventory consisting of the same five 6-item scales. The CY-PSPP is a substantially revised version of the PSPP that is most appropriately thought of as a different instrument. The CY-PSPP body, strength, and conditioning subscales are based on minor adaptations of the PSPP to make them more suitable for children. However, the global self-worth (self-esteem) and sport scales are completely different. The PSPP did not have a self-esteem scale of its own but included 6 items adapted from the Rosenberg Self-Esteem Scale. On the CY-PSPP, global self-esteem and sport scales from the PSPP were dropped and replaced with corresponding scales from Harter's (1985) Self-Perception Profile for Children. Correlations among factors remained high (e.g., physical self-worth with attractive body adequacy = .8). Eklund and colleagues (1997) suggested that these results are consistent with the developmental patterns among children, as differentiation in self-concept is less defined at younger ages (Harter, 1985). CFAs have supported the instrument's factor structure, with both the CFI (comparative fit index) and NNFI (non-normed fit index) indexes exceeding the .90 criterion for good model fit (Eklund et al., 1997). Moderate correlations (r = .39-.45) with external criteria such as physical activity and physical fitness have demonstrated its convergent and discriminant validity (Welk & Eklund, 2005). 
The CY-PSPP has been validated with adolescents (Jones, Polman, & Peters, 2009; Welk, Corbin, & Lewis, 1995; Whitehead, 1995) and younger children (Welk, Corbin, Dowell, & Harris, 1997) and has been translated into other languages and validated (Aşçı, Eklund, Whitehead, Kirazci, & Koca, 2005; Raustorp, Ståhle, Gudasic, Kinnunen, & Mattsson, 2005; Raustorp, Mattsson, Svensson, & Ståhle, 2006).
Both the PSPP and CY-PSPP use a nonstandard response format based on Harter (1985), in which each item consists of a matched pair of statements, one negative and one positive (e.g., “Some people feel that they are not very good when it comes to sports” but “Others feel that they are really good at just about every sport”). Respondents are asked which of the two contrasting descriptions is most like them and whether the description they select is “Sort of true of me” or “Really true of me.” Responses are scored on a scale of 1 to 4, with 1 representing a “Really true of me” response to the negative statement and 4 representing a “Really true of me” response to the positive statement. Although this response format was designed to reduce the influence of social desirability, Wylie's (1989) review of Harter's original instruments provided little or no support for this suggestion, and Marsh and colleagues (1994) suggested that there were substantial method effects associated with the nonstandard response scale. This format has also been shown to be confusing, particularly for children (Eiser, Eiser, & Haversmans, 1995), and even for adults (Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994), unless special care is taken to explain the response scale. Following the suggestion of Marsh and colleagues (1994) that confusion over the structured alternative response scale could be overcome by more detailed instructions at the outset, researchers implementing the CY-PSPP used large illustrations for a sample item (Whitehead, 1995). Wichstrøm (1995) found that responses for this format were psychometrically stronger when based on typical Likert responses rather than the structured alternative format, but Welk and colleagues (1997) suggested that the nonstandard response scale on the CY-PSPP worked better than Likert responses did.
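The 1-to-4 scoring of the structured alternative format described above can be sketched as a small scoring routine. This is an illustrative reconstruction of the scoring rule as stated in the text; the function names and the mean-based scale score are my own conventions, not taken from the instruments' manuals:

```python
def score_structured_alternative(chose_positive: bool, really_true: bool) -> int:
    """Score one structured alternative item on the 1-4 scale:
    1 = 'Really true of me' for the negative statement,
    2 = 'Sort of true of me' for the negative statement,
    3 = 'Sort of true of me' for the positive statement,
    4 = 'Really true of me' for the positive statement."""
    if chose_positive:
        return 4 if really_true else 3
    return 1 if really_true else 2

def scale_score(item_scores: list[int]) -> float:
    """Mean of the item scores for one 6-item PSPP/CY-PSPP subscale
    (a sum could be used instead; the mean keeps the 1-4 metric)."""
    return sum(item_scores) / len(item_scores)
```

Note that the two-step choice (which statement, then how true) is what makes the format confusing: a respondent must first commit to one side of the item before rating its intensity.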
In summary, the PSPP and the CY-PSPP are established instruments that have been translated into several languages and have been used with a range of populations. However, the format and the high correlations among factors in both instruments may limit their usefulness in some settings. The CY-PSPP is a substantially revised version of the PSPP specifically developed for children. Although the CY-PSPP was designed for child and adolescent samples, it may even be stronger than the original PSPP for adult samples.
Subsequent to the completion of this chapter, Lindwall and colleagues (2011) published a revised version of the PSPP (PSPP-R). They reviewed critiques of the PSPP response scale such as those noted here (e.g., Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994) and acknowledged that “the idiosyncratic alternative response format has been difficult to understand for some participants” (pp. 310-311). In recognition of these problems, the idiosyncratic response scale that has been such a salient feature of the PSPP was dropped altogether and replaced with a 4-point Likert response using only positively worded items. Lindwall and colleagues (2011) demonstrated the appropriateness of the revised PSPP-R based on a large sample (N = 1,831) of participants from four countries (Sweden, Great Britain, Portugal, and Turkey). However, they did not indicate whether the PSPP-R supersedes the PSPP or is merely an alternative to it. Nor did they discuss the implications for other instruments using similar idiosyncratic response scales (e.g., PSPP-related instruments such as the CY-PSPP or Harter's instruments more generally).
Physical Self-Inventory
The Physical Self-Inventory (PSI) is a French adaptation of the PSPP that was originally developed for use with Francophone adults (Ninot, Delignières, & Fortes, 2000). In two preliminary studies, Ninot and colleagues used the nonstandard response scale from the PSPP. However, consistent with previous research (Marsh et al., 1994), they reported that this response scale was problematic. In a third study, the authors used a 6-point Likert response scale; factor analysis results were reasonable, but reliability coefficients were not completely satisfactory. Next the authors replaced the PSPP global physical items with items from the SDQ physical scale and the PSPP global self-esteem items with items from Coopersmith (1967). The final PSI consists of 25 items measuring six PSC factors (four specific and two global, as with the PSPP) and has satisfactory psychometric properties that have been confirmed in subsequent French studies of adults (Masse, Jung, & Pfister, 2001; Stephan, Bilard, Ninot, & Delignières, 2003; Stephan & Maïano, 2007).
Maïano and coworkers (2008) subsequently constructed a short form of the PSI for use with adolescents. They found that not all items from the adult PSI worked with adolescents, but they were able to construct 18-item (PSI-SF, 3 items per scale) and 12-item (PSI-VSF, 2 items per scale) versions that had good psychometric properties. In particular, the measurement and hierarchical structures were consistent with proposals by Fox and Corbin (1989) and were fully invariant across gender. Maïano and coworkers also noted that PSI-SF responses showed very high test-retest stability. Comparison of the PSI-SF and PSI-VSF demonstrated that the measurement model, mean structure, structural parameters, and criterion-related validity were equivalent across samples and versions. Nevertheless, the authors noted a serious limitation that all versions of the PSI share with the PSPP: Very high correlations among the six PSC factors (correlations among latent factors) that, according to the authors, bring “into question the real independence of some of the models' sub-dimensions, and by extension their discriminant validity, a finding that has already been observed by Marsh (2002; Marsh et al., 2006) on analyses of the PSPP” (Maïano et al., 2008, p. 844). However, Maïano and colleagues also noted that because they used a traditional Likert response scale, the high correlations apparently were not due to the structured alternative format used in the PSPP. In summary, the short and very short forms of the PSI in particular have made a potentially important contribution to applied research. However, further research is needed to evaluate more fully the robustness of support for construct validity and application in non-French-speaking settings.
Richards Physical Self-Concept Scale
The Richards Physical Self-Concept Scale (RPSCS; Marsh et al., 1994; Richards, 1988) is a 35-item instrument designed to measure six specific components of PSC (body build, appearance, health, physical competence, strength, action) and one general physical satisfaction factor. Each item is a simple declarative statement, and subjects respond on an 8-point true-false scale. Extensive research in Australia (e.g., Marsh et al., 1994; Richards, 1988) has indicated that RPSCS responses have good psychometric properties. The factor structure is very robust, generalizing well over ages from 8 to 80 y and over gender.
RPSCS research has demonstrated (a) good reliability (coefficient alpha of .79-.93; Marsh et al., 1994; Richards & Marsh, 2005); (b) good test-retest stability over the short term (rs of .77-.90 over 3 wk; Richards, 1988); (c) a well-defined, replicable factor structure as shown by CFA (Marsh et al., 1994; Richards, 2004); (d) a factor structure that is invariant across gender, as shown by multiple-group CFA (Richards, 2004), and across a wide age range; (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (Marsh et al., 1994; Richards & Marsh, 2005); and (f) applicability for participants aged 8 to 60 y and for both genders (Marsh et al., 1994; Richards, 1988, 2004; Richards & Marsh, 2005). In summary, the RPSCS is regarded as a valid, reliable, and structurally sound instrument that has been tested across both genders and a wide range of ages. The applicability across such a wide range of ages is a particular strength.
Physical Self-Description Questionnaire
Extending Fleishman's (1964) classic research on the structure of physical fitness, the Physical Self-Description Questionnaire (PSDQ) scales reflect some of the original SDQ scales and parallel physical fitness components identified in a CFA of physical fitness measures (Marsh, 1993). The PSDQ consists of nine specific components of PSC (strength, body fat, activity, endurance and fitness, sport competence, coordination, health, appearance, and flexibility), a global physical scale, and a global self-esteem scale. Each of the 70 PSDQ items is a simple declarative statement, and individuals respond on a 6-point true-false scale. The PSDQ is designed for adolescents but is also appropriate for older participants.
PSDQ research has demonstrated (a) good reliability (median coefficient alpha of .92) across the 11 scales (Marsh, 1996b; Marsh et al., 1994); (b) good test-retest stability over the short term (median r = .83 over 3 mo) and longer term (median r = .69 over 14 mo; Marsh, 1996b); (c) a well-defined, replicable factor structure as shown by CFA (Marsh, 1996b; Marsh et al., 1994); (d) a factor structure that is invariant over gender as shown by multiple-group CFA (Marsh et al., 1994); (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (see Marsh et al., 1994); (f) convergent and discriminant validity as shown by PSDQ relationships with external criteria (e.g., measures of body composition, physical activity, endurance, strength, and flexibility; see Marsh, 1996a, 1997); and (g) applicability for participants aged 12 to 18 y (or older) and for elite athletes and nonathletes (Marsh, Hey, Roche, & Perry, 1997; Marsh, Perry, Horsely, & Roche, 1995). In summary, the PSDQ is a psychometrically strong instrument.
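Coefficient alpha, cited throughout these reliability summaries, can be computed directly from an item-by-respondent score matrix. The following is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores); the toy data are invented for illustration, not PSDQ responses:

```python
def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's coefficient alpha for k items.
    items[i][j] is the score of respondent j on item i."""
    k = len(items)
    n = len(items[0])

    def variance(xs: list[float]) -> float:
        # Sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_vars = sum(variance(item) for item in items)
    totals = [sum(items[i][j] for i in range(k)) for j in range(n)]
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))

# Toy data: two items, four respondents
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 2, 3, 5]])  # about .95
```

Alpha rises toward 1.0 as the items covary more strongly relative to their individual variances, which is why long, homogeneous scales such as the 11 PSDQ scales tend to report alphas in the .90s.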
Marsh, Martin, and Jackson (2010) recently presented a new short form of the PSDQ (PSDQ-S). This short form balances brevity and psychometric quality in relation to established guidelines for evaluating short forms (e.g., Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Smith, McCarthy, & Anderson, 2000) with the construct validity approach that is the basis of PSDQ research. Based on the PSDQ normative archive, 40 of 70 items were selected and evaluated in a new cross-validation sample (N = 708 Australian adolescents). To test the generalizability of results, the authors considered four additional samples: Australian adolescent elite athletes (n = 349), Spanish adolescents (n = 986), Israeli university students (n = 395), and Australian senior citizens (n = 760). Reliabilities for the 40 PSDQ-S items were consistently high in the cross-validation sample (.81-.94; median = .89) and senior sample (.81-.94; median = .91), and reliabilities in the cross-validation sample were higher than those in comparable groups completing the 70-item PSDQ. The PSDQ-S factor structure in the cross-validation sample was well defined and highly similar to that based on the archive sample as well as to those based on the other four groups. Study 1, using a missing-by-design variation of multigroup invariance tests, showed that the factor structure was invariant across the 40-item PSDQ-S and the 70-item PSDQ. Study 2 demonstrated factorial invariance of responses over 1 y (test-retest correlations of .57-.90; median = .77) and good support for convergent and discriminant validity over time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to responses on the PSPP and PSC instruments. The four studies reported by Marsh and coworkers demonstrated new, evolving strategies for the construction and evaluation of short forms that support the PSDQ-S.
The authors concluded that the strong support for the psychometric properties and construct validity of the widely used PSDQ instrument generalizes very well to the PSDQ-S.
Elite Athlete Self-Description Questionnaire
The PSC instruments discussed thus far may be suitable for elite athletes (e.g., Marsh et al., 1995). There may, however, be other components of PSC that are particularly relevant for elite athletes, and thus the Elite Athlete Self-Description Questionnaire (EASDQ; Marsh, Hey, Roche, et al., 1997; Marsh, Hey, Johnson, & Perry, 1997) was developed to address these other components. For the EASDQ, it was hypothesized that overall performance by elite athletes is a function of skill level, body suitability, aerobic fitness, anaerobic fitness, and mental competence. Thus Marsh and colleagues developed the EASDQ to measure these five components along with overall performance, for six factors in all. For each scale, they developed a pool of items that sport psychologists at the Australian Institute of Sport evaluated for their suitability for elite athletes. Pilot studies were conducted to select the best items to represent each factor. A compromise between brevity and psychometric soundness was achieved, with acceptable levels of reliability (e.g., all scales having reliability estimates of at least .80) based on short scales (4-6 items per scale).
EASDQ research demonstrates (a) adequate reliability (median coefficient alpha of .85) across the six scales (Marsh, Hey, Johnson, et al., 1997); (b) a well-defined, replicable factor structure as shown by CFA (Marsh, Hey, Johnson, et al., 1997; Marsh, Hey, Roche, et al., 1997); (c) applicability for elite athletes aged 12 y or older (Marsh, Hey, Roche, et al., 1997); and (d) predictive validity as shown by its ability to predict swimming performances in world championships after controlling for previous personal best performances (Marsh & Perry, 2005). In summary, the EASDQ is a reliable and valid instrument for elite athletes aged 12 y and older. More research is needed, however, to relate EASDQ responses to external validity criteria such as those used in PSDQ research and to criteria that are more specific to elite athletes (e.g., actual performance in competition).
Evaluation of Measures of Intrinsic and Extrinsic Motivation in Sport and Exercise
In this section, a critical review of the different measures used to assess intrinsic and extrinsic motivation in sport and exercise research is conducted. Certain criteria have guided the selection of the measures presented in this section. First, we have selected measures that are fully developed instruments that have gone through extensive validation steps. Second, we have chosen scales that have been used in research, published or unpublished, during the past 10 years. Scales that have not been used during that time frame are considered obsolete and are not reviewed. Finally, in light of recent theoretical developments and because of space limitations, we have focused on motivation scales that assess intrinsic and extrinsic motivation independently of determinants and outcomes, while focusing on the perceived reasons for behavior. Our earlier discussion on the definitions of intrinsic and extrinsic motivation makes it possible to classify the different measures. The measures can vary in terms of the level of generality (situational versus contextual level) and the area (sport versus exercise). This classification appears in table 25.1. Table 25.2 (see p. 291) provides additional information on each scale's underlying concept, dimensions, publication source, and availability. As can be seen, seven measures are reviewed. For each one, we present (a) a description of the instrument, (b) the conceptual and theoretical rationale underlying its scale development, (c) the available evidence concerning its psychometric properties (e.g., factorial validity, reliability, and construct validity), and (d) a broad assessment of the strengths and weaknesses associated with each measure.
Measures Used in Sport
In this section, we review the SMS (Brière et al., 1995; Pelletier et al., 1995), the Sport Motivation Scale-6 (SMS-6; Mallett, Kawabata, Newcombe, Otero-Rorero, & Jackson, 2007), the Behavioral Regulation in Sport Questionnaire (BRSQ; Lonsdale, Hodge, & Rose, 2008), the Pictorial Motivation Scale (PMS; Reid, Vallerand, Poulin, Crocker, & Farrell, 2009), and the SIMS (Guay et al., 2000).
Sport Motivation Scale
The SMS was developed (Brière et al., 1995; Pelletier et al., 1995) in order to assess contextual intrinsic and extrinsic motivation from a multidimensional perspective, as well as amotivation. The SMS has been the most often used motivation measure in sport, being employed with a variety of athletes (recreational to elite), age groups (adolescent to senior), and cultures (e.g., Canada, United States, United Kingdom, Bulgaria, Australia, Spain, and New Zealand). In fact, the SMS has been translated and validated in several languages (see Pelletier & Sarrazin, 2007). The SMS is based on SDT (Deci & Ryan, 1985) and is made up of seven subscales assessing amotivation; external, introjected, and identified regulation; and intrinsic motivation to know, to experience stimulation, and to accomplish. In line with SDT, motivation is assessed as the perceived reasons for participation, or the why of behavior. At the beginning of the scale, participants are asked, “In general, why do you practice your sport?” The items represent the perceived reasons for engaging in the activity, thus reflecting the different types of motivation.
The original scale was developed in French as L'Échelle de Motivation dans les Sports (Brière, Vallerand, Blais, & Pelletier, 1995) and was validated in three steps. The first step involved generating a pool of items explaining various reasons for sport participation through interviews with French Canadian athletes (aged 17-20 y). These reasons were then used to formulate items for the seven subscales of the French SMS. In the second step, a committee of experts evaluated the content validity of the items and eliminated those that were thought to be inadequate. Another sample of athletes from various sports completed the scale. Results from an exploratory factor analysis (EFA) provided support for a seven-factor structure with 4 items per subscale; this second step thus resulted in a 28-item scale. In the third and final step, two additional studies were conducted to further validate the scale. These studies included approximately 500 individuals, most of whom were involved in recreational sports. Results from confirmatory factor analyses (CFA) and correlational analyses confirmed the seven-factor structure, the subscale internal consistency (ranging from .65-.96), and moderate to high indexes of temporal stability (ranging from .54-.82) over 1 month. Furthermore, inspection of correlations among the seven SMS subscales provided support for the simplex pattern proposed by SDT. Results of correlations also showed that (in line with SDT) the most self-determined forms of motivation (intrinsic motivation and identified regulation) were related more strongly to determinants such as autonomy support from coaches and feelings of competence than to other forms of motivation (external and introjected regulation) and amotivation. Similar results were obtained with motivational outcomes such as positive affect, concentration, and intentions to pursue engagement in sport. In sum, adequate construct validity was obtained for the French form of the SMS.
The translation of the French SMS into English involved back-translation and committee procedures as suggested by Vallerand (1989). Pelletier and colleagues (1995) conducted two studies involving college athletes from various sports in order to assess the psychometric properties of the English form of the SMS. Results from CFA with a sample of 593 Canadian university athletes revealed adequate fit indices for the hypothesized seven-factor model (Adjusted Goodness-of-Fit Index and Normed Fit Index both > .90; Root Mean Square Residual < .08), and correlations with determinants and outcomes supported the simplex model. Moreover, internal consistency above .70 was obtained on all of the subscales except the identified regulation subscale (.63). Test-retest correlations were acceptable and very similar to those obtained with the French SMS, as was the scale's construct validity.
Since 1995, the SMS has been used extensively in sport psychology research. The seven-factor structure has been supported repeatedly (e.g., Doganis, 2000; Gillet, Vallerand, & Rosnet, 2009; Li & Harmer, 1996; Shaw, Ostrow, & Beckstead, 2005; Standage, Duda, & Ntoumanis, 2003). In addition, Hu and Bentler (1999) obtained support for a five-factor model by combining the three types of intrinsic motivation into one factor. Similar results were obtained by Gillet and colleagues (2009) with the French SMS. However, some studies have not supported the seven-factor model (Hodge, Allen, & Smellie, 2008; Mallett, Kawabata, & Newcombe, 2007; Mallett, Kawabata, Newcombe, & Otero-Rorero, 2007; Martens & Webber, 2002). Why is there such a discrepancy between these two sets of studies? One possibility lies in the populations from which the different samples were taken. Specifically, the SMS was validated using adolescent and young adult athletes and not older athletes. Because of this specific focus, some of the items may reflect a participation rather than an elite orientation, which is more in line with the younger population. For instance, an identified regulation item reads, “Because sport is one of the best ways to maintain good relationships with my friends.” Such an item seems more relevant for a younger population. An older, high-level athlete may disagree with this item but still display a high level of identified regulation for a sport (but not for relationship reasons). Future research using the SMS with different age groups and proficiency levels is needed to clarify this issue.
Although the internal consistency of the SMS has generally shown adequate values, some values below .70 have been found. This is especially the case for the identified regulation subscale (Brière et al., 1995; Kingston, Horrocks, & Hanton, 2006; Li & Harmer, 1996; Pelletier et al., 1995), although some lower values (below .70) have also been obtained with the introjected (McNeill & Wang, 2005; Perreault & Vallerand, 2007; Riemer, Fink, & Fitzgerald, 2002; Standage, Duda, & Ntoumanis, 2003), external regulation (Standage, Duda, & Ntoumanis, 2003), and amotivation subscales (Standage, Duda, & Ntoumanis, 2003). However, very few instances of values below .60 have been obtained. It should be noted that a Cronbach alpha of .60 with only 4 items is acceptable because, as noted by Cronbach (1951), the coefficient alpha underestimates the internal consistency of scales with a low number of items. This is because the number of items enters the alpha formula. For instance, given the same average interitem correlation, a coefficient alpha of .56 on a 3-item scale is equivalent to an alpha of .81 on a 10-item scale!
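The arithmetic behind that last point can be checked with the standardized (Spearman-Brown) form of coefficient alpha. This is an illustrative sketch under the assumption of standardized items sharing a common average interitem correlation; the function name is ours, not from the chapter.

```python
def standardized_alpha(n_items, mean_r):
    """Standardized coefficient alpha: k*r / (1 + (k - 1)*r), where k is
    the number of items and r the average interitem correlation.
    Alpha rises with k even when item quality (r) is held fixed."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)

# With an average interitem correlation of .30, the same item quality
# yields very different alphas as the scale grows:
for k in (3, 4, 8, 10):
    print(k, round(standardized_alpha(k, 0.30), 2))  # 0.56, 0.63, 0.77, 0.81
```

The practical consequence is the one the text draws: a modest alpha on a very short subscale does not necessarily signal weak items, only few of them.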
In line with the original work of Ryan and Connell (1989) and the initial SMS validation procedures (Brière et al., 1995; Pelletier et al., 1995), construct validity has been assessed by other authors in two fashions: (1) with the simplex pattern of correlations among the subscales and (2) with correlations between motivational factors and their determinants and consequences. We do not have space to review all studies. However, overall, there is overwhelming support for the construct validity of the SMS both in French and English. For instance, in addition to finding support for the simplex pattern, Pelletier and Sarrazin (2007) concluded in their review of the evidence that the SMS has been used with success to predict a great variety of specific outcomes and consequences (such as burnout, exercise dependence among endurance athletes, fear of failing, adaptive coping skills, perceptions of constraints, flow, vitality and well-being, sporting behavior orientations, aggression, and performance) in a manner that is consistent with SDT. These findings provide strong support for the construct validity of the SMS.
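A simplex pattern can also be checked programmatically: order the subscales along the self-determination continuum and verify that correlations shrink as the distance between subscales grows. The function and the correlation values below are illustrative assumptions, not data from the studies cited.

```python
def follows_simplex(corr, tol=0.0):
    """True if the (symmetric) correlation matrix shows a simplex
    pattern: for subscales ordered along the continuum, pairs that are
    farther apart never correlate more strongly than closer pairs."""
    k = len(corr)
    for i in range(k):
        for j in range(k):
            for m in range(k):
                # pair (i, m) is farther apart than pair (i, j)
                if abs(i - m) > abs(i - j) and corr[i][m] > corr[i][j] + tol:
                    return False
    return True

# Hypothetical 4-subscale matrix ordered from intrinsic motivation
# through identified and external regulation to amotivation.
r = [[1.00, 0.55, 0.10, -0.30],
     [0.55, 1.00, 0.25, -0.10],
     [0.10, 0.25, 1.00, 0.35],
     [-0.30, -0.10, 0.35, 1.00]]
print(follows_simplex(r))  # → True
```

In practice researchers eyeball the ordered correlation matrix rather than apply a strict rule like this, but the logic being checked is the same.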
In sum, the SMS has some positive features. First, it is a multidimensional instrument that assesses different types of intrinsic and extrinsic motivation as well as amotivation. Second, the scale focuses on the why of behavior, and thus its items are not confounded with determinants and consequences. Finally, it has some excellent psychometric properties. Nevertheless, some limitations should be underscored. First, although internal consistency levels have been acceptable overall, some subscales, especially the identified regulation subscale, have yielded relatively low coefficient alphas at times. Second, the SMS does not assess integrated regulation. Third, the seven-factor structure has not always been supported by CFAs. According to Pelletier, Vallerand, and Sarrazin (2007), this may be explained by a host of factors, including differences in sample sizes, variations in the way the instrument is administered, or other characteristics specific to the context of the study. However, as already indicated, it is also possible that the SMS is better suited for a younger, nonelite athlete population. Clearly, future research on this issue is in order.
Sport Motivation Scale-6
Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed another version of the SMS, the SMS-6. This scale has the same underlying rationale as the original SMS but was designed to improve the original version by including an integrated regulation subscale and by attempting to resolve some of the inconsistencies in the factor structure and some of the relatively low internal consistency values (below .70). The SMS-6 comprises 24 items, 4 for each of the six subscales, which include amotivation; external, introjected, identified, and integrated regulation; and general intrinsic motivation. Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed 5 items for the integrated regulation subscale as well as 7 other items (4 of which were kept in the final scale) to replace some items in the original SMS. Two samples were used to validate the SMS-6. Sample 1 was composed of 501 first-year university students participating in competitive sport at least twice per week and 113 elite athletes representing Australia at the international level (for a total of 614 participants). Sample 1 was used to derive a factor structure that included the SMS items as well as the reformulated and integrated regulation items. Sample 2 was composed of 557 university students who were engaged in a variety of sports or physical activities twice per week. The second sample was used to confirm the structure of the SMS-6. Participants also completed the Dispositional Flow Scale (DFS).
Results of a CFA with the SMS-6 (with sample 2) provided support for the factor structure as well as for the internal consistency values (all above .70). Concerning the construct validity of the SMS-6, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) reported a rather weak simplex pattern of correlations among the subscales. More specifically, external regulation correlated highly with intrinsic motivation (r = .54), while the correlation between identified regulation and intrinsic motivation was very high (r = .91) and was higher than the one between integrated regulation and intrinsic motivation (r = .75). The construct validity of the SMS-6 was not fully supported, as some of the correlations involving the SMS and flow were not as expected by SDT. For instance, the distinctions among integrated regulation, identified regulation, and intrinsic motivation were not always clear. Furthermore, external regulation revealed some positive and sometimes strong correlations with flow, contrary to hypotheses derived from SDT.
In sum, the SMS-6 contains some nice features. First, it contains an integrated regulation subscale. Furthermore, the addition of 4 new items may make the SMS more acceptable for older and more experienced athletes. Second, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) presented results supporting the validity of a variation of the SMS-6, the SMS-8. The SMS-8 contains the same items as the SMS-6 but assesses the three types of intrinsic motivation rather than general intrinsic motivation. The SMS-6 also shows some limitations. First, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) proposed 7 new items to replace those that were presumably problematic in the original SMS. However, only 4 of these items made it into the final version. Thus, it appears that the SMS-6 retained much of the original SMS. Second, even some of the new items appear problematic and may not assess the desired construct (see Pelletier et al., 2007). For instance, a new amotivation item (“I don't seem to be enjoying my sport as much as I previously did”) seems to reflect a decrease in intrinsic motivation rather than amotivation. Finally, results from Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) demonstrated that the integrated regulation subscale may lack discriminant validity, yielding relationships with flow highly similar to those of identified regulation and intrinsic motivation.
Behavioral Regulation in Sport Questionnaire
Lonsdale and colleagues (2008) developed the BRSQ to create an alternative measure of elite sport motivation as conceptualized by SDT. However, in contrast to Mallett, Kawabata, Newcombe, and Otero-Rorero (2007), these authors used a completely new pool of items developed by SDT experts and competitive athletes. There are two versions of the BRSQ. The BRSQ-8 contains 32 items assessing integrated, identified, introjected, and external regulation; amotivation; and the three forms of intrinsic motivation (knowledge, experience stimulation, and accomplishment) identified by Vallerand (1997). The BRSQ-6 contains the same items but assesses general intrinsic motivation rather than all three types of intrinsic motivation, for a total of 24 items.
Lonsdale and colleagues (2008) conducted a series of three studies to validate the scale. In the first study, the factorial validity and the internal consistency were assessed with 382 New Zealand elite athletes. Results from a CFA on the 32 items supported the factor structure of the BRSQ. Specifically, fit indexes were acceptable and all items loaded significantly on the appropriate factors (loadings ranged from .58-.91). Finally, internal consistency of the eight subscales, measured with the Cronbach alpha, showed high values ranging from .71 to .91. Additionally, 1 wk test-retest reliability was tested with 34 competitive adult athletes. Reliability coefficients for all subscales supported temporal stability (values ranged from .73-.90).
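Internal consistency values like these are Cronbach alphas computed from item-level responses. A minimal sketch of the computation with made-up scores follows; neither the data nor the names come from the BRSQ studies.

```python
def cronbach_alpha(items):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances /
    variance of total scores). `items` holds one list of scores per
    item, all from the same respondents, in the same order."""
    k, n = len(items), len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(item) for item in items) / var(totals))

# Hypothetical 4-item subscale answered by 6 respondents on a 1-7 scale.
items = [[5, 4, 6, 3, 5, 2],
         [5, 5, 6, 2, 4, 3],
         [4, 4, 5, 3, 5, 2],
         [5, 3, 6, 3, 4, 2]]
print(round(cronbach_alpha(items), 2))  # → 0.95
```

The ratio of item-level to total-score variance is what drives alpha: when items rise and fall together across respondents, the total-score variance dwarfs the sum of the item variances and alpha approaches 1.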
In a second study with 343 athletes from New Zealand, the results of a CFA on the BRSQ-8 supported once more the factor structure as well as the subscale internal consistency. Lonsdale and colleagues (2008) also showed that the factor structure of the BRSQ-6 model fit the data very well and that subscale coefficient alphas all exceeded .78. Moreover, the construct validity of the BRSQ-6 was assessed by testing for a simplex pattern of correlations among the six subscales. While some relationships were in line with predictions (e.g., amotivation was negatively related to intrinsic motivation), there was a lack of discrimination between some subscales. More specifically, there was no difference between external and introjected regulation scores in terms of their relationships with amotivation. A similar pattern was evident with the identified and integrated regulation subscales, which both had similar high correlations with intrinsic motivation. These results with the simplex pattern were replicated in a third study conducted with nonelite athletes. In this third study, Lonsdale and colleagues also assessed the relationships between the BRSQ-6 and indexes of burnout (Lemyre, Treasure, & Roberts, 2006; Raedeke & Smith, 2001) and flow (Jackson & Eklund, 2002). Overall, results supported hypotheses in line with SDT. Specifically, amotivation and external and introjected regulation showed negative correlations with flow and positive correlations with burnout. The opposite pattern of correlations was found for the self-determined subscales (intrinsic motivation and identified and integrated regulation). However, there was a lack of discrimination between integrated regulation and general intrinsic motivation. Results of another study on burnout (Lonsdale, Hodge, & Rose, 2009) replicated these findings. Thus, overall, the support for the construct validity of the BRSQ-6 appears to be mixed.
It should be underscored that the BRSQ has some nice features. First, the scale is designed in such a way that the researcher can decide to use a multidimensional (BRSQ-8) or unitary (BRSQ-6) conceptualization of intrinsic motivation. Second, the scale is rather short, with 4 items per subscale. Finally, it assesses integrated regulation. At the same time, the BRSQ also displays some limitations. First, additional research is needed on the construct validity of the scale. Whereas there is support for distinguishing the self-determined subscales (intrinsic motivation and identified and integrated regulation) from the non-self-determined subscales (external and introjected regulation), the finer discrimination within each category appears to be lacking. Such evidence is crucial, and future research is needed in order to show that this scale does indeed assess the SDT constructs rather than two broad sets of subscales tapping self-determined versus non-self-determined motivation. Second, this scale is designed specifically for older participants in competitive sport; it remains to be seen if the BRSQ can be used with younger participants, for whom the integrated regulation subscale may not have full meaning. Finally, research is needed to test the temporal stability of the scale over a time frame longer than 1 wk.
Pictorial Motivation Scale
The PMS was designed to measure intrinsic and extrinsic motivation for sport and exercise in people with an intellectual disability. It assesses participants' reasons for engaging in sport and exercise. The scale's main characteristics are drawings depicting each of the 20 items. There are 5 items (pictures) for each of four subscales: intrinsic motivation, self-determined extrinsic motivation (a mixture of integrated and identified regulation), non-self-determined extrinsic motivation (a mixture of introjected and external regulation), and amotivation. These pictures are used to help participants with cognitive difficulties and to help represent the motivational concept depicted in each item.
The original scale was developed in French (Reid, Poulin, & Vallerand, 1994). Results of a study with 62 participants supported the internal consistency, temporal stability, and construct validity, as exemplified by the presence of a simplex pattern among the four subscales. However, the amotivation subscale had poor reliability (α = .52). The French version (Reid et al., 2009) was translated into English according to the back-translation and committee procedures outlined in Vallerand (1989). Then, 6 new items were generated for the less reliable amotivation subscale. Participants in the Special Olympics (n = 160) completed the English version. Results of the CFA confirmed the four-factor structure of the PMS. Furthermore, the internal consistency (Cronbach alphas) ranged from .60 to .71. Finally, the construct validity was assessed by testing for a simplex pattern of correlations among the four subscales. The intercorrelations among latent variables from the CFA provided support for the simplex pattern.
Results from a study conducted with the English version of the PMS involving 80 high school students with mild intellectual disability provided support for the internal consistency, temporal stability (over 3 wk), and construct validity of the PMS, as shown by the simplex pattern of correlations among the PMS subscales and by correlations between the PMS subscales and motivational antecedents (skill and perceived competence) and outcomes (perceived effort) as rated by the physical education teacher. Finally, the internal consistency of each subscale was tested without the pictorial dimension with a subset of 47 high school students with mild intellectual disability. Results indicated poor internal consistency for all but the intrinsic motivation subscale (.91 for intrinsic motivation but only .27 for self-determined extrinsic motivation, .20 for non-self-determined extrinsic motivation, and .60 for amotivation). This finding suggests that the scale is not reliable without the drawings.
The preliminary findings with the English version of the PMS are encouraging. Furthermore, this scale is the only one geared toward individuals with intellectual disability. The use of drawings to depict the various items makes this scale unique in the field. Nevertheless, the PMS shows some limitations. First, the scale does not differentiate among all forms of intrinsic (knowledge, stimulation, and accomplishment) or extrinsic (integrated, identified, introjected, and external regulation) motivation. Second, construct validity was tested with only a limited number of variables. Third, it is not known if the scale is usable with children who have severe forms of intellectual disability. Clearly, additional research is needed on the reliability and validity of the PMS.
Situational Motivation Scale
The SIMS is one of the few scales to assess intrinsic and extrinsic motivation and amotivation at the situational level (Guay et al., 2000). The SIMS is a multidimensional tool that measures four types of motivation: intrinsic motivation, identified regulation, external regulation, and amotivation. The SIMS is made up of 16 items (4 items per subscale) and asks this question: “Why are you currently engaged in this activity?” The items represent potential reasons for task engagement. The scale is worded in such a way that it can be used in most situations (sport and nonsport).
Five studies were reported in the original article. In study 1, the original scale was developed by a committee of experts and completed by 195 French Canadian college students. Results of an EFA revealed a four-factor structure with the final 16 items loading on their respective factor. In study 2, a CFA confirmed the factor structure as well as its invariance across gender. Across the five studies, the internal consistency values of the subscales were acceptable, ranging from .62 to .95 (see Guay et al., 2000). Moreover, across all studies, support was obtained for the construct validity of the SIMS through results from correlations in line with the simplex pattern among the subscales as well as between the SIMS subscales and motivational determinants and consequences. Perhaps of greater interest for the present discussion were the results of study 4, which showed that some subscales (intrinsic motivation and identified regulation) were sensitive enough to detect changes in motivation that took place during two games of a basketball tournament.
Other researchers have also obtained support for the psychometric properties of the SIMS. First, all studies reported acceptable internal consistency values for each subscale (Blanchard, Mask, Vallerand, de la Sablonnière, & Provencher, 2007; Conroy, Coatsworth, & Kaye, 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, Duda, & Prusak, 2003). The coefficient alpha values of all but the amotivation subscale (α = .58) in the Conroy and colleagues study were above .60. Second, support for the factorial validity of the SIMS was obtained through CFAs with one qualification. Whereas the CFA results with the 16 items yielded acceptable fit indexes, removal of 1 item (Jaakkola, Liukkonen, Laakso, & Ommundsen, 2008) and even 2 items (Gillet, Berjot, & Paty, 2009; Standage, Treasure, et al., 2003) yielded better fit indexes. Moreover, Standage, Treasure, and colleagues (2003) conducted multisample CFAs and showed that the pattern of factor loadings was largely invariant across four different samples.
Construct validity of the SIMS was also assessed in several studies (Blanchard et al., 2007; Conroy et al., 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, et al., 2003). In addition to supporting the simplex pattern among the SIMS subscales and between the SIMS subscales and need satisfaction (study 2 of Blanchard and colleagues, 2007), results also supported the postulate from the HMIEM (Vallerand, 1997) for the top-down effect, in which contextual sport motivation was found to predict situational sport motivation (studies 1 and 2 of Blanchard et al., 2007; Jaakkola et al., 2008; Ntoumanis & Blaymires, 2003). Specifically, the more self-determined the motivation was found to be in a specific context (in this case, sport), the more self-determined the motivation was found to be in a given situation. Furthermore, Blanchard and colleagues (2007, studies 1 and 2) found support for another postulate from the HMIEM that suggests that over time, situational motivation in the realm of sport (basketball) has recursive effects on contextual motivation. The more that situational motivation is self-determined, the more that contextual motivation becomes self-determined over time. Finally, Jaakkola and coworkers (2008) demonstrated that, as predicted by the HMIEM, situational self-determined motivation was better than contextual motivation in predicting the situational intensity (as assessed by HR) displayed by students in a physical education class. Overall, these findings provide strong support for the reliability and factorial and construct validity of the SIMS.
The SIMS has several positive features, one of them being that it is the only scale to assess intrinsic and extrinsic motivation and amotivation at the situational level. Furthermore, it does so using only 16 items. Nevertheless, it also has some weaknesses. First, the SIMS does not assess the different types of intrinsic motivation and integrated and introjected regulation, because it was designed to be short. Second, while the factor structure has been supported, it is not clear if some items should be replaced (Gillet, Berjot, et al., 2009; Jaakkola et al., 2008; Standage, Treasure, et al., 2003). Third, research so far has not assessed the validity of the scale with high-performance athletes. Thus, additional research is needed to further test the psychometric properties of the SIMS in sport.
Ethics Codes: Their Nature, Purposes, and Application
Ethics codes typically comprise principles and standards. Ethical principles are broad-spectrum statements that summarize and reflect the values of the parent organization or governing body. These general and aspirational statements set the underlying tone for the more specific codes and guide the work-related ethical decision making of professionals. In contrast, ethical standards specify both proscribed and prescribed member behaviors. While not always black and white, these standards serve as a more clear-cut and enforceable guide for professional behavior.
Members should apply both the aspirational principles and enforceable standards to shape their thinking and behavior in work settings. Ideally, members self-monitor their own behavior. In an effort to remain ethical, professionals are encouraged to consult with colleagues about ethically challenging situations and to provide constructive feedback about possibly unethical behavior they perceive in others.
Assessment and Measurement
A central question to be addressed in this chapter is what assessment and measurement are. Sundberg (1977) defines assessment as the processes used “for developing impressions and images, making decisions and checking hypotheses about another person's pattern of characteristics that determines his or her behavior in interaction with the environment” (p. 21). The assessment process involves collecting and assembling a broad range of objective and subjective information about persons or groups to develop impressions about them; identify their needs; predict how they might think, feel, and behave in future situations; and select and apply interventions based on the content and dependability of that information. Professionals may use multiple assessment methods that include observations of behavior, symptom checklists, surveys and questionnaires, structured and unstructured interview materials, and standardized tests (Bennett et al., 2006). Gardner and Moore (2006) emphasize using a triad of psychological assessment strategies in the practice of clinical sport psychology: (1) initial interviews, (2) behavioral observation, and (3) psychological testing. The nature and assumptions underlying assessment approaches are usually grounded in the theoretical orientation of the professional (Andersen, 2002).
In contrast, measurement can mean many things to many people. It is one of the most common words in the English language and can be used as both a noun and a verb (Lorge, 1967). For the purposes of this chapter, measurement is viewed as an extension of assessment processes. It can be thought of more narrowly as the process of collecting information about psychological characteristics of interest (e.g., attitudes, behaviors, state experiences) using one or more methods or tools (such as those mentioned earlier) to monitor change or the effects of interventions or treatments after the initial assessment. For example, an educational sport psychology consultant might administer a measure of team cohesion over the course of a competitive season to see how team members perceive their relationships. Another consultant might conduct a preseason baseline screening assessment of cognitive functioning in hockey players and then reevaluate players who incur a mild traumatic brain injury (i.e., concussion) later in the season.
In this chapter, the terms measurement and assessment are used interchangeably. Furthermore, these terms are used to describe the decisions and opinions made by professionals regarding clients with whom they work. As such, measurement and assessment techniques include all methods of gathering information about clients, such as (a) psychological, educational, and neurological tests; (b) data gathered during clinical interviewing; (c) information gathered from significant others (e.g., family members, teachers, friends); (d) direct and indirect observation; and (e) interactions with people via teletherapy (e.g., Internet, phone; Fisher, 2009).
Competence and Education
In order to excel in our professional duties and do well for those we serve, teach, study, and otherwise interact with, we must know what to do and how to do it in a capable manner. The ethics codes mentioned earlier identify the necessity of being knowledgeable and capable in our work. For example, the APA ethical standards provide guidance for organization members in this area, including information about (a) competence limitations, (b) maintaining competence, (c) making sound professional and scientific judgments, (d) delegating work responsibilities to others, (e) engaging in activities in emergencies, and (f) impairment (APA, 2002). Competence in professional behaviors is a personal matter that is frequently challenged. It is the responsibility of professionals to know their limitations and how their knowledge and skills change and require constant upgrading. The APA ethics code also emphasizes the importance of making sound work-related decisions based on scientific knowledge and appropriate discipline-specific practice. This portion of the APA code cautions professionals to be careful when delegating work to others, describes how a professional is responsible for others' work, and explains the necessity of avoiding multiple relationships with those to whom work is delegated. The APA standards note that we can occasionally be thrust into situations in which our competence is stretched; in such cases we need to proceed carefully, seek supervision if available, and conclude such work as soon as possible.
Measurement Referral Questions and Appropriateness of Instruments
When selecting assessment instruments, the professional must consider the referral questions that prompted this process (Fisher, 2009; Smith, 1976). The instruments selected should reflect these referral questions and utilize assessment strategies that have appropriate validity and reliability. For example, if a professional is interested in measuring state anxiety for research purposes, an appropriate assessment may be the Competitive State Anxiety Inventory-2 (CSAI-2; Martens, Burton, Vealey, Bump, & Smith, 1990) as opposed to the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970), which measures both trait and state anxiety. When selecting the assessment, the professional should be aware of limitations or biases regarding cultural sensitivity (see the later section on cultural issues); gender considerations (Etzel, Yura, & Perna, 1998); and age, language, or disability factors that may influence the psychometric qualities of the assessment differently from the way they influenced the normative groups used for the development and validation of the instrument (APA, 2002; Fisher, 2009). It is also important to consider the method of delivery. For example, assessments based on paper and pencil may not have been validated for online use (see the later section on technology), and instruments with elevated reading levels may not be appropriate for certain age or developmental groups. Therefore, the professional should always verify the assessment's validity and reliability when a modified assessment method or group is used (Fisher, 2009). Furthermore, the professional should also attempt to conduct in-person assessments when possible, as a great deal of information can be learned about clients from the way in which they present themselves during the assessment process. This information can affect the richness of the assessment data.
It is also important for professionals to be aware of and competent to assess and use appropriate psychometric strategies for establishing validity and reliability of the instruments they use (AERA, APA, & NCME, 1999). All instruments have unique psychometric properties that affect how they should be administered and interpreted. When validity and reliability issues are not taken into consideration, it is possible to choose and utilize instruments to assess factors that they were not designed to assess. Furthermore, practitioners should be well aware of other psychometric properties such as content and criterion validity and standard error of measurement that may affect how results are interpreted and used. The ethical practitioner needs to be aware of psychometric issues in order to choose appropriate instruments with regard to the referral questions, client characteristics, assessment strategies, and environmental factors.
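One of the psychometric properties mentioned above, the standard error of measurement, follows directly from a scale's standard deviation and reliability: SEM = SD × √(1 − r). A minimal sketch of how it qualifies an observed score (all numeric values are hypothetical, not from any instrument discussed here):

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed: float, sd: float, reliability: float,
               z: float = 1.96) -> tuple:
    """Approximate 95% band around an observed score, built from the SEM."""
    sem = standard_error_of_measurement(sd, reliability)
    return (observed - z * sem, observed + z * sem)

# Hypothetical scale: SD = 10, coefficient alpha = .84
sem = standard_error_of_measurement(10.0, 0.84)   # 10 * sqrt(.16) = 4.0
low, high = score_band(50.0, 10.0, 0.84)          # roughly 42.2 to 57.8
```

The band illustrates why an observed score should be interpreted as a range rather than a point, especially when reliability is modest.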
Consent and Assent
As discussed earlier, the ethical principles for sport and exercise psychology emphasize doing no harm to the client and respecting the individual's rights and dignity (AASP, 1996; APA, 2002). The test taker's right to privacy and confidentiality applies here as well, and the professional should take all necessary precautions to maintain the confidentiality and privacy of the client. To protect the test taker, informed consent must be obtained at the start of the relationship (e.g., research, consultation, therapy). Beyond the informed consent process and before formal assessment, the client or participant should be informed of all pertinent information regarding the assessment process. This information includes (a) the nature and purpose of assessment; (b) any applicable fees; (c) potential involvement of third parties such as a coach, athletic trainer, or manager; (d) limits of privacy and confidentiality (as discussed in the next section); and (e) the timeline for the process and potential feedback (Fisher, 2009). This information should be presented in a clear and understandable manner. Furthermore, this information should be agreed to by the test taker, who thereby gives informed consent. Test takers should engage in assessment of their own free will and must be given the option to withdraw participation without consequences (APA standard 3.10). All necessary information about assessment procedures and findings should be provided in a language or level appropriate for the participant. Furthermore, it is unethical to necessitate or coerce individuals to take part in measurement and assessment for research or practice purposes.
Privacy and Confidentiality and Release of Information
Typically, the ethical standards of organizations with ties to sport psychology (APA ethical standard 4.01 and the AASP) suggest that professionals should not reveal information about clients, test takers, or others without their signed approval to release information or legal requirement. These legal situations may include (a) a test taker who indicates possible self-harm or harm to others (i.e., suicide or homicide), (b) a test taker whose results are subpoenaed by the court, or (c) a test taker who is a minor, in which case the parent or guardian may have access to the data (Etzel et al., 1998). If the test taker or, in the case of a minor, the parent or guardian provides explicit written permission, the specific information identified by the client may be released to the identified parties. Unless these circumstances are met, information from the test taker may not be disclosed to anyone (e.g., coaches, management, parents, administration, athletic trainers, and so on).
In situations where the assessment is requested by a third party (e.g., coaches, management, the court), this third party may also request results from the assessment. It is important for the professional to establish a priori who is the “real client” (Ogilvie, 1979) and to have the ability to control access to the results. Etzel and colleagues (1998) suggest that information about the assessment should be shared only with one predetermined person, unless a release of information form has been completed. Therefore, when engaging in assessments, the professional should set clear boundaries and avoid dual relationships, thereby identifying who is being served (APA standard 4.02a). Another complication of these situations is the role of trust. If athletes or test takers suspect the test results will be used without their permission in decisions regarding performance or other aspects of participation, they may be less likely to respond honestly, thus affecting the validity of the results (see the section on demand characteristics).
Raw Data and Data Storage
Raw data such as the test taker's responses to items, including the professional's notes and final reports, should be stored in locked file cabinets inside the professional's office or in password-protected computer files (Fisher, 2009). Other methods to ensure confidentiality may include limiting access to records to only those people who have a need to know this information and have been trained to handle and understand it, deidentifying records using code numbers, and appropriately disposing of identifiable records (Fisher, 2009). A good policy for data maintenance is that data should be kept for a minimum of 7 y after the last service delivery date or 3 y after a minor reaches the age of 18 (whichever is later), as is recommended by the APA record-keeping guidelines (APA, 2002; Fisher, 2009). Raw data and the instruments used for assessment purposes should not be released to third parties unless a release of information form has been completed and the third party is trained competently to use such information.
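The retention rule above can be expressed as a simple date computation. This is a hedged sketch only (the function names are illustrative, and actual jurisdictional or organizational requirements may differ from the APA guideline):

```python
from datetime import date
from typing import Optional

def add_years(d: date, years: int) -> date:
    """Shift a date forward by whole years (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def retention_end(last_service: date, birth_date: Optional[date] = None) -> date:
    """Earliest disposal date under the rule above: 7 years after the last
    service delivery date, or 3 years after a minor client turns 18,
    whichever is later."""
    end = add_years(last_service, 7)
    if birth_date is not None:
        # 18 years to majority plus 3 further years = 21 years after birth
        end = max(end, add_years(birth_date, 18 + 3))
    return end
```

For an adult last seen on 2020-06-01, records would be kept until 2027-06-01; for a client born in 2010, the 21st-birthday rule extends retention further.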
Results Discussion
Test feedback and results discussion should be provided in the form of a carefully constructed report using clear language that fully explains the assessment results. Labels and jargon should be eliminated to increase readability. Information necessary to the purpose of the test should be included, and the inclusion of unnecessary and unrelated information should be avoided (APA, 2002; Fisher, 2009). Additionally, as recommended by the APA (APA, 2002), interpretations should take into consideration the participant's gender, race, ethnicity, age, national origin, sexual orientation, religion, disability, language, or socioeconomic status. Participants should receive assessment information and feedback related to their performance on the assessment and should be informed of ways in which they could personally use the test results or how this information may be used by a third party (only if written permission was given to release such information). The information released to the participant should be presented in a verbal or written report and presented in such a way that it may not cause harm to the test taker (Etzel et al., 1998). However, information such as numerical scores or specific responses should not be released to individuals not qualified to interpret such information (Fisher, 2009; Tranel, 1995).
Demand Characteristics
In the sport context, several groups of individuals may be interested in the assessment results of athletes. Interested parties may include coaches, managers, teams, students, or administrators. However, the possibility of a third party reviewing the test results may increase socially desirable responding and yield invalid and unreliable information. Therefore, undue pressure to complete an instrument or battery should be considered as a contextual factor.
Another potentially undesirable effect of a third party viewing the test taker's results may be assessment anxiety. The APA standards state that if a test taker is observed to be anxious or reports feeling anxious, this feeling should be taken into account and become a limitation in the interpretation of test data (APA, 2002). Assessment anxiety may be exaggerated in situations where a third party may have access to results. These situations may also lead to faking good or faking bad on the part of respondents who are concerned about how the results may be used. This must also be considered when evaluating the results.
Supervision of Subordinates
In some cases, professionals may hire and train subordinates to help with assessment and measurement tasks. These subordinates may administer, score, and even interpret the results of measurement and assessment. Standard 2.05 of the APA ethics code (APA, 2002) states that professionals utilizing employees, supervisees, or research and teaching assistants for such purposes should take reasonable precautions to put subordinates in situations where (a) they do not face possibly harmful multiple relationships with the client that could affect their objectivity, (b) they are competently trained to perform the delegated task on their own or with supervision, or (c) they are supervised for competent service delivery. Therefore, when using subordinates to help with tasks such as administration, scoring, or interpretation, the professional assumes primary responsibility and liability to ensure that the services are being provided competently. The professional needs to ensure that subordinates are well trained with all potential instruments. To do so, the professional must provide appropriate training, experience, and supervision as well as continue to check the subordinates' work to ensure its quality. As with licensed professionals, not all subordinates have the same competencies with regard to all instruments.
Tools to Measure the Physical Self
Reflecting the general historical trends in self-concept research, self-concept instruments used in early sport and exercise research focused on global self-esteem (Marsh, 1997, 2002). Indeed, in a 1974 review, Wylie concluded that most self-concept instruments of the time focused on global self-concept or self-esteem rather than on specific domains such as PSC. However, following the research of Shavelson and colleagues (1976), a number of multidimensional self-concept instruments containing one or more PSC scales were developed. Although several of the instruments reviewed by Shavelson and colleagues (1976) contained items relating to physical skills and elements of physical appearance, none provided a clearly interpretable measure of PSC. From a practical perspective, these older instruments appear to be of little value for sport and exercise psychologists. The major exception, perhaps, is the Physical Estimation and Attraction Scale (PEAS; Sonstroem, 1978, 1997), along with the theoretical model on which it is based. This instrument was designed to measure two global components: estimation (competency) and attraction. While the PEAS may not be the instrument of choice today, it has historical significance in that the research surrounding it incorporated many of the features of the construct validity approach advocated in this chapter, it was heuristic, and it provided an important basis for subsequent research.
In a subsequent 1989 review, Wylie identified several multidimensional self-concept instruments measuring one or more components of PSC that can be differentiated from other specific domains of self-concept and general self-concept. Included in the list were the three SDQ instruments already discussed. Wylie also evaluated Harter's (1985) Self-Perception Profile for Children, which contains two PSC scales (athletic competence and physical appearance). Other multidimensional instruments containing physical scales that were not reviewed by Wylie include the Self-Rating Scale (Fleming & Courtney, 1984), which measures physical ability and physical appearance; the Song and Hattie Test (Hattie, 1992), which measures physical appearance; and the Multidimensional Self-Concept Scale (Bracken, 1996), which has a physical scale that includes physical competence, physical appearance, physical fitness, and health. The Tennessee Self-Concept Scale (Fitts, 1965) is a multidimensional self-concept instrument that also purports to measure PSC. In their review and empirical evaluation of this instrument, Marsh and Richards (1988) found distinguishable physical components reflecting health, neat appearance, physical attractiveness, and physical fitness that were incorporated into a single PSC score. This detailed breakdown of the Tennessee physical scale was supported by relationships with the SDQ physical ability and physical appearance scales in an MTMM study comparing responses to the two instruments. Because each of the clusters based on responses to the Tennessee instrument is represented by only a few items, it is not appropriate to use the instrument to measure these distinct components of PSC. Marsh and Richards argued that PSC measures that combine and confound a wide range of differentiable physical components—such as those based on the Tennessee Self-Concept Scale—should be interpreted cautiously (see similar comments by Fox & Corbin, 1989).
In summary, although multidimensional self-concept instruments based on Shavelson and colleagues' (1976) model provided good support for the construct validity of the physical ability and appearance scales (e.g., Marsh, 2002; Marsh & Peart, 1988), they left unanswered the question of whether PSC is more differentiated than can be explained in terms of one (physical ability) or two (ability, appearance) physical scales. Subsequent PSC instruments were developed specifically to address the issue of the multidimensionality of PSC.
Physical Self-Perception Profile
The Physical Self-Perception Profile (PSPP; Fox, 1990; Fox & Corbin, 1989) is a 30-item inventory that consists of four specific scales and one general physical self-worth factor. The PSPP was developed to document the physical self-perceptions of college students. It was designed to reflect the advances made by Harter (1985) and Shavelson and colleagues (1976) in identifying the physical self as an important construct to measure in its own right and to reflect the hierarchical, multidimensional nature of the physical self. A qualitative approach was used to reveal dimensions of physical self-esteem salient to the population sampled (Fox & Corbin, 1989). The PSPP consists of five 6-item scales of sport (perceived sport competence), body (perceived bodily attractiveness), strength (perceived physical strength and muscular development), condition (perceived level of physical conditioning and exercise), and physical self-worth. Fox (1990) recommended that the 10-item Rosenberg Self-Esteem Scale (Rosenberg, 1965) be used alongside the PSPP to provide a global measure. Fox (1990) reported factor analyses indicating that each item loads most highly on the factor that it is designed to measure and that individual scale reliabilities are in the .80s.
The PSPP research demonstrates (a) good reliability (coefficient alpha of .80-.95; Fox, 1990; Page, Ashford, Fox, & Biddle, 1993; Sonstroem, Speliotis, & Fava, 1992); (b) good test-retest stability over the short term (rs of .74-.89; Fox, 1990); (c) a well-defined, replicable factor structure as shown by CFA (Fox & Corbin, 1989; Sonstroem, Harlow, & Josephs, 1994); (d) convergent and discriminant validity in studies showing PSPP relationships with external criteria such as exercise behaviors, mental adjustment variables, and health complaints (Fox & Corbin, 1989; Sonstroem & Potts, 1996); and (e) applicability for an older adult population (Sonstroem et al., 1994). However, correlations among the PSPP scales are consistently so high (.65-.89 when disattenuated for measurement error; Marsh, Richards, Johnson, Roche, & Tremayne, 1994) that they detract from the instrument's ability to differentiate among the different PSC factors it purports to measure.
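The phrase "disattenuated for measurement error" above refers to the classical correction for attenuation, which divides the observed correlation by the square root of the product of the two scales' reliabilities. A minimal sketch with hypothetical values (not the actual PSPP figures):

```python
import math

def disattenuated_r(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation:
    r' = r_xy / sqrt(rel_x * rel_y),
    estimating the correlation between the latent (error-free) variables."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical example: observed r = .60 between two subscales
# with coefficient alpha reliabilities of .85 and .80
r_latent = disattenuated_r(0.60, 0.85, 0.80)   # roughly .73
```

Because the correction always inflates the observed correlation, disattenuated values such as the .65 to .89 range cited above give a stricter test of whether two factors are genuinely distinct.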
Subsequently, a version of the PSPP for children and adolescents was developed and validated—the Children and Youth Physical Self-Perception Profile (CY-PSPP; Eklund, Whitehead, & Welk, 1997; Whitehead, 1995). Like the PSPP, the CY-PSPP is a 30-item inventory consisting of the same five 6-item scales. The CY-PSPP is a substantially revised version of the PSPP that is most appropriately thought of as a different instrument. The CY-PSPP body, strength, and conditioning subscales are based on minor adaptations of the PSPP to make them more suitable for children. However, the global self-worth (self-esteem) and sport scales are completely different. The PSPP did not have a self-esteem scale of its own but included 6 items adapted from the Rosenberg Self-Esteem Scale. On the CY-PSPP, the global self-esteem and sport scales from the PSPP were dropped and replaced with corresponding scales from Harter's (1985) Self-Perception Profile for Children. Correlations among factors remained high (e.g., physical self-worth with attractive body adequacy = .8). Eklund and colleagues (1997) suggested that these results are consistent with developmental patterns among children, as differentiation in self-concept is less defined at younger ages (Harter, 1985). CFAs have supported the instrument's factor structure, with both the comparative fit index (CFI) and the non-normed fit index (NNFI) exceeding the .90 criterion for good model fit (Eklund et al., 1997). Moderate correlations (r = .39-.45) with external criteria such as physical activity and physical fitness have demonstrated its convergent and discriminant validity (Welk & Eklund, 2005).
The CY-PSPP has been validated with adolescents (Jones, Polman, & Peters, 2009; Welk, Corbin, & Lewis, 1995; Whitehead, 1995) and younger children (Welk, Corbin, Dowell, & Harris, 1997) and has been translated into and validated in other languages (Aşçı, Eklund, Whitehead, Kirazci, & Koca, 2005; Raustorp, Ståhle, Gudasic, Kinnunen, & Mattsson, 2005; Raustorp, Mattsson, Svensson, & Ståhle, 2006).
Both the PSPP and CY-PSPP use a nonstandard response format based on Harter (1985), in which each item consists of a matched pair of contrasting statements, one negative and one positive (e.g., “Some people feel that they are not very good when it comes to sports” but “Others feel that they are really good at just about every sport”). Respondents are asked which description is most like them and whether the description they select is “Sort of true of me” or “Really true of me.” Responses are scored on a scale of 1 to 4, with 1 representing a “Really true of me” response to the negative statement and 4 representing a “Really true of me” response to the positive statement. Although this response format is designed to reduce the influence of social desirability, Wylie's (1989) review of Harter's original instruments provided little or no support for this suggestion, and Marsh and colleagues (1994) suggested that there were substantial method effects associated with the nonstandard response scale. This format has also been shown to be confusing, particularly for children (Eiser, Eiser, & Haversmans, 1995), and even for adults (Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994), unless special care is taken to explain the response scale. Following the suggestion of Marsh and colleagues (1994) that confusion over the structured alternative response scale could be overcome by more detailed instructions at the outset, researchers implementing the CY-PSPP used large illustrations for a sample item (Whitehead, 1995). Wichstrom (1995) found that responses for this format were psychometrically stronger when based on typical Likert responses rather than the structured alternative format, but Welk and colleagues (1997) suggested that the nonstandard response scale on the CY-PSPP worked better than Likert responses did.
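The 1-to-4 scoring rule for the structured alternative format described above can be expressed compactly (the function and argument names are illustrative, not part of the published scoring materials):

```python
def score_structured_alternative(chose_positive: bool, really_true: bool) -> int:
    """Score one structured-alternative item on the 1-4 scale:
    1 = 'Really true of me' for the negative statement,
    2 = 'Sort of true of me' for the negative statement,
    3 = 'Sort of true of me' for the positive statement,
    4 = 'Really true of me' for the positive statement."""
    if chose_positive:
        return 4 if really_true else 3
    return 1 if really_true else 2
```

Making the two-step choice explicit (which statement, then how strongly) highlights why the format can confuse respondents who expect a single Likert rating.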
In summary, the PSPP and the CY-PSPP are established instruments that have been translated into several languages and used with a range of populations. However, the response format and the high correlations among factors in both instruments may limit their usefulness in some settings. The CY-PSPP is a substantially revised version of the PSPP specifically developed for children. Although the CY-PSPP should be used instead of the PSPP for child and adolescent samples, it may even be stronger than the original PSPP for adult samples.
Subsequent to the completion of this chapter, Lindwall and colleagues (2011) published a revised version of the PSPP (PSPP-R). They reviewed critiques of the PSPP response scale such as those noted here (e.g., Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994) and acknowledged that “the idiosyncratic alternative response format has been difficult to understand for some participants” (pp. 310-311). In recognition of these problems, the idiosyncratic response scale that has been such a salient feature of the PSPP was dropped altogether and replaced with a 4-point Likert response using only positively worded items. Lindwall and colleagues (2011) demonstrated the appropriateness of the revised PSPP-R based on a large sample (N = 1,831) of participants from four countries (Sweden, Great Britain, Portugal, and Turkey). However, they did not indicate whether the PSPP-R supersedes the PSPP or is merely an alternative to it, nor did they discuss the implications for other instruments using similar idiosyncratic response scales (e.g., PSPP-related instruments such as the CY-PSPP or Harter's instruments more generally).
Physical Self-Inventory
The Physical Self-Inventory (PSI) is a French adaptation of the PSPP that was originally developed for use with Francophone adults (Ninot, Delignières, & Fortes, 2000). In two preliminary studies, Ninot and colleagues used the nonstandard response scale from the PSPP. However, consistent with previous research (Marsh et al., 1994), they reported that this response scale was problematic. In a third study, the authors used a 6-point Likert response scale; factor analysis results were reasonable, but reliability coefficients were not completely satisfactory. Next the authors replaced the PSPP global physical items with items from the SDQ physical scale and the PSPP global self-esteem items with items from Coopersmith (1967). The final PSI consists of 25 items measuring six PSC factors (four specific and two global, as with the PSPP) and has satisfactory psychometric properties that have been confirmed in subsequent French studies of adults (Masse, Jung, & Pfister, 2001; Stephan, Bilard, Ninot, & Delignières, 2003; Stephan & Maïano, 2007).
Maïano and coworkers (2008) subsequently constructed a short form of the PSI for use with adolescents. They found that not all items from the adult PSI worked with adolescents, but they were able to construct 18-item (PSI-SF, 3 items per scale) and 12-item (PSI-VSF, 2 items per scale) versions that had good psychometric properties. In particular, the measurement and hierarchical structures were consistent with proposals by Fox and Corbin (1989) and were fully invariant across gender. Maïano and coworkers also noted that PSI-SF responses showed very high test-retest stability. Comparison of the PSI-SF and PSI-VSF demonstrated that the measurement model, mean structure, structural parameters, and criterion-related validity were equivalent across samples and versions. Nevertheless, the authors noted a serious limitation that all versions of the PSI share with the PSPP: very high correlations among the six PSC factors (i.e., among latent factors), which, according to the authors, bring “into question the real independence of some of the models' sub-dimensions, and by extension their discriminant validity, a finding that has already been observed by Marsh (2002; Marsh et al., 2006) on analyses of the PSPP” (Maïano et al. 2008, p. 844). However, Maïano and colleagues also noted that because they used a traditional Likert response scale, the high correlations apparently were not due to the structured alternative format used in the PSPP. In summary, the short and very short forms of the PSI in particular have made a potentially important contribution to applied research. However, further research is needed to evaluate more fully the robustness of support for construct validity and application in non-French-speaking settings.
Richards Physical Self-Concept Scale
The Richards Physical Self-Concept Scale (RPSCS; Marsh et al., 1994; Richards, 1988) is a 35-item instrument designed to measure six specific components of PSC (body build, appearance, health, physical competence, strength, action) and one general physical satisfaction factor. Each item is a simple declarative statement, and subjects respond on an 8-point true-false scale. Extensive research in Australia (e.g., Marsh et al., 1994; Richards, 1988) has indicated that RPSCS responses have good psychometric properties. The factor structure is very robust, generalizing well over ages from 8 to 80 y and over gender.
RPSCS research has demonstrated (a) good reliability (coefficient alpha of .79-.93; Marsh et al., 1994; Richards & Marsh, 2005); (b) good test-retest stability over the short term (r = .77-.90 over 3 wk; Richards, 1988); (c) a well-defined, replicable factor structure as shown by CFA (Marsh et al., 1994; Richards, 2004); (d) a factor structure that is invariant across gender, as shown by multiple-group CFA (Richards, 2004), and across a wide age range; (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (Marsh et al., 1994; Richards & Marsh, 2005); and (f) applicability for participants aged 8 to 60 y and for both genders (Marsh et al., 1994; Richards, 1988, 2004; Richards & Marsh, 2005). In summary, the RPSCS is regarded as a valid, reliable, and structurally sound instrument that has been tested across both genders and a wide range of ages. The applicability across such a wide range of ages is a particular strength.
Physical Self-Description Questionnaire
Extending Fleishman's (1964) classic research on the structure of physical fitness, the Physical Self-Description Questionnaire (PSDQ) scales reflect some of the original SDQ scales and parallel physical fitness components identified in a CFA of physical fitness measures (Marsh, 1993). The PSDQ consists of nine specific components of PSC (strength, body fat, activity, endurance and fitness, sport competence, coordination, health, appearance, and flexibility), a global physical scale, and a global self-esteem scale. Each of the 70 PSDQ items is a simple declarative statement, and individuals respond on a 6-point true-false scale. The PSDQ is designed for adolescents but is also appropriate for older participants.
PSDQ research has demonstrated (a) good reliability (median coefficient alpha of .92) across the 11 scales (Marsh, 1996b; Marsh et al., 1994); (b) good test-retest stability over the short term (median r = .83 over 3 mo) and longer term (median r = .69 over 14 mo; Marsh, 1996b); (c) a well-defined, replicable factor structure as shown by CFA (Marsh, 1996b; Marsh et al., 1994); (d) a factor structure that is invariant over gender as shown by multiple-group CFA (Marsh et al., 1994); (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (see Marsh et al., 1994); (f) convergent and discriminant validity as shown by PSDQ relationships with external criteria (e.g., measures of body composition, physical activity, endurance, strength, and flexibility; see Marsh, 1996a, 1997); and (g) applicability for participants aged 12 to 18 y (or older) and for elite athletes and nonathletes (Marsh, Hey, Roche, & Perry, 1997; Marsh, Perry, Horsely, & Roche, 1995). In summary, the PSDQ is a psychometrically strong instrument.
Marsh, Martin, and Jackson (2010) recently presented a new short form of the PSDQ (PSDQ-S). This short form balances brevity and psychometric quality in relation to established guidelines for evaluating short forms (e.g., Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Smith, McCarthy, & Anderson, 2000) with the construct validity approach that is the basis of PSDQ research. Based on the PSDQ normative archive, 40 of 70 items were selected and evaluated in a new cross-validation sample (N = 708 Australian adolescents). To test the generalizability of results, the authors considered four additional samples: Australian adolescent elite athletes (n = 349), Spanish adolescents (n = 986), Israeli university students (n = 395), and Australian senior citizens (n = 760). Reliabilities for the 40 PSDQ-S items were consistently high in the cross-validation sample (.81-.94; median = .89) and senior sample (.81-.94; median = .91), and reliabilities in the cross-validation sample were higher than they were in comparable groups completing the 70-item PSDQ. The PSDQ-S factor structure in the cross-validation sample was well defined and highly similar to that based on the archive sample as well as to those based on the other four groups. Study 1, using a missing-by-design variation of multigroup invariance tests, showed that the factor structure was invariant across the 40-item PSDQ-S and the 70-item PSDQ. Study 2 demonstrated factorial invariance of responses over 1 y (test-retest correlations of .57-.90; median = .77) and good support for convergent and discriminant validity in relation to time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to responses on the PSPP and PSC instruments. The four studies reported by Marsh and coworkers demonstrated new, evolving strategies for the construction and evaluation of short forms that support the PSDQ-S.
The authors concluded that the strong support for the psychometric properties and construct validity of the widely used PSDQ instrument generalizes very well to the PSDQ-S.
Elite Athlete Self-Description Questionnaire
The PSC instruments discussed thus far may be suitable for elite athletes (e.g., Marsh et al., 1995). There may, however, be other components to PSC that are particularly relevant for elite athletes, and thus the Elite Athlete Self-Description Questionnaire (EASDQ; Marsh, Hey, Roche, et al., 1997; Marsh, Hey, Johnson, & Perry, 1997) was developed to address these other components. For the EASDQ, it was hypothesized that overall performance by elite athletes is a function of skill level, body suitability, aerobic and anaerobic fitness, and mental competence. Thus Marsh and colleagues developed the EASDQ to measure these five components along with overall performance, six factors in total. For each scale, they developed a pool of items that sport psychologists at the Australian Institute of Sport evaluated for their suitability for elite athletes. Pilot studies were conducted to select the best items to represent each factor. A compromise between brevity and psychometric soundness was achieved, with acceptable levels of reliability (e.g., all scales having reliability estimates of at least .80) based on short scales (4-6 items per scale).
EASDQ research demonstrates (a) adequate reliability (median coefficient alpha of .85) across the six scales (Marsh, Hey, Johnson, et al., 1997); (b) a well-defined, replicable factor structure as shown by CFA (Marsh, Hey, Johnson, et al., 1997; Marsh, Hey, Roche, et al., 1997); (c) applicability for elite athletes aged 12 y or older (Marsh, Hey, Roche, et al., 1997); and (d) predictive validity as shown by its ability to predict swimming performances in world championships after controlling for previous personal best performances (Marsh & Perry, 2005). In summary, the EASDQ is a reliable and valid instrument for elite athletes of all ages. More research is needed, however, to relate EASDQ responses to external validity criteria such as those used in PSDQ research and to criteria that are more specific to elite athletes (e.g., actual performance in competition).
Evaluation of Measures of Intrinsic and Extrinsic Motivation in Sport and Exercise
In this section, a critical review of the different measures used to assess intrinsic and extrinsic motivation in sport and exercise research is conducted. Certain criteria have guided the selection of the measures presented in this section. First, we have selected measures that are fully developed instruments that have gone through extensive validation steps. Second, we have chosen scales that have been used in research, published or unpublished, during the past 10 years. Scales that have not been used during that time frame are considered to be obsolete and are not reviewed. Finally, in light of recent theoretical development and because of space limitation, we have focused on motivation scales that assess intrinsic and extrinsic motivation independently of determinants and outcomes, while focusing on the perceived reasons for behavior. Our earlier discussion on the definitions of intrinsic and extrinsic motivation makes it possible to classify the different measures. The measures can vary in terms of the level of generality (situational versus contextual level) and the area (sport versus exercise). This classification appears in table 25.1. Table 25.2 (see p. 291) provides additional information on each scale's concept, dimensions, publication source, and availability. As can be seen, seven measures are reviewed. For each one, we present (a) a description of the instrument, (b) the conceptual and theoretical rationale underlying its scale development, (c) the available evidence concerning its psychometric properties (e.g., factorial validity, reliability, and construct validity), and (d) a broad assessment of the strengths and weaknesses associated with each measure.
Measures Used in Sport
In this section, we review the SMS (Brière et al., 1995; Pelletier et al., 1995), the Sport Motivation Scale-6 (SMS-6; Mallett, Kawabata, Newcombe, Otero-Rorero, & Jackson, 2007), the Behavioral Regulation in Sport Questionnaire (BRSQ; Lonsdale, Hodge, & Rose, 2008), the Pictorial Motivation Scale (PMS; Reid, Vallerand, Poulin, Crocker, & Farrell, 2009), and the SIMS (Guay et al., 2000).
Sport Motivation Scale
The SMS was developed (Brière et al., 1995; Pelletier et al., 1995) in order to assess contextual intrinsic and extrinsic motivation from a multidimensional perspective, as well as amotivation. The SMS has been the most often used motivation measure in sport, being employed with a variety of athletes (recreational to elite), age groups (adolescent to senior), and cultures (e.g., Canada, United States, United Kingdom, Bulgaria, Australia, Spain, and New Zealand). In fact, the SMS has been translated and validated in several languages (see Pelletier & Sarrazin, 2007). The SMS is based on SDT (Deci & Ryan, 1985) and is made up of seven subscales assessing amotivation; external, introjected, and identified regulation; and intrinsic motivation to know, to experience stimulation, and to accomplish. In line with SDT, motivation is assessed as the perceived reasons for participation, or the why of behavior. At the beginning of the scale, participants are asked, “In general, why do you practice your sport?” The items represent the perceived reasons for engaging in the activity, thus reflecting the different types of motivation.
The original scale was developed in French as L'Échelle de Motivation dans les Sports (Brière, Vallerand, Blais, & Pelletier, 1995) and was validated in three steps. The first step involved generating a pool of items explaining various reasons for sport participation through interviews with French Canadian athletes (aged 17-20 y). These reasons were then used to formulate items for the seven subscales of the French SMS. In the second step, a committee of experts evaluated the content validity of the items and eliminated those that were thought to be inadequate. Another sample of athletes from various sports completed the scale. Results from an exploratory factor analysis (EFA) provided support for a seven-factor structure with 4 items per subscale; this second step thus resulted in a 28-item scale. In the third and final step, two additional studies were conducted to further validate the scale. These studies included approximately 500 individuals, most of whom were involved in recreational sports. Results from confirmatory factor analyses (CFA) and correlational analyses confirmed the seven-factor structure, the subscale internal consistency (ranging from .65 to .96), and moderate to high indexes of temporal stability (ranging from .54 to .82) over 1 month. Furthermore, inspection of correlations among the seven SMS subscales provided support for the simplex pattern proposed by SDT. Results of correlations also showed that (in line with SDT) the most self-determined forms of motivation (intrinsic motivation and identified regulation) were related more strongly to determinants such as autonomy support from coaches and feelings of competence than to other forms of motivation (external and introjected regulation) and amotivation. Similar results were obtained with motivational outcomes such as positive affect, concentration, and intentions to pursue engagement in sport. In sum, adequate construct validity was obtained for the French form of the SMS.
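The internal consistency figures reported throughout this review are Cronbach alpha coefficients, which can be computed from item and total-score variances. As a brief illustration (the response data below are invented for demonstration, not drawn from the SMS validation studies), alpha for a k-item subscale is:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of respondent rows (each a list of k item scores)."""
    k = len(items[0])
    cols = list(zip(*items))                          # one tuple per item (column)
    item_vars = sum(variance(col) for col in cols)    # sum of per-item sample variances
    total_var = variance([sum(row) for row in items]) # variance of subscale totals
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Invented responses for a hypothetical 4-item, 7-point Likert subscale
responses = [
    [5, 6, 5, 6],
    [3, 3, 4, 3],
    [6, 7, 6, 6],
    [2, 3, 2, 3],
    [4, 4, 5, 4],
]
print(round(cronbach_alpha(responses), 2))  # prints 0.97
```

The invented data are highly consistent, hence the high alpha; real subscale data with 4 items commonly land in the .60-.90 range discussed here.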
The translation of the French SMS into English involved back-translation and committee procedures as suggested by Vallerand (1989). Pelletier and colleagues (1995) conducted two studies involving college athletes from various sports in order to assess the psychometric properties of the English form of the SMS. Results from CFA with a sample of 593 Canadian university athletes revealed adequate fit indices for the hypothesized seven-factor model (Adjusted Goodness of Fit Index and Normed Fit Index both > .90; Root Mean Square Residual < .08), and correlations with determinants and outcomes supported the simplex model. Moreover, internal consistency above .70 was obtained on all of the subscales except the identified subscale (.63). Test-retest correlations were acceptable and very similar to those obtained with the French SMS, as was the scale construct validity.
Since 1995, the SMS has been used extensively in sport psychology research. The seven-factor structure has been supported repeatedly (e.g., Doganis, 2000; Gillet, Vallerand, & Rosnet, 2009; Li & Harmer, 1996; Shaw, Ostrow, & Beckstead, 2005; Standage, Duda, & Ntoumanis, 2003). In addition, Hu and Bentler (1999) obtained support for a five-factor model by combining the three types of intrinsic motivation into one factor. Similar results were obtained by Gillet and colleagues (2009) with the French SMS. However, some studies have not supported the seven-factor model (Hodge, Allen, & Smellie, 2008; Mallett, Kawabata, & Newcombe, 2007; Mallett, Kawabata, Newcombe, & Otero-Rorero, 2007; Martens & Webber, 2002). Why is there such a discrepancy between these two sets of studies? One possibility lies in the populations from which the different samples were taken. Specifically, the SMS was validated using adolescent and young adult athletes and not older athletes. Because of this specific focus, some of the items may reflect a participation rather than an elite orientation, which is more in line with the younger population. For instance, an identified regulation item reads, “Because sport is one of the best ways to maintain good relationships with my friends.” Such an item seems more relevant for a younger population. An older, high-level athlete may disagree with this item but still display a high level of identified regulation for a sport (but not for relationship reasons). Future research using the SMS with different age groups and proficiency levels is needed to clarify this issue.
Although the internal consistency of the SMS has generally shown adequate values, some values below .70 have been found. This is especially the case for the identified regulation subscale (Brière et al., 1995; Kingston, Horrocks, & Hanton, 2006; Li & Harmer, 1996; Pelletier et al., 1995), although some lower values (below .70) have been obtained with the introjected (McNeill & Wang, 2005; Perreault & Vallerand, 2007; Riemer, Fink, & Fitzgerald, 2002; Standage, Duda, & Ntoumanis, 2003) and external regulation (Standage, Duda, & Ntoumanis, 2003) and amotivation subscales (Standage, Duda, & Ntoumanis, 2003). However, very few instances of values below .60 have been obtained. It should be noted that a Cronbach alpha of .60 with only 4 items is acceptable because, as noted by Cronbach (1951), the coefficient alpha underestimates the internal consistency of scales with a low number of items. This is because the coefficient alpha includes the number of items in the formula. For instance, given the same average interitem correlation, a 3-item scale coefficient alpha value of .56 is equivalent to an alpha value of .81 on an 8-item scale!
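The dependence of alpha on item count can be seen in the standardized (Spearman-Brown) form of the coefficient, α = kr̄ / (1 + (k − 1)r̄), where r̄ is the average interitem correlation. The sketch below uses illustrative values of our own (not the chapter's figures) to show how alpha grows with k when r̄ is held constant:

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach alpha for k items with average interitem correlation mean_r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)

# With the same average interitem correlation (here 1/3), a short scale
# yields a noticeably lower alpha than a longer one.
for k in (3, 4, 8):
    print(k, round(standardized_alpha(k, 1 / 3), 2))  # prints 3 0.6, 4 0.67, 8 0.8
```

This is why a .60 alpha on a 4-item subscale is less alarming than the same value would be on a longer scale.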
In line with the original work of Ryan and Connell (1989) and the initial SMS validation procedures (Brière et al., 1995; Pelletier et al., 1995), construct validity has been assessed by other authors in two fashions: (1) with the simplex pattern of correlations among the subscales and (2) with correlations between motivational factors and their determinants and consequences. We do not have space to review all studies. However, overall, there is overwhelming support for the construct validity of the SMS both in French and English. For instance, in addition to finding support for the simplex pattern, Pelletier and Sarrazin (2007) concluded in their review of the evidence that the SMS has been used with success to predict a great variety of specific outcomes and consequences (such as burnout, exercise dependence among endurance athletes, fear of failing, adaptive coping skills, perceptions of constraints, flow, vitality and well-being, sporting behavior orientations, aggression, and performance) in a manner that is consistent with SDT. These findings provide strong support for the construct validity of the SMS.
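The simplex pattern invoked here predicts that subscales adjacent on the self-determination continuum (e.g., intrinsic motivation and identified regulation) correlate more strongly than subscales farther apart, so correlations should decline moving away from the diagonal of an ordered correlation matrix. A minimal check of that property, using an invented correlation matrix (not actual SMS data) ordered from intrinsic motivation to amotivation:

```python
def follows_simplex(corr):
    """True if, in each row of an ordered correlation matrix, correlations
    do not increase as the distance between subscales grows (in either direction)."""
    n = len(corr)
    for i in range(n):
        rightward = [corr[i][j] for j in range(i + 1, n)]       # increasing distance
        leftward = [corr[i][j] for j in range(i - 1, -1, -1)]   # increasing distance
        for seq in (rightward, leftward):
            if any(nearer < farther for nearer, farther in zip(seq, seq[1:])):
                return False
    return True

# Subscale order: intrinsic, identified, introjected, external, amotivation
corr = [
    [1.00, 0.60, 0.30, 0.00, -0.30],
    [0.60, 1.00, 0.45, 0.15, -0.15],
    [0.30, 0.45, 1.00, 0.50, 0.10],
    [0.00, 0.15, 0.50, 1.00, 0.40],
    [-0.30, -0.15, 0.10, 0.40, 1.00],
]
print(follows_simplex(corr))  # prints True for this invented matrix
```

In practice researchers inspect the observed pattern rather than apply a strict pass/fail rule, but this captures the ordering the simplex hypothesis predicts.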
In sum, the SMS has some positive features. First, it is a multidimensional instrument that assesses different types of intrinsic and extrinsic motivation as well as amotivation. Second, the scale focuses on the why of behavior and thus items are not confounded with determinants and consequences. Finally, it has some excellent psychometric properties. Nevertheless, some limitations should be underscored. First, although internal consistency levels have been acceptable overall, some subscales, especially the identified regulation subscale, have yielded relatively low coefficient alphas at times. Second, the SMS does not assess integrated regulation. Third, the seven-factor structure has not always been supported by CFAs. According to Pelletier, Vallerand, and Sarrazin (2007), this may be explained by a host of factors, including differences in sample sizes, variations in the way the instrument is administered, or some other characteristics specific to the context of the study. However, as already indicated, it is also possible that the SMS is better suited for a younger, nonelite athlete population. Clearly, future research on this issue is in order.
Sport Motivation Scale-6
Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed another version of the SMS, the SMS-6. This scale has the same underlying rationale as the original SMS but was designed to improve the original version by including an integrated regulation subscale and attempting to solve some of the inconsistencies with the factor structure and some of the relatively low internal consistency values (below .70). The SMS-6 comprises 24 items, 4 for each of the six subscales, which include amotivation; external, introjected, identified, and integrated regulation; and general intrinsic motivation. Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed 5 items for the integrated regulation subscale as well as 7 other items (4 of which were kept in the final scale) to replace some items in the original SMS. Two samples were used to validate the SMS-6. Sample 1 was composed of 501 first-year university students participating in competitive sport at least twice per week and 113 elite athletes representing Australia at the international level (for a total of 614 participants). Sample 1 was used to derive a factor structure that included the SMS items as well as the reformulated and integrated regulation items. Sample 2 was composed of 557 university students who were engaged in a variety of sports or physical activities twice per week. The second sample was used to confirm the structure of the SMS-6. Participants also completed the Dispositional Flow Scale (DFS).
Results of a CFA with the SMS-6 (with sample 2) provided support for the factor structure as well as for the internal consistency values (all above .70). Concerning the construct validity of the SMS-6, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) reported a rather weak simplex pattern of correlations among the subscales. More specifically, external regulation correlated highly with intrinsic motivation (r = .54), while the correlation between identified regulation and intrinsic motivation was very high (r = .91) and was higher than the one between integrated regulation and intrinsic motivation (r = .75). The construct validity of the SMS-6 was not fully supported, as some of the correlations involving the SMS and flow were not as expected by SDT. For instance, the distinctions among integrated regulation, identified regulation, and intrinsic motivation were not always clear. Furthermore, external regulation revealed some positive and sometimes strong correlations with flow, contrary to hypotheses derived from SDT.
In sum, the SMS-6 contains some nice features. First, it contains an integrated regulation subscale. Furthermore, the addition of 4 new items may make the SMS more acceptable for older and more experienced athletes. Second, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) presented results supporting the validity of a variation of the SMS-6, the SMS-8. The SMS-8 contains the same items as the SMS-6 but assesses the three types of intrinsic motivation rather than general intrinsic motivation. The SMS-6 also shows some limitations. First, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) proposed 7 new items to replace those that were presumably problematic in the original SMS. However, only 4 of these items made it to the final version. Thus, it appears that the SMS-6 retained much of the original SMS. Second, even some of the new items appear problematic and may not assess the desired construct (see Pelletier et al., 2007). For instance, a new amotivation item (“I don't seem to be enjoying my sport as much as I previously did”) seems to reflect a decrease in intrinsic motivation rather than amotivation. Finally, results from Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) demonstrated that the integrated regulation subscale may lack discriminant validity, leading to results with flow highly similar to those for identified regulation and intrinsic motivation.
Behavioral Regulation in Sport Questionnaire
Lonsdale and colleagues (2008) developed the BRSQ to create an alternative measure of elite sport motivation as conceptualized by SDT. However, in contrast to Mallett, Kawabata, Newcombe, and Otero-Rorero (2007), these authors used a completely new pool of items developed by SDT experts and competitive athletes. There are two versions of the BRSQ. The BRSQ-8 contains 32 items assessing integrated, identified, introjected, and external regulation; amotivation; and the three forms of intrinsic motivation (knowledge, experience stimulation, and accomplishment) identified by Vallerand (1997). The BRSQ-6 contains the same items but assesses general intrinsic motivation rather than all three types of intrinsic motivation, for a total of 24 items.
Lonsdale and colleagues (2008) conducted a series of three studies to validate the scale. In the first study, the factorial validity and the internal consistency were assessed with 382 New Zealand elite athletes. Results from a CFA on the 32 items supported the factor structure of the BRSQ. Specifically, fit indexes were acceptable and all items loaded significantly on the appropriate factors (loadings ranged from .58 to .91). Finally, internal consistency of the eight subscales, measured with the Cronbach alpha, showed high values ranging from .71 to .91. Additionally, 1 wk test-retest reliability was tested with 34 competitive adult athletes. Test-retest coefficients for all subscales supported the temporal reliability (values ranged from .73-.90).
In a second study with 343 athletes from New Zealand, the results of a CFA on the BRSQ-8 supported once more the factor structure as well as the subscale internal consistency. Lonsdale and colleagues (2008) also showed that the factor structure of the BRSQ-6 model fit the data very well and that subscale coefficient alphas all exceeded .78. Moreover, the construct validity of the BRSQ-6 was assessed by testing for a simplex pattern of correlations among the six subscales. While some relationships were in line with predictions (e.g., amotivation was negatively related to intrinsic motivation), there was a lack of discrimination between some subscales. More specifically, there was no difference between external and introjected regulation scores in terms of their relationships with amotivation. A similar pattern was evident with the identified and integrated regulation subscales, which both had similar high correlations with intrinsic motivation. These results with the simplex pattern were replicated in a third study conducted with nonelite athletes. In this third study, Lonsdale and colleagues also assessed the relationships between the BRSQ-6 and indexes of burnout (Lemyre, Treasure, & Roberts, 2006; Raedeke & Smith, 2001) and flow (Jackson & Eklund, 2002). Overall, results supported hypotheses in line with SDT. Specifically, amotivation and external and introjected regulation showed negative correlations with flow and positive correlations with burnout. The opposite pattern of correlations was found for the self-determined subscales (intrinsic motivation and identified and integrated regulation). However, there was a lack of discrimination between integrated regulation and general intrinsic motivation. Results of another study on burnout (Lonsdale, Hodge, & Rose, 2009) replicated these findings. Thus, overall, the support for the construct validity of the BRSQ-6 appears to be mixed.
It should be underscored that the BRSQ has some nice features. First, the scale is designed in such a way that the researcher can decide to use a multidimensional (BRSQ-8) or unitary (BRSQ-6) conceptualization of intrinsic motivation. Second, the scale is rather short, with 4 items per subscale. Finally, it assesses integrated regulation. At the same time, the BRSQ also displays some limitations. First, additional research is needed on the construct validity of the scale. Whereas there is support for distinguishing the self-determined subscales (intrinsic motivation and identified and integrated regulation) from the non-self-determined subscales (external and introjected regulation), the finer discrimination within each type of category appears to be lacking. Such evidence is crucial, and future research is needed in order to show that this scale does indeed assess the SDT constructs rather than two broad sets of subscales tapping self-determined versus non-self-determined motivation. Second, this scale is designed specifically for older participants in competitive sport; it remains to be seen if the BRSQ can be used with younger participants, for whom the integrated regulation subscale may not have full meaning. Finally, research is needed to test the temporal stability of the scale over a time frame longer than 1 week.
Pictorial Motivation Scale
The PMS was designed to measure intrinsic and extrinsic motivation for sport and exercise in people with an intellectual disability. It assesses participants' reasons for engaging in sport and exercise. The scale's main characteristic is the use of drawings to depict each of the 20 items. There are 5 items (pictures) for each of four subscales: intrinsic motivation, self-determined extrinsic motivation (a mixture of integrated and identified regulation), non-self-determined extrinsic motivation (a mixture of introjected and external regulation), and amotivation. These pictures are used to help participants with cognitive difficulties and to help represent the motivational concept depicted in each item.
The original scale was developed in French (Reid, Poulin, & Vallerand, 1994). Results of a study with 62 participants supported the internal consistency, temporal stability, and construct validity, as exemplified by the presence of a simplex pattern among the four subscales. However, the amotivation subscale had poor reliability (α = .52). The French version (Reid et al., 2009) was translated into English according to the back-translation and committee procedures outlined in Vallerand (1989). Then, 6 new items were generated for the less reliable amotivation subscale. Participants in the Special Olympics (n = 160) completed the English version. Results of the CFA confirmed the four-factor structure of the PMS. Furthermore, the internal consistency (Cronbach alphas) ranged from .60 to .71. Finally, the construct validity was assessed by testing for a simplex pattern of correlations among the four subscales. The intercorrelations among latent variables from the CFA provided support for the simplex pattern.
Results from a study conducted with the English version of the PMS involving 80 high school students with mild intellectual disability provided support for the internal consistency, temporal stability (over 3 wk), and construct validity of the PMS with respect to the simplex pattern of correlations among the PMS subscales as well as correlations between the PMS subscales and motivational antecedents (skill and perceived competence) and outcomes (perceived effort) as rated by the physical education teacher. Finally, the internal consistency of each subscale was tested without the pictorial dimension with a subset of 47 high school students with mild intellectual disability. Results indicated poor internal consistency for most subscales (.91 for intrinsic motivation, but .27 for self-determined extrinsic motivation, .20 for non-self-determined extrinsic motivation, and .60 for amotivation). This finding suggests that the scale is not reliable without the drawings.
The preliminary findings with the English version of the PMS are encouraging. Furthermore, this scale is the only one geared for individuals with intellectual disability. The use of drawings to depict the various items makes this scale unique in the field. Nevertheless, the PMS shows some limitations. First, the scale does not differentiate among all forms of intrinsic (knowledge, stimulation, and accomplishment) or extrinsic (integrated, identified, introjected, and external regulation) motivation. Second, construct validity was tested with only a limited number of variables. Third, it is not known if the scale is usable with children who have severe forms of intellectual disabilities. Clearly, additional research is needed on the reliability and validity of the PMS.
Situational Motivation Scale
The SIMS is one of the few scales to assess intrinsic and extrinsic motivation and amotivation at the situational level (Guay et al., 2000). The SIMS is a multidimensional tool that measures four types of motivation: intrinsic motivation, identified regulation, external regulation, and amotivation. The SIMS is made up of 16 items (4 items per subscale) and asks this question: “Why are you currently engaged in this activity?” The items represent potential reasons for task engagement. The scale is worded in such a way that it can be used in most situations (sport and nonsport).
Five studies were reported in the original article. In study 1, the original scale was developed by a committee of experts and completed by 195 French Canadian college students. Results of an EFA revealed a four-factor structure with the final 16 items loading on their respective factor. In study 2, a CFA confirmed the factor structure as well as its invariance across gender. Across the five studies, the internal consistency values of the subscales were acceptable, ranging from .62 to .95 (see Guay et al., 2000). Moreover, across all studies, support was obtained for the construct validity of the SIMS through results from correlations in line with the simplex pattern among the subscales as well as between the SIMS subscales and motivational determinants and consequences. Perhaps of greater interest for the present discussion were the results of study 4, which showed that some subscales (intrinsic motivation and identified regulation) were sensitive enough to detect changes in motivation that took place during two games of a basketball tournament.
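The "simplex pattern" invoked here means that when subscales are ordered along the self-determination continuum (intrinsic motivation, identified regulation, external regulation, amotivation), each subscale correlates most positively with its neighbors, with correlations declining — and eventually turning negative — as subscales grow more distant. A minimal sketch of such a check, using a fabricated correlation matrix (the values are illustrative, not taken from Guay et al., 2000):

```python
# Hypothetical correlations among SIMS subscales, ordered along the
# self-determination continuum.
subscales = ["intrinsic", "identified", "external", "amotivation"]
r = [
    [1.00, 0.55, -0.10, -0.40],
    [0.55, 1.00, 0.05, -0.30],
    [-0.10, 0.05, 1.00, 0.35],
    [-0.40, -0.30, 0.35, 1.00],
]

def follows_simplex(corr) -> bool:
    """Check that, in each row, correlations decrease monotonically as the
    distance between subscales on the continuum increases (in both directions)."""
    n = len(corr)
    for i in range(n):
        right = [corr[i][j] for j in range(i + 1, n)]        # increasingly distant, rightward
        left = [corr[i][j] for j in range(i - 1, -1, -1)]    # increasingly distant, leftward
        for seq in (right, left):
            if any(a < b for a, b in zip(seq, seq[1:])):
                return False
    return True

print(follows_simplex(r))  # True
```

In published work this ordering is usually evaluated descriptively rather than by a strict monotonicity test, but the sketch captures the logic of the expected pattern.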
Other researchers have also obtained support for the psychometric properties of the SIMS. First, all studies reported acceptable internal consistency values for each subscale (Blanchard, Mask, Vallerand, de la Sablonnière, & Provencher, 2007; Conroy, Coatsworth, & Kaye, 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, Duda, & Prusak, 2003). The coefficient alpha values of all but the amotivation subscale (α = .58) in the Conroy and colleagues study were above .60. Second, support for the factorial validity of the SIMS was obtained through CFAs with one qualification. Whereas the CFA results with the 16 items yielded acceptable fit indexes, removal of 1 item (Jaakkola, Liukkonen, Laakso, & Ommundsen, 2008) and even 2 items (Gillet, Berjot, & Paty, 2009; Standage, Treasure, et al., 2003) yielded better fit indexes. Moreover, Standage, Treasure, and colleagues (2003) conducted multisample CFAs and showed that the pattern of factor loadings was largely invariant across four different samples.
Construct validity of the SIMS was also assessed in several studies (Blanchard et al., 2007; Conroy et al., 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, et al., 2003). In addition to supporting the simplex pattern among the SIMS subscales and between the SIMS subscales and need satisfaction (study 2 of Blanchard and colleagues, 2007), results also supported the postulate from the HMIEM (Vallerand, 1997) for the top-down effect, in which contextual sport motivation was found to predict situational sport motivation (studies 1 and 2 of Blanchard et al., 2007; Jaakkola et al., 2008; Ntoumanis & Blaymires, 2003). Specifically, the more self-determined the motivation was found to be in a specific context (in this case, sport), the more self-determined the motivation was found to be in a given situation. Furthermore, Blanchard and colleagues (2007, studies 1 and 2) found support for another postulate from the HMIEM that suggests that over time, situational motivation in the realm of sport (basketball) has recursive effects on contextual motivation. The more that situational motivation is self-determined, the more that contextual motivation becomes self-determined over time. Finally, Jaakkola and coworkers (2008) demonstrated that, as predicted by the HMIEM, situational self-determined motivation was better than contextual motivation in predicting the situational intensity (as assessed by HR) displayed by students in a physical education class. Overall, these findings provide strong support for the reliability and factorial and construct validity of the SIMS.
The SIMS has several positive features, one of them being that it is the only scale to assess intrinsic and extrinsic motivation and amotivation at the situational level. Furthermore, it does so using only 16 items. Nevertheless, it also has some weaknesses. First, the SIMS does not assess the different types of intrinsic motivation and integrated and introjected regulation, because it was designed to be short. Second, while the factor structure has been supported, it is not clear if some items should be replaced (Gillet, Berjot, et al., 2009; Jaakkola et al., 2008; Standage, Treasure, et al., 2003). Third, research so far has not assessed the validity of the scale with high-performance athletes. Thus, additional research is needed to further test the psychometric properties of the SIMS in sport.
Ethics Codes: Their Nature, Purposes, and Application
Ethics codes typically comprise principles and standards. Ethical principles are broad-spectrum statements that summarize and reflect the values of the parent organization or governing body. These general and aspirational statements set the underlying tone for the more specific codes and guide the work-related ethical decision making of professionals. In contrast, ethical standards specify both proscribed and prescribed member behaviors. While not always black and white, these standards serve as a more clear-cut and enforceable guide for professional behavior.
Members should apply both the aspirational principles and enforceable standards to shape their thinking and behavior in work settings. Ideally, members self-monitor their own behavior. In an effort to remain ethical, professionals are encouraged to consult with colleagues about ethically challenging situations and to provide constructive feedback when they witness possibly unethical behavior in others.
Assessment and Measurement
A central question to be addressed in this chapter is what assessment and measurement are. Sundberg (1977) defines assessment as the processes used “for developing impressions and images, making decisions and checking hypotheses about another person's pattern of characteristics that determines his or her behavior in interaction with the environment” (p. 21). The assessment process involves collecting and assembling a broad range of objective and subjective information about persons or groups to develop impressions about them; identify their needs; predict how they might think, feel, and behave in future situations; and select and apply interventions based on the content and dependability of that information. Professionals may use multiple assessment methods that include observations of behavior, symptom checklists, surveys and questionnaires, structured and unstructured interview materials, and standardized tests (Bennett et al., 2006). Gardner and Moore (2006) emphasize using a triad of psychological assessment strategies in the practice of clinical sport psychology: (1) initial interviews, (2) behavioral observation, and (3) psychological testing. The nature and assumptions underlying assessment approaches are usually grounded in the theoretical orientation of the professional (Andersen, 2002).
In contrast, measurement can mean many things to many people. It is one of the most common words in the English language and can be used as both a noun and a verb (Lorge, 1967). For the purposes of this chapter, measurement is viewed as an extension of assessment processes. It can be thought of more narrowly as the process of collecting information about psychological characteristics of interest (e.g., attitudes, behaviors, state experiences) using one or more methods or tools (such as those mentioned earlier) to monitor change and the effects of interventions or treatments postassessment. For example, an educational sport psychology consultant might administer a measure of team cohesion over the course of a competitive season to see how team members perceive their relationships. Another consultant might conduct a preseason baseline screening assessment of cognitive functioning in hockey players and then reevaluate players who incur a mild traumatic brain injury (i.e., concussion) later in the season.
In this chapter, the terms measurement and assessment are used interchangeably. Furthermore, these terms are used to describe the decisions and opinions made by professionals regarding clients with whom they work. As such, measurement and assessment techniques include all methods of gathering information about clients, such as (a) psychological, educational, and neurological tests; (b) data gathered during clinical interviewing; (c) information gathered from significant others (e.g., family members, teachers, friends); (d) direct and indirect observation; and (e) interactions with people via teletherapy (e.g., Internet, phone; Fisher, 2009).
Competence and Education
In order to excel in our professional duties and do well for those we serve, teach, study, and otherwise interact with, we must know what to do and how to do it in a capable manner. The ethics codes mentioned earlier identify the necessity of being knowledgeable and capable in our work. For example, the APA ethical standards provide guidance for organization members in this area, including information about (a) competence limitations, (b) maintaining competence, (c) making sound professional and scientific judgments, (d) delegating work responsibilities to others, (e) engaging in activities in emergencies, and (f) impairment (APA, 2002). Competence in professional behaviors is a personal matter that is frequently challenged. It is the responsibility of professionals to know their limitations and how their knowledge and skills change and require constant upgrading. The APA ethics code also emphasizes the importance of making sound work-related decisions based on scientific knowledge and appropriate discipline-specific practice. This portion of the APA code cautions professionals to be careful when delegating work to others, describes how a professional is responsible for others' work, and explains the necessity of avoiding multiple relationships with those to whom work is delegated. The APA standards note that we can occasionally be thrown into situations in which our competence is stretched; in such cases we need to be very careful, seek supervision if available, and end such work as soon as possible.
Measurement Referral Questions and Appropriateness of Instruments
When selecting assessment instruments, the professional must consider the referral questions that prompted this process (Fisher, 2009; Smith, 1976). The instruments selected should reflect these referral questions and utilize assessment strategies that have appropriate validity and reliability. For example, if a professional is interested in measuring state anxiety for research purposes, an appropriate assessment may be the Competitive State Anxiety Inventory-2 (CSAI-2; Martens, Burton, Vealey, Bump, & Smith, 1990) as opposed to the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970), which measures both trait and state anxiety. When selecting the assessment, the professional should be aware of limitations or biases regarding cultural sensitivity (see the later section on cultural issues); gender considerations (Etzel, Yura, & Perna, 1998); and age, language, or disability factors that may influence the psychometric qualities of the assessment differently from the way they influenced the normative groups used for the development and validation of the instrument (APA, 2002; Fisher, 2009). It is also important to consider the method of delivery. For example, paper-and-pencil assessments may not have been validated for online use (see the later section on technology), and instruments with elevated reading levels may not be appropriate for certain age or developmental groups. Therefore, the professional should always verify the assessment's validity and reliability when a modified assessment method or group is used (Fisher, 2009). Furthermore, the professional should also attempt to conduct in-person assessments when possible, as a great deal of information can be learned about clients from the way in which they present themselves during the assessment process. This information can affect the richness of the assessment data.
It is also important for professionals to be aware of, and competent in using, appropriate psychometric strategies for establishing validity and reliability of the instruments they use (AERA, APA, & NCME, 1999). All instruments have unique psychometric properties that affect how they should be administered and interpreted. When validity and reliability issues are not taken into consideration, it is possible to choose and utilize instruments to assess factors that they were not designed to measure. Furthermore, practitioners should be well aware of other psychometric properties such as content and criterion validity and standard error of measurement that may affect how results are interpreted and used. The ethical practitioner needs to be aware of psychometric issues in order to choose appropriate instruments with regard to the referral questions, client characteristics, assessment strategies, and environmental factors.
Consent and Assent
As discussed earlier, the ethical principles for sport and exercise psychology emphasize doing no harm to the client and respecting the individual's rights and dignity (AASP, 1996; APA, 2002). The test taker's right to privacy and confidentiality applies here as well, and the professional should take all necessary precautions to maintain the confidentiality and privacy of the client. To protect the test taker, informed consent must be obtained at the start of the relationship (e.g., research, consultation, therapy). Beyond the informed consent process and before formal assessment, the client or participant should be informed of all pertinent information regarding the assessment process. This information includes (a) the nature and purpose of assessment; (b) any applicable fees; (c) potential involvement of third parties such as a coach, athletic trainer, or manager; (d) limits of privacy and confidentiality (as discussed in the next section); and (e) the timeline for the process and potential feedback (Fisher, 2009). This information should be presented in a clear and understandable manner. Furthermore, this information should be agreed to by the test taker, who thereby gives informed consent. Test takers should engage in assessment of their own free will and must be given the option to withdraw participation without consequences (APA standard 3.10). All necessary information about assessment procedures and findings should be provided in a language or level appropriate for the participant. Furthermore, it is unethical to require or coerce individuals to take part in measurement and assessment for research or practice purposes.
Privacy and Confidentiality and Release of Information
Typically, the ethical standards of organizations with ties to sport psychology (APA ethical standard 4.01 and the AASP) suggest that professionals should not reveal information about clients, test takers, or others without their signed approval to release information or a legal requirement to do so. These legal situations may include (a) a test taker who indicates possible self-harm or harm to others (i.e., suicide or homicide), (b) a test taker whose results are subpoenaed by the court, or (c) a test taker who is a minor, in which case the parent or guardian may have access to the data (Etzel et al., 1998). If the test taker or, in the case of a minor, the parent or guardian provides explicit written permission, the specific information identified by the client may be released to the identified parties. Unless these circumstances are met, information from the test taker may not be disclosed to anyone (e.g., coaches, management, parents, administration, athletic trainers, and so on).
In situations where the assessment is requested by a third party (e.g., coaches, management, the court), this third party may also request results from the assessment. It is important for the professional to establish a priori who is the “real client” (Ogilvie, 1979) and to have the ability to control access to the results. Etzel and colleagues (1998) suggest that information about the assessment should be shared only with one predetermined person, unless a release of information form has been completed. Therefore, when engaging in assessments, the professional should set clear boundaries and avoid dual relationships, thereby identifying who is being served (APA standard 4.02a). Another complication of these situations is the role of trust. If athletes or test takers suspect the test results will be used without their permission in decisions regarding performance or other aspects of participation, they may be less likely to respond honestly, thus affecting the validity of the results (see the section on demand characteristics).
Raw Data and Data Storage
Raw data such as the test taker's responses to items, including the professional's notes and final reports, should be stored in locked file cabinets inside the professional's office or in password-protected computer files (Fisher, 2009). Other methods to ensure confidentiality may include limiting access to records to only those people who have a need to know this information and have been trained to handle and understand it, deidentifying records using code numbers, and appropriately disposing of identifiable records (Fisher, 2009). A good policy for data maintenance is that data should be kept for a minimum of 7 y after the last service delivery date or 3 y after a minor reaches the age of 18 (whichever is later), as is recommended by the APA record-keeping guidelines (APA, 2002; Fisher, 2009). Raw data and the instruments used for assessment purposes should not be released to third parties unless a release of information form has been completed and the third party is competently trained to use such information.
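The "whichever is later" retention rule above can be made concrete with simple date arithmetic. The sketch below is illustrative only (the function name and dates are invented, and leap-day edge cases are ignored); it is not a substitute for the applicable record-keeping guidelines:

```python
from datetime import date

def retention_until(last_service, birth_date=None):
    """Earliest disposal date under the rule described above: 7 years after
    the last service date, or, for a client who was a minor, 3 years after
    the client turns 18 -- whichever is later. (Feb 29 dates would need
    special handling; this sketch ignores them.)"""
    seven_years_on = last_service.replace(year=last_service.year + 7)
    if birth_date is None:
        return seven_years_on
    three_after_majority = birth_date.replace(year=birth_date.year + 21)  # 18 + 3
    return max(seven_years_on, three_after_majority)

# Adult client: 7 years after last service governs
print(retention_until(date(2012, 5, 1)))                     # 2019-05-01
# Client who was 11 at last service: 3 years past age 18 falls later
print(retention_until(date(2012, 5, 1), date(2000, 9, 15)))  # 2021-09-15
```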
Results Discussion
Test feedback and results discussion should be provided in the form of a carefully constructed report using clear language that fully explains the assessment results. Labels and jargon should be eliminated to increase readability. Information necessary to the purpose of the test should be included, and the inclusion of unnecessary and unrelated information should be avoided (APA, 2002; Fisher, 2009). Additionally, as recommended by the APA (APA, 2002), interpretations should take into consideration the participant's gender, race, ethnicity, age, national origin, sexual orientation, religion, disability, language, or socioeconomic status. Participants should receive assessment information and feedback related to their performance on the assessment and should be informed of ways in which they could personally use the test results or how this information may be used by a third party (only if written permission was given to release such information). The information released to the participant should be delivered in a verbal or written report and presented in such a way that it does not cause harm to the test taker (Etzel et al., 1998). However, information such as numerical scores or specific responses should not be released to individuals not qualified to interpret such information (Fisher, 2009; Tranel, 1995).
Demand Characteristics
In the sport context, several groups of individuals may be interested in the assessment results of athletes. Interested parties may include coaches, managers, teams, students, or administrators. However, the possibility of a third party reviewing the test results may increase socially desirable responding and result in invalid and unreliable information. Therefore, undue pressure to complete an instrument or battery should be considered a contextual factor.
Another potentially undesirable effect of a third party viewing the test taker's results may be assessment anxiety. The APA standards state that if a test taker is observed to be anxious or reports feeling anxious, this feeling should be taken into account and become a limitation in the interpretation of test data (APA, 2002). Assessment anxiety may be exaggerated in situations where a third party may have access to results. These situations may also lead to faking good or faking bad on the part of respondents who are concerned about how the results may be used. This must also be considered when evaluating the results.
Supervision of Subordinates
In some cases, professionals may hire and train subordinates to help with assessment and measurement tasks. These subordinates may administer, score, and even interpret the results of measurement and assessment. Standard 2.05 of the APA ethics code (APA, 2002) states that professionals utilizing employees, supervisees, or research and teaching assistants for such purposes should take reasonable precautions to put subordinates in situations where (a) they do not face possibly harmful multiple relationships with the client that could affect their objectivity, (b) they are competently trained to perform the delegated task on their own or with supervision, or (c) they are supervised for competent service delivery. Therefore, when using subordinates to help with tasks such as administration, scoring, or interpretation, the professional assumes primary responsibility and liability to ensure that the services are being provided competently. The professional needs to ensure that subordinates are well trained with all potential instruments. To do so, the professional must provide appropriate training, experience, and supervision as well as continue to check the subordinates' work to ensure its quality. As with licensed professionals, not all subordinates have the same competencies with regard to all instruments.
Tools to Measure the Physical Self
Reflecting the general historical trends in self-concept research, self-concept instruments used in early sport and exercise research focused on global self-esteem (Marsh, 1997, 2002). However, following the research of Shavelson and colleagues (1976), a number of multidimensional self-concept instruments containing one or more PSC scales were developed. Indeed, in a 1974 review, Wylie had concluded that at the time most self-concept instruments focused on global self-concept or self-esteem rather than specific domains such as PSC. Although several of the instruments reviewed by Shavelson and colleagues (1976) contained items relating to physical skills and elements of physical appearance, none provided a clearly interpretable measure of PSC. From a practical perspective, these older instruments appear to be of little value for sport and exercise psychologists. The major exception, perhaps, is the Physical Estimation and Attraction Scale (PEAS; Sonstroem, 1978, 1997), along with the theoretical model on which it is based. This instrument was designed to measure two global components: estimation (competency) and attraction. While the PEAS may not be the instrument of choice today, it has a historical significance in that its research incorporated many of the features of the construct validity approach advocated in this chapter, it was heuristic, and it provided an important basis for subsequent research.
In a subsequent 1989 review, Wylie identified several multidimensional self-concept instruments measuring one or more components of PSC that can be differentiated from other specific domains of self-concept and general self-concept. Included in the list were the three SDQ instruments already discussed. Wylie also evaluated Harter's (1985) Self-Perception Profile for Children, which contains two PSC scales (athletic competence and physical appearance). Other multidimensional instruments containing physical scales that were not reviewed by Wylie include the Self-Rating Scale (Fleming & Courtney, 1984), which measures physical ability and physical appearance; the Song and Hattie Test (Hattie, 1992), which measures physical appearance; and the Multidimensional Self-Concept Scale (Bracken, 1996), which has a physical scale that includes physical competence, physical appearance, physical fitness, and health. The Tennessee Self-Concept Scale (Fitts, 1965) is a multidimensional self-concept instrument that also purports to measure PSC. In their review and empirical evaluation of this instrument, Marsh and Richards (1988) found distinguishable physical components reflecting health, neat appearance, physical attractiveness, and physical fitness that were incorporated into a single PSC score. This detailed breakdown of the Tennessee physical scale was supported by relationships with the SDQ physical ability and physical appearance scales in an MTMM study comparing responses to the two instruments. Because each of the clusters based on responses to the Tennessee instrument is represented by only a few items, it is not appropriate to use the instrument to measure these distinct components of PSC. Marsh and Richards argued that PSC measures that combine and confound a wide range of differentiable physical components—such as those based on the Tennessee Self-Concept Scale—should be interpreted cautiously (see similar comments by Fox & Corbin, 1989).
In summary, although multidimensional self-concept instruments based on Shavelson and colleagues' (1976) model provided good support for the construct validity of the physical ability and appearance scales (e.g., Marsh, 2002; Marsh & Peart, 1988), they left unanswered the question of whether PSC is more differentiated than can be explained in terms of one (physical ability) or two (ability, appearance) physical scales. Subsequent PSC instruments were developed specifically to address the issue of the multidimensionality of PSC.
Physical Self-Perception Profile
The Physical Self-Perception Profile (PSPP; Fox, 1990; Fox & Corbin, 1989) is a 30-item inventory that consists of four specific scales and one general physical self-worth factor. The PSPP was developed to document the physical self-perceptions of college students. It was designed to reflect the advances made by Harter (1985) and Shavelson and colleagues (1976) in identifying the physical self as an important construct to measure in its own right and to reflect the hierarchical, multidimensional nature of the physical self. A qualitative approach was used to reveal dimensions of physical self-esteem salient to the population sampled (Fox & Corbin, 1989). The PSPP consists of five 6-item scales of sport (perceived sport competence), body (perceived bodily attractiveness), strength (perceived physical strength and muscular development), condition (perceived level of physical conditioning and exercise), and physical self-worth. Fox (1990) recommended that the 10-item Rosenberg Self-Esteem Scale (Rosenberg, 1965) be used alongside the PSPP to provide a global measure. Fox (1990) reported factor analyses indicating that each item loads most highly on the factor that it is designed to measure and that individual scale reliabilities are in the .80s.
The PSPP research demonstrates (a) good reliability (coefficient alpha of .80-.95; Fox, 1990; Page, Ashford, Fox, & Biddle, 1993; Sonstroem, Speliotis, & Fava, 1992); (b) good test-retest stability over the short term (rs of .74-.89; Fox, 1990); (c) a well-defined, replicable factor structure as shown by CFA (Fox & Corbin, 1989; Sonstroem, Harlow, & Josephs, 1994); (d) convergent and discriminant validity in studies showing PSPP relationships with external criteria such as exercise behaviors, mental adjustment variables, and health complaints (Fox & Corbin, 1989; Sonstroem & Potts, 1996); and (e) applicability for an older adult population (Sonstroem et al., 1994). However, correlations among the PSPP scales are consistently so high (.65-.89 when disattenuated for measurement error; Marsh, Richards, Johnson, Roche, & Tremayne, 1994) that they detract from the instrument's ability to differentiate among the different PSC factors it purports to measure.
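The correlations "disattenuated for measurement error" cited above refer to the classical correction for attenuation: the observed correlation is divided by the square root of the product of the two scales' reliabilities to estimate the correlation between true scores. A minimal sketch with hypothetical numbers (not the actual PSPP values):

```python
import math

def disattenuate(r_observed: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation: estimated correlation between
    true scores, given the observed correlation and each scale's
    reliability (e.g., coefficient alpha)."""
    return r_observed / math.sqrt(rel_x * rel_y)

# Hypothetical: two subscales correlate .60 as observed, each with alpha = .85
print(round(disattenuate(0.60, 0.85, 0.85), 2))  # 0.71
```

Because the correction always inflates the observed value, highly reliable scales whose disattenuated correlations approach .9, as reported for the PSPP, leave little unique variance to distinguish the factors.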
Subsequently, a version of the PSPP for children and adolescents was developed and validated—the Children and Youth Physical Self-Perception Profile (CY-PSPP; Eklund, Whitehead, & Welk, 1997; Whitehead, 1995). Like the PSPP, the CY-PSPP is a 30-item inventory consisting of the same five 6-item scales. The CY-PSPP is a substantially revised version of the PSPP that is most appropriately thought of as a different instrument. The CY-PSPP body, strength, and conditioning subscales are based on minor adaptations of the PSPP to make them more suitable for children. However, the global self-worth (self-esteem) and sport scales are completely different. The PSPP did not have a self-esteem scale of its own but included 6 items adapted from the Rosenberg Self-Esteem Scale. On the CY-PSPP, global self-esteem and sport scales from the PSPP were dropped and replaced with corresponding scales from Harter's (1985) Self-Perception Profile for Children. Correlations among factors remained high (e.g., physical self-worth with attractive body adequacy = .8). Eklund and colleagues (1997) suggested that these results are consistent with the developmental patterns among children, as differentiation in self-concept is less defined at younger ages (Harter, 1985). CFAs have supported the instrument's factor structure, with both the CFI (comparative fit index) and NNFI (non-normed fit index) indexes exceeding the .90 criterion for good model fit (Eklund et al., 1997). Moderate correlations (r = .39-.45) with external criteria such as physical activity and physical fitness have demonstrated its convergent and discriminant validity (Welk & Eklund, 2005). 
The CY-PSPP has been validated with adolescents (Jones, Polman, & Peters, 2009; Welk, Corbin, & Lewis, 1995; Whitehead, 1995) and younger children (Welk, Corbin, Dowell, & Harris, 1997) and has been validated and translated into other languages (Aşçı, Eklund, Whitehead, Kirazci, & Koca, 2005; Raustorp, Ståhle, Gudasic, Kinnunen, & Mattsson, 2005; Raustorp, Mattsson, Svensson, & Ståhle, 2006).
Both the PSPP and CY-PSPP use a nonstandard response format based on Harter (1985), in which each item consists of a matched pair of statements, one negative and one positive (e.g., “Some people feel that they are not very good when it comes to sports” but “Others feel that they are really good at just about every sport”). Each item consists of two contrasting descriptions, and respondents are asked which description is most like them and whether the description they select is “Sort of true of me” or “Really true of me.” Responses are scored on a scale of 1 to 4, with 1 representing a “Really true of me” response to the negative statement and 4 representing a “Really true of me” response to the positive statement. Whereas this response format is designed to reduce the influence of social desirability, Wylie's (1989) review of Harter's original instruments provided little or no support for this suggestion, and Marsh and colleagues (1994) suggested that there were substantial method effects associated with the nonstandard response scale. This format has also been shown to be confusing, particularly for children (Eiser, Eiser, & Haversmans, 1995), and even for adults (Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994), unless special care is taken to explain the response scale. Using the suggestion of Marsh and colleagues (1994) that confusion over the structured alternative response scale could be overcome by more detailed instructions at the outset, researchers implementing the CY-PSPP used large illustrations for a sample item (Whitehead, 1995). Wichstrom (1995) found that responses for this format were psychometrically stronger when based on typical Likert responses rather than the structured alternative format, but Welk and colleagues (1997) suggested that the nonstandard response scale on the CY-PSPP worked better than Likert responses did.
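The two-step scoring logic described above (pick a statement, then rate how true it is) can be sketched as follows; the function name and argument names are illustrative, not part of any published scoring manual:

```python
def score_item(chose_positive: bool, really_true: bool) -> int:
    """Score one structured-alternative item on the 1-4 scale described above:
    1 = 'Really true of me' for the negative statement,
    2 = 'Sort of true of me' for the negative statement,
    3 = 'Sort of true of me' for the positive statement,
    4 = 'Really true of me' for the positive statement."""
    if chose_positive:
        return 4 if really_true else 3
    return 1 if really_true else 2

# A respondent who picks the positive statement as 'Sort of true of me'
print(score_item(chose_positive=True, really_true=False))  # 3
```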
In summary, the PSPP and the CY-PSPP are established instruments that have been translated into several languages and used with a range of populations. However, the response format and the high correlations among factors in both instruments may limit their usefulness in some settings. The CY-PSPP is a substantially revised version of the PSPP developed specifically for children. Not only should the CY-PSPP be used instead of the PSPP for child and adolescent samples, it may even be stronger than the original PSPP for adult samples.
Subsequent to the completion of this chapter, Lindwall and colleagues (2011) published a revised version of the PSPP (PSPP-R). They reviewed critiques of the PSPP response scale such as those noted here (e.g., Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994) and acknowledged that “the idiosyncratic alternative response format has been difficult to understand for some participants” (pp. 310-311). In recognition of these problems, the idiosyncratic response scale that has been such a salient feature of the PSPP was dropped altogether and replaced with a 4-point Likert response using only positively worded items. Lindwall and colleagues (2011) demonstrated the appropriateness of the revised PSPP-R based on a large sample (N = 1,831) of participants from four countries (Sweden, Great Britain, Portugal, and Turkey). However, they did not indicate whether the PSPP-R supersedes the PSPP or is merely an alternative to it, nor did they discuss the implications for other instruments using similar idiosyncratic response scales (e.g., PSPP-related instruments such as the CY-PSPP or Harter's instruments more generally).
Physical Self-Inventory
The Physical Self-Inventory (PSI) is a French adaptation of the PSPP that was originally developed for use with Francophone adults (Ninot, Delignières, & Fortes, 2000). In two preliminary studies, Ninot and colleagues used the nonstandard response scale from the PSPP. However, consistent with previous research (Marsh et al., 1994), they reported that this response scale was problematic. In a third study, the authors used a 6-point Likert response scale; factor analysis results were reasonable, but reliability coefficients were not completely satisfactory. Next the authors replaced the PSPP global physical items with items from the SDQ physical scale and the PSPP global self-esteem items with items from Coopersmith (1967). The final PSI consists of 25 items measuring six PSC factors (four specific and two global, as with the PSPP) and has satisfactory psychometric properties that have been confirmed in subsequent French studies of adults (Masse, Jung, & Pfister, 2001; Stephan, Bilard, Ninot, & Delignières, 2003; Stephan & Maïano, 2007).
Maïano and coworkers (2008) subsequently constructed a short form of the PSI for use with adolescents. They found that not all items from the adult PSI worked with adolescents, but they were able to construct 18-item (PSI-SF, 3 items per scale) and 12-item (PSI-VSF, 2 items per scale) versions that had good psychometric properties. In particular, the measurement and hierarchical structures were consistent with proposals by Fox and Corbin (1989) and were fully invariant across gender. Maïano and coworkers also noted that PSI-SF responses showed very high test-retest stability. Comparison of the PSI-SF and PSI-VSF demonstrated that the measurement model, mean structure, structural parameters, and criterion-related validity were equivalent across samples and versions. Nevertheless, the authors noted a serious limitation that all versions of the PSI share with the PSPP: very high correlations among the six PSC factors (correlations among latent factors) that, according to the authors, bring “into question the real independence of some of the models' sub-dimensions, and by extension their discriminant validity, a finding that has already been observed by Marsh (2002; Marsh et al., 2006) on analyses of the PSPP” (Maïano et al., 2008, p. 844). However, Maïano and colleagues also noted that because they used a traditional Likert response scale, the high correlations apparently were not due to the structured alternative format used in the PSPP. In summary, the PSI, particularly its short and very short forms, has made a potentially important contribution to applied research. However, further research is needed to more fully evaluate the robustness of support for construct validity and the instrument's application in non-French-speaking settings.
Richards Physical Self-Concept Scale
The Richards Physical Self-Concept Scale (RPSCS; Marsh et al., 1994; Richards, 1988) is a 35-item instrument designed to measure six specific components of PSC (body build, appearance, health, physical competence, strength, action) and one general physical satisfaction factor. Each item is a simple declarative statement, and subjects respond on an 8-point true-false scale. Extensive research in Australia (e.g., Marsh et al., 1994; Richards, 1988) has indicated that RPSCS responses have good psychometric properties. The factor structure is very robust, generalizing well over ages from 8 to 80 y and over gender.
RPSCS research has demonstrated (a) good reliability (coefficient alpha of .79-.93; Marsh et al., 1994; Richards & Marsh, 2005); (b) good test-retest stability over the short term (test-retest correlations of .77-.90 over 3 wk; Richards, 1988); (c) a well-defined, replicable factor structure as shown by CFA (Marsh et al., 1994; Richards, 2004); (d) a factor structure that is invariant across gender, as shown by multiple-group CFA (Richards, 2004), and across a wide age range; (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (Marsh et al., 1994; Richards & Marsh, 2005); and (f) applicability for participants aged 8 to 60 y and for both genders (Marsh et al., 1994; Richards, 1988, 2004; Richards & Marsh, 2005). In summary, the RPSCS is regarded as a valid, reliable, and structurally sound instrument that has been tested across both genders and a wide range of ages. The applicability across such a wide range of ages is a particular strength.
Physical Self-Description Questionnaire
Extending Fleishman's (1964) classic research on the structure of physical fitness, the Physical Self-Description Questionnaire (PSDQ) scales reflect some of the original SDQ scales and parallel physical fitness components identified in a CFA of physical fitness measures (Marsh, 1993). The PSDQ consists of nine specific components of PSC (strength, body fat, activity, endurance and fitness, sport competence, coordination, health, appearance, and flexibility), a global physical scale, and a global self-esteem scale. Each of the 70 PSDQ items is a simple declarative statement, and individuals respond on a 6-point true-false scale. The PSDQ is designed for adolescents but is also appropriate for older participants.
PSDQ research has demonstrated (a) good reliability (median coefficient alpha of .92) across the 11 scales (Marsh, 1996b; Marsh et al., 1994); (b) good test-retest stability over the short term (median r = .83 over 3 mo) and longer term (median r = .69 over 14 mo; Marsh, 1996b); (c) a well-defined, replicable factor structure as shown by CFA (Marsh, 1996b; Marsh et al., 1994); (d) a factor structure that is invariant over gender as shown by multiple-group CFA (Marsh et al., 1994); (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (see Marsh et al., 1994); (f) convergent and discriminant validity as shown by PSDQ relationships with external criteria (e.g., measures of body composition, physical activity, endurance, strength, and flexibility; see Marsh, 1996a, 1997); and (g) applicability for participants aged 12 to 18 y (or older) and for elite athletes and nonathletes (Marsh, Hey, Roche, & Perry, 1997; Marsh, Perry, Horsely, & Roche, 1995). In summary, the PSDQ is a psychometrically strong instrument.
Marsh, Martin, and Jackson (2010) recently presented a new short form of the PSDQ (PSDQ-S). This short form balances brevity and psychometric quality in relation to established guidelines for evaluating short forms (e.g., Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Smith, McCarthy, & Anderson, 2000) with the construct validity approach that is the basis of PSDQ research. Based on the PSDQ normative archive, 40 of 70 items were selected and evaluated in a new cross-validation sample (N = 708 Australian adolescents). To test the generalizability of results, the authors considered four additional samples: Australian adolescent elite athletes (n = 349), Spanish adolescents (n = 986), Israeli university students (n = 395), and Australian senior citizens (n = 760). Reliabilities for the 40 PSDQ-S items were consistently high in the cross-validation sample (.81-.94; median = .89) and senior sample (.81-.94; median = .91), and reliabilities in the cross-validation sample were higher than they were in comparable groups completing the 70-item PSDQ. The PSDQ-S factor structure in the cross-validation sample was well defined and highly similar to that based on the archive sample as well as to those based on the other four groups. Study 1, using a missing-by-design variation of multigroup invariance tests, showed that the factor structure was invariant across the 40-item PSDQ-S and the 70-item PSDQ. Study 2 demonstrated factorial invariance of responses over 1 y (test-retest correlations of .57-.90; median = .77) and good support for convergent and discriminant validity in relation to time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to responses on the PSPP and PSC instruments. The four studies reported by Marsh and coworkers demonstrated new, evolving strategies for the construction and evaluation of short forms that support the PSDQ-S.
The authors concluded that the strong support for the psychometric properties and construct validity of the widely used PSDQ instrument generalizes very well to the PSDQ-S.
Elite Athlete Self-Description Questionnaire
The PSC instruments discussed thus far may be suitable for elite athletes (e.g., Marsh et al., 1995). There may, however, be other components to PSC that are particularly relevant for elite athletes, and thus the Elite Athlete Self-Description Questionnaire (EASDQ; Marsh, Hey, Roche, et al., 1997; Marsh, Hey, Johnson, & Perry, 1997) was developed to address these other components. For the EASDQ, it was hypothesized that overall performance by elite athletes is a function of skill level, body suitability, aerobic and anaerobic fitness, and mental competence. Thus Marsh and colleagues developed the EASDQ to measure these five components along with overall performance, for a total of six factors. For each scale, they developed a pool of items that sport psychologists at the Australian Institute of Sport evaluated for their suitability for elite athletes. Pilot studies were conducted to select the best items to represent each factor. A compromise between brevity and psychometric soundness was achieved, with acceptable levels of reliability (e.g., all scales having reliability estimates of at least .8) based on short scales (4-6 items per scale).
EASDQ research demonstrates (a) adequate reliability (median coefficient alpha of .85) across the six scales (Marsh, Hey, Johnson, et al., 1997); (b) a well-defined, replicable factor structure as shown by CFA (Marsh, Hey, Johnson, et al., 1997; Marsh, Hey, Roche, et al., 1997); (c) applicability for elite athletes aged 12 y or older (Marsh, Hey, Roche, et al., 1997); and (d) predictive validity as shown by its ability to predict swimming performances in world championships after controlling for previous personal best performances (Marsh & Perry, 2005). In summary, the EASDQ is a reliable and valid instrument for elite athletes of all ages. More research is needed, however, to relate EASDQ responses to external validity criteria such as those used in PSDQ research and to criteria that are more specific to elite athletes (e.g., actual performance in competition).
Evaluation of Measures of Intrinsic and Extrinsic Motivation in Sport and Exercise
In this section, a critical review of the different measures used to assess intrinsic and extrinsic motivation in sport and exercise research is conducted. Certain criteria have guided the selection of the measures presented in this section. First, we have selected measures that are fully developed instruments that have gone through extensive validation steps. Second, we have chosen scales that have been used in research, published or unpublished, during the past 10 years. Scales that have not been used during that time frame are considered to be obsolete and are not reviewed. Finally, in light of recent theoretical development and because of space limitation, we have focused on motivation scales that assess intrinsic and extrinsic motivation independently of determinants and outcomes, while focusing on the perceived reasons for behavior. Our earlier discussion on the definitions of intrinsic and extrinsic motivation makes it possible to classify the different measures. The measures can vary in terms of the level of generality (situational versus contextual level) and the area (sport versus exercise). This classification appears in table 25.1. Table 25.2 (see p. 291) provides additional information on each scale's concept, dimensions, publication source, and availability. As can be seen, seven measures are reviewed. For each one, we present (a) a description of the instrument, (b) the conceptual and theoretical rationale underlying its scale development, (c) the available evidence concerning its psychometric properties (e.g., factorial validity, reliability, and construct validity), and (d) a broad assessment of the strengths and weaknesses associated with each measure.
Measures Used in Sport
In this section, we review the SMS (Brière et al., 1995; Pelletier et al., 1995), the Sport Motivation Scale-6 (SMS-6; Mallett, Kawabata, Newcombe, Otero-Rorero, & Jackson, 2007), the Behavioral Regulation in Sport Questionnaire (BRSQ; Lonsdale, Hodge, & Rose, 2008), the Pictorial Motivation Scale (PMS; Reid, Vallerand, Poulin, Crocker, & Farrell, 2009), and the SIMS (Guay et al., 2000).
Sport Motivation Scale
The SMS was developed (Brière et al., 1995; Pelletier et al., 1995) in order to assess contextual intrinsic and extrinsic motivation from a multidimensional perspective, as well as amotivation. The SMS has been the most often used motivation measure in sport, being employed with a variety of athletes (recreational to elite), age groups (adolescent to senior), and cultures (e.g., Canada, United States, United Kingdom, Bulgaria, Australia, Spain, and New Zealand). In fact, the SMS has been translated and validated in several languages (see Pelletier & Sarrazin, 2007). The SMS is based on SDT (Deci & Ryan, 1985) and is made up of seven subscales assessing amotivation; external, introjected, and identified regulation; and intrinsic motivation to know, to experience stimulation, and to accomplish. In line with SDT, motivation is assessed as the perceived reasons for participation, or the why of behavior. At the beginning of the scale, participants are asked, “In general, why do you practice your sport?” The items represent the perceived reasons for engaging in the activity, thus reflecting the different types of motivation.
The original scale was developed in French as L'Échelle de Motivation dans les Sports (Brière, Vallerand, Blais, & Pelletier, 1995) and was validated in three steps. The first step involved generating a pool of items explaining various reasons for sport participation through interviews with French Canadian athletes (aged 17-20 y). These reasons were then used to formulate items for the seven subscales of the French SMS. In the second step, a committee of experts evaluated the content validity of the items and eliminated those that were thought to be inadequate. Another sample of athletes from various sports completed the scale. Results from an exploratory factor analysis (EFA) provided support for a seven-factor structure with 4 items per subscale; this second step thus resulted in a 28-item scale. In the third and final step, two additional studies were conducted to further validate the scale. These studies included approximately 500 individuals, most of whom were involved in recreational sports. Results from confirmatory factor analyses (CFA) and correlational analyses confirmed the seven-factor structure, the subscale internal consistency (ranging from .65-.96), and moderate to high indexes of temporal stability (ranging from .54-.82) over 1 month. Furthermore, inspection of correlations among the seven SMS subscales provided support for the simplex pattern proposed by SDT. Results of correlations also showed that (in line with SDT) the most self-determined forms of motivation (intrinsic motivation and identified regulation) were related more strongly to determinants such as autonomy support from coaches and feelings of competence than to other forms of motivation (external and introjected regulation) and amotivation. Similar results were obtained with motivational outcomes such as positive affect, concentration, and intentions to pursue engagement in sport. In sum, adequate construct validity was obtained for the French form of the SMS.
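The simplex pattern referred to here predicts that correlations among subscales weaken as the subscales lie farther apart on the self-determination continuum (from amotivation through external, introjected, and identified regulation to intrinsic motivation). A minimal check of that ordering might look as follows; the function, subscale ordering, and correlation values are illustrative assumptions, not data from the SMS validation studies:

```python
# Hypothetical subscale ordering along the self-determination continuum.
SUBSCALES = ["amotivation", "external", "introjected", "identified", "intrinsic"]

def follows_simplex(corr):
    """Return True if, reading rightward from each subscale, correlations
    never increase as subscales grow farther apart on the continuum
    (a simple necessary condition for a simplex pattern)."""
    k = len(corr)
    for i in range(k):
        for j in range(i + 1, k - 1):
            if corr[i][j + 1] > corr[i][j]:
                return False
    return True

# Made-up correlation matrix consistent with a simplex pattern: adjacent
# subscales correlate most strongly; the continuum's endpoints correlate
# negatively.
toy = [
    [1.00, 0.50, 0.30, 0.10, -0.20],
    [0.50, 1.00, 0.45, 0.20, 0.00],
    [0.30, 0.45, 1.00, 0.40, 0.25],
    [0.10, 0.20, 0.40, 1.00, 0.55],
    [-0.20, 0.00, 0.25, 0.55, 1.00],
]
```

A matrix in which, say, external regulation correlated more strongly with intrinsic motivation than with introjected regulation would fail this check, which is the kind of departure from the simplex pattern discussed for some scales later in this section.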
The translation of the French SMS into English involved back-translation and committee procedures as suggested by Vallerand (1989). Pelletier and colleagues (1995) conducted two studies involving college athletes from various sports in order to assess the psychometric properties of the English form of the SMS. Results from CFA with a sample of 593 Canadian university athletes revealed adequate fit indices for the hypothesized seven-factor model (Adjusted Goodness of Fit Index and Normed Fit Index both > .90; Root Mean Square Residual < .08), and correlations with determinants and outcomes supported the simplex model. Moreover, internal consistency above .70 was obtained on all of the subscales except the identified subscale (.63). Test-retest correlations were acceptable and very similar to those obtained with the French SMS, as was the scale construct validity.
Since 1995, the SMS has been used extensively in sport psychology research. The seven-factor structure has been supported repeatedly (e.g., Doganis, 2000; Gillet, Vallerand, & Rosnet, 2009; Li & Harmer, 1996; Shaw, Ostrow, & Beckstead, 2005; Standage, Duda, & Ntoumanis, 2003). In addition, Hu and Bentler (1999) obtained support for a five-factor model by combining the three types of intrinsic motivation into one factor. Similar results were obtained by Gillet and colleagues (2009) with the French SMS. However, some studies have not supported the seven-factor model (Hodge, Allen, & Smellie, 2008; Mallett, Kawabata, & Newcombe, 2007; Mallett, Kawabata, Newcombe, & Otero-Rorero, 2007; Martens & Webber, 2002). Why is there such a discrepancy between these two sets of studies? One possibility lies in the populations from which the different samples were taken. Specifically, the SMS was validated using adolescent and young adult athletes and not older athletes. Because of this specific focus, some of the items may reflect a participation rather than an elite orientation, which is more in line with the younger population. For instance, an identified regulation item reads, “Because sport is one of the best ways to maintain good relationships with my friends.” Such an item seems more relevant for a younger population. An older, high-level athlete may disagree with this item but still display a high level of identified regulation for a sport (but not for relationship reasons). Future research using the SMS with different age groups and proficiency levels is needed to clarify this issue.
Although the internal consistency of the SMS has generally shown adequate values, some values below .70 have been found. This is especially the case for the identified regulation subscale (Brière et al., 1995; Kingston, Horrocks, & Hanton, 2006; Li & Harmer, 1996; Pelletier et al., 1995), although some lower values (below .70) have been obtained with the introjected (McNeill & Wang, 2005; Perreault & Vallerand, 2007; Riemer, Fink, & Fitzgerald, 2002; Standage, Duda, & Ntoumanis, 2003), external regulation (Standage, Duda, & Ntoumanis, 2003), and amotivation subscales (Standage, Duda, & Ntoumanis, 2003). However, very few instances of values below .60 have been obtained. It should be noted that a Cronbach alpha of .60 with only 4 items is acceptable because, as noted by Cronbach (1951), the coefficient alpha underestimates the internal consistency of scales with a low number of items. This is because the coefficient alpha includes the number of items in the formula. For instance, given the same average interitem correlation of .30, a 3-item scale yields a coefficient alpha of only .56, whereas a 10-item scale yields an alpha of .81!
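The dependence of coefficient alpha on scale length can be verified with the standardized alpha formula, alpha = k * r / (1 + (k - 1) * r), where k is the number of items and r the average interitem correlation (a sketch; `standardized_alpha` is a hypothetical helper name, not part of any published software):

```python
def standardized_alpha(k, r_bar):
    """Standardized coefficient alpha for a k-item scale whose items share
    an average interitem correlation of r_bar (Spearman-Brown-type formula)."""
    return k * r_bar / (1 + (k - 1) * r_bar)

# Holding the average interitem correlation fixed at .30, alpha depends
# strongly on the number of items:
alpha_3 = standardized_alpha(3, 0.30)   # ~= .56
alpha_10 = standardized_alpha(10, 0.30)  # ~= .81
```

The calculation shows why a modest alpha on a 4-item subscale such as identified regulation need not signal weak items: the same interitem correlations would produce a much higher alpha on a longer scale.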
In line with the original work of Ryan and Connell (1989) and the initial SMS validation procedures (Brière et al., 1995; Pelletier et al., 1995), construct validity has been assessed by other authors in two fashions: (1) with the simplex pattern of correlations among the subscales and (2) with correlations between motivational factors and their determinants and consequences. We do not have space to review all studies. However, overall, there is overwhelming support for the construct validity of the SMS both in French and English. For instance, in addition to finding support for the simplex pattern, Pelletier and Sarrazin (2007) concluded in their review of the evidence that the SMS has been used with success to predict a great variety of specific outcomes and consequences (such as burnout, exercise dependence among endurance athletes, fear of failing, adaptive coping skills, perceptions of constraints, flow, vitality and well-being, sporting behavior orientations, aggression, and performance) in a manner that is consistent with SDT. These findings provide strong support for the construct validity of the SMS.
In sum, the SMS has some positive features. First, it is a multidimensional instrument that assesses different types of intrinsic and extrinsic motivation as well as amotivation. Second, the scale focuses on the why of behavior and thus items are not confounded with determinants and consequences. Finally, it has some excellent psychometric properties. Nevertheless, some limitations should be underscored. First, although internal consistency levels have been acceptable overall, some subscales, especially the identified regulation subscale, have yielded relatively low coefficient alphas at times. Second, the SMS does not assess integrated regulation. Third, the seven-factor structure has not always been supported by CFAs. According to Pelletier, Vallerand, and Sarrazin (2007), this may be explained by a host of factors, including differences in sample sizes, variations in the way the instrument is administered, or some other characteristics specific to the context of the study. However, as already indicated, it is also possible that the SMS is better suited for a younger, nonelite athlete population. Clearly, future research on this issue is in order.
Sport Motivation Scale-6
Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed another version of the SMS, the SMS-6. This scale has the same underlying rationale as the original SMS but was designed to improve the original version by including an integrated regulation subscale and attempting to resolve some of the inconsistencies with the factor structure and some of the relatively low internal consistency values (below .70). The SMS-6 comprises 24 items, 4 for each of the six subscales, which include amotivation; external, introjected, identified, and integrated regulation; and general intrinsic motivation. Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed 5 items for the integrated regulation subscale as well as 7 other items (4 of which were kept in the final scale) to replace some items in the original SMS. Two samples were used to validate the SMS-6. Sample 1 was composed of 501 first-year university students participating in competitive sport at least twice per week and 113 elite athletes representing Australia at the international level (for a total of 614 participants). Sample 1 was used to derive a factor structure that included the SMS items as well as the reformulated and integrated regulation items. Sample 2 was composed of 557 university students who were engaged in a variety of sports or physical activities twice per week. The second sample was used to confirm the structure of the SMS-6. Participants also completed the Dispositional Flow Scale (DFS).
Results of a CFA with the SMS-6 (with sample 2) provided support for the factor structure as well as for the internal consistency values (all above .70). Concerning the construct validity of the SMS-6, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) reported a rather weak simplex pattern of correlations among the subscales. More specifically, external regulation correlated highly with intrinsic motivation (r = .54), while the correlation between identified regulation and intrinsic motivation was very high (r = .91) and was higher than the one between integrated regulation and intrinsic motivation (r = .75). The construct validity of the SMS-6 was not fully supported, as some of the correlations involving the SMS and flow were not as expected by SDT. For instance, the distinctions among integrated regulation, identified regulation, and intrinsic motivation were not always clear. Furthermore, external regulation revealed some positive and sometimes strong correlations with flow, contrary to hypotheses derived from SDT.
In sum, the SMS-6 contains some nice features. First, it contains an integrated regulation subscale. Furthermore, the addition of 4 new items may make the SMS more acceptable for older and more experienced athletes. Second, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) presented results supporting the validity of a variation of the SMS-6, the SMS-8. The SMS-8 contains the same items as the SMS-6 but assesses the three types of intrinsic motivation rather than general intrinsic motivation. The SMS-6 also shows some limitations. First, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) proposed 7 new items to replace those that were presumably problematic in the original SMS. However, only 4 of these items made it to the final version. Thus, it appears that the SMS-6 retained much of the original SMS. Second, even some of the new items appear problematic and may not assess the desired construct (see Pelletier et al., 2007). For instance, a new amotivation item (“I don't seem to be enjoying my sport as much as I previously did”) seems to reflect a decrease in intrinsic motivation rather than amotivation. Finally, results from Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) demonstrated that the integrated regulation subscale may lack discriminant validity, leading to results with flow highly similar to those for identified regulation and intrinsic motivation.
Behavioral Regulation in Sport Questionnaire
Lonsdale and colleagues (2008) developed the BRSQ to create an alternative measure of elite sport motivation as conceptualized by SDT. However, in contrast to Mallett, Kawabata, Newcombe, and Otero-Rorero (2007), these authors used a completely new pool of items developed by SDT experts and competitive athletes. There are two versions of the BRSQ. The BRSQ-8 contains 32 items assessing integrated, identified, introjected, and external regulation; amotivation; and the three forms of intrinsic motivation (knowledge, experience stimulation, and accomplishment) identified by Vallerand (1997). The BRSQ-6 contains the same items but assesses general intrinsic motivation rather than all three types of intrinsic motivation, for a total of 24 items.
Lonsdale and colleagues (2008) conducted a series of three studies to validate the scale. In the first study, the factorial validity and the internal consistency were assessed with 382 New Zealand elite athletes. Results from a CFA on the 32 items supported the factor structure of the BRSQ. Specifically, fit indexes were acceptable and all items loaded significantly on the appropriate factors (loadings ranged from .58-.91). Finally, internal consistency of the eight subscales, measured with the Cronbach alpha, showed high values ranging from .71 to .91. Additionally, 1 wk test-retest reliability was tested with 34 competitive adult athletes. Test-retest correlations for all subscales supported temporal reliability (values ranged from .73-.90).
In a second study with 343 athletes from New Zealand, the results of a CFA on the BRSQ-8 supported once more the factor structure as well as the subscale internal consistency. Lonsdale and colleagues (2008) also showed that the factor structure of the BRSQ-6 model fit the data very well and that subscale coefficient alphas all exceeded .78. Moreover, the construct validity of the BRSQ-6 was assessed by testing for a simplex pattern of correlations among the six subscales. While some relationships were in line with predictions (e.g., amotivation was negatively related to intrinsic motivation), there was a lack of discrimination between some subscales. More specifically, there was no difference between external and introjected regulation scores in terms of their relationships with amotivation. A similar pattern was evident with the identified and integrated regulation subscales, which both had similar high correlations with intrinsic motivation. These results with the simplex pattern were replicated in a third study conducted with nonelite athletes. In this third study, Lonsdale and colleagues also assessed the relationships between the BRSQ-6 and indexes of burnout (Lemyre, Treasure, & Roberts, 2006; Raedeke & Smith, 2001) and flow (Jackson & Eklund, 2002). Overall, results supported hypotheses in line with SDT. Specifically, amotivation and external and introjected regulation showed negative correlations with flow and positive correlations with burnout. The opposite pattern of correlations was found for the self-determined subscales (intrinsic motivation and identified and integrated regulation). However, there was a lack of discrimination between integrated regulation and general intrinsic motivation. Results of another study on burnout (Lonsdale, Hodge, & Rose, 2009) replicated these findings. Thus, overall, the support for the construct validity of the BRSQ-6 appears to be mixed.
It should be underscored that the BRSQ has some nice features. First, the scale is designed in such a way that the researcher can decide to use a multidimensional (BRSQ-8) or unitary (BRSQ-6) conceptualization of intrinsic motivation. Second, the scale is rather short, with 4 items per subscale. Finally, it assesses integrated regulation. At the same time, the BRSQ also displays some limitations. First, additional research is needed on the construct validity of the scale. Whereas there is support for distinguishing the self-determined subscales (intrinsic motivation and identified and integrated regulation) from the non-self-determined subscales (external and introjected regulation), the finer discrimination within each category appears to be lacking. Such evidence is crucial, and future research is needed in order to show that this scale does indeed assess the SDT constructs rather than two broad sets of subscales tapping self-determined versus non-self-determined motivation. Second, this scale is designed specifically for older participants in competitive sport; it remains to be seen whether the BRSQ can be used with younger participants, for whom the integrated regulation subscale may not have full meaning. Finally, research is needed to test the temporal stability of the scale over a time frame longer than 1 wk.
Pictorial Motivation Scale
The PMS was designed to measure intrinsic and extrinsic motivation for sport and exercise in people with an intellectual disability. It assesses participants' reasons for engaging in sport and exercise. The scale's main characteristic is the use of drawings depicting each of the 20 items. There are 5 items (pictures) for each of four subscales: intrinsic motivation, self-determined extrinsic motivation (a mixture of integrated and identified regulation), non-self-determined extrinsic motivation (a mixture of introjected and external regulation), and amotivation. The pictures help participants with cognitive difficulties grasp the motivational concept depicted in each item.
The original scale was developed in French (Reid, Poulin, & Vallerand, 1994). Results of a study with 62 participants supported the scale's internal consistency, temporal stability, and construct validity, as exemplified by the presence of a simplex pattern among the four subscales. However, the amotivation subscale had poor reliability (α = .52). The French version was translated into English (Reid et al., 2009) according to the back-translation and committee procedures outlined in Vallerand (1989). Then, 6 new items were generated for the less reliable amotivation subscale. Participants in the Special Olympics (n = 160) completed the English version. Results of the CFA confirmed the four-factor structure of the PMS. Furthermore, the internal consistency values (Cronbach alphas) ranged from .60 to .71. Finally, construct validity was assessed by testing for a simplex pattern of correlations among the four subscales. The intercorrelations among latent variables from the CFA provided support for the simplex pattern.
Results from a study conducted with the English version of the PMS involving 80 high school students with mild intellectual disability provided support for the internal consistency, temporal stability (over 3 wk), and construct validity of the PMS. Construct validity was supported with respect to the simplex pattern of correlations among the PMS subscales as well as correlations between the PMS subscales and motivational antecedents (skill and perceived competence) and outcomes (perceived effort) as rated by the physical education teacher. Finally, the internal consistency of each subscale was tested without the pictorial dimension with a subset of 47 high school students with mild intellectual disability. Results indicated poor internal consistency for all subscales except intrinsic motivation (.91 for intrinsic motivation but only .27 for self-determined extrinsic motivation, .20 for non-self-determined extrinsic motivation, and .60 for amotivation). This finding suggests that the scale is not reliable without the drawings.
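The internal consistency values reported throughout this section are coefficient (Cronbach) alphas, computed from the item variances and the variance of the total score. A minimal sketch with hypothetical item responses (not PMS data):

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_respondents, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # sample variance per item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of 5 participants to a 4-item subscale
scores = np.array([
    [5, 4, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [2, 1, 2, 2],
    [5, 5, 4, 4],
])
print(round(cronbach_alpha(scores), 2))  # → 0.95
```

Alpha rises as items covary more strongly, which is why removing the pictorial supports (and thus the shared understanding of the items) collapsed the values reported above.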
The preliminary findings with the English version of the PMS are encouraging. Furthermore, this scale is the only one geared toward individuals with intellectual disability. The use of drawings to depict the various items makes this scale unique in the field. Nevertheless, the PMS shows some limitations. First, the scale does not differentiate among all forms of intrinsic (knowledge, stimulation, and accomplishment) or extrinsic (integrated, identified, introjected, and external regulation) motivation. Second, construct validity was tested with only a limited number of variables. Third, it is not known whether the scale is usable with children who have severe forms of intellectual disability. Clearly, additional research is needed on the reliability and validity of the PMS.
Situational Motivation Scale
The SIMS is one of the few scales to assess intrinsic and extrinsic motivation and amotivation at the situational level (Guay et al., 2000). The SIMS is a multidimensional tool that measures four types of motivation: intrinsic motivation, identified regulation, external regulation, and amotivation. The SIMS is made up of 16 items (4 items per subscale) and asks this question: “Why are you currently engaged in this activity?” The items represent potential reasons for task engagement. The scale is worded in such a way that it can be used in most situations (sport and nonsport).
Five studies were reported in the original article. In study 1, the original scale was developed by a committee of experts and completed by 195 French Canadian college students. Results of an EFA revealed a four-factor structure with the final 16 items loading on their respective factor. In study 2, a CFA confirmed the factor structure as well as its invariance across gender. Across the five studies, the internal consistency values of the subscales were acceptable, ranging from .62 to .95 (see Guay et al., 2000). Moreover, across all studies, support was obtained for the construct validity of the SIMS through results from correlations in line with the simplex pattern among the subscales as well as between the SIMS subscales and motivational determinants and consequences. Perhaps of greater interest for the present discussion were the results of study 4, which showed that some subscales (intrinsic motivation and identified regulation) were sensitive enough to detect changes in motivation that took place during two games of a basketball tournament.
Other researchers have also obtained support for the psychometric properties of the SIMS. First, all studies reported acceptable internal consistency values for each subscale (Blanchard, Mask, Vallerand, de la Sablonnière, & Provencher, 2007; Conroy, Coatsworth, & Kaye, 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, Duda, & Prusak, 2003). The coefficient alpha values of all but the amotivation subscale (α = .58) in the Conroy and colleagues study were above .60. Second, support for the factorial validity of the SIMS was obtained through CFAs with one qualification. Whereas the CFA results with the 16 items yielded acceptable fit indexes, removal of 1 item (Jaakkola, Liukkonen, Laakso, & Ommundsen, 2008) and even 2 items (Gillet, Berjot, & Paty, 2009; Standage, Treasure, et al., 2003) yielded better fit indexes. Moreover, Standage, Treasure, and colleagues (2003) conducted multisample CFAs and showed that the pattern of factor loadings was largely invariant across four different samples.
Construct validity of the SIMS was also assessed in several studies (Blanchard et al., 2007; Conroy et al., 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, et al., 2003). In addition to supporting the simplex pattern among the SIMS subscales and between the SIMS subscales and need satisfaction (study 2 of Blanchard and colleagues, 2007), results also supported the postulate from the HMIEM (Vallerand, 1997) for the top-down effect, in which contextual sport motivation was found to predict situational sport motivation (studies 1 and 2 of Blanchard et al., 2007; Jaakkola et al., 2008; Ntoumanis & Blaymires, 2003). Specifically, the more self-determined the motivation was found to be in a specific context (in this case, sport), the more self-determined the motivation was found to be in a given situation. Furthermore, Blanchard and colleagues (2007, studies 1 and 2) found support for another postulate from the HMIEM that suggests that over time, situational motivation in the realm of sport (basketball) has recursive effects on contextual motivation. The more that situational motivation is self-determined, the more that contextual motivation becomes self-determined over time. Finally, Jaakkola and coworkers (2008) demonstrated that, as predicted by the HMIEM, situational self-determined motivation was better than contextual motivation in predicting the situational intensity (as assessed by HR) displayed by students in a physical education class. Overall, these findings provide strong support for the reliability and factorial and construct validity of the SIMS.
The SIMS has several positive features, one of them being that it is the only scale to assess intrinsic and extrinsic motivation and amotivation at the situational level. Furthermore, it does so using only 16 items. Nevertheless, it also has some weaknesses. First, the SIMS does not assess the different types of intrinsic motivation and integrated and introjected regulation, because it was designed to be short. Second, while the factor structure has been supported, it is not clear if some items should be replaced (Gillet, Berjot, et al., 2009; Jaakkola et al., 2008; Standage, Treasure, et al., 2003). Third, research so far has not assessed the validity of the scale with high-performance athletes. Thus, additional research is needed to further test the psychometric properties of the SIMS in sport.
Learn more about Measurement in Sport and Exercise Psychology.
Ethics Codes: Their Nature, Purposes, and Application
Ethics codes typically comprise principles and standards. Ethical principles are broad-spectrum statements that summarize and reflect the values of the parent organization or governing body. These general and aspirational statements set the underlying tone for the more specific codes and guide the work-related ethical decision making of professionals. In contrast, ethical standards specify both proscribed and prescribed member behaviors. While not always black and white, these standards serve as a more clear-cut and enforceable guide for professional behavior.
Members should apply both the aspirational principles and the enforceable standards to shape their thinking and behavior in work settings. Ideally, members self-monitor their own behavior. To remain ethical, professionals are encouraged to consult with colleagues about ethically challenging situations and to provide constructive feedback about potentially unethical behavior they perceive in others.
Assessment and Measurement
A central question to be addressed in this chapter is: What are assessment and measurement? Sundberg (1977) defines assessment as the processes used “for developing impressions and images, making decisions and checking hypotheses about another person's pattern of characteristics that determines his or her behavior in interaction with the environment” (p. 21). The assessment process involves collecting and assembling a broad range of objective and subjective information about persons or groups to develop impressions about them; identify their needs; predict how they might think, feel, and behave in future situations; and select and apply interventions based on the content and dependability of that information. Professionals may use multiple assessment methods that include observations of behavior, symptom checklists, surveys and questionnaires, structured and unstructured interview materials, and standardized tests (Bennett et al., 2006). Gardner and Moore (2006) emphasize using a triad of psychological assessment strategies in the practice of clinical sport psychology: (1) initial interviews, (2) behavioral observation, and (3) psychological testing. The nature and assumptions underlying assessment approaches are usually grounded in the theoretical orientation of the professional (Andersen, 2002).
In contrast, measurement can mean many things to many people. It is one of the most common words in the English language and can be used as both a noun and a verb (Lorge, 1967). For the purposes of this chapter, measurement is viewed as an extension of assessment processes. It can be thought of more narrowly as the process of collecting information about psychological characteristics of interest (e.g., attitudes, behaviors, state experiences) using one or more methods or tools (such as those mentioned earlier) to monitor change or the effects of interventions or treatments after the initial assessment. For example, an educational sport psychology consultant might administer a measure of team cohesion over the course of a competitive season to see how team members perceive their relationships. Another consultant might conduct a preseason baseline screening assessment of cognitive functioning in hockey players and then reevaluate players who incur a mild traumatic brain injury (i.e., concussion) later in the season.
In this chapter, the terms measurement and assessment are used interchangeably. Furthermore, these terms are used to describe the decisions and opinions made by professionals regarding clients with whom they work. As such, measurement and assessment techniques include all methods of gathering information about clients, such as (a) psychological, educational, and neurological tests; (b) data gathered during clinical interviewing; (c) information gathered from significant others (e.g., family members, teachers, friends); (d) direct and indirect observation; and (e) interactions with people via teletherapy (e.g., Internet, phone; Fisher, 2009).
Competence and Education
In order to excel in our professional duties and do well for those we serve, teach, study, and otherwise interact with, we must know what to do and how to do it in a capable manner. The ethics codes mentioned earlier identify the necessity of being knowledgeable and capable in our work. For example, the APA ethical standards provide guidance for organization members in this area, including information about (a) limitations of competence, (b) maintaining competence, (c) making sound professional and scientific judgments, (d) delegating work responsibilities to others, (e) engaging in activities in emergencies, and (f) impairment (APA, 2002). Competence in professional behavior is a personal matter that is frequently challenged. Professionals are responsible for knowing their limitations and for recognizing that their knowledge and skills change and require constant upgrading. The APA ethics code also emphasizes the importance of making sound work-related decisions based on scientific knowledge and appropriate discipline-specific practice. This portion of the APA code cautions professionals to be careful when delegating work to others, describes how a professional is responsible for others' work, and explains the necessity of avoiding multiple relationships with those to whom work is delegated. The APA standards note that we can occasionally be thrown into situations in which our competence is stretched; in such cases we need to be very careful, seek supervision if available, and end such work as soon as possible.
Measurement Referral Questions and Appropriateness of Instruments
When selecting assessment instruments, the professional must consider the referral questions that prompted this process (Fisher, 2009; Smith, 1976). The instruments selected should reflect these referral questions and utilize assessment strategies that have appropriate validity and reliability. For example, if a professional is interested in measuring state anxiety for research purposes, an appropriate assessment may be the Competitive State Anxiety Inventory-2 (CSAI-2; Martens, Burton, Vealey, Bump, & Smith, 1990) as opposed to the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970), which measures both trait and state anxiety. When selecting the assessment, the professional should be aware of limitations or biases regarding cultural sensitivity (see the later section on cultural issues); gender considerations (Etzel, Yura, & Perna, 1998); and age, language, or disability factors that may influence the psychometric qualities of the assessment differently from the way they influenced the normative groups used for the development and validation of the instrument (APA, 2002; Fisher, 2009). It is also important to consider the method of delivery. For example, paper-and-pencil assessments may not have been validated for online use (see the later section on technology), and instruments with elevated reading levels may not be appropriate for certain age or developmental groups. Therefore, the professional should always verify the assessment's validity and reliability when a modified assessment method or group is used (Fisher, 2009). Furthermore, the professional should also attempt to conduct in-person assessments when possible, as a great deal can be learned about clients from the way they present themselves during the assessment process. This information can affect the richness of the assessment data.
It is also important for professionals to be aware of, and competent in using, appropriate psychometric strategies for establishing the validity and reliability of the instruments they use (AERA, APA, & NCME, 1999). All instruments have unique psychometric properties that affect how they should be administered and interpreted. When validity and reliability issues are not taken into consideration, it is possible to choose and utilize instruments to assess factors that they were not designed to assess. Furthermore, practitioners should be well aware of other psychometric properties, such as content and criterion validity and the standard error of measurement, that may affect how results are interpreted and used. The ethical practitioner needs to be aware of psychometric issues in order to choose appropriate instruments with regard to the referral questions, client characteristics, assessment strategies, and environmental factors.
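The standard error of measurement mentioned above follows directly from a scale's reliability: SEM = SD × √(1 − r), where SD is the standard deviation of observed scores and r is the reliability coefficient. A short sketch with hypothetical values:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: the standard deviation of observed
    scores times the square root of (1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical scale: observed-score SD = 10, coefficient alpha = .84
e = sem(10, 0.84)                      # → 4.0
band = (50 - 1.96 * e, 50 + 1.96 * e)  # 95% band around an observed score of 50
print(e, band)
```

The wider the band, the more cautiously an individual's observed score should be interpreted, which is one reason reliability matters when results are used to make decisions about clients.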
Consent and Assent
As discussed earlier, the ethical principles for sport and exercise psychology emphasize doing no harm to the client and respecting the individual's rights and dignity (AASP, 1996; APA, 2002). The test taker's right to privacy and confidentiality applies here as well, and the professional should take all necessary precautions to maintain the confidentiality and privacy of the client. To protect the test taker, informed consent must be obtained at the start of the relationship (e.g., research, consultation, therapy). Beyond the informed consent process and before formal assessment, the client or participant should be informed of all pertinent information regarding the assessment process. This information includes (a) the nature and purpose of assessment; (b) any applicable fees; (c) potential involvement of third parties such as a coach, athletic trainer, or manager; (d) limits of privacy and confidentiality (as discussed in the next section); and (e) the timeline for the process and potential feedback (Fisher, 2009). This information should be presented in a clear and understandable manner. Furthermore, this information should be agreed to by the test taker, who thereby gives informed consent. Test takers should engage in assessment of their own free will and must be given the option to withdraw participation without consequences (APA standard 3.10). All necessary information about assessment procedures and findings should be provided in a language or level appropriate for the participant. Furthermore, it is unethical to require or coerce individuals to take part in measurement and assessment for research or practice purposes.
Privacy and Confidentiality and Release of Information
Typically, the ethical standards of organizations with ties to sport psychology (APA ethical standard 4.01 and the AASP code) state that professionals should not reveal information about clients, test takers, or others without a signed release of information or a legal requirement to do so. Such legal situations may include (a) a test taker who indicates possible self-harm or harm to others (i.e., suicide or homicide), (b) a test taker whose results are subpoenaed by the court, or (c) a test taker who is a minor, in which case the parent or guardian may have access to the data (Etzel et al., 1998). If the test taker or, in the case of a minor, the parent or guardian provides explicit written permission, the specific information identified by the client may be released to the identified parties. Unless these circumstances are met, information from the test taker may not be disclosed to anyone (e.g., coaches, management, parents, administration, athletic trainers, and so on).
In situations where the assessment is requested by a third party (e.g., coaches, management, the court), this third party may also request results from the assessment. It is important for the professional to establish a priori who is the “real client” (Ogilvie, 1979) and to have the ability to control access to the results. Etzel and colleagues (1998) suggest that information about the assessment should be shared only with one predetermined person, unless a release of information form has been completed. Therefore, when engaging in assessments, the professional should set clear boundaries and avoid dual relationships, thereby identifying who is being served (APA standard 4.02a). Another complication of these situations is the role of trust. If athletes or test takers suspect the test results will be used without their permission in decisions regarding performance or other aspects of participation, they may be less likely to respond honestly, thus affecting the validity of the results (see the section on demand characteristics).
Raw Data and Data Storage
Raw data such as the test taker's responses to items, along with the professional's notes and final reports, should be stored in locked file cabinets inside the professional's office or in password-protected computer files (Fisher, 2009). Other methods to ensure confidentiality may include limiting access to records to only those people who need to know this information and have been trained to handle and understand it, deidentifying records using code numbers, and appropriately disposing of identifiable records (Fisher, 2009). A good policy for data maintenance is to keep data for a minimum of 7 y after the last service delivery date or 3 y after a minor reaches the age of 18 (whichever is later), as recommended by the APA record-keeping guidelines (APA, 2002; Fisher, 2009). Raw data and the instruments used for assessment purposes should not be released to third parties unless a release of information form has been completed and the third party is competently trained to use such information.
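The retention guideline just described (7 years after the last service date, or 3 years after a minor client turns 18, whichever is later) reduces to a simple date comparison. A sketch under those assumptions; the function name is illustrative, not from any APA tool, and leap-day dates are not handled:

```python
from datetime import date

def retention_until(last_service, birth=None):
    """Earliest disposal date: 7 years after the last service date, or
    3 years after a minor client turns 18, whichever is later.
    (Leap-day service dates or birthdays would need special handling.)"""
    seven_years = last_service.replace(year=last_service.year + 7)
    if birth is None:
        return seven_years
    three_after_18 = birth.replace(year=birth.year + 21)  # 18 + 3 years
    return max(seven_years, three_after_18)

print(retention_until(date(2020, 6, 1)))                           # → 2027-06-01
print(retention_until(date(2020, 6, 1), birth=date(2010, 1, 15)))  # → 2031-01-15
```

For an adult client only the 7-year rule applies; for a minor, the later of the two dates governs, as the second call shows.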
Results Discussion
Test feedback and results discussion should be provided in the form of a carefully constructed report using clear language that fully explains the assessment results. Labels and jargon should be eliminated to increase readability. Information necessary to the purpose of the test should be included, and unnecessary and unrelated information should be avoided (APA, 2002; Fisher, 2009). Additionally, as recommended by the APA (APA, 2002), interpretations should take into consideration the participant's gender, race, ethnicity, age, national origin, sexual orientation, religion, disability, language, or socioeconomic status. Participants should receive assessment information and feedback related to their performance on the assessment and should be informed of ways in which they could personally use the test results or how this information may be used by a third party (only if written permission was given to release such information). The information released to the participant should be provided in a verbal or written report and presented in such a way that it does not cause harm to the test taker (Etzel et al., 1998). However, information such as numerical scores or specific responses should not be released to individuals not qualified to interpret such information (Fisher, 2009; Tranel, 1995).
Demand Characteristics
In the sport context, several groups of individuals may be interested in the assessment results of athletes. Interested parties may include coaches, managers, teams, students, or administrators. However, the possibility that a third party will review the test results may increase socially desirable responding and yield invalid and unreliable information. Therefore, undue pressure to complete an instrument or battery should be considered as a contextual factor.
Another potentially undesirable effect of a third party viewing the test taker's results may be assessment anxiety. The APA standards state that if a test taker is observed to be anxious or reports feeling anxious, this feeling should be taken into account and become a limitation in the interpretation of test data (APA, 2002). Assessment anxiety may be exaggerated in situations where a third party may have access to results. These situations may also lead to faking good or faking bad on the part of respondents who are concerned about how the results may be used. This must also be considered when evaluating the results.
Supervision of Subordinates
In some cases, professionals may hire and train subordinates to help with assessment and measurement tasks. These subordinates may administer, score, and even interpret the results of measurement and assessment. Standard 2.05 of the APA ethics code (APA, 2002) states that professionals utilizing employees, supervisees, or research and teaching assistants for such purposes should take reasonable precautions to put subordinates in situations where (a) they do not face possibly harmful multiple relationships with the client that could affect their objectivity, (b) they are competently trained to perform the delegated task on their own or with supervision, or (c) they are supervised for competent service delivery. Therefore, when using subordinates to help with tasks such as administration, scoring, or interpretation, the professional assumes primary responsibility and liability to ensure that the services are being provided competently. The professional needs to ensure that subordinates are well trained with all potential instruments. To do so, the professional must provide appropriate training, experience, and supervision as well as continue to check the subordinates' work to ensure its quality. As with licensed professionals, not all subordinates have the same competencies with regard to all instruments.
Tools to Measure the Physical Self
Reflecting the general historical trends in self-concept research, self-concept instruments used in early sport and exercise research focused on global self-esteem (Marsh, 1997, 2002). However, following the research of Shavelson and colleagues (1976), a number of multidimensional self-concept instruments containing one or more PSC scales were developed. Indeed, in a 1974 review, Wylie had concluded that, at the time, most self-concept instruments focused on global self-concept or self-esteem rather than specific domains such as PSC. Although several of the instruments reviewed by Shavelson and colleagues (1976) contained items relating to physical skills and elements of physical appearance, none provided a clearly interpretable measure of PSC. From a practical perspective, these older instruments appear to be of little value for sport and exercise psychologists. The major exception, perhaps, is the Physical Estimation and Attraction Scale (PEAS; Sonstroem, 1978, 1997), along with the theoretical model on which it is based. This instrument was designed to measure two global components: estimation (competency) and attraction. While the PEAS may not be the instrument of choice today, it has a historical significance in that its research incorporated many of the features of the construct validity approach advocated in this chapter, it was heuristic, and it provided an important basis for subsequent research.
In a subsequent 1989 review, Wylie identified several multidimensional self-concept instruments measuring one or more components of PSC that can be differentiated from other specific domains of self-concept and general self-concept. Included in the list were the three SDQ instruments already discussed. Wylie also evaluated Harter's (1985) Self-Perception Profile for Children, which contains two PSC scales (athletic competence and physical appearance). Other multidimensional instruments containing physical scales that were not reviewed by Wylie include the Self-Rating Scale (Fleming & Courtney, 1984), which measures physical ability and physical appearance; the Song and Hattie Test (Hattie, 1992), which measures physical appearance; and the Multidimensional Self-Concept Scale (Bracken, 1996), which has a physical scale that includes physical competence, physical appearance, physical fitness, and health. The Tennessee Self-Concept Scale (Fitts, 1965) is a multidimensional self-concept instrument that also purports to measure PSC. In their review and empirical evaluation of this instrument, Marsh and Richards (1988) found distinguishable physical components reflecting health, neat appearance, physical attractiveness, and physical fitness that were incorporated into a single PSC score. This detailed breakdown of the Tennessee physical scale was supported by relationships with the SDQ physical ability and physical appearance scales in an MTMM study comparing responses to the two instruments. Because each of the clusters based on responses to the Tennessee instrument is represented by only a few items, it is not appropriate to use the instrument to measure these distinct components of PSC. Marsh and Richards argued that PSC measures that combine and confound a wide range of differentiable physical components—such as those based on the Tennessee Self-Concept Scale—should be interpreted cautiously (see similar comments by Fox & Corbin, 1989).
In summary, although multidimensional self-concept instruments based on Shavelson and colleagues' (1976) model provided good support for the construct validity of the physical ability and appearance scales (e.g., Marsh, 2002; Marsh & Peart, 1988), they left unanswered the question of whether PSC is more differentiated than can be explained in terms of one (physical ability) or two (ability, appearance) physical scales. Subsequent PSC instruments were developed specifically to address the issue of the multidimensionality of PSC.
Physical Self-Perception Profile
The Physical Self-Perception Profile (PSPP; Fox, 1990; Fox & Corbin, 1989) is a 30-item inventory that consists of four specific scales and one general physical self-worth factor. The PSPP was developed to document the physical self-perceptions of college students. It was designed to reflect the advances made by Harter (1985) and Shavelson and colleagues (1976) in identifying the physical self as an important construct to measure in its own right and to reflect the hierarchical, multidimensional nature of the physical self. A qualitative approach was used to reveal dimensions of physical self-esteem salient to the population sampled (Fox & Corbin, 1989). The PSPP consists of five 6-item scales of sport (perceived sport competence), body (perceived bodily attractiveness), strength (perceived physical strength and muscular development), condition (perceived level of physical conditioning and exercise), and physical self-worth. Fox (1990) recommended that the 10-item Rosenberg Self-Esteem Scale (Rosenberg, 1965) be used alongside the PSPP to provide a global measure. Fox (1990) reported factor analyses indicating that each item loads most highly on the factor that it is designed to measure and that individual scale reliabilities are in the .80s.
The PSPP research demonstrates (a) good reliability (coefficient alpha of .80-.95; Fox, 1990; Page, Ashford, Fox, & Biddle, 1993; Sonstroem, Speliotis, & Fava, 1992); (b) good test-retest stability over the short term (rs of .74-.89; Fox, 1990); (c) a well-defined, replicable factor structure as shown by CFA (Fox & Corbin, 1989; Sonstroem, Harlow, & Josephs, 1994); (d) convergent and discriminant validity in studies showing PSPP relationships with external criteria such as exercise behaviors, mental adjustment variables, and health complaints (Fox & Corbin, 1989; Sonstroem & Potts, 1996); and (e) applicability for an older adult population (Sonstroem et al., 1994). However, correlations among the PSPP scales are consistently so high (.65-.89 when disattenuated for measurement error; Marsh, Richards, Johnson, Roche, & Tremayne, 1994) that they detract from the instrument's ability to differentiate among the different PSC factors it purports to measure.
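The disattenuated correlations cited above correct an observed correlation for the unreliability of both scales: r_corrected = r_xy / √(r_xx · r_yy), where r_xx and r_yy are the two reliabilities. A quick sketch with hypothetical values (not the actual PSPP figures):

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Correlation between two scales corrected for attenuation
    due to measurement error, given each scale's reliability."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical: observed inter-scale correlation .60, reliabilities .85 and .80
print(round(disattenuate(0.60, 0.85, 0.80), 2))  # → 0.73
```

Because reliabilities are below 1, the corrected value is always at least as large as the observed correlation, which is why disattenuated inter-scale correlations such as those reported for the PSPP can look strikingly high.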
Subsequently, a version of the PSPP for children and adolescents was developed and validated—the Children and Youth Physical Self-Perception Profile (CY-PSPP; Eklund, Whitehead, & Welk, 1997; Whitehead, 1995). Like the PSPP, the CY-PSPP is a 30-item inventory consisting of the same five 6-item scales. The CY-PSPP is a substantially revised version of the PSPP that is most appropriately thought of as a different instrument. The CY-PSPP body, strength, and conditioning subscales are based on minor adaptations of the PSPP to make them more suitable for children. However, the global self-worth (self-esteem) and sport scales are completely different. The PSPP did not have a self-esteem scale of its own but included 6 items adapted from the Rosenberg Self-Esteem Scale. On the CY-PSPP, global self-esteem and sport scales from the PSPP were dropped and replaced with corresponding scales from Harter's (1985) Self-Perception Profile for Children. Correlations among factors remained high (e.g., physical self-worth with attractive body adequacy = .8). Eklund and colleagues (1997) suggested that these results are consistent with the developmental patterns among children, as differentiation in self-concept is less defined at younger ages (Harter, 1985). CFAs have supported the instrument's factor structure, with both the CFI (comparative fit index) and NNFI (non-normed fit index) indexes exceeding the .90 criterion for good model fit (Eklund et al., 1997). Moderate correlations (r = .39-.45) with external criteria such as physical activity and physical fitness have demonstrated its convergent and discriminant validity (Welk & Eklund, 2005). 
The CY-PSPP has been validated with adolescents (Jones, Polman, & Peters, 2009; Welk, Corbin, & Lewis, 1995; Whitehead, 1995) and younger children (Welk, Corbin, Dowell, & Harris, 1997) and has been translated into and validated in other languages (Aşçı, Eklund, Whitehead, Kirazci, & Koca, 2005; Raustorp, Ståhle, Gudasic, Kinnunen, & Mattsson, 2005; Raustorp, Mattsson, Svensson, & Ståhle, 2006).
Both the PSPP and CY-PSPP use a nonstandard response format based on Harter (1985), in which each item consists of a matched pair of contrasting statements, one negative and one positive (e.g., “Some people feel that they are not very good when it comes to sports” but “Others feel that they are really good at just about every sport”). Respondents are asked which description is most like them and whether the description they select is “Sort of true of me” or “Really true of me.” Responses are scored on a scale of 1 to 4, with 1 representing a “Really true of me” response to the negative statement and 4 representing a “Really true of me” response to the positive statement. Although this response format is designed to reduce the influence of social desirability, Wylie's (1989) review of Harter's original instruments provided little or no support for this suggestion, and Marsh and colleagues (1994) suggested that there were substantial method effects associated with the nonstandard response scale. This format has also been shown to be confusing, particularly for children (Eiser, Eiser, & Havermans, 1995), and even for adults (Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994), unless special care is taken to explain the response scale. Following the suggestion of Marsh and colleagues (1994) that confusion over the structured alternative response scale could be overcome by more detailed instructions at the outset, researchers implementing the CY-PSPP used large illustrations for a sample item (Whitehead, 1995). Wichstrom (1995) found that responses to this format were psychometrically stronger when based on typical Likert responses rather than the structured alternative format, but Welk and colleagues (1997) suggested that the nonstandard response scale on the CY-PSPP worked better than Likert responses did.
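The 1-to-4 scoring of the structured alternative format described above can be expressed as a small helper. This is a sketch of the scoring logic only; the function name and arguments are ours and are not part of either instrument:

```python
def score_item(chose_positive, really_true):
    """Score one structured-alternative item on the 1-4 scale:
    1 = 'Really true of me' for the negative statement,
    2 = 'Sort of true of me' for the negative statement,
    3 = 'Sort of true of me' for the positive statement,
    4 = 'Really true of me' for the positive statement."""
    if chose_positive:
        return 4 if really_true else 3
    return 1 if really_true else 2
```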
In summary, the PSPP and the CY-PSPP are established instruments that have been translated into several languages and have been used with a range of populations. However, the response format and the high correlations among factors in both instruments may limit their usefulness in some settings. The CY-PSPP is a substantially revised version of the PSPP developed specifically for children. Although the CY-PSPP should be used instead of the PSPP for child and adolescent samples, it may even be stronger than the original PSPP for adult samples.
Subsequent to the completion of this chapter, Lindwall and colleagues (2011) published a revised version of the PSPP (PSPP-R). They reviewed critiques of the PSPP response scale such as those noted here (e.g., Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994) and acknowledged that “the idiosyncratic alternative response format has been difficult to understand for some participants” (pp. 310-311). In recognition of these problems, the idiosyncratic response scale that has been such a salient feature of the PSPP was dropped altogether and replaced with a 4-point Likert response using only positively worded items. Lindwall and colleagues (2011) demonstrated the appropriateness of the revised PSPP-R based on a large sample (N = 1,831) of participants from four countries (Sweden, Great Britain, Portugal, and Turkey). However, they did not indicate whether the PSPP-R supersedes the PSPP or is merely an alternative to it, nor did they discuss the implications for other instruments using similar idiosyncratic response scales (e.g., PSPP-related instruments such as the CY-PSPP or Harter's instruments more generally).
Physical Self-Inventory
The Physical Self-Inventory (PSI) is a French adaptation of the PSPP that was originally developed for use with Francophone adults (Ninot, Delignières, & Fortes, 2000). In two preliminary studies, Ninot and colleagues used the nonstandard response scale from the PSPP. However, consistent with previous research (Marsh et al., 1994), they reported that this response scale was problematic. In a third study, the authors used a 6-point Likert response scale; factor analysis results were reasonable, but reliability coefficients were not completely satisfactory. Next the authors replaced the PSPP global physical items with items from the SDQ physical scale and the PSPP global self-esteem items with items from Coopersmith (1967). The final PSI consists of 25 items measuring six PSC factors (four specific and two global, as with the PSPP) and has satisfactory psychometric properties that have been confirmed in subsequent French studies of adults (Masse, Jung, & Pfister, 2001; Stephan, Bilard, Ninot, & Delignières, 2003; Stephan & Maïano, 2007).
Maïano and coworkers (2008) subsequently constructed a short form of the PSI for use with adolescents. They found that not all items from the adult PSI worked with adolescents, but they were able to construct 18-item (PSI-SF, 3 items per scale) and 12-item (PSI-VSF, 2 items per scale) versions that had good psychometric properties. In particular, the measurement and hierarchical structures were consistent with proposals by Fox and Corbin (1989) and were fully invariant across gender. Maïano and coworkers also noted that PSI-SF responses showed very high test-retest stability. Comparison of the PSI-SF and PSI-VSF demonstrated that the measurement model, mean structure, structural parameters, and criterion-related validity were equivalent across samples and versions. Nevertheless, the authors noted a serious limitation that all versions of the PSI share with the PSPP: very high correlations among the six PSC factors (correlations among latent factors), which, according to the authors, bring “into question the real independence of some of the models' sub-dimensions, and by extension their discriminant validity, a finding that has already been observed by Marsh (2002; Marsh et al., 2006) on analyses of the PSPP” (Maïano et al., 2008, p. 844). However, Maïano and colleagues also noted that because they used a traditional Likert response scale, the high correlations apparently were not due to the structured alternative format used in the PSPP. In summary, the PSI, and particularly its short and very short forms, has made a potentially important contribution to applied research. However, further research is needed to evaluate more fully the robustness of support for its construct validity and its application in non-French-speaking settings.
Richards Physical Self-Concept Scale
The Richards Physical Self-Concept Scale (RPSCS; Marsh et al., 1994; Richards, 1988) is a 35-item instrument designed to measure six specific components of PSC (body build, appearance, health, physical competence, strength, action) and one general physical satisfaction factor. Each item is a simple declarative statement, and subjects respond on an 8-point true-false scale. Extensive research in Australia (e.g., Marsh et al., 1994; Richards, 1988) has indicated that RPSCS responses have good psychometric properties. The factor structure is very robust, generalizing well over ages from 8 to 80 y and over gender.
RPSCS research has demonstrated (a) good reliability (coefficient alpha of .79-.93; Marsh et al., 1994; Richards & Marsh, 2005); (b) good test-retest stability over the short term (rs of .77-.90 over 3 wk; Richards, 1988); (c) a well-defined, replicable factor structure as shown by CFA (Marsh et al., 1994; Richards, 2004); (d) a factor structure that is invariant across gender, as shown by multiple-group CFA (Richards, 2004), and across a wide age range; (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (Marsh et al., 1994; Richards & Marsh, 2005); and (f) applicability for participants aged 8 to 60 y and for both genders (Marsh et al., 1994; Richards, 1988, 2004; Richards & Marsh, 2005). In summary, the RPSCS is regarded as a valid, reliable, and structurally sound instrument that has been tested across both genders and a wide range of ages; this broad age applicability is a particular strength.
Physical Self-Description Questionnaire
Extending Fleishman's (1964) classic research on the structure of physical fitness, the Physical Self-Description Questionnaire (PSDQ) scales reflect some of the original SDQ scales and parallel physical fitness components identified in a CFA of physical fitness measures (Marsh, 1993). The PSDQ consists of nine specific components of PSC (strength, body fat, activity, endurance and fitness, sport competence, coordination, health, appearance, and flexibility), a global physical scale, and a global self-esteem scale. Each of the 70 PSDQ items is a simple declarative statement, and individuals respond on a 6-point true-false scale. The PSDQ is designed for adolescents but is also appropriate for older participants.
PSDQ research has demonstrated (a) good reliability (median coefficient alpha of .92) across the 11 scales (Marsh, 1996b; Marsh et al., 1994); (b) good test-retest stability over the short term (median r = .83 over 3 mo) and longer term (median r = .69 over 14 mo; Marsh, 1996b); (c) a well-defined, replicable factor structure as shown by CFA (Marsh, 1996b; Marsh et al., 1994); (d) a factor structure that is invariant over gender as shown by multiple-group CFA (Marsh et al., 1994); (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (see Marsh et al., 1994); (f) convergent and discriminant validity as shown by PSDQ relationships with external criteria (e.g., measures of body composition, physical activity, endurance, strength, and flexibility; see Marsh, 1996a, 1997); and (g) applicability for participants aged 12 to 18 y (or older) and for elite athletes and nonathletes (Marsh, Hey, Roche, & Perry, 1997; Marsh, Perry, Horsely, & Roche, 1995). In summary, the PSDQ is a psychometrically strong instrument.
Marsh, Martin, and Jackson (2010) recently presented a new short form of the PSDQ (PSDQ-S). This short form balances brevity and psychometric quality in relation to established guidelines for evaluating short forms (e.g., Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Smith, McCarthy, & Anderson, 2000) with the construct validity approach that is the basis of PSDQ research. Based on the PSDQ normative archive, 40 of 70 items were selected and evaluated in a new cross-validation sample (N = 708 Australian adolescents). To test the generalizability of results, the authors considered four additional samples: Australian adolescent elite athletes (n = 349), Spanish adolescents (n = 986), Israeli university students (n = 395), and Australian senior citizens (n = 760). Reliabilities for the 40 PSDQ-S items were consistently high in the cross-validation sample (.81-.94; median = .89) and senior sample (.81-.94; median = .91), and reliabilities in the cross-validation sample were higher than they were in comparable groups completing the 70-item PSDQ. The PSDQ-S factor structure in the cross-validation sample was well defined and highly similar to that based on the archive sample as well as to those based on the other four groups. Study 1, using a missing-by-design variation of multigroup invariance tests, showed that the factor structure was invariant across the 40-item PSDQ-S and the 70-item PSDQ. Study 2 demonstrated factorial invariance of responses over 1 y (test-retest correlations of .57-.90; median = .77) and good support for convergent and discriminant validity over time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to responses on the PSPP and PSC instruments. The four studies reported by Marsh and coworkers demonstrated new, evolving strategies for the construction and evaluation of short forms that support the PSDQ-S.
The authors concluded that the strong support for the psychometric properties and construct validity of the widely used PSDQ instrument generalizes very well to the PSDQ-S.
Elite Athlete Self-Description Questionnaire
The PSC instruments discussed thus far may be suitable for elite athletes (e.g., Marsh et al., 1995). There may, however, be other components of PSC that are particularly relevant for elite athletes, and the Elite Athlete Self-Description Questionnaire (EASDQ; Marsh, Hey, Roche, et al., 1997; Marsh, Hey, Johnson, & Perry, 1997) was developed to address these components. For the EASDQ, it was hypothesized that overall performance by elite athletes is a function of skill level, body suitability, aerobic and anaerobic fitness, and mental competence. Thus Marsh and colleagues developed the EASDQ to measure these five components along with overall performance, six factors in all. For each scale, they developed a pool of items that sport psychologists at the Australian Institute of Sport evaluated for their suitability for elite athletes. Pilot studies were conducted to select the best items to represent each factor. A compromise between brevity and psychometric soundness was achieved, with acceptable levels of reliability (e.g., all scales having reliability estimates of at least .8) based on short scales (4-6 items per scale).
EASDQ research demonstrates (a) adequate reliability (median coefficient alpha of .85) across the six scales (Marsh, Hey, Johnson, et al., 1997); (b) a well-defined, replicable factor structure as shown by CFA (Marsh, Hey, Johnson, et al., 1997; Marsh, Hey, Roche, et al., 1997); (c) applicability for elite athletes aged 12 y or older (Marsh, Hey, Roche, et al., 1997); and (d) predictive validity as shown by its ability to predict swimming performances in world championships after controlling for previous personal best performances (Marsh & Perry, 2005). In summary, the EASDQ is a reliable and valid instrument for elite athletes aged 12 y and older. More research is needed, however, to relate EASDQ responses to external validity criteria such as those used in PSDQ research and to criteria that are more specific to elite athletes (e.g., actual performance in competition).
Evaluation of Measures of Intrinsic and Extrinsic Motivation in Sport and Exercise
In this section, a critical review of the different measures used to assess intrinsic and extrinsic motivation in sport and exercise research is conducted. Certain criteria have guided the selection of the measures presented in this section. First, we have selected measures that are fully developed instruments that have gone through extensive validation steps. Second, we have chosen scales that have been used in research, published or unpublished, during the past 10 years. Scales that have not been used during that time frame are considered to be obsolete and are not reviewed. Finally, in light of recent theoretical developments and because of space limitations, we have focused on motivation scales that assess intrinsic and extrinsic motivation independently of determinants and outcomes, while focusing on the perceived reasons for behavior. Our earlier discussion of the definitions of intrinsic and extrinsic motivation makes it possible to classify the different measures. The measures can vary in terms of the level of generality (situational versus contextual level) and the area (sport versus exercise). This classification appears in table 25.1. Table 25.2 (see p. 291) provides additional information on each scale, including its underlying concept, dimensions, publication source, and where to obtain it. As can be seen, seven measures are reviewed. For each one, we present (a) a description of the instrument, (b) the conceptual and theoretical rationale underlying its development, (c) the available evidence concerning its psychometric properties (e.g., factorial validity, reliability, and construct validity), and (d) a broad assessment of the strengths and weaknesses associated with each measure.
Measures Used in Sport
In this section, we review the SMS (Brière et al., 1995; Pelletier et al., 1995), the Sport Motivation Scale-6 (SMS-6; Mallett, Kawabata, Newcombe, Otero-Rorero, & Jackson, 2007), the Behavioral Regulation in Sport Questionnaire (BRSQ; Lonsdale, Hodge, & Rose, 2008), the Pictorial Motivation Scale (PMS; Reid, Vallerand, Poulin, Crocker, & Farrell, 2009), and the SIMS (Guay et al., 2000).
Sport Motivation Scale
The SMS was developed (Brière et al., 1995; Pelletier et al., 1995) in order to assess contextual intrinsic and extrinsic motivation from a multidimensional perspective, as well as amotivation. The SMS has been the most frequently used motivation measure in sport, having been employed with a variety of athletes (recreational to elite), age groups (adolescent to senior), and cultures (e.g., Canada, United States, United Kingdom, Bulgaria, Australia, Spain, and New Zealand). In fact, the SMS has been translated and validated in several languages (see Pelletier & Sarrazin, 2007). The SMS is based on SDT (Deci & Ryan, 1985) and is made up of seven subscales assessing amotivation; external, introjected, and identified regulation; and intrinsic motivation to know, to experience stimulation, and to accomplish. In line with SDT, motivation is assessed as the perceived reasons for participation, or the why of behavior. At the beginning of the scale, participants are asked, “In general, why do you practice your sport?” The items represent the perceived reasons for engaging in the activity, thus reflecting the different types of motivation.
The original scale was developed in French as L'Échelle de Motivation dans les Sports (Brière, Vallerand, Blais, & Pelletier, 1995) and was validated in three steps. The first step involved generating a pool of items explaining various reasons for sport participation through interviews with French Canadian athletes (aged 17-20 y). These reasons were then used to formulate items for the seven subscales of the French SMS. In the second step, a committee of experts evaluated the content validity of the items and eliminated those that were thought to be inadequate. Another sample of athletes from various sports completed the scale. Results from an exploratory factor analysis (EFA) provided support for a seven-factor structure with 4 items per subscale; this second step thus resulted in a 28-item scale. In the third and final step, two additional studies were conducted to further validate the scale. These studies included approximately 500 individuals, most of whom were involved in recreational sports. Results from confirmatory factor analyses (CFA) and correlational analyses confirmed the seven-factor structure, the subscale internal consistency (ranging from .65-.96), and moderate to high indexes of temporal stability (ranging from .54-.82) over 1 month. Furthermore, inspection of correlations among the seven SMS subscales provided support for the simplex pattern proposed by SDT. Results of correlations also showed that (in line with SDT) the most self-determined forms of motivation (intrinsic motivation and identified regulation) were related more strongly to determinants such as autonomy support from coaches and feelings of competence than to other forms of motivation (external and introjected regulation) and amotivation. Similar results were obtained with motivational outcomes such as positive affect, concentration, and intentions to pursue engagement in sport. In sum, adequate construct validity was obtained for the French form of the SMS.
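The simplex pattern referred to here predicts that subscales adjacent on the self-determination continuum (amotivation, external, introjected, identified, intrinsic) correlate more strongly than subscales farther apart. As a sketch of how such a pattern can be verified on a subscale correlation matrix (the matrix below is constructed for illustration and is not SMS data):

```python
# Subscales ordered along the self-determination continuum.
SUBSCALES = ["amotivation", "external", "introjected", "identified", "intrinsic"]

def follows_simplex(corr):
    """Return True if, for every subscale i, correlations weaken
    monotonically as distance along the continuum increases."""
    n = len(corr)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i in (j, k) or j == k:
                    continue
                # j and k on the same side of i, with j closer to i
                if (j - i) * (k - i) > 0 and abs(j - i) < abs(k - i):
                    if corr[i][j] < corr[i][k]:
                        return False
    return True

# Illustrative matrix in which the correlation falls by .4 for
# each step of distance along the continuum.
toy = [[round(1 - 0.4 * abs(i - j), 2) for j in range(5)] for i in range(5)]
```

In practice, researchers inspect the observed matrix against this ordinal expectation rather than testing it mechanically, but the logic is the same.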
The translation of the French SMS into English involved back-translation and committee procedures as suggested by Vallerand (1989). Pelletier and colleagues (1995) conducted two studies involving college athletes from various sports in order to assess the psychometric properties of the English form of the SMS. Results from CFA with a sample of 593 Canadian university athletes revealed adequate fit indices for the hypothesized seven-factor model (Adjusted Goodness of Fit Index and Normed Fit Index both > .90; Root Mean Square Residual < .08), and correlations with determinants and outcomes supported the simplex model. Moreover, internal consistency above .70 was obtained on all of the subscales except the identified subscale (.63). Test-retest correlations were acceptable and very similar to those obtained with the French SMS, as was the scale's construct validity.
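The cutoffs applied here (incremental fit indexes above .90 and residuals below .08) are conventional rules of thumb in this literature rather than properties of the SMS itself. As a sketch, a helper applying those thresholds might look like:

```python
def acceptable_fit(agfi, nfi, rmsr):
    """Apply the conventional cutoffs cited in the text:
    AGFI and NFI above .90 and RMSR below .08."""
    return agfi > 0.90 and nfi > 0.90 and rmsr < 0.08
```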
Since 1995, the SMS has been used extensively in sport psychology research. The seven-factor structure has been supported repeatedly (e.g., Doganis, 2000; Gillet, Vallerand, & Rosnet, 2009; Li & Harmer, 1996; Shaw, Ostrow, & Beckstead, 2005; Standage, Duda, & Ntoumanis, 2003). In addition, Hu and Bentler (1999) obtained support for a five-factor model by combining the three types of intrinsic motivation into one factor. Similar results were obtained by Gillet and colleagues (2009) with the French SMS. However, some studies have not supported the seven-factor model (Hodge, Allen, & Smellie, 2008; Mallett, Kawabata, & Newcombe, 2007; Mallett, Kawabata, Newcombe, & Otero-Rorero, 2007; Martens & Webber, 2002). Why is there such a discrepancy between these two sets of studies? One possibility lies in the populations from which the different samples were taken. Specifically, the SMS was validated using adolescent and young adult athletes and not older athletes. Because of this specific focus, some of the items may reflect a participation rather than an elite orientation, which is more in line with the younger population. For instance, an identified regulation item reads, “Because sport is one of the best ways to maintain good relationships with my friends.” Such an item seems more relevant for a younger population. An older, high-level athlete may disagree with this item but still display a high level of identified regulation for a sport (but not for relationship reasons). Future research using the SMS with different age groups and proficiency levels is needed to clarify this issue.
Although the internal consistency of the SMS has generally shown adequate values, some values below .70 have been found. This is especially the case for the identified regulation subscale (Brière et al., 1995; Kingston, Horrocks, & Hanton, 2006; Li & Harmer, 1996; Pelletier et al., 1995), although some lower values (below .70) have been obtained with the introjected (McNeill & Wang, 2005; Perreault & Vallerand, 2007; Riemer, Fink, & Fitzgerald, 2002; Standage, Duda, & Ntoumanis, 2003), external regulation (Standage, Duda, & Ntoumanis, 2003), and amotivation (Standage, Duda, & Ntoumanis, 2003) subscales. However, very few instances of values below .60 have been obtained. It should be noted that a Cronbach alpha of .60 with only 4 items is acceptable because, as noted by Cronbach (1951), the coefficient alpha underestimates the internal consistency of scales with a small number of items; the number of items enters directly into the formula. For instance, given the same average interitem correlation, a 3-item scale with a coefficient alpha of .56 is equivalent to a 10-item scale with an alpha of .81.
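The dependence of coefficient alpha on scale length can be made concrete with the standardized (Spearman-Brown) form of alpha, which is a function only of the number of items k and the average interitem correlation. Assuming that form, the equivalence works out as follows:

```python
def standardized_alpha(k, avg_r):
    """Standardized coefficient alpha for a k-item scale with
    average interitem correlation avg_r."""
    return k * avg_r / (1 + (k - 1) * avg_r)

def avg_r_from_alpha(k, alpha):
    """Invert the formula: average interitem correlation implied
    by a given alpha on a k-item scale."""
    return alpha / (k - alpha * (k - 1))

# An alpha of .56 on 3 items implies an average interitem
# correlation of about .30 ...
r_bar = avg_r_from_alpha(3, 0.56)
# ... which projects to an alpha of about .77 on 8 items
# and about .81 on 10 items.
```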
In line with the original work of Ryan and Connell (1989) and the initial SMS validation procedures (Brière et al., 1995; Pelletier et al., 1995), construct validity has been assessed by other authors in two ways: (1) with the simplex pattern of correlations among the subscales and (2) with correlations between motivational factors and their determinants and consequences. We do not have space to review all studies. However, overall, there is overwhelming support for the construct validity of the SMS in both French and English. For instance, in addition to finding support for the simplex pattern, Pelletier and Sarrazin (2007) concluded in their review of the evidence that the SMS has been used with success to predict a great variety of specific outcomes and consequences (such as burnout, exercise dependence among endurance athletes, fear of failing, adaptive coping skills, perceptions of constraints, flow, vitality and well-being, sporting behavior orientations, aggression, and performance) in a manner that is consistent with SDT. These findings provide strong support for the construct validity of the SMS.
In sum, the SMS has some positive features. First, it is a multidimensional instrument that assesses different types of intrinsic and extrinsic motivation as well as amotivation. Second, the scale focuses on the why of behavior, and thus items are not confounded with determinants and consequences. Finally, it has some excellent psychometric properties. Nevertheless, some limitations should be underscored. First, although internal consistency levels have been acceptable overall, some subscales, especially the identified regulation subscale, have yielded relatively low coefficient alphas at times. Second, the SMS does not assess integrated regulation. Third, the seven-factor structure has not always been supported by CFAs. According to Pelletier, Vallerand, and Sarrazin (2007), this may be explained by a host of factors, including differences in sample sizes, variations in the way the instrument is administered, or other characteristics specific to the context of the study. However, as already indicated, it is also possible that the SMS is better suited for a younger, nonelite athlete population. Clearly, future research on this issue is in order.
Sport Motivation Scale-6
Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed another version of the SMS, the SMS-6. This scale has the same underlying rationale as the original SMS but was designed to improve the original version by including an integrated regulation subscale and by attempting to resolve some of the inconsistencies in the factor structure and some of the relatively low internal consistency values (below .70). The SMS-6 comprises 24 items, 4 for each of the six subscales, which include amotivation; external, introjected, identified, and integrated regulation; and general intrinsic motivation. Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed 5 items for the integrated regulation subscale as well as 7 other items (4 of which were kept in the final scale) to replace some items in the original SMS. Two samples were used to validate the SMS-6. Sample 1 was composed of 501 first-year university students participating in competitive sport at least twice per week and 113 elite athletes representing Australia at the international level (for a total of 614 participants). Sample 1 was used to derive a factor structure that included the SMS items as well as the reformulated and integrated regulation items. Sample 2 was composed of 557 university students who were engaged in a variety of sports or physical activities twice per week. The second sample was used to confirm the structure of the SMS-6. Participants also completed the Dispositional Flow Scale (DFS).
Results of a CFA with the SMS-6 (with sample 2) provided support for the factor structure as well as for the internal consistency values (all above .70). Concerning the construct validity of the SMS-6, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) reported a rather weak simplex pattern of correlations among the subscales. More specifically, external regulation correlated highly with intrinsic motivation (r = .54), while the correlation between identified regulation and intrinsic motivation was very high (r = .91) and was higher than the one between integrated regulation and intrinsic motivation (r = .75). The construct validity of the SMS-6 was not fully supported, as some of the correlations involving the SMS and flow were not as expected by SDT. For instance, the distinctions among integrated regulation, identified regulation, and intrinsic motivation were not always clear. Furthermore, external regulation revealed some positive and sometimes strong correlations with flow, contrary to hypotheses derived from SDT.
In sum, the SMS-6 has some appealing features. First, it contains an integrated regulation subscale. Furthermore, the addition of 4 new items may make the SMS more acceptable for older and more experienced athletes. Second, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) presented results supporting the validity of a variation of the SMS-6, the SMS-8. The SMS-8 contains the same items as the SMS-6 but assesses the three types of intrinsic motivation rather than general intrinsic motivation. The SMS-6 also has some limitations. First, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) proposed 7 new items to replace those that were presumably problematic in the original SMS. However, only 4 of these items made it into the final version. Thus, it appears that the SMS-6 retained much of the original SMS. Second, even some of the new items appear problematic and may not assess the desired construct (see Pelletier et al., 2007). For instance, a new amotivation item (“I don't seem to be enjoying my sport as much as I previously did”) seems to reflect a decrease in intrinsic motivation rather than amotivation. Finally, results from Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) demonstrated that the integrated regulation subscale may lack discriminant validity, yielding relationships with flow highly similar to those of identified regulation and intrinsic motivation.
Behavioral Regulation in Sport Questionnaire
Lonsdale and colleagues (2008) developed the BRSQ to create an alternative measure of elite sport motivation as conceptualized by SDT. However, in contrast to Mallett, Kawabata, Newcombe, and Otero-Rorero (2007), these authors used a completely new pool of items developed by SDT experts and competitive athletes. There are two versions of the BRSQ. The BRSQ-8 contains 32 items assessing integrated, identified, introjected, and external regulation; amotivation; and the three forms of intrinsic motivation (knowledge, experience stimulation, and accomplishment) identified by Vallerand (1997). The BRSQ-6 contains the same items but assesses general intrinsic motivation rather than all three types of intrinsic motivation, for a total of 24 items.
Lonsdale and colleagues (2008) conducted a series of three studies to validate the scale. In the first study, the factorial validity and the internal consistency were assessed with 382 New Zealand elite athletes. Results from a CFA on the 32 items supported the factor structure of the BRSQ. Specifically, fit indexes were acceptable and all items loaded significantly on the appropriate factors (loadings ranged from .58 to .91). Finally, internal consistency of the eight subscales, measured with the Cronbach alpha, showed high values ranging from .71 to .91. Additionally, 1-wk test-retest reliability was tested with 34 competitive adult athletes. Test-retest coefficients for all subscales supported the temporal reliability (values ranged from .73 to .90).
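Because coefficient (Cronbach) alpha is the internal-consistency index reported throughout this literature, its computation can be illustrated with a short script. This is a minimal sketch using invented item scores, not data from any study discussed here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents' item-score rows.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(item_scores[0])               # number of items in the subscale
    items = list(zip(*item_scores))       # transpose: one tuple per item
    item_vars = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_vars / total_var)


# Hypothetical responses of 5 athletes to a 4-item subscale (Likert scale)
scores = [
    [3, 4, 3, 4],
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
]
print(round(cronbach_alpha(scores), 2))  # → 0.97 (items covary strongly)
```

Note that alpha rises both with inter-item covariation and with the number of items, which is one reason short 4-item subscales reporting alphas in the .70s are considered acceptable.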
In a second study with 343 athletes from New Zealand, the results of a CFA on the BRSQ-8 once again supported the factor structure as well as the subscale internal consistency. Lonsdale and colleagues (2008) also showed that the factor structure of the BRSQ-6 model fit the data very well and that subscale coefficient alphas all exceeded .78. Moreover, the construct validity of the BRSQ-6 was assessed by testing for a simplex pattern of correlations among the six subscales. While some relationships were in line with predictions (e.g., amotivation was negatively related to intrinsic motivation), there was a lack of discrimination between some subscales. More specifically, there was no difference between external and introjected regulation scores in terms of their relationships with amotivation. A similar pattern was evident with the identified and integrated regulation subscales, which both had similarly high correlations with intrinsic motivation. These results with the simplex pattern were replicated in a third study conducted with nonelite athletes. In this third study, Lonsdale and colleagues also assessed the relationships between the BRSQ-6 and indexes of burnout (Lemyre, Treasure, & Roberts, 2006; Raedeke & Smith, 2001) and flow (Jackson & Eklund, 2002). Overall, results supported hypotheses in line with SDT. Specifically, amotivation and external and introjected regulation showed negative correlations with flow and positive correlations with burnout. The opposite pattern of correlations was found for the self-determined subscales (intrinsic motivation and identified and integrated regulation). However, there was a lack of discrimination between integrated regulation and general intrinsic motivation. Results of another study on burnout (Lonsdale, Hodge, & Rose, 2009) replicated these findings. Thus, overall, the support for the construct validity of the BRSQ-6 appears to be mixed.
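The simplex pattern invoked here — correlations between subscales weaken, and eventually turn negative, as the subscales sit farther apart on the self-determination continuum — can be checked mechanically once the subscales are ordered along that continuum. The sketch below uses a made-up correlation matrix, not BRSQ data.

```python
def simplex_violations(r):
    """Given a correlation matrix whose rows/columns are ordered along the
    self-determination continuum, return (i, near, far) triples where a more
    distant subscale correlates MORE strongly with subscale i than a nearer
    one does -- i.e., departures from a simplex pattern."""
    n = len(r)
    bad = []
    for i in range(n):
        # Moving away from subscale i in either direction,
        # correlations should not increase.
        for direction in (-1, 1):
            j = i + direction
            while 0 <= j < n and 0 <= j + direction < n:
                near, far = j, j + direction
                if r[i][far] > r[i][near]:
                    bad.append((i, near, far))
                j += direction
    return bad


# Hypothetical matrix ordered: intrinsic, identified, external, amotivation
r = [
    [1.00,  0.60, -0.20, -0.50],
    [0.60,  1.00,  0.10, -0.30],
    [-0.20, 0.10,  1.00,  0.40],
    [-0.50, -0.30, 0.40,  1.00],
]
print(simplex_violations(r))  # → [] (consistent with a simplex pattern)
```

In practice, researchers test this pattern statistically rather than by simple ordinal comparison, but the ordinal check captures the logic: adjacent subscales should correlate most strongly, and correlations should fade with distance on the continuum.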
It should be underscored that the BRSQ has some nice features. First, the scale is designed in such a way that the researcher can decide to use a multidimensional (BRSQ-8) or unitary (BRSQ-6) conceptualization of intrinsic motivation. Second, the scale is rather short, with 4 items per subscale. Finally, it assesses integrated regulation. At the same time, the BRSQ also displays some limitations. First, additional research is needed on the construct validity of the scale. Whereas there is support for distinguishing the self-determined subscales (intrinsic motivation and identified and integrated regulation) from the non-self-determined subscales (external and introjected regulation), the finer discrimination within each category appears to be lacking. Such evidence is crucial, and future research is needed in order to show that this scale does indeed assess the SDT constructs rather than two broad sets of subscales tapping self-determined versus non-self-determined motivation. Second, this scale is designed specifically for older participants in competitive sport; it remains to be seen if the BRSQ can be used with younger participants, for whom the integrated regulation subscale may not have full meaning. Finally, research is needed to test the temporal stability of the scale over a time frame longer than 1 wk.
Pictorial Motivation Scale
The PMS was designed to measure intrinsic and extrinsic motivation for sport and exercise in people with an intellectual disability. It assesses participants' reasons for engaging in sport and exercise. The scale's main characteristic is the set of drawings depicting each of the 20 items. There are 5 items (pictures) for each of four subscales: intrinsic motivation, self-determined extrinsic motivation (a mixture of integrated and identified regulation), non-self-determined extrinsic motivation (a mixture of introjected and external regulation), and amotivation. The pictures help participants with cognitive difficulties grasp the motivational concept depicted in each item.
The original scale was developed in French (Reid, Poulin, & Vallerand, 1994). Results of a study with 62 participants supported the internal consistency, temporal stability, and construct validity, as exemplified by the presence of a simplex pattern among the four subscales. However, the amotivation subscale had poor reliability (α = .52). The French version was later translated into English (Reid et al., 2009) according to the back-translation and committee procedures outlined in Vallerand (1989). Then, 6 new items were generated for the less reliable amotivation subscale. Participants in the Special Olympics (n = 160) completed the English version. Results of the CFA confirmed the four-factor structure of the PMS. Furthermore, the internal consistency values (Cronbach alphas) ranged from .60 to .71. Finally, the construct validity was assessed by testing for a simplex pattern of correlations among the four subscales. The intercorrelations among latent variables from the CFA provided support for the simplex pattern.
Results from a study conducted with the English version of the PMS involving 80 high school students with mild intellectual disability provided support for the internal consistency, temporal stability (over 3 wk), and construct validity of the PMS. Construct validity was supported by the simplex pattern of correlations among the PMS subscales as well as by correlations between the PMS subscales and motivational antecedents (skill and perceived competence) and outcomes (perceived effort) as rated by the physical education teacher. Finally, the internal consistency of each subscale was tested without the pictorial dimension with a subset of 47 high school students with mild intellectual disability. Results indicated poor internal consistency for most subscales (.91 for intrinsic motivation but only .27 for self-determined extrinsic motivation, .20 for non-self-determined extrinsic motivation, and .60 for amotivation). This finding suggests that the scale is not reliable without the drawings.
The preliminary findings with the English version of the PMS are encouraging. Furthermore, this scale is the only one geared toward individuals with intellectual disability. The use of drawings to depict the various items makes this scale unique in the field. Nevertheless, the PMS shows some limitations. First, the scale does not differentiate among all forms of intrinsic (knowledge, stimulation, and accomplishment) or extrinsic (integrated, identified, introjected, and external regulation) motivation. Second, construct validity was tested with only a limited number of variables. Third, it is not known if the scale is usable with children who have severe forms of intellectual disability. Clearly, additional research is needed on the reliability and validity of the PMS.
Situational Motivation Scale
The SIMS is one of the few scales to assess intrinsic and extrinsic motivation and amotivation at the situational level (Guay et al., 2000). The SIMS is a multidimensional tool that measures four types of motivation: intrinsic motivation, identified regulation, external regulation, and amotivation. The SIMS is made up of 16 items (4 items per subscale) and asks this question: “Why are you currently engaged in this activity?” The items represent potential reasons for task engagement. The scale is worded in such a way that it can be used in most situations (sport and nonsport).
Five studies were reported in the original article. In study 1, the original scale was developed by a committee of experts and completed by 195 French Canadian college students. Results of an EFA revealed a four-factor structure with the final 16 items loading on their respective factor. In study 2, a CFA confirmed the factor structure as well as its invariance across gender. Across the five studies, the internal consistency values of the subscales were acceptable, ranging from .62 to .95 (see Guay et al., 2000). Moreover, across all studies, support was obtained for the construct validity of the SIMS through results from correlations in line with the simplex pattern among the subscales as well as between the SIMS subscales and motivational determinants and consequences. Perhaps of greater interest for the present discussion were the results of study 4, which showed that some subscales (intrinsic motivation and identified regulation) were sensitive enough to detect changes in motivation that took place during two games of a basketball tournament.
Other researchers have also obtained support for the psychometric properties of the SIMS. First, all studies reported acceptable internal consistency values for each subscale (Blanchard, Mask, Vallerand, de la Sablonnière, & Provencher, 2007; Conroy, Coatsworth, & Kaye, 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, Duda, & Prusak, 2003). The coefficient alpha values of all but the amotivation subscale (α = .58) in the Conroy and colleagues study were above .60. Second, support for the factorial validity of the SIMS was obtained through CFAs with one qualification. Whereas the CFA results with the 16 items yielded acceptable fit indexes, removal of 1 item (Jaakkola, Liukkonen, Laakso, & Ommundsen, 2008) and even 2 items (Gillet, Berjot, & Paty, 2009; Standage, Treasure, et al., 2003) yielded better fit indexes. Moreover, Standage, Treasure, and colleagues (2003) conducted multisample CFAs and showed that the pattern of factor loadings was largely invariant across four different samples.
Construct validity of the SIMS was also assessed in several studies (Blanchard et al., 2007; Conroy et al., 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, et al., 2003). In addition to supporting the simplex pattern among the SIMS subscales and between the SIMS subscales and need satisfaction (study 2 of Blanchard and colleagues, 2007), results also supported the postulate from the HMIEM (Vallerand, 1997) for the top-down effect, in which contextual sport motivation was found to predict situational sport motivation (studies 1 and 2 of Blanchard et al., 2007; Jaakkola et al., 2008; Ntoumanis & Blaymires, 2003). Specifically, the more self-determined the motivation was found to be in a specific context (in this case, sport), the more self-determined the motivation was found to be in a given situation. Furthermore, Blanchard and colleagues (2007, studies 1 and 2) found support for another postulate from the HMIEM that suggests that over time, situational motivation in the realm of sport (basketball) has recursive effects on contextual motivation. The more that situational motivation is self-determined, the more that contextual motivation becomes self-determined over time. Finally, Jaakkola and coworkers (2008) demonstrated that, as predicted by the HMIEM, situational self-determined motivation was better than contextual motivation in predicting the situational intensity (as assessed by heart rate) displayed by students in a physical education class. Overall, these findings provide strong support for the reliability and factorial and construct validity of the SIMS.
The SIMS has several positive features, one of them being that it is the only scale to assess intrinsic and extrinsic motivation and amotivation at the situational level. Furthermore, it does so using only 16 items. Nevertheless, it also has some weaknesses. First, the SIMS does not assess the different types of intrinsic motivation and integrated and introjected regulation, because it was designed to be short. Second, while the factor structure has been supported, it is not clear if some items should be replaced (Gillet, Berjot, et al., 2009; Jaakkola et al., 2008; Standage, Treasure, et al., 2003). Third, research so far has not assessed the validity of the scale with high-performance athletes. Thus, additional research is needed to further test the psychometric properties of the SIMS in sport.
Ethics Codes: Their Nature, Purposes, and Application
Ethics codes typically comprise principles and standards. Ethical principles are broad-spectrum statements that summarize and reflect the values of the parent organization or governing body. These general and aspirational statements set the underlying tone for the more specific codes and guide the work-related ethical decision making of professionals. In contrast, ethical standards specify both proscribed and prescribed member behaviors. While not always black and white, these standards serve as a more clear cut and enforceable guide for professional behavior.
Members should apply both the aspirational principles and enforceable standards to shape their thinking and behavior in work settings. Ideally, members self-monitor their own behavior. In an effort to remain ethical, professionals are encouraged to consult with colleagues about ethically challenging situations and to provide constructive feedback when they perceive possibly unethical behavior in others.
Assessment and Measurement
A central question to be addressed in this chapter is this: What are assessment and measurement? Sundberg (1977) defines assessment as the processes used “for developing impressions and images, making decisions and checking hypotheses about another person's pattern of characteristics that determines his or her behavior in interaction with the environment” (p. 21). The assessment process involves collecting and assembling a broad range of objective and subjective information about persons or groups to develop impressions about them; identify their needs; predict how they might think, feel, and behave in future situations; and select and apply interventions based on the content and dependability of that information. Professionals may use multiple assessment methods that include observations of behavior, symptom checklists, surveys and questionnaires, structured and unstructured interview materials, and standardized tests (Bennett et al., 2006). Gardner and Moore (2006) emphasize using a triad of psychological assessment strategies in the practice of clinical sport psychology: (1) initial interviews, (2) behavioral observation, and (3) psychological testing. The nature and assumptions underlying assessment approaches are usually grounded in the theoretical orientation of the professional (Andersen, 2002).
In contrast, measurement can mean many things to many people. It is one of the most common words in the English language and can be used as both a noun and a verb (Lorge, 1967). For the purposes of this chapter, measurement is viewed as an extension of assessment processes. It can be thought of more narrowly as the process of collecting information about psychological characteristics of interest (e.g., attitudes, behaviors, state experiences) using one or more methods or tools (such as those mentioned earlier) to monitor change, the effect of intervention, or treatments postassessment. For example, an educational sport psychology consultant might administer a measure of team cohesion over the course of a competitive season to see how team members perceive their relationships. Another consultant might conduct a preseason baseline screening assessment of cognitive functioning in hockey players and then reevaluate players who incur a mild traumatic brain injury (i.e., concussion) later in the season.
In this chapter, the terms measurement and assessment are used interchangeably. Furthermore, these terms are used to describe the decisions and opinions made by professionals regarding clients with whom they work. As such, measurement and assessment techniques include all methods of gathering information about clients, such as (a) psychological, educational, and neurological tests; (b) data gathered during clinical interviewing; (c) information gathered from significant others (e.g., family members, teachers, friends); (d) direct and indirect observation; and (e) interactions with people via teletherapy (e.g., Internet, phone; Fisher, 2009).
Competence and Education
In order to excel in our professional duties and do well for those we serve, teach, study, and otherwise interact with, we must know what to do and how to do it in a capable manner. The ethics codes mentioned earlier identify the necessity of being knowledgeable and capable in our work. For example, the APA ethical standards provide guidance for organization members in this area, including information about (a) competence limitations, (b) maintaining competence, (c) making sound professional and scientific judgments, (d) delegating work responsibilities to others, (e) engaging in activities in emergencies, and (f) impairment (APA, 2002). Competence in professional behaviors is a personal matter that is frequently challenged; it is the responsibility of professionals to know their limitations and to recognize that their knowledge and skills require constant upgrading. The APA ethics code also emphasizes the importance of making sound work-related decisions based on scientific knowledge and appropriate discipline-specific practice. This portion of the APA code cautions professionals to be careful when delegating work to others, describes how a professional is responsible for others' work, and explains the necessity of avoiding multiple relationships with those to whom work is delegated. The APA standards note that we can occasionally be thrown into situations in which our competence is stretched; in such cases we need to be very careful, seek supervision if available, and end such work as soon as possible.
Measurement Referral Questions and Appropriateness of Instruments
When selecting assessment instruments, the professional must consider the referral questions that prompted this process (Fisher, 2009; Smith, 1976). The instruments selected should reflect these referral questions and utilize assessment strategies that have appropriate validity and reliability. For example, if a professional is interested in measuring state anxiety for research purposes, an appropriate assessment may be the Competitive State Anxiety Inventory-2 (CSAI-2; Martens, Burton, Vealey, Bump, & Smith, 1990) as opposed to the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970), which measures both trait and state anxiety. When selecting the assessment, the professional should be aware of limitations or biases regarding cultural sensitivity (see the later section on cultural issues); gender considerations (Etzel, Yura, & Perna, 1998); and age, language, or disability factors that may influence the psychometric qualities of the assessment differently from the way they influenced the normative groups used for the development and validation of the instrument (APA, 2002; Fisher, 2009). It is also important to consider the method of delivery. For example, assessments based on paper and pencil may not have been validated for online use (see the later section on technology), and instruments with elevated reading levels may not be appropriate for certain age or developmental groups. Therefore, the professional should always verify the assessment's validity and reliability when a modified assessment method or group is used (Fisher, 2009). Furthermore, the professional should also attempt to conduct in-person assessments when possible, as a great deal of information can be learned about clients from the way in which they present themselves during the assessment process. This information can affect the richness of the assessment data.
It is also important for professionals to be aware of, and competent in using, appropriate psychometric strategies for establishing the validity and reliability of the instruments they use (AERA, APA, & NCME, 1999). All instruments have unique psychometric properties that affect how they should be administered and interpreted. When validity and reliability issues are not taken into consideration, it is possible to choose and utilize instruments to assess factors that they were not designed to assess. Furthermore, practitioners should be well aware of other psychometric properties, such as content and criterion validity and the standard error of measurement, that may affect how results are interpreted and used. The ethical practitioner needs to be aware of psychometric issues in order to choose appropriate instruments with regard to the referral questions, client characteristics, assessment strategies, and environmental factors.
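One property mentioned here, the standard error of measurement (SEM), follows directly from a test's reliability coefficient, and a quick computation shows why it matters for interpreting individual scores. The numbers below are invented for illustration.

```python
import math


def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)


def confidence_band(score, sd, reliability, z=1.96):
    """Approximate 95% confidence band around an observed score."""
    e = z * sem(sd, reliability)
    return (score - e, score + e)


# A test with SD = 10 and reliability .84 has SEM = 4.0, so an observed
# score of 50 carries a wide band of plausible true scores.
print(round(sem(10, 0.84), 2))  # → 4.0
lo, hi = confidence_band(50, 10, 0.84)
print(round(lo, 2), round(hi, 2))  # → 42.16 57.84
```

The band illustrates the ethical point: two test takers whose observed scores differ by a few points may well have the same true score, so decisions should not hinge on small score differences.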
Consent and Assent
As discussed earlier, the ethical principles for sport and exercise psychology emphasize doing no harm to the client and respecting the individual's rights and dignity (AASP, 1996; APA, 2002). The test taker's right to privacy and confidentiality applies here as well, and the professional should take all necessary precautions to maintain the confidentiality and privacy of the client. To protect the test taker, informed consent must be obtained at the start of the relationship (e.g., research, consultation, therapy). Beyond the informed consent process and before formal assessment, the client or participant should be informed of all pertinent information regarding the assessment process. This information includes (a) the nature and purpose of assessment; (b) any applicable fees; (c) potential involvement of third parties such as a coach, athletic trainer, or manager; (d) limits of privacy and confidentiality (as discussed in the next section); and (e) the timeline for the process and potential feedback (Fisher, 2009). This information should be presented in a clear and understandable manner. Furthermore, this information should be agreed to by the test taker, who thereby gives informed consent. Test takers should engage in assessment of their own free will and must be given the option to withdraw participation without consequences (APA standard 3.10). All necessary information about assessment procedures and findings should be provided in a language or level appropriate for the participant. Furthermore, it is unethical to require or coerce individuals to take part in measurement and assessment for research or practice purposes.
Privacy and Confidentiality and Release of Information
Typically, the ethical standards of organizations with ties to sport psychology (APA ethical standard 4.01 and the AASP) suggest that professionals should not reveal information about clients, test takers, or others without a signed release of information or a legal requirement to do so. These legal situations may include (a) a test taker who indicates possible self-harm or harm to others (i.e., suicide or homicide), (b) a test taker whose results are subpoenaed by the court, or (c) a test taker who is a minor, in which case the parent or guardian may have access to the data (Etzel et al., 1998). If the test taker or, in the case of a minor, the parent or guardian provides explicit written permission, the specific information identified by the client may be released to the identified parties. Unless these circumstances are met, information from the test taker may not be disclosed to anyone (e.g., coaches, management, parents, administration, athletic trainers, and so on).
In situations where the assessment is requested by a third party (e.g., coaches, management, the court), this third party may also request results from the assessment. It is important for the professional to establish a priori who is the “real client” (Ogilvie, 1979) and to have the ability to control access to the results. Etzel and colleagues (1998) suggest that information about the assessment should be shared only with one predetermined person, unless a release of information form has been completed. Therefore, when engaging in assessments, the professional should set clear boundaries and avoid dual relationships, thereby identifying who is being served (APA standard 4.02a). Another complication of these situations is the role of trust. If athletes or test takers suspect the test results will be used without their permission in decisions regarding performance or other aspects of participation, they may be less likely to respond honestly, thus affecting the validity of the results (see the section on demand characteristics).
Raw Data and Data Storage
Raw data such as the test taker's responses to items, as well as the professional's notes and final reports, should be stored in locked file cabinets inside the professional's office or in password-protected computer files (Fisher, 2009). Other methods to ensure confidentiality may include limiting access to records to only those people who have a need to know this information and have been trained to handle and understand it, deidentifying records using code numbers, and appropriately disposing of identifiable records (Fisher, 2009). A good policy for data maintenance is that data should be kept for a minimum of 7 y after the last service delivery date or 3 y after a minor reaches the age of 18 (whichever is later), as is recommended by the APA record-keeping guidelines (APA, 2002; Fisher, 2009). Raw data and the instruments used for assessment purposes should not be released to third parties unless a release of information form has been completed and the third party is competently trained to use such information.
Results Discussion
Test feedback and results discussion should be provided in the form of a carefully constructed report using clear language that fully explains the assessment results. Labels and jargon should be eliminated to increase readability. Information necessary to the purpose of the test should be included, and the inclusion of unnecessary and unrelated information should be avoided (APA, 2002; Fisher, 2009). Additionally, as recommended by the APA (APA, 2002), interpretations should take into consideration the participant's gender, race, ethnicity, age, national origin, sexual orientation, religion, disability, language, or socioeconomic status. Participants should receive assessment information and feedback related to their performance on the assessment and should be informed of ways in which they could personally use the test results or how this information may be used by a third party (only if written permission was given to release such information). The information released to the participant should be presented in a verbal or written report and presented in such a way that it may not cause harm to the test taker (Etzel et al., 1998). However, information such as numerical scores or specific responses should not be released to individuals not qualified to interpret such information (Fisher, 2009; Tranel, 1995).
Demand Characteristics
In the sport context, several groups of individuals may be interested in the assessment results of athletes. Interested parties may include coaches, managers, teams, students, or administrators. However, the possibility of a third party reviewing the test results may increase socially desirable responding and result in invalid and unreliable information. Therefore, undue pressure to complete an instrument or battery should be considered as a contextual factor.
Another potentially undesirable effect of a third party viewing the test taker's results may be assessment anxiety. The APA standards state that if a test taker is observed to be anxious or reports feeling anxious, this feeling should be taken into account and become a limitation in the interpretation of test data (APA, 2002). Assessment anxiety may be exaggerated in situations where a third party may have access to results. These situations may also lead to faking good or faking bad on the part of respondents who are concerned about how the results may be used. This must also be considered when evaluating the results.
Supervision of Subordinates
In some cases, professionals may hire and train subordinates to help with assessment and measurement tasks. These subordinates may administer, score, and even interpret the results of measurement and assessment. Standard 2.05 of the APA ethics code (APA, 2002) states that professionals utilizing employees, supervisees, or research and teaching assistants for such purposes should take reasonable precautions to put subordinates in situations where (a) they do not face possibly harmful multiple relationships with the client that could affect their objectivity, (b) they are competently trained to perform the delegated task on their own or with supervision, or (c) they are supervised for competent service delivery. Therefore, when using subordinates to help with tasks such as administration, scoring, or interpretation, the professional assumes primary responsibility and liability to ensure that the services are being provided competently. The professional needs to ensure that subordinates are well trained with all potential instruments. To do so, the professional must provide appropriate training, experience, and supervision as well as continue to check the subordinates' work to ensure its quality. As with licensed professionals, not all subordinates have the same competencies with regard to all instruments.
Tools to Measure the Physical Self
Reflecting the general historical trends in self-concept research, self-concept instruments used in early sport and exercise research focused on global self-esteem (Marsh, 1997, 2002). However, following the research of Shavelson and colleagues (1976), a number of multidimensional self-concept instruments containing one or more PSC scales were developed. Indeed, in a 1974 review, Wylie concluded that at the time most self-concept instruments focused on global self-concept or self-esteem rather than specific domains such as PSC. Although several of the instruments reviewed by Shavelson and colleagues (1976) contained items relating to physical skills and elements of physical appearance, none provided a clearly interpretable measure of PSC. From a practical perspective, these older instruments appear to be of little value for sport and exercise psychologists. The major exception, perhaps, is the Physical Estimation and Attraction Scale (PEAS; Sonstroem, 1978, 1997), along with the theoretical model on which it is based. This instrument was designed to measure two global components: estimation (competency) and attraction. While the PEAS may not be the instrument of choice today, it has historical significance in that its research incorporated many of the features of the construct validity approach advocated in this chapter, it was heuristic, and it provided an important basis for subsequent research.
In a subsequent 1989 review, Wylie identified several multidimensional self-concept instruments measuring one or more components of PSC that can be differentiated from other specific domains of self-concept and general self-concept. Included in the list were the three SDQ instruments already discussed. Wylie also evaluated Harter's (1985) Self-Perception Profile for Children, which contains two PSC scales (athletic competence and physical appearance). Other multidimensional instruments containing physical scales that were not reviewed by Wylie include the Self-Rating Scale (Fleming & Courtney, 1984), which measures physical ability and physical appearance; the Song and Hattie Test (Hattie, 1992), which measures physical appearance; and the Multidimensional Self-Concept Scale (Bracken, 1996), which has a physical scale that includes physical competence, physical appearance, physical fitness, and health. The Tennessee Self-Concept Scale (Fitts, 1965) is a multidimensional self-concept instrument that also purports to measure PSC. In their review and empirical evaluation of this instrument, Marsh and Richards (1988) found distinguishable physical components reflecting health, neat appearance, physical attractiveness, and physical fitness that were incorporated into a single PSC score. This detailed breakdown of the Tennessee physical scale was supported by relationships with the SDQ physical ability and physical appearance scales in an MTMM study comparing responses to the two instruments. Because each of the clusters based on responses to the Tennessee instrument is represented by only a few items, it is not appropriate to use the instrument to measure these distinct components of PSC. Marsh and Richards argued that PSC measures that combine and confound a wide range of differentiable physical components—such as those based on the Tennessee Self-Concept Scale—should be interpreted cautiously (see similar comments by Fox & Corbin, 1989).
In summary, although multidimensional self-concept instruments based on Shavelson and colleagues' (1976) model provided good support for the construct validity of the physical ability and appearance scales (e.g., Marsh, 2002; Marsh & Peart, 1988), they left unanswered the question of whether PSC is more differentiated than can be explained in terms of one (physical ability) or two (ability, appearance) physical scales. Subsequent PSC instruments were developed specifically to address the issue of the multidimensionality of PSC.
Physical Self-Perception Profile
The Physical Self-Perception Profile (PSPP; Fox, 1990; Fox & Corbin, 1989) is a 30-item inventory that consists of four specific scales and one general physical self-worth factor. The PSPP was developed to document the physical self-perceptions of college students. It was designed to reflect the advances made by Harter (1985) and Shavelson and colleagues (1976) in identifying the physical self as an important construct to measure in its own right and to reflect the hierarchical, multidimensional nature of the physical self. A qualitative approach was used to reveal dimensions of physical self-esteem salient to the population sampled (Fox & Corbin, 1989). The PSPP consists of five 6-item scales of sport (perceived sport competence), body (perceived bodily attractiveness), strength (perceived physical strength and muscular development), condition (perceived level of physical conditioning and exercise), and physical self-worth. Fox (1990) recommended that the 10-item Rosenberg Self-Esteem Scale (Rosenberg, 1965) be used alongside the PSPP to provide a global measure. Fox (1990) reported factor analyses indicating that each item loads most highly on the factor that it is designed to measure and that individual scale reliabilities are in the .80s.
The PSPP research demonstrates (a) good reliability (coefficient alpha of .80-.95; Fox, 1990; Page, Ashford, Fox, & Biddle, 1993; Sonstroem, Speliotis, & Fava, 1992); (b) good test-retest stability over the short term (rs of .74-.89; Fox, 1990); (c) a well-defined, replicable factor structure as shown by CFA (Fox & Corbin, 1989; Sonstroem, Harlow, & Josephs, 1994); (d) convergent and discriminant validity in studies showing PSPP relationships with external criteria such as exercise behaviors, mental adjustment variables, and health complaints (Fox & Corbin, 1989; Sonstroem & Potts, 1996); and (e) applicability for an older adult population (Sonstroem et al., 1994). However, correlations among the PSPP scales are consistently so high (.65-.89 when disattenuated for measurement error; Marsh, Richards, Johnson, Roche, & Tremayne, 1994) that they detract from the instrument's ability to differentiate among the different PSC factors it purports to measure.
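The disattenuated correlations cited above come from the classical correction for attenuation, which divides an observed correlation by the geometric mean of the two scales' reliabilities. A minimal sketch of the computation (the numbers below are illustrative, not values from the studies cited):

```python
import math

def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation: estimate the correlation
    between true scores from an observed correlation and the two
    scales' reliability estimates."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Illustrative values: an observed correlation of .60 between two
# subscales with alphas of .85 and .80 disattenuates to about .73.
print(round(disattenuate(0.60, 0.85, 0.80), 2))  # → 0.73
```

Because disattenuation inflates observed correlations, very high disattenuated values such as those reported for the PSPP imply that the latent factors overlap substantially.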
Subsequently, a version of the PSPP for children and adolescents was developed and validated—the Children and Youth Physical Self-Perception Profile (CY-PSPP; Eklund, Whitehead, & Welk, 1997; Whitehead, 1995). Like the PSPP, the CY-PSPP is a 30-item inventory consisting of the same five 6-item scales. The CY-PSPP is a substantially revised version of the PSPP that is most appropriately thought of as a different instrument. The CY-PSPP body, strength, and conditioning subscales are based on minor adaptations of the PSPP to make them more suitable for children. However, the global self-worth (self-esteem) and sport scales are completely different. The PSPP did not have a self-esteem scale of its own but included 6 items adapted from the Rosenberg Self-Esteem Scale. On the CY-PSPP, global self-esteem and sport scales from the PSPP were dropped and replaced with corresponding scales from Harter's (1985) Self-Perception Profile for Children. Correlations among factors remained high (e.g., physical self-worth with attractive body adequacy = .8). Eklund and colleagues (1997) suggested that these results are consistent with the developmental patterns among children, as differentiation in self-concept is less defined at younger ages (Harter, 1985). CFAs have supported the instrument's factor structure, with both the CFI (comparative fit index) and NNFI (non-normed fit index) indexes exceeding the .90 criterion for good model fit (Eklund et al., 1997). Moderate correlations (r = .39-.45) with external criteria such as physical activity and physical fitness have demonstrated its convergent and discriminant validity (Welk & Eklund, 2005). 
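The CFI and NNFI thresholds mentioned above are computed from the chi-square statistics of the fitted model and a baseline (null) model. A sketch of the standard formulas, using illustrative chi-square values rather than results from the studies cited:

```python
def cfi(chisq_m: float, df_m: float, chisq_0: float, df_0: float) -> float:
    """Comparative fit index: 1 minus the ratio of the model's
    noncentrality (chi-square minus df, floored at zero) to the
    larger of the model and baseline noncentralities."""
    d_m = max(chisq_m - df_m, 0.0)
    d_0 = max(chisq_0 - df_0, 0.0)
    denom = max(d_m, d_0)
    return 1.0 if denom == 0 else 1.0 - d_m / denom

def nnfi(chisq_m: float, df_m: float, chisq_0: float, df_0: float) -> float:
    """Non-normed fit index (Tucker-Lewis index), based on the
    chi-square/df ratios of the baseline and fitted models."""
    return (chisq_0 / df_0 - chisq_m / df_m) / (chisq_0 / df_0 - 1.0)

# Illustrative values: a model chi-square of 100 on 80 df against a
# baseline chi-square of 1000 on 90 df clears the .90 criterion.
print(round(cfi(100, 80, 1000, 90), 2), round(nnfi(100, 80, 1000, 90), 2))
```

Both indexes approach 1.0 as the fitted model's chi-square approaches its degrees of freedom, which is why values above .90 are conventionally read as good fit.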
The CY-PSPP has been validated with adolescents (Jones, Polman, & Peters, 2009; Welk, Corbin, & Lewis, 1995; Whitehead, 1995) and younger children (Welk, Corbin, Dowell, & Harris, 1997) and has been translated into other languages and validated in those versions (Aşçı, Eklund, Whitehead, Kirazci, & Koca, 2005; Raustorp, Ståhle, Gudasic, Kinnunen, & Mattsson, 2005; Raustorp, Mattsson, Svensson, & Ståhle, 2006).
Both the PSPP and CY-PSPP use a nonstandard response format based on Harter (1985), in which each item consists of a matched pair of contrasting statements, one negative and one positive (e.g., “Some people feel that they are not very good when it comes to sports” but “Others feel that they are really good at just about every sport”). Respondents are asked which statement is most like them and whether the statement they select is “Sort of true of me” or “Really true of me.” Responses are scored on a scale of 1 to 4, with 1 representing a “Really true of me” response to the negative statement and 4 representing a “Really true of me” response to the positive statement. Although this response format is designed to reduce the influence of social desirability, Wylie's (1989) review of Harter's original instruments provided little or no support for this suggestion, and Marsh and colleagues (1994) suggested that there were substantial method effects associated with the nonstandard response scale. This format has also been shown to be confusing, particularly for children (Eiser, Eiser, & Haversmans, 1995), and even for adults (Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994), unless special care is taken to explain the response scale. Using the suggestion of Marsh and colleagues (1994) that confusion over the structured alternative response scale could be overcome by more detailed instructions at the outset, researchers implementing the CY-PSPP used large illustrations for a sample item (Whitehead, 1995). Wichstrom (1995) found that responses were psychometrically stronger with a typical Likert format than with the structured alternative format, but Welk and colleagues (1997) suggested that the nonstandard response scale on the CY-PSPP worked better than Likert responses did.
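The 1-to-4 scoring just described maps a respondent's two choices (which statement, and how true) onto a single score. A minimal sketch of that mapping (the function name is hypothetical; the logic follows the description above):

```python
def score_item(chose_positive: bool, really_true: bool) -> int:
    """Score one structured-alternative item on the 1-4 scale:
    1 = 'Really true of me' for the negative statement,
    2 = 'Sort of true of me' for the negative statement,
    3 = 'Sort of true of me' for the positive statement,
    4 = 'Really true of me' for the positive statement."""
    if chose_positive:
        return 4 if really_true else 3
    return 1 if really_true else 2

# A respondent endorsing the positive statement as "Really true of me":
print(score_item(chose_positive=True, really_true=True))  # → 4
```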
In summary, the PSPP and the CY-PSPP are established instruments that have been translated into several languages and used with a range of populations. However, the nonstandard response format and the high correlations among factors in both instruments may limit their usefulness in some settings. The CY-PSPP is a substantially revised version of the PSPP specifically developed for children. Although the CY-PSPP should be used instead of the PSPP for child and adolescent samples, it may even be stronger than the original PSPP for adult samples.
Subsequent to the completion of this chapter, Lindwall and colleagues (2011) published a revised version of the PSPP (PSPP-R). They reviewed critiques of the PSPP response scale such as those noted here (e.g., Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994) and acknowledged that “the idiosyncratic alternative response format has been difficult to understand for some participants” (pp. 310-311). In recognition of these problems, the idiosyncratic response scale that has been such a salient feature of the PSPP was dropped altogether and replaced with a 4-point Likert response scale using only positively worded items. Lindwall and colleagues (2011) demonstrated the appropriateness of the revised PSPP-R based on a large sample (N = 1,831) of participants from four countries (Sweden, Great Britain, Portugal, and Turkey). However, they did not indicate whether the PSPP-R supersedes the PSPP or is merely an alternative to it. Nor did they discuss the implications for other instruments that use similar idiosyncratic response scales (e.g., PSPP-related instruments such as the CY-PSPP or Harter's instruments more generally).
Physical Self-Inventory
The Physical Self-Inventory (PSI) is a French adaptation of the PSPP that was originally developed for use with Francophone adults (Ninot, Delignières, & Fortes, 2000). In two preliminary studies, Ninot and colleagues used the nonstandard response scale from the PSPP. However, consistent with previous research (Marsh et al., 1994), they reported that this response scale was problematic. In a third study, the authors used a 6-point Likert response scale; factor analysis results were reasonable, but reliability coefficients were not completely satisfactory. Next the authors replaced the PSPP global physical items with items from the SDQ physical scale and the PSPP global self-esteem items with items from Coopersmith (1967). The final PSI consists of 25 items measuring six PSC factors (four specific and two global, as with the PSPP) and has satisfactory psychometric properties that have been confirmed in subsequent French studies of adults (Masse, Jung, & Pfister, 2001; Stephan, Bilard, Ninot, & Delignières, 2003; Stephan & Maïano, 2007).
Maïano and coworkers (2008) subsequently constructed a short form of the PSI for use with adolescents. They found that not all items from the adult PSI worked with adolescents, but they were able to construct 18-item (PSI-SF, 3 items per scale) and 12-item (PSI-VSF, 2 items per scale) versions that had good psychometric properties. In particular, the measurement and hierarchical structures were consistent with proposals by Fox and Corbin (1989) and were fully invariant across gender. Maïano and coworkers also noted that PSI-SF responses showed very high test-retest stability. Comparison of the PSI-SF and PSI-VSF demonstrated that the measurement model, mean structure, structural parameters, and criterion-related validity were equivalent across samples and versions. Nevertheless, the authors noted a serious limitation that all versions of the PSI share with the PSPP: Very high correlations among the six PSC factors (correlations among latent factors) that, according to the authors, bring “into question the real independence of some of the models' sub-dimensions, and by extension their discriminant validity, a finding that has already been observed by Marsh (2002; Marsh et al., 2006) on analyses of the PSPP” (Maïano et al. 2008, p. 844). However, Maïano and colleagues also noted that because they used a traditional Likert response scale, the high correlations apparently were not due to the structured alternative format used in the PSPP. In summary, particularly the short and very short forms of the PSI have made a potentially important contribution to applied research. However, further research is needed to evaluate more fully the robustness of support for construct validity and application in non-French-speaking settings.
Richards Physical Self-Concept Scale
The Richards Physical Self-Concept Scale (RPSCS; Marsh et al., 1994; Richards, 1988) is a 35-item instrument designed to measure six specific components of PSC (body build, appearance, health, physical competence, strength, action) and one general physical satisfaction factor. Each item is a simple declarative statement, and subjects respond on an 8-point true-false scale. Extensive research in Australia (e.g., Marsh et al., 1994; Richards, 1988) has indicated that RPSCS responses have good psychometric properties. The factor structure is very robust, generalizing well over ages from 8 to 80 y and over gender.
RPSCS research has demonstrated (a) good reliability (coefficient alpha of .79-.93; Marsh et al., 1994; Richards & Marsh, 2005); (b) good test-retest stability over the short term (rs of .77-.90 over 3 wk; Richards, 1988); (c) a well-defined, replicable factor structure as shown by CFA (Marsh et al., 1994; Richards, 2004); (d) a factor structure that is invariant across gender, as shown by multiple-group CFA (Richards, 2004), and across a wide age range; (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (Marsh et al., 1994; Richards & Marsh, 2005); and (f) applicability for participants aged 8 to 60 y and for both genders (Marsh et al., 1994; Richards, 1988, 2004; Richards & Marsh, 2005). In summary, the RPSCS is regarded as a valid, reliable, and structurally sound instrument that has been tested across both genders and a wide range of ages. The applicability across such a wide range of ages is a particular strength.
Physical Self-Description Questionnaire
Extending Fleishman's (1964) classic research on the structure of physical fitness, the Physical Self-Description Questionnaire (PSDQ) scales reflect some of the original SDQ scales and parallel physical fitness components identified in a CFA of physical fitness measures (Marsh, 1993). The PSDQ consists of nine specific components of PSC (strength, body fat, activity, endurance and fitness, sport competence, coordination, health, appearance, and flexibility), a global physical scale, and a global self-esteem scale. Each of the 70 PSDQ items is a simple declarative statement, and individuals respond on a 6-point true-false scale. The PSDQ is designed for adolescents but is also appropriate for older participants.
PSDQ research has demonstrated (a) good reliability (median coefficient alpha of .92) across the 11 scales (Marsh, 1996b; Marsh et al., 1994); (b) good test-retest stability over the short term (median r = .83 over 3 mo) and longer term (median r = .69 over 14 mo; Marsh, 1996b); (c) a well-defined, replicable factor structure as shown by CFA (Marsh, 1996b; Marsh et al., 1994); (d) a factor structure that is invariant over gender as shown by multiple-group CFA (Marsh et al., 1994); (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (see Marsh et al., 1994); (f) convergent and discriminant validity as shown by PSDQ relationships with external criteria (e.g., measures of body composition, physical activity, endurance, strength, and flexibility; see Marsh, 1996a, 1997); and (g) applicability for participants aged 12 to 18 y (or older) and for elite athletes and nonathletes (Marsh, Hey, Roche, & Perry, 1997; Marsh, Perry, Horsely, & Roche, 1995). In summary, the PSDQ is a psychometrically strong instrument.
Marsh, Martin, and Jackson (2010) recently presented a new short form of the PSDQ (PSDQ-S). This short form balances brevity and psychometric quality in relation to established guidelines for evaluating short forms (e.g., Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Smith, McCarthy, & Anderson, 2000) with the construct validity approach that is the basis of PSDQ research. Based on the PSDQ normative archive, 40 of 70 items were selected and evaluated in a new cross-validation sample (N = 708 Australian adolescents). To test the generalizability of results, the authors considered four additional samples: Australian adolescent elite athletes (n = 349), Spanish adolescents (n = 986), Israeli university students (n = 395), and Australian senior citizens (n = 760). Reliabilities for the 40 PSDQ-S items were consistently high in the cross-validation sample (.81-.94; median = .89) and senior sample (.81-.94; median = .91), and reliabilities in the cross-validation sample were higher than those in comparable groups completing the 70-item PSDQ. The PSDQ-S factor structure in the cross-validation sample was well defined and highly similar to that based on the archive sample as well as to those based on the other four groups. Study 1, using a missing-by-design variation of multigroup invariance tests, showed that the factor structure was invariant across the 40-item PSDQ-S and the 70-item PSDQ. Study 2 demonstrated factorial invariance of responses over 1 y (test-retest correlations of .57-.90; median = .77) and good support for convergent and discriminant validity in relation to time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to responses on the PSPP and PSC instruments. The four studies reported by Marsh and coworkers demonstrated new, evolving strategies for the construction and evaluation of short forms that support the PSDQ-S.
The authors concluded that the strong support for the psychometric properties and construct validity of the widely used PSDQ instrument generalizes very well to the PSDQ-S.
Elite Athlete Self-Description Questionnaire
The PSC instruments discussed thus far may be suitable for elite athletes (e.g., Marsh et al., 1995). There may, however, be other components of PSC that are particularly relevant for elite athletes, and thus the Elite Athlete Self-Description Questionnaire (EASDQ; Marsh, Hey, Roche, et al., 1997; Marsh, Hey, Johnson, & Perry, 1997) was developed to address these other components. For the EASDQ, it was hypothesized that overall performance by elite athletes is a function of skill level, body suitability, aerobic fitness, anaerobic fitness, and mental competence. Thus Marsh and colleagues developed the EASDQ to measure these five components along with overall performance, for a total of six factors. For each scale, they developed a pool of items that sport psychologists at the Australian Institute of Sport evaluated for their suitability for elite athletes. Pilot studies were conducted to select the best items to represent each factor. A compromise between brevity and psychometric soundness was achieved, with acceptable levels of reliability (e.g., all scales having reliability estimates of at least .8) based on short scales (4-6 items per scale).
EASDQ research demonstrates (a) adequate reliability (median coefficient alpha of .85) across the six scales (Marsh, Hey, Johnson, et al., 1997); (b) a well-defined, replicable factor structure as shown by CFA (Marsh, Hey, Johnson, et al., 1997; Marsh, Hey, Roche, et al., 1997); (c) applicability for elite athletes aged 12 y or older (Marsh, Hey, Roche, et al., 1997); and (d) predictive validity as shown by its ability to predict swimming performances in world championships after controlling for previous personal best performances (Marsh & Perry, 2005). In summary, the EASDQ is a reliable and valid instrument for elite athletes of all ages. More research is needed, however, to relate EASDQ responses to external validity criteria such as those used in PSDQ research and to criteria that are more specific to elite athletes (e.g., actual performance in competition).
Evaluation of Measures of Intrinsic and Extrinsic Motivation in Sport and Exercise
In this section, we conduct a critical review of the different measures used to assess intrinsic and extrinsic motivation in sport and exercise research. Certain criteria have guided the selection of the measures presented in this section. First, we have selected measures that are fully developed instruments that have gone through extensive validation steps. Second, we have chosen scales that have been used in research, published or unpublished, during the past 10 years. Scales that have not been used during that time frame are considered to be obsolete and are not reviewed. Finally, in light of recent theoretical developments and because of space limitations, we have focused on motivation scales that assess intrinsic and extrinsic motivation independently of determinants and outcomes, while focusing on the perceived reasons for behavior. Our earlier discussion of the definitions of intrinsic and extrinsic motivation makes it possible to classify the different measures. The measures can vary in terms of the level of generality (situational versus contextual) and the area (sport versus exercise). This classification appears in table 25.1. Table 25.2 (see p. 291) provides additional information on each scale's underlying concept, dimensions, publication source, and availability. As can be seen, seven measures are reviewed. For each one, we present (a) a description of the instrument, (b) the conceptual and theoretical rationale underlying its development, (c) the available evidence concerning its psychometric properties (e.g., factorial validity, reliability, and construct validity), and (d) a broad assessment of the strengths and weaknesses associated with the measure.
Measures Used in Sport
In this section, we review the SMS (Brière et al., 1995; Pelletier et al., 1995), the Sport Motivation Scale-6 (SMS-6; Mallett, Kawabata, Newcombe, Otero-Rorero, & Jackson, 2007), the Behavioral Regulation in Sport Questionnaire (BRSQ; Lonsdale, Hodge, & Rose, 2008), the Pictorial Motivation Scale (PMS; Reid, Vallerand, Poulin, Crocker, & Farrell, 2009), and the SIMS (Guay et al., 2000).
Sport Motivation Scale
The SMS was developed (Brière et al., 1995; Pelletier et al., 1995) in order to assess contextual intrinsic and extrinsic motivation from a multidimensional perspective, as well as amotivation. The SMS has been the most often used motivation measure in sport, being employed with a variety of athletes (recreational to elite), age groups (adolescent to senior), and cultures (e.g., Canada, United States, United Kingdom, Bulgaria, Australia, Spain, and New Zealand). In fact, the SMS has been translated and validated in several languages (see Pelletier & Sarrazin, 2007). The SMS is based on SDT (Deci & Ryan, 1985) and is made up of seven subscales assessing amotivation; external, introjected, and identified regulation; and intrinsic motivation to know, to experience stimulation, and to accomplish. In line with SDT, motivation is assessed as the perceived reasons for participation, or the why of behavior. At the beginning of the scale, participants are asked, “In general, why do you practice your sport?” The items represent the perceived reasons for engaging in the activity, thus reflecting the different types of motivation.
The original scale was developed in French as L'Échelle de Motivation dans les Sports (Brière, Vallerand, Blais, & Pelletier, 1995) and was validated in three steps. The first step involved generating a pool of items explaining various reasons for sport participation through interviews with French Canadian athletes (aged 17-20 y). These reasons were then used to formulate items for the seven subscales of the French SMS. In the second step, a committee of experts evaluated the content validity of the items and eliminated those that were thought to be inadequate. Another sample of athletes from various sports completed the scale. Results from an exploratory factor analysis (EFA) provided support for a seven-factor structure with 4 items per subscale; this second step thus resulted in a 28-item scale. In the third and final step, two additional studies were conducted to further validate the scale. These studies included approximately 500 individuals, most of whom were involved in recreational sports. Results from confirmatory factor analyses (CFA) and correlational analyses confirmed the seven-factor structure, the subscale internal consistency (ranging from .65-.96), and moderate to high indexes of temporal stability (ranging from .54-.82) over 1 month. Furthermore, inspection of correlations among the seven SMS subscales provided support for the simplex pattern proposed by SDT. Results of correlations also showed that (in line with SDT) the most self-determined forms of motivation (intrinsic motivation and identified regulation) were related more strongly to determinants such as autonomy support from coaches and feelings of competence than to other forms of motivation (external and introjected regulation) and amotivation. Similar results were obtained with motivational outcomes such as positive affect, concentration, and intentions to pursue engagement in sport. In sum, adequate construct validity was obtained for the French form of the SMS.
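The simplex pattern mentioned above predicts that subscales adjacent on the self-determination continuum (amotivation through intrinsic motivation) correlate more strongly than subscales farther apart. A rough check of that prediction could scan an ordered correlation matrix; a sketch with an illustrative matrix, not the published SMS correlations:

```python
def follows_simplex(corr: list[list[float]]) -> bool:
    """Return True if, for every subscale, correlations never increase
    as distance along the ordered continuum grows (a simplex pattern).
    `corr` is a symmetric matrix whose row/column order matches the
    continuum ordering."""
    n = len(corr)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if abs(i - j) < abs(i - k) and corr[i][j] < corr[i][k]:
                    return False
    return True

# Illustrative 4-subscale matrix with a clean simplex structure:
toy = [
    [1.00, 0.50, 0.30, 0.10],
    [0.50, 1.00, 0.50, 0.30],
    [0.30, 0.50, 1.00, 0.50],
    [0.10, 0.30, 0.50, 1.00],
]
print(follows_simplex(toy))  # → True
```

In practice researchers eyeball the pattern rather than apply a strict rule like this, but the check makes the ordering assumption explicit.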
The translation of the French SMS into English involved back-translation and committee procedures as suggested by Vallerand (1989). Pelletier and colleagues (1995) conducted two studies involving college athletes from various sports in order to assess the psychometric properties of the English form of the SMS. Results from CFA with a sample of 593 Canadian university athletes revealed adequate fit indices for the hypothesized seven-factor model (Adjusted Goodness-of-Fit Index and Normed Fit Index both > .90; Root Mean Square Residual < .08), and correlations with determinants and outcomes supported the simplex model. Moreover, internal consistency above .70 was obtained for all of the subscales except the identified subscale (.63). Test-retest correlations were acceptable and very similar to those obtained with the French SMS, as was the scale's construct validity.
Since 1995, the SMS has been used extensively in sport psychology research. The seven-factor structure has been supported repeatedly (e.g., Doganis, 2000; Gillet, Vallerand, & Rosnet, 2009; Li & Harmer, 1996; Shaw, Ostrow, & Beckstead, 2005; Standage, Duda, & Ntoumanis, 2003). In addition, Hu and Bentler (1999) obtained support for a five-factor model by combining the three types of intrinsic motivation into one factor. Similar results were obtained by Gillet and colleagues (2009) with the French SMS. However, some studies have not supported the seven-factor model (Hodge, Allen, & Smellie, 2008; Mallett, Kawabata, & Newcombe, 2007; Mallett, Kawabata, Newcombe, & Otero-Rorero, 2007; Martens & Webber, 2002). Why is there such a discrepancy between these two sets of studies? One possibility lies in the populations from which the different samples were taken. Specifically, the SMS was validated using adolescent and young adult athletes and not older athletes. Because of this specific focus, some of the items may reflect a participation rather than an elite orientation, which is more in line with the younger population. For instance, an identified regulation item reads, “Because sport is one of the best ways to maintain good relationships with my friends.” Such an item seems more relevant for a younger population. An older, high-level athlete may disagree with this item but still display a high level of identified regulation for a sport (but not for relationship reasons). Future research using the SMS with different age groups and proficiency levels is needed to clarify this issue.
Although the internal consistency of the SMS has generally been adequate, some values below .70 have been found. This is especially the case for the identified regulation subscale (Brière et al., 1995; Kingston, Horrocks, & Hanton, 2006; Li & Harmer, 1996; Pelletier et al., 1995), although some lower values (below .70) have also been obtained with the introjected regulation (McNeill & Wang, 2005; Perreault & Vallerand, 2007; Riemer, Fink, & Fitzgerald, 2002; Standage, Duda, & Ntoumanis, 2003), external regulation (Standage, Duda, & Ntoumanis, 2003), and amotivation (Standage, Duda, & Ntoumanis, 2003) subscales. However, very few values below .60 have been obtained. It should be noted that a Cronbach alpha of .60 with only 4 items is acceptable because, as noted by Cronbach (1951), the coefficient alpha underestimates the internal consistency of scales with a small number of items. This is because the coefficient alpha includes the number of items in its formula. For instance, given the same average interitem correlation, a 3-item scale with a coefficient alpha of .56 is equivalent to a 10-item scale with an alpha of .81.
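The dependence of alpha on scale length can be made concrete with the Spearman-Brown prophecy formula, which projects reliability to a different number of items under the assumption of equal average inter-item correlations (a sketch added here for illustration, not part of the chapter):

```python
def project_alpha(alpha: float, k_old: int, k_new: int) -> float:
    """Spearman-Brown projection of coefficient alpha from a k_old-item
    scale to a k_new-item scale, assuming the average inter-item
    correlation stays the same."""
    n = k_new / k_old  # lengthening factor
    return n * alpha / (1 + (n - 1) * alpha)

# A 3-item scale with alpha = .56 projects to about .81 at 10 items:
print(round(project_alpha(0.56, 3, 10), 2))  # → 0.81
```

The formula shows why short subscales can have modest alphas despite respectable inter-item correlations.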
In line with the original work of Ryan and Connell (1989) and the initial SMS validation procedures (Brière et al., 1995; Pelletier et al., 1995), construct validity has been assessed by other authors in two fashions: (1) with the simplex pattern of correlations among the subscales and (2) with correlations between motivational factors and their determinants and consequences. We do not have space to review all studies. However, overall, there is overwhelming support for the construct validity of the SMS both in French and English. For instance, in addition to finding support for the simplex pattern, Pelletier and Sarrazin (2007) concluded in their review of the evidence that the SMS has been used with success to predict a great variety of specific outcomes and consequences (such as burnout, exercise dependence among endurance athletes, fear of failing, adaptive coping skills, perceptions of constraints, flow, vitality and well-being, sporting behavior orientations, aggression, and performance) in a manner that is consistent with SDT. These findings provide strong support for the construct validity of the SMS.
In sum, the SMS has some positive features. First, it is a multidimensional instrument that assesses different types of intrinsic and extrinsic motivation as well as amotivation. Second, the scale focuses on the why of behavior and thus items are not confounded with determinants and consequences. Finally, it has some excellent psychometric properties. Nevertheless, some limitations should be underscored. First, although internal consistency levels have been acceptable overall, some subscales, especially the identified regulation subscale, have yielded relatively low coefficient alphas at times. Second, the SMS does not assess integrated regulation. Third, the seven-factor structure has not always been supported by CFAs. According to Pelletier, Vallerand, and Sarrazin (2007), this may be explained by a host of factors, including differences in sample sizes, variations in the way the instrument is administrated, or some other characteristics specific to the context of the study. However, as already indicated, it is also possible that the SMS is better suited for a younger, nonelite athlete population. Clearly, future research on this issue is in order.
Sport Motivation Scale-6
Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed another version of the SMS, the SMS-6. This scale has the same underlying rationale as the original SMS but was designed to improve the original version by including an integrated regulation subscale and by attempting to resolve some of the inconsistencies in the factor structure and some of the relatively low internal consistency values (below .70). The SMS-6 comprises 24 items, 4 for each of the six subscales, which include amotivation; external, introjected, identified, and integrated regulation; and general intrinsic motivation. Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) developed 5 items for the integrated regulation subscale as well as 7 other items (4 of which were kept in the final scale) to replace some items in the original SMS. Two samples were used to validate the SMS-6. Sample 1 was composed of 501 first-year university students participating in competitive sport at least twice per week and 113 elite athletes representing Australia at the international level (for a total of 614 participants). Sample 1 was used to derive a factor structure that included the SMS items as well as the reformulated and integrated regulation items. Sample 2 was composed of 557 university students who were engaged in a variety of sports or physical activities twice per week. The second sample was used to confirm the structure of the SMS-6. Participants also completed the Dispositional Flow Scale (DFS).
Results of a CFA with the SMS-6 (with sample 2) provided support for the factor structure as well as for the internal consistency values (all above .70). Concerning the construct validity of the SMS-6, Mallett, Kawabata, Newcombe, and Otero-Rorero (2007) reported a rather weak simplex pattern of correlations among the subscales. More specifically, external regulation correlated highly with intrinsic motivation (r = .54), while the correlation between identified regulation and intrinsic motivation was very high (r = .91) and was higher than the one between integrated regulation and intrinsic motivation (r = .75). The construct validity of the SMS-6 was not fully supported, as some of the correlations involving the SMS and flow were not as expected by SDT. For instance, the distinctions among integrated regulation, identified regulation, and intrinsic motivation were not always clear. Furthermore, external regulation revealed some positive and sometimes strong correlations with flow, contrary to hypotheses derived from SDT.
In sum, the SMS-6 has some nice features. First, it contains an integrated regulation subscale, and the addition of 4 new items may make the SMS more acceptable for older and more experienced athletes. Second, Mallett and colleagues (2007) presented results supporting the validity of a variation of the SMS-6, the SMS-8, which contains the same items as the SMS-6 but assesses the three types of intrinsic motivation rather than general intrinsic motivation. The SMS-6 also shows some limitations. First, Mallett and colleagues (2007) proposed 7 new items to replace those that were presumably problematic in the original SMS, but only 4 of these items made it to the final version. Thus, the SMS-6 retained much of the original SMS. Second, even some of the new items appear problematic and may not assess the desired construct (see Pelletier et al., 2007). For instance, a new amotivation item (“I don't seem to be enjoying my sport as much as I previously did”) seems to reflect a decrease in intrinsic motivation rather than amotivation. Finally, results from Mallett and colleagues (2007) indicated that the integrated regulation subscale may lack discriminant validity, as its correlations with flow were highly similar to those of identified regulation and intrinsic motivation.
Behavioral Regulation in Sport Questionnaire
Lonsdale and colleagues (2008) developed the BRSQ as an alternative measure of elite sport motivation as conceptualized by SDT. However, in contrast to Mallett and colleagues (2007), these authors used a completely new pool of items developed by SDT experts and competitive athletes. There are two versions of the BRSQ. The BRSQ-8 contains 32 items assessing integrated, identified, introjected, and external regulation; amotivation; and the three forms of intrinsic motivation (knowledge, experience stimulation, and accomplishment) identified by Vallerand (1997). The BRSQ-6 contains the same items but assesses general intrinsic motivation rather than all three types of intrinsic motivation, for a total of 24 items.
Lonsdale and colleagues (2008) conducted a series of three studies to validate the scale. In the first study, the factorial validity and the internal consistency were assessed with 382 New Zealand elite athletes. Results from a CFA on the 32 items supported the factor structure of the BRSQ. Specifically, fit indexes were acceptable and all items loaded significantly on the appropriate factors (loadings ranged from .58 to .91). Internal consistency of the eight subscales, measured with the Cronbach alpha, was high, with values ranging from .71 to .91. Additionally, 1 wk test-retest reliability was assessed with 34 competitive adult athletes; the reliability coefficients for all subscales supported temporal stability (values ranged from .73 to .90).
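Cronbach's alpha, the internal consistency index reported throughout this literature, can be computed directly from item-level responses: alpha = k/(k − 1) × (1 − Σ item variances / variance of total scores). A minimal sketch follows; the item scores are invented for illustration and do not come from any of the scales discussed.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a subscale.
    items[k] is the list of all respondents' scores on item k."""
    k = len(items)
    item_variance_sum = sum(pvariance(item) for item in items)
    total_scores = [sum(scores) for scores in zip(*items)]  # per respondent
    return k / (k - 1) * (1 - item_variance_sum / pvariance(total_scores))

# Four hypothetical items answered by five respondents (Likert-type scores)
items = [
    [5, 4, 4, 3, 2],
    [5, 5, 4, 2, 2],
    [4, 4, 5, 3, 1],
    [5, 4, 4, 3, 2],
]
print(round(cronbach_alpha(items), 2))  # 0.95
```

Values above .70, the conventional cutoff mentioned earlier in this chapter, are generally taken as acceptable internal consistency.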
In a second study with 343 athletes from New Zealand, the results of a CFA on the BRSQ-8 once more supported the factor structure as well as the subscale internal consistency. Lonsdale and colleagues (2008) also showed that the factor structure of the BRSQ-6 model fit the data very well and that subscale coefficient alphas all exceeded .78. Moreover, the construct validity of the BRSQ-6 was assessed by testing for a simplex pattern of correlations among the six subscales. While some relationships were in line with predictions (e.g., amotivation was negatively related to intrinsic motivation), there was a lack of discrimination between some subscales. More specifically, there was no difference between external and introjected regulation scores in terms of their relationships with amotivation. A similar pattern was evident with the identified and integrated regulation subscales, which both had similarly high correlations with intrinsic motivation. These results with the simplex pattern were replicated in a third study conducted with nonelite athletes. In this third study, Lonsdale and colleagues also assessed the relationships between the BRSQ-6 and indexes of burnout (Lemyre, Treasure, & Roberts, 2006; Raedeke & Smith, 2001) and flow (Jackson & Eklund, 2002). Overall, results supported hypotheses in line with SDT. Specifically, amotivation and external and introjected regulation showed negative correlations with flow and positive correlations with burnout. The opposite pattern of correlations was found for the self-determined subscales (intrinsic motivation and identified and integrated regulation). However, there was a lack of discrimination between integrated regulation and general intrinsic motivation. Results of another study on burnout (Lonsdale, Hodge, & Rose, 2009) replicated these findings. Thus, overall, the support for the construct validity of the BRSQ-6 appears to be mixed.
It should be underscored that the BRSQ has some nice features. First, the scale is designed in such a way that the researcher can decide to use a multidimensional (BRSQ-8) or unitary (BRSQ-6) conceptualization of intrinsic motivation. Second, the scale is rather short, with 4 items per subscale. Finally, it assesses integrated regulation. At the same time, the BRSQ also displays some limitations. First, additional research is needed on the construct validity of the scale. Whereas there is support for distinguishing the self-determined subscales (intrinsic motivation and identified and integrated regulation) from the non-self-determined subscales (external and introjected regulation), the finer discrimination within each category appears to be lacking. Such evidence is crucial, and future research is needed to show that this scale does indeed assess the SDT constructs rather than two broad sets of subscales tapping self-determined versus non-self-determined motivation. Second, this scale is designed specifically for older participants in competitive sport; it remains to be seen if the BRSQ can be used with younger participants, for whom the integrated regulation subscale may not have full meaning. Finally, research is needed to test the temporal stability of the scale over a time frame longer than 1 week.
Pictorial Motivation Scale
The PMS was designed to measure intrinsic and extrinsic motivation for sport and exercise in people with an intellectual disability. It assesses participants' reasons for engaging in sport and exercise. The scale's main characteristic is the use of drawings depicting each of the 20 items. There are 5 items (pictures) for each of four subscales: intrinsic motivation, self-determined extrinsic motivation (a mixture of integrated and identified regulation), non-self-determined extrinsic motivation (a mixture of introjected and external regulation), and amotivation. The pictures help participants with cognitive difficulties grasp the motivational concept depicted in each item.
The original scale was developed in French (Reid, Poulin, & Vallerand, 1994). Results of a study with 62 participants supported the internal consistency, temporal stability, and construct validity, as exemplified by the presence of a simplex pattern among the four subscales. However, the amotivation subscale had poor reliability (α = .52). The French version was later translated into English (Reid et al., 2009) according to the back-translation and committee procedures outlined in Vallerand (1989), and 6 new items were generated for the less reliable amotivation subscale. Participants in the Special Olympics (n = 160) completed the English version. Results of the CFA confirmed the four-factor structure of the PMS. Furthermore, the internal consistency values (Cronbach alphas) ranged from .60 to .71. Finally, the construct validity was assessed by testing for a simplex pattern of correlations among the four subscales. The intercorrelations among latent variables from the CFA provided support for the simplex pattern.
Results from a study conducted with the English version of the PMS involving 80 high school students with mild intellectual disability provided support for the internal consistency, temporal stability (over 3 wk), and construct validity of the PMS with respect to the simplex pattern of correlations among the PMS subscales as well as correlations between the PMS subscales and motivational antecedents (skill and perceived competence) and outcomes (perceived effort) as rated by the physical education teacher. Finally, the internal consistency of each subscale was tested without the pictorial dimension with a subset of 47 high school students with mild intellectual disability. With the exception of intrinsic motivation (.91), internal consistency was poor (.27 for self-determined extrinsic motivation, .20 for non-self-determined extrinsic motivation, and .60 for amotivation). This finding suggests that the scale is not reliable without the drawings.
The preliminary findings with the English version of the PMS are encouraging. Furthermore, this scale is the only one geared toward individuals with an intellectual disability, and the use of drawings to depict the various items makes it unique in the field. Nevertheless, the PMS shows some limitations. First, the scale does not differentiate among all forms of intrinsic (knowledge, stimulation, and accomplishment) or extrinsic (integrated, identified, introjected, and external regulation) motivation. Second, construct validity was tested with only a limited number of variables. Third, it is not known if the scale is usable with children who have severe forms of intellectual disability. Clearly, additional research is needed on the reliability and validity of the PMS.
Situational Motivation Scale
The SIMS is one of the few scales to assess intrinsic and extrinsic motivation and amotivation at the situational level (Guay et al., 2000). The SIMS is a multidimensional tool that measures four types of motivation: intrinsic motivation, identified regulation, external regulation, and amotivation. The SIMS is made up of 16 items (4 items per subscale) and asks this question: “Why are you currently engaged in this activity?” The items represent potential reasons for task engagement. The scale is worded in such a way that it can be used in most situations (sport and nonsport).
Five studies were reported in the original article. In study 1, the original scale was developed by a committee of experts and completed by 195 French Canadian college students. Results of an EFA revealed a four-factor structure with the final 16 items loading on their respective factors. In study 2, a CFA confirmed the factor structure as well as its invariance across gender. Across the five studies, the internal consistency values of the subscales were acceptable, ranging from .62 to .95 (see Guay et al., 2000). Moreover, across all studies, support was obtained for the construct validity of the SIMS through results from correlations in line with the simplex pattern among the subscales as well as between the SIMS subscales and motivational determinants and consequences. Perhaps of greater interest for the present discussion were the results of study 4, which showed that some subscales (intrinsic motivation and identified regulation) were sensitive enough to detect changes in motivation that took place during two games of a basketball tournament.
Other researchers have also obtained support for the psychometric properties of the SIMS. First, all studies reported acceptable internal consistency values for each subscale (Blanchard, Mask, Vallerand, de la Sablonnière, & Provencher, 2007; Conroy, Coatsworth, & Kaye, 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, Duda, & Prusak, 2003). With the exception of the amotivation subscale in the Conroy and colleagues study (α = .58), all coefficient alpha values were above .60. Second, support for the factorial validity of the SIMS was obtained through CFAs, with one qualification: whereas the CFA results with the 16 items yielded acceptable fit indexes, removal of 1 item (Jaakkola, Liukkonen, Laakso, & Ommundsen, 2008) and even 2 items (Gillet, Berjot, & Paty, 2009; Standage, Treasure, et al., 2003) yielded better fit indexes. Moreover, Standage, Treasure, and colleagues (2003) conducted multisample CFAs and showed that the pattern of factor loadings was largely invariant across four different samples.
Construct validity of the SIMS was also assessed in several studies (Blanchard et al., 2007; Conroy et al., 2007; Law & Ste-Marie, 2005; Ntoumanis & Blaymires, 2003; Standage, Treasure, et al., 2003). In addition to supporting the simplex pattern among the SIMS subscales and expected correlations between the SIMS subscales and need satisfaction (study 2 of Blanchard and colleagues, 2007), results also supported the postulate from the HMIEM (Vallerand, 1997) for the top-down effect, in which contextual sport motivation was found to predict situational sport motivation (studies 1 and 2 of Blanchard et al., 2007; Jaakkola et al., 2008; Ntoumanis & Blaymires, 2003). Specifically, the more self-determined the motivation was found to be in a specific context (in this case, sport), the more self-determined the motivation was found to be in a given situation. Furthermore, Blanchard and colleagues (2007, studies 1 and 2) found support for another postulate from the HMIEM that suggests that over time, situational motivation in the realm of sport (basketball) has recursive effects on contextual motivation. The more that situational motivation is self-determined, the more that contextual motivation becomes self-determined over time. Finally, Jaakkola and coworkers (2008) demonstrated that, as predicted by the HMIEM, situational self-determined motivation was better than contextual motivation in predicting the situational intensity (as assessed by heart rate) displayed by students in a physical education class. Overall, these findings provide strong support for the reliability and factorial and construct validity of the SIMS.
The SIMS has several positive features, one of them being that it is the only scale to assess intrinsic and extrinsic motivation and amotivation at the situational level. Furthermore, it does so using only 16 items. Nevertheless, it also has some weaknesses. First, the SIMS does not assess the different types of intrinsic motivation and integrated and introjected regulation, because it was designed to be short. Second, while the factor structure has been supported, it is not clear if some items should be replaced (Gillet, Berjot, et al., 2009; Jaakkola et al., 2008; Standage, Treasure, et al., 2003). Third, research so far has not assessed the validity of the scale with high-performance athletes. Thus, additional research is needed to further test the psychometric properties of the SIMS in sport.
Ethics Codes: Their Nature, Purposes, and Application
Ethics codes typically comprise principles and standards. Ethical principles are broad-spectrum statements that summarize and reflect the values of the parent organization or governing body. These general and aspirational statements set the underlying tone for the more specific codes and guide the work-related ethical decision making of professionals. In contrast, ethical standards specify both proscribed and prescribed member behaviors. While not always black and white, these standards serve as a more clear cut and enforceable guide for professional behavior.
Members should apply both the aspirational principles and enforceable standards to shape their thinking and behavior in work settings. Ideally, members self-monitor their own behavior. In an effort to remain ethical, professionals are encouraged to consult with colleagues about ethically challenging situations and to provide constructive feedback about possibly unethical behavior they perceive in others.
Assessment and Measurement
A central question to be addressed in this chapter is what assessment and measurement are. Sundberg (1977) defines assessment as the processes used “for developing impressions and images, making decisions and checking hypotheses about another person's pattern of characteristics that determines his or her behavior in interaction with the environment” (p. 21). The assessment process involves collecting and assembling a broad range of objective and subjective information about persons or groups to develop impressions about them; identify their needs; predict how they might think, feel, and behave in future situations; and select and apply interventions based on the content and dependability of that information. Professionals may use multiple assessment methods that include observations of behavior, symptom checklists, surveys and questionnaires, structured and unstructured interview materials, and standardized tests (Bennett et al., 2006). Gardner and Moore (2006) emphasize using a triad of psychological assessment strategies in the practice of clinical sport psychology: (1) initial interviews, (2) behavioral observation, and (3) psychological testing. The nature and assumptions underlying assessment approaches are usually grounded in the theoretical orientation of the professional (Andersen, 2002).
In contrast, measurement can mean many things to many people. It is one of the most common words in the English language and can be used as both a noun and a verb (Lorge, 1967). For the purposes of this chapter, measurement is viewed as an extension of assessment processes. It can be thought of more narrowly as the process of collecting information about psychological characteristics of interest (e.g., attitudes, behaviors, state experiences) using one or more methods or tools (such as those mentioned earlier) to monitor change and the effects of interventions or treatments postassessment. For example, an educational sport psychology consultant might administer a measure of team cohesion over the course of a competitive season to see how team members perceive their relationships. Another consultant might conduct a preseason baseline screening assessment of cognitive functioning in hockey players and then reevaluate players who incur a mild traumatic brain injury (i.e., concussion) later in the season.
In this chapter, the terms measurement and assessment are used interchangeably. Furthermore, these terms are used to describe the decisions and opinions made by professionals regarding clients with whom they work. As such, measurement and assessment techniques include all methods of gathering information about clients, such as (a) psychological, educational, and neurological tests; (b) data gathered during clinical interviewing; (c) information gathered from significant others (e.g., family members, teachers, friends); (d) direct and indirect observation; and (e) interactions with people via teletherapy (e.g., Internet, phone; Fisher, 2009).
Competence and Education
In order to excel in our professional duties and do well for those we serve, teach, study, and otherwise interact with, we must know what to do and how to do it in a capable manner. The ethics codes mentioned earlier identify the necessity of being knowledgeable and capable in our work. For example, the APA ethical standards provide guidance for organization members in this area, including information about (a) competence limitations, (b) maintaining competence, (c) making sound professional and scientific judgments, (d) delegating work responsibilities to others, (e) engaging in activities in emergencies, and (f) impairment (APA, 2002). Competence in professional behaviors is a personal matter that is frequently challenged. It is the responsibility of professionals to know their limitations and to recognize how their knowledge and skills change and require constant upgrading. The APA ethics code also emphasizes the importance of making sound work-related decisions based on scientific knowledge and appropriate discipline-specific practice. This portion of the APA code cautions professionals to be careful when delegating work to others, describes how a professional is responsible for others' work, and explains the necessity of avoiding multiple relationships with those to whom work is delegated. The APA standards note that we can occasionally be thrown into situations in which our competence is stretched; in such cases we need to be very careful, seek supervision if available, and end such work as soon as possible.
Measurement Referral Questions and Appropriateness of Instruments
When selecting assessment instruments, the professional must consider the referral questions that prompted this process (Fisher, 2009; Smith, 1976). The instruments selected should reflect these referral questions and utilize assessment strategies that have appropriate validity and reliability. For example, if a professional is interested in measuring state anxiety for research purposes, an appropriate assessment may be the Competitive State Anxiety Inventory-2 (CSAI-2; Martens, Burton, Vealey, Bump, & Smith, 1990) as opposed to the State-Trait Anxiety Inventory (STAI; Spielberger, Gorsuch, & Lushene, 1970), which measures both trait and state anxiety. When selecting the assessment, the professional should be aware of limitations or biases regarding cultural sensitivity (see the later section on cultural issues); gender considerations (Etzel, Yura, & Perna, 1998); and age, language, or disability factors that may influence the psychometric qualities of the assessment differently from the way they influenced the normative groups used for the development and validation of the instrument (APA, 2002; Fisher, 2009). It is also important to consider the method of delivery. For example, assessments based on paper and pencil may not have been validated for online use (see the later section on technology), and instruments with elevated reading levels may not be appropriate for certain age or developmental groups. Therefore, the professional should always verify the assessment's validity and reliability when a modified assessment method or group is used (Fisher, 2009). Furthermore, the professional should also attempt to conduct in-person assessments when possible, as a great deal of information can be learned about clients from the way in which they present themselves during the assessment process. This information can affect the richness of the assessment data.
It is also important for professionals to be aware of and competent to assess and use appropriate psychometric strategies for establishing validity and reliability of the instruments they use (AERA, APA, & NCME, 1999). All instruments have unique psychometric properties that affect how they should be administered and interpreted. When validity and reliability issues are not taken into consideration, it is possible to choose and utilize instruments to assess factors that they were not designed to assess. Furthermore, practitioners should be well aware of other psychometric properties such as content and criterion validity and standard error of measurement that may affect how results are interpreted and used. The ethical practitioner needs to be aware of psychometric issues in order to choose appropriate instruments with regard to the referral questions, client characteristics, assessment strategies, and environmental factors.
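One of the psychometric properties mentioned here, the standard error of measurement, follows directly from a scale's reliability: SEM = SD × √(1 − reliability), the typical spread of observed scores around a test taker's true score. A brief sketch with hypothetical numbers:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = sd * sqrt(1 - reliability): expected spread of observed
    scores around a test taker's true score."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical scale: score SD = 10, reliability (e.g., alpha) = .84
sem = standard_error_of_measurement(10, 0.84)
print(round(sem, 2))  # 4.0

# Approximate 95% band around an observed score of 50
lower, upper = 50 - 1.96 * sem, 50 + 1.96 * sem
print(round(lower, 2), round(upper, 2))  # 42.16 57.84
```

Framing a score as a band rather than a point in this way is one concrete safeguard when interpreting results for referral questions.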
Consent and Assent
As discussed earlier, the ethical principles for sport and exercise psychology emphasize doing no harm to the client and respecting the individual's rights and dignity (AASP, 1996; APA, 2002). The test taker's right to privacy and confidentiality applies here as well, and the professional should take all necessary precautions to maintain the confidentiality and privacy of the client. To protect the test taker, informed consent must be obtained at the start of the relationship (e.g., research, consultation, therapy). Beyond the informed consent process and before formal assessment, the client or participant should be informed of all pertinent information regarding the assessment process. This information includes (a) the nature and purpose of assessment; (b) any applicable fees; (c) potential involvement of third parties such as a coach, athletic trainer, or manager; (d) limits of privacy and confidentiality (as discussed in the next section); and (e) the timeline for the process and potential feedback (Fisher, 2009). This information should be presented in a clear and understandable manner. Furthermore, this information should be agreed to by the test taker, who thereby gives informed consent. Test takers should engage in assessment of their own free will and must be given the option to withdraw participation without consequences (APA standard 3.10). All necessary information about assessment procedures and findings should be provided in a language or level appropriate for the participant. Furthermore, it is unethical to require or coerce individuals to take part in measurement and assessment for research or practice purposes.
Privacy and Confidentiality and Release of Information
Typically, the ethical standards of organizations with ties to sport psychology (APA ethical standard 4.01 and the AASP) suggest that professionals should not reveal information about clients, test takers, or others without their signed approval to release information or a legal requirement to do so. These legal situations may include (a) a test taker who indicates possible self-harm or harm to others (i.e., suicide or homicide), (b) a test taker whose results are subpoenaed by the court, or (c) a test taker who is a minor, in which case the parent or guardian may have access to the data (Etzel et al., 1998). If the test taker or, in the case of a minor, the parent or guardian provides explicit written permission, the specific information identified by the client may be released to the identified parties. Unless these circumstances are met, information from the test taker may not be disclosed to anyone (e.g., coaches, management, parents, administration, athletic trainers, and so on).
In situations where the assessment is requested by a third party (e.g., coaches, management, the court), this third party may also request results from the assessment. It is important for the professional to establish a priori who is the “real client” (Ogilvie, 1979) and to have the ability to control access to the results. Etzel and colleagues (1998) suggest that information about the assessment should be shared only with one predetermined person, unless a release of information form has been completed. Therefore, when engaging in assessments, the professional should set clear boundaries and avoid dual relationships, thereby identifying who is being served (APA standard 4.02a). Another complication of these situations is the role of trust. If athletes or test takers suspect the test results will be used without their permission in decisions regarding performance or other aspects of participation, they may be less likely to respond honestly, thus affecting the validity of the results (see the section on demand characteristics).
Raw Data and Data Storage
Raw data such as the test taker's responses to items, as well as the professional's notes and final reports, should be stored in locked file cabinets inside the professional's office or in password-protected computer files (Fisher, 2009). Other methods to ensure confidentiality may include limiting access to records to only those people who have a need to know this information and have been trained to handle and understand it, deidentifying records using code numbers, and appropriately disposing of identifiable records (Fisher, 2009). A good policy for data maintenance is that data should be kept for a minimum of 7 y after the last service delivery date or 3 y after a minor reaches the age of 18 (whichever is later), as is recommended by the APA record-keeping guidelines (APA, 2002; Fisher, 2009). Raw data and the instruments used for assessment purposes should not be released to third parties unless a release of information form has been completed and the third party is competently trained to use such information.
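The retention rule just described (keep records at least 7 years after the last service date, or 3 years after a minor client turns 18, whichever is later) can be sketched as a small date calculation. This is an illustrative sketch only: it ignores leap-day edge cases, and actual record-keeping obligations vary by jurisdiction and guideline edition.

```python
from datetime import date

def retention_deadline(last_service, birth_date=None):
    """Earliest date records may be discarded under the policy above:
    7 years after the last service date or, if the client was a minor,
    3 years after the client's 18th birthday, whichever is later.
    (Illustrative only; assumes no Feb 29 dates.)"""
    seven_years_after = last_service.replace(year=last_service.year + 7)
    if birth_date is None:  # adult client: 7-year rule only
        return seven_years_after
    three_years_after_majority = birth_date.replace(year=birth_date.year + 21)
    return max(seven_years_after, three_years_after_majority)

# Minor client born in 1998, last seen in 2010 (age 12 at last service)
print(retention_deadline(date(2010, 5, 1), date(1998, 3, 15)))  # 2019-03-15
```

For adult clients the 7-year clock alone applies; for minors the later of the two dates governs.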
Results Discussion
Test feedback and results discussion should be provided in the form of a carefully constructed report using clear language that fully explains the assessment results. Labels and jargon should be eliminated to increase readability. Information necessary to the purpose of the test should be included, and the inclusion of unnecessary and unrelated information should be avoided (APA, 2002; Fisher, 2009). Additionally, as recommended by the APA (APA, 2002), interpretations should take into consideration the participant's gender, race, ethnicity, age, national origin, sexual orientation, religion, disability, language, or socioeconomic status. Participants should receive assessment information and feedback related to their performance on the assessment and should be informed of ways in which they could personally use the test results or how this information may be used by a third party (only if written permission was given to release such information). The information released to the participant should be presented in a verbal or written report and presented in such a way that it does not cause harm to the test taker (Etzel et al., 1998). However, information such as numerical scores or specific responses should not be released to individuals not qualified to interpret such information (Fisher, 2009; Tranel, 1995).
Demand Characteristics
In the sport context, several groups of individuals may be interested in the assessment results of athletes. Interested parties may include coaches, managers, teams, students, or administrators. However, the possibility of a third party reviewing the test results may increase socially desirable responding and result in invalid and unreliable information. Therefore, undue pressure to complete an instrument or battery should be considered as a contextual factor.
Another potentially undesirable effect of a third party viewing the test taker's results may be assessment anxiety. The APA standards state that if a test taker is observed to be anxious or reports feeling anxious, this feeling should be taken into account and noted as a limitation in the interpretation of test data (APA, 2002). Assessment anxiety may be exaggerated in situations where a third party may have access to results. These situations may also lead to faking good or faking bad on the part of respondents who are concerned about how the results may be used. This must also be considered when evaluating the results.
Supervision of Subordinates
In some cases, professionals may hire and train subordinates to help with assessment and measurement tasks. These subordinates may administer, score, and even interpret the results of measurement and assessment. Standard 2.05 of the APA ethics code (APA, 2002) states that professionals utilizing employees, supervisees, or research and teaching assistants for such purposes should take reasonable precautions to put subordinates in situations where (a) they do not face possibly harmful multiple relationships with the client that could affect their objectivity, (b) they are competently trained to perform the delegated task on their own or with supervision, and (c) they are supervised for competent service delivery. Therefore, when using subordinates to help with tasks such as administration, scoring, or interpretation, the professional assumes primary responsibility and liability for ensuring that the services are being provided competently. The professional needs to ensure that subordinates are well trained with all potential instruments. To do so, the professional must provide appropriate training, experience, and supervision as well as continue to check the subordinates' work to ensure its quality. As with licensed professionals, not all subordinates have the same competencies with regard to all instruments.
Tools to Measure the Physical Self
Reflecting the general historical trends in self-concept research, self-concept instruments used in early sport and exercise research focused on global self-esteem (Marsh, 1997, 2002). Indeed, in a 1974 review, Wylie concluded that at the time most self-concept instruments focused on global self-concept or self-esteem rather than on specific domains such as PSC. However, following the research of Shavelson and colleagues (1976), a number of multidimensional self-concept instruments containing one or more PSC scales were developed. Although several of the instruments reviewed by Shavelson and colleagues (1976) contained items relating to physical skills and elements of physical appearance, none provided a clearly interpretable measure of PSC. From a practical perspective, these older instruments appear to be of little value for sport and exercise psychologists. The major exception, perhaps, is the Physical Estimation and Attraction Scale (PEAS; Sonstroem, 1978, 1997), along with the theoretical model on which it is based. This instrument was designed to measure two global components: estimation (competency) and attraction. Although the PEAS may not be the instrument of choice today, it has historical significance: the research surrounding it incorporated many of the features of the construct validity approach advocated in this chapter, it was heuristic, and it provided an important basis for subsequent research.
In a subsequent 1989 review, Wylie identified several multidimensional self-concept instruments measuring one or more components of PSC that can be differentiated from other specific domains of self-concept and general self-concept. Included in the list were the three SDQ instruments already discussed. Wylie also evaluated Harter's (1985) Self-Perception Profile for Children, which contains two PSC scales (athletic competence and physical appearance). Other multidimensional instruments containing physical scales that were not reviewed by Wylie include the Self-Rating Scale (Fleming & Courtney, 1984), which measures physical ability and physical appearance; the Song and Hattie Test (Hattie, 1992), which measures physical appearance; and the Multidimensional Self-Concept Scale (Bracken, 1996), which has a physical scale that includes physical competence, physical appearance, physical fitness, and health. The Tennessee Self-Concept Scale (Fitts, 1965) is a multidimensional self-concept instrument that also purports to measure PSC. In their review and empirical evaluation of this instrument, Marsh and Richards (1988) found distinguishable physical components reflecting health, neat appearance, physical attractiveness, and physical fitness that were incorporated into a single PSC score. This detailed breakdown of the Tennessee physical scale was supported by relationships with the SDQ physical ability and physical appearance scales in an MTMM study comparing responses to the two instruments. Because each of the clusters based on responses to the Tennessee instrument is represented by only a few items, it is not appropriate to use the instrument to measure these distinct components of PSC. Marsh and Richards argued that PSC measures that combine and confound a wide range of differentiable physical components—such as those based on the Tennessee Self-Concept Scale—should be interpreted cautiously (see similar comments by Fox & Corbin, 1989).
In summary, although multidimensional self-concept instruments based on Shavelson and colleagues' (1976) model provided good support for the construct validity of the physical ability and appearance scales (e.g., Marsh, 2002; Marsh & Peart, 1988), they left unanswered the question of whether PSC is more differentiated than can be explained in terms of one (physical ability) or two (ability, appearance) physical scales. Subsequent PSC instruments were developed specifically to address the issue of the multidimensionality of PSC.
Physical Self-Perception Profile
The Physical Self-Perception Profile (PSPP; Fox, 1990; Fox & Corbin, 1989) is a 30-item inventory that consists of four specific scales and one general physical self-worth factor. The PSPP was developed to document the physical self-perceptions of college students. It was designed to reflect the advances made by Harter (1985) and Shavelson and colleagues (1976) in identifying the physical self as an important construct to measure in its own right and to reflect the hierarchical, multidimensional nature of the physical self. A qualitative approach was used to reveal dimensions of physical self-esteem salient to the population sampled (Fox & Corbin, 1989). The PSPP consists of five 6-item scales of sport (perceived sport competence), body (perceived bodily attractiveness), strength (perceived physical strength and muscular development), condition (perceived level of physical conditioning and exercise), and physical self-worth. Fox (1990) recommended that the 10-item Rosenberg Self-Esteem Scale (Rosenberg, 1965) be used alongside the PSPP to provide a global measure. Fox (1990) reported factor analyses indicating that each item loads most highly on the factor that it is designed to measure and that individual scale reliabilities are in the .80s.
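The scale reliabilities reported here are coefficient (Cronbach's) alpha values, which can be computed directly from item-level responses. The following sketch shows the standard formula; the data are invented for illustration and are not taken from any PSPP study.

```python
def cronbach_alpha(items):
    """Coefficient alpha for a scale.

    items: a list of equal-length lists, one list of scores per item
    (each inner list holds one item's scores across all respondents).
    """
    k = len(items)                      # number of items on the scale
    n = len(items[0])                   # number of respondents

    def var(xs):                        # unbiased sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]
    sum_item_var = sum(var(item) for item in items)
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return (k / (k - 1)) * (1 - sum_item_var / var(totals))

# Three perfectly consistent items yield alpha = 1.0 (illustrative data):
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]))  # 1.0
```

Values in the .80s, such as those Fox (1990) reported, indicate that the items on each 6-item scale covary strongly enough for the scale score to be internally consistent.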
The PSPP research demonstrates (a) good reliability (coefficient alpha of .80-.95; Fox, 1990; Page, Ashford, Fox, & Biddle, 1993; Sonstroem, Speliotis, & Fava, 1992); (b) good test-retest stability over the short term (rs of .74-.89; Fox, 1990); (c) a well-defined, replicable factor structure as shown by CFA (Fox & Corbin, 1989; Sonstroem, Harlow, & Josephs, 1994); (d) convergent and discriminant validity in studies showing PSPP relationships with external criteria such as exercise behaviors, mental adjustment variables, and health complaints (Fox & Corbin, 1989; Sonstroem & Potts, 1996); and (e) applicability for an older adult population (Sonstroem et al., 1994). However, correlations among the PSPP scales are consistently so high (.65-.89 when disattenuated for measurement error; Marsh, Richards, Johnson, Roche, & Tremayne, 1994) that they detract from the instrument's ability to differentiate among the different PSC factors it purports to measure.
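Disattenuating a correlation for measurement error, as Marsh and colleagues (1994) did for the PSPP scale intercorrelations, means dividing the observed correlation by the square root of the product of the two scales' reliabilities (Spearman's classic correction). A minimal sketch, with illustrative values not taken from the PSPP studies:

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Correct an observed correlation between two scales for
    measurement error, given each scale's reliability estimate."""
    return r_xy / math.sqrt(rel_x * rel_y)

# Illustrative values only: an observed correlation of .60 between two
# scales with reliabilities of .85 and .80 disattenuates to about .73.
print(round(disattenuate(0.60, 0.85, 0.80), 2))  # 0.73
```

Because the correction removes the attenuation caused by unreliable measurement, disattenuated correlations as high as .89 imply that some PSPP factors are nearly indistinguishable at the latent level.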
Subsequently, a version of the PSPP for children and adolescents was developed and validated—the Children and Youth Physical Self-Perception Profile (CY-PSPP; Eklund, Whitehead, & Welk, 1997; Whitehead, 1995). Like the PSPP, the CY-PSPP is a 30-item inventory consisting of five 6-item scales, but it is a substantially revised version of the PSPP that is most appropriately thought of as a different instrument. The CY-PSPP body, strength, and conditioning subscales are based on minor adaptations of the PSPP items to make them more suitable for children. However, the global self-worth (self-esteem) and sport scales are completely different. The PSPP did not have a self-esteem scale of its own but instead included 6 items adapted from the Rosenberg Self-Esteem Scale; for the CY-PSPP, these adapted self-esteem items and the PSPP sport scale were dropped and replaced with the corresponding scales from Harter's (1985) Self-Perception Profile for Children. Correlations among factors remained high (e.g., physical self-worth with attractive body adequacy = .8). Eklund and colleagues (1997) suggested that these results are consistent with developmental patterns among children, as differentiation in self-concept is less defined at younger ages (Harter, 1985). CFAs have supported the instrument's factor structure, with both the CFI (comparative fit index) and NNFI (non-normed fit index) exceeding the .90 criterion for good model fit (Eklund et al., 1997). Moderate correlations (rs of .39-.45) with external criteria such as physical activity and physical fitness have demonstrated its convergent and discriminant validity (Welk & Eklund, 2005).
The CY-PSPP has been validated with adolescents (Jones, Polman, & Peters, 2009; Welk, Corbin, & Lewis, 1995; Whitehead, 1995) and younger children (Welk, Corbin, Dowell, & Harris, 1997) and has been validated and translated into other languages (Aşçı, Eklund, Whitehead, Kirazci, & Koca, 2005; Raustorp, Ståhle, Gudasic, Kinnunen, & Mattsson, 2005; Raustorp, Mattsson, Svensson, & Ståhle, 2006).
Both the PSPP and CY-PSPP use a nonstandard response format based on Harter (1985), in which each item consists of a matched pair of contrasting statements, one negative and one positive (e.g., “Some people feel that they are not very good when it comes to sports” paired with “Others feel that they are really good at just about every sport”). Respondents are asked which description is most like them and whether the description they select is “Sort of true of me” or “Really true of me.” Responses are scored on a scale of 1 to 4, with 1 representing a “Really true of me” response to the negative statement and 4 representing a “Really true of me” response to the positive statement. Although this response format is designed to reduce the influence of social desirability, Wylie's (1989) review of Harter's original instruments provided little or no support for this suggestion, and Marsh and colleagues (1994) suggested that there were substantial method effects associated with the nonstandard response scale. This format has also been shown to be confusing, particularly for children (Eiser, Eiser, & Haversmans, 1995) but even for adults (Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994), unless special care is taken to explain the response scale. Following the suggestion of Marsh and colleagues (1994) that confusion over the structured alternative response scale could be overcome by more detailed instructions at the outset, researchers implementing the CY-PSPP used large illustrations for a sample item (Whitehead, 1995). Wichstrom (1995) found that responses to this item content were psychometrically stronger when based on typical Likert responses rather than the structured alternative format, but Welk and colleagues (1997) suggested that the nonstandard response scale worked better than Likert responses on the CY-PSPP.
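The 1-to-4 scoring rule for this structured alternative format can be stated compactly. The sketch below encodes the rule exactly as described above; the function name and its two boolean inputs are hypothetical conveniences for illustration, not part of any published scoring key.

```python
def score_structured_alternative(chose_positive, really_true):
    """Score one Harter-style structured-alternative item on the 1-4 scale:
    1 = 'Really true of me' for the negative statement,
    2 = 'Sort of true of me' for the negative statement,
    3 = 'Sort of true of me' for the positive statement,
    4 = 'Really true of me' for the positive statement.
    """
    if chose_positive:
        return 4 if really_true else 3
    return 1 if really_true else 2

# A respondent who picks the positive statement ("really good at just
# about every sport") but marks it only "Sort of true of me" scores 3.
print(score_structured_alternative(chose_positive=True, really_true=False))  # 3
```

Seeing the rule laid out this way also makes the format's cognitive demand apparent: the respondent must make two sequential decisions per item, which is one plausible source of the confusion reported for children and adults alike.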
In summary, the PSPP and the CY-PSPP are established instruments that have been translated into several languages and used with a range of populations. However, the response format and the high correlations among factors may limit their usefulness in some settings. The CY-PSPP, a substantially revised version of the PSPP developed specifically for children, should be used instead of the PSPP with child and adolescent samples and may even be stronger than the original PSPP for adult samples.
Subsequent to the completion of this chapter, Lindwall and colleagues (2011) published a revised version of the PSPP (the PSPP-R). They reviewed critiques of the PSPP response scale such as those noted here (e.g., Marsh, Bar-Eli, Zach, & Richards, 2006; Marsh et al., 1994) and acknowledged that “the idiosyncratic alternative response format has been difficult to understand for some participants” (pp. 310-311). In recognition of these problems, the idiosyncratic response scale that has been such a salient feature of the PSPP was dropped altogether and replaced with a 4-point Likert response scale using only positively worded items. Lindwall and colleagues (2011) demonstrated the appropriateness of the revised PSPP-R based on a large sample (N = 1,831) of participants from four countries (Sweden, Great Britain, Portugal, and Turkey). However, they did not indicate whether the PSPP-R supersedes the PSPP or is merely an alternative to it, nor did they discuss the implications for other instruments using similar idiosyncratic response scales (e.g., PSPP-related instruments such as the CY-PSPP or Harter's instruments more generally).
Physical Self-Inventory
The Physical Self-Inventory (PSI) is a French adaptation of the PSPP that was originally developed for use with Francophone adults (Ninot, Delignières, & Fortes, 2000). In two preliminary studies, Ninot and colleagues used the nonstandard response scale from the PSPP. However, consistent with previous research (Marsh et al., 1994), they reported that this response scale was problematic. In a third study, the authors used a 6-point Likert response scale; factor analysis results were reasonable, but reliability coefficients were not completely satisfactory. Next the authors replaced the PSPP global physical items with items from the SDQ physical scale and the PSPP global self-esteem items with items from Coopersmith (1967). The final PSI consists of 25 items measuring six PSC factors (four specific and two global, as with the PSPP) and has satisfactory psychometric properties that have been confirmed in subsequent French studies of adults (Masse, Jung, & Pfister, 2001; Stephan, Bilard, Ninot, & Delignières, 2003; Stephan & Maïano, 2007).
Maïano and coworkers (2008) subsequently constructed a short form of the PSI for use with adolescents. They found that not all items from the adult PSI worked with adolescents, but they were able to construct 18-item (PSI-SF, 3 items per scale) and 12-item (PSI-VSF, 2 items per scale) versions that had good psychometric properties. In particular, the measurement and hierarchical structures were consistent with proposals by Fox and Corbin (1989) and were fully invariant across gender. Maïano and coworkers also noted that PSI-SF responses showed very high test-retest stability. Comparison of the PSI-SF and PSI-VSF demonstrated that the measurement model, mean structure, structural parameters, and criterion-related validity were equivalent across samples and versions. Nevertheless, the authors noted a serious limitation that all versions of the PSI share with the PSPP: very high correlations among the six PSC latent factors that, according to the authors, bring “into question the real independence of some of the models' sub-dimensions, and by extension their discriminant validity, a finding that has already been observed by Marsh (2002; Marsh et al., 2006) on analyses of the PSPP” (Maïano et al., 2008, p. 844). However, Maïano and colleagues also noted that because they used a traditional Likert response scale, the high correlations apparently were not due to the structured alternative format used in the PSPP. In summary, the short and very short forms of the PSI in particular have made a potentially important contribution to applied research. However, further research is needed to evaluate more fully the robustness of support for construct validity and application in non-French-speaking settings.
Richards Physical Self-Concept Scale
The Richards Physical Self-Concept Scale (RPSCS; Marsh et al., 1994; Richards, 1988) is a 35-item instrument designed to measure six specific components of PSC (body build, appearance, health, physical competence, strength, action) and one general physical satisfaction factor. Each item is a simple declarative statement, and subjects respond on an 8-point true-false scale. Extensive research in Australia (e.g., Marsh et al., 1994; Richards, 1988) has indicated that RPSCS responses have good psychometric properties. The factor structure is very robust, generalizing well over ages from 8 to 80 y and over gender.
RPSCS research has demonstrated (a) good reliability (coefficient alpha of .79-.93; Marsh et al., 1994; Richards & Marsh, 2005); (b) good test-retest stability over the short term (rs of .77-.90 over 3 wk; Richards, 1988); (c) a well-defined, replicable factor structure as shown by CFA (Marsh et al., 1994; Richards, 2004); (d) a factor structure that is invariant across gender, as shown by multiple-group CFA (Richards, 2004), and across a wide age range; (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (Marsh et al., 1994; Richards & Marsh, 2005); and (f) applicability for participants aged 8 to 60 y and for both genders (Marsh et al., 1994; Richards, 1988, 2004; Richards & Marsh, 2005). In summary, the RPSCS is regarded as a valid, reliable, and structurally sound instrument that has been tested across both genders and a wide range of ages. The applicability across such a wide range of ages is a particular strength.
Physical Self-Description Questionnaire
Extending Fleishman's (1964) classic research on the structure of physical fitness, the Physical Self-Description Questionnaire (PSDQ) scales reflect some of the original SDQ scales and parallel physical fitness components identified in a CFA of physical fitness measures (Marsh, 1993). The PSDQ consists of nine specific components of PSC (strength, body fat, activity, endurance and fitness, sport competence, coordination, health, appearance, and flexibility), a global physical scale, and a global self-esteem scale. Each of the 70 PSDQ items is a simple declarative statement, and individuals respond on a 6-point true-false scale. The PSDQ is designed for adolescents but is also appropriate for older participants.
PSDQ research has demonstrated (a) good reliability (median coefficient alpha of .92) across the 11 scales (Marsh, 1996b; Marsh et al., 1994); (b) good test-retest stability over the short term (median r = .83 over 3 mo) and longer term (median r = .69 over 14 mo; Marsh, 1996b); (c) a well-defined, replicable factor structure as shown by CFA (Marsh, 1996b; Marsh et al., 1994); (d) a factor structure that is invariant over gender as shown by multiple-group CFA (Marsh et al., 1994); (e) convergent and discriminant validity as shown by MTMM studies of responses to three PSC instruments (see Marsh et al., 1994); (f) convergent and discriminant validity as shown by PSDQ relationships with external criteria (e.g., measures of body composition, physical activity, endurance, strength, and flexibility; see Marsh, 1996a, 1997); and (g) applicability for participants aged 12 to 18 y (or older) and for elite athletes and nonathletes (Marsh, Hey, Roche, & Perry, 1997; Marsh, Perry, Horsely, & Roche, 1995). In summary, the PSDQ is a psychometrically strong instrument.
Marsh, Martin, and Jackson (2010) recently presented a new short form of the PSDQ (PSDQ-S). This short form balances brevity and psychometric quality in relation to established guidelines for evaluating short forms (e.g., Marsh, Ellis, Parada, Richards, & Heubeck, 2005; Smith, McCarthy, & Anderson, 2000) with the construct validity approach that is the basis of PSDQ research. Based on the PSDQ normative archive, 40 of the 70 items were selected and evaluated in a new cross-validation sample (N = 708 Australian adolescents). To test the generalizability of results, the authors considered four additional samples: Australian adolescent elite athletes (n = 349), Spanish adolescents (n = 986), Israeli university students (n = 395), and Australian senior citizens (n = 760). Reliabilities for the 40 PSDQ-S items were consistently high in the cross-validation sample (.81-.94; median = .89) and senior sample (.81-.94; median = .91), and reliabilities in the cross-validation sample were higher than those in comparable groups completing the 70-item PSDQ. The PSDQ-S factor structure in the cross-validation sample was well defined and highly similar to that based on the archive sample as well as to those based on the other four groups. Study 1, using a missing-by-design variation of multigroup invariance tests, showed that the factor structure was invariant across the 40-item PSDQ-S and the 70-item PSDQ. Study 2 demonstrated factorial invariance of responses over 1 y (test-retest correlations of .57-.90; median = .77) and good support for convergent and discriminant validity over time. Study 3 showed good and nearly identical support for convergent and discriminant validity of PSDQ and PSDQ-S responses in relation to responses on the PSPP and PSC instruments. The four studies reported by Marsh and coworkers demonstrated new, evolving strategies for the construction and evaluation of short forms that support the PSDQ-S.
The authors concluded that the strong support for the psychometric properties and construct validity of the widely used PSDQ instrument generalizes very well to the PSDQ-S.
Elite Athlete Self-Description Questionnaire
The PSC instruments discussed thus far may be suitable for elite athletes (e.g., Marsh et al., 1995). There may, however, be other components of PSC that are particularly relevant for elite athletes, and thus the Elite Athlete Self-Description Questionnaire (EASDQ; Marsh, Hey, Roche, et al., 1997; Marsh, Hey, Johnson, & Perry, 1997) was developed to address these other components. For the EASDQ, it was hypothesized that overall performance by elite athletes is a function of skill level, body suitability, aerobic fitness, anaerobic fitness, and mental competence. Thus Marsh and colleagues developed the EASDQ to measure these five components together with overall performance (six factors in all). For each scale, they developed a pool of items that sport psychologists at the Australian Institute of Sport evaluated for their suitability for elite athletes. Pilot studies were conducted to select the best items to represent each factor. A compromise between brevity and psychometric soundness was achieved, with acceptable levels of reliability (e.g., all scales having reliability estimates of at least .8) based on short scales (4-6 items per scale).
EASDQ research demonstrates (a) adequate reliability (median coefficient alpha of .85) across the six scales (Marsh, Hey, Johnson, et al., 1997); (b) a well-defined, replicable factor structure as shown by CFA (Marsh, Hey, Johnson, et al., 1997; Marsh, Hey, Roche, et al., 1997); (c) applicability for elite athletes aged 12 y or older (Marsh, Hey, Roche, et al., 1997); and (d) predictive validity as shown by its ability to predict swimming performances in world championships after controlling for previous personal best performances (Marsh & Perry, 2005). In summary, the EASDQ is a reliable and valid instrument for elite athletes aged 12 and older. More research is needed, however, to relate EASDQ responses to external validity criteria such as those used in PSDQ research and to criteria that are more specific to elite athletes (e.g., actual performance in competition).