
EVALUATION OF CLINICAL PERFORMANCE DURING SIMULATION SCENARIOS

The introduction of patient simulators made it possible to study human performance in responding to critical events (see Chapter 83 ), which in turn requires techniques for assessing anesthesiologists' performance. Performance can be divided into two components: technical performance, the appropriateness and thoroughness of the medical and technical response to the critical event, and behavioral performance, the appropriate use of sound crisis management behaviors (such as leadership, communication, and distribution of workload).[63] [64]

Assessment of medical and technical responses has led different authors to propose "technical scores."[64] [65] [66] [67] [68] [69] [70] [71]

Simulation offers some benefits in assessing performance. Because the nature and cause of the critical incident are known, one can, in advance, construct a list of essential or appropriate technical activities with relative weights of importance. For example, when assessing technical performance in managing malignant hyperthermia, terminating the trigger agent and administering intravenous dantrolene would be essential items, whereas cooling measures, hyperventilation, and bicarbonate therapy would be among many appropriate (but less critical) technical responses. One can also predict in advance specific technical pitfalls. For malignant hyperthermia, such pitfalls could include diluting dantrolene with the wrong diluent (not sterile water) or an insufficient quantity of diluent. These pitfalls are known to plague those unfamiliar with therapy for malignant hyperthermia.
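To make the idea of a weighted checklist concrete, the following sketch computes a technical score from a set of observed actions and flags any missed essential items. The item names, weights, and "essential" flags are illustrative assumptions only; they are not drawn from any published or validated scoring instrument.

```python
# Hypothetical weighted-checklist sketch for scoring the technical response to a
# simulated malignant hyperthermia scenario. Items, weights, and "essential"
# flags are illustrative, not from a published score.
MH_CHECKLIST = {
    "discontinue trigger agent":       {"weight": 3, "essential": True},
    "give intravenous dantrolene":     {"weight": 3, "essential": True},
    "hyperventilate with 100% oxygen": {"weight": 1, "essential": False},
    "start cooling measures":          {"weight": 1, "essential": False},
    "treat acidosis with bicarbonate": {"weight": 1, "essential": False},
}

def technical_score(observed_actions):
    """Return a 0-100 weighted score and a list of missed essential items."""
    total = sum(item["weight"] for item in MH_CHECKLIST.values())
    earned = sum(item["weight"] for name, item in MH_CHECKLIST.items()
                 if name in observed_actions)
    missed_essential = [name for name, item in MH_CHECKLIST.items()
                        if item["essential"] and name not in observed_actions]
    return 100.0 * earned / total, missed_essential

# Example: the subject stopped the trigger agent and started cooling but never
# gave dantrolene, so the essential-item list flags the omission.
print(technical_score({"discontinue trigger agent", "start cooling measures"}))
```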

The University of Toronto (Canadian Simulation Centre) demonstrated good inter-rater reliability between two raters using a simplified performance assessment rating scale.



TABLE 84-7 -- Characteristics of anesthesia crisis resource management-like simulator training
Objectives
To learn generic principles of complex problem solving, decision making, resource management, and teamwork behavior
To improve participants' medical/technical, cognitive, and social skills in the recognition and treatment of realistic, complex medical situations by enhancing their capacity for reflection, self-discovery, and teamwork and by building a personalized tool kit of attitudes, behaviors, and skills
Aim
Prevent, ameliorate, and resolve critical incidents
Setting Characteristics
Realistic simulation environment replicating a relevant work setting
Personnel to individually represent those found in the typical work environment of the participant, including nurses, surgeons, and technicians
The bulk of the training course consists of realistic simulations followed by detailed debriefings
The primary participant can request and receive help from other participants
Participants may rotate between different roles during different scenarios to gain different perspectives
Simulation scenarios may be supplemented by additional modalities, including such activities as assigned readings, didactic presentations, analysis of videotapes, role playing, or group discussions
The training involves significant time (more than 4 hours, typically 8 or more) and is conducted with a small group of participants
Content Characteristics
Scenarios require participants to engage in appropriate professional interactions
At least 50% of the emphasis of the course is on crisis resource management behavior (nontechnical skills) rather than medical or technical issues
Key points in teaching ACRM are mentioned in Table 84-5
Observation only is not equivalent to actual participation in the course
Faculty Characteristics
The training is intense and entails high involvement of faculty with the participants and a low participant-to-faculty ratio
The course is intensive and uses highly interactive instruction, critique, and feedback
Debriefings are led by one or more instructors with special training or experience
Debriefing Characteristics
Debriefings are conducted with the whole group of participants together, using audio/video recordings of the simulation sessions
The primary emphasis in debriefing is on exploring aspects of nontechnical skills (ACRM)
Debriefings emphasize constructive critique in which the participants are given the greatest opportunity possible to speak and to critique and learn from each other (debriefing facilitation)

This scale was tested with scripted, role-played variants of standardized scenarios containing multiple anesthetic problems that were acted out on the simulator and videotaped.[72] Subsequent analysis of the rating system showed poor internal consistency among the different anesthetic problems presented in the scenarios, suggesting that the items acted "independently, reflecting different aspects of anesthesia care." When aggregated across the five problems, the results were affected by the "level of importance placed on each problem by individual subjects."[72] [73] The Danish group also gave a preliminary report on their attempt to validate subjective and objective evaluation parameters with a technique they named "PEANUTS" (Performance Enhancement in Anesthesia Using the Training Simulator).[74]

An interesting problem for technical scoring systems is that the medical domain remains complex enough that subjects may perform innovative or unconventional clinical actions that are obviously appropriate but were not included in the fixed scoring checklists.

Forrest and colleagues developed a very detailed scoring system to measure the technical performance of novice anesthetists.[69] The score was developed with a modified Delphi technique, which is explained in an excellent accompanying editorial on assessment by Glavin and Maran.[75] They studied six novice anesthetists five times between weeks 1 and 12 of their simulator training. The videotapes of the simulator sessions were then assessed by two raters. To gain more information about the scoring system, they also had experienced clinicians perform the sessions and be evaluated, and they had one videotape scored by five additional raters. The results showed high content validity, high construct validity, and good inter-rater reliability for the score. A significant difference in performance was noted between weeks 1 and 12, but not between any other weeks. Because even the experienced group did not come close to achieving the maximum score (the average was around 70%), the relevance or precision of the Delphi technique was questioned. A closer look at the Delphi process shows that the change from the first to the second round was only marginal, which calls the need for and impact of the technique into question. In addition, the tasks added by respondents in the first Delphi round were not included in the rating score.


Clearly, in developing a purely technical score, Forrest and colleagues did not perform a cognitive task analysis and left all the important nontechnical skills[76] (see the later section "Nontechnical Skills in Anesthesia") untouched. On the other hand, the score did include elements of a nontechnical nature that were "hidden" within the technical score (e.g., "discusses and confirms the procedure with the surgeon").

Schwid and associates and the Anesthesia Research Consortium performed a large multicenter study that also used only a technical score to assess residents' performance. The study showed good test characteristics, including construct validity, criterion validity, internal consistency, and inter-rater reliability, and is discussed in more detail later in this chapter.

Can the "clinical" outcome of the simulator's mathematical physiology predict how a real patient would have fared under that individual's care? In extreme cases, this is likely to be true. A subject who demonstrates totally erroneous decision making (e.g., failure to defibrillate a simulated patient with ventricular fibrillation) quickly allows the patient's state to deteriorate unmistakably. However, the mathematical models are not sufficient to predict what would happen to any actual patient after complex sequences of therapy and more subtle patient care judgments.

Thus, the clinical outcome of the simulated patient is one datum that can be used to assess the performance of the anesthetist on a simulation scenario, but for the foreseeable future, any credible performance measurement technique must involve subjective and semiobjective judgments by clinical experts.

Nontechnical Skills in Anesthesia

A study by Morgan and Cleave-Hogg concluded that "the simulator environment is somehow unique and allows different behaviors to be assessed."[77] As Glavin and Maran stated, "any scoring system that attempts to address the assessment of clinical competence clearly has to address both technical and non-technical skills"; consequently, there is still a long road ahead in measuring performance.[75] Over the last 20 years the health care professions have become more aware of the importance of "nontechnical" skills in the delivery of safe, high-quality medical care, and this recognition has brought an increased need for the assessment, evaluation, and training of these skills. Patient simulators offered perhaps the first opportunity to demonstrate and train these behaviors under realistically stressful conditions.[17] [30] [38] [70] [78] The introduction of simulators and the associated training concepts accelerated the medical community's understanding of these human factors.[79] Some of the needed "crisis management behaviors"[39] [64] [80] can obviously be trained without the use of simulators, as shown in other domains (aviation, oil platforms, the military) (see also Chapter 83 ).[81] [82] [83] [84] [85] [86] [86A] It is also known that the baseline level of crisis resource management (CRM) performance is rather low.[23] Helmreich states that, as a first step in establishing error management programs, it is necessary to provide formal training in teamwork, the nature of error, and the limitations of human performance.[87] The role of "seminar-based" training in CRM principles relative to that of hands-on simulation-based CRM training has not yet been established. Given the experience of the airline industry, it is likely that a combination of seminars and simulation-based exercises will be needed to fully address CRM skills for both trainees and experienced personnel.

Assessing Nontechnical Skills Is More Subjective than Assessing Technical Skills

Two research groups (VA-Stanford and the University of Basel) studied adaptations of the anchored subjective rating scales developed by the NASA/University of Texas Aerospace Crew Performance Project. The VA-Stanford group published preliminary data on the inter-rater reliability of subjective ratings of behavior on five-point anchored scales.[64] [88] Using a fairly stringent test of inter-rater reliability (a topic that is quite complex in the statistical literature), they found only moderate reliability when five trained raters used a five-point scale to score 14 anesthesia teams, each managing two different complex critical events in the simulator (malignant hyperthermia and cardiac arrest). Despite some difficulty in agreeing on the operational definitions of each type of behavior, the investigators stated that the largest obstacle to agreement was the high variability of each behavior over the course of a simulation. For example, an anesthesia crew could show evidence of good communication at one instant, only to be shouting ambiguous orders into thin air at the next. Aggregating these behaviors into a single rating was extremely difficult, even for bounded time segments of the scenario. These data demonstrate the importance of using more than one rater to evaluate performance; a single rater, no matter how well trained, may produce scores that differ significantly from those of another rater. The investigators suggested combining scores from at least a pair of raters because the mean of two raters' scores was shown to have a very low probability of differing from the mean of five raters' scores by more than a single rating point.
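The effect of averaging raters can be illustrated with a small simulation. The sketch below assumes a simple rater-noise model (true performance plus Gaussian error, rounded onto a 1-to-5 scale); the model and its parameters are illustrative assumptions, not the analysis performed in the cited study.

```python
# Illustrative Monte Carlo sketch: under an assumed rater-noise model, estimate
# how often a single rater, and the mean of two raters, differ from the mean of
# all five raters by more than one rating point.
import random

def simulate(trials=100_000, rater_sd=0.8, scale=(1, 5)):
    lo, hi = scale
    single_off = pair_off = 0
    for _ in range(trials):
        true_score = random.uniform(lo, hi)            # "true" team performance
        ratings = [min(hi, max(lo, round(random.gauss(true_score, rater_sd))))
                   for _ in range(5)]                   # five noisy 1-5 ratings
        five_mean = sum(ratings) / 5
        single_off += abs(ratings[0] - five_mean) > 1
        pair_off += abs((ratings[0] + ratings[1]) / 2 - five_mean) > 1
    return single_off / trials, pair_off / trials

if __name__ == "__main__":
    single, pair = simulate()
    print(f"P(single rater off by >1 point from 5-rater mean): {single:.3f}")
    print(f"P(2-rater mean off by >1 point from 5-rater mean): {pair:.3f}")
```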

The behavioral markers of their score are shown in Table 84-8 and compared with Fletcher's "anesthesia nontechnical skills (ANTS)" score and the ACRM key teaching points (see Table 84-5 ).

The ANTS System

Fletcher, from the Industrial Psychology Group of Aberdeen (headed by Rhona Flin) and working with clinicians from the Scottish Clinical Simulation Centre (Glavin and Maran), performed an in-depth review of the role of nontechnical skills in anesthesia.[76] Fletcher stated that nontechnical skills have not been explicitly addressed in the traditional education and training of anesthesiologists, even though they have always been demonstrated and used during clinical work. The group analyzed incident reports and observations of real cases, as well as attitude questionnaires and theoretical models.[39] [89] [90] Like others, they recognized that simulation offers the opportunity to identify, develop, measure, and train nontechnical skills in a safe learning environment, so they also included significant observations during realistic simulations.

Incident reports proved very limited because they did "not provide the finer-grained level of information necessary to understand where the skills broke down."[91]



TABLE 84-8 -- Nontechnical skills in anesthesia: classification, markers, and teaching points

Anesthesia Nontechnical Skills (Fletcher et al.)[76] [91] [96] Performance Markers (Gaba et al.) [64] Key Teaching Points in Anesthesia Crisis Resource Management (Gaba et al.)[38] [39]
Concepts Elements Categories Markers Reminders
Cognitive and mental skills Planning and preparing Task management Orientation to case Anticipate and plan
Know your environment
Prioritizing
Leadership (also a social and interpersonal skill) Exercise leadership
Set priorities dynamically
Providing and maintaining standards
Planning Use cognitive aids
Identifying and using resources
Workload distribution Distribute the workload
Mobilize all available resources
Gathering information Situation awareness Anticipation Use all available information
Recognizing and understanding
Vigilance Allocation of attention
Anticipating
Anticipate and plan
Identifying options Decision making Preparation
Balancing risks and selecting options
Re-evaluation Prevent and manage fixation errors
Re-evaluating
Re-evaluate repeatedly
Social and interpersonal skills Coordinating activities with team Team working Inquiry/assertion Communicate effectively
Teamwork
Exchanging information
Communication feedback Communicate effectively
Using authority and assertiveness
Group climate Exercise leadership and followership
Assessing capabilities
Followership Exercise followership
Supporting others
Overall assessment Not applicable
Overall nontechnical performance of the primary anesthetist Teamwork!
Assessments are made only at the element and category level
Concentrate on what is right, not who is right
Overall nontechnical performance of the whole team

They defined nontechnical skills as "attitudes and behaviours not directly related to the use of medical expertise, drugs or equipment." Although nontechnical skills can be grouped under the broader heading of "human factors," the authors prefer the term "nontechnical skills" because it is more specific.

As in the ACRM instructional paradigm,[38] [39] [92] Fletcher and colleagues identified two categories of nontechnical skills:

• Cognitive and mental skills, including decision making, planning, and situation awareness
• Social and interpersonal skills with aspects of team working, communication, and leadership

Fletcher's ANTS score is shown in Table 84-8 along with the behavioral markers of Gaba's team and the ACRM key teaching points. A description of the ANTS categories and elements, including examples of good practice and poor practice, is presented in Table 84-9 .

The structure of the new ANTS scheme was derived from a system of behavioral markers developed for aviation in a European project called "NOTECHS," which itself was an evolution of the University of Texas markers (Helmreich). A summarized comparison of the aviation systems, along with guidance on using nontechnical markers for training and evaluation, can be found in the downloadable documentation of the "Group Interaction in High Risk Environments" project, created by an international group of human factors specialists (http://www2.hu-berlin.de/gihre/Download/Publications/GIHRE2.pdf).[93] Several comments about the ANTS approach are appropriate.

The intent of ANTS is to score only those skills that can be identified unambiguously through observable behavior. This restriction may enhance the reliability of scoring, but it could exclude relevant personal factors such as self-presentation, stress management, and maintaining perspective. ANTS assumes that "communication" is included in, or "even pervades," all other categories and does not score communication as a separate skill. This approach contrasts with that of others who believe that communication is a specific skill that should be rated separately.[81] [94] [95]



TABLE 84-9 -- Example of a description of the anesthesia nontechnical skills system (ANTS system—v1.0)
Task Management—skills for organizing resources and required activities to achieve goals, be they individual case plans or longer-term scheduling issues. It has four skill elements: planning and preparing, prioritizing, providing and maintaining standards, and identifying and using resources
Planning and preparing—developing in advance primary and contingency strategies for managing tasks, reviewing these tasks, and updating them if required to ensure that goals will be met; making necessary arrangements to ensure that plans can be achieved
Behavioral markers for good practice: Communicates plan for case to relevant staff; Reviews case plan in light of changes; Makes postoperative arrangements for patient; Lays out drugs and equipment needed before starting case
Behavioral markers for poor practice: Does not adapt plan in light of new information; Does not ask for drugs or equipment until the last minute; Does not have emergency/alternative drugs available that are suitable for the patient; Fails to prepare postoperative management plan
Prioritizing—scheduling tasks, activities, issues, information channels, etc., according to importance (e.g., because of time, seriousness, plans); being able to identify key issues and allocate attention to them accordingly; and avoiding being distracted by less important or irrelevant matters
Behavioral markers for good practice: Discusses priority issues in case; Negotiates sequence of cases on list with surgeon; Conveys order of actions in critical situations
Behavioral markers for poor practice: Becomes distracted by teaching trainees; Fails to allocate attention to critical areas; Fails to adapt list to changing clinical conditions
Providing and maintaining standards—supporting safety and quality by adhering to accepted principles of anesthesia; following, when possible, codes of good practice, treatment protocols or guidelines, and mental checklists
Behavioral markers for good practice: Follows published protocols and guidelines; Cross-checks drug labels; Checks machine at beginning of each session; Maintains accurate anesthetic records
Behavioral markers for poor practice: Does not check blood with patient and notes; Breaches guidelines such as minimum monitoring standards; Fails to confirm patient identity and consent details; Does not adhere to emergency protocols or guidelines
From Fletcher GCL, Flin R, Glavin RJ, Maran NJ: Framework for Observing and Rating Anaesthetists' Non-Technical Skills—Anaesthetists' Non-Technical Skills (ANTS) System v1.0, 22nd version. Aberdeen, Scotland, University of Aberdeen, 2003.

element "providing and maintaining standards," which might be a point of discussion if it can be an observable behavior "not directly related to medical expertise" as stated in the definition. In addition, there might be a problem because there are not so many well-accepted standards in medicine (in strong contrast to, e.g., aviation).

ANTS makes no distinction between the nontechnical skills needed in different clinical settings or by different clinical teams, on the assumption that behavioral skills are completely generic and context free.

Not all nontechnical skills can be expected to be observed in every scenario or clinical situation. It is therefore important to distinguish between the behaviors "required" in a given scenario and the generic set of behaviors. If a required behavior is not observed, the rating system advises rating that behavior as poor nontechnical skill; otherwise, the absence of a given behavior has no particular meaning and should be rated as "not observed." As in all subjective systems for rating nontechnical performance, training and calibration of raters are necessary.
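A minimal sketch of this convention follows; the numeric scale values and the function are illustrative assumptions, not part of the published ANTS documentation.

```python
# Hypothetical illustration of the "not observed" convention: a behavior that
# the scenario required but that was never seen is scored as poor, whereas a
# behavior that simply had no occasion to appear is recorded as "not observed".
POOR = 1            # lowest point on an illustrative 4-point scale
NOT_OBSERVED = None

def rate_element(observed_quality, required_in_scenario):
    """observed_quality: a 1-4 rating if the behavior was seen, else None."""
    if observed_quality is not None:
        return observed_quality
    return POOR if required_in_scenario else NOT_OBSERVED

print(rate_element(None, required_in_scenario=True))   # 1 -> rated as poor
print(rate_element(None, required_in_scenario=False))  # None -> "not observed"
print(rate_element(3, required_in_scenario=True))      # 3 -> rated as observed
```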

Fletcher and coworkers evaluated ANTS by using scripted videotapes made in a realistic simulator.[96] Fifty consultant anesthetists were given 4 hours of rater training and then rated eight test scenarios lasting 4 to 21 minutes each. They rated performance at the level of specific elements and at the broader level of categories (see Table 84-8 ) on a 4-point scale (they could also enter "not observed"). Three investigator anesthetists also rated the scenarios and agreed on a "reference rating" to be used as the benchmark for the study. In questionnaires, the raters judged the ANTS system to be relatively complete, though possibly containing some superfluous elements. Raters found that nontechnical skills were usually observable, and most thought that it was not difficult to relate observed behaviors to ANTS elements. The inter-rater reliability, accuracy, and internal consistency of the ratings were good to acceptable and are presented in Table 84-10 .
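The two accuracy measures reported in Table 84-10 can be computed in a few lines, as in the sketch below; the example ratings are invented solely to show the arithmetic and do not come from the study.

```python
# Sketch of the accuracy measures used in the ANTS evaluation: the percentage of
# ratings within +/-1 point of the reference rating, and the mean absolute
# deviation from the reference. Example data are invented.
def percent_within_one(ratings, reference):
    return 100.0 * sum(abs(r - reference) <= 1 for r in ratings) / len(ratings)

def mean_absolute_deviation(ratings, reference):
    return sum(abs(r - reference) for r in ratings) / len(ratings)

# Eight raters score one element on the 4-point scale; the investigators'
# reference rating for that element is 3.
element_ratings = [3, 4, 2, 3, 1, 3, 4, 3]
print(percent_within_one(element_ratings, 3))       # 87.5 (% within +/-1 point)
print(mean_absolute_deviation(element_ratings, 3))  # 0.625
```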

Though the study was well performed, there are some caveats about these data. Because scripted videos were used as the basis for the ratings, the desired behaviors may have been more observable and scorable than they would be in actual simulation scenarios. The scripted scenarios were rather short (4 to 21 minutes), perhaps making it relatively easy to remember specific aspects of the performance and reducing the likelihood that raters would face the problem of aggregating a score from behaviors that fluctuate over time. In addition, the window of "accuracy," defined as ±1 point on a 4-point scale, seems rather wide.

On the whole, the ANTS system appears to be a useful tool to further enhance assessment of nontechnical skills in anesthesia, and its careful derivation from a current system of nontechnical assessment in aviation (NOTECHS) may allow for some interdomain comparisons.



TABLE 84-10 -- Results of an evaluation study of anesthesia nontechnical skills by Fletcher and colleagues
Measure | Score range | Highest and lowest elements/categories
Inter-rater agreement, element level | 0.55–0.67 | Highest element: Identifying and using resources; lowest element: Recognizing and understanding
Inter-rater agreement, category level | 0.56–0.65 | Highest categories: Task management and Team working; lowest category: Situation awareness
Accuracy relative to the reference rating, % within ±1 point | 88%–97% | Highest element: Identifying options; lowest element: Assessing capabilities
Accuracy relative to the reference rating, mean absolute deviation | 0.49–0.84, depending on element | Highest element: Using authority and assertiveness; lowest element: Providing and maintaining standards
From Fletcher G, Flin R, McGeorge P, et al: Anaesthetists' non-technical skills (ANTS): Evaluation of a behavioural marker system. Br J Anaesth 90:580–588, 2003.
