The introduction of patient simulators made it possible to study human performance in response to critical events (see Chapter 83). Such study, however, requires techniques for assessing anesthesiologists' performance. Performance can be divided into two components: technical performance, the appropriateness and thoroughness of the medical and technical response to the critical event, and behavioral performance, the appropriate use of sound crisis management behaviors (such as leadership, communication, and distribution of workload).[63] [64]
For the assessment of medical and technical responses, several authors have proposed "technical scores."[64] [65] [66] [67] [68] [69] [70] [71]
Simulation offers some benefits in assessing performance. Because the nature and cause of the critical incident are known, one can, in advance, construct a list of essential or appropriate technical activities with relative weights of importance. For example, when assessing technical performance in managing malignant hyperthermia, terminating the trigger agent and administering intravenous dantrolene would be essential items, whereas cooling measures, hyperventilation, and bicarbonate therapy would be among many appropriate (but less critical) technical responses. One can also predict in advance specific technical pitfalls. For malignant hyperthermia, such pitfalls could include diluting dantrolene with the wrong diluent (not sterile water) or an insufficient quantity of diluent. These pitfalls are known to plague those unfamiliar with therapy for malignant hyperthermia.
The University of Toronto group (Canadian Simulation Centre) demonstrated good inter-rater reliability between two raters using a simplified performance assessment rating scale. This scale was tested by using scripted, role-played variants
Objectives

- To learn generic principles of complex problem solving, decision making, resource management, and teamwork behavior
- To improve participants' medical/technical, cognitive, and social skills in the recognition and treatment of realistic, complex medical situations by enhancing their capacity for reflection, self-discovery, and teamwork and by building a personalized tool kit of attitudes, behaviors, and skills

Aim

- To prevent, ameliorate, and resolve critical incidents

Setting Characteristics

- Realistic simulation environment replicating a relevant work setting
- Personnel individually representing those found in the typical work environment of the participant, including nurses, surgeons, and technicians
- The bulk of the training course consists of realistic simulations followed by detailed debriefings
- The primary participant can request and receive help from other participants
- Participants may rotate between different roles in different scenarios to gain different perspectives
- Simulation scenarios may be supplemented by additional modalities, such as assigned readings, didactic presentations, analysis of videotapes, role playing, and group discussions
- The training involves significant time (more than 4 hours, typically 8 or more) and is conducted with a small group of participants

Content Characteristics

- Scenarios require participants to engage in appropriate professional interactions
- At least 50% of the emphasis of the course is on crisis resource management behavior (nontechnical skills) rather than on medical or technical issues
- Key points in teaching ACRM are listed in Table 84-5
- Observation only is not equivalent to actual participation in the course

Faculty Characteristics

- The training is intense and entails high involvement of faculty with the participants and a low participant-to-faculty ratio
- The course uses highly interactive instruction, critique, and feedback
- Debriefings are led by one or more instructors with special training or experience

Debriefing Characteristics

- Debriefings are performed with the whole group of participants together, using audio/video recordings of the simulation sessions
- The primary emphasis in debriefing is on exploring aspects of nontechnical skills (ACRM)
- Debriefings emphasize constructive critique in which the participants are given the greatest opportunity possible to speak and to critique and learn from each other (debriefing facilitation)
An interesting problem for technical scoring systems is that the medical domain remains complex enough that subjects may perform innovative or unconventional clinical actions that are obviously appropriate but were not included in the fixed scoring checklists.
Forrest and colleagues developed a very detailed scoring system to measure the technical performance of novice anesthetists.[69] For its development they used a modified Delphi technique, which is explained in an excellent accompanying editorial on assessment by Glavin and Maran.[75] They studied six novice anesthetists five times between weeks 1 and 12 of their training in the simulator. The videotapes of the simulator sessions were then assessed by two raters. To gain more information about the scoring system, they also had experienced clinicians perform the sessions for evaluation and had one videotape scored by five additional raters. The results showed high content validity, high construct validity, and good inter-rater reliability for their score. A significant difference in performance was noted between weeks 1 and 12, but not between any other weeks. Because even the experienced group came nowhere near the maximum score (the average was around 70%), the relevance or precision of the Delphi technique was questioned. On closer inspection of the Delphi technique used, one recognizes that the change from the first to the second round was only marginal, which calls into question the need for, or impact of, this technique. In addition, the tasks added by the respondents in the first Delphi round were not included in the rating score. Clearly, as Forrest and colleagues developed
Schwid and associates and the Anesthesia Research Consortium performed a large multicenter study that also used only a technical score for assessment of the performance of residents. The study showed good test characteristics, including construct validity, criterion validity, internal consistency, and inter-rater reliability. The study is referenced in more detail later in this chapter.
Can the "clinical" outcome of the simulator's mathematical physiology predict how a real patient would have fared under that individual's care? In extreme cases, this is likely to be true. A subject who demonstrates totally erroneous decision making (e.g., failure to defibrillate a simulated patient with ventricular fibrillation) quickly allows the patient's state to deteriorate unmistakably. However, the mathematical models are not sufficient to predict what would happen to any actual patient after complex sequences of therapy and more subtle patient care judgments.
Thus, the clinical outcome of the simulated patient is one datum that can be used to assess the performance of the anesthetist on a simulation scenario, but for the foreseeable future, any credible performance measurement technique must involve subjective and semiobjective judgments by clinical experts.
A study by Morgan and Cleave-Hogg concluded that "the simulator environment is somehow unique and allows different behaviors to be assessed."[77] As Glavin and Maran stated, "any scoring system that attempts to address the assessment of clinical competence clearly has to address both technical and non-technical skills"; consequently, there is still a long road ahead in measuring performance.[75]

Over the last 20 years the health care professions have become more aware of the importance of "nontechnical" skills in the delivery of safe, high-quality medical care. This recognition has brought an increased need for the assessment, evaluation, and training of these skills. Patient simulators offered perhaps the first opportunity to demonstrate and train these behaviors under realistically stressful conditions.[17] [30] [38] [70] [78] The introduction of simulators and the associated training concepts accelerated the medical community's understanding of these human factors.[79] Some of the needed "crisis management behaviors"[39] [64] [80] can clearly be trained without the use of simulators, as shown in other domains (aviation, oil platforms, the military) (see also Chapter 83).[81] [82] [83] [84] [85] [86] [86A] It is also known that the baseline level of CRM performance is rather low.[23] Helmreich states that as a first step in establishing error management programs, it is necessary to provide formal training in teamwork, the nature of error, and the limitations of human performance.[87] The role of "seminar-based" training in CRM principles relative to that of hands-on simulation-based CRM training is not yet established. Given the experience of the airline industry, it is likely that fully addressing CRM skills for both trainees and experienced personnel will require a combination of seminars and simulation-based exercises.
Two research groups (VA-Stanford and the University of Basel) studied adaptations of the anchored subjective rating scales developed by the NASA/University of Texas Aerospace Crew Performance Project. The VA-Stanford group published preliminary data on the inter-rater reliability of subjective ratings of behavior on five-point anchored scales.[64] [88] Using a fairly stringent test of inter-rater reliability (a topic that is quite complex in the statistical literature), they found only moderate reliability when five trained raters used a five-point scale to score 14 anesthesia teams, each managing two different complex critical events in the simulator (malignant hyperthermia and cardiac arrest). Despite some difficulty in agreeing on the operational definitions of each type of behavior, the investigators stated that the largest obstacle to agreement was the high variability of each behavior over the course of a simulation. For example, an anesthesia crew could show evidence of good communication at one instant, only to be shouting ambiguous orders into thin air at the next. Aggregating these behaviors into a single rating was extremely difficult, even for bounded time segments of the scenario. These data demonstrate the importance of having performance evaluated by more than one rater, because any single rater, no matter how well trained, may produce scores that differ significantly from those of another. The investigators suggested combining the scores of at least a pair of raters, since the mean of two raters' scores was shown to have a very low probability of differing from the mean of five raters' scores by more than a single rating point.
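The rater-aggregation point can be illustrated numerically. The sketch below uses entirely hypothetical rating data (the five raters and the five-point scale match the study design, but the scores and the noise model are invented) to check how often the mean of a pair of raters strays from the mean of all five by more than a single rating point:

```python
import random

random.seed(1)

def simulate_team_scores(n_raters=5):
    """Hypothetical scores from trained raters on a 1-5 anchored scale.

    Raters share a latent 'true' rating but disagree by up to one point,
    a rough stand-in for the moderate reliability reported in the study.
    """
    true_score = random.randint(1, 5)
    return [min(5, max(1, true_score + random.choice([-1, 0, 1])))
            for _ in range(n_raters)]

def mean(xs):
    return sum(xs) / len(xs)

# For many simulated teams, count how often the mean of a pair of raters
# differs from the mean of all five raters by more than one rating point.
teams = [simulate_team_scores() for _ in range(1000)]
large_diffs = sum(
    abs(mean(scores[:2]) - mean(scores)) > 1.0 for scores in teams
)
print(f"pairs differing from five-rater mean by >1 point: {large_diffs}/1000")
```

Under this noise model such discrepancies are rare, which is consistent with the investigators' suggestion that two raters approximate the five-rater mean well.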
The behavioral markers of their score are shown in Table 84-8 and compared with Fletcher's "anesthesia nontechnical skills (ANTS)" score and the ACRM key teaching points (see Table 84-5 ).
Fletcher, from the Industrial Psychology Group of Aberdeen (headed by Rhona Flin) and working with clinicians from the Scottish Clinical Simulation Centre (Glavin and Maran), performed an in-depth review of the role of nontechnical skills in anesthesia.[76] Fletcher stated that nontechnical skills have not been explicitly addressed in the traditional education and training of anesthesiologists, even though they have always been demonstrated and used during clinical work. The group analyzed incident reports and observations of real cases, as well as attitude questionnaires and theoretical models.[39] [89] [90] Like others, they recognized that simulation offers the opportunity to identify, develop, measure, and train nontechnical skills in a safe learning environment, so they also included significant observations during realistic simulations.
Incident reports proved very limited because they did "not provide the finer-grained level of information necessary to understand where the skills broke down."[91] They defined nontechnical skills as "attitudes and behaviours not directly related to the use of medical expertise, drugs
| Anesthesia Nontechnical Skills (Fletcher et al.)[76] [91] [96] | | Performance Markers (Gaba et al.)[64] | | Key Teaching Points in Anesthesia Crisis Resource Management (Gaba et al.)[38] [39] |
|---|---|---|---|---|
| Concepts | Elements | Categories | Markers | Reminders |
| Cognitive and mental skills | Planning and preparing | Task management | Orientation to case | Anticipate and plan |
| | | | | Know your environment |
| | Prioritizing | | Leadership (also a social and interpersonal skill) | Exercise leadership |
| | | | | Set priorities dynamically |
| | Providing and maintaining standards | | Planning | Use cognitive aids |
| | Identifying and using resources | | Workload distribution | Distribute the workload |
| | | | | Mobilize all available resources |
| | Gathering information | Situation awareness | Anticipation | Use all available information |
| | Recognizing and understanding | | Vigilance | Allocate attention |
| | Anticipating | | | Anticipate and plan |
| | Identifying options | Decision making | Preparation | |
| | Balancing risks and selecting options | | Re-evaluation | Prevent and manage fixation errors |
| | Re-evaluating | | | Re-evaluate repeatedly |
| Social and interpersonal skills | Coordinating activities with team | Team working | Inquiry/assertion | Communicate effectively |
| | | | | Teamwork |
| | Exchanging information | | Communication feedback | Communicate effectively |
| | Using authority and assertiveness | | Group climate | Exercise leadership and followership |
| | Assessing capabilities | | Followership | Exercise followership |
| | Supporting others | | | |
| Overall assessment | Not applicable | | Overall nontechnical performance of the primary anesthetist | Teamwork! |
| | Assessments are made only at the element and category level | | | Concentrate on what is right, not who is right |
| | | | Overall nontechnical performance of the whole team | |
|
As in the ACRM instructional paradigm,[38] [39] [92] Fletcher and colleagues identified two categories of nontechnical skills: cognitive and mental skills, and social and interpersonal skills.
Fletcher's ANTS score is shown in Table 84-8 along with the behavioral markers of Gaba's team and the ACRM key teaching points. A description of the ANTS categories and elements, including examples of good practice and poor practice, is presented in Table 84-9 .
The structure of the new ANTS scheme was derived from a system of behavioral markers developed for aviation in a European project called "NOTECHS," which itself was an evolution of the University of Texas behavioral markers (UT-Markers) developed by Helmreich's group. A summarized comparison of the aviation systems, along with an explanation of the use of nontechnical markers for training and evaluation, can be found in the downloadable documentation of the "Group Interaction in High Risk Environments" project, created by an international group of human factors specialists (http://www2.hu-berlin.de/gihre/Download/Publications/GIHRE2.pdf).[93] Several comments about the ANTS approach are appropriate.
ANTS is intended to score only those skills that can be identified unambiguously through observable behavior. This restriction may enhance the reliability of the scoring but could exclude relevant personal factors such as self-presentation, stress management, and maintaining perspective. ANTS assumes that "communication" is included in, or "even pervades," all other categories and therefore does not score communication as a separate skill. This approach contrasts with that of others who believe that communication is a specific skill that should be rated separately.[81] [94] [95] The ANTS category "task management" includes the elements of planning and preparing, prioritizing, providing and maintaining standards, and identifying and using resources (see Table 84-9).
Task Management—skills for organizing resources and required activities to achieve goals, be they individual case plans or longer-term scheduling issues. It has four skill elements: planning and preparing, prioritizing, providing and maintaining standards, and identifying and using resources.

Planning and preparing—developing in advance primary and contingency strategies for managing tasks, reviewing these tasks, and updating them if required to ensure that goals will be met; making necessary arrangements to ensure that plans can be achieved.

| Behavioral markers for good practice | Behavioral markers for poor practice |
|---|---|
| Communicates plan for case to relevant staff | Does not adapt plan in light of new information |
| Reviews case plan in light of changes | Does not ask for drugs or equipment until the last minute |
| Makes postoperative arrangements for patient | Does not have emergency/alternative drugs available that are suitable for the patient |
| Lays out drugs and equipment needed before starting case | Fails to prepare postoperative management plan |

Prioritizing—scheduling tasks, activities, issues, information channels, etc., according to importance (e.g., because of time, seriousness, or plans); being able to identify key issues and allocate attention to them accordingly; avoiding being distracted by less important or irrelevant matters.

| Behavioral markers for good practice | Behavioral markers for poor practice |
|---|---|
| Discusses priority issues in case | Becomes distracted by teaching trainees |
| Negotiates sequence of cases on list with surgeon | Fails to allocate attention to critical areas |
| Conveys order of actions in critical situations | Fails to adapt list to changing clinical conditions |

Providing and maintaining standards—supporting safety and quality by adhering to accepted principles of anesthesia; following, when possible, codes of good practice, treatment protocols or guidelines, and mental checklists.

| Behavioral markers for good practice | Behavioral markers for poor practice |
|---|---|
| Follows published protocols and guidelines | Does not check blood with patient and notes |
| Cross-checks drug labels | Breaches guidelines such as minimum monitoring standards |
| Checks machine at beginning of each session | Fails to confirm patient identity and consent details |
| Maintains accurate anesthetic records | Does not adhere to emergency protocols or guidelines |

From Fletcher GCL, Flin R, Glavin RJ, Maran NJ: Framework for Observing and Rating Anaesthetists' Non-Technical Skills—Anaesthetists' Non-Technical Skills (ANTS) System v1.0, 22nd version. Aberdeen, Scotland, University of Aberdeen, 2003.
ANTS makes no distinction between the nontechnical skills needed in different clinical settings or by different clinical teams, under the assumption that behavioral skills are completely generic and context free.
Not all nontechnical skills can be expected to be observed during every scenario or clinical situation, so it is important to distinguish between the behaviors "required" in a given scenario and the generic set of behaviors. If a required behavior is not observed, the rating system advises scoring that behavior as poor nontechnical skill, whereas the absence of a behavior that was not required has no particular meaning and should be rated as "not observed." As in all subjective nontechnical performance systems, training and calibration of raters are necessary.
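This "required versus not observed" rule can be stated compactly. The sketch below is a minimal illustration (the function name, constant, and data shapes are my own, not part of the published ANTS system) of how an absent behavior is scored differently depending on whether the scenario was designed to elicit it:

```python
from typing import Optional, Union

POOR = 1  # lowest point on the 4-point ANTS-style rating scale

def rate_element(observed_score: Optional[int], required: bool) -> Union[int, str]:
    """Apply the 'not observed' rule for one behavioral element.

    observed_score -- rating on a 1-4 scale, or None if the behavior
                      was never seen during the scenario.
    required       -- whether the scenario was designed to elicit it.
    """
    if observed_score is not None:
        return observed_score
    # Behavior absent: meaningful only if the scenario required it.
    return POOR if required else "not observed"

print(rate_element(None, required=True))   # required but absent -> 1 (poor)
print(rate_element(None, required=False))  # optional and absent -> "not observed"
print(rate_element(3, required=True))      # observed -> its rating, 3
```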
Fletcher and coworkers evaluated ANTS by using scripted videotapes recorded in a realistic simulator.[96] Fifty consultant anesthetists were given 4 hours of rater training and then rated eight test scenarios ranging from 4 to 21 minutes each. They rated performance at the level of specific elements and also at the broader category level (see Table 84-8) on a 4-point scale (they could also enter "not observed"). Three investigator anesthetists also rated the scenarios and agreed on a "reference rating" to be used as the benchmark for the study. In questionnaires, raters evaluated the ANTS system as relatively complete, though possibly containing superfluous elements. Raters found that nontechnical skills were usually observable, and most thought that it was not difficult to relate observed behaviors to ANTS elements. The inter-rater reliability, accuracy, and internal consistency of the ratings were good to acceptable and are presented in Table 84-10.
Though the study was well performed, some caveats apply to these data. Because scripted videos were used as the basis for the ratings, the desired behaviors may have been more observable and scorable than in actual simulation scenarios. The scripted scenarios were rather short (4 to 21 minutes), perhaps making it relatively easy to remember certain aspects of the performance and reducing the likelihood that raters would encounter the problem of aggregating a score from behaviors that fluctuate over time. Finally, the window of "accuracy," defined as ±1 point on a 4-point scale, seems rather wide.
On the whole, the ANTS system appears to be a useful tool to further enhance assessment of nontechnical skills in anesthesia, and its careful derivation from a current system of nontechnical assessment in aviation (NOTECHS) may allow for some interdomain comparisons.
| Measure | Score | Range | Max/Min | Element/Category |
|---|---|---|---|---|
| Inter-rater agreement | Element level | 0.55–0.67 | Highest element | Identifying/using resources |
| | | | Lowest element | Recognizing/understanding |
| | Category level | 0.56–0.65 | Highest category | Task management; Team working |
| | | | Lowest category | Situation awareness |
| Accuracy relative to the reference rater's score | % within ±1 point | 88%–97% | Highest element | Identifying options |
| | | | Lowest element | Assessing capabilities |
| | Mean absolute deviation | 0.49–0.84, depending on element | Highest element | Authority/assertiveness |
| | | | Lowest element | Provide standards |

From Fletcher G, Flin R, McGeorge P, et al: Anaesthetists' non-technical skills (ANTS): Evaluation of a behavioural marker system. Br J Anaesth 90:580–588, 2003.
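The two accuracy measures reported in Table 84-10 are straightforward to compute. The sketch below (hypothetical scores and function names of my own; only the 4-point scale and the reference-rating idea come from the study) derives "% within ±1 point" and the mean absolute deviation of a set of raters' scores against a reference rating:

```python
def pct_within_one_point(ratings, reference):
    """Share of ratings within +/-1 point of the reference rating, as a percent."""
    hits = sum(abs(r - reference) <= 1 for r in ratings)
    return 100.0 * hits / len(ratings)

def mean_absolute_deviation(ratings, reference):
    """Average absolute distance of the ratings from the reference rating."""
    return sum(abs(r - reference) for r in ratings) / len(ratings)

# Hypothetical scores from ten raters for one element, with a reference
# rating of 3 on the 4-point scale used in the ANTS evaluation study.
scores = [3, 2, 4, 3, 3, 1, 4, 2, 3, 3]
print(pct_within_one_point(scores, 3))     # -> 90.0 (one score, 1, is off by 2)
print(mean_absolute_deviation(scores, 3))  # -> 0.6
```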