BJCP Scoresheet Guide

Objective

The BJCP Scoresheet Guide is a rubric for BJCP Judging Exam graders and a training tool for current and aspiring judges. While originally developed for the BJCP Beer Judging Exam, the methodology applies to each type of judging exam.

This document supplements the Exam Scoring Guide used by exam graders but should not be used as a recipe for finding faults when evaluating a scoresheet. Graders should not use the algorithm in a spreadsheet to enumerate every positive and negative aspect of a scoresheet. Rather, it should serve as a self-check that the assigned score correlates with the maximum judging level achievable based on that score.

Background

The BJCP Exam Directorate created this guide in response to member requests for a more transparent and consistent exam grading process. Quantitative metrics were created after scoresheets prepared by senior judges were evaluated and reviewed by active exam graders and BJCP Associate Exam Directors (ADs).

This guide was created using a data-driven approach that minimizes subjectivity (or perceived subjectivity) within the exam grading process. However, there is a learning curve for making an accurate and expedient evaluation of scoresheet quality. Fortunately, this guide reduces that curve for graders while making the grading process more transparent.

References

These supporting documents should be reviewed by graders, particularly new graders and graders who have not graded within the past twelve months.

  1. BJCP Exam Scoring Guide outlines the mechanics of the exam grading process and sets expectations for scoresheets representative of the different judging levels.
  2. Exam Grading Form (EGF) is a fairly complex Excel spreadsheet that streamlines the grading process but can intimidate some graders. The Exam Director (ED) first enters the component and total scores for each beer, from which the EGF calculates the scoring accuracy. The lead and second graders then enter their scores for Perception, Descriptive Ability, Feedback, and Completeness/Communication into the appropriate cells in their respective tabs of the EGF. The EGF then produces a summary report that graders use to compare and reconcile their scores, along with feedback tables that the lead grader can paste into the Report to Participant. Exam graders should verify that they are using the current version of the EGF, since some previous versions had errors that produced incorrect scores or tables.
  3. Report to Participant (RTP) is a Word document; most graders use the checkbox version of this form, but a free-format version is available. The ED provides each grader team with a template incorporating the exam code and other exam-specific information. The graders should review the checkbox fields before scoring the exam so they know what issues can be noted on the RTP. In particular, they will want to note:
    1. Characteristics that were not identified (either by omission, unfamiliarity, or insensitivity)
    2. Characteristics to which the examinee may be sensitive (e.g., diacetyl)
    3. Whether the examinee described the intensity of key characteristics of the sample
    4. Whether the examinee gave feedback on stylistic accuracy
    5. Whether the examinee suggested corrective actions for perceived technical issues
    6. Whether the examinee described characteristics in incorrect sections of the scoresheet (e.g., astringency as an Aroma component)
    7. Whether the feedback assumed anything about process or ingredients
    8. Whether excessive blank lines were in the scoresheet
    9. Whether the handwriting was difficult to read

BJCP Scoresheet Components

The five scoresheet components evaluated on BJCP Judging Exams are Scoring Accuracy, Perception, Descriptive Ability, Completeness and Communication, and Feedback. Guidelines for assessing examinee proficiency in each competency follow.

While these components can be quantified to some extent, graders should evaluate each scoresheet as a whole, not as the sum of its parts. Graders should also understand that the criterion for a Master-level scoresheet is the 90% level rather than a perfect 100%; this fact was taken into account when the grading rubric was designed.

Scoring Accuracy

Scoring Accuracy for individual exam beers is calculated based on the absolute difference between the total score assigned by the examinee and the consensus total score from the proctors, according to the chart below. A variance of seven points or less is required to earn at least 60% of the possible scoring accuracy points for each exam beer. Thus, the passing threshold corresponds to the maximum scoring deviation expected between judges in a competition setting.

The consensus score is typically the value assigned by the proctors, but the Exam Director may adjust this baseline if the proctor consensus score is inconsistent with the average score of exam participants and the beer description supplied by the exam administrator. While scoring adjustments are not common due to the requirement that all proctors be National, Master, or Grand Master judges, this system provides checks and balances to ensure that the baseline for Scoring Accuracy is not solely dependent on potentially flawed input from proctors.

Observe in the Scoring Accuracy Rubric that 9 is the minimum number of points for scoring accuracy per beer, which is 45% of the maximum and well below the passing threshold of 60%. This is low enough to send a message to the examinee without being discouraging, and to reduce the impact on the overall exam score from wildly mis-scoring a single beer. This reasoning also applies to other components of the scoresheet unless there are no comments recorded.

Scoring Accuracy Rubric

Variance from Consensus    Points per Beer
0                          20
1.5                        19
2.5                        18
3.5                        17
4.5                        16
5.5                        15
6                          14
6.5                        13
7                          12
8                          11
9                          10
10                         9
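
One way to read the rubric is as a lookup from variance to points. The following Python sketch is illustrative only and is not the logic of the Exam Grading Form. It assumes that each variance in the table is an inclusive upper bound for its row and that any variance above 10 earns the 9-point minimum described above; the function and constant names are hypothetical.

```python
# Rows copied from the Scoring Accuracy Rubric: (variance, points per beer).
# Assumption: each variance is the inclusive upper bound of its band.
SCORING_ACCURACY_BANDS = [
    (0, 20), (1.5, 19), (2.5, 18), (3.5, 17), (4.5, 16), (5.5, 15),
    (6, 14), (6.5, 13), (7, 12), (8, 11), (9, 10), (10, 9),
]
MINIMUM_POINTS = 9  # floor per beer stated in the text


def scoring_accuracy_points(examinee_total, consensus_total):
    """Return the scoring accuracy points for one exam beer."""
    variance = abs(examinee_total - consensus_total)
    for upper_bound, points in SCORING_ACCURACY_BANDS:
        if variance <= upper_bound:
            return points
    return MINIMUM_POINTS  # variance larger than the last row


print(scoring_accuracy_points(30, 37))  # variance 7 -> 12
print(scoring_accuracy_points(25, 40))  # variance 15 -> 9
```

Under these assumptions a variance of 7 earns 12 points, which is 60% of the 20 points available and matches the passing threshold described above.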

Perception

Perception is arguably the most difficult competency to evaluate because it requires condensing data from several sources into a concise representation of the characteristics of each exam beer. These data sources include not only the proctors’ scoresheets, but also the scoresheets completed by the exam participants, the exam beer descriptions supplied by the exam administrator, and the BJCP style guidelines.

Perception Ability

Each exam beer results in a unique set of information that the graders need to distill into a finite number of essential components in the Perception rubric. As with Scoring Accuracy, it is important for BJCP Beer Judging Exam participants and graders to recognize that this competency is not evaluated solely on the comments supplied by the exam proctors. BJCP Beer Judging Exam graders should also be careful not to create a lengthy list of every characteristic independently noted by each proctor with the unrealistic expectation that all of them should be mentioned by the examinee.

Judges and graders need to be cognizant that the same characteristic can be perceived differently by different tasters. For example, suppose one proctor perceives a grainy character and another proctor comments that the beer has husky notes. An examinee who mentions either descriptor or a closely related one such as grassiness should be credited with accurate perception for this characteristic. These flavors are neighbors on the Beer Flavor Wheel, so describing the same characteristic differently should not be regarded as a perception error.

Many perception issues on scoresheets from newer judges will simply be errors of omission. For example, an examinee may omit the level of hop bitterness in a style where it is a desired attribute. Graders should not assume that this person cannot taste bitterness, but the omission should be counted both as a perception error and as a deduction for not providing a complete description. This situation often occurs when the examinee is not well-versed in the style being judged.

A process for evaluating perception skills that works well for many graders is:

  1. Review the BJCP Style Guidelines for the beers served during the exam.
  2. Review the information provided by the exam administrator on the background of the exam beers and make note of any property that might impact the judging. For example, was it a classic example of the style, was it doctored, or was it perhaps an aged sample?
  3. Begin with the scoresheet of the more experienced proctor and write down the key descriptors that were noted for the malt, hop and other components of the aroma. Try to capture three or four primary and secondary characteristics that are at least low to moderate in intensity as well as anything that is out of place or missing in the style being judged.
  4. Write down key descriptors that were noted for the appearance, particularly those which may not be appropriate in the style, such as haze or being too dark or light in color.
  5. Write down the key descriptors that were noted for the malt, hop character, balance and other components of the flavor. Again, try to capture three or four primary and secondary characteristics that are at least low to moderate in intensity as well as anything that is out of place or missing in the style being judged.
  6. Write down any key descriptors that were noted for the mouthfeel.
  7. Read through the scoresheets of the other proctor(s) and circle any descriptors which are already captured on the list you created from the scoresheet of the lead proctor. Add any new descriptors that capture the perception of the other proctor(s).
  8. Narrow down the list of descriptors from the proctors to a subset of six to eight consensus characteristics that capture the essence of the beer. These are the primary and secondary characteristics that are expected to be noted on exemplary participant scoresheets. These descriptors can be incorporated into an NxM matrix for each exam beer, where N is the number of key descriptors and M is the number of examinees.
  9. Print out a hard copy of the matrix after you have defined the list of key characteristics for each exam beer. 
  10. As you review the participant scoresheets, mark the descriptors on the matrix that were correctly identified by the examinee with solid circles ● and mark any omitted descriptors with an X. If any characteristic is particularly important (e.g., hop character in an American IPA) or inappropriate (e.g., diacetyl in a German lager), it should be given double weight with two X or two ● symbols in the appropriate cell of the matrix. These symbols will give you a visual indication of the perception skills of each examinee for each exam beer.
  11. If the examinee noted a characteristic (such as astringency) that was not among the descriptors used by the proctors, the graders should make a comment in one of the blank grids at the bottom of the perception matrix for that examinee. This could be due to a perception error (a false positive), but it could also be incorporated into an updated rubric for that beer if the graders observe that a significant fraction of the other examinees noted that same characteristic. In this case, it would be appropriate to add astringency to the perception matrix and give credit to the examinees who noted it on their scoresheets, but without making a deduction on the scoresheets of the judges who were silent on that characteristic.
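
Steps 8 through 11 can be sketched in a few lines of Python for graders who prefer a spreadsheet-like tally to a paper matrix. This is a hypothetical illustration, not a BJCP tool: the descriptor names, the weights, and the function names are invented for the example, and deciding whether an examinee's wording matches a consensus descriptor remains the grader's judgment.

```python
def mark_row(consensus, noted):
    """Return the matrix marks for one examinee.

    consensus: dict mapping each key descriptor to its weight
               (1 normally, 2 for double-weighted characteristics).
    noted:     set of descriptors the grader credited to the examinee.
    """
    return {d: ("●" if d in noted else "X") * w for d, w in consensus.items()}


def tally(consensus, noted):
    """Return (weighted hits, weighted omissions, unlisted descriptors)."""
    hits = sum(w for d, w in consensus.items() if d in noted)
    omissions = sum(w for d, w in consensus.items() if d not in noted)
    unlisted = sorted(noted - set(consensus))
    return hits, omissions, unlisted


# Invented example for an American IPA; hop character is double-weighted.
consensus = {
    "citrusy hop aroma": 2,
    "moderate bitterness": 1,
    "light caramel malt": 1,
}
examinee = {"citrusy hop aroma", "light caramel malt", "astringency"}

print(mark_row(consensus, examinee))
print(tally(consensus, examinee))
```

The third value returned by tally lists characteristics, such as astringency in this example, that belong in the blank grids at the bottom of the matrix until the graders decide whether they are false positives or additions to the rubric.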

Perception Competencies

In general, judges should strive to achieve the following perception competencies that are evident in most scoresheets from Master and Grand Master judges:

  1. The primary components of the aroma described on the examinee’s scoresheet are consistent with the consensus of the proctors and/or over one-half of the examinees.
  2. There are no significant aroma perception errors, i.e., there are no aroma components perceived at low or higher levels that are not included in the consensus of the proctors and/or over one-half of the examinees.
  3. The descriptions of the color, clarity and head are consistent with the consensus of the proctors and/or over one-half of the examinees. It is also desirable to describe other characteristics of the head, but this is not essential information for the brewer.
  4. The primary components of the flavor described on the scoresheet are consistent with the consensus of the proctors and/or over one-half of the examinees.
  5. There are no significant flavor perception errors, i.e., there are no flavor components perceived at low or higher levels that are not included in the consensus of the proctors and/or over one-half of the examinees.
  6. The primary components of the mouthfeel described on the scoresheet are consistent with the consensus of the proctors and/or over one-half of the examinees.

Perception Rubric

RANK         POINT RANGE   PERCEPTION ELEMENTS
Master       18 – 20       No more than two elements of the aroma, appearance, flavor and mouthfeel differ from the consensus of the proctors and/or over one-half of the examinees
National     16 – 17       No more than three elements of the aroma, appearance, flavor and mouthfeel differ from the consensus of the proctors and/or over one-half of the examinees
Certified    14 – 15       No more than four elements of the aroma, appearance, flavor and mouthfeel differ from the consensus of the proctors and/or over one-half of the examinees
Recognized   12 – 13       No more than five elements of the aroma, appearance, flavor and mouthfeel differ from the consensus of the proctors and/or over one-half of the examinees
Apprentice   10 – 11       More than five elements of the aroma, appearance, flavor and mouthfeel differ from the consensus of the proctors and/or over one-half of the examinees. The judge did, however, make an effort to judge the beer
Minimum      8 – 9         There are minimal comments on the scoresheet, but not enough information is provided to assess perception skills

Descriptive Ability

Descriptive Ability is a measure of competency in using adjectives and phrases to identify the aroma, flavor and other characteristics of beer. Beer judges should be able to describe “how much” as well as “what kind of” aroma and flavor components are present. Scoresheets from Master and Grand Master judges typically offer a wealth of descriptive information beyond just “malty” or “hoppy.” This matters because the mapping between the perceived characteristics and the BJCP Style Guidelines for any style in part determines the stylistic accuracy and technical merits of the beer.

As an example of adding layers of descriptive information, consider an American Pale Ale that an Apprentice judge might describe as having a “hoppy aroma.” A Recognized judge might add a descriptive adjective, stating that it has an “American hop aroma,” while a Certified judge may also note the intensity and write that it has a “moderate American hop aroma.” National judges typically add an additional layer of descriptive information, such as “moderate citrusy American hop aroma,” while Master judges could take that a step further and write “moderate citrusy American hop aroma, with notes of tangerine and grapefruit.”

The key to grading exams is to reward examinees for using descriptive terminology but to not set the bar too high for less experienced judges who are still learning beer vocabulary and how to translate the beer characteristics they perceive into written descriptions on the scoresheet.

Descriptive Ability Competencies

  1. Master level: At least five descriptive adjectives[1] or phrases[2] are used to describe the aroma of the beer.
  2. Master level: At least three descriptive adjectives or phrases are used to describe the intensity of the aromatic components of the beer.
  3. Master level: At least three descriptive adjectives or phrases are used to describe the appearance of the beer.
  4. Master level: At least five descriptive adjectives or phrases are used to describe the flavor of the beer.
  5. Master level: At least three descriptive adjectives or phrases are used to describe the intensity of the flavor components of the beer.
  6. Master level: At least three descriptive adjectives or phrases are used to describe the mouthfeel of the beer.

[1] In this context, descriptive means adjectives more specific than generic words such as “malty,” “hoppy,” and “estery.”  Note that “No” or “None” are valid adjectives for characteristics which may be appropriate in the beer style being judged but are not present in that particular sample.

[2] In this context, the comment, “A medium spicy noble hop flavor emerges mid-palate,” has two descriptive adjectives for the hop flavor, one for the intensity and one descriptive phrase.

Descriptive Ability Rubric

There are a total of 22 opportunities to use descriptive adjectives or phrases.  Each judge rank has a target for the total number of adjectives and phrases used:

Rank         Point Range   Descriptive Adjectives and Phrases
Master       18 – 20       20 or more
National     16 – 17       18 – 19
Certified    14 – 15       15 – 17
Recognized   12 – 13       12 – 14
Apprentice   10 – 11       9 – 11
Minimum      8 – 9         Fewer than 9
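
The 22 opportunities appear to be the sum of the six Master-level targets listed under Descriptive Ability Competencies (5 + 3 + 3 + 5 + 3 + 3). The Python sketch below checks that arithmetic and maps a count to its rubric row. The names are hypothetical, and deciding which words count as descriptive remains the grader's judgment.

```python
# Master-level targets from the Descriptive Ability Competencies list.
MASTER_TARGETS = {
    "aroma descriptors": 5,
    "aroma intensity": 3,
    "appearance": 3,
    "flavor descriptors": 5,
    "flavor intensity": 3,
    "mouthfeel": 3,
}
assert sum(MASTER_TARGETS.values()) == 22

# (minimum count, rank, point range) rows from the rubric, highest first.
DESCRIPTIVE_BANDS = [
    (20, "Master", (18, 20)),
    (18, "National", (16, 17)),
    (15, "Certified", (14, 15)),
    (12, "Recognized", (12, 13)),
    (9, "Apprentice", (10, 11)),
]


def descriptive_rank(count):
    """Map a count of descriptive adjectives and phrases to (rank, point range)."""
    for minimum, rank, point_range in DESCRIPTIVE_BANDS:
        if count >= minimum:
            return rank, point_range
    return "Minimum", (8, 9)


print(descriptive_rank(16))  # ('Certified', (14, 15))
```

The grader still chooses the specific score within the returned point range.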

Feedback

The BJCP Exam Scoring Guide states that “the brewer should receive useful and constructive feedback explaining how to adjust the recipe or brewing procedure in order to produce a beer that is closer to style. The comments should be constructive and consistent with the characteristics perceived by the examinee as well as with the score assigned to the beer.” This is a concise synopsis of the purpose and desired elements of feedback on a beer scoresheet.

Note that feedback is not restricted to the Overall Impression section, so careful reading of the entire scoresheet is required to accurately assess this competency. For example, stating in the Flavor section that the balance is appropriate for the style provides feedback to the brewer.

Feedback should be specific in terms of how the process or recipe could be corrected rather than just pointing out that something needs to be addressed. Feedback also includes calling out when characteristics are permissible for the style, such as “low levels of yeast character” in an International Pale Lager.

Feedback Competencies

  1. (2 points) The examinee provides component and total scores for the beer. This feedback allows the brewer to match the assigned score with the appropriate range in the scoring guide on the scoresheet.
  2. (1 point) The feedback is constructive, polite, and professional, generally including at least one supportive or positive comment about the beer.
  3. (1 point) The feedback is consistent with the score assigned to the beer.
  4. (2 points) The judge attempts to provide an accurate diagnosis of any stylistic flaws which impacted the perceived beer quality and/or score.
  5. (2 points) The judge attempts to provide an accurate diagnosis of any technical flaws which impacted the perceived beer quality and/or score.
  6. (2 points) The feedback given to the brewer is accurate with respect to the characteristics perceived by the judge. This is independent of whether the perceptions are accurate.
  7. (1 point) Feedback does not make any assumptions about the process or ingredients.
  8. (1 point) The feedback given in the Overall Impression section is consistent with comments in other sections of the scoresheet.
  9. (2 points) Observations on the aroma, appearance, flavor, and mouthfeel are noted in the appropriate sections of the scoresheet, with feedback either given in those sections or as part of the Overall Impression. For example, astringency should be discussed in the Mouthfeel section, not Flavor.

Feedback Rubric

There are a total of 14 points which may be earned by providing complete and accurate feedback; the targets for each judge rank are: 

Rank         Point Range   Feedback Elements
Master       18 – 20       13 or more
National     16 – 17       11 – 12
Certified    14 – 15       9 – 10
Recognized   12 – 13       7 – 8
Apprentice   10 – 11       Fewer than 7, but an effort was made to provide feedback
Minimum      8 – 9         There are minimal comments on the scoresheet, but not enough information is provided to assess feedback skills
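
The nine Feedback Competencies carry weights that add up to the 14 points cited above. As a rough Python sketch, the tally could look like the following. The short labels, the function name, and the effort_made flag are hypothetical, and treating the lower value of each range as a threshold for fractional totals is an assumption.

```python
# Weights from the Feedback Competencies list, with shortened labels.
FEEDBACK_WEIGHTS = {
    "scores provided": 2,
    "constructive and professional": 1,
    "consistent with score": 1,
    "stylistic diagnosis": 2,
    "technical diagnosis": 2,
    "accurate to perceptions": 2,
    "no process assumptions": 1,
    "overall impression consistent": 1,
    "observations in correct sections": 2,
}
assert sum(FEEDBACK_WEIGHTS.values()) == 14


def feedback_rank(earned, effort_made=True):
    """earned maps a competency label to the points awarded for it."""
    for label, points in earned.items():
        if not 0 <= points <= FEEDBACK_WEIGHTS[label]:
            raise ValueError(f"{label}: {points} is out of range")
    total = sum(earned.values())
    for minimum, rank in [(13, "Master"), (11, "National"),
                          (9, "Certified"), (7, "Recognized")]:
        if total >= minimum:
            return total, rank
    # Below 7 the rubric separates the rows by effort, a grader judgment.
    return total, ("Apprentice" if effort_made else "Minimum")


no_diagnosis = dict(FEEDBACK_WEIGHTS, **{"stylistic diagnosis": 0,
                                         "technical diagnosis": 0})
print(feedback_rank(no_diagnosis))  # (10, 'Certified')
```

In this example an otherwise complete scoresheet that offers no diagnosis of stylistic or technical flaws lands in the Certified row.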

Completeness and Communication

Completeness and Communication measures a judge’s ability to produce substantial scoresheet content that clearly and effectively communicates information about the beer to the brewer. This is, in theory, the easiest category in which to achieve a high score on the Beer Judging Exam, but it does require some attention to detail.

A complete Master-level scoresheet not only addresses the aroma, appearance, flavor, and mouthfeel components that are appropriate for a given beer style, but also includes bookkeeping details such as identifying the beer style, noting the beer or entry number, providing the participant number or name of the judge, filling in the scores for the beer, and marking the appropriate checkboxes at the bottom and left side of the scoresheet.

Completeness and Communication Competencies

  1. (2 points) All applicable components[1] of the aroma listed on the scoresheet are addressed. Partial credit may be awarded.
  2. (2 points) All applicable components of the appearance listed on the scoresheet are addressed. Partial credit may be awarded.
  3. (2 points) All applicable components of the flavor listed on the scoresheet are addressed. Partial credit may be awarded.
  4. (2 points) All applicable components of the mouthfeel listed on the scoresheet are addressed. Partial credit may be awarded.
  5. (2 points) The Overall Impression section includes a comment on the overall drinking pleasure associated with the entry (1 point) and, if the total score is less than 45, offers at least one suggestion for improvement (1 point).
  6. (4 points) Efficient use of vertical space: For a perfect score, no more than two blank lines remain on the completed scoresheet (these are typically only in the Appearance or Mouthfeel sections on a Master-level scoresheet). Deduct 0.5 point for every blank line beyond two, up to a maximum of 4 points deducted. For example, 5 blank lines would be a (5 - 2)/2 = 1.5 point deduction.
  7. (2 points) Numerical values are assigned for all component scores (1 point) and also for the total score (1 point).
  8. (1 point) The stylistic accuracy, technical merit, and intangibles boxes are checked. There is no partial credit here.
  9. (1 point) Descriptor definitions are checked when applicable (characteristics are either perceived at moderate or higher levels or are flaws in the style being judged). Partial credit may be awarded.
  10. (1 point) Comments are well organized and legible.
  11. (1 point) There is efficient use of horizontal space. A complete scoresheet typically has six to seven words per line with a font size and spacing that balances content and legibility. The objective here is to discourage judges from writing in an extremely large font to fill up the space on the scoresheet without conveying much information.

[1] In this context, the phrase “applicable components” means characteristics which are expected in the beer style being judged according to the BJCP Style Guidelines and/or to perceived characteristics which would be regarded as flaws in that particular style.
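
The blank-line deduction in competency 6 is simple arithmetic. The Python sketch below is one reading of that rule, assuming the deduction applies only to blank lines beyond the second and is capped at the 4 points available; the function name is hypothetical.

```python
def vertical_space_points(blank_lines):
    """Points earned (out of 4) for efficient use of vertical space."""
    # 0.5 point per blank line beyond two, capped at the 4 points available.
    deduction = min(4.0, 0.5 * max(0, blank_lines - 2))
    return 4.0 - deduction


print(vertical_space_points(5))   # 2.5, i.e. a 1.5 point deduction
print(vertical_space_points(12))  # 0.0, deduction capped at 4 points
```

With this reading, a scoresheet needs ten or more blank lines before the full 4 points are lost.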

Completeness and Communication Rubric

There are a total of 20 completeness competency points available on a complete scoresheet; the targets for each judge rank are:

Rank         Point Range   Completeness Competencies
Master       18 – 20       18 or more
National     16 – 17       16 – 17
Certified    14 – 15       14 – 15
Recognized   12 – 13       12 – 13
Apprentice   10 – 11       Fewer than 12, but an effort was made to judge the beer
Minimum      8 – 9         There are comments on the scoresheet, but an unacceptably low level of effort was made to judge the beer

Summary

This BJCP Scoresheet Guide provides guidance for recognizing high-quality beer scoresheets like those produced by most Master and Grand Master judges. As with other BJCP materials, this is not a stand-alone document; it supplements other guidance on the BJCP website related to beer styles, beer judging and exam grading. Please send any comments or suggestions to the BJCP Exam Directors.