AMRrules Specification

Rule Specification#

This section details how interpretive rules should be encoded in the AMRrules format. The current version of the AMRrules Specification is v1.0, for use with the AMRrules software package v1.0. The syntax for specifying different types of variants to which a rule should be applied is given in the next section.

On this page you will find the full list of fields (indicating which external databases or ontologies apply to each field, along with a description and guidance on defining/interpreting each field), as well as bespoke AMRrules-specific controlled vocabulary for some fields.

AMRrules template (Google sheet)#

The v1.0 rule specification is also available in a Google sheet that includes the AMRrules template, with allowed values encoded in drop-down menus, to facilitate rule curation.

Full list of fields#

The full list of fields is below, with guidelines on how each field should be specified and interpreted.

field	status	description	reference standard	reference link	guidance	rationale
ruleID	required	unique identifier for this rule {values listed in ‘organism subgroup codes’}	AMRrules	‘organism subgroup codes’ tab	Combination of 3-letter code (to indicate the organism subgroup who curated the rule, see tab ‘organism subgroup codes’) followed by 4-digit number (assigned by the subgroup).	Each rule needs a unique identifier, so that combinatorial rules can be defined as combinations of component parts. These need to be unique across the entire AMRrules set, but assigned and managed within the subgroups who are defining the individual and combinatorial rules.
txid	required	taxonomy ID of the species that this rule applies to	NCBI Taxonomy	https://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/	There should be one row per species/marker combination, for clarity of interpretation and parsing the rules files, and for clarity of recording evidence for each rule and its relevance to a given species. The primary taxonomy identifier for AMRrules is the NCBI Taxonomy; this field should contain a valid taxid for a species or genus. Note that these identifiers are stable, even when the species or genus name changes.
organism	required	organism that this rule applies to, normally a species {scientific name}	NCBI Taxonomy	https://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/	Indicate the name of the organism the rule applies to. Include the prefix ‘s_’, ‘g_’ etc. to indicate the taxonomic level (species, genus). E.g. ‘s_Klebsiella pneumoniae’ indicates species Klebsiella pneumoniae; ‘g_Klebsiella’ indicates genus Klebsiella. This should usually be the value of the ‘current name’ field associated with the taxid in the NCBI Taxonomy. However, if there are issues with the current name, e.g. if it does not match the organism nomenclature used by EUCAST to define a breakpoint, you may use a different organism name.
gene	required	name of the gene that this rule applies to {node ID, or gene symbol if node ID not available} OR a logical expression describing a combination of other ruleIDs {logical}	refgene, NCBI Gene Hierarchy	https://www.ncbi.nlm.nih.gov/pathogens/refgene/ https://www.ncbi.nlm.nih.gov/pathogens/genehierarchy, https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/latest/ReferenceGeneHierarchy.txt	If the gene is in the NCBI hierarchy, specify the node ID. If it is not in the NCBI hierarchy, indicate the gene or allele name in NCBI refgene. If it is not in NCBI refgene, use the gene symbol (e.g. ‘mexB’): if the gene is present in CARD, use the gene symbol present there, otherwise try to identify the most suitable gene symbol, and be sure to include refseq and ARO accessions for clarity. For combinatorial rules, this should be a logical expression based on other single-marker rules, which when evaluated as TRUE means this rule should be applied. E.g. “ECO0001 & ECO0002” means this rule should be applied when both rule ECO0001 and rule ECO0002 apply (i.e. when the markers defined by these rules are both detected). “(ECO0001 | ECO0003) & ECO0002” means this rule should be applied when either one or both of rules ECO0001 or ECO0003 apply and ECO0002 also applies. Syntax should use ‘&’ for logical AND and ‘|’ for logical OR. If the rule is intended to convey an unexplained mechanism of expected resistance, gene should be set to ‘unknown’, with context ‘core’, phenotype ‘wildtype’, clinical category ‘R’, and breakpoint standard ‘EUCAST Expected Resistant Phenotypes vX (year)’ (all gene identifier fields should be ‘-’, and the curation note should explain the reasoning). If the rule is intended to convey an expected resistance due to lack of the drug target, the same applies but the gene should be set to ‘none’.
nodeID	uniquely identify the gene using AT LEAST ONE NCBI accession: nodeID (preferred) or refseq protein or GenBank protein or HMM (for protein-coding genes); or nucleotide accession with coordinates (for nucleotide variants e.g. 23S or promoter regions)	name of the gene that this rule applies to {node ID in NCBI gene hierarchy}	NCBI Gene Hierarchy	https://www.ncbi.nlm.nih.gov/pathogens/genehierarchy, https://ftp.ncbi.nlm.nih.gov/pathogen/Antimicrobial_resistance/AMRFinderPlus/database/latest/ReferenceGeneHierarchy.txt	Can be a leaf node or internal node in the NCBI Gene Hierarchy. Where a rule applies to multiple leaf nodes and/or all descendants of an internal node, it is recommended to specify one row per node, and provide evidence for each one (unless the number of leaf nodes is large and all have the same categorization and evidence).
protein accession		refseq protein accession for the gene this rule applies to	refseq or GenBank protein sequence accession	https://www.ncbi.nlm.nih.gov/refseq/ https://www.ncbi.nlm.nih.gov/genbank/	Indicate the refseq (preferred) or Genbank protein accession for the most appropriate protein sequence. Wherever possible this should match that used in the NCBI Pathogens refgene database.
HMM accession		HMM accession for the gene this rule applies to (suitable for internal nodes in the NCBI Gene Hierarchy)	HMM accession	https://www.ncbi.nlm.nih.gov/pathogens/refgene/ https://www.ncbi.nlm.nih.gov/pathogens/hmm/	Indicate the HMM accession for the most appropriate protein sequence; this is mainly relevant for internal nodes in the NCBI Gene Hierarchy.
nucleotide accession		nucleotide sequence accession and coordinates defining the gene this rule applies to (suitable for e.g. rRNA genes or promoter variants)	nucleotide sequence accession	https://www.ncbi.nlm.nih.gov/pathogens/refgene/ https://www.ncbi.nlm.nih.gov/refseq/	For variants defined by nucleotide sequences not proteins (e.g. 23S, or promoter mutations), indicate the nucleotide sequence accession and coordinates of the relevant gene within that sequence, in the format: accession:start-stop (for genes encoded on forward strand) and accession:stop-start (for genes encoded on reverse strand). The refgene AMR database gives the relevant accessions and coordinates for AMR variants included in AMRfinderplus.
ARO accession	optional	Antibiotic Resistance Ontology (ARO) identifier for the gene this rule applies to	ARO gene ID	https://card.mcmaster.ca	Optional. Note AROs are not associated with specific sequences, so are insufficient to define a rule.	Useful for harmonization with CARD (for drug dictionary and other things) and for annotation of genotypes generated using other DBs/tools based on CARD (which can be mapped to ARO using the argNorm tool).
mutation	required (set to ‘-’ if non-specific)	specific mutation in this gene to which the rule applies	HGVS (with some AMRrules modifications)	https://hgvs-nomenclature.org/stable/ interpretAMR/AMRrulesCuration	Indicate the mutation relative to the gene in ‘gene’. Typically this will be a protein mutation (in the format ‘p.Ser83Tyr’) or a nucleotide mutation in a coding sequence (in the format ‘c.25C>T’). For more complex examples see interpretAMR/AMRrulesCuration
variation type	required	explanation of the type of variation this rule applies to {values listed in ‘variation type’ tab}	AMRrules	‘variation type’ tab	Indicate the type of variation this rule applies to. Allowed values are in the ‘variation type’ tab. Most common examples are ‘Gene presence detected’, ‘Protein variant detected’, ‘Nucleotide variant detected’ or ‘Combination’.	Based on the ‘variant type’ column in hAMRonization, helps to clarify the nature of the variation to which the rule applies.
gene context	required	indicates the genomic context for this gene in this species {core, acquired, unknown}	AMRrules		Indicate the genomic context of this gene within this species, i.e. whether the gene is ‘core’ or ‘acquired’. Working definition of ‘core’ is: present (>90% identity, >90% length) in the chromosome of >95% of genomes of this species, and of >95% of those that have wildtype AST profiles. Note that a resistance-associated mutation in a core gene (e.g. Ser83Phe in chromosomal GyrA) should be coded as ‘core’. A mutation in an acquired gene should be coded as ‘acquired’.
drug	optional (need drug OR drug class)	name of drug for which the rule applies {ARO term}	ARO term	https://card.mcmaster.ca	Indicate the name of the drug that the rule applies to. Where rules apply to multiple drugs, they should be specified in separate rows (i.e. as separate rules), with individual references for each gene-drug combination. Alternatively, if the rule applies to all drugs in a defined drug class, leave this blank and indicate the ‘drug class’ field instead. Allowed values are all CARD ARO entries of type ‘antibiotic’ (which includes disinfectant agents) or ‘adjuvant’ (which includes inhibitors).
drug class	optional (need drug OR drug class)	name of drug class for which the rule applies (ONLY if the rule is consistent across the entire drug class) {ARO term}	ARO term	https://card.mcmaster.ca	Indicate the name of the drug class that the rule applies to. This field should be completed ONLY IF there is evidence that the gene has activity against all drugs in the class. Note that CARD defines five classes of cephalosporins: first-generation cephalosporin, second-generation cephalosporin, third-generation cephalosporin, fourth-generation cephalosporin, other cephalosporin and penam.	Useful as there are likely to be a lot of determinants that apply across a whole drug class.
phenotype	required	indicates whether members of this species with this gene are expected to fall in the wildtype or non-wildtype part of the reference MIC distribution; this is equivalent to identifying whether the MIC is expected to fall below or above the ECOFF, if one is defined {wildtype, nonwildtype}	EUCAST distribution	mic.eucast.org	Indicates whether isolates of this species, with this gene, are considered to have a wildtype or nonwildtype susceptibility phenotype, equivalent to being below vs above the MIC ECOFF if one is defined. If the gene is a core gene, the expected phenotype should generally be ‘wildtype’, unless the rule refers to a specific variant of the core gene for which there is evidence of a nonwildtype phenotype.
clinical category	required	expected clinical category for members of this species with this gene {S, I, R, NS}	EUCAST	https://www.eucast.org/clinical_breakpoints https://www.eucast.org/expert_rules_and_expected_phenotypes/expected_phenotypes	Indicates the categorization associated with this gene, for members of this species {S, I, R, NS} using the breakpoint standard indicated. If the drug this rule applies to appears on the EUCAST Expected Resistances list for this organism, and the gene is a core gene, the expected phenotype should be ‘wildtype’ and the category should be ‘R’. If the gene is identified as a core gene but the drug does not appear on the EUCAST Expected Resistances list for this organism, and there are no EUCAST Expert Rules recommending reporting as R, there should be strong evidence from literature and/or matched genome/phenotype data to support the assignment of ‘R’. Note that ‘NS’ is only an allowed value for CLSI, not EUCAST, and has a specific meaning that is only relevant when there is a breakpoint for S but not for I or R.
breakpoint	required	indicate the breakpoint that was used to define the expected phenotype category (note this is ‘not applicable’ if rule is specified for a drug class rather than a single drug)	EUCAST	https://www.eucast.org/clinical_breakpoints https://www.eucast.org/expert_rules_and_expected_phenotypes/expected_phenotypes	Give the breakpoint used to define the indicated category for the specified drug (please enter ‘not applicable’ if rule applies to a drug class). E.g. for categorization as ‘R’ based on MIC, breakpoint should be given in the form ‘MIC >X [units]’ or ‘disk zone <X mm’; for categorization as ‘S’, use ‘MIC <=X [units]’ or ‘disk zone > X mm’; for categorization as ‘I’ use ‘MIC range, >X and <= Y [units]’. For bug/drug combinations with wildtype ‘I’, the S breakpoint may be arbitrarily set to 0.001 (MIC) or 50 (disk); in this case it is inappropriate to define the breakpoint for ‘I’ as a range, e.g. ‘MIC <=X’ rather than ‘MIC range, >0.001 and <=X [units]’. If the rule is defined on the basis of an ECOFF, indicate the threshold used in the same manner as for a breakpoint. If it is an Expected (intrinsic) resistance, the breakpoint is irrelevant (and usually undefined) so enter ‘not applicable’. If the rule applies to a drug class, enter ‘not applicable’, but consider whether it would be more informative to set specific rules for individual drugs. If there is no breakpoint or ECOFF, enter ‘not available’.	As genotype interpretations are defined relative to clinical categorizations, and there are multiple sources for these and they are updated continuously, we need to record which standard was used to define each rule. This also facilitates accommodating multiple breakpoints for same bug-drug, using different standards or clinical indications (e.g. EUCAST sometimes has different breakpoints for IV vs oral, or for treatment of specific syndromes). This approach also facilitates using ECOFF in the absence of a breakpoint, and facilitates specifying rules defined against other standards such as CLSI or veterinary standards.
breakpoint standard	required	indicate the AST phenotyping standard used to interpret this rule	EUCAST	https://www.eucast.org/clinical_breakpoints https://www.eucast.org/expert_rules_and_expected_phenotypes/expected_phenotypes	In the format ‘[Name] [version] ([year])’, e.g. ‘EUCAST v15.0 (2025)’ or ‘ECOFF (May 2025)’ (as ECOFFs at mic.eucast.org are not versioned, indicate month and year). If it is an Expected (intrinsic) resistance, there will not typically be a breakpoint, in this case indicate the version of the expert rules e.g. ‘EUCAST Expected Resistant Phenotypes v1.2 (2023)’ or ‘EUCAST Salmonella Expert Rules v3.2 (2019)’. If the rule is defined based on an informal breakpoint defined in a paper, indicate the PubMed identifier for the relevant paper in this field as: ‘PMID xxx’
breakpoint condition	optional	indicate the specific conditions for this breakpoint, if relevant (e.g. meningitis, uncomplicated UTI, iv, oral)	EUCAST	https://www.eucast.org/clinical_breakpoints	If different breakpoints are defined for different conditions, indicate the conditions relevant to the breakpoint used to define this rule. For example different breakpoints may be given for different infection types (meningitis, uncomplicated UTI) or therapy types (iv, oral). If all breakpoints are the same, or all result in the same interpretation for this gene, it is preferable to specify a single rule without conditions. If multiple breakpoints are defined, and the interpretation is different using the different breakpoints, it is preferable to define separate interpretive rules for each condition. If the stated purpose of a condition-specific breakpoint is to screen for likely resistance mechanisms (e.g. ciprofloxacin for meningitis), or to enforce reporting of all isolates as ‘I’ for a specific condition, then a condition-specific rule is not needed as this is better managed in downstream reporting logic. Wherever possible, use the controlled vocabulary in sheet (see dropdown menu and ‘breakpoint condition values’ tab), which includes all such terms used in the EUCAST or CLSI 2025 breakpoints table.
PMID	required	PubMed identifier/s for literature supporting the rule (comma-separated list)	PubMed	https://pubmed.ncbi.nlm.nih.gov/	Provide PubMed identifier for the ‘best’ peer-reviewed research article/s providing specific evidence that this gene is associated with this phenotype category for this drug in this species (separate multiple entries with ‘, ’). Literature demonstrating evidence in other species, or related drugs, should not be included.
evidence code	required	indicate the nature of the evidence that supports the rule {ECO code; select from controlled list, multiple selections allowed in comma-separated list}	ECO	https://www.evidenceontology.org/	Indicate the nature of the evidence supporting the rule. More than one can be listed; please include all forms of evidence available to support the rule (separate multiple entries with ‘, ’). In principle any codes in the Evidence and Conclusion Ontology can be used, but in most cases it will be most appropriate to choose from the subset listed in the ‘evidence codes’ tab of this spreadsheet (also provided as a dropdown selection in the main data entry tab of this spreadsheet). The source for each type of evidence should be given in the ‘PMID’ field.	If you want to use an ECO code not yet included in the dropdown list, please let ESGEM-AMR chairs know so that we can add it to the specification as others may find this helpful also. If you feel something is missing from ECO, please also let us know so that we can discuss, and potentially work together to request the addition of new terms to the ontology.
evidence grade	required	expert curators’ overall assessment of the level of support provided by all evidence considered {high, moderate, low, very low}	AMRrules		Indicate the expert curators’ overall assessment of the level of support provided by all evidence considered.	There will often be a need to specify a rule for which the evidence is not yet conclusive. It is important to flag these and give some indication of what is lacking. Allowed terms and their definitions are given in the ‘evidence grades’ tab. Note that if no experimental evidence is available, the rule should NOT be graded as ‘high’, even if there is good evidence of statistical association between genotype and phenotype in natural populations. (Future updates will include additional fields to record quantitative details of genotype/phenotype associations.)
evidence limitations	optional	expert curators’ assessment of the key limitations of the available evidence {values listed in ‘evidence grades’ tab}	AMRrules	‘evidence grades’ tab	This should be completed for all rules with evidence grades other than ‘high’. Use the values listed in the ‘evidence grades’ tab (separate multiple entries with ‘, ’).
rule curation note	optional	short explanatory note describing the mechanism and/or reasoning for the rule	free text		Highly recommended to complete for all core genes, or combinatorial rules, to explain why this results in susceptibility or resistance.
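
As an illustration of how the controlled values in the table above might be checked before a rule is submitted, here is a minimal Python sketch. It is not part of the AMRrules software: the function names, the representation of a rule as a dict keyed by field name, and the treatment of an empty cell or ‘-’ as "unset" are all assumptions made for this example.

```python
import re

# Controlled values copied from the curly-brace lists in the field table.
ALLOWED = {
    "gene context": {"core", "acquired", "unknown"},
    "phenotype": {"wildtype", "nonwildtype"},
    "clinical category": {"S", "I", "R", "NS"},
    "evidence grade": {"high", "moderate", "low", "very low"},
}

# ruleID: 3-letter subgroup code followed by a 4-digit number.
RULE_ID = re.compile(r"^[A-Z]{3}\d{4}$")


def is_blank(value):
    """Treat missing, empty and '-' cells as unset (an assumption of this sketch)."""
    return value is None or value.strip() in ("", "-")


def check_rule(row):
    """Return a list of problems found in one rule, given as a dict of field -> value."""
    problems = []
    if not RULE_ID.match(row.get("ruleID") or ""):
        problems.append("ruleID must be a 3-letter code followed by a 4-digit number")
    for field, allowed in ALLOWED.items():
        if row.get(field) not in allowed:
            problems.append(f"{field} must be one of {sorted(allowed)}")
    # The 'drug' and 'drug class' fields are each optional, but one of them is needed.
    if is_blank(row.get("drug")) and is_blank(row.get("drug class")):
        problems.append("need drug OR drug class")
    return problems
```

An empty list means none of these particular checks failed; it does not mean the rule is complete, since fields such as the accessions, breakpoint and PMID are not examined here.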

Controlled vocabularies#

Variation type#

Specifies the type of variation to which the rule applies. Based on the ‘variant type’ column in the hAMRonization AMR detection specification scheme, with additional terms from the NCIT ontology.

Values allowed in variation type column	The specified AMRrule applies if…	Notes or source
Gene presence detected	…the gene specified in the ‘gene’ column is detected as being present.	hAMRonization
Protein variant detected	…the protein variant specified in the ‘mutation’ column is detected in the specified ‘gene’.	hAMRonization
Nucleotide variant detected	…the nucleotide variant specified in the ‘mutation’ column is detected in the specified ‘gene’.	hAMRonization
Promoter variant detected	…the promoter variant specified in the ‘mutation’ column is detected in the specified ‘gene’.	NCIT:C190205
Inactivating mutation detected	…the gene specified in the ‘gene’ column is inactivated by any type of mechanism (e.g. frameshift, internal stop, deletion, truncation), in the amino acid range specified in the ‘mutation’ column (or anywhere in the gene, if the ‘mutation’ column is blank i.e. ‘-’).	NCIT:C178119
Gene truncation detected	…the gene specified in the ‘gene’ column is truncated, within the amino acid range specified in the ‘mutation’ column.
Gene copy number variant detected	…the gene specified in the ‘gene’ column is detected in at least the minimum number of copies specified in the ‘mutation’ column.	NCIT:C189957
Nucleotide variant detected in multi-copy gene	…the gene specified in the ‘gene’ column is a gene that is normally present in multiple copies (e.g. rRNA genes), and the nucleotide variant specified in the ‘mutation’ column is detected in at least the minimum number of alleles specified in the ‘mutation’ column.
Low frequency variant detected	…the read data support a mixed population, in which at least the minimum fraction of reads specified in the ‘mutation’ column supports the presence of the nucleotide variant specified in the ‘mutation’ column, in the gene specified in the ‘gene’ column (currently intended for TB only).
Combination	…the logical expression in the ‘gene’ column, which expresses a combination of component rules identified by their ‘ruleID’, evaluates as true.
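
To make the ‘Combination’ case concrete, the sketch below evaluates a logical expression of ruleIDs against the set of single-marker rules that applied to a genome. It is an illustrative parser only, not the AMRrules implementation. The function names are hypothetical, and the choice to let ‘&’ bind more tightly than ‘|’ when no parentheses are given is an assumption, since the specification does not state a precedence (the examples in the ‘gene’ guidance use parentheses).

```python
import re

# A token is a ruleID (3 letters followed by digits), an operator, or a parenthesis.
TOKEN = re.compile(r"\s*([A-Z]{3}\d+|[&|()])")


def tokenize(expr):
    """Split a combination expression into ruleIDs, '&', '|', '(' and ')'."""
    tokens, pos = [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"unexpected text at position {pos}: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr, detected):
    """Return True if the expression holds, given the set of ruleIDs that applied."""
    tokens = tokenize(expr)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("expression ended unexpectedly")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in "&|)":
            raise ValueError(f"unexpected token {tok!r}")
        return tok in detected  # a ruleID is true if that rule applied

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

For example, `evaluate("(ECO0001 | ECO0003) & ECO0002", {"ECO0003", "ECO0002"})` holds because one of the two alternatives and the required third rule are both present. A hand-written parser is used rather than Python's `eval` so that rule files cannot execute arbitrary code.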

Evidence codes#

Specified using the Evidence and Conclusion Ontology (ECO), this field indicates the nature of the evidence supporting the rule. More than one can be listed, and the field should include all forms of evidence available to support the rule (multiple entries separated with ‘, ’).

Any ECO codes can be used, but curators are encouraged to choose from the subset listed here, which covers the types of evidence typically available to support resistance mechanisms in bacteria. Note the literature source for each type of evidence noted here should be indicated in the PMID field.

Evidence grade#

This field indicates the expert curators’ overall assessment of the level of support provided by all evidence considered. It is modelled on the GRADE (Grading of Recommendations, Assessment, Development, and Evaluation) approach to assessing the certainty of evidence to guide decision making in healthcare.

AMRrules aims to provide rules to interpret all markers that have been detected in a given species, but in many cases the evidence can be quite limited. The evidence grade field gives users an overall guide to the strength of evidence, and the evidence limitations field highlights what kind of evidence is lacking.

Note that if no experimental evidence is available, the rule should NOT be graded as ‘high’, even if there is strong evidence of statistical association between genotype and phenotype in natural populations. (Future updates to the rule specification will include additional fields to record quantitative details of genotype/phenotype associations.)

There are four possible ‘grades’ in AMRrules. These are listed below with guidance on what they mean in the context of AMRrules (modelled on the GRADE framework).

Evidence grade	What it means	Use this when
high	The curators are confident in the categorisation, and believe that the likelihood that the effect will be substantially different from this is low.	Experimental evidence provides strong support for the interpretation of this gene/variant in this species for this drug. If there is statistical geno/pheno evidence available, it supports this interpretation.
moderate	The curators believe that the categorisation most likely reflects the true effect, and the likelihood that the effect will be substantially different is moderate.	There is good evidence to support the interpretation of this gene/variant in this species for this drug, but there is some uncertainty (e.g. lack of direct evidence in this organism although evidence from related organisms is convincing; or there is good statistical geno/pheno evidence but no experimental evidence of mechanism).
low	The curators believe that the categorisation might not reflect the true effect, and the likelihood that the effect will be substantially different is high.	There is evidence supporting a link between this gene/variant and this drug, but the interpretation in this species is unclear (e.g. lack of evidence in this organism or related organisms; statistical geno/pheno evidence is lacking, or does not support a clear effect; or there are trustworthy but conflicting reports).
very low	The curators have no confidence that the categorisation reflects the true effect, and the likelihood that the effect will be substantially different is high.	There is no trustworthy evidence as to the effect in this organism, or there is conflicting evidence. The categorical interpretation is based on assumptions made from unrelated organisms and may be wrong.

Evidence limitations#

This field highlights what kind of evidence is lacking to support interpretation of this marker in this organism. All rules with an evidence grade other than ‘high’ should have at least one limitation recorded.

Evidence limitations
lacks evidence for this species
lacks evidence for this genus
lacks evidence for this allele
lacks evidence of the degree to which MIC is affected
low clinical relevance
unknown clinical relevance
statistical geno/pheno evidence but no experimental evidence
conflicting evidence
lacks formal breakpoints
lacks evidence for this drug
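
The relationship between the evidence grade and evidence limitations fields can be checked mechanically. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the limitations cell is assumed to arrive as a single comma-separated string, and an empty cell or ‘-’ is assumed to mean "none recorded".

```python
EVIDENCE_GRADES = ("high", "moderate", "low", "very low")

# Allowed terms, copied from the list above.
EVIDENCE_LIMITATIONS = {
    "lacks evidence for this species",
    "lacks evidence for this genus",
    "lacks evidence for this allele",
    "lacks evidence of the degree to which MIC is affected",
    "low clinical relevance",
    "unknown clinical relevance",
    "statistical geno/pheno evidence but no experimental evidence",
    "conflicting evidence",
    "lacks formal breakpoints",
    "lacks evidence for this drug",
}


def check_evidence(grade, limitations):
    """Return a list of problems with an evidence grade / limitations pair."""
    problems = []
    if grade not in EVIDENCE_GRADES:
        problems.append(f"unknown evidence grade: {grade!r}")
    # Split the comma-separated cell and drop empty or '-' placeholders.
    entries = [e.strip() for e in limitations.split(",")]
    entries = [e for e in entries if e not in ("", "-")]
    unknown = [e for e in entries if e not in EVIDENCE_LIMITATIONS]
    if unknown:
        problems.append(f"limitations not in controlled list: {unknown}")
    # Any grade other than 'high' should record at least one limitation.
    if grade != "high" and not entries:
        problems.append("grade other than 'high' needs at least one limitation")
    return problems
```

Splitting on commas is safe here only because none of the allowed terms contains a comma; that would need revisiting if such a term were added to the list.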

Breakpoint condition#

EUCAST, CLSI and others sometimes assign different breakpoints for different clinical conditions, infection sites, or drug delivery routes (e.g. intravenous vs oral). In such cases, this field is used to indicate which specific breakpoint the rule was defined against. This will often be blank, indicating that the rule is not specific to any particular type of infection or delivery route.

The list of allowed terms is taken from the EUCAST and CLSI 2025 Breakpoints, sourced from the digitized versions in the AMR R package using this command: `clinical_breakpoints %>% filter(guideline == "CLSI 2025" | guideline == "EUCAST 2025") %>% group_by(site) %>% count()`

Endocarditis

Endocarditis with combination treatment

Extraintestinal

Intravenous

Intravenous, Oral

Investigational agent

Liposomal, Inhaled

Mammary gland

Mastitis

Meningitis

Meningitis, Endocarditis

Metritis

Non-endocarditis

Non-meningitis

Non-meningitis, Non-endocarditis

Non-pneumonia

Oral

Oral, Infections originating from the urinary tract

Oral, Other indications

Oral, Uncomplicated urinary tract infection

Parenteral

Pneumonia

Prophylaxis

Respiratory

Respiratory, genital

Respiratory, soft tissue

Screen

Skin

Skin, respiratory

Skin, soft tissue

Skin, soft tissue, respiratory

Skin, soft tissue, respiratory, uncomplicated urinary tract infection

Skin, soft tissue, respiratory, uncomplicated urinary tract infection, genital

Skin, soft tissue, uncomplicated urinary tract infection

Skin, uncomplicated urinary tract infection

Uncomplicated urinary tract infection

Uncomplicated urinary tract infection, Investigational agent

Wounds, abscesses

Wounds, abscesses, uncomplicated urinary tract infection

Organism code#

Each rule is assigned a ruleID, which starts with a 3-letter code to indicate the organism subgroup who curated the rule. The list of available organism subgroup codes is below.

Organism	Prefix for ‘ruleID’
Achromobacter xylosoxidans	AXY
Acinetobacter	ACI
Aeromonas	AER
Anaerobes	ANA
Bordetella	BOR
Brucella	BRU
Burkholderia cepacia complex	BCC
Burkholderia pseudomallei	BPM
Campylobacter jejuni	CAJ
Campylobacter fetus	CAF
Campylobacter coli	CAC
Chryseobacterium indologenes	CIN
Corynebacterium diphtheriae	CDP
Escherichia coli/Shigella	ECO
Edwardsiella	EDW
Enterobacter cloacae complex	ECC
Enterococcus	ENT
Haemophilus influenzae	HIN
Helicobacter	HEL
Klebsiella pneumoniae	KPN
Legionella	LEG
Listeria	LIS
Mycobacterium non-Tb	MYC
Mycobacterium tuberculosis	MTB
Mycoplasma pneumoniae	MPN
Neisseria commensals	NEI
Neisseria gonorrhoeae	NGO
Neisseria meningitidis	NMN
Pasteurella	PAS
Proteus mirabilis	PRM
Pseudomonas aeruginosa	PSA
Salmonella	SAL
Serratia	SER
Shewanella	SHW
Staphylococcus aureus	STA
Staphylococcus epidermidis	STE
Staphylococcus saprophyticus	STS
Stenotrophomonas maltophilia	STM
Streptococcus	STR
Treponema	TRE
Vibrio	VIB
Yersinia	YER
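
As a sketch of how a ruleID might be split into its subgroup prefix and number, and how the requirement that ruleIDs be unique across the whole rule set might be checked, consider the following. The function names are hypothetical, and only a handful of the codes from the table above are included to keep the example short.

```python
import re
from collections import Counter

# A small subset of the organism subgroup codes listed above (illustration only).
SUBGROUP_CODES = {
    "ACI": "Acinetobacter",
    "KPN": "Klebsiella pneumoniae",
    "MTB": "Mycobacterium tuberculosis",
    "NGO": "Neisseria gonorrhoeae",
    "SAL": "Salmonella",
}

RULE_ID = re.compile(r"^([A-Z]{3})(\d{4})$")


def split_rule_id(rule_id):
    """Return (organism subgroup, rule number) for a ruleID such as 'KPN0001'."""
    m = RULE_ID.match(rule_id)
    if m is None:
        raise ValueError(f"not a 3-letter code plus 4-digit number: {rule_id!r}")
    prefix, number = m.groups()
    if prefix not in SUBGROUP_CODES:
        raise ValueError(f"unknown organism subgroup code: {prefix!r}")
    return SUBGROUP_CODES[prefix], int(number)


def duplicated_ids(rule_ids):
    """Return the ruleIDs that occur more than once, sorted alphabetically."""
    counts = Counter(rule_ids)
    return sorted(rule_id for rule_id, n in counts.items() if n > 1)
```

Because numbers are assigned within each subgroup, a duplicate check of this kind is most useful when run over the combined rule files from all subgroups.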

Variant Specification#

The AMRrules specification needs to be able to encode interpretive rules for all types of genetic variants relevant to AMR in bacteria.

In 2024, the ESGEM-AMR working group collated and reviewed examples of known variants across diverse bacteria, and identified the following types of AMR variants:

Gene presence detected
Amino acid substitution or insertion
Nucleotide substitution or insertion
Gene truncated (loss of function)
Mutation in promoter region (substitution, deletion or insertion, including IS)
Gene copy number changes
Mutations in multi-copy genes (e.g. 23S rRNA)
Low frequency variants (i.e. heterozygosity)

It was concluded that all such variants could be adequately addressed using a combination of three fields:

gene
mutation (based on HGVS syntax, with some modifications)
variation type (based on hAMRonization field Genetic Variation Type, with some additions).

Specific examples of each AMR variant are shown below, with proposed mutation syntax and variation types for each (note that other fields required for rule definition, like organism, refseq accession, context, PMID are not included here for simplicity, as they are not essential to illustrate how to define a specific kind of variation):

ID	gene	mutation	variation type	drug	category
KPN0001	blaSHV	`-`	Gene presence detected	ampicillin	wt R
KPN0002	gyrA	p.Ser83Tyr	Protein variant detected	ciprofloxacin	nwt I
KPN0003	parC	p.Ser80Ile	Protein variant detected	ciprofloxacin	nwt I
KPN0004	ompK36	c.25C>T	Nucleotide variant detected	meropenem	nwt S
KPN0005	ompK36	p.114_115insGlyAsp	Protein variant detected	meropenem	nwt I
KPN0006	mgrB	p.(1_100)	Gene truncation detected	colistin	nwt R
ECO0001	ampC	c.-11C>T	Promoter variant detected	ceftriaxone	nwt R
ECO0002	ampC	c.-14_-13insGT	Promoter variant detected	ceftriaxone	nwt R
ACI0001	blaOXA-58	c.(-35_1)ins[ISAba125:inv]	Promoter variant detected	ceftriaxone	nwt R
NGO0002	23S rDNA	c.[2045A>G][3]	Nucleotide variant detected in multi-copy gene	azithromycin	nwt R
ECO0003	blaTEM	c.[3]	Gene copy number variant detected	piperacillin+tazobactam	nwt R
MTB0001	gyrA	p.[Ala94Gly][0.13]	Low frequency variant detected	ciprofloxacin	nwt R
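
The bracketed forms in the last three rows of the table can be pulled apart with a small parser. The sketch below is one possible reading, not the AMRrules implementation. It interprets the trailing bracket as a minimum copy number, a minimum number of alleles, or a minimum read fraction according to the variation type, following the definitions in the ‘Variation type’ table. The function name and the returned dictionary keys are assumptions made for this example.

```python
import re

# 'c.[3]', 'c.[2045A>G][3]' or 'p.[Ala94Gly][0.13]':
# an optional bracketed variant followed by a bracketed number.
BRACKETED = re.compile(r"^([cp])\.(?:\[([^\[\]]+)\])?\[(\d+(?:\.\d+)?)\]$")


def parse_bracketed(mutation, variation_type):
    """Interpret a bracketed 'mutation' value in the light of its variation type."""
    m = BRACKETED.match(mutation)
    if m is None:
        raise ValueError(f"not a bracketed mutation: {mutation!r}")
    level, variant, threshold = m.groups()

    if variation_type == "Gene copy number variant detected":
        if variant is not None:
            raise ValueError("copy number rules take a single bracket, e.g. c.[3]")
        return {"min_copies": int(threshold)}

    if variant is None:
        raise ValueError(f"{variation_type!r} needs a variant and a threshold")
    if variation_type == "Nucleotide variant detected in multi-copy gene":
        return {"variant": f"{level}.{variant}", "min_alleles": int(threshold)}
    if variation_type == "Low frequency variant detected":
        return {"variant": f"{level}.{variant}", "min_fraction": float(threshold)}
    raise ValueError(f"variation type does not use bracketed syntax: {variation_type!r}")
```

For instance, `parse_bracketed("c.[2045A>G][3]", "Nucleotide variant detected in multi-copy gene")` separates the variant `c.2045A>G` from the requirement that it be seen in at least 3 alleles.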

Syntax for mutations#

Syntax for ‘mutation’ column follows HGVS, including:

Gene and protein start sites are position 1 (there is no position 0)
Ranges are specified using x_y; for insertions the coordinates are specified as inclusive_exclusive, otherwise ranges are inclusive_inclusive
Unknown ranges are specified with parentheses, (x_y). E.g. p.(1_100)insGlyAsp means an insertion of 2 amino acids (Gly and Asp) anywhere between codons 1 and 100 inclusive (as opposed to a replacement of amino acids 1 through 100 with GlyAsp, which would be expressed as p.1_100delinsGlyAsp).
Coordinates are specified relative to the reference sequence of a protein (p) or coding sequence (c)
Coordinates upstream of coding sequence are specified relative to the start site, with a hyphen, e.g. c.-35 indicates 35 bp upstream
Mutations in protein and DNA are specified differently, e.g.
1. p.Ser83Tyr: change to protein sequence from Ser to Tyr at codon 83
2. c.25C>T: change to nucleotide coding region from C to T at nucleotide position 25
Stop codons are specified (in both DNA and protein variants) as Ter
Following IUPAC, X signifies any amino acid, N signifies any DNA base
^ (caret) is used as “or”, e.g. p.(Gly719Ala^Ser)
The letters inv indicate the inverse (i.e. reverse complement) of a sequence
Repeat sequences are specified as sequence[N] where N is the number of copies of the repeat
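
As an illustration of the conventions above, the following is a minimal sketch of how the two simplest forms (a protein substitution and a nucleotide substitution) might be recognised. The function name `parse_substitution` and the returned dictionary keys are hypothetical and are not part of the AMRrules software; more complex forms (ranges, insertions, `^`, repeats) are deliberately not handled and return `None`.

```python
import re

# Three-letter amino acid codes (including Ter for a stop codon).
AA = r"[A-Z][a-z]{2}"

# p.Ser83Tyr style: reference residue, codon position, alternate residue.
PROTEIN_SUB = re.compile(rf"^p\.(?P<ref>{AA})(?P<pos>\d+)(?P<alt>{AA})$")
# c.25C>T or c.-11C>T style: (optionally negative) position, ref base > alt base.
NUCLEOTIDE_SUB = re.compile(r"^c\.(?P<pos>-?\d+)(?P<ref>[ACGTN])>(?P<alt>[ACGTN])$")


def parse_substitution(mutation):
    """Return a dict describing a simple substitution, or None if not recognised."""
    m = PROTEIN_SUB.match(mutation)
    if m:
        pos = int(m["pos"])
        if pos == 0:
            return None  # start sites are position 1; there is no position 0
        return {"level": "protein", "ref": m["ref"], "pos": pos, "alt": m["alt"]}
    m = NUCLEOTIDE_SUB.match(mutation)
    if m:
        pos = int(m["pos"])
        if pos == 0:
            return None  # there is no position 0
        return {
            "level": "nucleotide",
            "ref": m["ref"],
            "pos": pos,
            "alt": m["alt"],
            "upstream": pos < 0,  # hyphenated coordinates lie upstream of the start site
        }
    return None
```

For instance, `parse_substitution("c.-11C>T")` reports an upstream nucleotide change at position -11, while an insertion such as `p.114_115insGlyAsp` is outside the scope of this sketch.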

AMRrules-specific syntax#

AMRrules requires amino acids be specified as three-letter codes (whereas HGVS allows single-letter or three-letter codes)
- Accordingly, the STOP codon should be specified as ‘Ter’ rather than ‘*’
In HGVS you must specify the reference sequence explicitly using a sequence accession, followed by : and then the mutation, e.g. NF000285.3:p.Gly238Ser. In AMRrules the gene is specified in separate column/s (‘gene’, ‘refseq accession’, ‘ARO accession’) and should not be repeated in the mutation column. So the above rule should be coded as:
- gene = blaSHV
- node = blaSHV
- refseq accession = NF000285.3
- ARO accession = ARO:3000015
- mutation = p.Gly238Ser
In AMRrules, insertion sequences (IS) should be labelled with their IS name as per ISfinder, as many do not have their own sequence accessions in refseq. E.g. insertion of ISAba125 should be specified as ins[ISAba125], and insertion in reverse orientation to the gene to which the rule applies should be specified as ins[ISAba125:inv].
In AMRrules, rules intended to apply when a gene is present in a minimum of N copies can be specified using the [N] syntax to indicate the minimum repeat/copy number of the whole coding sequence, as c.[N].
1. Note this syntax does not convey any information about the location of the copies, i.e. c.[2] simply indicates that there are at least 2 copies of the gene detected in the genome, whether they are tandem repeats or in different replicons such as one in the chromosome and one in a plasmid.
In HGVS, the presence of multiple alleles (i.e. heterozygous) is specified as a semicolon-separated list of allelic variants, e.g. [allele1];[allele2].
In AMRrules, rules that apply to variation in a multi-copy gene can be specified in this way, with each allele explicitly stated.
1. Alternatively if the rule applies when a minimum of N copies of the gene carry the mutation (e.g. mutation in ≥3 copies of 23S rRNA resulting in resistance to azithromycin), this can be abbreviated using the [N] syntax to indicate the minimum repeat/copy number, as c.[allele][N] or p.[allele][N], e.g. c.[2045A>G][3].
In AMRrules, rules that apply to ‘low frequency variants’, i.e. when a minimum fraction of reads, X, supports the presence of the allelic variant in a sequenced population, can specify that minimum fraction by extension of the syntax for copy number, as [X]. E.g. p.[Ala94Gly][0.13] (example from the Mycobacterium tuberculosis gyrA gene).
1. To put it another way, in AMRrules the repeat syntax [X] is interpreted as a minimum copy number if X is an integer, and as a minimum read fraction if X is a decimal number (float) between 0 and 1.
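
The integer-versus-fraction interpretation of [X] can be sketched as follows. This is a minimal illustration rather than the AMRrules implementation: the function name `parse_threshold` and the labels `min_copy_number` and `min_read_fraction` are hypothetical, and the sketch assumes that a value written with a decimal point is a read fraction and that a fraction of exactly 1 is allowed.

```python
import re

# Matches c.[N], c.[allele][X] and p.[allele][X]; the allele part is optional.
THRESHOLD = re.compile(
    r"^(?P<level>[cp])\.(?:\[(?P<allele>[^\[\]]+)\])?\[(?P<x>\d+(?:\.\d+)?)\]$"
)


def parse_threshold(mutation):
    """Interpret a trailing [X] as a minimum copy number or minimum read fraction."""
    m = THRESHOLD.match(mutation)
    if m is None:
        return None  # no trailing [X] threshold in this mutation string
    x = m["x"]
    if "." in x:
        # Assumption: a decimal point marks a read fraction, valid in (0, 1].
        value = float(x)
        if not 0 < value <= 1:
            raise ValueError(f"read fraction out of range: {x}")
        kind = "min_read_fraction"
    else:
        value = int(x)
        kind = "min_copy_number"
    return {"level": m["level"], "allele": m["allele"], "kind": kind, "value": value}
```

Under these assumptions, `c.[3]` yields a minimum copy number of 3 with no allele, `c.[2045A>G][3]` yields the allele `2045A>G` in at least 3 copies, and `p.[Ala94Gly][0.13]` yields a minimum read fraction of 0.13.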

Explanation of ‘mutation’ syntax relevant to known AMR variants#

p.Ser83Tyr: change to protein sequence from Ser to Tyr at codon 83
c.25C>T: change to nucleotide coding region from C to T at nucleotide position 25
p.114_115insGlyAsp: change to protein sequence, with an insertion of amino acids Gly and Asp between codons 114 and 115
p.(1_100): truncation (of any kind) anywhere in the first 100 amino acids of the protein sequence
c.-11C>T: change to nucleotide sequence from C to T, 11 bases upstream of the start site for the gene.
c.-14_-13insGT: insertion of nucleotides GT between positions -14 and -13, upstream of the start site of the gene
c.(-35_1)ins[ISAba125:inv]: insertion of ISAba125, in reverse orientation (:inv), anywhere between 35 bases upstream of the start site, and the start of the gene coding sequence
c.[2045A>G][3]: substitution of A to G at position 2045 of the gene. This mutation must occur in a minimum of 3 copies of the gene
c.[3]: gene needs to be present in a minimum of 3 copies
p.[Ala94Gly][0.13]: protein variant is present in at least 13% of reads

Combinatorial rules#

Combinatorial rules are defined using logical expressions in the ‘gene’ column, where the objects of the expression are rule identifiers (ruleID) that can be used as shorthand labels for the variants defined by gene:mutation (variant type) specified in the corresponding rules. The variation type should be specified as ‘Combination’.

Each rule must have a unique ruleID, assigned by the curating subgroup and prefixed with a 3-letter code that identifies the subgroup.
E.g. in the table below, KPN0008 can be used in a logical expression in the ‘gene’ column to denote gyrA:p.Ser83Tyr, and KPN0013 can be used to denote qnr (Gene presence detected).
So, the combination of these two variants can be specified as KPN0008 & KPN0013, which expands to gyrA:p.Ser83Tyr & qnr (Gene presence detected).

Rules must be specified explicitly if the effect of the combination is NOT the same as the ‘most resistant’ (in terms of exceeding breakpoints, R > I > S; or deviation from wildtype, nonwildtype > wildtype) predicted category of the component rules. E.g. in the table below:

The individual rules KPN0008 and KPN0009 each have expected category ‘nonwildtype I’ on their own, but in combination we expect ‘nonwildtype R’, so we need to specify the rule for the combination KPN0008 & KPN0009.
The expected category for genomes meeting rule KPN0002 (i.e. carrying core gene oqxA, => wildtype S) in addition to rule KPN0008 (i.e. with an acquired gyrA mutation, => nonwildtype I) is nonwildtype I. This is the same as the category of the most resistant component rule (KPN0008), so we do not need to specify the combination explicitly.

Note this means the combination must also be specified explicitly if the combined effect is LESS resistant than the ‘most resistant’ component. E.g. in TB, a deletion in one gene can render the resistance mutation in another gene irrelevant, so the combination must be specified.
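
The default described above (taking the ‘most resistant’ category of the component rules when no combination rule is given) can be sketched as follows. This is only an illustration: the function name `default_combined_category` is hypothetical, and the sketch assumes that the wildtype/nonwildtype axis and the S/I/R axis are each maximised independently, which the specification does not state explicitly.

```python
# Orderings taken from the text: nonwildtype > wildtype, and R > I > S.
PHENOTYPE_ORDER = ["wildtype", "nonwildtype"]
CLINICAL_ORDER = ["S", "I", "R"]


def default_combined_category(categories):
    """Return the 'most resistant' (phenotype, clinical) pair.

    categories is a non-empty list of (phenotype, clinical) tuples, one per
    component rule that a genome matches.
    """
    # Assumption: each axis is maximised on its own, then the two are paired.
    phenotype = max((p for p, _ in categories), key=PHENOTYPE_ORDER.index)
    clinical = max((c for _, c in categories), key=CLINICAL_ORDER.index)
    return phenotype, clinical
```

With this default, a genome matching a ‘wildtype S’ rule and a ‘nonwildtype I’ rule is called ‘nonwildtype I’, and two ‘nonwildtype I’ rules also give ‘nonwildtype I’. The latter is why a combination expected to be ‘nonwildtype R’ needs its own explicit rule.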

ID	gene	mutation	variation type	drug	category
KPN0002	oqxA	`-`	Gene presence detected	ciprofloxacin	wt S
KPN0008	gyrA	p.Ser83Tyr	Protein variant detected	ciprofloxacin	nwt I
KPN0009	parC	p.Ser80Ile	Protein variant detected	ciprofloxacin	nwt I
KPN0013	qnr	`-`	Gene presence detected	ciprofloxacin	nwt I
KPN0051	KPN0008 & KPN0009	`-`	Combination	ciprofloxacin	nwt R
KPN0052	(KPN0008 | KPN0009) & KPN0013	`-`	Combination	ciprofloxacin	nwt R
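
To illustrate how a combinatorial expression in the ‘gene’ column might be evaluated, here is a minimal sketch that checks an expression against the set of ruleIDs a genome has already matched. The function names `tokenize` and `evaluate` are hypothetical and not part of the AMRrules software. The sketch assumes ruleIDs consist of three uppercase letters followed by four digits, and that `&` binds more tightly than `|` when parentheses are absent (an assumption; the specification does not state operator precedence).

```python
import re

# A token is a ruleID (assumed form: 3 uppercase letters + 4 digits),
# a logical operator, or a parenthesis.
TOKEN = re.compile(r"\s*([A-Z]{3}\d{4}|[&|()])")


def tokenize(expression):
    """Split a combination expression into ruleIDs, operators and parentheses."""
    expression = expression.strip()
    tokens, pos = [], 0
    while pos < len(expression):
        m = TOKEN.match(expression, pos)
        if m is None:
            raise ValueError(f"unexpected text: {expression[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expression, matched):
    """Return True if the expression holds for the set of matched ruleIDs."""
    tokens = tokenize(expression)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if token in {"&", "|", ")"}:
            raise ValueError(f"unexpected token: {token!r}")
        return token in matched  # a ruleID is true if the genome matched that rule

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

For example, with matched rules `{"KPN0009", "KPN0013"}` the expression for KPN0052 evaluates to true, whereas the expression for KPN0051 (`KPN0008 & KPN0009`) does not, because KPN0008 is missing.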

ECO:0001091	knockout phenotypic evidence	ECO:0001091 knockout phenotypic evidence	E.g. evidence that knocking out the proposed AMR gene in a phenotypically resistant strain results in loss of resistance
ECO:0000012	functional complementation evidence	ECO:0000012 functional complementation evidence	E.g. evidence that, when a gene knockout results in change from R to S, the phenotype is reversed (resistance is restored) when the gene is reintroduced
ECO:0001113	point mutation phenotypic evidence	ECO:0001113 point mutation phenotypic evidence	E.g. for a mutation, evidence that this specific mutation is associated with a change in susceptibility phenotype
ECO:0000024	protein-binding evidence	ECO:0000024 protein-binding evidence	E.g. evidence that the gene product binds to this drug
ECO:0001034	crystallography evidence	ECO:0001034 crystallography evidence	E.g. structural evidence from crystallography that the mutated position in this gene product interacts with the drug
ECO:0000005	enzymatic activity assay evidence	ECO:0000005 enzymatic activity assay evidence	E.g. evidence that the gene product has enzymatic activity against the drug
ECO:0000042	gain-of-function mutant phenotypic evidence	ECO:0000042 gain-of-function mutant phenotypic evidence	E.g. for a mutation, evidence that introducing this specific mutation into a wildtype background is associated with a change in susceptibility phenotype
ECO:0007000	high throughput mutant phenotypic evidence	ECO:0007000 high throughput mutant phenotypic evidence	E.g. evidence from a transposon mutant library that mutation or loss of a gene in a phenotypically resistant strain results in loss of resistance
ECO:0001103	natural variation mutant evidence	ECO:0001103 natural variation mutant evidence	E.g. for an acquired gene or mutation, evidence that natural variation in presence vs absence is associated with susceptibility to the drug (genotype-phenotype association in a natural population)
ECO:0005027	genetic transformation evidence	ECO:0005027 genetic transformation evidence	E.g. evidence that transfer of the gene into a susceptible recipient strain results in resistance
ECO:0000020	protein inhibition evidence	ECO:0000020 protein inhibition evidence	E.g. evidence that a mutation inhibits protein function to reduce the effect of the drug and confer resistance
ECO:0006404	experimentally evolved mutant phenotypic evidence	ECO:0006404 experimentally evolved mutant phenotypic evidence	E.g. evidence that the mutation arises in response to drug exposure during experimental evolution, resulting in resistant mutants
ECO:0000054	double mutant phenotype evidence	ECO:0000054 double mutant phenotype evidence	E.g. evidence resulting from an experiment typically constructed to determine if two different genes have an observable genetic interaction (functional connection) as the result of a mutation occurring in the alleles of the two genes of interest
ECO:0000154	heterologous protein expression evidence	ECO:0000154 heterologous protein expression evidence	E.g. a type of protein expression evidence where a gene from one cell is inserted into a cell that does not typically contain that gene and heterologous protein expression is assessed
ECO:0000006	experimental evidence	ECO:0000006 experimental evidence	Experimental evidence not otherwise classified
ECO:0001583	small interfering RNA knockdown evidence	ECO:0001583 small interfering RNA knockdown evidence	E.g. a type of anti-sense experiment evidence where gene expression is disrupted through the introduction of double-stranded RNA molecules, 20-25 base pairs in length, which operate within the RNA interference pathway
