Description
When parsing a PDB structure in mmCIF format using Bio.PDB.MMCIFParser, I would like to clarify which mmCIF field the chain.id maps to when compared with the data from the RCSB PDB API.
In mmCIF files, there are two common ways to identify a chain:
_atom_site.label_asym_id (The "canonical" or "systematic" ID assigned by the PDB).
_atom_site.auth_asym_id (The "author" ID, often what users see in papers or older PDB files).
Background / Example
Using 7W1M as an example:
1. Biopython Parsing:
from Bio.PDB import MMCIFParser
parser = MMCIFParser()
structure = parser.get_structure("7W1M", "7w1m.cif")
for chain in structure[0]:
    print(f"Chain ID: {chain.id}")
Output:
Chain ID: A
Chain ID: B
...
2. RCSB PDB API Data:
A request to https://data.rcsb.org/rest/v1/core/polymer_entity/7W1M/1 returns:
"rcsb_polymer_entity_container_identifiers": {
  "asym_ids": ["A"],
  "auth_asym_ids": ["A"],
  ...
}
In this specific case (7W1M), both asym_ids and auth_asym_ids are "A", so this example cannot show which one Biopython is using. However, in many structures these two IDs differ (e.g., asym_id is "A" but auth_asym_id is "Chain1").
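One way to check this on an entry where the IDs do differ would be to read both `_atom_site` columns directly and pair them row by row. The following is only a minimal sketch: `pair_chain_ids` and `chain_id_columns` are hypothetical helper names, the toy columns are made up for illustration, and the file-reading helper assumes Biopython's `MMCIF2Dict` is available and that the file contains both columns.

```python
def pair_chain_ids(label_ids, auth_ids):
    """Map each label_asym_id to the set of auth_asym_id values on the same atom rows."""
    if len(label_ids) != len(auth_ids):
        raise ValueError("label and auth columns must have the same length")
    mapping = {}
    for label, auth in zip(label_ids, auth_ids):
        mapping.setdefault(label, set()).add(auth)
    return mapping


def chain_id_columns(cif_path):
    """Return the (label_asym_id, auth_asym_id) columns of the _atom_site loop."""
    # Imported lazily so pair_chain_ids can be used without Biopython installed.
    from Bio.PDB.MMCIF2Dict import MMCIF2Dict

    cif = MMCIF2Dict(cif_path)
    return cif["_atom_site.label_asym_id"], cif["_atom_site.auth_asym_id"]


# Toy columns standing in for a real file (illustrative values only):
# label "C" could be a ligand that shares author chain "H" with the polymer.
labels = ["A", "A", "B", "C"]
auths = ["H", "H", "L", "H"]
print(pair_chain_ids(labels, auths))
```

With a real file, `pair_chain_ids(*chain_id_columns("7w1m.cif"))` would give the full mapping, and comparing its keys and values against the `chain.id` values printed above would show which column the parser followed.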
Question
When using Bio.PDB.MMCIFParser, does chain.id consistently reference the asym_id (Label) or the auth_asym_id (Author)?
If I am integrating Biopython data with the RCSB REST API, should I be mapping chain.id to response['rcsb_polymer_entity_container_identifiers']['asym_ids'] or ['auth_asym_ids']?
Is there a way to toggle this behavior or access both IDs within the Chain object?