Clarification on chain.id mapping in MMCIFParser: asym_id vs auth_asym_id #5195

@MaybeBio

Description

When parsing a PDB structure in mmCIF format with Bio.PDB.MMCIFParser, I would like to know which mmCIF field chain.id corresponds to, so that I can match it against data from the RCSB PDB API.

In mmCIF files, there are two common ways to identify a chain:

  1. _atom_site.label_asym_id (The "canonical" or "systematic" ID assigned by the PDB).
  2. _atom_site.auth_asym_id (The "author" ID, often what users see in papers or older PDB files).
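
To see how the two identifiers relate in a given file, the two columns can be read side by side. The following is a minimal sketch, not a statement about what MMCIFParser does internally: the helper names pair_chain_ids and read_chain_id_pairs are hypothetical, and the toy columns are made up for illustration rather than taken from 7W1M. It assumes Bio.PDB.MMCIF2Dict returns each _atom_site column as a list with one value per atom row.

```python
def pair_chain_ids(label_ids, auth_ids):
    """Map each label_asym_id to the auth_asym_id(s) found on the same atom rows."""
    pairs = {}
    for label_id, auth_id in zip(label_ids, auth_ids):
        pairs.setdefault(label_id, set()).add(auth_id)
    return pairs


def read_chain_id_pairs(cif_path):
    """Read both chain ID columns from an mmCIF file (hypothetical helper)."""
    # Imported lazily so pair_chain_ids can be used without Biopython installed.
    from Bio.PDB.MMCIF2Dict import MMCIF2Dict

    cif = MMCIF2Dict(cif_path)
    return pair_chain_ids(
        cif["_atom_site.label_asym_id"],
        cif["_atom_site.auth_asym_id"],
    )


# Toy columns, invented for illustration: three label IDs, two author IDs.
toy = pair_chain_ids(["A", "A", "B", "C"], ["H", "H", "L", "H"])
print(toy)
```

Calling read_chain_id_pairs("7w1m.cif") on a local copy of the file would give the same kind of mapping for a real entry, which can then be compared with the chain.id values printed by the parser.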

Background / Example

Using 7W1M as an example:

1. Biopython Parsing:

from Bio.PDB import MMCIFParser

parser = MMCIFParser()
structure = parser.get_structure("7W1M", "7w1m.cif")
# Iterate over the chains of the first model and print each chain ID.
for chain in structure[0]:
    print(f"Chain ID: {chain.id}")

Output:

Chain ID: A
Chain ID: B
...

2. RCSB PDB API Data:
A request to https://data.rcsb.org/rest/v1/core/polymer_entity/7W1M/1 returns:

"rcsb_polymer_entity_container_identifiers": {
    "asym_ids": ["A"],
    "auth_asym_ids": ["A"],
    ...
}

In this specific case (7W1M), both asym_ids and auth_asym_ids are "A", so this entry cannot show which one Biopython uses. In many structures, however, the two IDs differ (e.g., asym_id is "A" but auth_asym_id is "Chain1").
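
The ambiguity can be made concrete with a small check against the identifiers block of the API response. This is only a sketch: classify_chain_id is a hypothetical helper, the first dictionary copies the two fields shown above for 7W1M entity 1, and the second is the hypothetical "Chain1" case, not a real entry.

```python
def classify_chain_id(chain_id, identifiers):
    """Report which RCSB identifier list(s) contain the given chain ID."""
    in_label = chain_id in identifiers.get("asym_ids", [])
    in_auth = chain_id in identifiers.get("auth_asym_ids", [])
    if in_label and in_auth:
        return "ambiguous"
    if in_label:
        return "label"
    if in_auth:
        return "auth"
    return "neither"


# Fields as shown in the 7W1M entity 1 response above.
identifiers_7w1m = {"asym_ids": ["A"], "auth_asym_ids": ["A"]}
# Hypothetical entry where the two identifiers differ.
identifiers_differ = {"asym_ids": ["A"], "auth_asym_ids": ["Chain1"]}

print(classify_chain_id("A", identifiers_7w1m))
print(classify_chain_id("Chain1", identifiers_differ))
```

Only an entry of the second kind, where the two lists differ, can distinguish the two possible mappings of chain.id.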

Question

When using Bio.PDB.MMCIFParser, does chain.id consistently reference the asym_id (Label) or the auth_asym_id (Author)?

If I am integrating Biopython data with the RCSB REST API, should I be mapping chain.id to response['rcsb_polymer_entity_container_identifiers']['asym_ids'] or ['auth_asym_ids']?

Is there a way to toggle this behavior or access both IDs within the Chain object?
