Model bias
## Model bias in EM data processing
Single particle EM requires an initial 3D model in order to begin 3D analysis
Because single particle images are noisy 2D projections of a 3D object (or of multiple 3D objects), all 3D analysis requires an initial model. If you already know the 3D structure of your sample, you can use that structure as the initial model. If you are studying a new sample, however, you will need to calculate the initial 3D model from scratch.
This requirement means that you can supply any initial 3D model and the refinement will tend to return a 3D model that resembles what you put in, regardless of what is in your data. This phenomenon is known as 'Einstein from noise': if you use Einstein's face as a search model to align images that contain only noise, the aligned average reproduces the Einstein template even though the underlying data were pure noise.
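
The effect is easy to reproduce in miniature. The following is a minimal sketch, not the original experiment: it assumes 1D signals instead of 2D images, circular shifts as the only alignment parameter, and a made-up `TEMPLATE` standing in for Einstein's face. All function and variable names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0


def shifted(signal, k):
    """Circularly shift a 1D signal to the right by k samples."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]


def align_and_average(images, template):
    """Shift each image to best match the template, then average the results."""
    n = len(template)
    total = [0.0] * n
    for image in images:
        # Reference-based alignment: keep the shift that correlates best.
        best = max((shifted(image, k) for k in range(n)),
                   key=lambda s: pearson(s, template))
        total = [t + v for t, v in zip(total, best)]
    return [t / len(images) for t in total]


# Assumption: an arbitrary 1D pattern used as the search model.
TEMPLATE = [0, 0, 4, 1, 0, -3, 0, 2, 0, 0, -1, 5, 0, 0, -2, 0]

rng = random.Random(0)
# The "data" contain no signal at all, only Gaussian noise.
noise_images = [[rng.gauss(0, 1) for _ in TEMPLATE] for _ in range(1000)]

biased_average = align_and_average(noise_images, TEMPLATE)
plain_average = [sum(col) / len(noise_images) for col in zip(*noise_images)]

print("aligned to template:", round(pearson(biased_average, TEMPLATE), 3))
print("no alignment:       ", round(pearson(plain_average, TEMPLATE), 3))
```

Under these assumptions, the average of the template-aligned noise should correlate strongly with the template, while the unaligned average should not. The only 'signal' in the result came from the reference.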
To read more about how to avoid such situations, Richard Henderson wrote a great primer for users of single particle EM:
As an objective structural biologist, it is your responsibility to ensure that your 3D structure reflects the underlying 2D images of your particles
To make sure that your 3D structure is 'real', it is important to compare your 3D model to your reference-free 2D class averages:
- Compare projections of your 3D model with 2D class averages that have NEVER 'seen' this 3D model. Agreement is consistent with your 3D model being 'real', but it is a necessary condition rather than a sufficient one: class averages that match your 3D model do not prove that the model is 100% accurate.
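
As a rough illustration of this comparison, the sketch below scores each class average against a set of model projections using a normalized cross-correlation. It is a toy under strong assumptions: the volume is a small cubic nested list, only the three axis-aligned projections are generated, and the class averages are assumed to be already centered, in-plane aligned, and in the same box size. Real comparisons are done with your EM package's projection-matching tools; `project`, `ncc`, and `best_matches` are hypothetical names.

```python
import math


def project(volume, axis):
    """Sum a cubic volume[z][y][x] along one axis (0=z, 1=y, 2=x)."""
    n = len(volume)
    if axis == 0:
        return [[sum(volume[z][y][x] for z in range(n)) for x in range(n)]
                for y in range(n)]
    if axis == 1:
        return [[sum(volume[z][y][x] for y in range(n)) for x in range(n)]
                for z in range(n)]
    return [[sum(volume[z][y][x] for x in range(n)) for y in range(n)]
            for z in range(n)]


def ncc(img_a, img_b):
    """Pearson correlation between two 2D images of the same size."""
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0


def best_matches(class_averages, projections):
    """For each class average, return (best projection index, correlation)."""
    results = []
    for average in class_averages:
        scores = [ncc(average, proj) for proj in projections]
        best_index = max(range(len(scores)), key=scores.__getitem__)
        results.append((best_index, scores[best_index]))
    return results


# Toy 4x4x4 volume with three asymmetric density blobs (made-up data).
volume = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
volume[0][1][2] = 1.0
volume[3][0][1] = 2.0
volume[2][2][2] = 3.0

projections = [project(volume, axis) for axis in range(3)]
```

A class average with a low best score against every projection is a view that your 3D model cannot explain, which is worth investigating before trusting the model.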
Other considerations for assessing the quality of a 3D model:
- Your 3D model should only show features that are consistent with the reported resolution. Here is a list of resolutions and expected features:
  - 25+ Angstroms - You can see the overall shape of your molecule, but cannot see subdomains yet.
    - The surface should not appear textured, since this is a very low resolution model and a textured surface represents high resolution information.
  - 15 - 25 Angstroms - You will begin to see the shapes of domains, and perhaps even subdomains.
    - The surface should not appear textured, since this is still a low resolution model and a textured surface represents high resolution information.
  - 10 - 15 Angstroms - Subdomains are clearly visible at this resolution, as well as the major & minor grooves of DNA/RNA.
  - 5 - 9 Angstroms - Helices are clearly visible at this resolution (no beta sheets).
  - < 5 Angstroms - Strands within beta sheets begin to separate, and bulky side chains may become visible.
- If you are able to dock a PDB model into your map, you can calculate a model vs. map FSC curve. This calculation assesses the correlation between the PDB coordinates and your 3D map.
  - If your reported resolution is 'real', then the atomic coordinates should correlate with the map out to the reported resolution.
  - If the correlation only extends to a lower resolution than that reported for the 3D map, then your map has been overfitted and this lower value is the 'real' resolution.
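
The model vs. map check above can be sketched as a small helper that reads a resolution off an FSC curve and compares it with the reported value. This is a sketch under stated assumptions: the curve is given as `(spatial frequency in 1/Angstrom, correlation)` pairs sorted by increasing frequency, the cutoff is FSC = 0.5 (a commonly used convention for model vs. map curves, not something this page specifies), and the 0.5 Angstrom tolerance is arbitrary. The function names and the example curve are hypothetical.

```python
def resolution_at_threshold(fsc_curve, threshold=0.5):
    """Return the resolution (Angstroms) where the FSC first drops below threshold.

    Uses linear interpolation between the two bracketing points.
    Returns None if the curve never crosses the threshold.
    """
    for (f0, c0), (f1, c1) in zip(fsc_curve, fsc_curve[1:]):
        if c0 >= threshold > c1:
            frac = (c0 - threshold) / (c0 - c1)
            freq = f0 + frac * (f1 - f0)
            if freq > 0:
                return 1.0 / freq
    return None


def is_overfitted(reported_res, model_map_res, tolerance=0.5):
    """True if the model vs. map resolution is worse than the reported one.

    'Worse' means a larger value in Angstroms, by more than `tolerance`.
    """
    return model_map_res is not None and model_map_res - reported_res > tolerance


# Made-up example curve: (spatial frequency in 1/Angstrom, correlation).
example_curve = [(0.02, 0.99), (0.05, 0.95), (0.10, 0.80),
                 (0.15, 0.60), (0.20, 0.40), (0.25, 0.10)]

model_map_res = resolution_at_threshold(example_curve)
print("model vs. map resolution (Angstroms):", round(model_map_res, 2))
print("overfitted if 4.0 was reported:", is_overfitted(4.0, model_map_res))
```

In this made-up example the curve crosses 0.5 near 5.7 Angstroms, so a map reported at 4.0 Angstroms would be flagged, whereas one reported at 6.0 Angstroms would not.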