• E-value identity bitscore


    E-value:

    The E-value (expect value) is the number of matches with an equal or better score that would be expected to occur purely by chance in a search of a database of that size. The lower the E-value, the less likely the database match is a result of random chance and therefore the more significant the match is.

    Empirical interpretation of the E-value is as follows:

    If E-value < 1e-50 (or 1 × 10^-50), there is extremely high confidence that the database match is the result of a homologous relationship.

    If E-value is between 0.01 and 1e-50, the match can be considered a result of homology.

    If E-value is between 10 and 0.01, the match is considered not significant, but may hint at a tentative remote homology relationship. Additional evidence is needed to confirm the tentative relationship.

    If E-value > 10, the sequences under consideration are either unrelated or related by extremely distant relationships that fall below the limit of detection of the current method.
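
    The rule of thumb above can be sketched as a small lookup function. This is only an illustration: the function name and the returned labels are made up here, and how the exact boundary values (1e-50, 0.01, 10) are assigned to a category is an assumption, since the ranges above do not say.

```python
def interpret_evalue(evalue: float) -> str:
    """Map a BLAST E-value to the rule-of-thumb categories listed above.

    Boundary handling (which side 1e-50, 0.01 and 10 fall on) is an
    assumption made for this sketch.
    """
    if evalue < 0:
        raise ValueError("an E-value cannot be negative")
    if evalue < 1e-50:
        return "extremely high confidence of homology"
    if evalue < 0.01:
        return "homology"
    if evalue <= 10:
        return "not significant; possible remote homology"
    return "unrelated or below the limit of detection"


# Hypothetical E-values, one per category
for e in (3e-72, 2e-8, 0.5, 25.0):
    print(f"{e:g}: {interpret_evalue(e)}")
```

    These thresholds are empirical guidelines, not hard statistical cutoffs, so the labels should be read as hints rather than verdicts.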

    Because the E-value is proportionally affected by the database size, an obvious problem is that as the database grows, the E-value for a given sequence match also increases.

    Because the genuine evolutionary relationship between the two sequences remains constant, the decrease in credibility of the sequence match as the database grows means that one may "lose" previously detected homologs as the database enlarges. Thus, an alternative to E-value calculations is needed.

    The E-value is very important: the lower, the better.
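
    The effect of database size can be illustrated with the standard Karlin-Altschul relation E = m * n * 2^(-S'), where m is the query length, n is the total database length, and S' is the bit score described in the next section. The sketch below ignores the edge-effect length correction that BLAST applies, and the score and lengths are hypothetical values chosen only for illustration.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits: E = m * n * 2**(-S').

    Simplified form without BLAST's effective-length correction.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


BIT_SCORE = 50.0  # hypothetical bit score of one fixed alignment
QUERY_LEN = 300   # hypothetical query length in residues

# The same alignment scored against databases of increasing size
for db_len in (10**8, 10**9, 10**10):
    e = expected_hits(BIT_SCORE, QUERY_LEN, db_len)
    print(f"database residues = {db_len:.0e}  E-value = {e:.2e}")
```

    Each tenfold increase in database size multiplies the E-value by ten, even though the alignment itself has not changed.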

    bitscore:

    A bitscore is another prominent statistical indicator used in addition to the E-value in a BLAST output. The bitscore measures sequence similarity independent of query sequence length and database size and is normalized based on the raw pairwise alignment score. The bitscore (S') is determined by the following formula: S' = (λS - ln K) / ln 2, where λ is the Gumbel distribution constant, S is the raw alignment score, and K is a constant associated with the scoring matrix used. Clearly, the bitscore (S') is linearly related to the raw alignment score (S). Thus, the higher the bit score, the more significant the match is. The bit score provides a constant statistical indicator for searching different databases of different sizes or for searching the same database at different times as the database enlarges.
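
    As a minimal sketch, the formula can be evaluated directly. The λ and K values below are assumptions: they are the values commonly reported by BLAST for gapped BLOSUM62 searches (gap open 11, gap extend 1), and the correct values for any real search should be taken from the statistics printed in that search's own BLAST output.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Convert a raw alignment score S into a bit score S'.

    S' = (lambda * S - ln K) / ln 2
    """
    return (lam * raw_score - math.log(k)) / math.log(2)


# Assumed Karlin-Altschul parameters (gapped BLOSUM62, gap costs 11/1)
LAMBDA = 0.267
K = 0.041

# Hypothetical raw scores
for raw in (50, 100, 200):
    print(f"raw score = {raw:3d}  bit score = {bit_score(raw, LAMBDA, K):.1f}")
```

    Because the conversion is linear in S, equal steps in raw score always produce equal steps in bit score, whatever database is searched.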

    identity:

    An identity of 35% means that 35% of the aligned positions between your sequence and the matching database sequence carry identical amino acids (AA). There is no such thing as an "acceptable percentage". It always depends on what you are looking for:

    If you have an unknown protein sequence and would like to find homologous sequences, information about identity (even 35%) is valuable.

    If you have a known protein and need to confirm its sequence, an identity of 35% is low and may suggest that something went wrong during your analysis.
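
    To make the definition concrete, percent identity can be computed from a pairwise alignment as identical positions divided by alignment length, with gap columns counted in the length. The function name and the two aligned sequences below are made up for illustration.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns where both sequences have the same residue.

    Both inputs must be rows of the same alignment (equal length, '-' for gaps).
    Gap columns count toward the alignment length but never as matches.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        q == s and q != "-"
        for q, s in zip(aligned_query, aligned_subject)
    )
    return 100.0 * matches / len(aligned_query)


# Hypothetical alignment rows
query = "MKT-AYIAKQR"
subject = "MKTLAYLAK-R"
print(f"identity = {percent_identity(query, subject):.1f}%")
```

    Identity alone says nothing about how long the alignment is, so it is best read together with the E-value and the bit score.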

  • Original article: https://www.cnblogs.com/0820LL/p/11352294.html