Disease Name: Papillon-Lefevre syndrome
>sp|P53634|CATC_HUMAN Dipeptidyl peptidase 1 OS=Homo sapiens OX=9606 GN=CTSC PE=1 SV=2
MGAGPSLLLAALLLLLSGDGAVRCDTPANCTYLDLLGTWVFQVGSSGSQRDVNCSVMGPQ
EKKVVVYLQKLDTAYDDLGNSGHFTIIYNQGFEIVLNDYKWFAFFKYKEEGSKVTTYCNE
TMTGWVHDVLGRNWACFTGKKVGTASENVYVNIAHLKNSQEKYSNRLYKYDHNFVKAINA
IQKSWTATTYMEYETLTLGDMIRRSGGHSRKIPRPKPAPLTAEIQQKILHLPTSWDWRNV
HGINFVSPVRNQASCGSCYSFASMGMLEARIRILTNNSQTPILSPQEVVSCSQYAQGCEG
GFPYLIAGKYAQDFGLVEEACFPYTGTDSPCKMKEDCFRYYSSEYHYVGGFYGGCNEALM
KLELVHHGPMAVAFEVYDDFLHYKKGIYHHTGLRDPFNPFELTNHAVLLVGYGTDSASGM
DYWIVKNSWGTGWGENGYFRIRRGTDECAIESIAVAATPIPKL
>NM_001114173.1 Homo sapiens cathepsin C (CTSC), transcript variant 3, mRNA
CGTAGCTATTTCAAGGCGCGCGCCTCGTGGTGGACTCACCGCTAGCCCGCAGCGCTCGGCTTCCTGGTAA
TTCTTCACCTCTTTTCTCAGCTCCCTGCAGCATGGGTGCTGGGCCCTCCTTGCTGCTCGCCGCCCTCCTG
CTGCTTCTCTCCGGCGACGGCGCCGTGCGCTGCGACACACCTGCCAACTGCACCTATCTTGACCTGCTGG
GCACCTGGGTCTTCCAGGTGGGCTCCAGCGGTTCCCAGCGCGATGTCAACTGCTCGGTTATGGGACCACA
AGAAAAAAAAGTAGTGGTGTACCTTCAGAAGCTGGATACAGCATATGATGACCTTGGCAATTCTGGCCAT
TTCACCATCATTTACAACCAAGGCTTTGAGATTGTGTTGAATGACTACAAGTGGTTTGCCTTTTTTAAGG
ATGTCACTGATTTTATCAGTCATTTGTTCATGCAGCTGGGAACTGTGGGGATATATGATTTGCCACATCT
GAGGAACAAACTGGCCATGAACAGACGTTGGGGCTAAGAGACAGAGCAGCCTGCGACAGTGTGGACCTAC
CTGTAGCAGCTAGCAAAGGCCTCTAGCAGCTACAGTCCCTTCTGGAGTCTTTATTTGCATGCAAAATGCA
AAGGAGTCCTGGTGACCTACCTCCAAGGCAGCTGCCCTCCTGAACACTCCCTTGGAAAACAGTAAACATC
ATTTTGGAATGTGAACAACCAGAGACTACACAGGAGAAAGGAAAAAAAAATTCTGAAGATGCAAAATCTT
GGGTGGCTTCACCGTTCAGTTTTTTAATAAAAGGAACAATATACAACACGTTGTTCTTTTTCTCTTTTGA
AATCCCTTCTATTACAGTGATTTTTTTCTAAGATTGTCAGGATTTGAAGTGTATGTTTTGTTTTATTCAC
AGCGTAAATTTTATTCACAGTTTAACTGTCTGCCTGAGTGTCTTTCCTTTCTCTAATTACCTTGAGGAAC
CCAAGAGCCTGTTGTAAGGAGAAAATAAGGCCCTTGGATCTCTTGAGATTCACAGATATAAGTTATTGAA
GGGAAGATGGTCCTATGGAGGACATATTTAAAGAAGGGAAAAAAGAGGCTTTCTCAGATATGTCAGACTG
CTATAGTATACTTCTACAGATTATAGACCTCCAGTACCTCTGGCCAGAAAGATGGTATCGTAAACACCCT
ATTTTTTTTCTTTTCTTTTTTCATTAGGTACAAGCTTTGTGCTAAGAAGTTGACATACTATAAGCTACAA
AAGTTCTGTAAAGTAGATATAACTAGTTTCATTTTATAGATAGAGAAAATTAATCTCTTACAGTGCTAAG
CTCACAGAGTTTCTAACTGTAAAATGCTAGAACTTGTCTTTCAAGCCTAAAGACTTCCTTGGGGCTAAAT
AGTGAAAAAAGCCATTTCACAAATAAGTAAATGGTATTTAGAGGCATATTTGGATTTCCTGGTAAATTCC
AGTCTGTGAGCATCATGAATATTAGTTTAATGTTGCATGGGCTCATGTTGAAGTTTTAAGAGAAGAACTG
CCTTGAAGCTTAGGTTTCCTTAGCTATTAGGCTACTGACTTTCTTGCCTAAACCAGGGTTTTTTCATTGA
AGACCAAAACTTACCTTCTCCTTCAGTTTGTAGTTTGGAAATTGGTAGAAGAGCTTTGTAAACTTCAAAT
TAAGTACAAACTAAGTGTCATAGTCAAATTTACTAATCTTAATTACAGTATTGTTCAACTGATTGCTATC
TTCTAGCTCTTTCCTGCCGAATAATGGTCTTGTTTCCTGCTCTGTTGGTTTAGAGCTGACTTCTTTCAGC
TTTGGTAAGCCTGAAATTATGGGGTTATGTTTAATTCATATTGTCTGGGTGGACTTTCCTCTCTTGCATT
TCTGCTTGAATAGAAGAATTTTTCTCTAGAGAGTAGTTTGTCATCCTTACTCTGTTGATTCAGATGACTC
TTTGTATGATCTGAGAGGTATACTGTTCTGCTATTCTGAGAAGAAGTATTTCAGAAAGATGAATTAAGAG
TACAGTGGACTGCTCCCACCTGGAAACTTTTATCTATCTCACCTCTGGACCTGATAAATTCTTTATCACT
CAGGACCTTGATGACGCTGCTCTCTGAAACCCTCCCCAGCTCTCTCTATTACCGTGAGAAACATCAGAAC
TTTGGTTCCCATTGCATATCGCAGGTACCTCTGCTTTCATGCCATGCTGTAATGGAGTGATTGGGTAGCA
TGTTTTCATCTCTTTCCAGATTGAAAATCTGTATTTCTCCCTGTATATCTTCAACACCTAATGCACATAG
AACTTTGTAGGTACCTGGAAAATGCACCACAGTTTTCTTTTCTTTTTGCAGACTTTTCACAAGTATTACC
AACTTACAAAGAATTAATTTTGTAGGATTCTAGAAAGACAAATCAGGAATGGTGCCATATACATCTTTTT
TGATTCCCTGCTCTAAAGAATATTATCAGGTTACCTTCCTGCAGAGTTTTAAAAGAATTGCATATTTCAA
GCTGACTTTCAGGATGTAAATATAACCAAAGCAACTGATATGTAAAAAATATATTCAATGGCATTCCTAG
ATTTTCTTCTAGGGTGTTTTATTGTTTTGGGTTTTACATTTAAGTCTTTAATCCATCTTGAGTTAATTTT
TGTATAGGTATAAGAAAGGGGTCCAGTTTTAATTTTCTGCGTATGGCTAGCCAGTTCTCCCAGCACCATT
TATTAAATAGGGAATCCTTTCCCTATTGTTTGTTTTTGTACGGTTTGTCAAAGATTAGATGGTTGTAGAT
GTGTGGTCTTATTTCTGAGATCTTCATTCTCTTCCACTGGTCTATGTGTCTGTTTTTGTACCATGCTTTT
TTGGTTACTGTAGCCTTGTAGTATAGTATGAAAGATAGCATGATGCCTCCAGGTTTGTTCTTTTTGCTTA
GGATTGTCTTGGCTATACGAGCTTTTTTTTGGTTCTATATGAATTTTAAAATAGTTTCTTCTAATTGTGT
GAAGAATGTTAATGGTAGTTTAATGGGAATAGCATTGAATCTGTGAATTGCTTTGGGCAGTATGGCCATT
TTCATGATATTGATTCTTCCTATCCATGAGCATGTAACGTTTTTCCCTTCGTTTGTGTCCTCTCTCATTT
CCTTGAGTAGTGGTTTGTAGTTCTCCTTGAAGAGATCCTTCACTTCTTCTGTATTCCTAGATATTTTATT
CTCTCTGTAGCTATTGGGAATGGGAGTTCATTCATGATTTTGCTCTCTGCTTGCCTTTTGTTGGTGTATA
GGGATCCTGGTGACTTCTGCACATTGATTTTGTATCCTGAGACTTTACCGAAGTTGCTTATCAGCTTAAG
AAGCTTTTGGGCTGAGATGATGGGGTTTTCTAGATATAGGATCATGTTATCTTCAAACAAAGACAATTTG
ACTTCCTCTCTTCCTATTTGAGTACGCTTTATTTCTTTCTCTTGCCTGATTGCCCTGGCCAGAACTCCCA
ATACTATATTGAATAAGAATGGTGAGAGAGGGCATCCTTGTCTTGTGCCAGTTTTCACGGGGAATGCTTC
CAGCTTTTGCCCATTCAGTATGATATTATCTGTGGGTTTCTCATAAAAAGCTCTTATTATTTGAGATACG
TTCCTTCAATACCTAGTTTATTGAGAGTTTTTAACATGAAGCGATGTTGAATTGTATCGAAGGCCTTTTC
TGTGTCTATTGAGATAATCATGTGGTTTTTGTCTTTAGTTCTGTTTATGTGATGAATGACGTTTATTGAT
TTGCATATGTTGAACCGGCCTTGCATCCTGGGGATGAAGCCAACTTGACTGTGGTAGATAAGCTTTTGGA
TGTGCTGCTGGATTTGGTTTATCAGTATTTCATTGAGATTTTTTGCGTCGAAGTTCATCAGGGATATTGG
ACTGAAGTTTTCTTTTTGTTGTCGTATCTCTGCCAGGTTTTGGTATCAGGATGATGCTGGCCTCATAAAA
TGAGTTAGGGAGGAGTCCCTCCTTTTCAATTGTTTGGAATAGTTTCAGAAGAAAGGGTATCAGCTCCTCT
TTGTACCTCTGGTAGAATTCAACTGTAAATCCATCTGGTCCTGGACTTTTTTTCATTAGTAGGCTATTTA
TTACTGCCTCACTTTCATAACTTGTTATTGATCTATTCAGGGATCCAACTTCTTCCTGATTCAGTCTTGG
GAGTGTGTATGCATCCAGGAATTTATCCATTTCTTCTAGATTTTCTAGTTTCTTTGCATAGAGGTGTTTG
TAGTATTTGCTGTTGGTTGTTTGTACTTCTGTGAGATCAGTGGTGGTATCCTGTTTATCATTTTTTATTG
TGTCTGTTTGATTCTTCTCTTATTTTTGACAAAGCTGACAAAAAGAAGCAATAGGGAAAGGACTCTCTAT
TCAATTAATCCTACTGTATATCTGGCTAGCCATATGCAGAAAATTGAAACTGTTCCTGTTTCTTAATCCA
TATACGAAAATCAACTTACGATGGATTAAAGACTTAAATGTAAAACCCAAAATTATAAAACCCTGGAATA
GAATATAGGCAATATCATTCTGGACATAGGAATGGGCAAAGATTTTATGAGAAAGACACCAAAAGCAATT
ACAACAAAAGCAAAAATTGGCAAATGAGATCTAATTAAACTAAAGAGCTCTGCACAGCAAAAGAAACTAC
TGTCAGAGTGAACAGGCAACCAACAGAATGGGAGAAAATTTTTTCAATCTATCCATATGACAAAGGTCTA
ACATCCAGAATCTACAAGGAACTTAACAAATTTACAAGAAAAAAGGAGCCCCATTAAAAAGTTGGCAAAG
AACATGAACAGACACTTCCCAGAAGATATTCATGTGGCCAATAAACATGAAGAAAAGCTCAACATCACTG
ACCATTAGAGACGTGCATATCAAAATCACAATGAGATACCATCTCATGTCACAATGGTGATTATTAAAAA
GTCAAACAACATGCTAGTGAGGTTGTAGAGAAATAAGAACGCTTTTACACTGTTGGTGGGAATGTCAACT
AATTCAACCACTGTGGAAGACAGTGTGGTGATTCCTCAAGGATTTAGAACCAGAAATATCATTACTGCAT
ATAGACCCAAAGGAATAGAAATCATTCTATTACAAAGATACATGCACATGTATGTTTATTACAGCACTAT
TCACAATAGCAAAGACATGGAATCAACCCAAATGCTCATCAGTGATAGACTGGAAAAAGAGAATGTGGAA
CATAAACACCATGGAATACTATGCAGCAATAAAAAGGAATGAGATCCTGTCCTTTTCAGGGACATGGATG
GAGTTGGAAGCTGTTATCCTCAGCAAACTAATGCAGGAACAGAAAACCAACCACCACATGTTCTCACTTA
TAAGTGGGAGCTGAACAATAGAACACATGGGCACAGGGAGGGGAATAACACACACTGGGGCCAGTCAGGG
GGTGGGGGGTCAAGCTGAGGGAGAGCATTAGAAAAAATAGCTAATGCATTCTGGGCTTAACCCATTTATG
CCTAGTGTTCCATTTCTGGAATGCTAAGCATGTGGAAGTTCTTTATATCCTGCTCAAGGTCATTGCCAAG
GTCTGATTTTTCACATTCAACAAATTGCAACCTCTGGCATAAATGGGTTAATACCTAGGTGATGAGTTGA
TAGGTGCAGGAAACCACCATGGCACATGTTTATCTATGTAAGAAACCTGCACATCCTACACATGTACCCT
GGAACTTAAAAAATTTAAAATATATATGTATATATATTTAATATGGAATTTTAAAAATTACTAATGAGTT
CTTTTATCTGAGTAATTTTGCATCAACATGCTTTTATTATGGAAGAGAAGATTCAGTGAGTACAAAATTG
CAGATACATGTGTCAGAAGATCCCTGAATATAATAAGGCTTAGTATTCTGTGTCATAATTGCCTGTTTGT
ATTCCTCTCTGGTCTTTAAACTTCATTAGGGCAAGGATCAACTCCATCTTACTAACCATTTGATTCCCTA
TGTATTACACGATATATGACCAATAATAAGCCTTCAATAAATACTTGTAAAATAAAGAATGTTATGTAAT
AAAAAAAAA
Pfam (3 motifs)
Pfam | Position (Independent E-value) | Description
Peptidase_C1 | 231..457 (3.5e-62) | PF00112, Papain family cysteine protease
CathepsinC_exc | 25..141 (8.4e-46) | PF08773, Cathepsin C exclusion domain
Peptidase_C1_2 | 402..440 (0.00041) | PF03051, Peptidase C1-like family
Evolutionary analysis by Maximum Likelihood method
The evolutionary history was inferred by using the Maximum Likelihood method and JTT matrix-based model [1]. The tree with the highest log likelihood (-4662.53) is shown. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. The proportion of sites where at least 1 unambiguous base is present in at least 1 sequence for each descendent clade is shown next to each internal node in the tree. This analysis involved 100 amino acid sequences. There were a total of 271 positions in the final dataset.
SNP | COORDINATE | AMINO ACID CHANGE | GENE ID | TRANSCRIPT ID | PROTEIN ID | SIFT SCORE | SIFT MEDIAN | SIFT PREDICTION
rs28937571 | 88027331 | Y412C | ENSG00000109861 | ENST00000227266 | ENSP00000227266 | 0.001 | 2.48 | DELETERIOUS
rs104894208 | 88029333 | Q286R | ENSG00000109861 | ENST00000227266 | ENSP00000227266 | 0.002 | 2.53 | DELETERIOUS
rs104894211 | 88027526 | Y347C | ENSG00000109861 | ENST00000227266 | ENSP00000227266 | 0.002 | 2.55 | DELETERIOUS
Evolutionary analysis of coding SNPs
SNP | substitution | probability of deleterious effect | Pdel
rs28937571 | Y412C | probably damaging | 0.89
rs104894208 | Q286R | probably damaging | 0.89
rs104894211 | Y347C | probably damaging | 0.85
Provean
SNP IDs | Amino Acid Change | Probability | Prediction | Score
rs28937571 | Y412C | Damaging | Deleterious | -8.15
rs104894208 | Q286R | Damaging | Deleterious | -3.83
rs104894211 | Y347C | Damaging | Deleterious | -8.78
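The SIFT and PROVEAN calls above follow from score cutoffs. The sketch below re-derives them from the tabulated scores, assuming both tools were run with their commonly documented defaults (SIFT: deleterious at or below 0.05; PROVEAN: deleterious at or below -2.5). The report does not state which cutoffs were used, so both constants and the `classify` helper are illustrative assumptions.

```python
# Cutoffs are assumptions (commonly documented tool defaults), not values
# taken from this report.
SIFT_CUTOFF = 0.05      # SIFT score <= 0.05 is called deleterious
PROVEAN_CUTOFF = -2.5   # PROVEAN score <= -2.5 is called deleterious

# Scores copied from the SIFT and PROVEAN tables in this report.
variants = {
    "Y412C": {"rsid": "rs28937571", "sift": 0.001, "provean": -8.15},
    "Q286R": {"rsid": "rs104894208", "sift": 0.002, "provean": -3.83},
    "Y347C": {"rsid": "rs104894211", "sift": 0.002, "provean": -8.78},
}


def classify(scores):
    """Return the call each tool would make at the assumed cutoff."""
    return {
        "sift": "DELETERIOUS" if scores["sift"] <= SIFT_CUTOFF else "TOLERATED",
        "provean": "Deleterious" if scores["provean"] <= PROVEAN_CUTOFF else "Neutral",
    }


calls = {change: classify(scores) for change, scores in variants.items()}

for change, call in calls.items():
    print(variants[change]["rsid"], change, call["sift"], call["provean"])
```

Under these assumed cutoffs all three substitutions fall on the deleterious side for both tools, which agrees with the predictions listed in the tables.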
SNAP2
Wildtype amino acid | Position | Variant Amino Acid | Predicted Effect | Score | Expected Accuracy
Y | 412 | C | Effect | 87 | 95%
Q | 286 | R | Effect | 67 | 95%
Y | 347 | C | Effect | 65 | 95%
PolyPhen-2
Rs_Id | Mutation Probability | Score
rs28937571 | Probably damaging | 0.999
rs104894208 | Possibly damaging | 1.00
rs104894211 | Probably damaging | 0.98
Hope Prediction
Rs_Ids | Mutation | Mapping Issues | AA Variant | Function Impact | Score
rs28937571 | CTSC_Y412C | | Y412C | High | 8.90
rs104894208 | CTSC_Q286R | Uniprot Residue:L | Q286R | High | 6.78
rs104894211 | CTSC_Y347C | | Y347C | Medium | 4.87
Rank | C-Score | Cluster Size | PDB Hit | Lig Name | Consensus Binding Residues
1 | 0.83 | 395 | | | 252,256,258,259,298,299,300,301,302,303,373,403,404,405,429,453
2 | 0.02 | 10 | | | 252,256,257,258,405,429
3 | 0.02 | 21 | | | 258,259,300,301,303,373,404,405,453
4 | 0.02 | 14 | | | 252,256,258,302,373,376,377,403,404,405,406,454
5 | 0.02 | 15 | | | 252,258,374,376,380,405,406,407,427,429
6 | 0.01 | 5 | | | 25,252,256,258,299,300
7 | 0.01 | 7 | | | 252,405,429
8 | 0.01 | 8 | | | 301,302,304,305,347,400,403,453,454
9 | 0.00 | 1 | | | 379,382,392
10 | 0.00 | 3 | | | 251,252,253,429,432,433
Number of amino acids | 463
Molecular weight | 51853.82
Theoretical pI | 6.53
Amino acid composition |
Ala (A) 31 6.7%
Arg (R) 17 3.7%
Asn (N) 23 5.0%
Asp (D) 21 4.5%
Cys (C) 14 3.0%
Gln (Q) 15 3.2%
Glu (E) 24 5.2%
Gly (G) 43 9.3%
His (H) 14 3.0%
Ile (I) 21 4.5%
Leu (L) 37 8.0%
Lys (K) 25 5.4%
Met (M) 11 2.4%
Phe (F) 20 4.3%
Pro (P) 19 4.1%
Ser (S) 31 6.7%
Thr (T) 28 6.0%
Trp (W) 10 2.2%
Tyr (Y) 29 6.3%
Val (V) 30 6.5%
Pyl (O) 0 0.0%
Sec (U) 0 0.0%
Total number of negatively charged residues (Asp + Glu) | 45
Total number of positively charged residues (Arg + Lys) | 42
Formula | C2332H3521N615O680S25
Total number of atoms | 7173
Instability index (II) | 36.05
Stability | stable
Aliphatic index (AI) | 74.34
GRAVY | -0.257
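Several of the physicochemical values above are simple functions of the amino acid composition. The sketch below recomputes the charged-residue totals, the aliphatic index, and GRAVY from the tabulated counts. It assumes the standard definitions: the aliphatic index as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)) in mole percent, and GRAVY as the mean Kyte-Doolittle hydropathy. The report does not spell out the formulas, so treat this as a consistency check rather than a description of how the values were produced.

```python
# Residue counts copied from the amino acid composition table in this report.
counts = {
    "A": 31, "R": 17, "N": 23, "D": 21, "C": 14, "Q": 15, "E": 24,
    "G": 43, "H": 14, "I": 21, "L": 37, "K": 25, "M": 11, "F": 20,
    "P": 19, "S": 31, "T": 28, "W": 10, "Y": 29, "V": 30,
}

# Kyte-Doolittle hydropathy scale (assumed to be the scale behind GRAVY).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

total = sum(counts.values())
negative = counts["D"] + counts["E"]   # Asp + Glu
positive = counts["R"] + counts["K"]   # Arg + Lys

# Mole percent of each residue.
mole_pct = {aa: 100.0 * n / total for aa, n in counts.items()}

# Aliphatic index: relative volume occupied by aliphatic side chains.
aliphatic_index = (
    mole_pct["A"]
    + 2.9 * mole_pct["V"]
    + 3.9 * (mole_pct["I"] + mole_pct["L"])
)

# GRAVY: sum of hydropathy values divided by sequence length.
gravy = sum(KYTE_DOOLITTLE[aa] * n for aa, n in counts.items()) / total

print(total, negative, positive)
print(round(aliphatic_index, 2), round(gravy, 3))
```

With these definitions the recomputed values should agree with the reported length (463), charged-residue totals (45 and 42), aliphatic index (74.34), and GRAVY (-0.257).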
Alpha helix (Hh) : 121 is 26.13%
310 helix (Gg) : 0 is 0.00%
Pi helix (Ii) : 0 is 0.00%
Beta bridge (Bb) : 0 is 0.00%
Extended strand (Ee) : 69 is 14.90%
Beta turn (Tt) : 0 is 0.00%
Bend region (Ss) : 0 is 0.00%
Random coil (Cc) : 273 is 58.96%
Ambiguous states (?) : 0 is 0.00%
Other states : 0 is 0.00%
Secondary Structure Prediction
Amino Acid Type
Name of enzyme | No. of cleavages
Arg-C proteinase | 17
Asp-N endopeptidase | 21
Asp-N endopeptidase + N-terminal Glu | 45
BNPS-Skatole | 10
CNBr | 11
Caspase1 | 2
Chymotrypsin-high specificity (C-term to [FYW], not before P) | 57
Chymotrypsin-low specificity (C-term to [FYWML], not before P) | 117
Clostripain | 17
Formic acid | 21
Glutamylendopeptidase | 24
Iodosobenzoic acid | 10
Hydroxylamine | 1
LysC | 25
LysN | 25
NTCB (2-nitro-5-thiocyanobenzoic acid) | 14
Pepsin (pH1.3) | 83
Pepsin (pH>2) | 144
Proline-endopeptidase | 2
Proteinase K | 230
Staphylococcal peptidase I | 22
Thermolysin | 127
Trypsin | 40
Tobacco etch virus protease | 2
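A few of the cleavage counts above can be reproduced from the protein sequence with one-line rules. The sketch below applies deliberately simplified patterns (cleave C-terminal to the listed residues, optionally "not before P") to the CATC_HUMAN sequence given at the top of this report. The `RULES` patterns are illustrative assumptions and do not capture the full exception tables that a dedicated cleavage-prediction tool applies.

```python
import re

# CATC_HUMAN (P53634) sequence, copied from the FASTA record in this report.
CTSC = (
    "MGAGPSLLLAALLLLLSGDGAVRCDTPANCTYLDLLGTWVFQVGSSGSQRDVNCSVMGPQ"
    "EKKVVVYLQKLDTAYDDLGNSGHFTIIYNQGFEIVLNDYKWFAFFKYKEEGSKVTTYCNE"
    "TMTGWVHDVLGRNWACFTGKKVGTASENVYVNIAHLKNSQEKYSNRLYKYDHNFVKAINA"
    "IQKSWTATTYMEYETLTLGDMIRRSGGHSRKIPRPKPAPLTAEIQQKILHLPTSWDWRNV"
    "HGINFVSPVRNQASCGSCYSFASMGMLEARIRILTNNSQTPILSPQEVVSCSQYAQGCEG"
    "GFPYLIAGKYAQDFGLVEEACFPYTGTDSPCKMKEDCFRYYSSEYHYVGGFYGGCNEALM"
    "KLELVHHGPMAVAFEVYDDFLHYKKGIYHHTGLRDPFNPFELTNHAVLLVGYGTDSASGM"
    "DYWIVKNSWGTGWGENGYFRIRRGTDECAIESIAVAATPIPKL"
)

# Simplified rules (assumptions): each pattern matches the residue after which
# cleavage occurs; the lookahead requires a following residue, so the
# C-terminal residue itself is never counted as a cleavage site.
RULES = {
    "Arg-C proteinase": r"R(?=.)",
    "CNBr": r"M(?=.)",
    "LysC": r"K(?=.)",
    "Trypsin": r"[KR](?=[^P])",                      # not before P
    "Chymotrypsin-high specificity": r"[FYW](?=[^P])",  # not before P
}

cleavage_counts = {
    enzyme: len(re.findall(pattern, CTSC)) for enzyme, pattern in RULES.items()
}

for enzyme, n in cleavage_counts.items():
    print(f"{enzyme}: {n}")
```

For these five enzymes the simplified rules should give the same counts as the table (17, 11, 25, 40, and 57). Enzymes with longer recognition motifs, such as Caspase1 or Tobacco etch virus protease, need more than a single-residue pattern and are left out of the sketch.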
Rank | Start Position | Sequence | Score | Prediction
1 | | HFTIIYNQGFEIVLND | | Epitope
2 | 323 | | | Epitope
3 | 341 | | 0.89 | Epitope
3 | 303 | | 0.89 | Epitope
4 | | PTSWDWRNVHGINFVS | | Epitope
5 | | YFRIRRGTDECAIESI | | Epitope
5 | | | | Epitope
No of Nodes | 11
No of Edges | 36
Avg node degree | 6.55
Avg. local clustering coefficient | 0.893
Expected number of edges | 11
p-value | 1.78e-09
Protein-Protein Interaction network
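The average node degree in the network statistics follows from the node and edge counts. The sketch below assumes the usual definition for an undirected graph, 2E/N, and also reports how many of the possible node pairs are connected. The density figure is an extra illustration and is not a value given in the report.

```python
# Node and edge counts copied from the network statistics in this report.
nodes = 11
edges = 36

# Each undirected edge contributes to the degree of two nodes.
avg_degree = 2 * edges / nodes

# Maximum number of edges in a simple undirected graph on `nodes` nodes.
max_edges = nodes * (nodes - 1) // 2

# Fraction of possible edges that are present (illustrative, not reported).
density = edges / max_edges

print(round(avg_degree, 2), max_edges, round(density, 3))
```

Under this definition the average degree rounds to the reported 6.55, and the 36 observed edges exceed the expected 11, which is the excess that the small p-value reflects.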
RNA Base Pairing Probability Plot