Biosemantics group. Martijn Schuemie. Overview: the biosemantics group, ontology assembly, concept tagging, homonym disambiguation, concept profile.


1 Biosemantics group Martijn Schuemie

2 Overview
- The biosemantics group
- Ontology assembly
- Concept tagging
- Homonym disambiguation
- Concept profile creation
- Nucleolus

3 Biosemantics group
- ErasmusMC University Medical Center Rotterdam
- Department of Medical Informatics
- Biosemantics group: Jan Kors, Barend Mons, Erik van Mulligen, Martijn Schuemie, Rob Jelier, Kristina Hettne, Antoinne van Veldhoven

4 Biosemantics group
Biosemantics
- Molecular Biology
  - High throughput experiment data (genomics and proteomics)
  - Gene and protein databases, MEDLINE, Gene Ontology
Biosemantics
- Concept-based text-mining
  - Interpretation of experiment data
  - Knowledge discovery

5 Ontology assembly
- Sources: Entrez Gene, Swiss-Prot, HUGO
- Combination
- Add spelling variations: ABC1 -> ABC-1, DEF3 -> DEF-III
- Remove highly ambiguous terms: CO2, membrane-bound obesity, open reading frame
Precision (P) and recall (R) shown on the slide: P=37%, R=76%; P=50%, R=75%
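The two spelling variations on the slide (ABC1 -> ABC-1, DEF3 -> DEF-III) can be sketched as a small rule. This is only an illustration: the function name `spelling_variants`, the regular expression, and the Roman numeral table limited to 1 through 10 are assumptions, not the group's actual rules.

```python
import re

# Assumption: only trailing numbers 1..10 get a Roman numeral variant.
ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V",
         6: "VI", 7: "VII", 8: "VIII", 9: "IX", 10: "X"}


def spelling_variants(term):
    """Return the term plus hyphenated and Roman-numeral variants.

    Only terms of the form <letters><digits> (e.g. ABC1) are expanded;
    every other term is returned unchanged.
    """
    variants = {term}
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", term)
    if match:
        stem, digits = match.group(1), match.group(2)
        variants.add(f"{stem}-{digits}")            # ABC1 -> ABC-1
        roman = ROMAN.get(int(digits))
        if roman:
            variants.add(f"{stem}-{roman}")         # DEF3 -> DEF-III
    return variants


if __name__ == "__main__":
    for gene in ("ABC1", "DEF3", "membrane"):
        print(gene, sorted(spelling_variants(gene)))
```

Generating variants this way tends to raise recall at the cost of extra ambiguity, which is presumably why the slide pairs it with the removal of highly ambiguous terms.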

6 Concept tagging
MEDLINE text: Malaria fever is a disease. It is spread by mosquitos.
- Sentence splitting: [Malaria fever is a disease.] [It is spread by mosquitos.]
- Tokenization: [Malaria] [fever] [is] [a] [disease]
- Word normalisation: [malaria] [fever] [be] [a] [disease]
- Concept mapping: [malaria fever] C24530, [disease] C12634
- Homonym disambiguation: PSA -> Prostate Specific Antigen or Poultry Science Association?
Result: concept profile of text
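The tagging steps on this slide (splitting, tokenization, normalisation, concept mapping) can be sketched end to end on the slide's own example. Everything beyond the two concept identifiers shown on the slide is an assumption made for illustration: the tiny `THESAURUS` and `LEMMAS` tables, the regular expressions, and the greedy longest-match strategy are not taken from the group's software.

```python
import re

# Toy thesaurus using the two concept identifiers shown on the slide.
THESAURUS = {("malaria", "fever"): "C24530", ("disease",): "C12634"}
# Hypothetical lemma table standing in for a real word normaliser.
LEMMAS = {"is": "be", "mosquitos": "mosquito"}


def split_sentences(text):
    """Split after sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Keep alphanumeric runs, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9]+", sentence)


def normalise(tokens):
    """Lower-case each token and replace it by its lemma when known."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]


def map_concepts(tokens, max_len=3):
    """Greedy longest-match lookup of token n-grams in the thesaurus."""
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in THESAURUS:
                found.append((" ".join(key), THESAURUS[key]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return found


def tag(text):
    """Run the whole pipeline and return the concepts found per sentence."""
    return [map_concepts(normalise(tokenize(s))) for s in split_sentences(text)]


if __name__ == "__main__":
    print(tag("Malaria fever is a disease. It is spread by mosquitos."))
```

Longest match first means "malaria fever" is mapped as one concept rather than as two separate words, which matches the mapping shown on the slide.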

7 Homonym disambiguation
Some simple rules:
Is it likely that a term has multiple meanings?
- 3-letter acronym (e.g. PSA): highly likely
- long forms (e.g. Prostate Specific Antigen): highly unlikely
- terms that refer to several concepts by definition
Is a synonym found? (e.g. “KLK3 (PSA)”)
Is a keyword found? (e.g. “PSA is secreted by the prostate”)
These simple rules change performance from P=50%, R=75% to P=71%, R=71%.
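A minimal sketch of how these rules might be combined: an ambiguous term is only accepted when a synonym or keyword of the intended concept appears in the same text. The function names, the "three capital letters or fewer" test, and the `known_homonyms` parameter are illustrative assumptions; the slide does not specify how the rules were implemented.

```python
import re


def is_likely_ambiguous(term, known_homonyms=frozenset()):
    """Guess whether a term is likely to have several meanings.

    Assumption: all-capital acronyms of at most three letters are likely
    homonyms, as are terms listed explicitly in `known_homonyms`
    (terms that refer to several concepts by definition).
    """
    if term in known_homonyms:
        return True
    return term.isalpha() and term.isupper() and len(term) <= 3


def accept_mapping(term, text, synonyms=(), keywords=(), known_homonyms=frozenset()):
    """Accept a term-to-concept mapping under the slide's simple rules.

    Unambiguous terms are always accepted. Ambiguous terms need supporting
    evidence: a synonym or a keyword of the concept elsewhere in the text.
    """
    if not is_likely_ambiguous(term, known_homonyms):
        return True
    tokens = {t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)}
    tokens.discard(term.lower())  # the term itself is not evidence
    evidence = [w.lower() for w in (*synonyms, *keywords)]
    return any(w in tokens for w in evidence)


if __name__ == "__main__":
    psa = dict(synonyms=["KLK3"], keywords=["prostate"])
    print(accept_mapping("PSA", "KLK3 (PSA)", **psa))
    print(accept_mapping("PSA", "PSA is secreted by the prostate", **psa))
    print(accept_mapping("PSA", "The PSA annual meeting", **psa))
```

Rejecting unsupported ambiguous terms is consistent with the reported trade-off: precision rises while recall drops somewhat.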

8 Homonym disambiguation
Similarity? The concept profile of the text containing PSA is compared with:
- Concept profile of Prostate Specific Antigen
- Concept profile of Phosphoserine Aminotransferase
- Unknown meaning
Previous tests showed an overall accuracy of 93%.
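Profile-based disambiguation can be sketched as picking the candidate meaning whose concept profile is most similar to the profile of the text. The slide does not name the similarity measure, so cosine similarity is an assumption here, as are the `threshold` value, the function names, and the toy profiles.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse profiles (concept -> weight)."""
    dot = sum(w * q.get(c, 0.0) for c, w in p.items())
    norm = (math.sqrt(sum(w * w for w in p.values()))
            * math.sqrt(sum(w * w for w in q.values())))
    return dot / norm if norm else 0.0


def disambiguate(text_profile, candidate_profiles, threshold=0.1):
    """Return the best-matching meaning, or None for 'unknown meaning'.

    `threshold` is a hypothetical cut-off below which no candidate
    is considered similar enough to the text.
    """
    scored = [(cosine(text_profile, prof), name)
              for name, prof in candidate_profiles.items()]
    if not scored:
        return None
    score, name = max(scored)
    return name if score >= threshold else None


if __name__ == "__main__":
    # Toy profiles invented for illustration only.
    candidates = {
        "Prostate Specific Antigen": {"prostate": 0.9, "KLK3": 0.4},
        "Phosphoserine Aminotransferase": {"serine": 0.8, "biosynthesis": 0.5},
    }
    print(disambiguate({"prostate": 1.0, "cancer": 0.5}, candidates))
    print(disambiguate({"poultry": 1.0}, candidates))
```

Returning `None` when nothing is similar enough corresponds to the "Unknown meaning" outcome on the slide.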

9 Concept profile creation
[Diagram: Text, Concept, Concept profile of text, Concept profile of concept]
- From databases
- By concept mapping

10 Concept profile creation
- Binary
- Log likelihood
- XIDF
- Uncertainty coefficient

11 Concept profile creation
Profile of gene ESR1:
estrogen receptor 1
breast neoplasm 0.5
BRCA1 0.34
PGR 0.30
Estrogen 0.28
BRCA2 0.25
TP53 0.15
gene suppressor tumor 0.12
genetics polymorphism 0.12
genetic predisposition to disease 0.10
female 0.05

12 Concept profile comparison

13
Concept Name               Weight  RAB27B  MYRIP  MLPH  RAB27A
RAB27A                     52.17   0.61    0.74   0.73  1
MLPH                       11.16   -       0.44   1     0.29
Myosin Type V              7.22    0.04    0.68   0.4   0.22
Melanosomes                6.7     0.12    0.3    0.47  0.27
RAB27B                     4.06    1       0.14   -     0.11
MYRIP                      2.98    0.07    1      0.09  0.06
Melanocytes                2.73    0.13    0.14   0.28  0.17
Myosins                    2.33    0.04    0.38   0.22  0.12
Myosin Heavy Chains        1.72    -       0.46   0.18  0.09
GTP Phosphohydrolases      1.31    0.17    0.23   0.04  0.08
Actins                     1.17    0.05    0.32   0.12  0.06
Exocytosis                 0.87    0.08    0.12   0.08  0.12
Secretory Vesicles         0.68    0.07    0.16   0.06  0.09
Carrier Proteins           0.59    -       0.11   0.17  0.09
Organelles                 0.54    0.11    -      0.12  0.09
rab GTP-Binding Proteins   0.52    0.16    -      0.04  0.12

14 Nucleolus
- main function: ribosome biogenesis
- over 700 proteins identified and classified into 8 main categories

15 Nucleolus – Concept profiles
[Diagram: Protein, MEDLINE articles (from databases), Concept profile of text, Concept profile of protein]

16 Nucleolus – Concept profiles
BLAST (Basic Local Alignment Search Tool)
- Query: nucleolar protein
- Results: homologs in human, mouse, fruitfly, yeast

17 Nucleolus – Concept profiles

18 Nucleolus – fun with protein profiles
- 2D visualization of high-dimensional space
- Automatic functional annotation of proteins
- Finding similar proteins

19 Nucleolus - visualisation
[Figure: Multi-Dimensional Scaling plot; labelled points: SRP, PARN, Exosome comp. 10, O43390, P98179, Q8N220]

20 Nucleolus – Assigning GO terms
[Diagram: GO term, MEDLINE articles (from GO), Concept profile of text, Concept profile of GO term]

21 Nucleolus – Assigning GO terms
AuC: Area under Curve
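The slide does not say which curve the area refers to. Assuming it is the ROC curve, the area can be computed directly as the fraction of (correct, incorrect) annotation pairs in which the correct one receives the higher similarity score. The function name and the example scores below are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive example outscores a negative one;
    ties count as half a win. 1.0 is a perfect ranking, 0.5 is chance.
    """
    pairs = len(positive_scores) * len(negative_scores)
    if pairs == 0:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / pairs


if __name__ == "__main__":
    # Hypothetical similarity scores for correct and incorrect GO terms.
    print(auc([0.9, 0.4], [0.5, 0.1]))
```

The pairwise form is quadratic in the number of scores, which is fine for a sketch; a rank-based computation gives the same value for larger sets.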

22 Nucleolus – Assigning GO terms
‘Mistakes’ in automatic annotation
1. Manual assignment to one category only, e.g. SFRS protein kinase 1 plays a role in splicing, but is also in kinase
2. Assumptions do not always hold
- Sequence homology ≠ function homology
- Concept co-occurrence ≠ functional relationship
3. Homonyms

23 Nucleolus – Finding new proteins
[Diagram: Concept profile of nucleolar protein compared with concept profiles of human proteins]

24 Nucleolus – Finding new proteins
- 60S ribosomal protein L3-like
- Probable ATP-dependent RNA helicase DDX4
- ATP-dependent RNA helicase DDX3Y
- Guanine nucleotide binding protein-like 3
- Importin-11 (importin beta family)
- Putative Brix domain containing protein 1P
- Probable ATP-dependent RNA helicase DDX20 (Gemin 3)
- 60S acidic ribosomal protein P0
- Helicase SKI2W
- ATP-dependent RNA helicase DDX39
- 40S ribosomal protein S20
- Probable ATP-dependent RNA helicase DDX6
- Probable ATP-dependent RNA helicase DDX23
- Double-stranded RNA-binding protein Staufen homolog 1
- ATP-dependent RNA helicase DDX25
- Probable nucleolar complex protein 14
- Eukaryotic initiation factor 4A-II
- ATP-dependent RNA helicase DDX19B
- 40S ribosomal protein S3
Annotations (in slide order): Ribosomal protein; DEAD-box; Found in nucleolus; Associated with nucleolar p.; DEAD-box; Found in nucleolus; DEAD-box; Ribosomal protein; DEAD-box; Indirect evidence; DEAD-box; Nucleolar; DEAD-box; Ribosomal protein

