K-CAP 2007: day 1
Papers and presentations that I found interesting from day 1 of the K-CAP 2007 conference:
Oren Etzioni talked about his TextRunner knowledge-extracting search engine. TextRunner gathers large amounts of knowledge from the web by focusing on semantically tractable sentences, finding the "illuminating phrases" and learning them in a self-supervised manner. It leverages the redundancy of the web: if something is said multiple times, it is more likely to be true.
This is all loaded into an SQL server and can be queried by anyone. If you type a query into the search engine it will return all the structured knowledge it knows about that query. For example: "Oren Etzioni is a professor" and "Oren Etzioni has an arm".
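The idea of open-ended fact queries can be sketched in a few lines. This is only an illustration, not TextRunner's actual schema: I'm assuming facts are stored as (arg1, relation, arg2) triples with a redundancy count, and that higher counts mean more web-wide support.

```python
# Hypothetical sketch of TextRunner-style extraction output (not the
# real schema): each fact is an (arg1, relation, arg2) triple with a
# redundancy count -- assertions repeated more often on the web are
# ranked as more likely to be true.
from collections import namedtuple

Fact = namedtuple("Fact", ["arg1", "relation", "arg2", "count"])

facts = [
    Fact("Oren Etzioni", "is", "a professor", 42),
    Fact("Oren Etzioni", "has", "an arm", 3),
    Fact("Albert Einstein", "was born in", "Ulm", 120),
]

def query(subject):
    """Return all known facts about a subject, most redundant first."""
    return sorted(
        (f for f in facts if f.arg1 == subject),
        key=lambda f: f.count,
        reverse=True,
    )

for f in query("Oren Etzioni"):
    print(f.arg1, f.relation, f.arg2)
```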
Capturing Knowledge About Philosophy
Michele Pasin talked about his PhiloSURFical project to build an ontology of the history of philosophy, with the aim of improving the browsing and searching experience for philosophy students and teachers. His view is that an ontology should not be about truth or beauty, but should instead focus on enabling reuse and sharing. The requirements for the tool were that it should support: uncertainty (e.g. of dates), information objects, interpretation of events, contradictory information and different viewpoints. The ontology itself is based upon CIDOC CRM. It captures events such as when one philosopher interprets another's work, teaches a student, or debates with another scholar. The knowledge base contains 440 classes, 15000 instances, 7000 individual people, 7000 events and 500 axioms related to the philosopher Wittgenstein.
Searching Ontologies based on Content: Experiments in the Biomedical Domain
Harith Alani talked about the need for a good system for finding existing ontologies on the web. Users need to find ontologies that they can reuse and/or use to bootstrap their own efforts. Existing content-based search tools don't work well because, for example, the Foundational Model of Anatomy (FMA) doesn't contain an actual class called "anatomy" anywhere in it. So, a search for "anatomy" would not result in this ontology being returned.
The research involved interviewing a number of experts to establish a gold standard. The experts were asked to list the good ontologies on certain topics (anatomy, physiological processes, pathology, histology). However, even the experts only agreed on 24% of answers.
The researchers' new ontology search tool uses Wikipedia to expand the queried concepts (future work involves also using UMLS and WordNet to expand the query). The result was that Swoogle achieved an f-measure of 27%, while the expanded-term search's f-measure was 58%. The conclusion is that more ontology metadata is necessary.
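For readers unfamiliar with the metric: the f-measure is the standard harmonic mean of precision and recall. The quick sketch below shows the formula; the precision/recall inputs are made-up illustrations, as only the 27% and 58% f-measures come from the talk.

```python
# Standard F1 measure: harmonic mean of precision and recall.
# The inputs below are illustrative only -- the talk reported just
# the final f-measures (27% for Swoogle, 58% for the expanded search).
def f_measure(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.5, 0.5), 2))  # balanced case -> 0.5
```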
Capturing and Answering Questions Posed to a Knowledge-based System
Peter Clark from Vulcan, Inc. talked about their Halo project. The project aims to build a knowledge system (using the AURA knowledge authoring toolset) that can pass high-school level exams in physics, biology and chemistry. The system should be able to answer a free-text question such as: "a boulder is dropped off a cliff on a planet with 7 G gravity and takes 23 seconds until it hits the bottom. Disregarding air resistance, how high was the cliff?"
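For the curious, the example question works out with the constant-acceleration free-fall formula h = ½gt². This is just my own back-of-the-envelope check, assuming "7 G" means seven times Earth's 9.8 m/s²:

```python
# Free fall from rest: h = 1/2 * g * t^2.
# Assumption: "7 G gravity" = 7 x 9.8 m/s^2 (Earth-relative).
g = 7 * 9.8           # acceleration, m/s^2
t = 23.0              # fall time, s
height = 0.5 * g * t ** 2
print(round(height, 1))  # -> 18144.7, roughly an 18 km cliff
```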
The system enforces a restricted, simplified version of English in which humans express the questions (based upon Boeing's Simplified English for aircraft manuals, modified for the specific domains). The language is both human-usable and machine-understandable.
Common-sense assumptions need to be made explicit for the system. For example, in the question above it must be specified that the drop is straight downwards and not arced. The humans posing questions to the system therefore went through the following cycle: read the original question text; re-write it in controlled English; check it with the system and note any re-writing tips; allow the system to turn the text into logic; check the system's paraphrase of its understanding; press the answer button and evaluate the system's attempted answer; retry as necessary.
38% of biology questions were answered correctly with 1.5 re-tries per question (1-5 range).
37.5% of chemistry questions were answered correctly with 6.6 re-tries per question (1-19 range).
19% of physics questions were answered correctly with 6.3 re-tries per question (1-18 range).
The researchers considered this a huge achievement! The system exploits the sweet spot between logic and language to do something no system before it could come close to. There was no single weak point that caused the system to give wrong answers: bad interpretation, bad query formation and missing knowledge all contributed equally to incorrect answers.