- From: Sergio Rodriguez <srodriguez142857@y...>
- To: Kurt Cagle <kurt.cagle@g...>
- Date: Tue, 15 Feb 2022 06:45:08 +0000 (UTC)
Hi, Kurt.
Good to see that you are interested in joining our group. More information about it is below:
KNOWLEDGE GRAPH CONSTRUCTION COMMUNITY GROUP
https://www.w3.org/community/kg-construct/
https://github.com/kg-construct
Slack invite (expires in 23.5 hours): https://join.slack.com/t/kg-construct/shared_invite/zt-1373bhh5y-t7ArrKraAPngwiksJrwzYw
:)
On Tuesday, February 15, 2022, 04:23:27 PM GMT+11, Kurt Cagle <kurt.cagle@g...> wrote:
Sergio,
Thanks for the links (and I will definitely be joining the construction group - I'm part of several of the others at this point, with my interest primarily in evangelizing a lot of what's going on in this space to others).
The SPARQL-Anything project looks especially promising and seems to be heading in a direction similar to both my own studies and those I've seen from other developers in the knowledge graph arena. I wrote a (since abandoned) project called Kaleidoscope that was intended to be a knowledge graph editor - one thing that I found in the process is that at some point you have to recognize that you have to get everything down to a primitive layer and not get too committed to a specific ontology.
Another thing I've found that has been especially useful has been the realization that in developing an ontology you need to make a differentiation between an archival graph, one in which predicates are bound to reified bindings in order to map an evolving world, and a secondary now() graph that can then map what are often complex third normal form constructs into simpler, derived relationships. The canonical example is determining who is the CEO of a given company. In most cases, you are likely to start with a record of the form:
    Job:_Job123
        a Class:_Job;
        job:hasCompany Organization:_BigCo;
        job:hasEmployee Person:_JamesTBigg;
        job:hasId "JTB12523"^^Indentifier:_BigCoBadgeIdentifier;
        job:hasStartDate "2021-03-15"^^xsd:date;
        .
However, in a typical now() knowledge graph, this information is reduced down to a single assertion
<<Organization:_BigCo Organization:hasCEO>> :hasBinding Job:_Job123.
within a separate generated graph that links back to the descriptor.
This gets generated on a daily basis any time you have changes in the third normal form construct (usually the expiration of a given object after something obsoletes it). The now() graph reflects the state of the graph at regular intervals for simplicity, while maintaining the archival nature of the data coming in from external data in a temporal graph.
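The split between the archival graph and the now() graph can be illustrated with a minimal Python sketch. It is not the actual pipeline described above: the `role` and `end` fields, the earlier job record `Job:_Job098`, and the function name `build_now_graph` are all hypothetical, added only so the example is self-contained. It assumes the now() graph is regenerated by filtering archival job records down to those valid on a given day.

```python
from datetime import date

# Archival records: nothing is overwritten, expired jobs are kept.
# "role" and "end" are hypothetical fields added for this sketch,
# and Job:_Job098 / Person:_AnnaPrior are invented sample data.
ARCHIVE = [
    {"id": "Job:_Job098", "company": "Organization:_BigCo",
     "employee": "Person:_AnnaPrior", "role": "CEO",
     "start": date(2015, 6, 1), "end": date(2021, 3, 14)},
    {"id": "Job:_Job123", "company": "Organization:_BigCo",
     "employee": "Person:_JamesTBigg", "role": "CEO",
     "start": date(2021, 3, 15), "end": None},
]


def build_now_graph(archive, today):
    """Derive the simplified assertions that hold on `today`.

    Each result links a (company, predicate) pair back to the
    archival job record that justifies it.
    """
    bindings = []
    for job in archive:
        started = job["start"] <= today
        not_ended = job["end"] is None or today <= job["end"]
        if job["role"] == "CEO" and started and not_ended:
            key = (job["company"], "Organization:hasCEO")
            bindings.append((key, ":hasBinding", job["id"]))
    return bindings


if __name__ == "__main__":
    for binding in build_now_graph(ARCHIVE, date(2022, 2, 15)):
        print(binding)
```

Rerunning `build_now_graph` with a different date yields the state of the graph on that day, while `ARCHIVE` itself is never modified.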
Anyway, I have more to digest on the SPARQL Anything work. Thanks for the links, and I'll see you on the lists.
Kurt Cagle
Community/Managing Editor
Data Science Central, A TechTarget Property

Another interesting and fairly recent development toward fast [from-any-source]-to-RDF mapping is "SPARQL Anything". See some resources below, including a talk given yesterday.
GitHub repo:
GitHub - SPARQL-Anything/sparql.anything: SPARQL Anything is a system for Semantic Web re-engineering that allows users to ... query anything with SPARQL.
Slides:
Talk:
W3C Community Group on Knowledge Graph Construction - SPARQL Anything Pr...
On Tuesday, February 15, 2022, 02:07:49 PM GMT+11, Kurt Cagle < kurt.cagle@g...> wrote:
Webb,
I THOUGHT there was some RDF thinking going on there. I recently completed a NIEM conversion project for the FBI, and while I've worked with NIEM before, was surprised at how readily it translated back into RDF.

Kurt Cagle
Community/Managing Editor
Data Science Central, A TechTarget Property

On Mon, Feb 14, 2022 at 2:13 PM Webb Roberts < webb@w...> wrote:

During my time working on NIEM (the National Information Exchange Model), we kept integration of XML and RDF as a core tenet. The goal was to ensure that XML data and schemas in the NIEM ecosystem represented RDF data. There were several major pieces to this:
- We defined a mapping from data that uses NIEM to RDF. Instance documents are RDF datasets. Element and attribute occurrences are RDF properties. Most elements are subject-predicate-object triples. Some elements are RDF quads. Attribute and element values are RDF literals. (see https://niem.github.io/NIEM-NDR/v5.0/niem-ndr.html#section_5.6.3)
- We maintained rules about how XML and XML Schema were used that let us maintain the relationship between XML and RDF.
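As a rough illustration of the general idea that element occurrences become properties and element values become literals, here is a small Python sketch. It does not implement the NIEM NDR rules linked above. The sample document, the `id` attribute convention, the `base` IRI, and the function name `element_to_triples` are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; not taken from any NIEM schema.
XML_DOC = """<Person id="p1">
  <PersonName>Jane Doe</PersonName>
  <PersonBirthDate>1980-01-01</PersonBirthDate>
</Person>"""


def element_to_triples(xml_text, base="http://example.org/id/"):
    """Map one XML element to (subject, predicate, object) tuples.

    The element's id attribute gives the subject, each child element
    name is used as a predicate, and its text becomes a literal.
    """
    root = ET.fromstring(xml_text)
    subject = base + root.attrib["id"]
    triples = [(subject, "rdf:type", root.tag)]
    for child in root:
        literal = (child.text or "").strip()
        triples.append((subject, child.tag, literal))
    return triples


if __name__ == "__main__":
    for triple in element_to_triples(XML_DOC):
        print(triple)
```

A real mapping would also have to deal with namespaces, nested elements, and attributes, which is where rules about how XML Schema is used become important.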
Webb Roberts
On 2022-02-13, at 22:48, Dan Brickley < danbri@d...> wrote:
As has already been pointed out - this might well seem dreamy on XML-DEV but in the RDF world it's pretty much what drew most of us to the technology.
Most RDF toolkits try to make it easy to consolidate information from various sources and formats into its common graph model. They will usually do some subset of the more explicitly RDF-flavoured formats, e.g. RDF/XML, Turtle, RDFa, JSON-LD, N-Triples, TriG, ... etc. But there will also be an API that can be called programmatically, to create triples from anything you have programmatic access to. You'll find XML adaptors of various kinds (XSLT being the most obvious). Back in the day there were angsty debates about schema annotation for mapping to triples. For example see https://www.w3.org/2003/02/schema-annotation
... although those things never turned out to be as important and central as folks thought.
RDF folk spend much of their time moving all kinds of data into RDF graphs/triples. But so much of this grungy data cleaning work is necessarily custom, per-dataset, per-application, ... limiting the value of generic conversion tools.
If you look around at Wikidata you'll see that some of these factual claims are sourced.
So we can look into 15.999 being different to the 16 value in Hans-Jürgen's sketch. The sourcing given is:
Atomic weights of the elements 2013 (IUPAC Technical Report) (English)
Here is an example query that uses the Wikidata SPARQL query service to pull out answers for the chemical element with an atomic number of 8, i.e. oxygen.
    SELECT ?chem ?chemLabel ?atomicNumber ?mass ?electronegativity ?anyprop ?anyval
    WHERE {
      ?chem wdt:P246 ?chemSymbol;
            wdt:P1086 ?atomicNumber;
            wdt:P2067 ?mass;
            wdt:P1108 ?electronegativity;
            ?anyprop ?anyval .
      FILTER(?atomicNumber=8)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
      # Helps get the label in your language, if not, then en language
    }
And back on the original theme about mapping, it is also worth knowing about the CONSTRUCT mechanism in SPARQL.
We can take the above query and write CONSTRUCT queries that emit triples in a different shape or vocabulary.
This is a SPARQL query that takes what's in Wikidata for all Chemicals and emits triples along the lines sketched initially:
    PREFIX foo: <https://foo.example.org/>
    CONSTRUCT {
      ?chem foo:symbol ?chemLabel .
      ?chem foo:numberOfElectrons ?atomicNumber .
      ?chem foo:atomicMass ?mass .
      ?chem foo:electronegativity ?electronegativity .
      ?chem foo:discoTime ?discoTime .
      # other properties from https://www.wikidata.org/wiki/Q629 here
    }
    WHERE {
      ?chem wdt:P246 ?chemSymbol;
            wdt:P1086 ?atomicNumber;
            wdt:P2067 ?mass;
            wdt:P1108 ?electronegativity .
      OPTIONAL { ?chem wdt:P575 ?discoTime . } .
      # commented out to get everything:
      # FILTER(?atomicNumber=8)
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    ORDER BY ?discoTime # this might be pointless for CONSTRUCT queries
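To make the CONSTRUCT mechanism a little more concrete, here is a toy Python sketch of how a triple template is filled in from query solutions. It is not a SPARQL engine. The `construct` function, the abbreviated template, and the sample solution row are illustrative assumptions. The one rule it demonstrates is standard SPARQL behaviour: a template triple that mentions an unbound variable (for example `?discoTime` when the OPTIONAL did not match) is simply left out of the output.

```python
# Triple template, abbreviated from the CONSTRUCT clause above.
TEMPLATE = [
    ("?chem", "foo:symbol", "?chemLabel"),
    ("?chem", "foo:numberOfElectrons", "?atomicNumber"),
    ("?chem", "foo:discoTime", "?discoTime"),
]

# One illustrative solution row; ?discoTime is unbound here.
SOLUTIONS = [
    {"chem": "wd:Q629", "chemLabel": "oxygen", "atomicNumber": "8"},
]


def construct(template, solutions):
    """Instantiate the template once per solution row.

    Variables start with "?". Triples containing a variable that is
    unbound in the row are skipped. A set is used because an RDF
    graph has no duplicate triples and no ordering.
    """
    graph = set()
    for row in solutions:
        for pattern in template:
            triple = tuple(
                row.get(term[1:]) if term.startswith("?") else term
                for term in pattern
            )
            if None not in triple:
                graph.add(triple)
    return graph


if __name__ == "__main__":
    for triple in sorted(construct(TEMPLATE, SOLUTIONS)):
        print(triple)
```

Because the result is an unordered set of triples, the sketch also hints at why an ORDER BY has little visible effect on CONSTRUCT output.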
Hope this helps...
Dan
I am posting this on behalf of Mr. Hans-Jürgen Rennau while we debug a problem with his emails being posted to the list.
---
Assume four datasets: an XML document, a JSON document, a CSV file and an HTML document (authored near the north pole, in the rain forest, in Athens and in the Antarctic, respectively).
Imagine a standard which enables you to define the mapping of a document node to a set of RDF triples.
Remember that all documents (XML, JSON, CSV, HTML) can be parsed into document nodes (for example see [1]).
Assume that the RDF graphs obtained from our documents contain the following triples:
foo:oxygen foo:symbol "O" .
foo:oxygen foo:numberOfElectrons "8" .
foo:oxygen foo:atomicMass "16" .
foo:oxygen foo:electronegativity "3.5" .
each one found in a different one of the four RDF graphs.
Then we have integrated information, as we now know four things about oxygen, contributed by different data sources using a different data format. Of course it would be easy to serialize the integrated information into XML, or JSON, or CSV, or HTML or any other format (employing Inuit or any other natural language).
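The integration step can be sketched in Python using only the standard library. This is not the imagined standard: the four tiny sample documents, the ad hoc per-format mapping functions, and the `data-` attribute convention for the HTML are all assumptions invented for the illustration. What it shows is that once each source is mapped to triples about the same IRI, integration is just a set union.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Four hypothetical sources, each contributing one fact about oxygen.
XML_DOC = '<element name="oxygen"><symbol>O</symbol></element>'
JSON_DOC = '{"name": "oxygen", "numberOfElectrons": "8"}'
CSV_DOC = "name,atomicMass\noxygen,16\n"
HTML_DOC = '<p data-name="oxygen" data-electronegativity="3.5">Oxygen</p>'


def from_xml(text):
    root = ET.fromstring(text)
    subject = "foo:" + root.attrib["name"]
    return {(subject, "foo:" + child.tag, child.text) for child in root}


def from_json(text):
    obj = json.loads(text)
    subject = "foo:" + obj["name"]
    return {(subject, "foo:" + k, v) for k, v in obj.items() if k != "name"}


def from_csv(text):
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = "foo:" + row["name"]
        triples |= {(subject, "foo:" + k, v)
                    for k, v in row.items() if k != "name"}
    return triples


class _DataAttributes(HTMLParser):
    """Collect triples from data-* attributes (names arrive lowercased)."""

    def __init__(self):
        super().__init__()
        self.triples = set()

    def handle_starttag(self, tag, attrs):
        found = dict(attrs)
        if "data-name" not in found:
            return
        subject = "foo:" + found["data-name"]
        for key, value in found.items():
            if key.startswith("data-") and key != "data-name":
                self.triples.add((subject, "foo:" + key[len("data-"):], value))


def from_html(text):
    parser = _DataAttributes()
    parser.feed(text)
    return parser.triples


def integrate():
    """Union the triples from all four sources into one graph."""
    return (from_xml(XML_DOC) | from_json(JSON_DOC)
            | from_csv(CSV_DOC) | from_html(HTML_DOC))


if __name__ == "__main__":
    for triple in sorted(integrate()):
        print(triple)
```

The hard part in practice is the shared identifier `foo:oxygen`: the union only integrates anything because all four mappings agree on it.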
+ + + - - -
But I suppose you think this is an idle dream. Perhaps you think that the imagined standard would not be feasible to create or to use, or you question the practicality of leveraging RDF IRIs for identifying resources and properties in more than a few specific cases.
Unfortunately I agree that it is an idle dream. Only the reason I see is a different one, as I am convinced that the imagined standard is not too difficult to create and to use and I do not question the practicality of using RDF IRIs in many fields, including natural science, pharmacology, health care, finance, many verticals and economic interaction. The reason I see is that it seems impossible to find minds with a deep interest in both XML technology and semantic technology. if - then - but.
With kind regards, Hans-Jürgen Rennau
[1]
--
Chet Ensign
Chief Technical Community Steward
OASIS Open