It is a good idea to start with a fresh R session and an empty environment. You will work with two sessions at once, one in R and one authenticated on a Wikibase server, and leftover variables from a previous session, such as passwords or tokens, can cause confusion.
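A minimal sketch of such a check, in base R only. The helper name find_stale_credentials() and the name pattern are assumptions made for illustration; they are not part of wbdataset.

```r
# Hypothetical helper: list objects whose names suggest leftover credentials.
find_stale_credentials <- function(env = globalenv(),
                                   pattern = "csrf|token|password") {
  object_names <- ls(envir = env)
  object_names[grepl(pattern, object_names, ignore.case = TRUE)]
}

# Demonstration in a throwaway environment, so your own session is untouched.
demo_env <- new.env()
assign("my_csrf", "old-session", envir = demo_env)
assign("paris_item", "Q9", envir = demo_env)

stale <- find_stale_credentials(demo_env)

# Remove only the flagged objects from that environment.
rm(list = stale, envir = demo_env)
```

In an interactive session you would call find_stale_credentials() without arguments and decide yourself which of the listed objects to remove.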
library(wbdataset)
# We assume that you store the bot password for your Wikibase instance
# in the system keyring.
library(keyring)
#> Warning: package 'keyring' was built under R version 4.4.2
my_csrf <- get_csrf(username="Reprexba jekyll@AntaldanielBot",
password = key_get(service = "https://reprexbase.eu/jekyll/api.php",
username = "Reprexba jekyll@AntaldanielBot"),
wikibase_api_url = "https://reprexbase.eu/jekyll/api.php")
#> Received a `handle`: https://reprexbase.eu/jekyll/api.php
#> response: Establish session with https://reprexbase.eu/jekyll/api.php: 200
#> Session: OK(200)
#> response: login to https://reprexbase.eu/jekyll/api.php: 200
#> Login: OK(200)
#> Login token: 26600cc3b9***********
#> Post login data to https://reprexbase.eu/jekyll/api.php
Define a simple graph
A knowledge graph is a network of knowledge statements, where nodes represent things (like people, places, or objects) and edges (the connections between nodes) represent the relationships between those things. In this picture, properties are like the labels on the roads (edges) that explain how the nodes are connected.
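The node-and-edge picture can be sketched in base R as a table of statements. The toy data below is made up for illustration and is not read from any Wikibase instance.

```r
# A toy graph as a table of statements:
# subject (node), property (edge label), object (node).
toy_graph <- data.frame(
  subject  = c("Paris", "Paris", "France"),
  property = c("capital of", "instance of", "instance of"),
  object   = c("France", "city", "country"),
  stringsAsFactors = FALSE
)

# Nodes are everything that appears as a subject or as an object.
nodes <- unique(c(toy_graph$subject, toy_graph$object))

# Edges leaving one node: all statements where it is the subject.
paris_edges <- toy_graph[toy_graph$subject == "Paris", ]
```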
In Wikibase, we work with 3 knowledge graph concepts:
In the extended model, we can add further concepts:
The wbdataset package currently offers two pairs of functions that add nodes and edges to the graph; a third pair is planned.
- create_item() and create_property() create them from scratch;
- copy_wikidata_item() and copy_wikidata_property() create them by equivalence from Wikidata.
- The planned copy_wikibase_item() and copy_wikibase_property() will create them by equivalence between two password-protected Wikibase instances.
For frequently used classes, and for all properties, creation by equivalence is recommended, particularly if you are not experienced with graph data modelling. Both create_item() and create_property() offer the option to create an item (node) or a property (edge) by equivalence with a well-defined vocabulary, like CIDOC-CRM or FOAF.
The statements or claims are created with the add_statement() function family.
Create items
There are two ways to create items:
# This code will not run as expected if the instance at
# https://reprexbase.eu/jekyll/api.php already has an item labelled "Paris".
# copy_wikidata_item() does not overwrite existing items, see explanations below
# the code snippet.
paris_item <- copy_wikidata_item(
qid_on_wikidata = "Q90", # check https://www.wikidata.org/wiki/Q90
qid_equivalence_property = NULL, # we will get back to this later
language = c("en", "fr", "nl"), # select language with 2-letter codes
wikibase_api_url = "https://reprexbase.eu/jekyll/api.php", # Wikibase address
data_curator = person("Jane", "Doe"), # Your name should go in here !
log_path = tempdir(), # You can add a location where you save log file
csrf = my_csrf # your session credentials on the Wikibase instance
)
You get a message with the default label (currently set for English):
#> Default label for Q90: Paris
If your credentials were validated on the Wikibase instance, and it does not already have an item with the label Paris, you get this message:
#> Successfully created item Q9 (Paris)
You can print out your results:
print(paris_item)
If, however, there is already such a node, then you will get an error message similar to the one below; in such a labelling conflict, the original item is not overwritten:
wikibase-validator-label-with-description-conflictParisen[[Item:Q9|Q9]]Item <a href="/jekyll/index.php?title=Item:Q9" title="Item:Q9">Q9</a> already has label "Paris" associated with language code en, using the same description text.
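If the conflict surfaces as an R error, it can be caught so that a longer script keeps running. Whether the wbdataset functions signal an error condition or only print a message is an assumption here; the sketch uses a stand-in function, not a real wbdataset call.

```r
# Stand-in that fails the way a label conflict might; NOT a wbdataset function.
create_with_conflict <- function() {
  stop("wikibase-validator-label-with-description-conflict")
}

attempt <- tryCatch(
  create_with_conflict(),
  error = function(e) {
    is_conflict <- grepl("label-with-description-conflict",
                         conditionMessage(e), fixed = TRUE)
    if (is_conflict) {
      message("Item already exists; nothing was overwritten.")
      return(NULL)
    }
    stop(e)  # re-signal anything that is not a label conflict
  }
)
```

Here attempt is NULL after the conflict, which the rest of a script can test with is.null().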
The equivalence property (qid_equivalence_property in the call above), as you will see later, can help you connect your new node to a node on Wikidata. This connection later enables federation, data validation, and keeping your data refreshable.
If your Wikibase instance has defined a property where you can store this equivalence, say, P35 = “Equivalent item on Wikidata”, then we can add this crucial information to your newly created item:
- yourwikibase:Q9 [the new item for Paris] yourwikibase:P35 (equals) wikidata:Q90
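The equivalence statement above can be written down as a subject-property-object triple. The helper equivalence_statement() is hypothetical and only builds strings; it does not write anything to a Wikibase instance.

```r
# Hypothetical helper: build the equivalence statement as a named triple.
equivalence_statement <- function(local_qid, equivalence_pid, wikidata_qid) {
  c(
    subject  = paste0("yourwikibase:", local_qid),
    property = paste0("yourwikibase:", equivalence_pid),
    object   = paste0("wikidata:", wikidata_qid)
  )
}

paris_equivalence <- equivalence_statement("Q9", "P35", "Q90")

# The Wikidata side of the statement resolves to a public page.
wikidata_url <- paste0("https://www.wikidata.org/wiki/", "Q90")
```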
Next, we show you how to create an item (a node in your graph), if you do not find, or do not want to use an item from Wikidata.
# This code will not run as expected if the instance at
# https://reprexbase.eu/jekyll/api.php already has an item labelled
# "biological object", because create_item() does not overwrite existing items.
e20 <- create_item(
label = "biological object",
description = "This class comprises individual items of a material nature, which live, have lived or are natural products of or from living organisms.",
language = "en",
equivalence_property = "P37", # The creation of this is in the next section
equivalence_id = "E20",
wikibase_api_url = "https://reprexbase.eu/jekyll/api.php",
data_curator = person("Joe", "Doe"), # replace with your name
log_path = tempdir(),
csrf = my_csrf
)
# You can print this out with:
print(e20)
You will get a message on the screen and a log file that records the newly created item. The contents of the log file are also returned, so you can save them; in this case, they are saved in e20. If the action was successful, which you can check with e20$success, e20$id_on_target contains the new QID of this node on the target Wikibase. Of course, you can give the e20 object a more meaningful name, depending on the item (node) you create.
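A minimal sketch of that check on a mock return value. Only the column names success and id_on_target come from the text above; the data.frame shape, the label column, the values, and the helper new_id_or_stop() are assumptions for illustration.

```r
# Mock return value; real logs may contain more columns.
mock_log <- data.frame(
  label = "biological object",
  id_on_target = "Q10",
  success = TRUE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the new identifier, or stop if the action failed.
new_id_or_stop <- function(log) {
  if (!isTRUE(all(log$success))) {
    stop("The item was not created; inspect the returned log.")
  }
  log$id_on_target
}

new_qid <- new_id_or_stop(mock_log)
```

Failing early like this avoids passing a missing identifier on to later statements.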
# You can copy several items at once by passing a vector of QIDs.
capital_cities <- copy_wikidata_item(
qid_on_wikidata = c("Q2807", "Q3114", "Q1490"),
qid_equivalence_property = "P35", # use your Wikibase instance's number
language = c("en", "nl", "hu", "es" ),
wikibase_api_url = "https://reprexbase.eu/jekyll/api.php",
data_curator = person("Joe", "Doe"),
log_path = tempdir(),
csrf = my_csrf
)
capital_cities
Create properties
Properties connect new information to your items (nodes). Such properties need to be copied or defined:
# This code will not run as expected if the instance at
# https://reprexbase.eu/jekyll/api.php already has a property that is
# equivalent to https://www.wikidata.org/wiki/Property:P21, because
# copy_wikidata_property() does not overwrite existing properties.
p21 <- copy_wikidata_property(
pid_on_wikidata = "P21",
pid_equivalence_property = "P2", # You need to change this to your numbering
language = c("en", "hu"),
wikibase_api_url = "https://reprexbase.eu/jekyll/api.php",
data_curator = person("Jane", "Doe"), # Replace it with your name
csrf = my_csrf
)
You have copied the property identified by P21 on Wikidata. In your system, it received the identifier stored in p21$id_on_target. This information is shown in a message on the screen, returned to your environment (here, saved to the data.frame p21), and written as a .csv file to your log directory (log_path).
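The .csv logs can be read back with base R. The sketch below writes a mock log first so that it runs on its own; the directory name, file name, and columns are illustrative assumptions, not the naming that wbdataset actually uses.

```r
# Use a dedicated demo folder so that unrelated files are not picked up.
log_dir <- file.path(tempdir(), "wikibase_logs_demo")
dir.create(log_dir, showWarnings = FALSE)

# Mock log file standing in for a file written by the package.
write.csv(
  data.frame(id_on_target = "P3", success = TRUE),
  file.path(log_dir, "mock_log.csv"),
  row.names = FALSE
)

# Read every .csv file in the folder and stack the rows.
# rbind() requires that all files share the same columns.
log_files <- list.files(log_dir, pattern = "\\.csv$", full.names = TRUE)
all_logs <- do.call(
  rbind,
  lapply(log_files, read.csv, stringsAsFactors = FALSE)
)
```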
Alternatively, just like with items, you can create a property manually. Do you still remember the equivalence_property that keeps your items aligned with Wikidata? Such an equivalence can be defined with Wikidata, but also with any well-defined conceptual model. This is how you define such an equivalence property with the Records in Context conceptual model and ontology:
# This code will not run as expected if the instance at
# https://reprexbase.eu/jekyll/api.php already has a property labelled
# "RiC equivalent property", because create_property() does not overwrite
# existing properties.
ric_equivalence_pid <- create_property(
label = "RiC equivalent property",
description = "The equivalent property in Records in Context, an ontology used in archives.",
language = "en",
datatype = "external-id",
wikibase_api_url = "https://reprexbase.eu/jekyll/api.php",
log_path = tempdir(),
data_curator = person(
given = c("Joe", "M"), family = "Doe", # person() discourages the 'middle' argument
email = "joe@example.com", role = "ctb"), # add your data here with person()
csrf = my_csrf
)
ric_equivalence_pid$id_on_target stores the new property identifier in your system. The complete log can also be found as a .csv file in your log directory (log_path).
#> Successfully created item P44 (RiC equivalent property)
You will get a message on the screen and a log file that records the newly created property. The contents of the log file are also returned, so you can save them; in this case, they are saved in ric_equivalence_pid.
# This code will not run as expected if the instance at
# https://reprexbase.eu/jekyll/api.php already has a property labelled
# "has current or former member (is current or former member of)".
p107 <- create_property(
label = "has current or former member (is current or former member of)",
description = "This property relates an E39 Actor to the E74 Group of which that E39 Actor is a member.",
language = "en",
equivalence_property = "P37", # a property connecting your PIDs to CIDOC
equivalence_id = "P107", # the number of the property definition in CIDOC
datatype = "external-id",
wikibase_api_url = "https://reprexbase.eu/jekyll/api.php",
log_path = tempdir(),
csrf = my_csrf
)
#> Successfully created item P47 (has current or former member (is current or former member of))
For repeated calls, you can place the repeating arguments into a list and pass them together as wikibase_session = my_wikibase_session.
my_wikibase_session <- list(
language = c("en", "nl"),
wikibase_api_url = "https://reprexbase.eu/jekyll/api.php",
log_path = tempdir(),
csrf = my_csrf
)
p107 <- create_property(
label = "has current or former member (is current or former member of)",
description = "This property relates an E39 Actor to the E74 Group of which that E39 Actor is a member.",
equivalence_property = "P37", # a property connecting your PIDs to CIDOC
equivalence_id = "P107", # the number of the property definition in CIDOC
datatype = "external-id",
wikibase_session = my_wikibase_session
)
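How wbdataset resolves wikibase_session internally is not shown here. As a general base R pattern, shared arguments kept in a list can be merged with call-specific ones using modifyList() and do.call(). The functions describe_call() and with_shared() below are hypothetical stand-ins, not wbdataset functions.

```r
# Stand-in for any function with repeated arguments.
describe_call <- function(label, language, wikibase_api_url) {
  paste0(label, " [", paste(language, collapse = ","), "] -> ", wikibase_api_url)
}

shared_args <- list(
  language = c("en", "nl"),
  wikibase_api_url = "https://reprexbase.eu/jekyll/api.php"
)

# Merge the shared arguments with the call-specific ones, then call the function.
# Call-specific arguments win when a name appears in both lists.
with_shared <- function(fun, shared, ...) {
  do.call(fun, modifyList(shared, list(...)))
}

result <- with_shared(describe_call, shared_args, label = "biological object")
override <- with_shared(describe_call, shared_args, label = "x", language = "hu")
```

Keeping the shared values in one list means that a changed API URL or a renewed token only has to be updated in one place.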
You can also copy many properties at once, but only with the same language, data_curator, and wikibase_api_url parameters:
spotify_id_properties <- copy_wikidata_property(
pid_on_wikidata = c("P2207", "P2205", "P8704"),
pid_equivalence_property = "P2", # use the P number on your Wikibase instance
language = c("en", "nl", "hu" ),
wikibase_api_url = "https://reprexbase.eu/jekyll/api.php",
data_curator = person("Jane", "Doe"),
log_path = tempdir(),
csrf=my_csrf
)
spotify_id_properties