Skip to contents

You get reading API access to Wikidata (GET) without a password. But if you want API access to a privately managed Wikibase instance, you need to get a “bot” username and password and authenticate your scripted session.

Security first

CSRF is a way for a malicious website to exploit your logged-in session on another website to perform actions as you without your consent. The MediaWiki API (the API of Wikibase instances) employs CSRF tokens (also often called “edit tokens” in the MediaWiki context) as a crucial defense mechanism against Cross-Site Request Forgery attacks.

To receive such a CSRF security token (connected to your current editing session on a Wikibase instance) you have to run first get_csrf() to establish your session and your credentials. As a result, you will receive a csrf object which contains among other data, your CSRF token. Then, now running the get_csrf_token() function will unwrap from returned CSRF object the token itself, which is often, but not necessarily, a character string of length 42.

The MediaWiki API’s CSRF tokens are not designed to be long-lived. Their lifespan is intentionally kept short, typically tied to the user’s current editing session or a reasonable timeframe for a single action. This is a crucial security measure, and therefore you may need to log in several times while you are working with wbdataset.

A convenient and elegant solution to keep your login credentials R is the use of the keyring package.

require(keyring)
#> Loading required package: keyring
#> Warning: package 'keyring' was built under R version 4.4.2

If keyring is not installed in your computer, you can do it like this:

install.packages("keyring")

To get started, read the keyring package website. At first time use, you set up your credentials with key_set_with_value.

Use wbdataset on a Wikibase instance

You can of course use different secure ways to authenticate yourself, what is important that you will have to regularly give your username and password to get_csrf(). You should this in a way that you absolutely not at risk of revealing your bot password on the internet. With such a password a malicious actor can destroy an entire open knowledge graph.

library(keyring)
# https://reprexbase.eu/example/api.php does not exist in reality

key_set_with_value(
  service = "https://reprexbase.eu/example/api.php",
  username = "Demo@adminbot",
  password = "8******************************b"
)

You must never store those username-password pairs in your code to prevent it from accidentally getting to GitHub and it will be permanently revealed to everybody. (There is no delete button on GitHub!)

key_get(
  service = "https://reprexbase.eu/example/api.php",
  username = "Demo2@adminbot"
)

To gain access to a MediaWiki API, you need to get a CSRF token, which is a secure random token (e.g., synchronizer token or challenge token) that is used to prevent CSRF attacks.

my_csrf <- get_csrf(
  username = "Demo@adminbot",
  password = key_get(
    service = "https://reprexbase.eu/example/api.php",
    username = "Demo@adminbot"
  ),
  wikibase_api_url = "https://reprexbase.eu/example/api.php"
)

If all goes well, you receive similar messages on your terminal (the API instance as well as the login token are fictional in this example):

Received a `handle`: https://reprexbase.eu/example/api.php
response: Establish session with https://reprexbase.eu/example/api.php: 200
Session: OK(200)
response: login to https://reprexbase.eu/example/api.php: 200
Login: OK(200)
Login token: 7c7af4d52x***********
Post login data to https://reprexbase.eu/example/api.php

To interact with the Wikibase Instance, you will have to provide the API address of the instance and this CSRF object. In fact, under the hood, from the CSRF object get_csrf_token(csrf) function will unwrap your token itself from the CSRF security object holds the data of your session (not only the token itself.)

Now you are ready to send API requests to the MediaWiki API.

get_claims(
  qid = "Q528626",
  property = "P625",
  wikibase_api_url = "wikibase_api_url",
  csrf = my_csrf
)