
Unlocking the power and adaptability of data.world through its API

In partnership with data.world, a leader in the data catalog market, BRF Consulting wrote this article to explain and share the advantages of the API built into data.world.


To get started, let's first clarify what a data catalog is. A data catalog serves as a centralized repository where an organization's data, and the information describing it, can be organized. Building on this data, data.world helps process it into information with business value.


The data.world API supports a wide range of operations from code, using only an authentication token and your username. To use the API, you need an access token, which is available in your account settings under the service accounts tab, as shown in the screenshot below:


Data.World API

After this initial configuration, just follow the step-by-step guide to create a new token, and then we can start working with the API itself.
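Every request in this article reads the username and token from the DW_USERNAME and DW_API_TOKEN environment variables, so it can help to check that both are set before sending anything. The following is a minimal bash sketch; the function name dw_require_env is hypothetical and not part of any data.world tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the two variables used by the curl
# examples in this article are set and non-empty.
dw_require_env() {
  local name missing=0
  for name in DW_USERNAME DW_API_TOKEN; do
    # ${!name} expands the variable whose name is stored in $name.
    if [ -z "${!name:-}" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# One way to set the token without echoing it or saving it in the
# shell history (run interactively):
#   read -r -s -p "data.world API token: " DW_API_TOKEN && export DW_API_TOKEN
```

Calling `dw_require_env || exit 1` at the top of a script stops it before any request is sent with missing credentials.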


Creating datasets


As with any API, we can start workflows and control them line by line in code. However, the data.world API stands out in its design and documentation: it offers ready-made search methods by data type, by ID, and by several other parameters, and there is a Python library that simplifies connecting to and using the API.


The documentation stands out for the breadth of what it covers, which meets most of a developer's needs. Most developers have run into an API whose documentation was so confusing that interpreting it cost them a lot of time; that is not the case here. The quickstart guide is available at https://developer.data.world/docs/dwapi-spec-stoplight/cd069e3c714ee-quickstart-guide.


The guide includes a quick-start tutorial, which we will use to demonstrate how powerful this API can be when used correctly. To see how a request to data.world works, let's break down the request below, which creates a dataset:

curl --request POST \
  --url "https://api.data.world/v0/datasets/${DW_USERNAME}" \
  --header "Authorization: Bearer ${DW_API_TOKEN}" \
  --header "Content-type: application/json" \
  --data '{"title":"API Sandbox","visibility":"PRIVATE"}'

Using curl, a command-line tool for transferring data over URLs that ships with most Unix-like systems, we first set the HTTP method (POST) and the URL, which includes the account's username as a path parameter. We then authenticate with a Bearer token in the Authorization header, declare the type of the body with the Content-type header, and pass the body itself with --data.


Note that we declared the body as JavaScript Object Notation (JSON), so the data we pass must actually be JSON, that is, key-value pairs such as {"key": "value"}. Here, the body sets the dataset's title and its visibility.
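Because the body in the command above is wrapped in single quotes, the shell does not expand variables inside it. If the title should come from a variable, one option is to build the JSON with printf, as in this bash sketch. The function names are hypothetical, and the sketch assumes the title contains no double quotes or backslashes, since printf does not escape them for JSON.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON body for a new dataset.
# Assumption: title and visibility contain no characters that need
# JSON escaping (double quotes, backslashes, control characters).
dataset_body() {
  local title="$1" visibility="${2:-PRIVATE}"
  printf '{"title":"%s","visibility":"%s"}' "$title" "$visibility"
}

# Hypothetical wrapper around the request shown above.
# --fail makes curl exit non-zero on HTTP errors (4xx/5xx).
create_dataset() {
  curl --fail --silent --show-error --request POST \
    --url "https://api.data.world/v0/datasets/${DW_USERNAME}" \
    --header "Authorization: Bearer ${DW_API_TOKEN}" \
    --header "Content-type: application/json" \
    --data "$(dataset_body "$1" "${2:-PRIVATE}")"
}
```

With this in place, `create_dataset "API Sandbox"` sends the same body as the original command.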


This command creates a new dataset in the account authenticated by the request. From there, we can add, delete, and modify files using further requests to the API.


Like every request, this one returns a response: an error or warning message if something went wrong or, if all goes well, a success message like the one below, stating what was done and where the new dataset is located.

{
  "message": "Dataset has been successfully created.",
  "uri": "https://data.world/[USERNAME]/api-sandbox"
}

In addition to the success message, we can see that the request also returns a URI containing the location of what was just created.
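In a script, the uri field is the most useful part of this response. jq is the usual tool for reading JSON from the shell (`jq -r .uri` prints the field); if it is not installed, a sed expression can extract the field from a response shaped like the one above. This is a fragile bash sketch that assumes the "uri" key and its value sit on one line, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a JSON response on stdin and print the value
# of its "uri" field. Assumes the key and value are on a single line.
extract_uri() {
  sed -n 's/.*"uri": *"\([^"]*\)".*/\1/p'
}

# The sample response from the article.
response='{
  "message": "Dataset has been successfully created.",
  "uri": "https://data.world/[USERNAME]/api-sandbox"
}'

uri=$(printf '%s\n' "$response" | extract_uri)
# The dataset ID is the last path segment of the URI.
dataset_id=${uri##*/}
echo "$uri"
echo "$dataset_id"
```

The dataset ID extracted this way (api-sandbox) is the value the upload and SQL URLs below expect after the username.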


File Upload


Besides creating and deleting datasets, we can also upload files to the dataset we just created. The token authenticates the request, while the username and dataset ID (api-sandbox) in the URL indicate which dataset receives the files.

curl \
  --header "Authorization: Bearer ${DW_API_TOKEN}" \
  -F "file=@file1.csv" \
  -F "file=@file2.csv" \
  --url https://api.data.world/v0/uploads/${DW_USERNAME}/api-sandbox/files

Keep in mind that with -F "file=@file1.csv", curl resolves the file path relative to the current working directory, so either run the command from the directory that contains the files or give their full paths.


This request uploads the files to the specified dataset, where their data can be accessed later.
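Since a missing file only surfaces as a curl error, a script can check the files before building the -F arguments. This bash sketch collects one -F pair per file and stops if any file is missing; the function names and the UPLOAD_ARGS array are hypothetical, and the dataset ID is passed as an argument instead of being hard-coded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array UPLOAD_ARGS with one
# "-F file=@path" pair per file, failing if any file does not exist.
collect_upload_args() {
  UPLOAD_ARGS=()
  local f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
    UPLOAD_ARGS+=(-F "file=@${f}")
  done
}

# Hypothetical wrapper: upload the checked files to a dataset owned by
# $DW_USERNAME, using the same endpoint as the command above.
upload_files() {
  local dataset="$1"
  shift
  collect_upload_args "$@" || return 1
  curl --fail --silent --show-error \
    --header "Authorization: Bearer ${DW_API_TOKEN}" \
    "${UPLOAD_ARGS[@]}" \
    --url "https://api.data.world/v0/uploads/${DW_USERNAME}/${dataset}/files"
}
```

For example, `upload_files api-sandbox file1.csv file2.csv` sends the same request as the original command, but only after both files have been found.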


SQL queries


Finally, to show the scope of the data.world API, here is an example of a SQL query against the api-sandbox dataset, sent as a single request. Yes, one request to an external API is enough to query our data with SQL, over an authenticated HTTPS connection.

curl --request POST \
  --url "https://api.data.world/v0/sql/${DW_USERNAME}/api-sandbox" \
  --header "Authorization: Bearer ${DW_API_TOKEN}" \
  --header "Accept: text/csv" \
  --data-urlencode 'query=SELECT name, line FROM nyc_subways WHERE line LIKE "%7 Express%"'

The structure of this command should look familiar: the authentication header is the same as in the first request of this article. What changes is the URL, which now points to the SQL endpoint, the Accept header, which asks for the results as CSV, and the data sent. The --data-urlencode flag URL-encodes the query and sends it as the query form field, so we can write the SQL query we need and retrieve the results in the format we want.
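Long queries quickly become hard to quote inside a shell command. curl's --data-urlencode also accepts the form name@filename, which reads the content of a file, URL-encodes it, and sends it as name=..., so the SQL can live in its own file. A bash sketch, assuming the same api-sandbox dataset and nyc_subways table as the example above; the file name query.sql and the function name are hypothetical.

```shell
#!/usr/bin/env bash
# Write the query from the example above to a file; the quoted 'SQL'
# delimiter stops the shell from expanding anything inside it.
cat > query.sql <<'SQL'
SELECT name, line
FROM nyc_subways
WHERE line LIKE "%7 Express%"
SQL

# Hypothetical helper: run the SQL in a file against a dataset owned by
# $DW_USERNAME. "query@file" tells curl to read and URL-encode the file
# and send it as the "query" form field.
run_query_file() {
  local dataset="$1" sql_file="$2"
  curl --fail --silent --show-error --request POST \
    --url "https://api.data.world/v0/sql/${DW_USERNAME}/${dataset}" \
    --header "Authorization: Bearer ${DW_API_TOKEN}" \
    --header "Accept: text/csv" \
    --data-urlencode "query@${sql_file}"
}
```

Running `run_query_file api-sandbox query.sql > results.csv` would save the CSV results to a local file.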


This adaptability makes the data.world API versatile. Developers usually depend on the methods an API exposes in order to search its data; because data.world accepts SQL queries directly, developers can instead build their own methods tailored to their needs.
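As an illustration of such a custom method, this bash sketch wraps the earlier subway query in a hypothetical function that takes the search term as an argument. Because the term is pasted into the SQL text, the sketch accepts only letters, digits, and spaces, so a stray quote cannot break or alter the query.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the subway query for an arbitrary search term.
# Only letters, digits and spaces are accepted, because the term is
# inserted directly into the SQL string.
build_line_query() {
  local term="$1" allowed='^[[:alnum:] ]+$'
  if [[ ! $term =~ $allowed ]]; then
    echo "unsupported search term: $term" >&2
    return 1
  fi
  # %% prints a literal percent sign, the SQL LIKE wildcard.
  printf 'SELECT name, line FROM nyc_subways WHERE line LIKE "%%%s%%"' "$term"
}
```

The result can replace the hard-coded query in the earlier command, for example `--data-urlencode "query=$(build_line_query '7 Express')"`.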


Conclusion


Here we have only given a glimpse of the power of the data.world API. In addition to its ready-made methods, client libraries, and robust, easy-to-understand documentation, the API brings agility to your processes and keeps even SQL queries authenticated and secure. This makes it one of the best APIs our developers have ever used.


About the Author


Gabriel Matias is a Data Engineer on the BRF Consulting team, an official partner of data.world.

