
Ingesting data from an authenticated REST API using Job Secrets

dakodakov edited this page Aug 2, 2023 · 12 revisions

Overview

In this tutorial you are going to learn how to use Secrets in a data job.

Scenario

Suppose you like to read the news daily and are a huge Taylor Swift fan. Let's combine these passions into a single data job that searches for Taylor Swift news and stores the articles in a database.

As the source of our data, we are going to use the free, API-key-protected News API from newsapi.org.

Who is this article for?

Users who want to learn how to use Secrets in a data job. Before starting this tutorial, you should be familiar with the basic concepts explained in Hello World Data Job and Ingesting data from REST API into Database.

Estimated Time Commitment

If you have all the prerequisites in place, this tutorial should take 10 to 15 minutes to complete.

Prerequisites

Since Job Secrets are stored securely, you'll need a pre-configured installation of the VDK Control Service and HashiCorp Vault:

  1. A VDK Control Service installation and a local VDK SDK installation configured to use it (see Install VDK Control Service with custom SDK)
  2. A configured integration between the VDK Control Service and HashiCorp Vault (see Configuring Hashicorp Vault Instance for storing Secrets)

Storing Secrets

In the first part of the tutorial, you are going to create a data job and store a Secret for it.

Create Data Job

Create a data job by executing the following command:

vdk create -n taylor-swift-news -t my-team

This creates a taylor-swift-news directory with some sample data job files inside. Delete the files so that only the empty directory remains.
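If you prefer to clean up from a script instead of by hand, here is a minimal sketch that uses only the Python standard library. The helper name empty_job_directory is hypothetical and is not part of VDK; it deletes everything inside the given directory but keeps the directory itself.

```python
import shutil
from pathlib import Path


def empty_job_directory(job_dir: Path) -> None:
    """Delete everything inside job_dir but keep the directory itself.

    Hypothetical helper for this tutorial; it is not part of VDK.
    """
    for entry in job_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove sub-directories recursively
        else:
            entry.unlink()  # remove regular files and symlinks
```

For example, run empty_job_directory(Path("taylor-swift-news")) from the directory where you executed vdk create.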

Obtain an API Key

Go to newsapi.org and click the "Get API Key" button. Fill in the form and copy the API key.
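The key is sent to News API as the apiKey query parameter of the https://newsapi.org/v2/everything endpoint, the same way the data job later in this tutorial does it. The standard-library sketch below builds essentially the URL that requests.get() produces from those parameters, which shows why the key must be treated as a secret: anyone who sees the full URL can use your quota. The helper build_everything_url is hypothetical.

```python
from datetime import date
from urllib.parse import urlencode

EVERYTHING_URL = "https://newsapi.org/v2/everything"


def build_everything_url(api_key: str, query: str, since: date) -> str:
    """Build the request URL for the /v2/everything endpoint (hypothetical helper)."""
    params = {
        "q": query,
        "from": since.strftime("%Y-%m-%d"),
        "sortBy": "popularity",
        "language": "en",
        "apiKey": api_key,  # the secret ends up in plain text in the URL
    }
    # urlencode keeps the dict order and encodes spaces as "+".
    return f"{EVERYTHING_URL}?{urlencode(params)}"
```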

Store the API key in a Job Secret

You can use the "vdk secrets" command to store and retrieve secrets from the command line. If you are using the vdk CLI on a private, secure console, you can set a secret directly with the following command:

vdk secrets -n taylor-swift-news -t my-team --set "apiKey" "<your API Key goes here>"

Alternatively, you can pass only the key of your secret to the command. You'll then be prompted to enter the value, so it won't be kept in your console's history:

vdk secrets -n taylor-swift-news -t my-team --set "apiKey"
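If you script your environment setup, the "vdk secrets" command described above can be wrapped in Python. This is a minimal sketch that assumes the vdk CLI is on your PATH and uses only the -n, -t and --set options from the commands above; build_set_secret_command and set_secret are hypothetical helpers. Leaving the value out keeps the prompting behaviour of the CLI.

```python
import subprocess
from typing import List, Optional


def build_set_secret_command(
    job_name: str, team: str, key: str, value: Optional[str] = None
) -> List[str]:
    """Return the argv for "vdk secrets -n JOB -t TEAM --set KEY [VALUE]".

    When value is None it is omitted, so the vdk CLI prompts for it.
    """
    command = ["vdk", "secrets", "-n", job_name, "-t", team, "--set", key]
    if value is not None:
        command.append(value)
    return command


def set_secret(job_name: str, team: str, key: str, value: Optional[str] = None) -> None:
    # A list argv (no shell) keeps the value out of shell history and quoting issues.
    subprocess.run(build_set_secret_command(job_name, team, key, value), check=True)
```

Note that a value passed as an argument is still visible in the process list while the command runs, so on shared machines the prompting form, set_secret("taylor-swift-news", "my-team", "apiKey"), is the safer choice.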

Using secrets in a data job

Now, let's create a data job step which uses the API key to retrieve the news we are interested in.

Edit the Data Job

Create a new Python file named 10_get_data.py in the data job directory. You should now have the following file structure:

taylor-swift-news/
├── 10_get_data.py
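VDK runs the .py and .sql steps of a data job in alphabetical order of their file names, which is why step files are conventionally prefixed with numbers such as 10_. The sketch below only illustrates that ordering rule with the standard library; step_execution_order is a hypothetical helper, not how vdk-core implements it.

```python
from pathlib import Path
from typing import List


def step_execution_order(job_dir: Path) -> List[str]:
    """List step files in alphabetical order of their names (illustration only)."""
    steps = [
        p.name
        for p in job_dir.iterdir()
        if p.is_file() and p.suffix in {".py", ".sql"}  # config.ini etc. are not steps
    ]
    return sorted(steps)
```

Because the sort is on strings, 2_step.py would run after 10_step.py; fixed-width prefixes such as 10_, 20_, 30_ avoid that surprise.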

Now that you've created the Python file, let's fill in the code. This Python data job does the following:

  1. Get the API key from the job secrets
  2. Prepare and execute the request to newsapi.org
  3. Send the received data to the database
10_get_data.py
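import logging
from datetime import date, timedelta

import requests
from vdk.api.job_input import IJobInput

log = logging.getLogger(__name__)


def run(job_input: IJobInput):
    # Get the API key from the job secrets.
    # The key name must match the one used with "vdk secrets --set".
    api_key = job_input.get_secret("apiKey")

    # Get yesterday's date
    yesterday_date = date.today() - timedelta(days=1)

    # Get the data
    url = "https://newsapi.org/v2/everything"
    params = {
        "q": "Taylor Swift",
        "from": yesterday_date.strftime("%Y-%m-%d"),
        "sortBy": "popularity",
        "language": "en",
        "apiKey": api_key,
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    data = response.json()
    log.info(f"Received {len(data['articles'])} articles about Taylor Swift.")

    # Send the data to the DB, one row per article
    for article in data["articles"]:
        # "source" is a nested object ({"id": ..., "name": ...}); keep only its
        # name so that every value fits into a flat table column.
        row = dict(article, source=(article.get("source") or {}).get("name"))
        job_input.send_object_for_ingestion(
            payload=row,
            destination_table="taylor_swift_news",
        )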
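To try the step without a Control Service, Vault, or network access, you can call its run() function with a stand-in for job_input. The sketch below is hypothetical test scaffolding, not a VDK testing utility: FakeJobInput implements only the two methods the step uses (get_secret and send_object_for_ingestion), and run_step_offline loads the step file by path and patches requests.get with unittest.mock. It assumes requests and vdk-core are installed locally, because loading the step executes its imports.

```python
import importlib.util
from pathlib import Path
from typing import Any, Dict, List, Optional
from unittest import mock


class FakeJobInput:
    """Hypothetical stand-in for IJobInput with only the calls 10_get_data.py makes."""

    def __init__(self, secrets: Dict[str, str]):
        self._secrets = secrets
        self.ingested: List[Dict[str, Any]] = []  # every payload "sent" by the step

    def get_secret(self, key: str, default_value: Optional[str] = None) -> Optional[str]:
        return self._secrets.get(key, default_value)

    def send_object_for_ingestion(
        self, payload: Dict[str, Any], destination_table: Optional[str] = None, **kwargs: Any
    ) -> None:
        # Record the call instead of writing to a database.
        self.ingested.append({"table": destination_table, "payload": payload})


def run_step_offline(
    step_file: Path, secrets: Dict[str, str], fake_response_json: Dict[str, Any]
) -> FakeJobInput:
    """Run a step's run() with a fake job input and a patched requests.get."""
    # The file name starts with a digit, so load it by path instead of "import".
    spec = importlib.util.spec_from_file_location("step_under_test", step_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # requires requests and vdk-core to be installed

    fake_response = mock.Mock()
    fake_response.json.return_value = fake_response_json  # raise_for_status() is a no-op

    job_input = FakeJobInput(secrets)
    with mock.patch.object(module.requests, "get", return_value=fake_response):
        module.run(job_input)
    return job_input
```

For example, calling run_step_offline with the path taylor-swift-news/10_get_data.py, the secrets {"apiKey": "test-key"} and a response of {"articles": []} exercises the step without any network call; the ingested attribute of the returned object then shows what would have been sent to the database.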










Wrap-up

Congratulations on completing this tutorial! You now have a data job that collects yesterday's Taylor Swift news from newsapi.org while keeping the API key out of its source code.

Conclusion

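Job Secrets let a data job use credentials such as API keys without hard-coding them. Store the secret for the data job once with the "vdk secrets" command (preferably letting the CLI prompt for the value), then read it at run time with job_input.get_secret(), using the same key name in both places.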

