Ingesting data from an authenticated REST API using Job Secrets
In this tutorial you are going to learn how to use Secrets in a data job.
You like to read the news daily and are a huge Taylor Swift fan. Let's combine these passions into a single data job that searches for Taylor Swift news and stores it in a database.
As the source of our data, we are going to use the free, key-protected API of newsapi.org.
This tutorial is for users who want to learn how to use Secrets in a data job. Before starting, you should be familiar with the basic concepts explained in Hello World Data Job and Ingesting data from REST API into Database.
If you have all the prerequisites in place, the completion of this tutorial should take 10 to 15 minutes.
Since Job Secrets are stored securely, you'll need a pre-configured installation of the VDK Control Service and HashiCorp Vault:
- A VDK Control Service installation (see Install VDK Control Service with custom SDK) and a local VDK SDK installation configured to use it
- A configured VDK Control Service/HashiCorp Vault integration (see Configuring Hashicorp Vault Instance for storing Secrets)
In the first part of the tutorial, you are going to create a data job and store a Secret for it.
Create a data job by executing the following command:
vdk create -n taylor-swift-news -t my-team
This will create a taylor-swift-news directory with some sample data job files inside. Delete the files so that only the empty directory remains.
Go to newsapi.org and click the "Get API Key" button. Fill in the form and copy the API Key.
You can use the "vdk secrets" command to store and retrieve secrets from the command line. If you are using the VDK CLI on a private, secure console, you can set a secret directly with the following command:
vdk secrets -n taylor-swift-news -t my-team --set "api_key" "<your API Key goes here>"
Alternatively, you can pass only the key of your secret to the command. You will then be prompted to enter the value, which keeps it out of your console's history.
vdk secrets -n taylor-swift-news -t my-team --set "api_key"
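The key name used when storing the secret has to match the name the job step later asks for, otherwise the lookup comes back empty. A minimal sketch of a fail-fast check follows. It assumes that get_secret returns None for a key that was never stored; FakeJobInput and require_secret are hypothetical names used only for illustration and are not part of VDK.

```python
class FakeJobInput:
    """Hypothetical stand-in for the job_input object; not part of VDK."""

    def __init__(self, secrets):
        self._secrets = dict(secrets)

    def get_secret(self, key, default_value=None):
        # Assumption: a missing key yields the default value (None).
        return self._secrets.get(key, default_value)


def require_secret(job_input, key):
    """Return the secret stored under `key`, or raise if it is missing or empty."""
    value = job_input.get_secret(key)
    if not value:
        raise ValueError(f"Secret '{key}' is not set for this data job")
    return value


if __name__ == "__main__":
    job_input = FakeJobInput({"api_key": "dummy-value"})
    # Print only the length so the secret itself never reaches the console.
    print(len(require_secret(job_input, "api_key")))
```

Raising early gives a clear error message in the job logs instead of an authentication failure from the remote API later on.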
Now, let's create a data job step which uses the API key to retrieve the news we are interested in.
Create a new Python file named 10_get_data.py in the data job directory. You should have the following file structure:
taylor-swift-news/
├── 10_get_data.py
Now that you've created the Python file you need, let's fill in the code. This Python data job does the following:
- Get the API key from the job secrets
- Prepare and execute the request to newsapi.org
- Send the received data to the database
10_get_data.py
import logging
from datetime import date, timedelta

import requests
from vdk.api.job_input import IJobInput


def run(job_input: IJobInput):
    # Get the API Key from the Job Secrets
    api_key = job_input.get_secret('api_key')

    # Get yesterday's date
    yesterday_date = date.today() - timedelta(days=1)

    # Get the data
    url = "https://newsapi.org/v2/everything"
    params = {
        "q": "Taylor Swift",
        "from": yesterday_date.strftime("%Y-%m-%d"),
        "sortBy": "popularity",
        "language": "en",
        "apiKey": api_key,
    }
    response = requests.get(url, params=params)
    response.raise_for_status()
    data = response.json()

    # Send the data to the DB
    payload = {'articles': data['articles']}
    job_input.send_object_for_ingestion(
        payload=payload, destination_table="taylor_swift_news"
    )
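Because the API key travels as a request parameter, logging the parameters as they are would expose the secret. The following is a minimal sketch, not part of the tutorial's job code, of how the parameter construction could be separated from the network call and how a log-safe copy could be produced. The helper names build_params and redact are hypothetical.

```python
from datetime import date, timedelta


def build_params(api_key, today):
    """Build the query parameters for yesterday's Taylor Swift news."""
    yesterday = today - timedelta(days=1)
    return {
        "q": "Taylor Swift",
        "from": yesterday.strftime("%Y-%m-%d"),
        "sortBy": "popularity",
        "language": "en",
        "apiKey": api_key,
    }


def redact(params, secret_keys=("apiKey",)):
    """Return a copy of params with secret values masked, safe for logging."""
    return {k: ("***" if k in secret_keys else v) for k, v in params.items()}


if __name__ == "__main__":
    params = build_params("dummy-value", date.today())
    # Only the masked copy is printed; the real key stays in `params`.
    print(redact(params))
```

Keeping the parameter construction in a pure function also makes it possible to check the date handling without calling the remote API.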