Purview supports a glossary of business definitions that lets you define connections between terms, term hierarchies, and custom attributes (called term templates) stored on the term. PyApacheAtlas supports these features!
Assuming you have installed and authenticated PyApacheAtlas, you can use its PurviewClient and PurviewGlossaryTerm to upload and create glossary terms.
Starting with a simple term, the example below demonstrates the process for defining a term.
from pyapacheatlas.core.glossary import PurviewGlossaryTerm
default_glossary = client.glossary.get_glossary()
term = PurviewGlossaryTerm(
    name="This is my term",
    qualifiedName="This is my term@Glossary",
    glossaryGuid=default_glossary["guid"],
    longDescription="This is a long description",
    status="Draft"  # Should be Draft, Approved, Alert, or Expired
)
client.glossary.upload_term(term)
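The sample defines a Purview Glossary Term with a name, a qualified name, a long description, a status, and the guid of the glossary it belongs to (retrieved with client.glossary.get_glossary()).
We then upload the single term by calling the client glossary upload_term method.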
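When many terms need to be created, the fields used in the sample can be assembled by a small helper before they are handed to PurviewGlossaryTerm. The following is only a sketch and not part of PyApacheAtlas: the name build_term_fields is hypothetical, the allowed statuses are taken from the comment in the sample, and the "name@Glossary" pattern for qualifiedName simply mirrors the sample and assumes the default glossary.

```python
# Hypothetical helper (not part of PyApacheAtlas): builds the keyword
# arguments used in the sample and validates the status value.
ALLOWED_STATUSES = {"Draft", "Approved", "Alert", "Expired"}


def build_term_fields(name, glossary_guid, long_description, status="Draft"):
    """Return a dict of keyword arguments for PurviewGlossaryTerm."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(
            f"status must be one of {sorted(ALLOWED_STATUSES)}, got {status!r}"
        )
    return {
        "name": name,
        # Assumption: qualifiedName follows the "<name>@Glossary" pattern
        # shown in the sample for the default glossary.
        "qualifiedName": f"{name}@Glossary",
        "glossaryGuid": glossary_guid,
        "longDescription": long_description,
        "status": status,
    }


# The guid below is a placeholder, not a real glossary guid.
fields = build_term_fields(
    "This is my term", "glossary-guid-placeholder", "This is a long description"
)
print(fields["qualifiedName"])
```

The resulting dictionary could then be unpacked as PurviewGlossaryTerm(**fields), which passes the same keyword arguments as the sample.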
A given term can take an expert or a steward, but you must pass in the AAD object id because Purview only supports object ids. PyApacheAtlas provides a utility function to look up AAD object ids based on the user principal name or email address.
expert_id = client.msgraph.email_to_id("will@example.com")
steward_id = client.msgraph.email_to_id("bill@example.com")
term.add_expert(expert_id, "Additional info about the expert")
term.add_steward(steward_id, "Additional info about the steward")
client.glossary.upload_term(term)
You may also want to add connections to other existing terms. The following parameters can be set as arrays of related term dictionaries, each of which is simply a dictionary with a key of termGuid and a valid glossary term guid as the value.
term.seeAlso = [{"termGuid":"abc-123-456"}]
term.synonyms = [{"termGuid":"xyz-789-1011"}]
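If you already hold a list of term guids, the related term dictionaries can be generated rather than written by hand. This is a minimal sketch: related_terms is a hypothetical helper, not a PyApacheAtlas function, and the guids are placeholders.

```python
def related_terms(guids):
    """Wrap each glossary term guid in a {"termGuid": guid} dictionary."""
    return [{"termGuid": guid} for guid in guids]


# Placeholder guids; real values would come from existing glossary terms.
see_also = related_terms(["abc-123-456"])
synonyms = related_terms(["xyz-789-1011", "def-000-111"])
print(see_also)
print(synonyms)
```

The returned lists have the same shape as the values assigned to term.seeAlso and term.synonyms above.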
Azure Purview supports a term hierarchy which allows you to connect parent and child terms. To create this programmatically, you should use the PurviewGlossaryTerm.add_hierarchy method. However, you can't just provide the parent term's name alone; the method requires the parent term's guid too!
As a result, you need to look up the parent term and extract the parent's name and guid.
from pyapacheatlas.core.glossary import PurviewGlossaryTerm
default_glossary = client.glossary.get_glossary()
parent_term = client.glossary.get_term(
    name="parent term", glossary_guid=default_glossary["guid"]
)
term = PurviewGlossaryTerm(
    name="This is my child term",
    qualifiedName="This is my child term@Glossary",
    glossaryGuid=default_glossary["guid"],
    longDescription="This is a long description",
    status="Draft"  # Should be Draft, Approved, Alert, or Expired
)
term.add_hierarchy(
    parentFormalName=parent_term["name"],
    parentGuid=parent_term["guid"]
)
client.glossary.upload_term(term)
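Because both the parent's name and guid are required, it can help to check the looked-up parent term before building the hierarchy. The sketch below uses a hypothetical helper, hierarchy_args, which is not part of PyApacheAtlas; it assumes the parent term is a dictionary with "name" and "guid" keys, as in the example above, and the sample values are placeholders.

```python
def hierarchy_args(parent_term):
    """Extract the add_hierarchy keyword arguments from a parent term dict."""
    missing = [key for key in ("name", "guid") if not parent_term.get(key)]
    if missing:
        raise ValueError(f"parent term is missing required keys: {missing}")
    return {
        "parentFormalName": parent_term["name"],
        "parentGuid": parent_term["guid"],
    }


# Placeholder standing in for the result of client.glossary.get_term(...).
sample_parent = {"name": "parent term", "guid": "parent-guid-placeholder"}
args = hierarchy_args(sample_parent)
print(args)
```

The result could then be passed along as term.add_hierarchy(**args).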
Azure Purview also supports the concept of term templates, which let you store additional attributes on the term. PyApacheAtlas exposes this with the attributes property.
Assuming you have a term template named "myTermTemplate" that supports "attr1" and "attr2", your term definition might look like:
from pyapacheatlas.core.glossary import PurviewGlossaryTerm
default_glossary = client.glossary.get_glossary()
term = PurviewGlossaryTerm(
    name="This is my term in a template",
    qualifiedName="This is my term in a template@Glossary",
    glossaryGuid=default_glossary["guid"],
    longDescription="This is a long description",
    status="Draft",  # Should be Draft, Approved, Alert, or Expired
    attributes={
        "myTermTemplate": {
            "attr1": "abc",
            "attr2": 123
        }
    }
)
client.glossary.upload_term(term)
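The attributes value is a dictionary keyed by the term template name, with the template's attribute values nested inside. As a sketch only, the hypothetical helper below (template_attributes, not a PyApacheAtlas function) builds that nested shape and rejects attribute names the template is not expected to support; the list of allowed names is something you would supply yourself.

```python
def template_attributes(template_name, allowed_names, values):
    """Build the nested attributes dict for a single term template."""
    unknown = sorted(set(values) - set(allowed_names))
    if unknown:
        raise ValueError(
            f"attributes not supported by {template_name!r}: {unknown}"
        )
    return {template_name: dict(values)}


attributes = template_attributes(
    "myTermTemplate",
    allowed_names=["attr1", "attr2"],
    values={"attr1": "abc", "attr2": 123},
)
print(attributes)
```

The result has the same shape as the attributes argument in the example above.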
In Azure Purview, you can also bulk import terms from a csv file. This operation takes a standard template (accessible from the Azure Purview UI) and then performs an asynchronous import. When you execute the import method, you get back an operation id that you can periodically poll to check its status.
import time
results = client.glossary.import_terms('./path/to/some.csv')
operation_id = results["id"]
operation_status = results["status"]
# Poll until the import is no longer running or pending
while operation_status in ["RUNNING", "PENDING"]:
    time.sleep(5)
    operation_results = client.glossary.import_terms_status(operation_id)
    operation_status = operation_results["status"]
    print(operation_results)
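A polling loop like the one above runs for as long as the operation reports a running or pending status, so you may want an upper bound on the wait. The following sketch wraps the loop in a hypothetical helper, wait_for_import, that accepts any status function and raises after a timeout. The "RUNNING" and "PENDING" values come from the loop above; the "FINISHED" value in the stand-in is a placeholder and not necessarily what Purview returns.

```python
import time


def wait_for_import(get_status, operation_id, poll_seconds=5,
                    timeout_seconds=300, sleep=time.sleep):
    """Poll get_status(operation_id) until it stops running or pending."""
    waited = 0
    while True:
        result = get_status(operation_id)
        if result["status"] not in ("RUNNING", "PENDING"):
            return result
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"operation {operation_id} still {result['status']} "
                f"after {waited} seconds"
            )
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for client.glossary.import_terms_status so the sketch runs
# without a Purview connection; the statuses are scripted placeholders.
_statuses = iter(["PENDING", "RUNNING", "FINISHED"])


def fake_status(operation_id):
    return {"id": operation_id, "status": next(_statuses)}


final = wait_for_import(fake_status, "op-123", sleep=lambda seconds: None)
print(final)
```

Against a real client, the call might look like wait_for_import(client.glossary.import_terms_status, operation_id).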
You can also export glossary terms based on their guids. An export only supports terms that share the same term template. The PurviewClient.glossary.export_terms method takes in a list of guids and a file path to store the exported terms as a CSV.
guids = ["abc-123", "def-456"]
client.glossary.export_terms(
    guids,
    './path/to/some/destination.csv'
)
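Since an export only covers terms in the same term template, terms from several templates need to be split into one export per template. The sketch below groups guids by template name with a hypothetical helper, group_guids_by_template. It assumes you already know each term's template; how that is determined is not shown, and the guids and template names are placeholders.

```python
from collections import defaultdict


def group_guids_by_template(guid_template_pairs):
    """Group (guid, template_name) pairs into {template_name: [guids]}."""
    groups = defaultdict(list)
    for guid, template_name in guid_template_pairs:
        groups[template_name].append(guid)
    return dict(groups)


# Placeholder guids and template names.
pairs = [
    ("abc-123", "myTermTemplate"),
    ("def-456", "otherTemplate"),
    ("ghi-789", "myTermTemplate"),
]
groups = group_guids_by_template(pairs)
print(groups)

# Each group could then be exported to its own file, for example:
# for template_name, guids in groups.items():
#     client.glossary.export_terms(guids, f"./{template_name}.csv")
```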