As you begin your Atlas or Purview journey, you'll often want to create your own custom types. We can get this done via the API, the PyApacheAtlas SDK, or through the Excel functionality built into PyApacheAtlas.
Entity TypeName |
---|
MyCustomType |
Save the above content to an Excel spreadsheet on a tab called `EntityDefs`. This is the absolute minimum necessary to do an upload, provided the entity type has no additional required attributes and should be treated as a DataSet type. The only attributes you'll be able to record are the "name", "qualifiedName", "description", and the relevant contacts / owner / experts.
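For orientation, the one-row sheet above corresponds roughly to an Atlas entity type definition like the one below. This is only a sketch of the general payload shape, not the exact output of PyApacheAtlas: `minimal_entity_typedef` is a hypothetical helper written for illustration, and the real definition may carry additional fields.

```python
import json


def minimal_entity_typedef(name, super_types=("DataSet",)):
    """Hypothetical helper: an Atlas-style entity typedef with no custom attributes."""
    return {
        "category": "ENTITY",
        "name": name,
        # With no super type given in the sheet, the type is treated as a DataSet.
        "superTypes": list(super_types),
        # No extra attributes: only the inherited ones (name, qualifiedName, ...) apply.
        "attributeDefs": [],
    }


typedefs = {"entityDefs": [minimal_entity_typedef("MyCustomType")]}
print(json.dumps(typedefs, indent=2))
```

Because the type inherits from DataSet, the inherited attributes such as "name" and "qualifiedName" are available even though `attributeDefs` is empty.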
Next, we can upload the contents using this snippet. Be sure to have installed and configured PyApacheAtlas and include the relevant client (PurviewClient for Purview and AtlasClient for Apache Atlas).
Here's an example using Azure Purview and a Service Principal.
import json

from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core.client import PurviewClient
from pyapacheatlas.readers import ExcelConfiguration, ExcelReader

# Authenticate with a Service Principal.
auth = ServicePrincipalAuthentication(
    tenant_id="replace_with_tenant_id",
    client_id="replace_with_client_id",
    client_secret="replace_with_client_secret"
)
client = PurviewClient(
    account_name="PurviewAccountName",
    authentication=auth
)

# Read the EntityDefs tab and build the entity type definitions.
ec = ExcelConfiguration()
reader = ExcelReader(ec)
entityTypeDefs = reader.parse_entity_defs("path/to/spreadsheet.xlsx")

# Upload the definitions and print the service's response.
results = client.upload_typedefs(entityTypeDefs)
print(json.dumps(results, indent=2))
Assuming you've authenticated properly, you will have uploaded one entity type definition to Purview!
What happened in this script?

* The script authenticates with a Service Principal. You could instead use `azure-identity` and the `AzureCliCredential` to write even less code.
* `parse_entity_defs` reads the `EntityDefs` tab and creates the entity type definitions.
* `upload_typedefs` takes those definitions and does the actual upload to your Atlas or Purview service.
* `json.dumps` takes a Python dictionary and turns it into a string. The `indent=2` argument tells Python to add two spaces for each level in the resulting JSON.

Now that you've done the upload, you could create an instance of this custom type through the REST API, with PyApacheAtlas classes, or with the Atlas / Purview Bulk Upload in Excel feature of PyApacheAtlas.
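As a rough illustration of the REST route, the sketch below builds the kind of JSON body the Atlas entity endpoint accepts for one instance of `MyCustomType` and pretty-prints it with `json.dumps(..., indent=2)`. The asset name and qualified name are made-up placeholders, and the negative `guid` follows the Atlas convention for marking an entity that does not exist yet. Nothing is sent to a service here.

```python
import json

# Hypothetical instance of the custom type; the name and qualifiedName are placeholders.
entity_payload = {
    "entity": {
        "typeName": "MyCustomType",
        "guid": "-1",  # negative guid: placeholder for an entity not yet created
        "attributes": {
            "name": "my_first_asset",
            "qualifiedName": "custom://my_first_asset",
            "description": "An instance of MyCustomType",
        },
    }
}

# indent=2 adds two spaces per nesting level in the resulting string.
pretty = json.dumps(entity_payload, indent=2)
print(pretty)
```

The printed string round-trips back to the same dictionary with `json.loads`, which makes it easy to inspect a payload before submitting it.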
Entity TypeName | name | typeName | description | isOptional |
---|---|---|---|---|
My2ndCustomType | attrib1 | | This is a string attribute by default | |
My2ndCustomType | attrib2 | string | This is a string attribute | |
My2ndCustomType | reqAttrib | int | This is an int attribute | FALSE |
Take the content of this table and update your spreadsheet and re-run the above steps to create a second custom type. Notice how each row contains "My2ndCustomType" in the Entity TypeName column. The value must be repeated so the package knows when you start referring to a new entity type.
In this table, we've defined a custom type called My2ndCustomType and it has three attributes.

* attrib1: Takes advantage of the default values for typeName (string) and isOptional (TRUE).
* attrib2: Explicitly states the type for this attribute but leaves isOptional at the default (TRUE).
* reqAttrib: Is an example of a required attribute, with isOptional set to FALSE and typeName set to int.
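The defaulting and grouping behavior described above can be sketched in plain Python. This is not PyApacheAtlas's actual implementation: `row_to_attribute_def` and `group_attributes` are hypothetical helpers, and blank spreadsheet cells are assumed to arrive as `None`.

```python
from collections import defaultdict

# The three rows of the table, with blank cells represented as None (an assumption).
rows = [
    {"Entity TypeName": "My2ndCustomType", "name": "attrib1", "typeName": None,
     "description": "This is a string attribute by default", "isOptional": None},
    {"Entity TypeName": "My2ndCustomType", "name": "attrib2", "typeName": "string",
     "description": "This is a string attribute", "isOptional": None},
    {"Entity TypeName": "My2ndCustomType", "name": "reqAttrib", "typeName": "int",
     "description": "This is an int attribute", "isOptional": False},
]


def row_to_attribute_def(row):
    """Apply the documented defaults: typeName 'string', isOptional True."""
    is_optional = row.get("isOptional")
    return {
        "name": row["name"],
        "typeName": row.get("typeName") or "string",
        "isOptional": True if is_optional is None else bool(is_optional),
        "description": row.get("description"),
    }


def group_attributes(rows):
    """Collect attribute definitions under the repeated Entity TypeName value."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Entity TypeName"]].append(row_to_attribute_def(row))
    return dict(grouped)


attributes_by_type = group_attributes(rows)
for attribute in attributes_by_type["My2ndCustomType"]:
    print(attribute["name"], attribute["typeName"], attribute["isOptional"])
```

Repeating the type name on every row is what lets a grouping step like this decide which attributes belong to which entity type.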
To learn more about the possible types an attribute can have, see the Atlas / Purview custom types overview.
Entity TypeName | Entity superTypes | name | typeName | description |
---|---|---|---|---|
CustomProcessWithColMap | Process | columnMapping | string | Support Purview column mapping in lineage UI |
This table demonstrates how you would create a process entity with the column mapping feature enabled. In Azure Purview, this enables you to display the column mapping user experience in the Lineage tab.
To use column mapping in Purview, you need to have a Process entity type with a string attribute called `columnMapping`. The columnMapping attribute contains a stringified JSON array of JSON objects, each describing which datasets map to each other and which columns within those datasets map to the other side's columns.
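A minimal sketch of building such a value follows. The key names (`DatasetMapping`, `ColumnMapping`, `Source`, `Sink`) reflect the Purview column mapping format as I understand it and should be checked against the current Purview documentation; the qualified names and column names are invented for illustration.

```python
import json

# One object per source/sink dataset pair; all names here are hypothetical.
column_mapping = [
    {
        "DatasetMapping": {
            "Source": "custom://source_table",  # qualifiedName of the input dataset
            "Sink": "custom://sink_table",      # qualifiedName of the output dataset
        },
        "ColumnMapping": [
            {"Source": "customer_id", "Sink": "cust_id"},
            {"Source": "order_total", "Sink": "total"},
        ],
    }
]

# The attribute is a string, so the array is serialized before being stored.
column_mapping_attribute = json.dumps(column_mapping)
print(column_mapping_attribute)
```

The resulting string is what would be stored in the `columnMapping` attribute of a process entity.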
If you don't want to mess with JSON, consider using the Purview ColumnMapping in Excel feature available in PyApacheAtlas.
If you've made it this far, you've uploaded three types and experimented with the optional values of the template. If you generate the Excel template from PyApacheAtlas, you'll notice there are quite a few more columns you can work with. However, leaving them blank gives the default values, which are suited for recording a single value in the attribute.
If you're looking for more complex attribute definitions, it can still be done in Excel, but you'll need to understand Atlas / Purview Custom Types concepts much better.
In this walkthrough, we used:

* The `ExcelReader.parse_entity_defs` method to read the "EntityDefs" tab and create entity type definitions.
* `AtlasClient/PurviewClient.upload_typedefs` to submit the results from `parse_entity_defs` to your Atlas or Purview service.