
Create Custom Types with Excel for Purview and Apache Atlas

As you begin your Atlas or Purview journey, you'll often want to create your own custom types. We can get this done via the API, the PyApacheAtlas SDK, or through the Excel functionality built into PyApacheAtlas.

Starting with a simple spreadsheet

| Entity TypeName |
|---|
| MyCustomType |

Save the above content to an Excel spreadsheet on a tab called EntityDefs. This is the minimum needed for an upload, provided the entity type has no additional required attributes and should be treated as a DataSet type. The only attributes you'll be able to record are "name", "qualifiedName", "description", and the relevant contacts / owner / experts.
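To make the one-row spreadsheet more concrete, here is a rough sketch of the kind of Atlas type definition payload it corresponds to, written as a plain Python dict. The helper name `minimal_entity_def` is hypothetical, and the exact JSON that PyApacheAtlas generates may include additional fields; this only illustrates the shape, assuming the type inherits from DataSet.

```python
import json


def minimal_entity_def(type_name, super_type="DataSet"):
    """Build a bare-bones Atlas entity type definition as a plain dict.

    Hypothetical helper for illustration; PyApacheAtlas builds this for you.
    """
    return {
        "category": "ENTITY",
        "name": type_name,
        "superTypes": [super_type],
        "attributeDefs": [],  # no custom attributes in the minimal case
    }


# Type definitions are uploaded grouped by kind, e.g. under "entityDefs".
payload = {"entityDefs": [minimal_entity_def("MyCustomType")]}
print(json.dumps(payload, indent=2))
```

Because the type inherits from DataSet, it picks up the inherited attributes such as name and qualifiedName without defining any of its own.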

Next, we can upload the contents using this snippet. Be sure to have installed and configured PyApacheAtlas and include the relevant client (PurviewClient for Purview and AtlasClient for Apache Atlas).

Here's an example using Azure Purview and a Service Principal.

import json
from pyapacheatlas.auth import ServicePrincipalAuthentication
from pyapacheatlas.core.client import PurviewClient
from pyapacheatlas.readers import ExcelConfiguration, ExcelReader

# Authenticate with a service principal that can access the Purview account.
auth = ServicePrincipalAuthentication(
    tenant_id="replace_with_tenant_id",
    client_id="replace_with_client_id",
    client_secret="replace_with_client_secret"
)
client = PurviewClient(
    account_name="PurviewAccountName",
    authentication=auth
)

# Use the default spreadsheet configuration (tab and column names).
ec = ExcelConfiguration()
reader = ExcelReader(ec)

# Read the entity type definitions from the EntityDefs tab.
entityTypeDefs = reader.parse_entity_defs("path/to/spreadsheet.xlsx")

# Upload the type definitions and print the service's response.
results = client.upload_typedefs(entityTypeDefs)

print(json.dumps(results, indent=2))

Assuming you've authenticated properly, you will have uploaded one entity type definition to Purview!

What happened in this script?

Now that you've done the upload, you could create an instance of this custom type through the REST API, with PyApacheAtlas classes, or with the Atlas / Purview Bulk Upload in Excel feature of PyApacheAtlas.
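As a rough illustration of the REST API route, the sketch below builds the JSON body for a single instance of MyCustomType using only the standard library. The helper `build_entity`, the asset name, and the `custom://` qualified name are all made up for the example. The negative guid follows the Atlas convention of marking an entity that does not exist yet.

```python
import json


def build_entity(type_name, name, qualified_name, guid="-1", **extra_attributes):
    """Build an Atlas entity dict (hypothetical helper for illustration)."""
    attributes = {"name": name, "qualifiedName": qualified_name}
    attributes.update(extra_attributes)
    return {"typeName": type_name, "guid": guid, "attributes": attributes}


entity = build_entity(
    "MyCustomType",
    name="my first asset",                     # made-up example value
    qualified_name="custom://my-first-asset",  # made-up example value
    description="An instance of the custom type",
)

# The bulk entity endpoint expects the entities wrapped in a list.
batch = {"entities": [entity]}
print(json.dumps(batch, indent=2))
```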

Create a DataSet with Custom Attributes

| Entity TypeName | name | typeName | description | isOptional |
|---|---|---|---|---|
| My2ndCustomType | attrib1 | | This is a string attribute by default | |
| My2ndCustomType | attrib2 | string | This is a string attribute | |
| My2ndCustomType | reqAttrib | int | This is an int attribute | FALSE |

Update your spreadsheet with the contents of this table and re-run the steps above to create a second custom type. Notice how each row contains "My2ndCustomType" in the Entity TypeName column. The value must be repeated on every row so the package knows which entity type each attribute belongs to and when a new entity type begins.

In this table, we've defined a custom type called My2ndCustomType and it has three attributes.

* attrib1: Takes advantage of the default values for typeName (string) and isOptional (TRUE).
* attrib2: Explicitly states the type for this attribute but leaves isOptional at the default (TRUE).
* reqAttrib: Is an example of a required attribute, with isOptional set to FALSE and typeName set to int.
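The defaulting behavior described above can be sketched in a few lines of plain Python. This is not PyApacheAtlas's actual implementation: the function names are hypothetical, and the `cardinality` value of SINGLE is an assumption about the default for single-valued attributes. It only shows how blank cells could fall back to string and optional.

```python
def parse_optional(value):
    """Blank cells default to optional (True); only FALSE makes it required."""
    if value is None or str(value).strip() == "":
        return True
    return str(value).strip().upper() != "FALSE"


def attribute_defs_from_rows(rows):
    """Convert spreadsheet-like rows into attribute definition dicts."""
    defs = []
    for row in rows:
        defs.append({
            "name": row["name"],
            "typeName": row.get("typeName") or "string",  # default type
            "isOptional": parse_optional(row.get("isOptional")),
            "description": row.get("description", ""),
            "cardinality": "SINGLE",  # assumed default: one value per attribute
        })
    return defs


rows = [
    {"name": "attrib1", "description": "This is a string attribute by default"},
    {"name": "attrib2", "typeName": "string", "description": "This is a string attribute"},
    {"name": "reqAttrib", "typeName": "int", "description": "This is an int attribute",
     "isOptional": "FALSE"},
]
defs = attribute_defs_from_rows(rows)
for d in defs:
    print(d["name"], d["typeName"], d["isOptional"])
```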

To learn more about the possible types an attribute can have, see the Atlas / Purview custom types overview.

Creating Custom Processes for Purview's Column Mapping

| Entity TypeName | Entity superTypes | name | typeName | description |
|---|---|---|---|---|
| CustomProcessWithColMap | Process | columnMapping | string | Support Purview column mapping in lineage UI |

This table demonstrates how you would create a process entity with the column mapping feature enabled. In Azure Purview, this enables you to display the column mapping user experience in the Lineage tab.

To use column mapping in Purview, you need a Process entity type with a string attribute called columnMapping. The columnMapping attribute contains a stringified JSON array of objects. Each object describes which datasets map to each other and which columns in the source dataset map to which columns in the sink dataset.
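A minimal sketch of building that stringified JSON with the standard library follows. The helper `build_column_mapping`, the qualified names, and the column names are invented for the example. The `DatasetMapping` / `ColumnMapping` keys with `Source` and `Sink` reflect the structure Purview expects as far as I know, so verify them against the Purview documentation before relying on this.

```python
import json


def build_column_mapping(source_qualified_name, sink_qualified_name, column_pairs):
    """Return the columnMapping value as a JSON string (hypothetical helper)."""
    mapping = [{
        "DatasetMapping": {
            "Source": source_qualified_name,
            "Sink": sink_qualified_name,
        },
        "ColumnMapping": [
            {"Source": source_col, "Sink": sink_col}
            for source_col, sink_col in column_pairs
        ],
    }]
    # The attribute is a string, so the array must be serialized.
    return json.dumps(mapping)


column_mapping = build_column_mapping(
    "custom://source-table",   # made-up qualified name
    "custom://sink-table",     # made-up qualified name
    [("id", "customer_id"), ("name", "customer_name")],
)
print(column_mapping)
```

The resulting string would be stored in the columnMapping attribute of a CustomProcessWithColMap entity, alongside its inputs and outputs.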

If you don't want to mess with JSON, consider using the Purview ColumnMapping in Excel feature available in PyApacheAtlas.

Recap

If you've made it this far, you've uploaded three types and experimented with the optional values of the template. If you generate the Excel template from PyApacheAtlas, you'll notice there are quite a few more columns you can work with. However, leaving them blank gives the default values, which are suited to recording a single value in the attribute.

If you're looking for more complex attribute definitions, it can still be done in Excel, but you'll need to understand Atlas / Purview Custom Types concepts much better.