Cloudera Octopai Extraction APIs

Learn about the API endpoints for extracting lineage data from Cloudera Octopai.

For optimal results, first call /assets/query to list the assets that match your search criteria. Then call /lineage for each asset returned by the initial query, passing its _key value, to retrieve detailed lineage information. This two-step approach keeps each lineage request scoped to a single, known asset.
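The two-step flow can be sketched as a small driver that is independent of any HTTP client. This is a minimal illustration, not an official client: `extract_lineage`, `list_assets`, and `get_lineage` are hypothetical names, and the two callables stand in for whatever code actually calls /assets/query and /lineage.

```python
from typing import Any, Callable, Dict, Iterable

Asset = Dict[str, Any]
Lineage = Dict[str, Any]


def extract_lineage(
    list_assets: Callable[[], Iterable[Asset]],
    get_lineage: Callable[[str], Lineage],
) -> Dict[str, Lineage]:
    """Step 1: list assets. Step 2: fetch lineage for each asset _key."""
    lineage_by_key: Dict[str, Lineage] = {}
    for asset in list_assets():
        key = asset.get("_key")
        if not key or key in lineage_by_key:
            continue  # skip assets without a key and avoid duplicate lookups
        lineage_by_key[key] = get_lineage(key)
    return lineage_by_key


# Usage with in-memory stand-ins for the two API calls (hypothetical data).
fake_assets = [{"_key": "A1"}, {"_key": "A2"}, {"_key": "A1"}, {}]
result = extract_lineage(
    lambda: fake_assets,
    lambda key: {"startNode": {"_key": key}, "nodes": [], "edges": []},
)
print(sorted(result))
```

Passing the API calls in as callables keeps the flow testable without network access.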

Assets API

This API retrieves data about assets such as columns and tables so you can review their properties.

Use this API to locate the asset _key value that is required when you call /lineage.

Key function
Search assets.
Endpoint
GET /api/v2.0/assets/query
Request body
{
  "ConnectionIds": [],
  "AssetNames": [],
  "ToolNames": [],
  "ToolTypes": [],
  "DatabaseName": "",
  "SchemaName": "",
  "LayerName": "",
  "TableName": "",
  "assetType": 0,
  "batchSize": 0,
  "nextID": ""
}
Request body parameters
Parameter | Required | Type | Description
ConnectionIds | No | list | Search by connection identifiers. Example: ["101","102","103"]
AssetNames | No | list | Match assets that contain one of the supplied names.
ToolNames | No | list | Filter by tool names.
ToolTypes | No | list | Filter by tool type. Set assetType to 1 for cross-system lineage (DATABASE / ETL / REPORT). Set assetType to 2 for column-level lineage (DB / ETL / REPORT).
DatabaseName | No | string | Filter by database name.
SchemaName | No | string | Filter by schema name.
LayerName | No | string | Filter by layer name.
TableName | No | string | Filter by table name.
assetType | Yes | number | 1 – Cross-system lineage. 2 – Column-level lineage.
batchSize | Yes | number | Maximum results returned per page, up to 10,000.
nextID | Yes (from the second call) | string | Use the supplied value to retrieve the next result page.
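As an illustration of the constraints in the parameter list, the following sketch assembles a request body and rejects values the documentation rules out. The helper name `build_assets_query` is hypothetical, the key spellings follow the request body template, and the lower bound of 1 for batchSize is an assumption (only the 10,000 upper bound is documented).

```python
from typing import Any, Dict, Optional

MAX_BATCH_SIZE = 10_000  # documented upper bound for batchSize
VALID_ASSET_TYPES = {1, 2}  # 1 = cross-system, 2 = column-level
OPTIONAL_FILTERS = {
    "ConnectionIds", "AssetNames", "ToolNames", "ToolTypes",
    "DatabaseName", "SchemaName", "LayerName", "TableName",
}


def build_assets_query(
    asset_type: int,
    batch_size: int,
    next_id: Optional[str] = None,
    **filters: Any,
) -> Dict[str, Any]:
    """Return a request body for /assets/query, validating required fields."""
    if asset_type not in VALID_ASSET_TYPES:
        raise ValueError("assetType must be 1 or 2")
    # Lower bound of 1 is an assumption; the docs only state the maximum.
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batchSize must be between 1 and {MAX_BATCH_SIZE}")
    unknown = set(filters) - OPTIONAL_FILTERS
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    body: Dict[str, Any] = dict(filters)
    body["assetType"] = asset_type
    body["batchSize"] = batch_size
    if next_id is not None:  # only needed from the second call onward
        body["nextID"] = next_id
    return body


body = build_assets_query(2, 1000, AssetNames=["storeid"], SchemaName="SALES")
print(body)
```

Rejecting unknown filter names up front catches misspelled keys before a request is sent.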
Request example
{
  "assetNames": ["storeid"],
  "schemaName": "SALES",
  "tableName": "Customer",
  "databaseName": "AdventureWorks2014",
  "toolTypes": ["DB"],
  "assetType": 2,
  "batchSize": 1000
}

Response structure

The response body contains an array of asset objects along with the fields that control pagination:
  • hasMore (boolean) – Indicates whether more result pages are available.

  • cursorId (string) – Cursor identifier to pass as nextID when you request the next page.

  • items (array) – List of asset objects with metadata such as _key, connection details, object type, and timestamps.

Pagination

To retrieve subsequent batches of results, call GET /api/v2.0/assets/query/scroll/{nextID} with the nextID value from the previous response. Continue this process until the hasMore field in a response is false.
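The pagination loop described above can be sketched as a generator. This is an illustration under stated assumptions: `iter_assets`, `fetch_first`, and `fetch_scroll` are hypothetical names, the two callables stand in for the initial query and the scroll call, and the field names items and cursorId follow the response example and are exposed as parameters in case the actual payload differs.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_assets(
    fetch_first: Callable[[], Page],
    fetch_scroll: Callable[[str], Page],
    items_field: str = "items",      # assumed from the response example
    cursor_field: str = "cursorId",  # assumed from the response example
) -> Iterator[Dict[str, Any]]:
    """Yield every asset across all pages until hasMore is false."""
    page = fetch_first()
    while True:
        yield from page.get(items_field) or []
        cursor = page.get(cursor_field)
        # Stop when the server reports no more pages, or when no cursor
        # was returned (guards against looping forever on a bad payload).
        if not page.get("hasMore") or not cursor:
            break
        page = fetch_scroll(cursor)


# Usage with canned pages in place of real HTTP calls (hypothetical data).
first = {"items": [{"_key": "A"}], "hasMore": True, "cursorId": "c1"}
later = {"c1": {"items": [{"_key": "B"}], "hasMore": False, "cursorId": None}}
print([a["_key"] for a in iter_assets(lambda: first, later.__getitem__)])
```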

Response example
{
  "items": [
    {
      "_key": "8D1A993B6739DAB65392B54D45F7052E",
      "connectionId": "251",
      "connLogicName": "sqlserverdbwa03testETLTRUE",
      "toolName": "SQLS",
      "toolType": "DB",
      "displayConnectionId": "251",
      "containerObjectName": "sp_built_MrrPerson_MrrCustomer",
      "containerObjectPath": "AdventureWorks2014.dbo",
      "containerObjectType": "SQL_STORED_PROCEDURE",
      "controlFlowPath": "sp_built_MrrPerson_MrrCustomer",
      "controlFlowName": "sp_built_MrrPerson_MrrCustomer",
      "objectID": "ADVENTUREWORKS2014SALESCUSTOMERTABLESTOREID",
      "objectGUID": "8D1A993B6739DAB65392B54D45F7052E",
      "objectType": "TABLE",
      "assetName": "StoreID",
      "dataType": "",
      "precision": "",
      "scale": "",
      "layerName": "Customer",
      "schemaName": "SALES",
      "databaseName": "AdventureWorks2014",
      "tableName": "Customer",
      "isObjectData": true,
      "isMap": false,
      "isVisible": true,
      "updatedDate": "2024-05-26T10:21:51.224Z",
      "createDate": "0001-01-01T00:00:00Z",
      "serverName": "",
      "isSrcColumnOrphan": null
    }
  ],
  "hasMore": false,
  "cursorId": null
}
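Because the _key value is what /lineage needs, a response like the one above is typically reduced to a list of keys. The sketch below shows one way to do that; `asset_keys` is a hypothetical helper, the items field name is taken from the response example, and the case-insensitive name match is an assumption suggested by the example (a search for "storeid" returning "StoreID").

```python
import json
from typing import Any, Dict, List, Optional


def asset_keys(response: Dict[str, Any], asset_name: Optional[str] = None) -> List[str]:
    """Collect _key values, optionally keeping only assets with a given name."""
    keys: List[str] = []
    for item in response.get("items") or []:
        key = item.get("_key")
        if not key:
            continue  # an asset without a key cannot be passed to /lineage
        name = item.get("assetName") or ""
        if asset_name is not None and name.lower() != asset_name.lower():
            continue
        keys.append(key)
    return keys


# A trimmed copy of the response example.
sample = json.loads("""
{
  "items": [
    {"_key": "8D1A993B6739DAB65392B54D45F7052E",
     "assetName": "StoreID", "objectType": "TABLE"}
  ],
  "hasMore": false,
  "cursorId": null
}
""")
print(asset_keys(sample, asset_name="storeid"))
```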

Lineage API

Use the Lineage API to retrieve lineage data that describes the relationships and dependencies between assets.

Key function
Return lineage details for a specific asset.
Endpoint
GET /api/v2.0/lineage
Request body
{
  "assetKey": "",
  "depth": 0,
  "direction": 3,
  "limit": 0,
  "assetType": 0
}
Request body parameters
Parameter | Required | Type | Description
assetKey | Yes | string | Asset key that defines the lineage starting point
depth | Yes | number | Total number of hops to traverse from the start asset
direction | No | number | The default value is 3. 1 – Input relations only. 2 – Output relations only. 3 – Both directions
limit | Yes | number | Total number of assets returned in the lineage graph
assetType | Yes | number | 1 – Cross-system lineage. 2 – Column-level lineage
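As a rough sketch of how these parameters map onto a request, the following builds (but does not send) a GET request with a JSON body using Python's standard library. The helper name `build_lineage_request`, the `Direction` enum, and the host in the usage lines are hypothetical, and authentication is omitted because it is not described here. Some HTTP clients and proxies handle GET bodies poorly, so verify the behavior in your environment.

```python
import json
import urllib.request
from enum import IntEnum


class Direction(IntEnum):
    INPUT = 1   # input relations only
    OUTPUT = 2  # output relations only
    BOTH = 3    # documented default


def build_lineage_request(
    base_url: str,
    asset_key: str,
    depth: int,
    limit: int,
    asset_type: int,
    direction: Direction = Direction.BOTH,
) -> urllib.request.Request:
    """Build a GET /api/v2.0/lineage request carrying the JSON body."""
    if asset_type not in (1, 2):
        raise ValueError("assetType must be 1 or 2")
    body = {
        "assetKey": asset_key,
        "depth": depth,
        "direction": int(direction),
        "limit": limit,
        "assetType": asset_type,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v2.0/lineage",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


# Hypothetical host; authentication headers would need to be added.
req = build_lineage_request(
    "https://octopai.example.com", "8D1A993B6739DAB65392B54D45F7052E", 6, 1000, 2
)
print(req.get_method(), req.full_url)
```

The request can then be sent with `urllib.request.urlopen(req)` once the required credentials are attached.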

Request example
{
  "assetKey": "8D1A993B6739DAB65392B54D45F7052E",
  "depth": 6,
  "limit": 1000,
  "assetType": 2,
  "direction": null
}
Response structure
  • nodes (array): Asset objects that participate in the lineage graph.

  • edges (array): Links that connect the nodes.

  • startNode (object): The originating asset for the requested lineage.

  • depth and direction: Metadata describing the traversal depth and direction.

Response example
{
  "nodes": [
    {
      "_key": "8D1A993B6739DAB65392B54D45F7052E",
      "toolName": "SQLS",
      "toolType": "DB",
      "objectType": "TABLE",
      "assetName": "StoreID",
      "databaseName": "AdventureWorks2014",
      "tableName": "Customer"
    }
  ],
  "edges": [
    {
      "_from": "05E69AF36C41C5FA59208CD75C937AA7",
      "_to": "F6DC4A4D0BA1C88011F843A31BB6BFD4",
      "isCompressed": true
    }
  ],
  "startNode": {
    "_key": "8D1A993B6739DAB65392B54D45F7052E",
    "toolName": "SQLS",
    "assetName": "StoreID",
    "databaseName": "AdventureWorks2014",
    "tableName": "Customer"
  },
  "depth": 12,
  "direction": "Both"
}
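A lineage response of this shape can be turned into adjacency maps for traversal. The sketch below is illustrative only: `build_adjacency` and `reachable` are hypothetical helpers, and it assumes each edge points from an upstream asset (_from) to a downstream asset (_to), which the response format does not state explicitly.

```python
from collections import defaultdict, deque
from typing import Any, Dict, List, Set, Tuple

Adjacency = Dict[str, List[str]]


def build_adjacency(lineage: Dict[str, Any]) -> Tuple[Adjacency, Adjacency]:
    """Return (downstream, upstream) maps keyed by asset _key."""
    downstream: Adjacency = defaultdict(list)
    upstream: Adjacency = defaultdict(list)
    for edge in lineage.get("edges") or []:
        # Assumption: data flows from _from to _to.
        downstream[edge["_from"]].append(edge["_to"])
        upstream[edge["_to"]].append(edge["_from"])
    return dict(downstream), dict(upstream)


def reachable(start: str, adjacency: Adjacency) -> Set[str]:
    """Breadth-first search: every key reachable from start, excluding start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour != start and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical three-edge graph: A -> B -> C, plus A -> C.
graph = {"edges": [
    {"_from": "A", "_to": "B"},
    {"_from": "B", "_to": "C"},
    {"_from": "A", "_to": "C"},
]}
down, up = build_adjacency(graph)
print(sorted(reachable("A", down)), sorted(reachable("C", up)))
```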