This is the fifth post in a series discussing how to use Power BI to visualize Azure Data Explorer data. You can read the rest of the posts here:
- Part 1: Querying Azure Data Explorer using Power BI
- Part 2: Using dimensions when querying Azure Data Explorer using Power BI
- Part 3: Using Azure Data Explorer with Power BI Dynamic Parameters
- Part 4: Advanced usages of Azure Data Explorer with Power BI Dynamic Parameters
- Part 5 (this post): Advanced options of the Azure Data Explorer Connector for Power BI
In the previous post, we used the Dynamic M Parameters feature of Power BI to invoke Azure Data Explorer functions, and to use query operators that exist in ADX but are not available in Power BI, such as percentiles and regular expressions. In this post, I’d like to discuss some of the advanced settings of the ADX connector.
Azure Data Explorer Connector options
As we saw in the previous posts, the Azure Data Explorer connector exposes the Contents method, which accepts 4 parameters – the cluster URL, the database name, a table name or query, and an M record object for options. These options allow us to change the behavior of the connector and affect the generated queries.
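As a quick refresher, a call with an options record looks roughly like this (a sketch using the same placeholder cluster, database, and table names as the rest of this post; the specific options are covered in the sections below):
let
    Source = AzureDataExplorer.Contents(
        "cluster",    // cluster URL
        "database",   // database name
        "table",      // a table name, or a full KQL query
        [MaxRows=1000000, CaseInsensitive=true]  // optional record of options
    )
in
    Source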
Query Limits – MaxRows, MaxSize and NoTruncate
By default, Azure Data Explorer enforces limits on query results. Those limits are in place to prevent you from accidentally pulling a huge amount of data from your cluster, which might impact the cluster’s performance and your costs. The limits are 500K rows and 64MB of data – anything more than that, and you’ll get a nice and friendly error message. It also makes sense in practice – you usually don’t have anything useful to do with 3M rows.
That being said, there are a few scenarios where retrieving more than 500K rows makes sense – one such scenario is when you’re retrieving a large dimension table, as discussed in part 2 of this series.
We can use the MaxRows setting to change the 500K row limit, and the MaxSize setting to change the data size limit. For example:
let
    Source = AzureDataExplorer.Contents("cluster", "database", "table", [MaxRows=1000000, MaxSize=100*1024*1024])
in
    Source
This will generate the following query:
set truncationmaxrecords=1000000;
set truncationmaxsize=104857600;
table
We can also use the NoTruncate option to remove all limits:
let
    Source = AzureDataExplorer.Contents("cluster", "database", "table", [NoTruncate=true])
in
    Source
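This should generate a query along these lines (assuming the connector translates NoTruncate into ADX’s notruncation set statement):
set notruncation;
table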
Additional Set Statements
The AdditionalSetStatements option allows you to add any set statement you’d like to your query. While you can, in theory, add set statements as part of the query text itself, doing so can cause the conversion of your M queries to KQL to fail. The reason is that whenever your M query ends up performing a join between 2 (or more) data sources (which is common when working with multiple DirectQuery sources), the set statements end up in the middle of the generated query, which is invalid KQL. The AdditionalSetStatements option was created for exactly this purpose:
let
    Source = AzureDataExplorer.Contents("cluster", "database", "table", [AdditionalSetStatements="set query_datascope=hotcache"])
in
    Source
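The statement is prepended to the generated query, so the result should look something like this:
set query_datascope=hotcache;
table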
String handling – CaseInsensitive and ForceUseContains
The next 2 options are CaseInsensitive and ForceUseContains. These two options exist to close the gap between Power BI and Azure Data Explorer behaviors. Let’s start with CaseInsensitive.
By default, Power BI is case insensitive when working with filters and slicers. This means that if you have a slicer that displays a list of states and you’d like to filter it, you can type “ALABAMA” or “alabama”, and both will give you the same result. However, when working with the Azure Data Explorer connector, the generated queries use the == operator, which is case sensitive. Enabling the CaseInsensitive option configures the connector to use the case-insensitive =~ operator instead. For example, the following:
let
    Source = AzureDataExplorer.Contents("cluster", "database", "table"),
    #"Filtered Rows" = Table.SelectRows(Source, each [State] = "aLaBama")
in
    #"Filtered Rows"
will result in the following query (which probably won’t return any rows):
table
| where State == 'aLaBama'
While the following:
let
    Source = AzureDataExplorer.Contents("cluster", "database", "table", [CaseInsensitive=true]),
    #"Filtered Rows" = Table.SelectRows(Source, each [State] = "aLaBama")
in
    #"Filtered Rows"
will result in the following query (which is probably what you’d want):
table
| where State =~ 'aLaBama'
Moving on to the ForceUseContains option, which serves a similar purpose. By default, when filtering values in Power BI slicers, Power BI supports substring matching; ADX, on the other hand, defaults to matching whole tokens using the “has” operator (which is much more efficient, since it can query the ADX indices directly). So if we want to support filtering for states that contain “cali”, we need this option, which configures the connector to use the less efficient, but more user-friendly, “contains” operator:
let
    Source = AzureDataExplorer.Contents("cluster", "database", "table", [ForceUseContains=true]),
    Filtered = Table.SelectRows(Source, (r) => Text.Contains(r[State], "cali"))
in
    Filtered
Which gets translated to the following KQL query:
table
| where State contains 'cali'
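The two string-handling options can, of course, be combined in a single options record (a sketch using the same placeholders as above), so that equality filters are translated to =~ and substring filters to contains:
let
    Source = AzureDataExplorer.Contents("cluster", "database", "table",
        [CaseInsensitive=true, ForceUseContains=true]),
    #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([State], "cali"))
in
    #"Filtered Rows"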
Timeout
Last but not least, we have the Timeout option. As the name suggests, it allows you to control the query timeout on both the Power BI client side and the Azure Data Explorer server side. Usually, the default of 4 minutes is more than enough; however, certain cases might warrant more time – loading a large dimension table, for example, or running a very compute-intensive query. Here, we set the timeout to 15 minutes:
let
    Source = AzureDataExplorer.Contents("cluster", "database", "table", [Timeout=#duration(0,0,15,0)])
in
    Source
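For reference, #duration takes its arguments in the order days, hours, minutes, seconds:
#duration(0, 0, 15, 0)  // 15 minutes
#duration(0, 1, 0, 0)   // 1 hour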
Conclusion
As you can see, the connector exposes multiple options that let you control its behavior and shape the generated KQL queries to match your exact needs and the expected user behavior.
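For example, a single options record combining several of the options discussed in this post might look like this (a sketch, using the same placeholder cluster, database, and table names as above):
let
    Source = AzureDataExplorer.Contents(
        "cluster",
        "database",
        "table",
        [
            MaxRows = 1000000,
            MaxSize = 100*1024*1024,
            AdditionalSetStatements = "set query_datascope=hotcache",
            CaseInsensitive = true,
            ForceUseContains = true,
            Timeout = #duration(0, 0, 15, 0)
        ]
    )
in
    Source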