Monday 8 June 2015

Optimizing Tableau Server Performance

Troubleshooting Tableau Server performance issues should always begin with checking how the same workbooks perform in Tableau Desktop. A view that loads slowly in Desktop will also load slowly once published to Tableau Server. Most performance issues on Tableau Server can be addressed by following a few best practices while authoring your workbooks. An in-depth training video is available in our 'On-Demand Training' section under 'Training & Tutorials'.
Once you have established that your workbooks perform acceptably in Tableau Desktop, you can turn your attention to Tableau Server.
Consider these key points for achieving optimal performance of Tableau Server and workbooks published to it:
  • How views are loaded by Tableau Server
  • How caching works on Tableau Server
  • Numbers of VizQL and Application Server processes
  • Number of workers in a Tableau Server distributed environment

How views are loaded by Tableau Server

Tableau Server publishes workbook views for access via a browser. Requests from the client browser first reach the Apache web server and are routed to the first available Application Server process (wgserver.exe), which handles browsing and permissions for the Tableau Server web interface. Once a view is opened, a request is sent to the VizQL Server process (vizqlserver.exe), which sends queries directly to the data source.
Views generated by Tableau Server are not static HTML. They are dynamic, interactive tools that can be used to ask questions of your data. This means that every interaction with a view necessarily involves processing that data. In particular, the VizQL Server processes tend to consume the most CPU and memory (though every environment varies). It is best to think of Tableau Server not as a simple web server, but as a data querying and analytics tool. Because Tableau Server is interactive, response times depend on query response times from data sources (unless the query result is already cached or extracts are used). The initial load of a live view will never be faster than the time it takes to query the pertinent data from the data source.
For more information on optimizing your data queries please see our knowledge base article:

Understanding caching in Tableau Server

There are two primary levels of caching present in Tableau Server:

Model Cache

When a request for a particular view comes in from the browser, Tableau Server checks whether the computations needed to display the view have already been done. If so, Tableau Server returns the pre-computed results to the browser.
If the requested view has not been run before, Tableau Server first determines the set of queries to run and then checks whether those queries have been run before. If they have, Tableau Server can use those results to compute the view and send it back to the browser, without talking to the database at all.
Of course, if neither of these cases applies, the underlying database will be queried, building the view as normal.

Query Result Cache

If the requested view has not been run before, Tableau Server first determines the set of queries to run and then checks whether those queries have been run before. If they have, Tableau Server can use those results to compute the view and send it back to the browser. This is more expensive than a model cache hit, but faster than going back to the database for data.
If neither of the above mechanisms can serve up cached data, the data source will be queried for the data and the view will be constructed and saved into the cache for subsequent use.
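
The lookup order described above can be sketched in a few lines of Python. This is only an illustrative model, not Tableau Server's actual implementation: the class TwoLevelCache, its method names, and the fake_query stand-in for a data source are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class TwoLevelCache:
    """Hypothetical model of the model cache / query result cache lookup order."""

    def __init__(self, run_query: Callable[[str], list]):
        self.run_query = run_query              # stand-in for the data source
        self.model_cache: Dict[str, str] = {}   # view id -> computed view
        self.query_cache: Dict[str, list] = {}  # query text -> result rows

    @staticmethod
    def render(view_id: str, results: List[list]) -> str:
        # Placeholder for the real view computation.
        return f"{view_id}: {sum(len(rows) for rows in results)} rows"

    def load_view(self, view_id: str, queries: List[str]) -> Tuple[str, str]:
        # 1. Model cache: the view has already been computed.
        if view_id in self.model_cache:
            return self.model_cache[view_id], "model cache"
        # 2. Query result cache: every query the view needs has been run before.
        if all(q in self.query_cache for q in queries):
            source = "query result cache"
        else:
            # 3. Otherwise query the data source and save the results.
            source = "data source"
            for q in queries:
                if q not in self.query_cache:
                    self.query_cache[q] = self.run_query(q)
        view = self.render(view_id, [self.query_cache[q] for q in queries])
        self.model_cache[view_id] = view        # saved for subsequent requests
        return view, source


def fake_query(query: str) -> list:
    return [query]  # pretend each query returns a single row


cache = TwoLevelCache(fake_query)
print(cache.load_view("sales", ["q1", "q2"])[1])     # data source
print(cache.load_view("sales", ["q1", "q2"])[1])     # model cache
print(cache.load_view("regional", ["q1", "q2"])[1])  # query result cache
```

The third call shows why the second level matters: a different view that needs the same queries still has to be computed, but it does not have to go back to the data source.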

Determining numbers of VizQL and Application Server processes

It is important to understand how caching works when determining the optimal number of processes for your Tableau Server environment. With too many Application Server or VizQL Server processes running, requests coming into Tableau Server are less likely to hit a process that has already handled the same request and saved the data in its cache. For this reason, reducing the number of Application Server or VizQL Server processes can sometimes improve the performance and user experience of Tableau Server.
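
A toy calculation makes the effect concrete. The sketch below assumes that each process keeps its own cache and that repeated requests for the same view are spread round-robin across processes. Both are simplifying assumptions made for illustration, and cache_hit_rate is a hypothetical helper, not a Tableau tool.

```python
def cache_hit_rate(num_processes: int, num_requests: int) -> float:
    """Fraction of identical requests served from cache under the toy model."""
    warmed = set()  # processes that have already handled (and cached) the request
    hits = 0
    for i in range(num_requests):
        process = i % num_processes  # assumed round-robin routing
        if process in warmed:
            hits += 1
        else:
            warmed.add(process)      # first visit to this process: cache miss
    return hits / num_requests


for n in (2, 8, 16):
    print(n, cache_hit_rate(n, 16))
# 2 0.875
# 8 0.5
# 16 0.0
```

Under these assumptions, the same 16 requests hit the cache 14 times with 2 processes but never with 16, because every process has to pay for its own first miss.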

Where to start

Finding the right number of each of these processes may require some experimentation, but a good place to start is generally:
  • Use equal numbers of VizQL and Application Server processes (this is most important for versions of Tableau Server prior to 6.1)
  • Set both to between 1x and 2x the number of processing cores present on the server
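
As a quick worked example of the rule of thumb above, the hypothetical helper below returns the suggested starting range for a given core count. The function name is an assumption made for illustration, and the result is only a starting point for experimentation.

```python
from typing import Tuple


def starting_process_range(cores: int) -> Tuple[int, int]:
    """Return the (low, high) starting count for each process type: 1x to 2x cores."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return cores, 2 * cores


low, high = starting_process_range(8)
print(f"Start with {low} to {high} VizQL and {low} to {high} Application Server processes")
# Start with 8 to 16 VizQL and 8 to 16 Application Server processes
```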

When to add additional processes

If all the following conditions are met, add VizQL and Application server processes incrementally and test the results:
  • Load begins to slow the server down but the machine is not yet constrained by physical resources (disk I/O, network throughput, memory, CPU, etc.).
  • Single requests are substantially faster during times of low load/concurrency and slower during times of high load/concurrency.
  • Your data sources are capable of handling more concurrent queries from Tableau Server than are currently being generated (even at times of high load) without suffering from longer query times.
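
The checklist above can be captured as a simple all-or-nothing test. This is a sketch only: the field names are hypothetical, and each value has to come from your own monitoring of the server and its data sources.

```python
from dataclasses import dataclass


@dataclass
class LoadObservations:
    slows_under_load: bool            # load begins to slow the server down
    hardware_has_headroom: bool       # disk I/O, network, memory, CPU not yet constrained
    faster_at_low_concurrency: bool   # single requests are much faster when load is low
    data_sources_have_headroom: bool  # data sources can take more concurrent queries


def should_add_processes(obs: LoadObservations) -> bool:
    """True only when every condition in the checklist holds."""
    return all([
        obs.slows_under_load,
        obs.hardware_has_headroom,
        obs.faster_at_low_concurrency,
        obs.data_sources_have_headroom,
    ])


print(should_add_processes(LoadObservations(True, True, True, True)))   # True
print(should_add_processes(LoadObservations(True, False, True, True)))  # False
```

If the hardware is already constrained, as in the second call, adding processes would only increase contention for the same resources.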

Determining number of workers in a Tableau Server distributed environment

Tableau Server ships with the capacity to create distributed environments by adding clustered nodes called workers. These distributed environments are designed to provide scalability (not to improve performance). Adding workers allows requests coming into Tableau Server to be load-balanced across multiple machines. Bear in mind that Tableau Server relies on HTTP communication to balance load and handle internal requests between processes. Moving from a single-server environment to a distributed environment requires Tableau Server processes to communicate with each other over the network. Since this is added overhead, adding a worker (or multiple workers) should only be considered when a single dedicated server is unable to handle the required load.

Where to start

Tableau has published the test results for Tableau Server. The whitepaper is coming soon.
Note: These tests were performed by Tableau and show results from a specific test configuration. They should not be taken as a guarantee of client response times. These benchmark results were obtained in a controlled lab environment, without other applications running during execution. Actual results will vary based on a number of variables including, but not limited to, load type, hardware, network speed, browser settings, and database performance.
In these tests (see figure below), 100 concurrent users experienced optimal performance with a single server. Only when the number of concurrent users rose above 100 did it help to add Tableau workers. This is largely due to the performance benefits of caching (caching benefits generally increase with concurrent usage).
The above findings are by no means universal. Response times for concurrent users can be affected by the number and complexity of workbooks and the frequency with which the users access and interact with the views. Every Tableau Server environment is different and while there is not a one-size-fits-all answer to the question of scaling Tableau Server, the information provided in the above knowledge base article will often give a good baseline.

When to add additional workers

If all the following conditions are met you may want to consider adding workers:
  • Adding VizQL and/or Application Server processes is not possible with current resources, and/or the machine in general is beginning to be constrained by physical resources (disk I/O, network throughput, memory, CPU, etc.).
  • It is not possible to upgrade the existing hardware sufficiently to meet load needs.
  • The machine(s) currently running Tableau are dedicated to Tableau Server and do not have other applications running on them which may contend for resources (SQL Server, IIS, etc.).
  • You have additional available cores on your core license (sufficient to license another worker), or you are using a named user license.
  • Your data sources are capable of handling more concurrent queries from Tableau Server than are currently being generated (even at times of high load) without suffering from longer query times.
