Real-time data can provide many benefits to businesses as well as their customers. In the past, data warehouses could only provide information on past events, with the age of the data dependent upon the frequency with which the warehouse was updated. Real-time processing means that the data warehouse is updated continuously. Although this provides a more current picture of the data, real-time processing presents a variety of issues for data warehouse management. Following certain best practices -- in addition to following the routine steps for ensuring successful deployment of any new technology -- can help meet the challenges associated with updating data in real-time.
- Decide whether true real-time is necessary or if "near real-time" would suffice. Data loading frequently requires limiting access to the warehouse while the data is loading. This is not practical for many enterprises. Solving the problem can be costly, so the easiest solution is to avoid real-time loading altogether. Some enterprises definitely require real-time data; for example, a bank authorizing charges against a credit or debit card stands to lose a significant amount of money if information is obsolete. However, many other businesses will see no substantial impact if information is more than a few minutes old. If near real-time is adequate, scheduling batch loading on an hourly or twice daily basis may give you the information you need without the struggle to enact true real-time updates.
- Consider an external real-time data cache, or RTDC. An RTDC can replicate a complex warehouse system or be as simple as a dedicated server for receiving, storing and loading real-time data. A number of different configurations are possible, depending on factors such as the size of the database and how quickly users need updated data. One approach is to direct all queries to the RTDC. A second approach is to have the real-time data temporarily merged with the warehouse for the query execution. For some applications, choosing an in-memory database for use with an external cache can expedite query processing.
- Simplify real-time reporting. As the complexity of real-time reports increases, so does the possibility of out-of-sync data. Keep in mind that even a simple query run against a moderate-sized database needs a second or two to run. What happens if the report requires multiple passes of the data to acquire all of the values? With real-time updates, the chances are excellent that some of the data needed for the report has changed between passes -- and the more passes required, the greater the discrepancies can be. To illustrate, suppose a report asks for sales units by item number for the week-to-date, current month-to-date and year-to-date. The first pass captures the weekly data at the point at which the pass was initiated. Before the second pass begins, a customer enters a large order. This sale will be included in the month-to-date numbers, but not the week-to-date. The same issue can arise for quarter-to-date or year-to-date numbers. Your reports will never "balance" and reconciliation can be time-consuming and difficult, if not impossible. Limiting real-time reporting to queries that can be executed in a single pass is the best method to eliminate the issue of conflicting reports.
- Consider a "just-in-time" merge solution. A just-in-time data merge from an external data cache can be the best solution for some. JIT mergers are frequently recommended if the enterprise cannot make do with near real-time data but must have true real-time information that is changing quite rapidly, sometimes at the rate of 1,000 transactions each second. The data may be accessed simultaneously by thousands of users, and queries may be complex, multiple-pass operations. A just-in-time merger is a hybrid approach in which real-time data resides in an external data cache, with a link to the warehouse occurring on an as-needed basis.
Technology has advanced to the point that there is virtually no challenge related to software and data processing that cannot be met. However, the best practice is to determine what is essential before proceeding with a potentially costly solution. In the near future, better or less-expensive solutions may be available. If you feel that real-time data warehousing is critical to the success of your operations, you might want to consider engaging the services of a true technology partner.