The prevalence of commodity disk-based hardware heralded the first wave of Big Data adoption, but a more recent focus on speed (as much as size) of data – and shortening timeframes in which to act – has highlighted the value of in-memory variants of database infrastructure components. Not all in-memory databases are created equal, though (some simply cache the hottest tables). SAP HANA has been designed to operate memory-first as a database and application environment for processing high volumes of high-value operational and transactional data – and with cloud-based options now available, SAP HANA’s capabilities can be procured for a lower up-front cost.
There’s ‘in-memory’ and there’s ‘all-in-memory’, but not all applications can fully exploit the advantage
If you have a real need for speed in the processing of data, it may not be enough simply to cache some of your hottest data in memory some of the time for some of your applications. SAP HANA’s ‘memory-first’ database and applications platform provides an environment in which you can hold onto all the data your business applications need in order for them to provide you with insight in near-real time. Even a straight configuration swap from disk to memory can speed things up considerably. But for a real performance hike, pushing application logic down into the database layer enables you to truly exploit in-memory processing performance. SAP has invested heavily in re-writing its own Business Suite to take advantage of the capability, but enabling other applications to take advantage of the potential of SAP HANA will require significant effort – either from you, or your application providers.
SAP HANA isn’t just for SAP applications, but their needs will be driving HANA’s development for years to come
Although SAP HANA is sold as an environment for any real-time business application development, the recent launch of SAP Business Suite 4 SAP HANA (S/4HANA) cements the platform’s position as SAP’s big bet on Big Data, especially of the high-velocity kind. When it comes to the platform roadmap, it’s therefore logical that the needs of S/4HANA will take priority over other ‘non-core’ requirements. That said, the principles upon which SAP HANA is based – a multi-layer technology stack with good integration credentials, and the integration of Sybase IQ’s Dynamic Tiering to take in a broad range of storage/latency options – mean that it represents a highly capable and relevant platform option for widespread SAP adopters and non-SAP shops alike.
SAP’s Big Data strategy is focused on real-time business decision-making, at the nexus of (unsurprisingly, given its enterprise operations heritage) transaction and analytic processing. Its play here is SAP HANA’s convergence of database and business application capabilities – enabling transactional processing, text analysis, predictive analysis, data virtualisation, spatial processing, streaming data processing and more to be performed within the same platform architecture, and on the same instance of in-memory data. SAP counts 5,800 directly-licensed HANA customers, as of Q4 2014.
SAP has positioned HANA at the core of an ecosystem designed to extend BI, analytics and ERP capabilities into the real-time domain. Around its core native in-memory (columnar, relational) database storage and processing capabilities the platform is open to integrations with SAP business software, various Hadoop distributions (Hortonworks, Cloudera, MapR and others) and ‘traditional’ data management systems (Teradata, Oracle, IBM and others); and enables data ingestion from a variety of structured and unstructured sources (such as industrial sensors, social media platforms, text document sources, geospatial sources, and others).
SAP HANA is the product of in-house development from the ground up (SAP worked with the Hasso Plattner Institute and Stanford University to develop the core in-memory database architecture in 2008), together with a set of strategic acquisitions – such as in-memory OLTP database P*TIME, MaxDB (which included the in-memory liveCache engine), and TREX (a columnar data search engine).
The SAP HANA integrated platform brings together the following capabilities:
- Data acquisition – SAP Replication Server for data replication, SAP ESP (HANA Smart Data Streaming) for complex event processing, and SAP Data Services for data exchange.
- Storage and processing – the core HANA in-memory store, plus disk-based SAP IQ via HANA Dynamic Tiering.
- Data exchange and federation – SAP Data Services for data exchange, SAP Smart Data Access for data federation, and connectors for Hadoop and archive management.
- Analysis – a variety of tools for business intelligence, predictive analytics, and analytic modelling.
- Management – data categorisation and enrichment, and data governance.
- An application development environment.
All of these capabilities are optimised for in-memory storage, though with tiering options for less ‘hot’ data to be stored on disk. The platform as a whole is available either on-premises (on certified appliances), as a managed hosted service, or across a hybrid arrangement.
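To make the federation capability above concrete, here is a minimal sketch – illustrative Python under assumed semantics, not SAP code – of how a ‘virtual table’ in the style of SAP Smart Data Access behaves: the table looks local to queries, but each request is pushed down to a remote source rather than the data being copied in. All class and method names here are hypothetical.

```python
# Hypothetical sketch of the 'virtual table' idea behind data federation:
# queries run against a local facade, but rows are fetched on demand from
# a remote source rather than being replicated into the local store.

class RemoteSource:
    """Stands in for an external system (a Hadoop cluster, Teradata, etc.)."""
    def __init__(self, rows):
        self._rows = rows
        self.fetches = 0  # count remote round-trips, for illustration

    def scan(self, predicate):
        self.fetches += 1
        return [row for row in self._rows if predicate(row)]

class VirtualTable:
    """Local facade: stores no data itself; predicates are pushed down."""
    def __init__(self, source):
        self._source = source

    def select(self, predicate):
        return self._source.scan(predicate)

remote = RemoteSource([{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}])
orders = VirtualTable(remote)
eu_orders = orders.select(lambda row: row["region"] == "EU")
```

The point of the pattern is that the consuming application is unaware of where the data physically lives; only the facade knows it triggered a remote fetch.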
In addition to its in-built engines for native data processing, SAP HANA has out-of-the-box integration with other members of the SAP business software family (native connectors are available for SAP Business Intelligence suite, Lumira, Design Studio, Analysis for Office, Data Services), third-party analytics applications (including Tableau, Cognos, MicroStrategy, and Qlik), and ETL tools (such as Informatica). HANA supports open standards such as REST, JSON, ODBC, MDX, and JDBC to encourage third-party application integration; client applications can access the database directly using JDBC, ODBC, or HTTP.
Two development environments are available: HANA Studio (desktop software based on Eclipse that provides application lifecycle management and development capabilities), and a web-based editor and administration panel based on Eclipse Orion.
Rather than improving the performance of a database which has been primarily written to disk by caching it (or some of its tables) in-memory, SAP HANA is built around a ‘memory first’ database designed for on-the-fly calculations.
HANA writes data to disk only for persistence: memory snapshots are saved at frequent intervals to provide savepoints, and in between a changelog is written to fast flash disk to achieve a low Recovery Point Objective. The HANA database is fully ACID-compliant and can also replicate to standby systems to provide high availability in a cluster (as well as sporting interfaces for third-party backup and monitoring tools such as NetBackup, IBM’s Tivoli Storage Manager and SAP Landscape Virtualization Management).
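The savepoint-plus-changelog pattern described above can be sketched as a simplified simulation – illustrative Python, not SAP’s actual persistence code: snapshots capture the full in-memory state periodically, the log records committed changes in between, and recovery replays the log over the last snapshot so no acknowledged write is lost.

```python
# Illustrative sketch (not SAP code) of memory-first persistence:
# periodic savepoints plus a changelog of committed writes in between.

class MemoryFirstStore:
    def __init__(self):
        self.data = {}        # the in-memory working state
        self.savepoint = {}   # last full snapshot (stands in for disk)
        self.changelog = []   # committed changes since the savepoint

    def put(self, key, value):
        self.data[key] = value
        self.changelog.append((key, value))  # logged before acknowledging

    def take_savepoint(self):
        self.savepoint = dict(self.data)
        self.changelog.clear()

    def recover(self):
        # Rebuild memory: restore the snapshot, then replay the log.
        self.data = dict(self.savepoint)
        for key, value in self.changelog:
            self.data[key] = value

store = MemoryFirstStore()
store.put("a", 1)
store.take_savepoint()
store.put("b", 2)   # present only in the changelog, not the savepoint
store.data = {}     # simulate losing the in-memory state in a crash
store.recover()     # store.data again holds both "a" and "b"
```

The Recovery Point Objective in this scheme is bounded by how promptly each change reaches the log, not by the (much longer) savepoint interval.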
The SAP HANA database is principally a columnar data store. It can store data in row format, but does so only where column stores are not well suited – such as for queues, or when storing configuration information – as it’s the columnar store which boasts the richer capabilities. Data is compressed and stored only once (with the persistence caveats above; there is no duplication or replication between columns and rows in any hybrid stores), and in its most granular form – that is, aggregations are performed on demand. In practice it’s rare to need pre-computed aggregations anyway, given HANA’s in-memory processing speed (3bn scans/sec/core and 20m aggregations/sec/core). Because every column is stored as an index, there is no need for separate primary indexes (although secondary indexes spanning multiple columns are possible for OLTP scenarios).
All of these storage strategies together lead to a considerable reduction in overall data footprint (SAP estimates up to a 95% saving, depending on the specific datatypes in use) when compared with traditional row-based relational databases.
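Two of the storage strategies just described – compressing a column by storing each distinct value once, and aggregating on demand at query time – can be illustrated with a short sketch (hypothetical Python, not HANA’s implementation):

```python
# Sketch of dictionary-encoded columnar storage: each distinct value is
# stored once, and the column itself becomes a list of small integer ids.

def dict_encode(column):
    dictionary = sorted(set(column))                 # each value stored once
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in column]

regions = ["EU", "US", "EU", "EU", "APJ", "US", "EU"]
dictionary, ids = dict_encode(regions)

# The encoded column doubles as an index: finding all 'EU' rows is a
# scan for a single integer id, with no separate primary index.
eu_rows = [i for i, v in enumerate(ids) if dictionary[v] == "EU"]

# Aggregation on demand: a count per region computed at query time,
# rather than maintained as a pre-built aggregate.
counts = {}
for v in ids:
    counts[dictionary[v]] = counts.get(dictionary[v], 0) + 1
```

With many repeated values per column – typical of enterprise data such as region codes, statuses or customer ids – the dictionary stays small while the encoded column shrinks dramatically, which is the intuition behind the footprint savings claimed above.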
When SAP HANA was only available as a pre-built appliance on certified hardware, customers deploying on-premises were licensed either as a proportion of their software application value from SAP, or by the (64GB RAM) unit. Currently certified hardware can scale up to 8TB on a single system (a maximum of 2TB for analytics and 6TB for SAP Business Suite applications running on HANA), and scale out to 112TB in a multi-node cluster – depending on which hardware is employed. However, it’s also worth noting that HANA’s data compression capabilities mean that a 112TB system can potentially house databases which would otherwise require much larger equivalent disk space. What’s more, the new Dynamic Tiering options allow larger databases comprising some ‘warm’ data sets to occupy a combination of memory and conventional disk storage; and also allow integration with other databases, such as SAP IQ or third-party offerings, for ‘cold’ storage of genuinely slow-changing data.
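The hot/warm split behind Dynamic Tiering can be sketched as a simple simulation – illustrative Python, with a capacity policy and names that are assumptions rather than SAP’s actual mechanism: recently used entries stay in the fast in-memory tier, and the least-recently-used entries are demoted to a slower tier when memory fills.

```python
# Hedged sketch of hot/warm data tiering: a bounded in-memory tier with
# least-recently-used demotion to a (simulated) disk-based tier.

class TieredStore:
    def __init__(self, hot_capacity):
        self.hot = {}             # fast in-memory tier
        self.warm = {}            # stands in for disk-based storage
        self.capacity = hot_capacity
        self.order = []           # least-recently-used tracking

    def _touch(self, key):
        if key in self.order:
            self.order.remove(key)
        self.order.append(key)

    def put(self, key, value):
        self.hot[key] = value
        self._touch(key)
        while len(self.hot) > self.capacity:
            coldest = self.order.pop(0)
            self.warm[coldest] = self.hot.pop(coldest)  # demote to warm tier

    def get(self, key):
        if key in self.hot:
            self._touch(key)
            return self.hot[key]
        return self.warm[key]     # slower path; a real system might promote

store = TieredStore(hot_capacity=2)
store.put("a", 1)
store.put("b", 2)
store.put("c", 3)   # memory is full, so "a" is demoted to the warm tier
```

Warm data remains queryable through the same interface; it simply travels a slower path – which is the trade-off Dynamic Tiering offers for databases larger than available memory.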
Versions, editions, and pricing
New versions of SAP HANA come in 6-monthly ‘service packs’, the last of which – Service Pack 9 – became generally available at the end of 2014. Although some of its new features (such as native multi-tenancy) are built into the core product, other new capabilities (like Dynamic Tiering, data integration and data quality management, streaming data processing and graph analysis) are packaged as optional extras. The rationale for this split is that many SAP customers already own some of the software in its original form (if they’re SAP shops), or they may not need all of the new capabilities.
The SAP HANA ‘Platform’ edition bundles SAP HANA’s core database with SAP HANA Studio (for application development and testing), SAP HANA Extended Application Service (for HTTP access to the HANA APIs), and capabilities for high availability and disaster recovery. The ‘Enterprise’ edition bundles additional data provisioning capabilities and analytical processing engines.
HANA in the Cloud
Since October 2012 SAP HANA has also been available in the public cloud as an in-memory Platform-as-a-Service from SAP itself (SAP HANA Cloud Platform), via Amazon Web Services (HANA One), and via other partners (it is also available as a hybrid cloud offering).
In May 2013 SAP added the ability to deliver HANA on a private managed cloud (SAP HANA Enterprise Cloud). SAP HANA Cloud Platform’s ‘Standard’ subscription for cloud-based production deployments (with which customers can build and run unlimited custom applications) starts at $3,932/month; an ‘Extensions’ edition subscription (for building and running unlimited HANA-based extensions to customers’ cloud-based or on-premises applications) starts at $1,337/month. Developers can take advantage of either a free shared instance, or a dedicated ‘Starter’ edition that starts at $539/month.
Future plans and suitability
SAP promotes SAP HANA as a data platform for time-critical Big Data analytics, and it encourages third-party vendors to develop on HANA and offer their applications through the SAP HANA Marketplace. SAP’s partner program for HANA is home to over 1,800 start-ups, of which over 100 have delivered HANA-based applications. However, since launching in November 2010 – and especially with February 2015’s launch of SAP S/4HANA – HANA’s place as primarily the underpinning infrastructure for SAP’s re-engineered suite of business software applications is clear. This is what’s driving its roadmap for the foreseeable future.
SAP HANA’s use cases are broad – anywhere that analytics is deployed to provide real-time and inline business insight based on transactional information, potentially in combination with data from myriad other (structured and unstructured) sources. Examples include:
- Predictive maintenance. Beyond Internet-of-Things scenarios in transport and logistics, SAP is building on its ERP presence in heavy industries to ingest sensor data and extend applications into mobile and real-time domains – supporting what it refers to as “connected manufacturing”.
- Line-of-business data marts that feed real-time business decision-making using high value data.
- Acting as a ‘simplifying front-end’ through which customers can build applications on a single, in-memory, copy of all relevant business data.
SAP is capitalising on both the platform’s processing speed and footprint compression to argue that HANA is where a company’s Big Data centre-of-gravity should reside, with datasets loaded in (or accessed via virtual tables) from Hadoop and enterprise data warehouses, and ingested from a range of typical Big Data sources.
It’s this ‘all-in-one-place’ ethos (and that place being fast memory, rather than a cluster of spinning disks), the native processing capabilities, out-of-the-box connectors to common databases and applications, the integration layer and development environment ecosystem (quite apart from the SAP application integrations) – plus the pay-as-you-go subscription option – that make SAP HANA an attractive platform for business applications in a real-time environment. With all ‘hot’ data stored in main memory, it’s possible for OLTP and analytics applications to run on the same instance at the same time, providing answers to ad hoc queries and analysis in real time.
The extent to which SAP has renewed its entire portfolio around HANA, although a ringing self-endorsement of the technology (and a demonstration of the size of the stake the company has taken in its in-memory platform), is also an indication of what’s driving HANA development for the foreseeable future: it’s not as the in-memory über-platform for everyone – despite its semi-democratising in-cloud subscription option; it’s as the heart of SAP itself in the real-time Big Data age.