How can you add real-time to your Big Data application stack? How can you leverage in-memory computing to ensure you maintain consistency, scalability, transactionality, reliability, and high availability? Answers here.

XAP 9.5 GA Is Here

I’m proud to announce today the release of XAP 9.5, which is another important milestone in the XAP 9 release train. 9.5 is a direct continuation of our efforts and focus on real-time processing of big data streams, improved data grid performance, and better manageability and end-user visibility.  In this post I’d like to give

READ MORE…


via WordPress http://bit.ly/XXhF8r

Big-Data-Real-Time-Performance

Big-Data-Real-Time-Performance - Enjoying both worlds with one Architecture! The efficiency of business processes is everything. Companies must be in a position to react quickly to new opportunities that arise. A successful organization today must be able to extract critical business information out of the incoming raw data and have it available at the fingertips of the decision

READ MORE…


via WordPress http://bit.ly/WHGuo1

Misuse of great technologies

Or Cohen from Fewbytes gave an interesting presentation yesterday at DevOpsCon. He talked about common misuses of NoSQL and big-data products. Here are three of his points that I recall off the top of my head:

* Or claims that flushes to Amazon EBS volumes complete too quickly, faster than it is physically possible to write to a hard drive, which implies that a completed flush does not guarantee the data is on disk. Therefore, when running NoSQL in the cloud we must keep two running copies in two different availability zones, and not rely solely on EBS persistency.

* Cassandra is installed on two machines “for high availability”. Each write goes to two replicas, and each read requires a QUORUM of replicas. That means that the failure of either Cassandra machine makes the cluster unavailable. Why? Because a QUORUM of two replicas is two, so both machines must be online. That gives us half the availability at double the price.

* Using Hadoop on high-end hardware that is traditionally used for RDBMSs (scaling up to stronger hardware). Hadoop scales out best across many commodity machines that together cost about the same as one high-end machine.
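The Cassandra point above comes down to simple arithmetic. A minimal sketch, assuming the usual majority rule for QUORUM (half the replication factor, rounded down, plus one); the function names are made up for this illustration:

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Number of replicas that can be down while a QUORUM is still reachable."""
    return replication_factor - quorum(replication_factor)


# Compare the two-replica setup from the talk with larger replication factors.
for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

With two replicas the quorum is two, so no failure is tolerated; with three replicas the quorum is still two, so one machine can go down.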

Big Data Document Persistency

In the Big Data world, we see more and more document-based solutions for storing data (Cassandra, MongoDB, CouchDB, and more).

GigaSpaces’ Space Persistency module was written specifically to support this scenario, wherever documents need to be persisted to an external data store.

Such persistency is done by extending the SpaceSynchronizationEndpoint class and specifying it in the space configuration.

The following is a basic example for a MongoDB persistency implementation & configuration:

(Configuration is pretty straightforward & done via spring in pu.xml)

This basic example only demonstrates handling write operations on a batch synchronization event.
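To make the idea of handling only writes in a batch event concrete, here is a rough, language-neutral sketch in Python. It does not use the GigaSpaces or MongoDB APIs: `Operation`, `InMemoryDocumentStore`, `WriteOnlySyncEndpoint` and `on_batch_synchronization` are all hypothetical names, and a dictionary stands in for the external document database.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Operation:
    """Hypothetical stand-in for one item in a synchronization batch."""
    kind: str                    # e.g. "write", "update", "remove"
    type_name: str               # type of the document in the space
    doc_id: str
    properties: Dict[str, Any]


class InMemoryDocumentStore:
    """Stand-in for an external document database: one collection per type."""

    def __init__(self) -> None:
        self.collections: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def insert(self, collection: str, doc_id: str, document: Dict[str, Any]) -> None:
        self.collections.setdefault(collection, {})[doc_id] = document


class WriteOnlySyncEndpoint:
    """Persists the write operations of each batch and ignores everything else."""

    def __init__(self, store: InMemoryDocumentStore) -> None:
        self.store = store

    def on_batch_synchronization(self, batch: List[Operation]) -> int:
        written = 0
        for op in batch:
            if op.kind != "write":
                continue  # updates, removes etc. are out of scope for this sketch
            self.store.insert(op.type_name, op.doc_id, dict(op.properties))
            written += 1
        return written


store = InMemoryDocumentStore()
endpoint = WriteOnlySyncEndpoint(store)
batch = [
    Operation("write", "Product", "p1", {"name": "laptop", "price": 999.0}),
    Operation("remove", "Product", "p0", {}),
    Operation("write", "Product", "p2", {"name": "mouse", "price": 19.0}),
]
persisted = endpoint.on_batch_synchronization(batch)
print(persisted, sorted(store.collections["Product"]))
```

A real endpoint would also need to deal with the other operation types and with failures of the external store.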

The Space Persistency module offers more features such as Transactions Persistency, Type Introduction, Index Creation and more…

Project your Big Data and Get Just What You Need

In many common use cases, very large objects are stored in the data store, for example in a big-table store such as Cassandra.

However, some of the operations need only a small portion of that object, so there is no need to retrieve the entire object for that.

For example, in Cassandra you can get specific columns instead of the entire row. For that purpose I am currently developing the “projection” API in GigaSpaces. 

You can use this API together with the very rich query API GigaSpaces exposes to get partial results instead of the full object. The result will be the objects that match the desired query, but each will contain only the projection of the required fields instead of the full object. A very cool capability is that you can also use the projection API when you subscribe for notifications, which makes sure you get only the relevant data when a data-change notification arrives.
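The effect of a projection can be illustrated without any GigaSpaces code. The following is only a conceptual sketch in Python: `query_with_projection` and the sample quotes are hypothetical, and a plain list of dictionaries stands in for the data grid.

```python
from statistics import mean
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def query_with_projection(
    records: Iterable[Record],
    predicate: Callable[[Record], bool],
    fields: List[str],
) -> List[Record]:
    """Return the matching records, each reduced to the requested fields only."""
    return [{f: r[f] for f in fields} for r in records if predicate(r)]


# Each quote carries a large payload that the average calculation never needs.
quotes = [
    {"symbol": "AAA", "region": "EU", "price": 10.0, "history": list(range(1000))},
    {"symbol": "BBB", "region": "US", "price": 50.0, "history": list(range(1000))},
    {"symbol": "CCC", "region": "EU", "price": 20.0, "history": list(range(1000))},
]

# Fetch only the price of the quotes in one region, then average them.
eu_prices = query_with_projection(quotes, lambda q: q["region"] == "EU", ["price"])
average_price = mean(q["price"] for q in eu_prices)
print(eu_prices, average_price)
```

The point is that the large `history` field never leaves the store; only the projected `price` values travel to the caller.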

Here’s a snippet of how you can use the projection API using SQL query to calculate an average of price quotes in a specific region:

From the Memory Grid to Cassandra

The new Cassandra Space Persistency implementation allows persisting space entries to Cassandra.

It’s pretty cool.

After setting everything up, you can write entries to the space and, with no additional configuration, a matching column family will be created for you in the configured Cassandra keyspace.

Moreover, this implementation provides a “flattening” mechanism. Basically this means a zero-configuration object <-> column family mapping.
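As a rough idea of what flattening a nested object into columns can look like, here is a small Python sketch. It is not the GigaSpaces implementation: the classes, the `flatten` helper and the dot-separated column naming are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict


@dataclass
class Address:          # hypothetical nested type
    city: str
    zip_code: str


@dataclass
class Person:           # hypothetical top-level type
    name: str
    age: int
    address: Address


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Map a (possibly nested) dataclass instance to flat column-name/value pairs."""
    columns: Dict[str, Any] = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        name = f"{prefix}{f.name}"
        if is_dataclass(value):
            # Nested objects contribute their own fields under a prefixed name.
            columns.update(flatten(value, prefix=f"{name}."))
        else:
            columns[name] = value
    return columns


row = flatten(Person("Ada", 36, Address("London", "N1")))
print(row)
```

Each leaf property ends up as its own column, so no mapping file has to be written by hand.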

For example, consider the following data classes:

You can simply write a ‘Person’ to the XAP in-memory data grid:

If this is the first time an entry of this type is written to the data grid, its type description will also be passed to the Cassandra Space Persistency component, which takes care of the mapping and the actual column family creation. This is the column family you will end up with in Cassandra:     

Moreover, Cassandra is by nature schema-optional. If you write documents to the in-memory grid with dynamic properties (meaning they are not part of the original type description), the Cassandra Space Persistency component will take care of writing (serializing) these entries in such a way that their type is maintained. Later, when you need to read from Cassandra (during a cache miss on the data grid, for example), these properties will be read (deserialized) correctly with their type maintained.
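One simple way to keep the type of a dynamic property across a write and a later read is to store a type tag next to the value. The sketch below shows that general technique in Python; it is an assumption for illustration and says nothing about the actual format the Cassandra Space Persistency component uses. The `serialize` and `deserialize` helpers are hypothetical.

```python
from datetime import date
from typing import Any

# Supported types and how to rebuild each one from its text form.
_DECODERS = {
    "int": int,
    "float": float,
    "str": str,
    "bool": lambda text: text == "True",
    "date": date.fromisoformat,
}


def serialize(value: Any) -> str:
    """Encode a value as '<type name>:<text>' so the type survives storage."""
    type_name = type(value).__name__
    if type_name not in _DECODERS:
        raise TypeError(f"unsupported dynamic property type: {type_name}")
    text = value.isoformat() if isinstance(value, date) else str(value)
    return f"{type_name}:{text}"


def deserialize(column_value: str) -> Any:
    """Decode a value written by serialize(), restoring its original type."""
    type_name, _, text = column_value.partition(":")
    return _DECODERS[type_name](text)


# Dynamic properties of different types survive a round trip unchanged.
samples = [42, 3.5, "a:b", True, date(2013, 1, 15)]
restored = [deserialize(serialize(v)) for v in samples]
print(restored)
```

Without the tag, a value such as `42` would come back from a schema-less column as plain text, and the reader could not tell an integer from a string.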