Khanderao Kand

Khanderao, CTO at GloMantra Inc, providing online personal recommendations and assistance via Twitter @khanderao. In this blog, I share my information and thoughts on emerging technologies like social media, mobile computing, and cloud computing, as well as integration technologies (Java/SOA/ESB) and BPM.

Cheat Sheet For DB Based SOLR Indexing

Mon, 2012-10-08 18:11
  1. Define data-config.xml (or whatever name you give your data configuration file):
    1. This file defines how data is read from the RDBMS into the documents to be indexed. Define here the SQL for the full import as well as for the subsequent partial imports (called delta imports).
    2. It also defines how the data read gets mapped to fields: map columns to SOLR fields here.
    3. Make sure that you test your SQL using your favorite RDBMS client.
  2. solrconfig.xml: Register the request handler and data-config.xml in solrconfig.xml.
    1. For example, if your DB import is named dbimport, you can define a request handler, specify the request's URL, and map it to data-config.xml.
  3. schema.xml should contain all the fields that are defined in the document in data-config.xml. The schema specifies how those fields should be dealt with when adding documents to the index.
  4. You can define your datasource either in data-config.xml or in solrconfig.xml.
  5. You can index the data by an HTTP invocation of http://host:port/solr/dbimport?command=full-import (note: use whatever path you specified for 'dbimport' in your request handler).
  6. Make sure that the appropriate JDBC driver is in the lib path of SOLR.
  7. You can monitor the progress / status at: http://host:port/solr/admin/stats.jsp
  8. To look inside the index, use the web version of Luke, added as a SOLR plugin: http://host:port/solr/admin/luke. BTW, the best way to look into indexes is to install Luke and point it to the data dir.
  9. Cleanup / re-index: You can clean up the SOLR indexes either by issuing a cleanup command on your dbimport or by simply wiping out the contents of the data directory. However, make sure that you really want to do it.
  10. You can debug indexing (very minimally) by specifying debug=true in your dbimport command. However, make sure that you add commit=true, because debug mode does not commit the imported documents automatically.
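As a rough illustration of steps 5 and 10, the import requests can be assembled as plain URLs. This is only a sketch: the helper name `dih_url` is hypothetical, and the host `localhost`, port `8983`, and handler path `dbimport` are assumptions that must match your own request handler registration.

```python
from urllib.parse import urlencode


def dih_url(command, host="localhost", port=8983, handler="dbimport", **params):
    """Build a data import request URL (hypothetical helper, not part of SOLR)."""
    query = {"command": command}
    for key, value in params.items():
        # Send Python booleans as the lowercase strings used in the URLs above.
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"http://{host}:{port}/solr/{handler}?{urlencode(query)}"


# Full import, delta import, a debug run that also commits, and a status check.
full_import = dih_url("full-import")
delta_import = dih_url("delta-import")
debug_import = dih_url("full-import", debug=True, commit=True)
status = dih_url("status")

for url in (full_import, delta_import, debug_import, status):
    print(url)
```

The URLs can then be opened in a browser or fetched with any HTTP client; nothing here contacts a server.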
For details:

Reclaiming Space from Deleted Big Tables in MySQL

Thu, 2012-10-04 18:00
So, in my earlier post, I mentioned the need to dynamically resize (increase) an EBS volume on EC2. Here is how I landed in that situation. In the prototype, my database grew very large, and I could not reclaim the InnoDB space of MySQL even after dropping large tables or even the whole database. The ibdata1 file seems to be greedy and never gives up space. There must be a good technical reason why MySQL does not provide a utility to release unused space.

Anyhow, here are the steps for reclaiming the space. Disclaimer: as you know, I am not a DBA, but I have to do what I have to do:

1. Take a SQL dump of the entire DB

2. Shut down MySQL

3. Delete (from the filesystem) ibdata1, ib_logfile0, and ib_logfile1

4. Edit my.cnf (/etc/my.cnf): add innodb_file_per_table
    With this parameter, table data is stored in separate files and only metadata resides in ibdata1

5. Start mysqld

6. Reload the data dump.
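
The steps above can be sketched as a dry-run plan. This is a minimal sketch, not a tested procedure: the function names are hypothetical, the data directory `/var/lib/mysql`, the dump path, and the `service mysqld start` command are assumptions that vary by system, and credentials are omitted. The code only builds command strings and patches config text; it executes nothing.

```python
import shlex


def reclaim_commands(datadir="/var/lib/mysql", dump_file="/backup/all_databases.sql"):
    """Return shell commands for steps 1, 2, 3, 5 and 6, in order (nothing is run)."""
    dump = shlex.quote(dump_file)
    ib_files = " ".join(
        shlex.quote(f"{datadir}/{name}")
        for name in ("ibdata1", "ib_logfile0", "ib_logfile1")
    )
    return [
        f"mysqldump --all-databases > {dump}",  # step 1: dump everything
        "mysqladmin shutdown",                  # step 2: stop the server
        f"rm {ib_files}",                       # step 3: remove shared tablespace and logs
        "service mysqld start",                 # step 5: assumed service name
        f"mysql < {dump}",                      # step 6: reload the dump
    ]


def enable_file_per_table(cnf_text):
    """Step 4: add innodb_file_per_table under [mysqld] if it is not already set."""
    lines = cnf_text.splitlines()
    if any(line.strip().startswith("innodb_file_per_table") for line in lines):
        return cnf_text
    patched, inserted = [], False
    for line in lines:
        patched.append(line)
        if line.strip() == "[mysqld]" and not inserted:
            patched.append("innodb_file_per_table")
            inserted = True
    if not inserted:
        # No [mysqld] section found, so append one.
        patched += ["[mysqld]", "innodb_file_per_table"]
    return "\n".join(patched) + "\n"


for command in reclaim_commands():
    print(command)
```

Review the printed commands and verify that the dump is complete and restorable before running step 3, since deleting ibdata1 is not reversible.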

Need support to dynamically increase the size of an EBS volume of a running EC2 instance

Mon, 2012-10-01 12:26
Recently I started a prototype involving Big Data processing on EC2. I started with a "guesstimated" size for the EBS volume. As with any POC / guesstimate, I was wrong: very soon the data grew and I needed to increase the space. But I realized that we cannot dynamically increase the size of an EBS volume of a running instance. That seems to be a problem to me. In my opinion, in today's world of virtualization and the pay-as-you-use model, vertical and linear scaling should not require any downtime.

Anyway, since I was doing this as a POC, I was able to afford a small downtime. BTW, the process for increasing the volume size is not that difficult. Here are the instructions from another blogger:

NFC and HTML5 emerging as Winners on Mobile

Wed, 2011-11-09 13:03
Near Field Communication (NFC) (for more details, read my earlier blog) is very useful for very short-distance communication, which makes it ideal for mobile devices. It has got another backer: RIM. RIM is betting big on NFC-based apps as one of its four main focus areas (according to its Dev Relations VP, Saunders).

This week there is more news, this time from Adobe. It seems that Adobe is stopping further development of mobile Flash and betting on HTML5, which it should have done long ago after listening to Steve Jobs. Anyway, Adobe has been taking good steps recently. It recently acquired PhoneGap, an open source framework for developing HTML5-based hybrid apps for a variety of mobile devices like iPhone and Android. We are using it at myBantu. In this case, 90% of the code base is independent of the mobile device. (My HTML5-related blog entry: )

Oracle in Three in One Cloud : PaaS, SaaS, DaaS

Thu, 2011-10-06 18:06
After adverse comments last year, Larry Ellison announced that Oracle is getting into the cloud in a big way. The offering is significantly different from CRM On Demand. It also adopts a different approach from multi-tenant Salesforce. Oracle's public cloud provides PaaS (Platform as a Service), SaaS (Software as a Service), and DaaS (Database as a Service). It has five components:

* Oracle’s Fusion Applications (HCM and CRM) SaaS
* Oracle Fusion Middleware : PaaS
* Oracle Database : DaaS
* Sun Systems, OS, VM : PaaS
* Social Network : SaaS

This announcement puts Oracle in competition with Amazon and Salesforce, which are the clear leaders in the public cloud computing space.

Good to see that CRM and HCM are in the pack. Last year I helped CRM, and earlier Talent Management from HCM, in this initiative. This puts Oracle in direct competition with Salesforce, Workday, and SuccessFactors (on the talent management side).

Oracle also announced a Social Network for enterprises. It would allow social networking features like sharing and following, kept private to the enterprise. Does it sound like Salesforce's Chatter? While on the topic of Salesforce: Oracle's approach for most of the components is virtualization based, where every customer gets its own stack and does not share it with others, as happens in multi-tenant software like Salesforce. This addresses some of the privacy concerns. Enterprises can customize the cloud-based Fusion apps using the standards-based Fusion Middleware (SOA, BPEL, BPM, and ADF). Oracle's Database as a Service competes with a similar offering from Amazon.

Oracle also provides a Java stack as PaaS, in competition with VMware's Cloud Foundry, Red Hat's OpenShift, and Salesforce's Heroku.

According to Ellison, Oracle’s new public cloud will be available for a monthly subscription and will include resource management and isolation, security, data exchange and integration, self-service sign up, elastic capacity on-demand, virus scanning, and more.

However, pricing and availability are yet to be announced.


Opensource WebRTC for Browser 2 Browser communication Coming up

Thu, 2011-06-02 13:56
WebRTC, a new open source project for browser-to-browser communication, is making some progress. It has been launched and will go through W3C standardization. It may become part of HTML5. As far as I know, Google may adopt it and many browsers would support it. This browser-to-browser communication, without any server involved, could replace traditional P2P communication, including chat and voice calls.

Here is the WebRTC architecture diagram:

For developers:
However, it is not yet ready. The current demo still needs a demo server, but a serverless version should be there soon.

If you want to join the effort:

NFC Getting Momentum

Thu, 2011-05-26 18:45
Near Field Communication (NFC) is gaining momentum. Its usage will be propelled by mobile devices and will give birth to many fantastic devices and applications.

Last week Apple introduced its Retail 2.0 store, which effectively used NFC.

On the heels of Apple, Google took the technology to a more practical and wider use: enabling credit card payments via mobile. This morning Google introduced an Android app called Wallet. It uses RFID and NFC technology.

The usage of NFC in Android mobiles is going to propel a new set of innovations. Hence, let us look at a few facts about NFC:

1. Distance: 4 cm or less (may be increased to 20 cm)
2. Frequency: 13.56 MHz
3. Speed: 106 kbit/s to 848 kbit/s
4. Typical usage: “sharing, pairing, and transaction”
5. Passive or active (two-way) communication models
6. Unlike Bluetooth, NFC doesn’t require pairing
7. NFC requires far less power than Bluetooth, but it's slower.
8. Drawback: not secure (may be OK because of the very short range)
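
To get a feel for the speed range in fact 3, here is a back-of-the-envelope calculation of raw transfer times. It is a simplification: it assumes the listed rates are raw bit rates in kbit/s and ignores protocol overhead, and the 1 KiB payload and the function name are just illustrative choices.

```python
def transfer_seconds(payload_bytes, rate_kbit_s):
    """Raw time to move a payload at a given bit rate, ignoring protocol overhead."""
    return payload_bytes * 8 / (rate_kbit_s * 1000)


# Compare the slowest and fastest rates from the list for a 1 KiB payload.
for rate in (106, 848):
    millis = transfer_seconds(1024, rate) * 1000
    print(f"{rate} kbit/s: {millis:.1f} ms for 1 KiB")
```

Even at the slowest rate, a small payload such as a payment token or a pairing record moves in well under a second, which fits the "sharing, pairing, and transaction" usage.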

Coming back to Wallet: it should be noted that not all Android mobiles will have NFC technology built in. Also, the service providers (shops / retailers) need to have complementary technology.

Is ChromeBook nothing but Larry's old idea of Network Internet Computer?

Thu, 2011-05-12 01:44
Google announced ChromeBooks at the Google I/O 2011 conference today with great fanfare. It is definitely an idea appropriate to the current Web-centric world, and it seems to be giving the right vibes. It's slick, fast to start, connected to the web, secure, possibly free from viruses, and consumes little battery. It is consistent with today's cloud computing. In other words, it is a perfect client device for the cloud computing world, or the new web. However, is it a real innovation? Larry had started the Network Computer concept and had launched a separate company for it. Maybe it was ahead of its time. Isn't the ChromeBook the same idea recycled? Anyway, though it has an innovative subscription model for education and businesses, the cost is on the higher side: $499 for Wi-Fi and $599 for wireless, especially against the backdrop of various efforts to introduce a slick netbook at $100. Moreover, at this price, the ChromeBook would get sandwiched between tablets and PCs. Read my detailed blog at:

Hadoop is building a good momentum...

Tue, 2011-05-10 23:58
At EMC World this week, many new products based on Hadoop were launched.

EMC announced enterprise and community distributions, as well as an appliance, of Apache Hadoop. This would be in competition with Cloudera, which has very good traction in the Hadoop market. Moreover, Yahoo, which has been a pioneer in original contributions to Hadoop and is a heavy user, is rumoured to be launching a Hadoop spin-off. It has contributed Pig as a layer above Hadoop.

During the conference, other products like Brisk, which makes Hadoop work with Cassandra as a node, and SnapReduce from SnapLogic were also announced. Overall, all of these are good indications of Hadoop's traction. A more detailed note is in my other blog, which is dedicated to emerging technologies and apps.

Personalized News Recommendations

Sun, 2011-03-13 19:53
A couple of days back, on March 10, Barron’s reported that “ Adds Recommendation Feature”. Back in Nov 2010, the MyBantu-powered ‘Personalized News Recommendations’ was launched for Samachar, the largest news portal about India. This personalized news recommendation, one of the first of its kind, not only increased visitor (reader) traffic to the Samachar site but also resulted in readers spending more time reading these personalized articles.

Collaborative Filtering Vs Personal Preferences Based Recommendations

Thu, 2011-03-10 04:19
From my other blogs:

Buzz Around Non-Relational DBs

Tue, 2011-03-08 18:34
Reposting from my other blog

Last Saturday we (GITPRO – Global Indian Tech Professionals Association) arranged a Tech Talk on NoSQL (actually non-relational) DBs and scaling Hadoop. It was very well attended. In the general introduction session, many attendees mentioned their interest in Hadoop and NoSQL DBs while introducing themselves. It was nice to see a good-sized crowd sacrificing their Saturday evening to attend this informative session. It was even more surprising to see that many of them were actually users of these technologies.

We at MyBantu are using MongoDB, which is a document-oriented database. We store XML documents (when stored in MongoDB, they are actually BSON), and queries use a scripting language for conditions. Another alternative in this class is CouchDB, which is more Web-like and gives REST-based access. Other famous non-relational (popularly called NoSQL) systems are, of course, Hadoop and Cassandra. Both are Apache projects with a few very good showcase implementations. However, when Digg recently had problems while using Cassandra, Cassandra got a bad name, which is not that accurate. Anyway, Hadoop and its database, HBase, are making more buzz. It was interesting news when Facebook moved its messaging system from Cassandra to HBase. It is interesting especially because Cassandra originally came from engineers at Facebook, who used it in their Inbox search. There is some interesting work on Hadoop happening at Facebook. They are the original contributors of Hive, a data manipulation add-on targeted at implementing warehousing on top of Hadoop. While MapReduce databases created a lot of buzz around NoSQL, it is interesting that Hive offers a SQL-like query language. So, when folks say NoSQL, they actually mean non-relational databases. Another warehousing-related add-on to Hadoop is Pig (Apache Pig), which originally came out of Yahoo.

Anyway, it is interesting to see the rapid development happening in this space. The major driver is the huge volume of user-generated data being handled by social networking giants like Facebook, Zynga, LinkedIn, and others. But the original credit for this Bigtable concept goes to Google, from where the MapReduce database

Games (Asian) Indians Play

Mon, 2011-02-28 21:21
I recently read the book "Games Indians Play". This book is not about Indian games like Kho-Kho, Kabaddi, etc.; it is about why Indians behave the way they behave. To be precise, the subtitle of the book is "Why we are the way we are". The author, Raghunathan, has used his studies of game theory and behavioral economics to make sense of Indians' behavior. It's a thought-provoking book that uses the "Prisoner's Dilemma", the famous problem from game theory, to elaborate how Indians are 'rational' but their self-centered rationalism undermines their long-term as well as community interests. His examples of day-to-day scenarios cover almost everyone: individuals, politicians, and the community by and large. At the end of the book, the author proposes the crux of the "Bhagwat Geeta" as a solution to this behavior and explains it in the context of the game theory problem.


Wed, 2010-10-13 15:41
Since many folks asked about the ability to view posted events, here is the URL to do so

China May Lead Patenting, How About Innovation?

Wed, 2010-10-06 19:08
The title is self-explanatory. My following blog refers to the stories.

Larry, Fusion Apps, and SOA Middleware Technology

Wed, 2010-10-06 17:36
At Oracle OpenWorld 2010, Larry Ellison glowingly referred to Oracle Fusion Apps (OFA) in his eagerly watched keynote as Oracle's largest engineering project in recent times. He covered some important aspects of Oracle Fusion Apps. He said that it is the first ERP application completely based on standards. While terming it "never done before", he specifically mentioned BPEL! He mentioned that Fusion Apps is all about intelligence and not just process automation. Per Larry, Fusion Apps has a wonderful, easy-to-use Web 2.0 based UI and search capabilities. Most importantly, he specifically mentioned that Fusion Applications is based on SOA technology. In fact, this was on his slide deck. All of us who worked on the SOA - BPM stack must have been proud to hear Larry say that.

At the end of his keynote, Larry invited Steve Miranda, in charge of Fusion Apps, to demonstrate Fusion Apps. You must have seen the Worklist app and might have immediately recognized the SOA Suite serving the business process. After years of intense work, hearing such great comments is a rejuvenating, memorable moment for all of us who worked on the product as well as helped make Fusion Apps happen!

Resolving timeouts while using ejb beans for SOA invocation - deployment

Tue, 2010-10-05 16:41
If you are using AS11 BPEL's APIs for deployment or invocations and are facing time-outs, and the task really needs more time, then you may need to increase the EJB timeout.

1. Find ejb_ob_engine_wls.jar in your deployment.
2. Modify META-INF/weblogic-ejb-jar.xml.
3. Increase the time-out.
4. Re-jar and replace the original jar.
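
The four steps can be scripted roughly as follows. This is a sketch under assumptions: the post does not name the setting, so the code assumes the time-out lives in WebLogic's `trans-timeout-seconds` element, written without a namespace prefix, and the function name `raise_ejb_timeout` is hypothetical. If the descriptor has no such element, the jar is copied unchanged.

```python
import io
import re
import zipfile

DESCRIPTOR = "META-INF/weblogic-ejb-jar.xml"
# Assumed element: <trans-timeout-seconds>N</trans-timeout-seconds>
TIMEOUT_RE = re.compile(r"(<trans-timeout-seconds>)\s*\d+\s*(</trans-timeout-seconds>)")


def raise_ejb_timeout(jar_bytes, seconds):
    """Return a copy of the jar with every trans-timeout-seconds value set to `seconds`."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == DESCRIPTOR:
                text = data.decode("utf-8")
                # \g<1> and \g<2> keep the opening and closing tags intact.
                text = TIMEOUT_RE.sub(rf"\g<1>{seconds}\g<2>", text)
                data = text.encode("utf-8")
            dst.writestr(item, data)
    return out.getvalue()


# Usage on a real file (keep a backup of the original jar first):
#   with open("ejb_ob_engine_wls.jar", "rb") as f:
#       patched = raise_ejb_timeout(f.read(), 600)
#   with open("ejb_ob_engine_wls.patched.jar", "wb") as f:
#       f.write(patched)
```

Writing to a new file name and swapping it in by hand keeps step 4 reversible.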

Selectively turning off the service engines in Oracle SOA AS11

Tue, 2010-10-05 16:05
Thanks to the Spring-based implementation of Oracle SOA AS11, if you are not using B2B, mediator, decision, or any other service engine, you can turn it off by removing its reference beans in the fabric's Spring config (fabric-config.xml). Just be careful though! You may accidentally remove something that is needed :-) This may bring the memory consumption down by hundreds of MB.

Integration technologies for Cloud

Tue, 2010-10-05 15:20
2010 is seeing cloud computing and mobile computing at their 'tipping points'. Many startups are being launched in these areas. With many applications being SaaS based and hosted on the cloud, the next requirement will be how to connect them securely and reliably. What could be the right technologies for that? This need will definitely turn into cloud-based integration technologies: SOA technologies.

Micropayments opening gate to the fortune at the bottom of the pyramid

Mon, 2010-07-19 16:29
The late C. K. Prahalad wrote a path-breaking book showing the world that there are great opportunities in the emerging as well as the poor nations. Realizing those potential markets would need innovation.

When he wrote The Fortune at the Bottom of the Pyramid, he may not have thought about VoiceSMS and voice blogging.

But he did mention innovation. And such innovation is in action in the form of bubbleMotion, which has more than 1.2 million paid users who pay 0.65 USD per month for voice blogging / access.