Tuesday 24 December 2013

Peak Number of Concurrent Sessions Count Algorithm - Sheer Genius

So a while back, someone asked me to generate reports for a web application and, in particular, to profile the load on the server via different metrics. Metrics like longest running session and response times per service/request all seemed pretty easy...until I got stumped by the peak number of concurrent sessions.

Why, you might ask? Because the problem is more complex than meets the eye. You need a clever way to determine overlaps between sessions and then count the maximum overlap per day.

Oh, and did I mention I wanted to solve this all at the database level? If you want to solve it programmatically, the solution isn't all that difficult. A simple O(n^2) algorithm would be:

// peak concurrency is always hit at the moment some session starts,
// so count how many sessions are active at each session's start time
for (session1 in sessions) {
  for (session2 in sessions) {
    // is session2 active when session1 starts? (this also counts session1 itself)
    if (session2.startTime <= session1.startTime && session2.endTime >= session1.startTime) {
      session1.concurrentSessionCount++
    }
  }
}

// the session with the highest count carries the answer
return sessions.max { it.concurrentSessionCount }.concurrentSessionCount

Very easy. But as I mentioned before, this is O(n^2) - not very sexy. And loading all sessions into memory - well, let's just say I'm not a fan.

Turns out there is a very neat approach. It does, however, require window (a.k.a. analytic) functions in the database. Hey, you can't have everything, right?

Okay, so let's set up the data. Our table looks like so:
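Say, something along these lines (the table and column names here are just placeholders; adjust to your own schema):

create table user_sessions (
    id       serial primary key,
    user_id  integer not null,
    log_in   timestamp not null,
    log_out  timestamp
);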


Our application records the time when the user logs in and out. BTW - I should point out that not all users may log out of your application.... so how do you record a session expiry? I'll cover that in a separate post. For now, let's move on.

The first thing that we want to do is assign a row_number to every row that corresponds to a log_in. Pretty straightforward:
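Against the user_sessions table sketched above, that could look like:

select log_in as event_time,
       row_number() over (order by log_in) as login_number
from user_sessions;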


And now we just union all our log_out rows:
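Roughly like so (I've added an is_login flag to tell the two apart; logouts get no login number of their own):

select log_in as event_time, 1 as is_login,
       row_number() over (order by log_in) as login_number
from user_sessions
union all
select log_out, 0, null
from user_sessions
where log_out is not null;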


But this isn't good enough. We want to put our logins and logouts in the order they happened in our application; this means we need to sort by the date. No problem:
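Just wrap the union in a subquery and sort the whole event stream:

select *
from (
    select log_in as event_time, 1 as is_login,
           row_number() over (order by log_in) as login_number
    from user_sessions
    union all
    select log_out, 0, null
    from user_sessions
    where log_out is not null
) events
order by event_time;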


Much better...and now for the magic!
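The insight: at any login, the number of concurrent sessions is simply the number of logins so far minus the number of logouts so far. A row_number over the sorted stream counts all events, so logouts-so-far is just event_number - login_number, and the concurrency at a login collapses to 2 * login_number - event_number. One way the full query could look (again a sketch against my assumed user_sessions table):

select max(2 * login_number - event_number) as peak_concurrent_sessions
from (
    select is_login,
           -- running count of logins: the same numbering as before,
           -- but now available on every row
           sum(is_login) over (order by event_time, is_login
                               rows unbounded preceding) as login_number,
           -- running count of all events
           row_number() over (order by event_time, is_login) as event_number
    from (
        select log_in as event_time, 1 as is_login
        from user_sessions
        union all
        select log_out, 0
        from user_sessions
        where log_out is not null
    ) events
) numbered
where is_login = 1;  -- the peak is always hit at a login

To get the peak per day rather than overall, carry event_time up through the subqueries and group by date_trunc('day', event_time).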


Monday 2 December 2013

Failover on Postgresql 9.1 and above

The previous post showed you how to create a server in hot standby mode. As mentioned earlier, you can only use this standby for read-only queries. Now let's say the primary server goes down; following are the steps you'll need to promote the standby server and make it your primary.
Open a shell on the standby server with the postgres user and type:
pg_ctl promote -D /var/lib/pgsql/9.2/data/
And that is it! Your standby server has now become a primary instance, i.e., you can do inserts, updates and deletes now.
Please note that if the standby server fails, you need to bring it back quickly as well. To detect failure, do a daily count on the audit log table on both the primary and the standby and make sure they are equal. If the standby has fallen out of sync, it is best to kill that database and redo the steps from the previous post.
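A quick sketch of such a check (the hostnames are placeholders, and audit_log stands in for whatever table you audit):

psql -h primary.example.com -U postgres -t -c "select count(*) from audit_log"
psql -h standby.example.com -U postgres -t -c "select count(*) from audit_log"

If the two counts drift apart, rebuild the standby.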

Tuesday 26 November 2013

Configure Hot Standby with Streaming Replication on Postgresql 9.1 and above

NOTE: This does NOT work on Windows! These backups only work on Unix machines.

NOTE: We need at least Postgres 9.1 to make this backup work.

NOTE: You must be an admin to restart the server, and you must be the postgres user to edit the configuration files. All edits assume your base directory is /var/lib/pgsql/

Hot Standby is the name for the capability to run queries on a database that is currently performing archive recovery. Log Shipping replication allows you to create one or more standby nodes that are replicas of the primary node (or master node). Standby nodes can then be used for read-only query access.

To enable hot standby, make the following changes to postgresql.conf on the master:
wal_level = hot_standby
checkpoint_segments = 8
max_wal_senders = 3
wal_keep_segments = 8
Please note that the segment size of the write-ahead log (WAL) is a fixed 16MB. Should the server come under heavy load, the streaming process may not keep up, in which case WAL segments can get recycled before the standby has received them and the standby falls behind for good. To prevent this, increase checkpoint_segments and wal_keep_segments.

Add the following changes to pg_hba.conf on the master:
host     replication     postgres        192.168.2.37/32         trust
Where 192.168.2.37 is the slave. The above line says that the postgres user connecting from host 192.168.2.37 is granted replication access to this master server (with trust, i.e. no password required).

Save these changes and restart the server. It is now time to configure the slave server.

First, stop the postgres server on the slave and take a backup of the existing data. I do this by moving the entire data directory aside:
mv 9.2 9.2.old
Now that we have an empty data directory, copy all the contents from the master database with the pg_basebackup command. Note: this only works on versions of postgres greater than or equal to 9.1
pg_basebackup -D 9.2/data --host=10.1.1.7 -v -P --xlog -U postgres
Where 10.1.1.7 is your master server. This in essence copies the entire database over to the slave. Now we need to make some edits. If your access rules for the master and the slave are the same, you do not need to edit pg_hba.conf. Edit postgresql.conf:
hot_standby = on
This tells postgres that the server should act as a standby server.

Now we need to create a new file in the data directory, called recovery.conf. In essence this file contains the connection information of the master server:
standby_mode = 'on'
primary_conninfo = 'host=10.1.1.7 port=5432 user=postgres password=postgres'
Now start the slave server. You must verify that the server is in recovery; this can be done by opening an SQL prompt and executing the query:
select pg_is_in_recovery();
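It should return true on the standby. You can also verify from the master's side that the slave is connected and streaming:

select client_addr, state, sent_location, replay_location
from pg_stat_replication;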

Tuesday 19 November 2013

Hibernate Associations and Discards/Evicts

Right, so you're sitting at work thinking how life would be without the Hibernate ORM...and you'd go....very frustrating indeed. Most of the time, you'd be busy mapping your SQL to your Java instances and vice versa.

But then again, Hibernate does come with its own set of headaches. Here is something I learnt recently...that Hibernate will not tell you.

You have your domain instance and your Hibernate session, and for whatever reason you decide to evict it (also known as discard in the Grails world). For single entities, this works just fine...but most systems work with a graph of entities...not just one.

Suppose you have domain class A that has one or more class B objects. Should you evict A while some B is dirty (meaning an attribute has changed there), you will see that your discard just gets ignored. After all, if B is updated and A relies on it, the relationship between A and B needs to be refreshed, and as a result Hibernate simply cannot skip persisting A.

So what is the solution, you ask....you need to evict EVERY B as well as A. Essentially, this tells Hibernate that nothing is dirty, and now, finally, A will not get persisted.
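In Grails terms, a minimal sketch (the class names and the bs association are made up):

// discard the parent...
a.discard()
// ...and every child in the association; a dirty B left behind
// would drag A right back into the flush
a.bs.each { it.discard() }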

Friday 15 November 2013

Grails Data Binding Gotcha

Grails Data Binding is one of the most powerful tools I've used in the web MVC world. Not only does it automatically convert to the correct data types, but you can also bind to lists and maps.

So yesterday, I was building an app, and the following was the code:
<g:each in="${agents}" var="agent" status="i">
 <!-- start row -->
 <div class="row">
  <div class="span12 control-group-holder">
   <div class="row">
    <div class="span4 control-group">
     <g:include params="[ext_id: agent.ext_id, readonly: true]" controller="agent"
          action="show"/>
     <g:hiddenField name="agents[$i].ext_id" value="${agent.ext_id}"/>
     <g:hiddenField name="agents[$i].version" value="${agent.version}"/>
    </div>

    <div class="span8 control-group">
     <g:radioGroup name="agents[$i].photoStatus" class=""
          labels="${photoStatusValues*.name()*.toLowerCase()*.capitalize()}"
          values="${photoStatusValues}" value="${agent.photoStatus}">
      <label class="radio inline" style="margin-left: 0px;">${it.radio} ${it.label}</label>
     </g:radioGroup>
     <div style="visibility: hidden" class="photoDeniedReason">
     <g:message code="text.admin.reviewPhotos.reason.for.rejection" /><br/>
     <textarea name="agents[${i}].photoDeniedReason" maxlength="4000" class="rejection_reason_required"></textarea>
     </div>
    </div>
   </div>
  </div>
 </div>
 <hr/>
 <!-- end row -->
</g:each>
This is being bound to the command object:
@Validateable
class BatchUpdatePhotosCommand {
 List<AgentPhotoCommand> agents = new ArrayList<AgentPhotoCommand>()
}

@Validateable
class AgentPhotoCommand {
 String ext_id
 String version
 String photoStatus
}

Except that it just would not bind. I would get:
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at grails.plugin.cache.web.filter.PageFragmentCachingFilter.doFilter(PageFragmentCachingFilter.java:195)
    at grails.plugin.cache.web.filter.AbstractFilter.doFilter(AbstractFilter.java:63)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
The solution was updating BatchUpdatePhotosCommand to build the list 'withLazyDefault', and everything worked. Data binding fetches the element at each index before populating it, and a plain ArrayList throws the moment index 0 doesn't exist yet; a list built withLazyDefault grows on demand and hands back a fresh AgentPhotoCommand instead:
@Validateable
class BatchUpdatePhotosCommand {
 List<AgentPhotoCommand> agents = new ArrayList<AgentPhotoCommand>().withLazyDefault {new AgentPhotoCommand()}
}

@Validateable
class AgentPhotoCommand {
 String ext_id
 String version
 String photoStatus
}

Wednesday 18 September 2013

Search and Security

Search....the final frontier....

Whoops...let's start over....search...hard to imagine life without it now. And for very good reason: the amount of readily accessible information has exploded over the past 10 years! One has to be able to quickly sift through all that data and find the relevant bits.

Search comes in all shapes and sizes..whether you're using email, accessing the web, or even using local search on Windows...you're always in need of it. Of course, when it comes to implementation, there is a plethora of techniques out there one can use. And I'm not just talking about databases; we now have full-text search engines like Apache Lucene...and I don't even know what Google uses under the hood...

And then we have security. Security is very real and has its place in the food chain. There are many facets to security, but I'm specifically going to talk about data encryption.

Alas...data encryption is itself a huge field...and the algorithms that exist are pretty sophisticated. One can even encrypt the same data into several different ciphertexts, and still be able to decrypt every one of them back to the original!

So that's that, now on to the topic at hand. Search and security are like water and oil....well, maybe I'm being a bit dramatic...but at the very least, you'll need to rethink how to get them working together. Don't believe me? Let's look at an example.

Let's say you're building a database model for some customers. Usually, the customer will have some attributes like a first name, last name, date of birth and perhaps an address.

Search becomes pretty simple:

select * from customer where firstname = 'abdul';
select * from customer where dob < '01/01/2000';
select * from customer where address ilike '%wellington%';

Super. The memory footprint is low...if you're using indexes, the response is very quick (except perhaps for the last query, since a leading wildcard can't use a regular index).

Now let's encrypt the data. Oh wait....you can't execute ANY of the above verbatim! Fortunately, we have some solutions.

If security is high on your radar, then you'll probably want to use non-deterministic encryption (meaning that for the same plaintext input, you'll get a different ciphertext output every time). In that case, the best solution really is to go row by row through the database, bring each row to the application, decrypt it, and do your comparison there. Very poor efficiency, but it'll get the job done.

Another solution would be to keep an in-memory data store of sorts. Perhaps an in-memory database, or a Lucene index, or some custom data structure holding the decrypted values. Security folks will come back with concerns that if a memory dump is taken, the data can be exposed...sure...the risk is there. Like I said, if security is high up in the project, it'll probably be a difficult sell.

As far as non-deterministic encryption goes, that's all I've got. All the above comparisons can be done in code once the data has been decrypted.

If you can relax that constraint and use deterministic encryption (the same plaintext always produces the same ciphertext), well, the story changes quite a bit. Now you can encrypt the search text in the application and push the encrypted search string to the database. For equals and not-equals operators, this works brilliantly. You can index the encrypted columns too and get very fast results.
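A sketch of the idea (firstname_enc is a hypothetical column holding the deterministic ciphertext; the parameter is encrypted in the application before the query ever reaches the database):

select * from customer where firstname_enc = :encryptedFirstname;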

But that's pretty much all you can do. Haha! In particular, comparator operations like less-than or greater-than cannot work. You are, after all, comparing what amount to garbage strings.

The comparator operations described above typically stem from some kind of business rule. Perhaps, in our example, we need to wait for the customer to turn 18 before they are eligible for a certain product, say. Then you can have a scheduled job that sets the relevant indicator based on that rule. Essentially, you need to replace the less-than/greater-than with some form of equal-to.
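For instance, a nightly job along these lines (everything here is hypothetical: the Customer class, the decrypt() helper, the flag; the decryption happens in the application, where the key lives):

// turn "dob < cutoff" into a boolean flag that plain equality can find
def cutoff = new Date() - (18 * 365)
Customer.list().each { customer ->
    Date dob = decrypt(customer.encryptedDob)   // app-side decryption
    if (dob <= cutoff) {
        customer.eligibleForProduct = true
        customer.save()
    }
}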

And then finally, we have our good friend like/ilike. Here, even deterministic encryption leaves you stuck...because your search phrase is usually a substring of the stored phrase - which is why you're using the like clause to begin with.

Don't despair, good people. The notion remains the same...we need to replace the like clause with some form of equal-to operator. But how?

Enter string tokenization. All you really do is break the string into a set of substrings. Let's say someone lives at 10 Wellington Drive. Then the tokens can be 10, wellington, drive. You can go crazy with slicing and dicing, starting with, say, 3 characters and going up to the full string. So wellington becomes wel, ell, lli, ..., well, elli, ..., wellington. Now encrypt all these tokens and associate them with the id of the customer. When a user types well in the search box, you encrypt it, do an equal-to lookup on the tokens, fetch the associated id and voila!
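In code, the slicing is only a few lines (SearchToken, encrypt() and the customer instance are all hypothetical; substitute whatever entity and deterministic cipher you use):

// generate every substring of length 3 up to the full string
def tokens = []
def value = 'wellington'
for (int len = 3; len <= value.size(); len++) {
    for (int start = 0; start + len <= value.size(); start++) {
        tokens << value.substring(start, start + len)
    }
}

// encrypt each distinct token and point it back at the owning row
tokens.unique().each { token ->
    new SearchToken(customerId: customer.id,
                    encryptedToken: encrypt(token)).save()
}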

I am sure there are other ways to solve these problems. This is one solution that I've applied to a project, and it works quite well. Needless to say, as technology evolves, I am sure more work will be done in this area. Until then....

Tuesday 17 September 2013

Another Developer and his Blog?

Yeah...that's right...I've finally decided to join the fold.

Like any other developer, when approaching a 'problem', I start with a Google search...chances are, if I've come across a specific problem, someone's already solved it, right? Normally, I'd get hits on sites like stackoverflow.com or javaranch.com...and for the most part that was good enough.

However, over time I've come to realize that a lot of issues are now discussed on random blogs. These contain gotchas, or other subtleties in frameworks (including bugs and their workarounds!), or even simple code templates like a JNDI lookup procedure. And you know what, they've proven to be very valuable. I feel it's my turn to give back :-)

And so it begins...