Saturday, April 2, 2016

C/C++ functions return values through register A

Did you know? C/C++ functions return values through register A:

(Try it in Visual Studio, in a 32-bit x86 build; MSVC does not support inline assembly on x64.)

int Temp()
{
    __asm { mov eax, 10 }   // MSVC x86 inline assembly; note there is no return statement
}

int x = Temp();
cout<<x; //10

With the understanding that a CPU register is used for returning values, think about what happens when you return primitive datatypes, pointers, references, etc. from functions, and how the function behaves!

Wednesday, February 18, 2015

Video Wall using Raspberry Pi

Over the weekend, I worked on a prototype of a video wall using Raspberry Pi and got it working. Here is my architecture:

1. OpenELEC supports the Raspberry Pi and will turn your Pi into a media center. Install it on your Pi, connect the Pi to your TV, and join it to your wireless network.

2. UPnP is a popular media sharing protocol. You can turn your Pi into a UPnP client by following this:

3. Your Pi is now ready to display media on the TV from any network device that can speak UPnP. For Android devices, there are plenty of apps that can interact with the UPnP devices in your network, like Toaster Cast. (This step is just to test your single video wall monitor and have some fun with the Pi.)

4. To extend this to an "n" monitor video wall, you need "n" Raspberry Pis and one computer that splits each video frame into "n" pieces, one per monitor, and then transmits the pieces over UPnP to the Pi network. Each Pi in turn renders its piece on the monitor attached to it.

Without resources, I can't complete step "4", but would be happy to work together with someone who wants to realise a fully functional video wall.

Thursday, November 27, 2014

Linking C and C++ functions using extern specifier

C and C++ compilers name the symbols in their generated code a little differently. This can cause lots of linker errors when you try to use a function written in a C file from C++ code, or vice versa, through the extern specifier.

Always remember that C/C++ code is converted directly to assembly and object code, which is just linear in nature, like:

_printf_:        ;label marking where the code of printf starts
;Actual code to print
call _printf_    ;now the actual code under label _printf_ is called

It is something like goto-style code. Thus, the printf() function might finally be converted to the label _printf_ in the object/assembly code. This label is used in the linking stage.

Let's take a project which contains one C file and one CPP file.

/* C file */
#include <stdio.h>
void function_c() { printf("From C"); }

// CPP file
extern void function_c();
void function_cpp() { function_c(); }

This will not work because the C compiler generates a name for function_c that is something like "_function_c_", but the C++ compiler (as it supports function overloading) looks for the label "?function_c@@YAXXZ". Hence the project won't link. To resolve this, use the extern "C" option:

extern "C" void function_c();
void function_cpp() { function_c(); }

For more insights, read about Name mangling in C++:

Wednesday, January 1, 2014

Learn to customize an OS and create your own flavour

It's always a dream for geeky IT students to customize OS code and add things like an additional screen during OS boot or their own name in the right-click menu. But since they don't get proper guidance and direction to explore this, most of them get bored and their enthusiasm goes down.

Students! What if you got a very good e-course that teaches you FROM SCRATCH everything required (technologies, tools, mindsets, innovative ideas) to download live OS code, open it up and do something geeky with it, play with the kernel, maybe modify the OS TCP/IP code to be more effective, and a lot more? Maybe you had a very good research concept about an OS, but since you couldn't implement it and measure the results your modified algorithm produces, you gave up on presenting a great tech research paper.

No more worries! If you think that your college project is not innovative enough or is mundane, but you feel you are dedicated and have the zeal to learn and code complex things, here is what you need to do.

I have experimented with some OS code and felt that my work would be a great asset for college students, researchers and hobbyists, especially for research projects. I am documenting my work at:

Hobby Coders is a non-profit team of coders who work for passion. Please register here to get email notifications when new chapters/topics/courses are added.

Monday, June 10, 2013

How Computer Architecture and C++ can help a Java developer to improve in performance

Recently I came across an interesting example of how knowledge of C/C++ and Computer Architecture can help a Java developer write better performing code, and I want to share it with you all.

Let's take an example of multiplying two 1000x1000 matrices in two ways: (Download the source code from:
1. The traditional logic ( usualMatrixMul() function has this logic in the code shared )
This is the traditional way, which was taught to us in school. We traverse matrix 1 horizontally and matrix 2 vertically, multiplying the corresponding elements and adding up the partial sums to find each element of the resulting matrix.

2. Slightly modified logic ( modifiedMatrixMul() function has this logic in the code shared )
In this slightly modified algorithm, we first transpose matrix 2 and then apply the same traditional logic, adapted to the transposed matrix 2.

A demo run of these two algorithms took the following time to run: (x86 processor, Win32)
Usual matrix multiplication algorithm takes 24797 ms
Modified matrix multiplication algorithm takes 8500 ms

How can this small change in logic reduce the run time by about three times?
Well, a little bit of knowledge about C/C++ and Computer Architecture can help you understand this.

1. From C/C++: Multidimensional arrays are stored linear in the memory
The JVM is itself implemented in C/C++, and under the hood Java arrays are mapped to C/C++ arrays. In C/C++, arrays, whether single- or multi-dimensional, are stored linearly in memory, one row after another. (In Java, a two-dimensional array is an array of row arrays, so each row is stored linearly.)
For instance, for 2x2 arrays X and Y:
Array X in memory is stored as: X11 X12 X21 X22
Array Y in memory is stored as: Y11 Y12 Y21 Y22

2. From Computer Architecture: Processor Cache caches chunks of linear data from memory to Cache
When your CPU needs some data from memory, it first checks whether the cache has it. If the cache has the data for the expected memory address (called a Cache Hit), it immediately gives the data to the CPU and the CPU proceeds with its work. But if the cache has not cached that memory address, there is a Cache Miss, and the CPU then goes to main memory, fetches the block into the cache and proceeds with its execution.

A cache access takes a few nanoseconds, whereas accessing main memory (RAM) takes much longer in comparison.

How do these help in the slightly modified algorithm?
When the array is very large, only parts of it are cached. When there is a miss on some other part of the array, the chunk of the array starting at that address is cached linearly.

In matrix multiplication by the traditional logic, matrix X is cached linearly and accessed linearly, so Cache misses are few. But matrix Y is cached like Y11 Y12 Y21 Y22 and accessed like Y11 Y21 Y12 Y22.

For large data, like a 1000x1000 matrix, the number of cache misses for matrix Y will be huge. Every time, the CPU has to bring data from main memory, which is a comparatively time-expensive operation, and as said earlier, the cache works by caching a chunk of data in increasing linear order of memory addresses.

Hence, tweaking the logic a little bit to access matrix Y linearly reduces cache misses and improves the run time.

Download source code at:

This performance improvement is based on caching. It is specific to the traditional algorithm and holds for any programming language, whether Java, C or C++. Java developers just need to know additionally how Java arrays are handled by the JVM.

Thursday, April 4, 2013

Parallel make utility to leverage multi-core processors

By default the make utility, which is used for building native code, runs one job at a time. This means that even though the makefiles may contain different targets that could be compiled in parallel, make will go through them in a serial fashion.

But if you have a multi-core processor, you can use a handy switch in the make utility to build different targets in parallel.

To get the number of CPUs you have in your machine, use: grep 'processor.*:' /proc/cpuinfo | wc -l

Suppose you have two CPUs. You can tell make to use both CPUs at the same time and perform two different compilations in parallel. The make utility takes a parameter, -j, which specifies the number of parallel jobs (make threads) you need. You can say make -f makefile -j 2, which will perform compilations in two parallel make threads.

Now, to tie the number of make threads to the number of CPUs, we can use:

make -f makefile -j `grep 'processor.*:' /proc/cpuinfo | wc -l` -k buildall

This can utilize the full computation power of your machine and complete the compilation in less time.

All the separate make threads write their output messages to the same output stream, so in your terminal you will see jumbled messages from all the parallel make threads. This could be a problem for logs, but for developers who just want the compilation done fast and are not worried about the logs, this is cool.

Thursday, February 21, 2013

Controlling C# application from web application

In this small tutorial, we are going to create a simple C# Windows application with a web browser control, load a web page in it, and control the complete native application from the web application loaded into the web browser control.

This way, you can build your app entirely as a web app, in any of your favorite web programming/scripting languages like PHP, HTML and JavaScript, and control the native application with web code.

A demo app, the notifications app, is made with this concept. The app is open source at

I will be writing up the steps in this blog, but until then, I have made a small video tutorial explaining the steps. Please check that out.