If previously we had to deal with handwritten notes, this time we will be dealing with another type of note entirely.

I have always been fond of music and have considered it essential in life. Just like any other language, music can make you understand a message not only through its lyrics but also through its tone. Music is produced through the vibration of materials, and notes vary with the frequency of these vibrations.

With this concept, Scilab is able to play music for us! Shocking, right? It simply requires the frequency of each note you want to play as well as the duration of the note. Making use of the note() function, which asks for these two parameters, and placing all the notes inside a list []… the sound() function does the singing for you!

The Sound of Music:

Before anything else, a musical sheet is needed. It was instructed that a simple musical sheet is best used first so as to keep things simple. However, since most of my classmates are starting off with the most basic “Twinkle Twinkle Little Star”, I instead opted to use one of my favorite songs by the Beatles 🙂

All My Loving – Beatles

I’ve divided this musical sheet into three rows. Each row will be image processed separately so that I’d end up determining the corresponding notes from the musical sheet. Calling these rows in Scilab, the staff lines can be easily removed by making use of the im2bw() function, where the threshold value is set to the point where we’d only end up with something like this:

Notice that I have cropped the image so that the leftmost portion of the musical sheet is removed, as this is not essential for our case. A cropping function is available in Scilab’s SIVP toolbox: imcrop(I, [x y W H]), where the x and y values are the coordinates of the point where you’d like to start cropping and W, H the width and height of the portion you wish to keep.
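As a rough sketch of these first steps — the filename, threshold, and crop rectangle below are placeholders to be tuned by eye, not the actual values I used — the SIVP calls would look something like:

```
// Sketch: read one row of the sheet, binarize, and crop the clef area.
// 'row1.png', the 0.8 threshold, and the 60-px margin are all placeholders.
I = imread('row1.png');
Igray = rgb2gray(I);                // collapse the scanned RGB image to grayscale
bw = im2bw(Igray, 0.8);             // threshold chosen so the thin staff lines drop out
bw = imcrop(bw, [60 1 size(bw,2)-60 size(bw,1)]);  // discard the leftmost 60 px
```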

Now that we are more or less left with the marks that we want… we only have to deal with a few more. I tried using morphological functions, however I would always end up erasing some essential portions of my notes. The only solution I could come up with was to mask these unwanted portions. I eventually resorted to this method as my chosen musical piece is a bit complicated. Had I chosen a simpler piece, I’m pretty sure I’d have instantly gotten a clean image after just a few steps :).

Anyway, the masking procedure is done by creating masks first. These masks are black-on-white patterns where the black portions stand for the locations of the pixels you no longer wish to appear on the image. Using the image above as our reference, the corresponding masks I have made are as follows:

Calling these in Scilab and multiplying them with our image so far.. the unwanted points will be multiplied by zero, eventually leaving us with only the points we desire.
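For binary images, the multiplication described above amounts to a logical AND, which also sidesteps boolean-times-number issues in Scilab. A minimal sketch, with 'mask1.png' a placeholder filename for the hand-drawn mask:

```
// Sketch: erase unwanted marks with a hand-drawn black-on-white mask.
// 'mask1.png' is a placeholder; bw is the binarized row from the earlier step.
mask = im2bw(imread('mask1.png'), 0.5);  // black patches (false) mark pixels to erase
clean = bw & mask;                       // keep only pixels that are white in both
```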

Now at this point, we only have the most essential portions of our musical sheet. We can clearly distinguish which notes are which. From the first row, the nine quarter notes are seen as solid white blobs whereas the two half notes appear as incomplete white blobs. The same goes for the second row, only this time there is only one half note and twelve quarter notes. Lastly, the last row shows seven white blobs which stand for our quarter notes, while the single whole note looks like two smaller blobs a significant distance apart. Still, this is different from our half notes, whose two blobs are much closer together. This is enough to distinguish one note from another and hence gives way for us to move on to the next portion of our activity.

The code that I used in identifying the respective type of note (duration) of each blob is available below:

Code I used for rows 1 and 2. For row 3, t3 is simply replaced as we are now dealing with whole notes instead.

This makes use of functions from the SIP toolbox (as I have yet to figure out how the blob functions in the IPD toolbox work). From our final processed images above, we run this through the bwlabel() function. This function labels the ‘connected’ blobs, separating each and every one of them. The L output is actually the matrix of our image wherein the blobs are relabeled (all pixels of the first blob will have values equal to 1, the next blob equal to 2, etc.). The n output of the bwlabel function, on the other hand, gives the total number of blobs that bwlabel distinguished. Thanks to a site, I figured out that the bwlabel function in SIP actually scans the image for blobs from top to bottom. For our case, we hope to scan it from left to right instead. Hence, I had to use the transpose technique seen in my code. The rest of the code basically states that if the area of a blob falls within a specific range, it is assigned a corresponding note type (or duration). I have done this as the difference in the areas of the blobs is obviously significant for our different note types.
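The labeling-and-classification idea can be sketched as below. The area cutoff of 60 pixels is illustrative only; the real cutoffs come from inspecting the blobs of your own image.

```
// Sketch (SIP): label blobs left-to-right, then classify each by pixel area.
[L, n] = bwlabel(bw');              // transpose first: bwlabel scans top-to-bottom,
                                    // so scanning the transpose orders blobs left-to-right
beats = zeros(1, n);
for i = 1:n
    area = length(find(L == i));    // pixel count of the i-th blob
    if area > 60 then               // solid head: quarter note (1 beat)
        beats(i) = 1;
    else                            // hollow head: half note (2 beats)
        beats(i) = 2;
    end
end
```

The beat counts then scale into note durations in seconds once a tempo is chosen.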

Determining which frequency each of these notes has is the next objective in finally making Scilab sing my song. From the results that we have, I basically had to do morphological operations to end up with only a single pixel as a result for each note. With this, the code I used for localizing is available below:

What the code basically does is find the location of all the single points we have. I have also found out that a matrix in Scilab is indexed in this manner:

hence.. I made use of the modulo function, which outputs the remainder of a number when divided by another number. The divisor I used is the height of the image we are currently dealing with, while the dividend is the index from the find function. Through this method, I get the location of the specific note with respect to the vertical axis. This enables me to set ranges indicating certain distances from the top.
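The localization trick can be sketched as follows, where pts stands for the image reduced to one white pixel per note; the pitch mapping in the comment is illustrative only.

```
// Sketch: recover each note's vertical position from column-major indices.
[h, w] = size(pts);              // pts: image with one white pixel per note
idx = find(pts);                 // linear indices, counted down each column
rows = modulo(idx - 1, h) + 1;   // remainder against the height gives the row
// Each row range then maps to a pitch, e.g. (values illustrative):
// a note sitting in the bottom staff line's range could map to 329.63 Hz (E4).
```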

With all these.. I have compiled the results from the three rows, giving rise to the music below:

 

Take note that the singing in Scilab makes use of the following code as provided by Mam Jing Soriano:

function n = note(f, t)
// f: frequency in Hz, t: duration in seconds;
// soundsec(t) expands the scalar duration into a time vector sampled at 8192 Hz
n = sin(2*%pi*f*soundsec(t));
endfunction

S = [note(frequency, duration), …];

sound(S);

I know it didn’t exactly sound like the original, but blame it on the fact that this musical sheet required the use of complex symbols. Also, rests in between certain sets of notes are required as well for our case to ultimately duplicate the music of the Beatles. Nevertheless, if this were done on a much simpler musical piece, there is no doubt my code would be able to sing it properly 😀

For this activity… even though I was unable to really replicate the sound of the Beatles’ All My Loving.. I would still like to give myself a grade of 10. This is based on the fact that I was able to accomplish the objective of this activity, which is to find a way for Scilab to read and sing musical sheets. I made use of different techniques from different Scilab toolboxes, and through these I managed to learn a few more things such as the bwlabel function. Also, I was pretty excited by the fact that I was able to localize the points of the notes through only the use of the find and modulo functions! So I think with all that I did, I deserve this grade 🙂

From all that we’ve learned so far, we are now to try and test our CSI/Detective/Stalking skills. The next three activities require the use of the different techniques learned, especially the morphological operations, to manipulate images in Scilab.

In preparation for the said activities, I had to ensure that different toolboxes in my Scilab are functioning well:

  1. SIVP in Scilab 5.3.3 – the installation process of this toolbox is very straightforward and has been discussed in the earlier chapters.
  2. SIP in Scilab 4.1.2 – with the SIP toolbox being a bit moody back in my Ubuntu operating system (as discussed earlier), I was forced to install a lower version of Scilab on my Windows OS and eventually put SIP in there. The installation of this toolbox is straightforward, HOWEVER upon choosing the SIP toolbox… you’d get loads of errors!! Thanks to Ms. Charm Gawaran’s hint… this can be easily resolved by typing in the code:

    chdir(ImageMagickPath);
    link('CORE_RL_magick_.dll');

    into the console; clicking the SIP toolbox again will then finally load SIP properly 😀

  3. IPD in Scilab 5.3.3 – As suggested by Mam Jing Soriano, we can download the IPD toolbox here. Unzipping the file and placing the folder inside Scilab 5.3.3 –> contrib, the installation can only be completed upon typing the following command in the console:

atomsInstall('IPD')

This toolbox will prove its worth as there are a lot of morphological operations (and other image processing functions) available here. In fact, this was the toolbox I used most for this activity.

So now that we are fully equipped with all the functions that we need, let us proceed to our mission for this episode 🙂

Stealing Signatures:

I know it’s really a bad idea to try and steal someone’s signature but… I find this title fitting for this activity. Why? Well.. we are given a scanned document, as seen below this paragraph. Making use of only Scilab, we have to figure out a way to retrieve the handwritten words from a portion of this document and clean them up, ending with a clear, binary image of our chosen words.

However, it was indicated that the use of other image editing programs (such as Photoshop or GIMP) is allowed only for cropping your desired portion as well as re-aligning the image so that the lines coming from the table are not tilted. The alignment of these lines will play an important role later on…

Taking notice that this image is actually a truecolor type of image, we expect to have RGB layers upon calling it in Scilab. Instead of immediately converting the image to grayscale or black and white, I looked at the different layers of the RGB image and compared them to one another first. This was done because it was noticeable that the handwritten words are not exactly colored black but rather blue. I was hoping that upon looking at the “blue” layer, the handwriting would have the highest intensity and hence could be easily separated from the rest of the image. However, the images below seem to tell another story…

Since what I was hoping to find was not apparent (and in fact the blue layer had the smallest intensity among the RGB layers!), I just had to make the most out of my effort. From here, I picked the layer which gave the highest contrast. This was done so that I could easily eliminate the noise coming from the paper (its texture or small particles), hence leaving us with only the handwritten words and the horizontal lines. I compared the red layer to the binarized version of the original image as well as its grayscale to see which image produced the highest contrast. It was still the red layer that gave the image with the highest contrast, so this was used for the next steps of my image manipulation.

The next thing I did was to try and remove the horizontal lines from our image. This can be done by investigating the Fourier transform of the image. From the previous activities, repetitive patterns appear as distinct patterns in Fourier space. Through this, we can easily determine which points in Fourier space are the ones responsible for the horizontal lines. These can be removed by making use of a mask that removes the said patterns from the Fourier transform of our original image. The results are presented below:

The code used is available below where I is the matrix of the red layer of our original RGB image and F the filter used to remove the unwanted patterns off the Fourier transform.

In addition to the FFT of the image, I had to properly scale the resulting inverse Fourier transform through the code seen in lines 25-26. The intensity of the resulting image was easily varied by simply multiplying it by a constant (which for my case was 3) so that I’d end up with the contrast that I wanted. With Mam Soriano’s reminder that the image should be white on black, we “invert” the image by using the imcomplement() function. And finally, we binarize the resulting image with the threshold that best fits our taste (0.55 for my case).
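The whole filtering pipeline can be sketched as below, where I is the red layer and F the hand-drawn frequency mask (white everywhere except over the line peaks). The exact FFT function names available depend on the Scilab version and toolbox, so treat this as a sketch rather than my exact code:

```
// Sketch of the Fourier filtering pipeline described above.
FI = fft2(double(I));                           // Fourier transform of the red layer
filtered = abs(ifft(fftshift(F) .* FI));        // mask out the line frequencies
filtered = min(3 * filtered / max(filtered), 1); // rescale, boost contrast (x3), clamp
result = im2bw(imcomplement(filtered), 0.55);   // invert to white-on-black, binarize
```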

What’s our end-product so far?

From the looks of it, the image no longer has the horizontal parallel lines coming from the table. However, due to their removal… some points that belonged to the words were removed as well. Also, there is still noise visible. The only solution to this is to make use of different morphological operations :).

Besides the typical erode and dilate operations we learned from the previous activity… the Scilab IPD toolbox presents other morphological functions such as close and open (which are basically combinations of dilate and erode). I have also learned from this portion of the activity the importance of the structuring elements we make use of. Thankfully, the IPD toolbox has the CreateStructureElement() function, which asks for two inputs. The first input is the type of structuring element you wish to have. There are built-in codes for the typical shapes such as square, circle, rectangle, etc. If, on the other hand, you wish to make use of your own structuring element, the term ‘custom’ is for you. The second input asks for the size of the structuring element you wish to have. If you make use of ‘custom’, the second input instead takes the matrix of your desired structuring element containing the boolean terms %T and %F (instead of the usual 1’s and 0’s).
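Both ways of building a structuring element look like this (the sizes are just examples):

```
// Sketch (IPD): built-in vs. custom structuring elements.
se1 = CreateStructureElement('square', 3);               // built-in 3x3 square
se2 = CreateStructureElement('custom', [%F %T; %T %F]);  // custom boolean pattern
```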

As much as I’d like to show the different structuring elements that I have used, the imshow function does not work on the output of the CreateStructureElement() function. Hence I will just list down the code and leave the rest to your imagination 🙂

A. Eroded the current image with a rectangular structuring element of dimension [1, 3].

B. Dilated image (A) with a rectangular structuring element of dimension [2, 1].

C. Dilated image (B) with a custom structuring element following the matrix [%F %T; %T %F].

D. Dilated image (C) with a rectangular structuring element of dimension [3, 1].
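Steps A-D can be sketched with the IPD erode/dilate functions as below; img stands for the binarized, line-free image from the previous section, and I build the rectangles as ‘custom’ boolean matrices so the shapes are explicit:

```
// Sketch of steps A-D using IPD morphological functions.
A = ErodeImage(img, CreateStructureElement('custom', [%T %T %T]));     // 1x3 rectangle
B = DilateImage(A,  CreateStructureElement('custom', [%T; %T]));       // 2x1 rectangle
C = DilateImage(B,  CreateStructureElement('custom', [%F %T; %T %F])); // diagonal pair
D = DilateImage(C,  CreateStructureElement('custom', [%T; %T; %T]));   // 3x1 rectangle
```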

What exactly was I doing? Well.. I wanted to find a way for the gaps left by the removal of the horizontal lines to be closed back up through morphological functions. I had to constantly dilate the image in a way that closes the gaps BUT keeps the letters of the note distinguishable. This was the best combination I could get… and sadly it is not looking good so far. I could’ve stopped at point B or C, however the gap still exists! In fact, A is good enough for me already, but the gap was very, very persistent.

Wanting to end up with only single-pixel-thick letters, I had to shift to my Scilab 4.1.2 to make use of the thin() and skel() morphological functions available in the SIP toolbox. The results are as follows:

The skel function gave the outline of the letters, some of which I think are readable (such as CABLE). On the other hand, the thin function takes the central points of your shapes. Due to the intense dilation from my previous steps, I ended up with different squiggly lines that looked as if certain words were connected to one another. The word “CABLE” on the lower right has only the letters C and A readable.

I really tried my best in finding the right combinations, but I somehow still ended up with unreadable images! The best image that I got came from the images with the gaps… In fact, I tried thinning point A, as I find this image readable, and ended up with:

where I think the word CABLE as well as the B from “USB” is readable. I tried to close the gaps with another set of morphological functions but ended up ruining the image. This, then, is the best image I can produce from post-processing…

Personal Note:

Hence, with my resulting image a bit tacky and not all the words readable… I would give myself a grade of only 8. Though I think I should give props to myself, as the portion I chose to post-process was actually one of the hardest: it dealt with handwritten cursive letters and was entirely embedded with horizontal lines. But still, I was pretty disappointed with my results. Every morphological operation I did has its advantages as well as its disadvantages. I tried to cleverly think of new structuring elements but always ended up messing up the image even more. 😦


In preparation for an extreme ultra-CSI episode that’ll be up next after this activity, we have to familiarize ourselves with another technique in image processing that will be handy in future use.

The word morphology already gives you a faint idea that it deals with shapes and structures. Morphological operations are commonly used on images to enhance, isolate, and improve their details. However, these operations are carried out only on binary images whose elements lie in a 2-dimensional space.

A quick review of set theory is needed, as it serves as the main set of rules for our operations. The image below will serve as our reference point. The following elements can be seen:

Set A – pink box, Set B – purple box, C – region within the blue-green outline, Set D – yellow box, and the elements butterfly and running man.

  • The butterfly is an element of Set A whereas the running man is not.
  • The complement of D is everything outside it (Set A, the butterfly, the running man, and the whole region of B that is not within D).
  • Set D is a subset of Set B.
  • C is the intersection of Sets A and B.
  • The union of Set A and B is the whole pink and purple regions together.
  • The difference of Sets A and B is the whole pink region excluding those within the blue-green outline.

In addition, translation is simply the shifting of the element to another point.

For the morphological operations to work, two images or sets are required. The main pattern serves as the image whose size we hope to increase or decrease. The structuring element, on the other hand, serves as our basis for how much and in what way the original pattern will change.

Dilation – Super Size Me!

Your structuring element has a set origin (this is where we place an element when certain conditions are met). The structuring element is translated to every point of the main pattern; at the points where at least a single element of your structuring element intersects your main pattern, the origin becomes an element of the dilated set.

A larger structuring element will then result in a larger output. If a single pixel is used as your structuring element, we end up with the exact same image as our main pattern.

Erode – The Biggest Loser!

As for the erode operation, the resulting image will be smaller compared to the original image. This is because this operation considers only the points at which your structuring element fits entirely inside your main pattern.

——-

To start off with the activity, we first have to determine the images we want to apply the operations to. As indicated earlier, there should be two inputs involved: the main pattern, which we will be eroding/dilating, and the structuring element, which will be the basis for how much the original pattern will be eroded/dilated.

For the main pattern, four different shapes were used as drawn below:

(L-R): A 5×5 square, a triangle with a 4-unit base and 3-unit height, a cross with strips 5 units long, and lastly a hollow 10×10 square whose border is 2 units wide.

The structuring elements on the other hand are as shown below:

(L-R): A 2×2 square, a 1×2 rectangle, a 2×1 rectangle, and 1’s along the diagonal of a 2×2 matrix

Going old-school:

Before taking advantage of the gifts of technology, we are to test whether we have really understood the concepts behind the dilate and erode morphological operations by going old-school. Through the use of graphing paper and a marker, the resulting shape as each main pattern is eroded or dilated with a chosen structuring element is predicted and drawn.

Starting with erode, we expect the shapes to be smaller compared to the original pattern. The matrix of images below shows that I managed to get shapes that are indeed smaller. In fact, there are instances wherein the original pattern is totally removed! This is expected for certain shapes, as the erode operation deals with points at which your structuring element fits perfectly inside the main pattern. Take for example the case of the cross main pattern, where the 2×2 box structuring element is impossible to embed as the width of the cross is only 1 unit! Hence, the resulting eroded shape is nothing.

Erode Matrix of Images (Theoretical)

In contrast, dilating the shapes results in relatively larger patterns, which I managed to observe in my predictions. This is because the dilate operation considers the points at which the structuring element still overlaps the main pattern, even if the overlap is only a single pixel.

Dilate Matrix of Images (Theoretical)

Consistency please?

Knowing what to expect upon dilating and eroding all our patterns with all our structuring elements, it’s time to find out if my predictions are correct. The operations can be carried out in Scilab, as readily available functions for morphological operations are present in the SIP and IPD toolboxes.

With the sudden death of my Ubuntu system, I finally accepted defeat and opted to download Scilab 4.1.2. This version of Scilab matches the last version of the SIP toolbox despite having lots of flaws, such as being unable to run .sce files, so I usually end up typing my code in Notepad and pasting it into the prompt.

Through the use of Photoshop, I recreated the main patterns as well as the structuring elements as observed below:

Main Patterns

Structuring elements

The important thing to take note of in creating these patterns is the consistency of the scale of your units. Initially, I downloaded a graphing-paper page from the internet and used this as the basis in creating my patterns. However, upon eroding and dilating them, the results produced images that did not seem to fit my units! I had to redo my patterns using 3×3 pixels as a single unit.

Now, after calling all our patterns into Scilab through the imread() function, it is best to ensure that the matrix of each pattern is strictly binary and not grayscale (after all, we are talking about binary operations!). This can easily be done with the im2bw() function, with my threshold point set at 0.7.

The erode() function calls for two inputs: the matrix of the image you wish to erode and the structuring element. Using this on all our patterns gives rise to the matrix of images below:
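For a single pattern and element, the calls are as simple as the sketch below (the filename is illustrative):

```
// Sketch (SIP): erode and dilate one pattern with a 2x2 square element.
pat = im2bw(imread('square5.png'), 0.7);  // the 5x5 square pattern
se  = ones(2, 2);                         // 2x2 square structuring element
er  = erode(pat, se);                     // shrinks the pattern
di  = dilate(pat, se);                    // grows the pattern
```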

Eroded Images through Scilab

Comparing this matrix to my matrix of old-school eroded patterns, I am glad to say that they are exactly the same! The only difference between the two is their scaling.

Similarly, the matrix of images for the dilated patterns shows that the digitally dilated images and my predictions are the same.

Dilated images through Scilab

Personal Note:

Despite getting the right results from my predictions, it took quite a while before I managed to fully understand and master the art of eroding and dilating. I had the hardest time taking to heart the process of dilation, and it was all thanks to Mr. Xavier Tarcelo that I finally figured it out, through the use of an acetate sheet which I had to move box by box on my graphing paper. For this activity, I would like to give myself a grade of 10/10, as I have managed to accomplish all the requirements of the activity as well as understand the techniques and concepts.

 

References:

1. Soriano, Jing. Applied Physics 186 Activity 7 – Morphological Operations. NIP, UP Diliman, 2012.