How to Create a Home Automation App for Clap Detection With Node.js and Raspberry Pi

Not long ago, I described how you can manage your home and home appliances with a smartphone and a Raspberry Pi. Check out Building Home Automation Open-Source App With React Native, Node, Express and Raspberry Pi.

After that, I started looking for other ways to interact with a home. I searched the Internet and came across a few ideas. Here are just some of them:

  • Smartphone and similar gadgets
  • Camera
  • Voice sensor
  • Voice activation
  • Motion sensors

I was curious what kind of home management program I could develop with Node.js in, say, a weekend. I settled on the voice sensor, so that I could switch a light on and off with a simple clap. I used ordinary wireless headphones with a microphone, connected them to the Raspberry Pi, and developed a short algorithm that responds to clapping. This is what I got:

clap recording with nodejs and raspberry PI

I used a diode for this because I didn't have a relay on hand. I described how you can connect your Raspberry Pi to 220 V in the previous article.

Sound is a vibration of the air. A clap disturbs the air sharply, which in turn applies pressure to the microphone's membrane. This pressure is captured as an analog signal and passed to the sound card, which turns it into a digital one. This is what a recording of a clap looks like:

clap recording

Above you can see the air fluctuation (the so-called sound amplitude) caused by clapping. The louder the sound, the higher the amplitude.

The plot shows the amplitude (A) changing between -1 and +1 over time (t). In general, your device perceives a sound as a large array of numbers from -1 to +1, where each number is the value of the sound amplitude at a certain moment in time. There are more than 10,000 such moments per second (depending on the recording quality).
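To make the "array of numbers" idea concrete, here is a minimal sketch that turns raw audio bytes into amplitudes between -1 and +1. It assumes 16-bit signed little-endian samples; the function name `pcm16ToFloats` is illustrative and does not come from any library.

```javascript
// Convert raw 16-bit signed little-endian PCM bytes into amplitudes.
// Assumption: 16-bit little-endian samples. With more than one channel,
// the samples of the channels are interleaved in the same array.
function pcm16ToFloats(buffer) {
  const sampleCount = Math.floor(buffer.length / 2);
  const samples = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    // readInt16LE returns -32768..32767; dividing by 32768 maps it to -1..+1.
    samples[i] = buffer.readInt16LE(i * 2) / 32768;
  }
  return samples;
}

// Example: silence, half of full scale, negative full scale.
const raw = Buffer.alloc(6);
raw.writeInt16LE(0, 0);
raw.writeInt16LE(16384, 2);
raw.writeInt16LE(-32768, 4);
console.log(Array.from(pcm16ToFloats(raw))); // [ 0, 0.5, -1 ]
```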

How to detect a clap

To be honest, it is simple enough. All you need is an amplitude threshold and a time limit. If the sound amplitude exceeds the threshold, but for no longer than the time limit, a clap took place. Both values can be determined experimentally.

clap recording values

In this image, t is the minimum clap duration and T is the threshold for the maximum amplitude.
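The rule can be sketched as a small function. This is only one possible reading of it: the sound must cross the threshold, and the loud part must not last longer than a chosen limit. The names `isClap`, `threshold` and `maxLoudMs` are hypothetical, and the values you pass in have to be found by experiment.

```javascript
// samples: amplitudes in [-1, 1]; sampleRate: samples per second.
// threshold and maxLoudMs are hypothetical tuning values.
function isClap(samples, sampleRate, threshold, maxLoudMs) {
  let first = -1;
  let last = -1;
  for (let i = 0; i < samples.length; i++) {
    if (Math.abs(samples[i]) >= threshold) {
      if (first === -1) first = i; // first loud sample
      last = i;                    // most recent loud sample
    }
  }
  if (first === -1) return false; // the sound never got loud enough

  // Time between the first and the last loud sample, in milliseconds.
  const loudMs = ((last - first + 1) / sampleRate) * 1000;
  return loudMs <= maxLoudMs;
}
```

With this reading, a short burst above the threshold counts as a clap, while a long loud sound such as music does not.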

First of all, you have to make your program listen to the microphone. SoX (Sound eXchange), a command-line audio utility, is a good choice for this purpose. You can use the mic package to simplify the setup. It is a simple package with detailed, easy-to-follow documentation. The simplest recording code only needs to create a mic instance, subscribe to its audio stream, and start it.
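The listening part can be sketched like this. The function works with any Node.js readable stream, so it can be tried without a microphone; the commented lines show how the stream would be obtained from the mic package. The function name `listen`, the handler name `handleChunk` and the `rate` value are assumptions for illustration, while the two channels and the plughw:0 device are the settings used in this article.

```javascript
// Attach a chunk handler to an audio stream.
// With the mic package, the stream would come from something like:
//   const mic = require('mic');
//   const micInstance = mic({ rate: '16000', channels: '2', device: 'plughw:0' });
//   listen(micInstance.getAudioStream(), handleChunk); // handleChunk: your own function
//   micInstance.start();
// (rate is an assumed value; pick the one that suits your microphone.)
function listen(audioStream, onChunk) {
  // Each 'data' event carries a Buffer with raw audio bytes.
  audioStream.on('data', (chunk) => onChunk(chunk));
  audioStream.on('error', (err) => console.error('Microphone error:', err));
  return audioStream;
}
```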

As you may notice, the library's documentation shows how to save a recording to a file. We don't use any files here, as we only need to listen to the microphone without saving anything. The settings used here mean that a two-channel recording is made on the device plughw:0. You can list the available devices with the arecord -l command.

One more important point: your microphone's sensitivity is controlled by your operating system. You can adjust it in the sound settings of your operating system. On a Raspberry Pi, however, you often have to work from the console, where you can change the microphone's sensitivity with the alsamixer command. Check it out in more detail here.

Then you have to turn this data into an array of numbers (the amplitude values at given moments in time) so that you can compare the amplitude with a threshold. You can do this with the wav-decoder package.

But here is the tricky bit. This package was not designed to work with streams, only with WAV files. So you can't decode the buffer with it directly, because mic records data in raw format. To turn raw data into WAV you need a header. The header holds the information about the type and additional settings of the recording, the same settings we passed to the mic library. The waveheader package is exactly what we need. In our on data handler, we prepend this header to each raw chunk and then decode the result.
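To show what such a header contains, here is a sketch that builds the standard 44-byte PCM WAV header by hand instead of using the waveheader package. The function names and the default format values are assumptions for illustration; the format must match whatever was passed to the recorder.

```javascript
// Build a 44-byte header for uncompressed (PCM) WAV data.
// Defaults are assumed values; they must match the recording settings.
function buildWavHeader(dataLength, { sampleRate = 16000, channels = 2, bitDepth = 16 } = {}) {
  const blockAlign = channels * (bitDepth / 8); // bytes per sample frame
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + dataLength, 4);        // file size minus 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);                    // size of the fmt chunk
  header.writeUInt16LE(1, 20);                     // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // bytes per second
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitDepth, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(dataLength, 40);            // size of the audio data
  return header;
}

// Prepend the header so a raw chunk looks like a complete WAV file.
function rawToWav(rawChunk, format) {
  return Buffer.concat([buildWavHeader(rawChunk.length, format), rawChunk]);
}
```

The buffer returned by `rawToWav` can then be handed to a WAV decoder in place of a file read from disk.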

Now you can work with the sound wave in digital form. You can find the maximum wave value and compare it to the threshold. You can determine the threshold experimentally. I found that a threshold of 0.7 (70% of the microphone's maximum amplitude) works best, so a peak above 0.7 counts as loud.
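The comparison itself takes only a few lines. This sketch assumes the samples are already decoded into numbers between -1 and +1; `peakAmplitude` and `isLoud` are illustrative names, and 0.7 is the threshold mentioned above.

```javascript
// Largest absolute amplitude in a block of decoded samples.
function peakAmplitude(samples) {
  let peak = 0;
  for (const sample of samples) {
    const magnitude = Math.abs(sample); // negative swings count as well
    if (magnitude > peak) peak = magnitude;
  }
  return peak;
}

// Threshold found by experiment: 70% of the maximum amplitude.
const THRESHOLD = 0.7;

function isLoud(samples, threshold = THRESHOLD) {
  return peakAmplitude(samples) > threshold;
}

console.log(isLoud([0.1, -0.85, 0.2])); // true
console.log(isLoud([0.1, -0.3, 0.2]));  // false
```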

Now we need to handle the time interval. The on data handler fires approximately every 90 ms. However, this interval is not long enough for a clap. I chose 500 ms instead. That's why I only start processing the data, and checking whether the signal exceeded the threshold, after this much time has passed.

You have to save the data buffer for more precise decoding into numbers. It will come in handy for more complex sound-processing algorithms. However, you can also decode each chunk right away and just save the array of decoded numbers. With this in place, you can program the logic that runs on a clap. All you have to do is control one of the Raspberry Pi outputs connected to the power supply in your home. I described how to do it in the previous article.
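Collecting the short chunks into a longer window can be sketched as follows. Note one deliberate difference from the description above: instead of measuring wall-clock time, this sketch counts bytes, which gives the same window length as long as the recording format is known. The name `createWindowCollector` and its parameters are hypothetical.

```javascript
// Collect incoming chunks until they cover windowMs of audio,
// then pass the combined buffer to onWindow and start over.
// bytesPerSecond = sampleRate * channels * bytesPerSample.
function createWindowCollector(windowMs, bytesPerSecond, onWindow) {
  const windowBytes = Math.floor((windowMs / 1000) * bytesPerSecond);
  let chunks = [];
  let collected = 0;

  return function addChunk(chunk) {
    chunks.push(chunk);
    collected += chunk.length;
    if (collected >= windowBytes) {
      onWindow(Buffer.concat(chunks, collected));
      chunks = [];
      collected = 0;
    }
  };
}
```

For example, `createWindowCollector(500, 16000 * 2 * 2, processWindow)` returns a function you can pass each raw chunk to; `processWindow` (your own function) is then called with roughly 500 ms of 16-bit, two-channel audio at an assumed rate of 16,000 samples per second.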

Possible problems and ways to solve them

The algorithm I have described is just an idea you can turn into a solution. You can improve and enrich it depending on your needs. To make things easier for you, I have described the main challenges you may encounter and ways to deal with them.

  • Noise

Noise can be a real obstacle to the algorithm working properly. If you play music too loud, the algorithm may not work as desired, because the sound amplitude will exceed the threshold. To avoid this, you can take into account how long the amplitude stays at its highest: there should be silence before and after the clap. Here is how it looks schematically:

clap recording silence

During the period t, the maximum amplitude should be higher than T, and during the periods t-1 and t+1 it should be lower.
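This check can be sketched on a list of per-period peak amplitudes, where each entry is the maximum amplitude of one period. The function name `findIsolatedClaps` is illustrative, and how you split the recording into periods is up to you.

```javascript
// peaks: maximum amplitude of each consecutive period.
// Returns the indexes of loud periods that have a quiet period
// directly before and directly after them.
function findIsolatedClaps(peaks, threshold) {
  const claps = [];
  for (let i = 1; i < peaks.length - 1; i++) {
    const loudNow = peaks[i] > threshold;
    const quietBefore = peaks[i - 1] < threshold;
    const quietAfter = peaks[i + 1] < threshold;
    if (loudNow && quietBefore && quietAfter) claps.push(i);
  }
  return claps;
}

// Period 1 is an isolated clap; periods 3 and 4 are continuous noise.
console.log(findIsolatedClaps([0.1, 0.9, 0.1, 0.8, 0.9, 0.2], 0.7)); // [ 1 ]
```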

  • Double clap

You can also write an algorithm for double-clap detection. Here is how I implemented it. I made several recordings and found that one clap lasts approximately 100 ms, with approximately the same period of silence between claps. In addition, there should in theory also be silence before and after the series of claps. Here is how it looks schematically:

double clap

So, look for periods of clapping, during which the maximum amplitude is higher than the limit, separated by gaps between the claps, where the maximum amplitude is lower than the limit.
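A minimal sketch of this pattern, assuming the recording has already been split into periods of about 100 ms and reduced to one peak amplitude per period. It looks for the sequence quiet, loud, quiet, loud, quiet. The name `findDoubleClaps` is hypothetical, and a real implementation would also have to deal with claps that straddle two periods.

```javascript
// peaks: maximum amplitude of each consecutive ~100 ms period.
// Returns the index of the first clap of every double clap found.
function findDoubleClaps(peaks, threshold) {
  // true = loud period (clap), false = quiet period (silence).
  const pattern = [false, true, false, true, false];
  const starts = [];
  for (let i = 0; i + pattern.length <= peaks.length; i++) {
    const matches = pattern.every(
      (shouldBeLoud, j) => (peaks[i + j] > threshold) === shouldBeLoud
    );
    if (matches) starts.push(i + 1); // i + 1 is the first loud period
  }
  return starts;
}

console.log(findDoubleClaps([0.1, 0.9, 0.2, 0.8, 0.1], 0.7)); // [ 1 ]
console.log(findDoubleClaps([0.1, 0.9, 0.1, 0.1, 0.1], 0.7)); // []
```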

If you work with more complex sound algorithms, you can use spectrum analysis and the Fourier transform. This algorithm shows at which frequency the sound signal has the highest value. I found several implementations in the npm ecosystem. Personally, I used the frequencyjs library.
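To illustrate the idea without any library, here is a sketch of a naive discrete Fourier transform that returns the frequency with the largest magnitude. It is not the frequencyjs API, and it is far slower than a real FFT, so treat it only as a demonstration on short blocks of samples. The function name `dominantFrequency` is illustrative.

```javascript
// Naive DFT: returns the frequency (in Hz) with the largest magnitude.
function dominantFrequency(samples, sampleRate) {
  const n = samples.length;
  let bestBin = 0;
  let bestMagnitude = 0;
  // For real input only bins up to n/2 carry distinct information;
  // bin 0 (the constant offset) is skipped.
  for (let k = 1; k <= Math.floor(n / 2); k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += samples[t] * Math.cos(angle);
      im -= samples[t] * Math.sin(angle);
    }
    const magnitude = Math.hypot(re, im);
    if (magnitude > bestMagnitude) {
      bestMagnitude = magnitude;
      bestBin = k;
    }
  }
  return (bestBin * sampleRate) / n; // convert bin index to Hz
}

// A 1000 Hz sine wave sampled at 8000 Hz, 64 samples long.
const sine = Array.from({ length: 64 }, (_, t) => Math.sin((2 * Math.PI * 1000 * t) / 8000));
console.log(dominantFrequency(sine, 8000)); // 1000
```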

Wrapping Up

The algorithm described in this article is simple and “rough”. My aim was to show you how many possibilities the available technologies give you. I implemented the algorithm for this article with Node.js, which demonstrates its capabilities and potential. If you have any questions or ideas, let me know in the comments below :)