background noise level ended up being somewhat higher than ideal. However, the fan proved
quite useful as yet another test source.
The delayed and summed waveform in this case shows a much more constant noise level than the
individual waveforms, lending weight to our contention that, since only a small part of the signal
is used for matching and since noise is, on the whole, relatively random, the noise adds both
constructively and destructively, making the overall noise level lower than one might expect.
Although the more regular signals typically at least double in overall magnitude, it is interesting
to note that, just as the previous paragraph suggests, the noise level here remains more or less the
same.
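To make the intuition concrete: if the aligned signal adds coherently across N microphones while the noise at each microphone is independent, the summed signal amplitude grows like N but the summed noise RMS grows only like sqrt(N). The following minimal Python sketch illustrates this; the array size, tone frequency, and noise level are made up for the demonstration and are not taken from our setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics = 8                    # hypothetical array size for this demo
t = np.arange(2000) / 8000.0  # 8 kHz, matching the report's sampling rate
signal = np.sin(2 * np.pi * 440 * t)

# Each "microphone" sees the same (already aligned) signal plus
# independent noise of standard deviation 0.5.
channels = [signal + 0.5 * rng.standard_normal(t.size) for _ in range(n_mics)]
summed = np.sum(channels, axis=0)

# Coherent signal amplitude grows by n_mics; incoherent noise RMS
# grows only by about sqrt(n_mics).
noise_only = summed - n_mics * signal
print("signal amplitude gain:", n_mics)             # 8
print("noise RMS gain:", np.std(noise_only) / 0.5)  # ~2.83
```

With eight channels the signal grows 8x while the noise RMS grows only about 2.8x, which matches the behavior described above.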
Also of note is the angle of the noise itself. As we might expect from a fan placed nearly directly
overhead, our computations do indeed yield a small theta (the fan was not exactly overhead, but
in the approximately zero direction) and a small phi, indicating that, despite the random nature of
the noise, our algorithm was still able to find enough regularity to deduce the position of the
"signal".
All the previous examples were run at an upsampling rate of 10x and a sampling frequency of 8
kHz. Both of these variables can be changed at will, however, and we also ran a few tests at other
levels of upsampling. We generally stayed at or around the same sampling frequency, since it was
initially chosen to cut down on the aliasing induced by the analog-to-digital conversion.
Figure 2.12.
An example of how our program ran with no upsampling.
As you can see, even with no upsampling there is still a significant improvement between the
initially input signals and the final, delayed and summed output. The less upsampling there is, the
fewer samples the program must iterate through; thus it should come as no surprise that the
program completed far faster with no upsampling than with 10x. However, when the signal is not
upsampled there is also much less data to work with, and the shift applied to compensate for
delay is more likely to differ significantly from the true delay in the analog setting.
Calculations are therefore prone to slightly more error, and as a result the final product may not
be quite as effective at noise suppression as its upsampled cousins -- although, as you can see
from the figure above, "not quite" still shows a significant change, even if the noise is noticeably
more spiky than in the previous results.
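The delay-quantization argument can be made concrete: with sampling rate fs and upsampling factor U, the finest shift the program can apply is 1/(fs*U) seconds, so the worst-case rounding error shrinks in proportion to U. A minimal Python sketch, using an arbitrary illustrative delay value:

```python
fs = 8000.0            # sampling rate used throughout the report
true_delay = 1.37e-4   # an arbitrary inter-microphone delay, in seconds

for upsample in (1, 10):
    step = 1.0 / (fs * upsample)                 # finest realizable shift
    quantized = round(true_delay / step) * step  # shift the program applies
    print(f"{upsample:2d}x: error = {abs(true_delay - quantized):.2e} s")
```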
2.5. Delay and Sum Beamforming with a 2D Array:
Conclusions*
Summary of Results
As can be seen in the screenshots in the results section, the setup and program work at least some
of the time. As noted in the introduction, the two-dimensional nature of the array introduces a
small ambiguity: the array and algorithm cannot, in theory, distinguish between a source above
the array and an identical source at the location obtained by reflecting the original source across
the plane of the array. In practice this should not matter if the array is mounted on a wall or on the
ground, but should the need arise, the problem can be eliminated completely by extending the
array to three dimensions.
As the arccosine function was used in the computation of one of the angles, accuracy is much
worse as the angle approaches ±π/2, where the slope of the arccosine goes to infinity: the
derivative of arccos(x) is -1/sqrt(1-x^2), which diverges as its argument approaches ±1, so small
errors in the measured delays produce large errors in the computed angle. The shortcuts we took
were probably detrimental to the accuracy of the project but, once again, they were necessary for
it to run at all on the hardware we had to work with. Scanning with only three microphones
instead of all eight does not fully utilize the noise-reducing capability of the delay and sum
technique, but it does conserve a great deal of computing power. The simplification of assuming
that the three scanning microphones could be broken up into pairs will sometimes yield "Not a
Number" as one of the angles, because the resulting delays are not physically realizable.
When it works, the simplification cuts the computation for that portion of the program from
complexity O(n^2) to O(n). The "safe" way to scan using the three microphones would be to
check each valid pair of delays, where valid means that the delays between the microphones are
physically realizable, and then take the pair corresponding to the maximum power. The "quick"
way we ended up using is to compute the delay for one pair, compute the delay for the other pair,
and then assume that the two delays together are physically realizable. This works if the
maximum power is located at one of the valid pairs of delays (as, in most cases where there is a
well defined signal, it will be) and fails if it is not, but in return it is far faster, much as the
reduction in computational complexity would suggest.
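The two scanning strategies can be sketched as follows. This is an illustrative Python reconstruction, not our LabVIEW code; pair_power, the candidate delay list, and the realizable predicate are hypothetical stand-ins.

```python
import numpy as np

def pair_power(x_ref, x_other, delay):
    """Power of the reference channel summed with a shifted copy."""
    return float(np.sum((x_ref + np.roll(x_other, delay)) ** 2))

def safe_scan(x0, x1, x2, delays, realizable):
    """O(n^2): jointly test every physically realizable pair of delays."""
    best, best_pair = -np.inf, None
    for d1 in delays:
        for d2 in delays:
            if not realizable(d1, d2):
                continue
            p = pair_power(x0, x1, d1) + pair_power(x0, x2, d2)
            if p > best:
                best, best_pair = p, (d1, d2)
    return best_pair

def quick_scan(x0, x1, x2, delays):
    """O(n): optimize each pair separately; the result may not be realizable."""
    d1 = max(delays, key=lambda d: pair_power(x0, x1, d))
    d2 = max(delays, key=lambda d: pair_power(x0, x2, d))
    return d1, d2
```

When the independently optimized delays do not form a realizable pair, the subsequent angle computation receives an impossible combination, which is where the "Not a Number" angles mentioned above come from.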
Limitations of Hardware and Computing Power
Any increase in computing power and hardware would be welcome in a computationally
intensive project such as this; however, the computers and hardware that were used were far from
top of the line, even five years ago. The DAQ card that was available for use is far enough out of
date that it is no longer supported by National Instruments; in fact, the current version of Labview
does not even recognize it.
As a result, in order to interface with the DAQ card that was available, the version of Labview
this project was implemented on was also greatly out of date -- to the point where the NI website
no longer provided support. A good indicator of the age of the hardware is the fact that the
computer used possessed a turbo button. The computers were probably very good when they were
first purchased, as the RAM size is actually 32 megabytes, but that was long enough ago that they
are no longer adequate for the level of processing we wished to do.
With more computing power and better hardware, it would be possible to increase the amount of
sampling (restricted here by a 64 kB buffer) and upsampling to improve accuracy and precision,
to take fewer shortcuts to the same effect, or perhaps to run in real time (a feat possible only if
the processing time is no longer than the time it takes to input a "chunk" of the signal).
Possible Extensions
One obvious extension to this project would be to go to a 3-dimensional array, eliminating the
ambiguity arising from reflection of signals across the plane of the array. Another possible
improvement would be to increase the processing speed to the point where processing in real time
could be a possibility. This project only dealt with calculations in far field, which seem to be less
computationally taxing than calculations in near field. Thus, another possible extension would be
to implement something that could accurately locate signals in near field. Another possible region
of interest might be to investigate beamforming when dealing with multiple signal sources.
Simple delay and sum is optimal (in ability to distinguish, if not in computational complexity) for
a single source and white noise, but we are unsure if it is optimal if some structure to the noise is
known. In the case of multiple signals, the other signals would be considered noise when focusing
in on one signal. It may be possible to focus on each of the signals and separate them. If there is
not a closed form for doing so, the separation could probably be done iteratively by some sort of
subtraction. A further extension of this would be to apply blind signal separation techniques to
distinguish signal sources that seem to overlap.
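To illustrate the iterative-subtraction idea, here is a purely speculative Python sketch, assuming the per-source delays have already been found by scanning:

```python
import numpy as np

def beamform(channels, delays):
    """Delay-and-sum estimate of one source, given its per-channel delays."""
    return np.mean([np.roll(ch, -d) for ch, d in zip(channels, delays)], axis=0)

def separate(channels, delay_sets, n_passes=2):
    """Estimate each source in turn and subtract it from every channel.

    channels: list of 1-D sample arrays; delay_sets: one integer-delay list
    per source. np.roll wraps around at the ends, which a real
    implementation would have to avoid.
    """
    residual = [np.asarray(ch, dtype=float).copy() for ch in channels]
    estimates = [np.zeros_like(residual[0]) for _ in delay_sets]
    for _ in range(n_passes):
        for k, delays in enumerate(delay_sets):
            update = beamform(residual, delays)
            estimates[k] += update
            for ch, d in zip(residual, delays):
                ch -= np.roll(update, d)  # remove this source's contribution
    return estimates
```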
2.6. Expressing Appreciation for the Assistance of
Others*
Dr. Don Johnson (aka DOJO)
For giving his time and expertise to enlighten a bunch of lost undergrads in the areas of array
processing and beamforming. Without his advice and his book, Array Signal Processing: Concepts
and Techniques, we might still be asking the question, "Beamforming? What's that?"
Dr. William Wilson (aka Dr. Bill)
For his continual interest in our project and his constant offers to assist us with whatever we
needed, for his time in helping us find the proper level of microphone amplification, for his
recommendations on the types of microphones to use, and for his donation of op-amps, coax
cable, and capacitors. We wish you the best in the future, whether at Rice or in Vermont!
Dr. Rich Baraniuk (aka RichB)
For his energetic and often wacky ways of conducting class and working with us and National
Instruments to obtain for us a DAQ card.
Dr. J.D. Wise
For laying out our options for obtaining samples from our microphones and using Labview.
Mr. Michael Dye
For granting us access to the undergraduate lab and always asking, "Is it working yet?"
Mr. Andy Deck and National Instruments
For his time and effort in locating a DAQ card for us and donating one for use in ours and future
ELEC 301 projects. We are grateful for your time and generosity.
Chapter 3. Seeing Using Sounds
3.1. Introduction and Background for Seeing with
Sound*
Introduction
Seeing with sound is our attempt to meaningfully transform an image into sound. The motivation
behind it is simple: to convey visual information to blind people through their sense of hearing.
We believe that, in time, the human brain can adapt to the sounds, making this a useful and
worthwhile system.
Background and Problems
In researching this project, we found one marketed product online, the vOICe, that did just what
we set out to do. However, we believe that the vOICe is not optimal, and we have a few
improvements in mind. One idea is to make the center of the image the focus of the final sound.
We feel that the center of an image contains the most important information, and it gets lost in the
left-to-right sweeping of the vOICe. Also, some of the images are far too "busy" for their
technique. We believe the images need to be simplified so that only the most important
information is conveyed in the sounds.
3.2. Seeing using Sound - Design Overview*
Input Filtering
The first step in our process is to filter the input image. This helps solve the "busy" sound
problem of the vOICe. We decided first to smooth the image with a low-pass filter, leaving only
the most prominent features of the image behind, and then to filter the result with an edge
detector, essentially a kind of high-pass filter. We chose a Canny filter for the edge detection. The
advantage of using an edge detector lies in simplifying the image while at the same time
highlighting its most structurally significant components. This is especially applicable to a system
for the blind, as the structural features of an image are the most important for finding your way
around a room.
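A minimal sketch of this filtering stage in Python, using scikit-image rather than the Matlab tools we actually used; the file name and filter parameters are placeholders:

```python
from skimage import io, color, filters, feature

# Load the image and convert it to grayscale (the file name is a placeholder).
img = color.rgb2gray(io.imread("room.png"))

# Step 1: low-pass smoothing, so only the most prominent features survive.
smoothed = filters.gaussian(img, sigma=2.0)

# Step 2: Canny edge detection on the smoothed image; the result is a
# boolean map that is True wherever an edge was found.
edges = feature.canny(smoothed, sigma=1.0)
```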
The Mapping Process
Simply put, the mapping process is the actual transformation between visual information and
sound. This block takes the data from the filtered input, and produces a sequence of notes
representing the image. The process of mapping images to sound is a matter of interpretation;
there is no known "optimal" mapping for the human brain. Thus, we simply chose an
interpretation that made sense to us.
First of all, it seemed clear to us that the most intuitive use of frequency would be to correlate it
with the relative vertical position of an edge in the picture; that is, the higher an edge sits in the
image, the higher the frequency that represents it. The only other idea we wanted to stick to was
making the center the focus of attention. For a complete description of this component, see the
mapping process.
3.3. Canny Edge Detection*
Introduction to Edge Detection
Edge detection is the process of finding sharp contrasts in intensities in an image. This process
significantly reduces the amount of data in the image, while preserving the most important
structural features of that image. Canny Edge Detection is considered to be the ideal edge
detection algorithm for images that are corrupted with white noise. For a more in depth
introduction, see the Canny Edge Detection Tutorial.
Canny Edge Detection and Seeing Using Sound
The Canny Edge Detector worked like a charm for Seeing Using Sound. We used a Matlab
implementation of the Canny Edge Detector, which can be found at
http://ai.stanford.edu/~mitul/cs223b/canny.m. Here is an example of the results of filtering an image with a Canny Edge Detector:
Figure 3.1. Before Edge Detection
Figure 3.2. After Edge Detection
3.4. Seeing using Sound's Mapping Algorithm*
The mapping algorithm is the piece of the system that takes in an edge-detected image, and
produces a sound clip representing the image. The mapping as we implemented it takes three
steps:
Vertical Mapping
Horizontal Mapping
Color Mapping
Figure 3.3. Mapping Diagram
Illustration of our mapping algorithm
Vertical Mapping
The first step of the algorithm is to map the vertical axis of the image to the frequency content of
the output sound at a given time. We implemented this by having the relative pitch of the notes
sounded at that time correspond to the rows in the current column that contain an edge. Basically,
the higher the note you hear, the higher it is in your field of vision, and the lower the note, the
lower it is in your field of vision.
Horizontal Mapping
Next, we need some way of mapping the horizontal axis to the output sound. We chose to
implement this by having our system "sweep" the image from the outside in over time (see Figure
3.3). The reasoning is that the focus of the final sound should be the center of the field of vision,
so we have everything meeting in the middle. This means that each image takes some period of
time to be "displayed" as sound. The period begins at some time t0, and, with stereo sound, the
left and right channels start sounding notes corresponding to edges on each side of the image,
finally meeting in the middle at some time tf.
Color Mapping
Using scales instead of continuous frequencies for the notes gives us some extra information to
work with. We decided to also incorporate the color of the original image at each edge point. We
did this by letting the brightness of the color determine the scale that we use: major scales sound
much brighter than minor scales, so bright colors correspond to major scales and darker ones to
minor. This effect is difficult to perceive for those who aren't trained, but we believe the brain can
adapt to the pattern regardless of whether or not the user consciously understands the mapping.
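A minimal Python sketch of the whole mapping, to make the three steps concrete. Every specific choice here (note duration, base pitch, two-octave range, sampling rate) is an illustrative stand-in rather than an exact value from our implementation:

```python
import numpy as np

FS = 8000          # audio sampling rate for the demo
NOTE_SEC = 0.05    # how long each image column is sounded

MAJOR = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets within one octave
MINOR = [0, 2, 3, 5, 7, 8, 10]

def note_freq(row, n_rows, bright, base=220.0):
    """Vertical + color mapping: higher rows map to higher scale notes;
    bright pixels select the major scale, dark pixels the minor scale."""
    scale = MAJOR if bright else MINOR
    degree = int((n_rows - 1 - row) / n_rows * 2 * len(scale))  # two octaves
    semis = 12 * (degree // len(scale)) + scale[degree % len(scale)]
    return base * 2.0 ** (semis / 12.0)

def column_tone(edge_col, bright_col):
    """Sum one short sine tone for every edge pixel in an image column."""
    t = np.arange(int(FS * NOTE_SEC)) / FS
    tone = np.zeros_like(t)
    rows = np.flatnonzero(edge_col)
    for r in rows:
        tone += np.sin(2 * np.pi * note_freq(r, edge_col.size, bright_col[r]) * t)
    return tone / max(len(rows), 1)

def image_to_stereo(edges, bright):
    """Horizontal mapping: sweep outside-in. The left channel walks the left
    half of the image left-to-right while the right channel walks the right
    half right-to-left, so the two meet in the middle at time tf."""
    n_rows, n_cols = edges.shape
    half = n_cols // 2
    left = np.concatenate([column_tone(edges[:, c], bright[:, c])
                           for c in range(half)])
    right = np.concatenate([column_tone(edges[:, c], bright[:, c])
                            for c in range(n_cols - 1, half - 1, -1)])
    n = min(left.size, right.size)
    return np.stack([left[:n], right[:n]], axis=1)
```

Writing the returned array to a stereo WAV file (for example with scipy.io.wavfile.write, after scaling) would produce a clip of the kind demonstrated in the next section.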
3.5. Demonstrations of Seeing using Sound*
For each example, right click on the link to the corresponding sound and go to "Save Link Target
As..." to download and play it.
Examples
Figure 3.4. Identity Matrix
Our Simplest Example - Listen
Figure 3.5. X Matrix
Figure 3.6. Edge Detected Heart
Figure 3.7. Front Door Repeated
Our Hardest Example - Not for beginners! - Listen
3.6. Final Remarks on Seeing using Sound*
Future Considerations and Conclusions
There are many ways to improve upon our approach. One way to significantly improve left/right
positioning is to have the left and right scales play different instruments. Another way to improve
resolution would be to have different neighboring blocks compare data so that when an edge spans
many different blocks it does not sound like a cacophony. Other filters could be applied, besides
edge detectors, to determine other features of the image, such as color gradients or the elements in
the foreground. This information could be encoded into different elements of the basis scale, or
even change the scale to a different, perhaps acyclic, pattern. One way to go about this might be to
look at existing photo processing filters (e.g. in Photoshop) and use those for inspiration.
Contact Information of Group Members
Flatow, Jared: jmflizz @ rice dot edu
Hall, Richard: rlhall @ rice dot edu
Shepard, Clay: cwshep @ rice dot edu
Chapter 4. Intelligent Motion Detection Using
Compressed Sensing
4.1. Intelligent Motion Detection and Compressed
Sensing*
New Camera Technology with New Challenges
Our project investigates intelligent motion detection using compressed sensing (CS), an emerging
data acquisition technology with promising applications for still and video cameras. CS
incorporates image compression into the initial data collection rather than generating a
compressed file after first collecting a larger amount of data. By taking only as many data points
as will be stored or transmitted, compressed sensing seeks to eliminate the waste of collecting a
large number of pixel-intensity values for an image and then using compression algorithms (such
as JPEG or GIF) to encode a much smaller number of data points that closely approximate the
information in the original image. [1]
Lower resource usage makes compressed sensing cameras attractive choices for low-power
applications such as security cameras. Ilan Goodman, a Ph.D. candidate at Rice University, has
demonstrated that motion detection using a simulated CS camera is possible by computing
entropy changes between successive CS measurements [2]. Starting from his work, we explore
what can be determined about the motion of an object using compressed sensing.
[1] "Compressed Sensing Resources." Digital Signal Processing Group, Department of Electrical
and Computer Engineering, Rice University. 2005. http://www.dsp.ece.rice.edu/cs.
[2] I.N. Goodman & D.H. Johnson. "Look at This: Goal-Directed Imaging with Selective
Attention." (poster) 2005 Rice Affiliates Day, Rice University, 2005.
4.2. Compressed Sensing*
Compressed sensing is based on exploiting sparsity. Sparse signals are those that can be
represented as a combination of a small number of projections on a particular basis. (This new
basis must be incoherent with the original basis.) Because of sparsity, the same signal can be
represented with a smaller amount of data while still allowing for accurate reconstruction.
In non-compressed sensing methods, one would first acquire a large amount of data, compute an
appropriate basis and projections onto it, and then transmit these projections and the basis used.
This is wasteful of resources, since many more data points are initially collected than are
transmitted.
In compressed sensing, a basis is chosen that will approximately represent any input sparse signal,
as long as there is some allowable margin of error for reconstruction.
Figure 4.1. Comparison of Data Aquisition Algorithms that Use Sparsity
Comparison of different algorithms. Our project focuses on the third algorithm using random basis projections.
The pre-defined basis for the optimal case (as represented in the block diagram) can only be
determined with prior knowledge of the signal to be acquired [1]. However, in practical
applications such information is not usually known. To generalize to a basis that gives sparse
projections for all images, a random basis can be used: a matrix of basis elements is generated
from random numbers such that the basis elements are, on average, unit-norm and orthogonal.
Since using projections on a random basis is not the optimally sparse case, a larger number of
projections must be taken to allow for reconstruction [2], [3]. However, this number is still far
fewer than the number of data points taken by the traditional approach, which exploits sparsity
only after data acquisition.
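A minimal numerical sketch of this measurement process; the image size and measurement count are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64 * 64   # pixels in a hypothetical image
M = 800       # number of CS measurements taken, M << N

# Random measurement matrix; with variance 1/N the rows have roughly
# unit norm and are orthogonal on average, as described above.
Phi = rng.standard_normal((M, N)) / np.sqrt(N)

image = rng.standard_normal(N)   # stand-in for a vectorized image
measurements = Phi @ image       # each entry is one random inner product
print(measurements.shape)        # (800,) -- far fewer than the 4096 pixels
```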
One application of compressed sensing is an N-pixel camera being designed by Takhar et al.,
which acquires far fewer than N data points to record an image [4].
For a more detailed explanation of compressed sensing, please refer to the literature at
http://www.dsp.ece.rice.edu/cs.
[1] D. Baron, M. B. Wakin, S. Sarvotham, M.F. Duarte and R. G. Baraniuk, “Distributed
Compressed Sensing,” 2005, Preprint.
[2] E. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction
from highly incomplete frequency information,” IEEE Trans. Inform. Theory, 2004, Submitted.
[3] D. Donoho, “Compressed sensing,” 2004, Preprint.
[4] D. Takhar, V. Bansal, M. Wakin, M. Duarte, D. Baron, K. F. Kelly, and R. G. Baraniuk, “A
compressed sensing camera: New theory and an implementation using digital micromirrors,” in
Proc. Computational Imaging IV at SPIE Electronic Imaging, San Jose, January 2006, SPIE, To
appear.
4.3. Feature Extraction from CS Data*
Can Random Noise Yield Specific Information?
The novel challenge with using random basis projections for intelligent motion detection is that
there is no spatial information about the image or movie in the compressed sensing data.
Traditional pixel-based cameras provide a graph of light intensity over position and, logically,
most pixel-based detection approaches use the information about where the motion occurs to help
classify it. [1] The CS data provides us not with intensity at a point, but with the similarity (inner
product) between the original pixel image and a selection of basis elements composed of random
noise spread throughout the image plane. Intelligent detection for CS, therefore, must use
approaches radically different from detection used on conventional video.
Simplicity for Low Power
A key feature of CS systems is the potential for extremely low power consumption. To keep the
overall power of the system low, any computations we perform must be low power as well. For
practical application, the algorithms chosen must be simple to compute.
Investigation