background noise level ended up being somewhat higher than ideal. However, the fan proved
quite useful as yet another test source.
The delayed and summed waveform in this case shows a much more constant noise level than the
individual waveforms, lending weight to our contention that, since only a small part of the signal
is used for matching and since noise is, on the whole, relatively random, the noise adds both
constructively and destructively, making the overall noise level lower than one might expect.
Although the more regular signals typically at least double in overall magnitude, it is interesting
to note that, just as the previous paragraph suggests, the noise level here remains more or less the
same.
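To make the intuition concrete: if the aligned signal adds coherently across N microphones while the noise at each microphone is independent, the summed signal amplitude grows like N but the summed noise RMS grows only like sqrt(N). The following minimal Python sketch illustrates this; the array size, tone frequency, and noise level are made up for the demonstration and are not taken from our setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics = 8                    # hypothetical array size for this demo
t = np.arange(2000) / 8000.0  # 8 kHz, matching the report's sampling rate
signal = np.sin(2 * np.pi * 440 * t)

# Each "microphone" sees the same (already aligned) signal plus
# independent noise of standard deviation 0.5.
channels = [signal + 0.5 * rng.standard_normal(t.size) for _ in range(n_mics)]
summed = np.sum(channels, axis=0)

# Coherent signal amplitude grows by n_mics; incoherent noise RMS
# grows only by about sqrt(n_mics).
noise_only = summed - n_mics * signal
print("signal amplitude gain:", n_mics)             # 8
print("noise RMS gain:", np.std(noise_only) / 0.5)  # ~2.83
```

With eight channels the signal grows 8x while the noise RMS grows only about 2.8x, which matches the behavior described above.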
Also of note is the angle of the noise itself. As we might expect from a fan placed nearly directly
overhead, our computations do indeed yield a small theta (the fan was not exactly overhead, but
in the approximately zero direction) and a small phi, indicating that, despite the random nature of
the noise, our algorithm was still able to find enough regularity to deduce the position of the
"signal".
All the previous examples were run at an upsampling rate of 10x and a sampling frequency of 8
kHz. Both of these variables can be changed at will, however, and we also ran a few tests at other
levels of upsampling. We generally stayed at or around the same sampling frequency, since it was
initially chosen to cut down on the aliasing induced by the analog-to-digital conversion.
Figure 2.12.
An example of how our program ran with no upsampling.
As you can see, even with no upsampling there is still a significant improvement between the
initially input signals and the final, delayed and summed output. The less upsampling there is, the
fewer samples the program must iterate through; thus it should come as no surprise that the
program completed far faster with no upsampling than with 10x. However, when the signal is not
upsampled there is also much less data to work with, and the shift applied to compensate for
delay is more likely to differ significantly from the true delay in the analog setting.
Calculations are therefore prone to slightly more error, and as a result the final product may not
be quite as effective at noise suppression as its upsampled cousins -- although, as you can see
from the figure above, "not quite" still shows a significant change, even if the noise is noticeably
more spiky than in the previous results.
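The delay-quantization argument can be made concrete: with sampling rate fs and upsampling factor U, the finest shift the program can apply is 1/(fs*U) seconds, so the worst-case rounding error shrinks in proportion to U. A minimal Python sketch, using an arbitrary illustrative delay value:

```python
fs = 8000.0            # sampling rate used throughout the report
true_delay = 1.37e-4   # an arbitrary inter-microphone delay, in seconds

for upsample in (1, 10):
    step = 1.0 / (fs * upsample)                 # finest realizable shift
    quantized = round(true_delay / step) * step  # shift the program applies
    print(f"{upsample:2d}x: error = {abs(true_delay - quantized):.2e} s")
```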
2.5. Delay and Sum Beamforming with a 2D Array:
Conclusions*
Summary of Results
As can be seen in the screenshots in the results section, the setup and program work at least some
of the time. As noted in the introduction, the two-dimensional nature of the array introduces a
small ambiguity: the array and algorithm cannot, in theory, distinguish between a source above
the array and an identical source at the location obtained by reflecting the original source across
the plane of the array. In practice this should not matter if the array is mounted on a wall or on the
ground, but should the need arise, the problem can be eliminated completely by extending the
array to three dimensions.
As the arccosine function was used in the computation of one of the angles, accuracy is much
worse as the angle approaches ±π/2, where the slope of the arccosine goes to infinity: the
derivative of arccos(x) is -1/sqrt(1-x^2), which diverges as its argument approaches ±1, so small
errors in the measured delays produce large errors in the computed angle. The shortcuts we took
were probably detrimental to the accuracy of the project but, once again, they were necessary for
it to run at all on the hardware we had to work with. Scanning with only three microphones
instead of all eight does not fully utilize the noise-reducing capability of the delay and sum
technique, but it does conserve a great deal of computing power. The simplification of assuming
that the three scanning microphones could be broken up into pairs will sometimes yield "Not a
Number" as one of the angles, because the resulting delays are not physically realizable.
When it works, the simplification cuts the computation for that portion of the program from
complexity O(n^2) to O(n). The "safe" way to scan using the three microphones would be to
check each valid pair of delays, where valid means that the delays between the microphones are
physically realizable, and then take the pair corresponding to the maximum power. The "quick"
way we ended up using is to compute the delay for one pair, compute the delay for the other pair,
and then assume that the two delays together are physically realizable. This works if the
maximum power is located at one of the valid pairs of delays (as, in most cases where there is a
well defined signal, it will be) and fails if it is not, but in return it is far faster, much as the
reduction in computational complexity would suggest.
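The two scanning strategies can be sketched as follows. This is an illustrative Python reconstruction, not our LabVIEW code; pair_power, the candidate delay list, and the realizable predicate are hypothetical stand-ins.

```python
import numpy as np

def pair_power(x_ref, x_other, delay):
    """Power of the reference channel summed with a shifted copy."""
    return float(np.sum((x_ref + np.roll(x_other, delay)) ** 2))

def safe_scan(x0, x1, x2, delays, realizable):
    """O(n^2): jointly test every physically realizable pair of delays."""
    best, best_pair = -np.inf, None
    for d1 in delays:
        for d2 in delays:
            if not realizable(d1, d2):
                continue
            p = pair_power(x0, x1, d1) + pair_power(x0, x2, d2)
            if p > best:
                best, best_pair = p, (d1, d2)
    return best_pair

def quick_scan(x0, x1, x2, delays):
    """O(n): optimize each pair separately; the result may not be realizable."""
    d1 = max(delays, key=lambda d: pair_power(x0, x1, d))
    d2 = max(delays, key=lambda d: pair_power(x0, x2, d))
    return d1, d2
```

When the independently optimized delays do not form a realizable pair, the subsequent angle computation receives an impossible combination, which is where the "Not a Number" angles mentioned above come from.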
Limitations of Hardware and Computing Power
Any increase in computing power and hardware would be welcome in a computationally
intensive project such as this; however, the computers and hardware that were used were far from
top of the line, even five years ago. The DAQ card that was available for use is far enough out of
date that it is no longer supported by National Instruments; in fact, the current version of Labview
does not even recognize it.
As a result, in order to interface with the DAQ card that was available, the version of Labview
this project was implemented on was also greatly out of date -- to the point where the NI website
no longer provided support. A good indicator of the age of the hardware is the fact that the
computer used possessed a turbo button. The computers were probably very good when they were
first purchased, as the RAM size is actually 32 megabytes, but that was long enough ago that they
are no longer adequate for the level of processing we wished to do.
With more computing power and better hardware, it would be possible to increase the amount of
sampling (restricted here by a 64 kB buffer) and upsampling to improve accuracy and precision,
to take fewer shortcuts to the same effect, or perhaps to run in real time (a feat possible only if
the processing time is no longer than the time it takes to input a "chunk" of the signal).
Possible Extensions
One obvious extension to this project would be to go to a 3-dimensional array, eliminating the
ambiguity arising from reflection of signals across the plane of the array. Another possible
improvement would be to increase the processing speed to the point where processing in real time
could be a possibility. This project only dealt with calculations in far field, which seem to be less
computationally taxing than calculations in near field. Thus, another possible extension would be
to implement something that could accurately locate signals in near field. Another possible region
of interest might be to investigate beamforming when dealing with multiple signal sources.
Simple delay and sum is optimal (in ability to distinguish, if not in computational complexity) for
a single source and white noise, but we are unsure if it is optimal if some structure to the noise is
known. In the case of multiple signals, the other signals would be considered noise when focusing
in on one signal. It may be possible to focus on each of the signals and separate them. If there is
not a closed form for doing so, the separation could probably be done iteratively by some sort of
subtraction. A further extension of this would be to apply blind signal separation techniques to
distinguish signal sources that seem to overlap.
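To illustrate the iterative-subtraction idea, here is a purely speculative Python sketch, assuming the per-source delays have already been found by scanning:

```python
import numpy as np

def beamform(channels, delays):
    """Delay-and-sum estimate of one source, given its per-channel delays."""
    return np.mean([np.roll(ch, -d) for ch, d in zip(channels, delays)], axis=0)

def separate(channels, delay_sets, n_passes=2):
    """Estimate each source in turn and subtract it from every channel.

    channels: list of 1-D sample arrays; delay_sets: one integer-delay list
    per source. np.roll wraps around at the ends, which a real
    implementation would have to avoid.
    """
    residual = [np.asarray(ch, dtype=float).copy() for ch in channels]
    estimates = [np.zeros_like(residual[0]) for _ in delay_sets]
    for _ in range(n_passes):
        for k, delays in enumerate(delay_sets):
            update = beamform(residual, delays)
            estimates[k] += update
            for ch, d in zip(residual, delays):
                ch -= np.roll(update, d)  # remove this source's contribution
    return estimates
```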
2.6. Expressing Appreciation for the Assistance of
Others*
Dr. Don Johnson (aka DOJO)
For giving his time and expertise to enlighten a bunch of lost undergrads in the areas of array
processing and beamforming. Without his advice and his book, Array Signal Processing: Concepts
and Techniques, we might still be asking the question, "Beamforming? What's that?"
Dr. William Wilson (aka Dr. Bill)
For his continual interest in our project and his constant offers to assist us with whatever we
needed, for his time in helping us find the proper level of microphone amplification, for his
recommendations on the types of microphones to use, and for his donation of op-amps, coax
cable, and capacitors. We wish you the best in the future, whether at Rice or in Vermont!
Dr. Rich Baraniuk (aka RichB)
For his energetic and often wacky ways of conducting class and working with us and National
Instruments to obtain for us a DAQ card.
Dr. J.D. Wise
For laying out our options for obtaining samples from our microphones and using Labview.
Mr. Michael Dye
For granting us access to the undergraduate lab and always asking, "Is it working yet?"
Mr. Andy Deck and National Instruments
For his time and effort in locating a DAQ card for us and donating one for use in ours and future
ELEC 301 projects. We are grateful for your time and generosity.
Chapter 3. Seeing Using Sounds
3.1. Introduction and Background for Seeing with
Sound*
Introduction
Seeing with sound is our attempt to meaningfully transform an image into sound. The motivation
behind it is simple: to convey visual information to blind people through their sense of hearing.
We believe that, in time, the human brain can adapt to the sounds, making this a useful and
worthwhile system.
Background and Problems
In researching this project, we found one marketed product online, the vOICe, that did just what
we set out to do. However, we believe that the vOICe is not optimal, and we have a few
improvements in mind. One idea is to make the center of the image the focus of the final sound.
We feel that the center of an image contains the most important information, and it gets lost in the
left-to-right sweeping of the vOICe. Also, some of the images are far too "busy" for their
technique. We believe the images need to be simplified so that only the most important
information is conveyed in the sounds.
3.2. Seeing using Sound - Design Overview*
Input Filtering
The first step in our process is to filter the input image. This helps solve the "busy" sound
problem of the vOICe. We decided first to smooth the image with a low-pass filter, leaving only
the most prominent features of the image behind, and then to filter the result with an edge
detector, essentially a kind of high-pass filter. We chose a Canny filter for the edge detection. The
advantage of using an edge detector lies in simplifying the image while at the same time
highlighting its most structurally significant components. This is especially applicable to a system
for the blind, as the structural features of an image are the most important for finding your way
around a room.
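A minimal sketch of this filtering stage in Python, using scikit-image rather than the Matlab tools we actually used; the file name and filter parameters are placeholders:

```python
from skimage import io, color, filters, feature

# Load the image and convert it to grayscale (the file name is a placeholder).
img = color.rgb2gray(io.imread("room.png"))

# Step 1: low-pass smoothing, so only the most prominent features survive.
smoothed = filters.gaussian(img, sigma=2.0)

# Step 2: Canny edge detection on the smoothed image; the result is a
# boolean map that is True wherever an edge was found.
edges = feature.canny(smoothed, sigma=1.0)
```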
The Mapping Process
Simply put, the mapping process is the actual transformation between visual information and
sound. This block takes the data from the filtered input, and produces a sequence of notes
representing the image. The process of mapping images to sound is a matter of interpretation;
there is no known "optimal" mapping for the human brain. Thus, we simply chose an
interpretation that made sense to us.
First of all, it seemed clear to us that the most intuitive use of frequency would be to correlate it
with the relative vertical position of an edge in the picture; that is, the higher an edge sits in the
image, the higher the frequency that represents it. The only other idea we wanted to stick to was
making the center the focus of attention. For a complete description of this component, see the
mapping process.
3.3. Canny Edge Detection*
Introduction to Edge Detection
Edge detection is the process of finding sharp contrasts in intensities in an image. This process
significantly reduces the amount of data in the image, while preserving the most important
structural features of that image. Canny Edge Detection is considered to be the ideal edge
detection algorithm for images that are corrupted with white noise. For a more in depth
introduction, see the Canny Edge Detection Tutorial.
Canny Edge Detection and Seeing Using Sound
The Canny Edge Detector worked like a charm for Seeing Using Sound. We used a Matlab
implementation of the Canny Edge Detector, which can be found at
http://ai.stanford.edu/~mitul/cs223b/canny.m. Here is an example of the results of filtering an image with a Canny Edge Detector:
Figure 3.1. Before Edge Detection
Figure 3.2. After Edge Detection
3.4. Seeing using Sound's Mapping Algorithm*
The mapping algorithm is the piece of the system that takes in an edge-detected image, and
produces a sound clip representing the image. The mapping as we implemented it takes three
steps:
Vertical Mapping
Horizontal Mapping
Color Mapping
Figure 3.3. Mapping Diagram
Illustration of our mapping algorithm
Vertical Mapping
The first step of the algorithm is to map the vertical axis of the image to the frequency content of
the output sound at a given time. We implemented this by having the relative pitch of the notes
sounded at that time correspond to the rows in the current column that contain an edge. Basically,
the higher the note you hear, the higher it is in your field of vision, and the lower the note, the
lower it is in your field of vision.
Horizontal Mapping
Next, we need some way of mapping the horizontal axis to the output sound. We chose to
implement this by having our system "sweep" the image from the outside in over time (see Figure
3.3). The reasoning is that the focus of the final sound should be the center of the field of vision,
so we have everything meeting in the middle. This means that each image takes some period of
time to be "displayed" as sound. The period begins at some time t0, and, with stereo sound, the
left and right channels start sounding notes corresponding to edges on each side of the image,
finally meeting in the middle at some time tf.
Color Mapping
Using scales instead of continuous frequencies for the notes gives us some extra information to
work with. We decided to also incorporate the color of the original image at each edge point. We
did this by letting the brightness of the color determine the scale that we use: major scales sound
much brighter than minor scales, so bright colors correspond to major scales and darker ones to
minor. This effect is difficult to perceive for those who aren't trained, but we believe the brain can
adapt to the pattern regardless of whether or not the user consciously understands the mapping.
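A minimal Python sketch of the whole mapping, to make the three steps concrete. Every specific choice here (note duration, base pitch, two-octave range, sampling rate) is an illustrative stand-in rather than an exact value from our implementation:

```python
import numpy as np

FS = 8000          # audio sampling rate for the demo
NOTE_SEC = 0.05    # how long each image column is sounded

MAJOR = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets within one octave
MINOR = [0, 2, 3, 5, 7, 8, 10]

def note_freq(row, n_rows, bright, base=220.0):
    """Vertical + color mapping: higher rows map to higher scale notes;
    bright pixels select the major scale, dark pixels the minor scale."""
    scale = MAJOR if bright else MINOR
    degree = int((n_rows - 1 - row) / n_rows * 2 * len(scale))  # two octaves
    semis = 12 * (degree // len(scale)) + scale[degree % len(scale)]
    return base * 2.0 ** (semis / 12.0)

def column_tone(edge_col, bright_col):
    """Sum one short sine tone for every edge pixel in an image column."""
    t = np.arange(int(FS * NOTE_SEC)) / FS
    tone = np.zeros_like(t)
    rows = np.flatnonzero(edge_col)
    for r in rows:
        tone += np.sin(2 * np.pi * note_freq(r, edge_col.size, bright_col[r]) * t)
    return tone / max(len(rows), 1)

def image_to_stereo(edges, bright):
    """Horizontal mapping: sweep outside-in. The left channel walks the left
    half of the image left-to-right while the right channel walks the right
    half right-to-left, so the two meet in the middle at time tf."""
    n_rows, n_cols = edges.shape
    half = n_cols // 2
    left = np.concatenate([column_tone(edges[:, c], bright[:, c])
                           for c in range(half)])
    right = np.concatenate([column_tone(edges[:, c], bright[:, c])
                            for c in range(n_cols - 1, half - 1, -1)])
    n = min(left.size, right.size)
    return np.stack([left[:n], right[:n]], axis=1)
```

Writing the returned array to a stereo WAV file (for example with scipy.io.wavfile.write, after scaling) would produce a clip of the kind demonstrated in the next section.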
3.5. Demonstrations of Seeing using Sound*
For each example, right click on the link to the corresponding sound and go to "Save Link Target
As..." to download and play it.
Examples
Figure 3.4. Identity Matrix
Our Simplest Example - Listen
Figure 3.5. X Matrix
Figure 3.6. Edge Detected Heart
Figure 3.7. Front Door Repeated
Our Hardest Example - Not for beginners! - Listen
3.6. Final Remarks on Seeing using Sound*
Future Considerations and Conclusions
There are many ways to improve upon our approach. One way to significantly improve left/right
positioning is to have the left and right scales play different instruments. Another way to improve
resolution would be to have different neighboring blocks compare data so that when an edge spans
many different blocks it does not sound like a cacophony. Other filters could be applied, besides
edge detectors, to determine other features of the image, such as color gradients or the elements in
the foreground. This information could be encoded into different elements of the basis scale, or
even change the scale to a different, perhaps acyclic, pattern. One way to go about this might be to
look at existing photo processing filters (e.g. in Photoshop) and use those for inspiration.
Contact Information of Group Members
Flatow, Jared: jmflizz @ rice dot edu
Hall, Richard: rlhall @ rice dot edu
Shepard, Clay: cwshep @ rice dot edu
Chapter 4. Intelligent Motion Detection Using
Compressed Sensing
4.1. Intelligent Motion Detection and Compressed
Sensing*
New Camera Technology with New Challenges
Our project investigates intelligent motion detection using compressed sensing (CS), an emerging
data acquisition technology with promising applications for still and video cameras. CS
incorporates image compression into the initial data collection rather than generating a
compressed file after first collecting a larger amount of data. By taking only as many data points
as will be stored or transmitted, compressed sensing seeks to eliminate the waste of collecting a
large number of pixel-intensity values for an image and then using compression algorithms (such
as JPEG or GIF) to encode a much smaller number of data points that closely approximate the
information in the original image. [1]
Lower resource usage makes compressed sensing cameras attractive choices for low-power
applications such as security cameras. Ilan Goodman, a Ph.D. candidate at Rice University, has
demonstrated that motion detection using a simulated CS camera is possible by computing
entropy changes between successive CS measurements [2]. Starting from his work, we explore
what can be determined about the motion of an object using compressed sensing.
[1] "Compressed Sensing Resources." Digital Signal Processing Group, Department of Electrical
and Computer Engineering, Rice University. 2005. http://www.dsp.ece.rice.edu/cs.
[2] I.N. Goodman & D.H. Johnson. "Look at This: Goal-Directed Imaging with Selective
Attention." (poster) 2005 Rice Affiliates Day, Rice University, 2005.
4.2. Compressed Sensing*
Compressed sensing is based on exploiting sparsity. Sparse signals are those that can be
represented as a combination of a small number of projections on a particular basis. (This new
basis must be incoherent with the original basis.) Because of sparsity, the same signal can be
represented with a smaller amount of data while still allowing for accurate reconstruction.
In non-compressed sensing methods, one would first acquire a large amount of data, compute an
appropriate basis and projections onto it, and then transmit these projections and the basis used.
This is wasteful of resources, since many more data points are initially collected than are
transmitted.
In compressed sensing, a basis is chosen that will approximately represent any input sparse signal,
as long as there is some allowable margin of error for reconstruction.
Figure 4.1. Comparison of Data Aquisition Algorithms that Use Sparsity
Comparison of different algorithms. Our project focuses on the third algorithm using random basis projections.
The pre-defined basis for the optimal case (as represented in the block diagram) can only be
determined with prior knowledge of the signal to be acquired [1]. However, in practical
applications such information is not usually known. To generalize to a basis that gives sparse
projections for all images, a random basis can be used: a matrix of basis elements is generated
from random numbers such that the basis elements are, on average, unit-norm and orthogonal.
Since using projections on a random basis is not the optimally sparse case, a larger number of
projections must be taken to allow for reconstruction [2], [3]. However, this number is still far
fewer than the number of data points taken by the traditional approach, which exploits sparsity
only after data acquisition.
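A minimal numerical sketch of this measurement process; the image size and measurement count are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64 * 64   # pixels in a hypothetical image
M = 800       # number of CS measurements taken, M << N

# Random measurement matrix; with variance 1/N the rows have roughly
# unit norm and are orthogonal on average, as described above.
Phi = rng.standard_normal((M, N)) / np.sqrt(N)

image = rng.standard_normal(N)   # stand-in for a vectorized image
measurements = Phi @ image       # each entry is one random inner product
print(measurements.shape)        # (800,) -- far fewer than the 4096 pixels
```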
One application of compressed sensing is an N-pixel camera being designed by Takhar et al.,
which acquires far fewer than N data points to record an image [4].
For a more detailed explanation of compressed sensing, please refer to the literature at
http://www.dsp.ece.rice.edu/cs.
[1] D. Baron, M. B. Wakin, S. Sarvotham, M.F. Duarte and R. G. Baraniuk, “Distributed
Compressed Sensing,” 2005, Preprint.
[2] E. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction
from highly incomplete frequency information,” IEEE Trans. Inform. Theory, 2004, Submitted.
[3] D. Donoho, “Compressed sensing,” 2004, Preprint.
[4] D. Takhar, V. Bansal, M. Wakin, M. Duarte, D. Baron, K. F. Kelly, and R. G. Baraniuk, “A
compressed sensing camera: New theory and an implementation using digital micromirrors,” in
Proc. Computational Imaging IV at SPIE Electronic Imaging, San Jose, January 2006, SPIE, To
appear.
4.3. Feature Extraction from CS Data*
Can Random Noise Yield Specific Information?
The novel challenge with using random basis projections for intelligent motion detection is that
there is no spatial information about the image or movie in the compressed sensing data.
Traditional pixel-based cameras provide a graph of light intensity over position and, logically,
most pixel-based detection approaches use the information about where the motion occurs to help
classify it. [1] The CS data provides us not with intensity at a point, but with the similarity (inner
product) between the original pixel image and a selection of basis elements composed of random
noise spread throughout the image plane. Intelligent detection for CS, therefore, must use
approaches radically different from detection used on conventional video.
Simplicity for Low Power
A key feature of CS systems is the potential for extremely low power consumption. To keep the
overall power of the system low, any computations we perform must be low power as well. For
practical application, the algorithms chosen must be simple to compute.
Investigation