Publications and Documents from 1998 to Present |
---|
Detection of unknown targets from aerial camera and extraction of simple object fingerprints for the purpose of target reacquisition - 2012 |
T. Nathan Mundhenk, | Kang-Yu Ni, | Yang Chen, | Kyungnam Kim | and | Yuri Owechko |
|
Intelligent Robots and Computer Vision XXIX: Algorithms and Techniques | vol 8301 | | | San Francisco | January |
|
Abstract: |
An aerial multiple camera tracking paradigm needs to not only spot unknown targets and track them, but also needs to
know how to handle target reacquisition as well as target handoff to other cameras in the operating theater. Here we
discuss such a system which is designed to spot unknown targets, track them, segment the useful features and then create
a signature fingerprint for the object so that it can be reacquired or handed off to another camera. The tracking system
spots unknown objects by subtracting background motion from observed motion allowing it to find targets in motion,
even if the camera platform itself is moving. The area of motion is then matched to segmented regions returned by the
EDISON mean shift segmentation tool. Whole segments which have common motion and which are contiguous to each
other are grouped into a master object. Once master objects are formed, we have a tight bound on which to extract
features for the purpose of forming a fingerprint. This is done using color and simple entropy features. These can be
placed into a myriad of different fingerprints. To keep data transmission and storage size low for camera handoff of
targets, we try several different simple techniques. These include Histogram, Spatiogram and Single Gaussian Model.
These are tested by simulating a very large number of target losses in six videos over an interval of 1000 frames each
from the DARPA VIVID video set. Since the fingerprints are very simple, they are not expected to be valid for long
periods of time. As such, we test the shelf life of fingerprints. This is how long a fingerprint is good for when stored
away between target appearances. Shelf life gives us a second metric of goodness and tells us if a fingerprint method
has better accuracy over longer periods. In videos which contain multiple vehicle occlusions and vehicles of highly
similar appearance we obtain a reacquisition rate for automobiles of over 80% using the simple single Gaussian model
compared with the null hypothesis of less than 20%. Additionally, the performance for fingerprints stays well above the null
hypothesis for as much as 800 frames. Thus, a simple and highly compact single Gaussian model is useful for target
reacquisition. Since the model is agnostic to viewpoint and object size, it is expected to perform as well on a test of
target handoff. Since some of the performance degradation is due to problems with the initial target acquisition and
tracking, the simple Gaussian model may perform even better with an improved initial acquisition technique. Also, since
the model makes no assumption about the object to be tracked, it should be possible to use it to fingerprint a multitude of
objects, not just cars. Further accuracy may be obtained by creating manifolds of objects from multiple samples.
|
|
Key Words: | Aerial, Video, Fingerprint, Unsupervised, Target, Reacquisition, Tracking, AFTER |
|
|
|
|
|
Manifold-based Fingerprinting for Target Identification - 2012 |
Kang-Yu Ni, | T. Nathan Mundhenk, | Kyungnam Kim | and | Yuri Owechko |
|
|
Abstract: |
In this paper, we propose a fingerprint analysis algorithm based on using product manifolds to create
robust signatures for individual targets in motion imagery. The purpose of target fingerprinting is to re-identify a
target after it disappears and then reappears due to occlusion or moving out of the camera's view, and to track targets
persistently under camera handoff situations. The proposed method is statistics-based and has the benefit of
being compact and invariant to viewpoint, rotation, and scaling. Moreover, it is a general framework and does not
assume a particular type of objects to be identified. For improved robustness, we also propose a method to detect
outliers of a statistical manifold formed from the training data of individual targets. Our experiments show that the
proposed framework is more accurate in target re-identification than single-instance signatures and patch-based
methods.
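The paper's product-manifold construction is not reproduced here, but the general flavor of compact, viewpoint-invariant, statistics-based signatures can be sketched with covariance descriptors compared under a log-Euclidean metric, plus a crude outlier test against the mean of the training signatures. Everything below (function names, the log-Euclidean choice, the thresholds) is an assumption for illustration only.

import numpy as np

def spd_log(c):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v * np.log(np.clip(w, 1e-10, None))) @ v.T

def signature(features):
    """Covariance descriptor of per-pixel features (N x d) for one target."""
    f = np.asarray(features, float)
    return np.cov(f, rowvar=False) + 1e-6 * np.eye(f.shape[1])

def log_euclidean_distance(c1, c2):
    """Distance between two signatures on the manifold of SPD matrices."""
    return np.linalg.norm(spd_log(c1) - spd_log(c2), ord="fro")

def outlier_mask(signatures, num_std=2.0):
    """Flag training signatures far from the log-Euclidean mean; a simple
    stand-in for the manifold outlier-detection step described above."""
    logs = np.stack([spd_log(c) for c in signatures])
    mean_log = logs.mean(axis=0)
    d = np.array([np.linalg.norm(l - mean_log, ord="fro") for l in logs])
    return d > d.mean() + num_std * d.std()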
|
|
|
|
|
|
|
Methods for efficient correction of complex noise in outdoor video rate passive millimeter wavelength imagery - 2012 |
T. Nathan Mundhenk, | Joshua Baron | and | Roy M Matic |
|
Optical Engineering | vol 51 | issue 9 | pg 1-9 | | September |
|
Abstract: |
Passive millimeter wavelength (PMMW) video holds great promise, given its ability to see targets and obstacles through fog,
smoke, and rain. However, current imagers produce undesirable complex noise. This can come as a mixture of fast shot
(snowlike) noise and a slower-forming circular fixed pattern. Shot noise can be removed by a simple gain style filter.
However, this can produce blurring of objects in the scene. To alleviate this, we measure the amount of Bayesian surprise in
videos. Bayesian surprise measures feature change in time that is abrupt but cannot be accounted for as shot noise. Surprise
is used to attenuate the shot noise filter in locations of high surprise. Since high Bayesian surprise in videos is very
salient to observers, this reduces blurring, particularly in places where people visually attend. Fixed pattern noise is
removed after the shot noise using a combination of nonuniformity correction and mean image wavelet transformation. The
combination allows for online removal of time-varying fixed pattern noise, even when background motion may be absent. It
also allows for online adaptation to differing intensities of fixed pattern noise. We also discuss a method for sharpening
frames using deconvolution. The fixed pattern and shot noise filters are all efficient, which allows real time video
processing of PMMW video. We show several examples of PMMW video with complex noise that is much cleaner as a result of the
noise removal. Processed video clearly shows cars, houses, trees, and utility poles at 20 frames per second.
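A minimal sketch of the surprise-gated filtering idea is given below, assuming a per-pixel Gaussian belief updated each frame, with Bayesian surprise taken as the KL divergence between the posterior and prior belief; the constants, the exponential gating, and the class interface are illustrative placeholders, not the published pipeline (which also includes the NUC/wavelet fixed-pattern stage and deconvolution).

import numpy as np

class SurpriseGatedFilter:
    def __init__(self, shape, obs_var=25.0, smooth=0.9, gate=2.0):
        self.mu = np.zeros(shape)         # prior mean per pixel
        self.var = np.full(shape, 100.0)  # prior variance per pixel
        self.obs_var = obs_var            # assumed sensor (shot) noise variance
        self.smooth = smooth              # strength of temporal averaging
        self.gate = gate                  # surprise level that disables smoothing

    def step(self, frame):
        frame = frame.astype(float)
        # Conjugate Gaussian update of the per-pixel belief.
        k = self.var / (self.var + self.obs_var)
        post_mu = self.mu + k * (frame - self.mu)
        post_var = (1.0 - k) * self.var
        # KL(posterior || prior) for Gaussians = the per-pixel surprise.
        surprise = 0.5 * (post_var / self.var
                          + (post_mu - self.mu) ** 2 / self.var
                          - 1.0 + np.log(self.var / post_var))
        # Attenuate the temporal smoothing where surprise is high, so abrupt
        # but genuine changes are not blurred away as shot noise.
        alpha = self.smooth * np.exp(-surprise / self.gate)
        filtered = alpha * self.mu + (1.0 - alpha) * frame
        self.mu, self.var = post_mu, post_var + 1.0   # mild forgetting
        return filtered, surprise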
|
|
|
PDF Version of this Document for Download. Mirror List: mundhenk.com
|
|
|
|
|
Efficient Reduction of Complex Noise in Passive Millimeter Wavelength Video Utilizing Bayesian Surprise - 2011 |
T. Nathan Mundhenk, | Josh Baron | and | Roy Matic |
|
Display Technologies and Applications for Defense, Security, and Avionics V; and Enhanced and Synthetic Vision 2011 | vol 8042 | | | Orlando, FL | |
|
Abstract: |
Passive millimeter wavelength (PMMW) video holds great promise given its ability to see targets and obstacles through
fog, smoke and rain. However, current imagers produce undesirable complex noise. This can come as a mixture of fast
shot (snow like) noise and a slower forming circular fixed pattern. Shot noise can be removed by a simple gain style
filter. However, this can produce blurring of objects in the scene. To alleviate this, we measure the amount of Bayesian
surprise in videos. Bayesian surprise is feature change in time which is abrupt, but cannot be accounted for as shot noise.
Surprise is used to attenuate the shot noise filter in locations of high surprise. Since high Bayesian surprise in videos is
very salient to observers, this reduces blurring particularly in places where people visually attend. Fixed pattern noise is
removed after the shot noise using a combination of Non-uniformity correction (NUC) and Eigen Image Wavelet
Transformation. The combination allows for online removal of time varying fixed pattern noise even when background
motion may be absent. It also allows for online adaptation to differing intensities of fixed pattern noise. The fixed pattern
and shot noise filters are all efficient allowing for real time video processing of PMMW video. We show several
examples of PMMW video with complex noise that is much cleaner as a result of the noise removal. Processed video
clearly shows cars, houses, trees and utility poles at 20 frames per second.
|
|
Key Words: | Passive, Millimeter, PMMW, Noise, Reduction, Surprise, NUC |
|
|
|
|
|
High Precision Object Segmentation and Tracking for use in Super Resolution Video Reconstruction - 2011 |
T. Nathan Mundhenk, | Rashmi Sundareswara, | David R Gerwe | and | Yang Chen |
|
Intelligent Robots and Computer Vision XXVIII: Algorithms and Techniques | vol 7878 | | | San Francisco | January |
|
Abstract: |
Super resolution image reconstruction allows for the enhancement of images in a video sequence that is superior to the
original pixel resolution of the imager. Difficulty arises when there are foreground objects that move differently than the
background. A common example of this is a car in motion in a video. Given the common occurrence of such situations,
super resolution reconstruction becomes non-trivial. One method for dealing with this is to segment out foreground
objects and quantify their pixel motion differently. First we estimate local pixel motion using a standard block motion
algorithm common to MPEG encoding. This is then combined with the image itself into a five dimensional mean-shift
kernel density estimation based image segmentation with mixed motion and color image feature information. This
results in a tight segmentation of objects in terms of both motion and visible image features. The next step is to combine
segments into a single master object. Statistically common motion and proximity are used to merge segments into master
objects. To account for inconsistencies that can arise when tracking objects, we compute statistics over the object and fit
it with a generalized linear model. Using the Kullback-Leibler divergence, we have a metric for the goodness of the track
for an object between frames.
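To make the five-dimensional feature construction concrete, the sketch below stacks per-pixel block motion with color and clusters the result with mean shift; note that the paper builds on EDISON-style mean shift and block motion from MPEG encoding, whereas sklearn's MeanShift, the subsampling step, and the normalization here stand in purely for illustration.

import numpy as np
from sklearn.cluster import MeanShift

def five_d_segments(color, flow, spatial_step=4, bandwidth=0.3):
    """color: (H, W, 3) image in [0, 1]; flow: (H, W, 2) per-pixel motion vectors.
    Returns a segment-label image on a subsampled grid."""
    h, w, _ = color.shape
    ys, xs = np.mgrid[0:h:spatial_step, 0:w:spatial_step]
    feats = np.concatenate(
        [color[ys, xs].reshape(-1, 3),       # three color dimensions
         flow[ys, xs].reshape(-1, 2)],       # two motion dimensions
        axis=1)
    # Normalize each dimension so color and motion are comparable.
    feats = (feats - feats.mean(0)) / (feats.std(0) + 1e-6)
    labels = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit_predict(feats)
    return labels.reshape(ys.shape)

Segments sharing statistically common motion and proximity would then be merged into master objects, as described above.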
|
|
Key Words: | Super, Resolution, Video, Segmentation, Tracking |
|
|
|
|
|
What the Searchlight saw: revealing the extent of natural image information that passes through bottom-up visual attention mechanisms to higher visual processing - 2009 |
T. Nathan Mundhenk, | Wolfgang Einhaeuser | and | Laurent Itti |
|
Vision Science Society Annual Meeting | | | | Naples, FL | May |
|
Abstract: |
In order to optimize information utilization and prevent bottlenecking during visual processing, bottom-up information is
triaged by selectively gating image features as they are observed. Here we demonstrate for the first time a
biologically-plausible, information-theoretic model of the visual gating mechanism which works efficiently with natural
images. From this, we give a neurophysiological preview of what image information is passing to higher levels of processing.
We do this by processing information given in a natural image Rapid Serial Visual Presentation (RSVP) task by its
spatio-temporal statistical surprise (Einhaeuser, Mundhenk, Baldi, Koch and Itti, 2007). From this, we obtain an
attention-gate mask over each of the RSVP image frames derived from the map of attentional capture provided by the surprise
system. The mask itself accounts for the degree to which distracter images that precede or follow a target image are able to
take attention away from it and vice versa. Attention is also accounted for within an image, so that targets need to be
salient both across frames and within the target image in order to be detected. Additionally, stronger target capture leads
to better masking of rival information, decreasing later visual competition. The surprise-based attention-gate is validated
against the performance of eight observers. We find that 29 unique RSVP targets from 29 different sequences which are easy
to detect precisely overlap to a far greater extent with open regions in the attention gate compared with 29 unique targets
which are difficult to detect (P less than .001). This means that when a target is easy to detect, more target regions are
passing through the attention-gate, increasing the availability of relevant features to visual recognition facilities.
Additionally, this allows us to surmise what parts of any given image in an RSVP task can plausibly be detected, since
regions which are gated at this stage cannot be processed any further.
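The overlap measurement itself is simple; a sketch under assumed names and an assumed threshold is shown below, comparing the attention-gate map against the labeled target region to give the fraction of the target that passes the gate.

import numpy as np

def gate_overlap(attention_gate, target_mask, open_threshold=0.5):
    """attention_gate: (H, W) map in [0, 1] from the surprise system;
    target_mask: (H, W) boolean mask of the labeled target region.
    Returns the fraction of target pixels falling in 'open' gate regions."""
    open_region = attention_gate >= open_threshold
    return np.logical_and(open_region, target_mask).sum() / max(target_mask.sum(), 1)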
|
|
|
|
|
|
|
Automatic Computation of an Image's Statistical Surprise Predicts Performance of Human Observers on a Natural Image Detection Task - 2009 |
T. Nathan Mundhenk, | Wolfgang Einhaeuser | and | Laurent Itti |
|
Vision Research | vol 49 | issue 13 | pg 1620-1637 | | June |
|
Abstract: |
To understand the neural mechanisms underlying humans' exquisite ability
at processing briefly flashed visual scenes, we present a computer
model that predicts human performance in a Rapid Serial Visual
Presentation (RSVP) task. The model processes streams of natural scene
images presented at a rate of 20Hz to human observers, and attempts to
predict when subjects will correctly detect if one of the presented
images contains an animal (target). We find that metrics of Bayesian
surprise, which models both spatial and temporal aspects of human
attention, differ significantly between RSVP sequences on which subjects
will detect the target (easy) and those on which subjects miss the
target (hard). Extending beyond previous studies, we here assess the
contribution of individual image features including color opponencies
and Gabor edges. We also investigate the effects of the spatial location
of surprise in the visual field, rather than only using a single
aggregate measure. A physiologically plausible feed-forward system,
which optimally combines spatial and temporal surprise metrics for all
features, predicts performance in 79.5% of human trials correctly. This
is significantly better than a baseline maximum likelihood Bayesian
model (71.7%). We can see that attention, as measured by surprise,
accounts for a large proportion of observer performance in RSVP. The
time course of surprise in different feature types (channels) provides
additional quantitative insight into rapid bottom-up processes of human
visual attention and recognition, and illuminates the phenomenon of
attentional blink and lag-1 sparing. Surprise also reveals classical
Type-B like masking effects intrinsic in natural image RSVP sequences.
We summarize these with the discussion of a multistage model of visual
attention.
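One assumed, highly simplified reading of how spatial and temporal surprise might be combined into a single trial-level predictor is sketched below; the data layout, the max-over-space reduction, and the weights are placeholders and do not reproduce the feed-forward model evaluated in the paper.

import numpy as np

def detectability_score(surprise_maps, channel_weights, lag_weights):
    """surprise_maps: dict {channel: (num_lags, H, W)} of surprise maps for the
    frames around the target. Returns a scalar; higher values predict that the
    trial will be 'easy' (target detected)."""
    score = 0.0
    for channel, maps in surprise_maps.items():
        # Reduce over space with a per-lag peak, then combine lags and channels.
        spatial_peak = maps.reshape(maps.shape[0], -1).max(axis=1)
        score += channel_weights[channel] * float(spatial_peak @ lag_weights)
    return score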
|
|
|
PDF Version of this Document for Download. Mirror List: ilab.usc.edu
|
|
|
|
|
Computational modeling and utilization of attention, surprise and attention gating - 2009 |
|
PhD Thesis | | | | University of Southern California | Spring |
|
Abstract: |
What draws in human attention, and can we create computational models of it which work the same way? Here we explore this
question with several attentional models and applications of them. They are each designed to address a missing fundamental
function of attention from the original saliency model designed by Itti and Koch. These include temporal-based attention and
attention from non-classical feature interactions. Additionally, attention is utilized in an applied setting for the
purposes of video tracking. Attention for non-classical feature interactions is handled by a model called CINNIC. It
faithfully implements a model of contour integration in visual cortex. It is able to integrate illusory contours of
unconnected elements such that the contours "pop-out" as they are supposed to, and it matches in behavior the performance of
human observers. Temporal attention is discussed in the context of an implementation and extensions to a model of surprise.
We show that surprise predicts subject performance well on natural image Rapid Serial Visual Presentation (RSVP) tasks and
gives us a good idea of how an attention gate works in the human visual cortex. The attention gate derived from surprise
also gives us a good idea of how visual information is passed to further processing in later stages of the human brain. We
also discuss how to extend the model of surprise using a Metric of Attention Gating (MAG) as a baseline for model
performance. This allows us to find different model components and parameters which better explain the attentional blink in
RSVP.
|
|
|
PDF Version of this Document for Download. Mirror List: proquest.com
|
|
|
|
|
Natural Image RSVP task performance is predicted by measurements of bottom-up Bayesian Surprise exhibited by image sequences - 2008 |
T. Nathan Mundhenk, | Wolfgang Einhaeuser | and | Laurent Itti |
|
Vision Science Society Annual Meeting | | | | Naples, FL | May |
|
Abstract: |
The performance of observers on a Rapid Serial Visual Presentation (RSVP) task is causally linked with the amount of
bottom-up Bayesian Surprise (buBS) exhibited by both target and distracter images in RSVP sequences. In this paradigm,
observers watched a sequence of 20 images at 20Hz. One of the images in the sequence might contain a picture of an animal
target at chance. Subjects had to respond as to whether or not they spotted the target. Observers' performance was compared
with the amount of buBS exhibited by the images in the sequence. The buBS information metric defined by (Itti and Baldi
2005; Itti and Baldi 2006) gives a measure of the amount of information gain both within an image (between image locations)
and between images. Using the coarse statistics of buBS, we were able to alter the performance of observers on an RSVP task
by changing the order of images within a sequence. Placing images of high surprise both before and after the target image
impairs the ability of observers to recall the target (Einhaeuser, Mundhenk et al. 2007). Here we show that coarse
statistics for buBS in both color and Gabor orientations are significantly different between RSVP sequences observers find
easy (subjects tend to spot the target correctly) compared with ones that observers find difficult. In particular, coarse
statistics for mean buBS are elevated in the flanking images before and after the target in difficult RSVP sequences.
Further, buBS is significantly different in some features, such as vertical lines, as much as 250ms before the target image,
with a relaxed period 100ms before the target. This lends support to the two-stage model of visual processing (Chun and
Potter 1995). Additionally, we can use the buBS statistics to inform us of the amount of bottom-up attention capture
intrinsic in images in RSVP sequences.
|
|
|
|
|
|
|
A bottom-up model of spatial attention predicts human error patterns in rapid scene recognition - 2007 |
Wolfgang Einhaeuser, | T. Nathan Mundhenk, | Pierre Baldi, | Christof Koch | and | Laurent Itti |
|
Journal of Vision | vol 7 | issue 10 | pg 1-13 | | July |
|
Abstract: |
Humans demonstrate a peculiar ability to detect complex targets in rapidly presented natural scenes. Recent studies
suggest that (nearly) no focal attention is required for overall performance in such tasks. Little is known, however, of how
detection performance varies from trial to trial and which stages in the processing hierarchy limit performance: bottom-up
visual processing (attentional selection and/or recognition) or top-down factors (e.g., decision-making, memory, or alertness
fluctuations)? To investigate the relative contribution of these factors, eight human observers performed an animal detection
task in natural scenes presented at 20 Hz. Trial-by-trial performance was highly consistent across observers, far exceeding
the prediction of independent errors. This consistency demonstrates that performance is not primarily limited by
idiosyncratic factors but by visual processing. Two statistical stimulus properties, contrast variation in the target image
and the information-theoretical measure of surprise in adjacent images, predict performance on a trial-by-trial basis.
These measures are tightly related to spatial attention, demonstrating that spatial attention and rapid target detection share
common mechanisms. To isolate the causal contribution of the surprise measure, eight additional observers performed the
animal detection task in sequences that were reordered versions of those all subjects had correctly recognized in the first
experiment. Reordering increased surprise before and/or after the target while keeping the target and distractors
themselves unchanged. Surprise enhancement impaired target detection in all observers. Consequently, and contrary to
several previously published findings, our results demonstrate that attentional limitations, rather than target recognition
alone, affect the detection of targets in rapidly presented visual sequences.
|
|
|
PDF Version of this Document for Download. Mirror List: ilab.usc.edu
|
|
|
|
|
Surprise bottom-up reduction and control in images and videos - 2006 |
T. Nathan Mundhenk | and | Laurent Itti |
|
13th Joint Symposium on Neural Computation | | | | La Jolla, CA | May |
|
Abstract: |
We have developed a method to reduce the amount of bottom-up driven surprise in images and videos. The primary function of
this method is to remove more strictly bottom-up information, which may be distracting to an individual engaged in a
top-down driven search task. Additionally, we can use such a tool to help image analysts find items in an image by giving
them an image which may be easier to focus top-down attention onto. Our method works by first computing surprise in an image
using the iLab Neuromorphic Vision Toolkit’s surprise computation software. Using this, we can determine in a biologically
driven manner where a person's attention is most likely to focus, based on the interaction of image features and the
computation of the amount of surprising information one should expect. This gives us conspicuity maps that not only tell us
where people are more likely to focus their attention, but also which features are most responsible for attracting it. Using
this information, we can focus such instruments as band-pass filters directly on the regions and features in an image which
give the strongest surprise readings. Thus, we leave parts of an image which are unsurprising generally untouched.
Additionally, we focus filters on channels related to surprise output. For instance, if surprise is generated by color
contrast, then surprise reduction will be focused on color channels. Testing on video clips shows that a general reduction
of about 40% surprise is achieved while maintaining reasonable semantic content of the clip. Additionally, surprise
reduction across feature types and scales is statistically significant. Thus, we can maintain a reasonable amount of quality
in an image or video while removing large amounts of bottom-up surprise.
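The channel-targeted attenuation can be sketched in a few lines, shown below under stated assumptions: each feature channel is blended toward a smoothed copy of itself in proportion to its own normalized surprise map. The actual system uses the iLab toolkit's surprise computation and band-pass machinery; the uniform filter and blending weights here are illustrative only.

import numpy as np
from scipy.ndimage import uniform_filter

def reduce_surprise(channels, surprise_maps, strength=0.8, size=9):
    """channels, surprise_maps: dicts of (H, W) float arrays keyed by feature name
    (e.g., 'color', 'intensity', 'orientation'). Returns attenuated channels."""
    out = {}
    for name, img in channels.items():
        s = surprise_maps[name]
        # Blend weight is 0 where the channel is unsurprising, up to `strength`
        # where that channel generated the strongest surprise readings.
        w = strength * (s - s.min()) / (s.max() - s.min() + 1e-9)
        out[name] = (1.0 - w) * img + w * uniform_filter(img, size=size)
    return out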
|
|
|
|
|
|
|
Computational modeling and exploration of contour integration for visual saliency - 2005 |
T. Nathan Mundhenk | and | Laurent Itti |
|
Biological Cybernetics | vol 93 | issue 3 | | | September |
|
Abstract: |
We propose a computational model of contour integration for visual saliency. The model uses biologically plausible devices
to simulate how the representations of elements aligned collinearly along a contour in an image are enhanced. Our model
adds such devices as a dopamine-like fast plasticity, local GABAergic inhibition and multi-scale processing of images.
The fast plasticity addresses the problem of how neurons in visual cortex seem to be able to influence neurons they are
not directly connected to, for instance as observed in the contour closure effect. Local GABAergic inhibition is used to control
gain in the system without using global mechanisms, which may be non-plausible given the limited reach of axonal arbors in
visual cortex. The model is then used to explore not only its validity in real and artificial images, but to discover some
of the mechanisms involved in processing of complex visual features such as junctions and end-stops as well as contours.
We present evidence for the validity of our model in several phases, starting with local enhancement of only a few collinear
elements. We then test our model on more complex contour integration images with a large number of Gabor elements. Sections
of the model are also extracted and used to discover how the model might relate contour integration neurons to neurons that
process end-stops and junctions. Finally, we present results from real world images. Results from the model suggest that
it is a good current approximation of contour integration in human vision. It also suggests that contour integration
mechanisms may be strongly related to mechanisms for detecting end-stops and junction points. Additionally, a contour
integration mechanism may be involved in finding features for objects such as faces. This suggests that visual cortex
may be more information efficient and that neural regions may have multiple roles.
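For readers who want a concrete picture of the "butterfly" connection pattern, the sketch below builds an interaction kernel in which collinear neighbors excite and parallel flankers inhibit a unit with a given preferred orientation. The exponents, spatial envelope, and constants are illustrative assumptions; the paper's actual kernel, fast plasticity, and GABAergic gain control are not reproduced here.

import numpy as np

def butterfly_kernel(pref_theta, size=21, sigma=6.0, excit=1.0, inhib=0.6):
    """Weight from a neighbouring unit sharing preferred orientation `pref_theta`
    (radians), as a function of its spatial offset in a size x size window."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    r = np.hypot(xs, ys) + 1e-9
    # Angle between the preferred orientation and the line joining the two units,
    # folded into [0, pi/2].
    offset_angle = np.arctan2(ys, xs)
    rel = np.abs(((offset_angle - pref_theta) + np.pi / 2) % np.pi - np.pi / 2)
    envelope = np.exp(-(r ** 2) / (2 * sigma ** 2))
    collinear = np.cos(rel) ** 8      # peaked along the orientation axis (excite)
    parallel = np.sin(rel) ** 8       # peaked on the flanks (inhibit)
    k = envelope * (excit * collinear - inhib * parallel)
    k[half, half] = 0.0               # no self-connection
    return k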
|
|
|
|
|
|
|
Distributed biologically based real time tracking in the absence of prior target information - 2005 |
T. Nathan Mundhenk, | Jacob Everist, | Chris Landauer, | Laurent Itti | and | Kirstie Bellman |
|
Proc. SPIE Conference on Intelligent Robots and Computer Vision XXIII: Algorithms, Techniques, and Active Vision | vol 6006 | | pg 330-341 | Boston, Ma | October |
|
Abstract: |
We are developing a distributed system for the tracking of people and objects in complex scenes and environments using
biologically based algorithms. An important component of such a system is its ability to track targets from multiple cameras
at multiple viewpoints. As such, our system must be able to extract and analyze the features of targets in a manner that is
sufficiently invariant of viewpoint, so that the cameras can share information about targets for purposes such as tracking.
Since biological organisms are able to describe targets to one another from very different visual perspectives, it is hoped
that by discovering the mechanisms by which they understand objects, such abilities can be imparted to a system of
distributed agents with many camera viewpoints. Our current methodology draws from work on saliency and center surround
competition among visual components that allows for real time location of targets without the need for prior information
about the targets' visual features. For instance, gestalt principles of color opponencies, continuity and motion form a
basis to locate targets in a logical manner. From this, targets can be located and tracked relatively reliably for short
periods. Features can then be extracted from salient targets, allowing for a signature to be stored which describes the
basic visual features of a target. This signature can then be used to share target information with other cameras, at other
viewpoints, or may be used to create the prior information needed for other types of trackers. Here we discuss such a
system, which, without the need for prior target feature information, extracts salient features from a scene, binds them and
uses the bound features as a set for understanding trackable objects.
|
|
|
|
|
|
|
Biologically inspired feature based categorization of objects - 2004 |
T. Nathan Mundhenk, | Vidhya Navalpakkam, | Hendrik Makaliwe, | Shrihari Vasudevan | and | Laurent Itti |
|
Proc. SPIE Human Vision and Electronic Imaging IX | vol 5292 | | pg 330-341 | San Jose, California | January |
|
Abstract: |
We have developed a method for clustering features into objects by taking those features which include intensity,
orientations and colors from the most salient points in an image as determined by our biologically motivated
saliency program. We can train a program to cluster these features by only supplying as training input the number of
objects that should appear in an image. We do this with a clustering technique that links nodes in a
minimum spanning tree by not only distance, but by a density metric as well. We can then form classes over objects
or object segmentation in a novel validation set by training over a set of seven soft and hard parameters. We also discuss
the uses of such a flexible method in landmark-based navigation, since a robot using such a method may have
a better ability to generalize over the features and objects.
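An assumed reconstruction of that clustering step is sketched below: feature points are linked with a minimum spanning tree, and edges that are long relative to a local density estimate are cut, leaving connected components as object clusters. This is not the authors' code; the k-nearest-neighbor density proxy and the scale factor are illustrative choices.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def mst_density_clusters(points, k=5, scale=2.0):
    """points: (N, d) feature vectors from salient image locations.
    Returns an array of cluster labels, one per point."""
    d = squareform(pdist(points))
    # Local density proxy: mean distance to the k nearest neighbours.
    knn = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    mst = minimum_spanning_tree(csr_matrix(d)).tocoo()
    keep_rows, keep_cols = [], []
    for i, j, w in zip(mst.row, mst.col, mst.data):
        # Keep an edge only if it is short relative to the density at both ends.
        if w <= scale * min(knn[i], knn[j]):
            keep_rows.append(i)
            keep_cols.append(j)
    kept = csr_matrix((np.ones(len(keep_rows)), (keep_rows, keep_cols)),
                      shape=d.shape)
    _, labels = connected_components(kept, directed=False)
    return labels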
|
|
|
|
|
|
|
Teaching the computer subjective notions of feature connectedness in a visual scene for real time vision - 2004 |
T. Nathan Mundhenk, | Chris Landauer, | Kirstie Bellman, | Michael A. Arbib | and | Laurent Itti |
|
Proc. SPIE Conference on Intelligent Robots and Computer Vision XXII: Algorithms, Techniques, and Active Vision | vol 5608 | | pg 136-147 | Philadelphia, PA | October |
|
Abstract: |
We discuss a toolkit for use in scene understanding where prior information about targets is not necessarily
available. As such, we give it a notion of connectivity such that it can classify features in an image for the purpose
of tracking and identification. The tool, VFAT (Visual Feature Analysis Tool), is designed to work in real time in an
intelligent multi-agent room. It is built around a modular design and includes several fast vision processes. The first
components discussed are for feature selection using visual saliency and Monte Carlo selection. Then features that
have been selected from an image are mixed into useful and more complex features. All the features are then
reduced in dimension and contrasted using a combination of Independent Component Analysis and Principal
Component Analysis (ICA/PCA). Once this has been done, we classify features using a custom non-parametric
classifier (NPclassify) that does not require hard parameters such as class size or number of classes, so that VFAT
can create classes without stringent priors about class structure. These classes are then generalized using Gaussian
regions, which allows easier storage of class properties and computation of probability for class matching. To speed
up the creation of Gaussian regions, we use a system of rotations instead of the traditional pseudo-inverse method. In
addition to discussing the structure of VFAT, we discuss training of the current system, which is relatively easy to
perform. ICA/PCA is trained by giving VFAT a large number of random images. The ICA/PCA matrix is computed
from features extracted by VFAT. The non-parametric classifier NPclassify is trained by presenting it with images of
objects and having it decide how many objects it thinks it sees. The difference between what it sees and what it is
supposed to see in terms of the number of objects is used as the error term and allows VFAT to learn to classify
based upon the experimenter's subjective idea of good classification.
|
|
Key Words: | iRoom, Biological, Vision, Tool, Multi Agent, Saliency, Real Time |
|
|
|
|
|
Camera Localization methods for Intelligent Room Systems using RF Techniques - 2004 |
Pradeep Natarajan, | T. Nathan Mundhenk, | Kirstie Bellman, | Michael A. Arbib | and | Laurent Itti |
|
Proc. SPIE Conference on Intelligent Robots and Computer Vision XXII: Algorithms, Techniques, and Active Vision | vol 5608 | | pg 177-187 | Philadelphia, PA | October |
|
Abstract: |
One of the important components of a multi-sensor “intelligent” room, which can observe, track and react to its occupants,
is a multi-camera system. This system involves the development of algorithms that enable a set of cameras to communicate and
cooperate with each other effectively so that they can monitor the events happening in the room. To achieve this, the
cameras typically must first build a map of their relative locations. In this paper, we discuss RF and vision based
techniques for estimating distances between cameras. The algorithm proposed for RF can estimate distances with relatively
good accuracy even in the presence of random noise. We also describe a vision-based algorithm for localization using
stereovision techniques. This algorithm can compute the location of the camera given the location of a calibration object
and vice versa.
|
|
|
|
|
|
|
Schizophrenia and the Mirror Neuron System - 2004 |
Michael A. Arbib | and | T. Nathan Mundhenk |
|
Neuropsychologia | vol 43 | | pg 268-280 | | October |
|
Abstract: |
We analyze how data on the mirror system for grasping in macaque and human ground the mirror system hypothesis for the
evolution of the language-ready human brain, and then focus on this putative relation between hand movements and speech to
contribute to the understanding of how it may be that a schizophrenic patient generates an action (whether manual or verbal)
but does not attribute the generation of that action to himself. We make a crucial distinction between self-monitoring and
attribution of agency. We suggest that verbal hallucinations occur when an utterance progresses through verbal creation
pathways and returns as a vocalization observed, only to be dismissed as external since no record of its being created has
been kept. On this theory, schizophrenic patients then confabulate the agent.
|
|
Key Words: | FARS model, Grasping, Mirror system, Schizophrenia, Agency |
|
|
|
|
|
Contour-facilitation in a model of bottom-up attention - 2003 |
Rob J. Peters, | T. Nathan Mundhenk, | Laurent Itti | and | Christof Koch |
|
Proc. Society for Neuroscience Annual Meeting (SFN 03) | | | | | November |
|
Abstract: |
Previously we showed that interactions among overlapping orientation-tuned units could improve a bottom-up attention
model in predicting human eye movement targets. We have now extended this work to address the question of how elongated
contours affect saliency in natural scenes. We used a model of contour-facilitation based on putative long-range excitatory
and inhibitory interactions among orientation-tuned units in early visual cortex. Each unit tends to excite other units that
are nearly collinear, and inhibit those that are nearly parallel. We tested the model on artificial images such as arrays
of Gabor patches with embedded implicit contours ('snakes'), as well as natural images such as outdoor photos and overhead
satellite photos. Our results agree with previous psychophysical measurements of human observers' sensitivity to implicit
contours such as Gabor snakes; we found that a basic bottom-up saliency model was completely blind to such contours,
while an enhanced saliency model with a contour-facilitation module could consistently identify the embedded contour
(left figure) as the most salient element in the image (right figure). Preliminary eyetracking results suggest that observers
are less sensitive to high spatial-frequency contours in natural scenes.
|
|
Key Words: | Computational Modeling, Human Psychophysics, Model of Bottom-Up Saliency-Based Visual Attention, Human Eye-Tracking Research |
|
|
|
|
|
Low-cost high-performance mobile robot design utilizing off-the-shelf parts and the Beowulf concept: the Beobot project - 2003 |
T. Nathan Mundhenk, | Chris Ackerman, | Daesu Chung, | Nitin Dhavale, | Brian Hudson, | Reid Hirata, | Eric Pichon, | Zhan Shi, | April Tsui | and | Laurent Itti |
|
Proc. SPIE Conference on Intelligent Robots and Computer Vision XXI | vol 5267 | | pg 293-303 | Providence, RI | October |
|
Abstract: |
Utilizing off-the-shelf, low-cost parts, we have constructed a robot that is small, light, powerful and relatively
inexpensive (less than $3900). The system is constructed around the Beowulf concept of linking multiple discrete
computing units into a single cooperative system. The goal of this project is to demonstrate a new robotics platform
with sufficient computing resources to run biologically-inspired vision algorithms in real-time. This is accomplished
by connecting two dual-CPU embedded PC motherboards using fast gigabit Ethernet. The motherboards contain
integrated Firewire, USB and serial connections to handle camera, servomotor, GPS and other miscellaneous
inputs/outputs. Computing systems are mounted on a servomechanism-controlled off-the-shelf “Off Road” RC car.
Using the high performance characteristics of the car, the robot can attain relatively high speeds outdoors. The robot
is used as a test platform for biologically-inspired as well as traditional robotic algorithms, in outdoor navigation and
exploration activities. Leader following using multi blob tracking and segmentation, and navigation using statistical
information and decision inference from image spectral information are discussed. The design of the robot is open-source
and is constructed in a manner that enhances ease of replication. This is done to facilitate construction and
development of mobile robots at research institutions where large financial resources may not be readily available as
well as to put robots into the hands of hobbyists and help lead to the next stage in the evolution of robotics, a home
hobby robot with potential real world applications.
|
|
Key Words: | Beowulf, Robot, Vision, Biology, Low Cost, Modular, Off-the-shelf |
|
|
|
|
|
Utilization and viability of biologically-inspired algorithms in a dynamic multi-agent camera surveillance system - 2003 |
T. Nathan Mundhenk, | Nitin Dhavale, | Salvador Marmol, | Elizabeth Callega, | Vidhya Navalpakkam, | Kirstie Bellman, | Chris Landauer, | Michael A. Arbib | and | Laurent Itti |
|
Proc. SPIE Conference on Intelligent Robots and Computer Vision XXI | vol 5267 | | pg 281-292 | Providence, RI | October |
|
Abstract: |
In view of the growing complexity of computational tasks and their design, we propose that certain interactive
systems may be better designed by utilizing computational strategies based on the study of the human brain.
Compared with current engineering paradigms, brain theory offers the promise of improved self-organization and
adaptation to the current environment, freeing the programmer from having to address those issues in a procedural
manner when designing and implementing large-scale complex systems. To advance this hypothesis, we discuss a
multi-agent surveillance system in which 12 agent CPUs, each with its own camera, compete and cooperate to monitor
a large room. To cope with the overload of image data streaming from 12 cameras, we take inspiration from the
primate’s visual system, which allows the animal to perform a real-time selection of the few most conspicuous
locations in visual input. This is accomplished by having each camera agent utilize the bottom-up, saliency-based
visual attention algorithm of Itti and Koch (Vision Research 2000;40(10-12):1489-1506) to scan the scene for
objects of interest. Real time operation is achieved using a distributed version that runs on a 16-CPU Beowulf
cluster composed of the agent computers. The algorithm guides cameras to track and monitor salient objects based
on maps of color, orientation, intensity, and motion. To spread camera view points or create cooperation in
monitoring highly salient targets, camera agents bias each other by increasing or decreasing the weight of different
feature vectors in other cameras, using mechanisms similar to excitation and suppression that have been documented
in electrophysiology, psychophysics and imaging studies of low-level visual processing. In addition, if cameras need
to compete for computing resources, allocation of computational time is weighed based upon the history of each
camera. A camera agent that has a history of seeing more salient targets is more likely to obtain computational
resources. The system demonstrates the viability of biologically inspired systems in real-time tracking. In future
work we plan on implementing additional biological mechanisms for cooperative management of both the sensor
and processing resources in this system that include top-down biasing for target specificity as well as novelty and the
activity of the tracked object in relation to sensitive features of the environment.
|
|
|
|
|
|
|
A new computational algorithm for the modeling of early visual contour integration in humans - 2003 |
T. Nathan Mundhenk | and | Laurent Itti |
|
Neurocomputing | vol 52-54 | | pg 599-604 | | June |
|
Abstract: |
In order to gain a better understanding of visual saliency, we have developed
an algorithm which simulates the phenomenon of contour integration for the
purpose of visual saliency. The model developed consists of the classical
butterfly pattern of connection between orientation-selective neurons in the
primary visual cortex. In addition, we also add a local group suppression gain
control to eliminate extraneous noise and a fast plasticity term which helps to
account for the closure effect often observed in humans exposed to closed contour
maps. Results from real world images suggest that our algorithm is effective at
picking out reasonable contours from a scene. The results improved with the
introduction of both the fast plasticity and group suppression. The addition of
multi-scale analysis also increased the effectiveness.
|
|
Key Words: | contour, integration, visual, saliency, model |
|
|
|
|
|
Towards a simpler model of contour integration in early visual processing using a composite of methods - 2002 |
T. Nathan Mundhenk | and | Laurent Itti |
|
Proc. 9th Joint Symposium on Neural Computation (JSNC'02) | | | | Pasadena, California | May |
|
Abstract: |
iLab has been attempting to simulate contour integration in early visual
preprocessing. Our model starts with a standard butterfly pattern of neural connections
that excite or suppress neighboring neurons depending on their preferred visual
orientation, as used for instance by Li (1998). This creates systems where neurons tend to
excite other neurons with a collinear orientation, but tend to suppress neurons with a
parallel orientation.
Our current model attempts to distance itself from many current models that use
either neural synchronization or cascade effects to obtain good contour detection. Instead,
we have concentrated on a simpler composite model that uses group suppression gain
control, multi-scale image analysis and fast plasticity. In this, group suppression works
by summing the excitation for small groups of neurons. If the group exceeds a threshold,
suppression among the group’s neurons is increased proportionately. Fast plasticity
works by increasing the excitatory ability of a neuron if it has been excited by
neighboring neurons to a large enough extent. Finally, multi-scale processing works by
taking the results of processing the same image at multiple scales on the same neural
kernel model at each scale.
Experiments on real world images show that contours are most noticeably
improved by the use of group suppression gain control, while tests on computer-generated
contours provided by Jochen Braun that are of varying size, phase and alignment show the most
improvement from the use of fast plasticity and multi-scale processing. Our results
so far suggest that all three additions are viable and helpful. Further, our model
suggests that simpler mechanisms can be used by the brain in the act of early visual
contour integration.
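The two added mechanisms can be sketched independently of the full kernel model, as below; the update rules and constants are assumptions for illustration rather than the published equations: group suppression raises inhibition wherever summed local excitation crosses a threshold, and fast plasticity temporarily boosts the weights of units that are already strongly excited.

import numpy as np
from scipy.ndimage import uniform_filter

def group_suppression(excitation, group_size=8, threshold=1.0, gain=0.5):
    """Return a per-unit inhibition map that grows where the summed excitation
    of the local group exceeds the threshold."""
    group_sum = uniform_filter(excitation, size=group_size) * group_size ** 2
    overshoot = np.clip(group_sum - threshold, 0.0, None)
    return gain * overshoot / (group_size ** 2)

def fast_plasticity(weights, excitation, rate=0.1, ceiling=2.0):
    """Temporarily scale the connection weights of strongly excited units
    upward, capped at a multiple of their baseline value."""
    boost = 1.0 + rate * np.clip(excitation, 0.0, None)
    return np.minimum(weights * boost, ceiling * weights)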
|
|
|
|
|
|
|
Towards Visually-Guided Neuromorphic Robots: Beobots - 2002 |
Jen Ng, | Reid Hirata, | T. Nathan Mundhenk, | Eric Pichon, | April Tsui, | Tong Ventrice, | Philip Williams | and | Laurent Itti |
|
Proc. 9th Joint Symposium on Neural Computation (JSNC'02) | | | | Pasadena, California | May |
|
Abstract: |
Despite the advancements made in the field of AI and Robotics, robots today
remain vastly inferior to animals in terms of mental agility. The main reason for this is
that robots do not possess the neural capabilities of an animal brain. Neural algorithms
adapt well to diverse environments, whereas robot AI is usually limited to a test lab
setting. To resolve this disparity, an intuitive solution would be to try to emulate the
neural functions present in animal brains. However, neural algorithms require vast
amounts of computational power to process, in particular those algorithms that require
real-time vision. Many robots, which run on power-saving embedded processors, do not
have a lot of CPU cycles to spare.
We are developing a high-performance visually-guided robotics platform with
enough processing speed to run neural algorithms. This "Beobot" platform consists of a
high-performance radio-controlled truck chassis (the "robot") carrying an x86-based
supercomputer (the "Beowulf" cluster). The computing cluster consists of two compact
dual-CPU motherboards linked together by a gigabit Ethernet connection. Powering the
computer are four Pentium-III (Coppermine) 1Ghz processors along with 768MB of
memory per motherboard. Two Firewire cameras provide the Beobot's vision. A compact
flash card is used as a makeshift hard drive, and it has enough space to store a thin
UNIX-variant kernel and iLab's vision software.
The vision software itself consists of several general-purpose neural algorithms.
Most prominent of these is iLab's Saliency-based visual attention system, which enables
the Beobot to drive its attention towards the most salient locations and objects in a visual
scene. In addition, we have developed prototype algorithms that allow the Beobot to
parse scene layouts and perform object recognition. A primitive action/memory AI
system allows it to implement simple visually-guided behavior. Finally, the component-oriented
nature of the vision software enables future additions of neural modules.
The potential advantage of the Beobot comes from its use of x86-based hardware
and UNIX-based C++ development environment. Nearly all the parts of the Beobot are
inexpensive, off-the-shelf components. This enables easy replacement of broken parts.
Furthermore, the expandability of PC hardware enables devices to be plugged into the
Beobot for additional functionalities. All these traits combined make the Beobot
potentially easy to replicate, and this allows for wider adoption upon the successful
completion of the prototype.
|
|
|
|
|
|
|
A New Robotics Platform for Neuromorphic Vision: Beobots - 2002 |
Daesu Chung, | Reid Hirata, | T. Nathan Mundhenk, | Jen Ng, | Rob J. Peters, | Eric Pichon, | April Tsui, | Tong Ventrice, | Dirk Walther, | Philip Williams | and | Laurent Itti |
|
Proc: 2nd International Workshop on Biologically Motivated Computer Vision (BMCV 02) | | | pg 558-567 | Tubingen, Germany | November |
|
|
|
|
|
|
|
A Model of Contour Integration in Early Visual Cortex - 2002 |
T. Nathan Mundhenk | and | Laurent Itti |
|
Proc: 2nd International Workshop on Biologically Motivated Computer Vision (BMCV 02) | | | pg 80-90 | Tubingen, Germany | November |
|
Abstract: |
We have created an algorithm to integrate contour elements and find
their salience value. The algorithm consists of basic long-range
orientation-specific neural connections as well as a novel group suppression
gain control and a fast plasticity term to explain interaction beyond a neuron's
normal size range. Integration is executed as a series of convolutions on 12
orientation-filtered images augmented by the nonlinear fast plasticity and group
suppression terms. Testing done on a large number of artificially generated
Gabor element contour images shows that the algorithm is effective at finding
contour elements within parameters similar to those of human subjects. Testing
on real world images yields reasonable results and shows that the algorithm has
strong potential for use as an addition to our already existing visual saliency
algorithm.
|
|
|
|
|
|
|
CINNIC, a new computational algorithm for the modeling of early visual contour integration in humans - 2002 |
T. Nathan Mundhenk | and | Laurent Itti |
|
Proc. 11th Annual Computational Neuroscience Meeting (CNS 02) | | | | Chicago, Il | July |
|
Abstract: |
In order to gain a better understanding of visual saliency, we have developed
an algorithm which simulates the phenomenon of contour integration for the
purpose of visual saliency. The model developed consists of the classical
butterfly pattern of connection between orientation-selective neurons in the
primary visual cortex. In addition, we also add a local group suppression gain
control to eliminate extraneous noise and a fast plasticity term which helps to
account for the closure effect often observed in humans exposed to closed contour
maps. Results from real world images suggest that our algorithm is effective at
picking out reasonable contours from a scene. The results improved with the
introduction of both the fast plasticity and group suppression. The addition of
multi-scale analysis also increased the effectiveness.
|
|
Key Words: | contour, integration, visual, saliency, model |
|
|
|
|
|
Techniques for Fisheye Lens Calibration Using a Minimal Number of Measurements - 2000 |
T. Nathan Mundhenk, | Michael J. Rivett, | Xiaoqun Liao | and | Ernest L. Hall |
|
Proc. SPIE Conference on Intelligent Robots and Computer Vision XIX | vol 4197 | | pg 81-88 | Boston, Ma | November |
|
Abstract: |
A method is discussed describing how different types of Omni-Directional “fisheye” lenses can be
calibrated for use in robotic vision. The technique discussed will allow for full calibration and correction of
x,y pixel coordinates while only taking two uncalibrated and one calibrated measurement. These are done
by finding the observed x,y coordinates of a calibration target. Any Fisheye lens that has a roughly
spherical shape can have its distortion corrected with this technique. Two measurements are taken to
discover the edges and centroid of the lens. These can be done automatically by the computer and does not
require any knowledge about the lens or the location of the calibration target. A third measurement is then
taken to discover the degree of spherical distortion, This is done by comparing the expected measurement
to the measurement obtained and then plotting a curve that describes the degree of distortion. Once the
degree of distortion is known and a simple curve has been fitted to the distortion shape, the equation of that
distortion and the simple dimensions of the lens are plugged into an equation that remains the same for all
types of lenses. The technique has the advantage of needing only one calibrated measurement to discover
the type of lens being used.
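A hedged sketch of the correction step is shown below: given the lens centre, the lens radius, and a curve fitted from the calibration-target measurement, each observed pixel is moved along its ray from the centre according to the fitted radial mapping. The polynomial fit stands in for the paper's fitted curve and is an assumption, as are the function names.

import numpy as np

def fit_radial_curve(observed_r, true_r, degree=3):
    """Fit a curve mapping observed (distorted) radius to corrected radius,
    both normalized by the lens radius, from calibration-target measurements."""
    return np.polyfit(observed_r, true_r, degree)

def correct_point(x, y, centre, lens_radius, curve):
    """Map an observed (x, y) pixel to its distortion-corrected location."""
    dx, dy = x - centre[0], y - centre[1]
    r_obs = np.hypot(dx, dy) / lens_radius
    if r_obs == 0:
        return centre
    r_true = np.polyval(curve, r_obs)
    s = r_true / r_obs
    return centre[0] + dx * s, centre[1] + dy * s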
|
|
Key Words: | omni, vision, fisheye, circular, regression, correction, distortion, nikon |
|
|
|
|
|
Simple Obstacle Detection to Prevent Miscalculation of Line Location and Orientation in Line Following Using Statistically Calculated Values - 2000 |
T. Nathan Mundhenk, | Michael J. Rivett | and | Ernest L. Hall |
|
Proc. SPIE Conference on Intelligent Robots and Computer Vision XIX | vol 4197 | | pg 181-190 | Boston, Ma | November |
|
Abstract: |
Visual line following in mobile robotics can be made more complex when objects are placed on or around
the line being followed. An algorithm is presented that suggests a manner in which a good line track can be
discriminated from a bad line track using the expected size of the line. The mobile robot in this case can
determine the width of the line. It calculates a mean width for the line as it moves and maintains a
fixed number of samples, which enables it to adapt to changing conditions. If a measurement is taken that falls
outside of what is to be expected by the robot, then it treats the measurement as undependable and as such
can take measures to deal with what it believes to be erroneous data. Techniques for dealing with erroneous
data include attempting to look around the obstacle or making an educated guess as to where the line
should be. The system discussed has the advantage of not needing to add any extra equipment to discover if
an obstacle is corrupting its measurements. Instead, the robot is able to determine if data is good or bad
based upon what it expects to find.
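The running statistical test is easy to sketch; the version below, with assumed names and thresholds, keeps a fixed-size window of recent line-width measurements and rejects a new measurement that falls too many standard deviations from the running mean, updating the window only with trusted samples.

from collections import deque
import statistics

class LineWidthMonitor:
    def __init__(self, window=50, num_std=3.0):
        self.samples = deque(maxlen=window)   # fixed sample count -> adapts over time
        self.num_std = num_std

    def is_reliable(self, width):
        """Return True if the measured line width looks trustworthy."""
        if len(self.samples) < 10:            # not enough history yet
            self.samples.append(width)
            return True
        mean = statistics.fmean(self.samples)
        std = statistics.pstdev(self.samples)
        ok = abs(width - mean) <= self.num_std * max(std, 1e-6)
        if ok:                                # only trusted samples update the model
            self.samples.append(width)
        return ok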
|
|
Key Words: | line, following, obstacle, detection, robot, mobile |
|
|
|
|
|
Intelligent Robot Trends and Predictions for the New Millennium - 1999 |
Ernest L. Hall | and | T. Nathan Mundhenk |
|
Proc. SPIE Conference on Intelligent Robots and Computer Vision XVIII | | | pg 14-25 | Boston, Ma | September |
|
Abstract: |
An intelligent robot is a remarkably useful combination of a
manipulator, sensors and controls. The current use of these
machines in outer space, medicine, hazardous materials, defense
applications and industry is being pursued with vigor but
little funding. In factory automation such robotic machines can improve
productivity, increase product quality and improve
competitiveness. The computer and the robot have both been developed
during recent times. The intelligent robot combines
both technologies and requires a thorough understanding and knowledge of
mechatronics. In honor of the new millennium,
this paper will present a discussion of futuristic trends and
predictions. However, in keeping with technical tradition, a new
technique for “Follow the Leader” will also be presented in the hope of
it becoming a new, useful and non-obvious technique.
Today’s robotic machines are faster, cheaper, more repeatable, more
reliable and safer. The knowledge base of inverse
kinematic and dynamic solutions and intelligent controls is increasing.
More attention is being given by industry to robots,
vision and motion controls. New areas of usage are emerging for service
robots, remote manipulators and automated guided
vehicles. Economically, the robotics industry now has more than a
billion-dollar market in the U.S. and is growing.
Feasibility studies show decreasing costs for robots and unaudited
healthy rates of return for a variety of robotic applications.
However, the road from inspiration to successful application can be long
and difficult, often taking decades to achieve a new
product. A greater emphasis on mechatronics is needed in our
universities. Certainly, more cooperation between
government, industry and universities is needed to speed the development
of intelligent robots that will benefit industry and
society.
|
|
|
|
|
|
|
Range Detection for AGV Using a Rotating Sonar Sensor - 1998 |
Dhyana Chandra Ramamurthy, | Wen-chuan Chiang, | T. Nathan Mundhenk | and | Ernest L. Hall |
|
Proc. SPIE Conference on Intelligent Robots and Computer Vision XVII | vol 3522 | | pg 435-443 | Boston, Ma | November |
|
Abstract: |
A single rotating sonar element is used with a restricted angle of sweep to obtain readings to develop a
range map for the unobstructed path of an autonomous guided vehicle (AGV). A Polaroid ultrasound
transducer element is mounted on a micromotor with an encoder feedback. The motion of this motor is
controlled using a Galil DMC 1000 motion control board. The encoder is interfaced with the DMC 1000
board using an intermediate IMC 1100 break-out board. By adjusting the parameters of the Polaroid
element, it is possible to obtain range readings at known angles with respect to the center of the robot. The
readings are mapped to obtain a range map of the unobstructed path in front of the robot. The idea can be
extended to a 360 degree mapping by changing the assembly level programming on the Galil Motion
control board. Such a system would be compact and reliable over a range of environments and AGV
applications.
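As a small illustration of the mapping step only (motor and encoder control through the Galil board are not shown), the sketch below converts the swept readings, one range per known beam angle, into obstacle points in the robot's frame; the maximum-range convention for free space is an assumption.

import math

def range_map(readings, max_range=5.0):
    """readings: list of (angle_deg, range_m) pairs measured from the robot centre.
    Returns (x, y) obstacle points; readings at max_range are treated as free space."""
    points = []
    for angle_deg, rng in readings:
        if rng >= max_range:        # no echo: path unobstructed in this direction
            continue
        a = math.radians(angle_deg)
        points.append((rng * math.cos(a), rng * math.sin(a)))
    return points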
|
|
Key Words: | sonar sensing, motion control, obstacle avoidance, mobile robots |
|
|
|
|
|
Path Planning for Mobile Robot Navigation Using Sonar Map and Neural Network - 1998 |
Wen-chuan Chiang, | Dhyana Chandra Ramamurthy, | T. Nathan Mundhenk | and | Ernest L. Hall |
|
Proc. SPIE Conference on Intelligent Robots and Computer Vision XVII | vol 3522 | | pg 256-264 | Boston, Ma | November |
|
|
|
|
|
|
|