Writeup for our Autonomous Band project. We created an artificial intelligence system that detects and parses large sheet music with an overhead camera and plays the music on xylophones with a series of synchronized robotic arms. See the website link in the writeup for video demonstrations and more information.
Autonomous Band Project Writeup
Autonomous Band
EECS 498: Autonomous Robotics Laboratory
University of Michigan
Robert Bergen, Mark Isaacson, Sami Luber, and Zach Oligschalaeger
http://eecs498teammusic.webs.com/
Overview
For our EECS 498 Autonomous Robotics Lab Final Project, we created an autonomous
band, consisting of four robotic arms and two xylophones. An overhead camera is used to
detect music notes, which are parsed into music and sent to the robotic arm controller.
The arm controller uses motion planning to direct the arms to play notes while avoiding
arm collisions.
Autonomously reading and playing music is challenging because it requires building a
robust music-reading system, careful synchronization between the music reader and the
music player components, and motion planning between the arms to ensure notes are
played smoothly and without arm collisions. As described in the Music Reading
section, our system continuously detects musical notes on the music board so that users
can see their desired music change in real time on the GUI. However, so that the user can
set the desired song before playing it, the user controls when the new music is sent to
the music player. At that point, the music player discards previously sent music and clears
queued notes to accommodate the new music. As described in the Music Playing
section, the music player component schedules each of the available robotic arms
to play notes, using motion planning to avoid arm collisions.
Music Reading
Representing Music
Our system strives to resemble real sheet music as much as possible. Notes are
represented by red circles, uniform in color and size. Users can set these notes as desired
on the music board. Furthermore, our system supports reading and playing all G-clef
notes (middle C - high G).
Camera Calibration
Camera calibration begins by detecting the four corners (marked by blue
squares) of the music board; the system pauses if more or fewer than four corners are
detected. Using the locations of the four corners, a homography is built to project notes
from camera pixel space to physical music board space with respect to the center of the
music board. Pre-defined line and stanza spacing is then used to determine the
note value and time of each music note from its physical (x,y) position on the music
board.
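The four-corner homography described above can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the corner ordering, and the board dimensions used in the usage note are all assumptions. It solves the standard eight-equation linear system for a homography whose last entry is fixed to 1.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("corners are degenerate (collinear or repeated)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography_from_corners(pixels, board):
    """Return a 3x3 homography mapping four pixel corners to board corners.

    pixels, board: lists of four (x, y) pairs in the same (assumed) order.
    """
    if len(pixels) != 4 or len(board) != 4:
        # Mirrors the writeup: calibration cannot proceed without four corners.
        raise ValueError("exactly four corners are required")
    a, b = [], []
    for (x, y), (u, v) in zip(pixels, board):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def pixel_to_board(h, x, y):
    """Project one pixel coordinate into board coordinates."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v
```

For example, with a hypothetical board 60 units wide and 40 units tall centered at the origin, passing the four detected pixel corners and the board corners (-30, 20), (30, 20), (30, -20), (-30, -20) yields a matrix that maps each detected corner back onto its board corner.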
Detecting Notes
When detecting music, each stanza is scanned separately for music notes. We detect
music notes using a blob detection algorithm. The algorithm applies a color threshold
in HSV space, which is more robust in environments with varying lighting. It is also
sensitive to blob size: a blob's pixel count must fall within a threshold range
for the blob to be considered a note. If a note blob is larger than expected, we assume there
are overlapping notes on the music board that are being detected as one large note blob.
Using the covariance matrix of the overlapping note blob and the expected size of a note,
we determine the configuration of the overlapping notes and how many notes the blob
should be broken into. The round shape of the notes is also important as it improves our
variance calculation accuracy when building the covariance matrix.
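One way the covariance-based analysis of an oversized blob could work is sketched below. This is an assumed approach, not the project's code: it estimates the note count from the ratio of blob area to expected note area, and takes the orientation of the overlap from the principal axis of the 2x2 covariance matrix. The name analyze_blob and the expected_area parameter are hypothetical.

```python
import math


def analyze_blob(pixels, expected_area):
    """Estimate how many notes a blob holds and the angle of its long axis.

    pixels: iterable of (x, y) pixel coordinates belonging to one blob.
    expected_area: pixel count of a single note (assumed calibration value).
    Returns (note_count, angle_degrees), with the angle measured from the
    x axis toward the y axis.
    """
    pts = list(pixels)
    n = len(pts)
    if n == 0:
        raise ValueError("blob has no pixels")
    # Area ratio gives the number of notes merged into this blob.
    count = max(1, round(n / expected_area))
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    # Entries of the 2x2 covariance matrix of the pixel coordinates.
    sxx = sum((x - cx) ** 2 for x, _ in pts) / n
    syy = sum((y - cy) ** 2 for _, y in pts) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Closed-form direction of the eigenvector with the largest eigenvalue.
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return count, math.degrees(angle)
```

An angle near 0 degrees suggests notes side by side along the time axis, while an angle near 90 degrees suggests notes stacked in pitch. Round notes help here because a single round blob has nearly equal variance in every direction, so any elongation can be attributed to overlap.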
Once a note is detected, the note blob is projected from camera pixel space to physical
music board space. The note's value and time (within an eighth count from the beginning
of the stanza) are determined based on a position threshold on the (x,y) physical location
of the note on the music board. The time of the note is "snapped" (rounded) to the nearest
eighth count, so that the music flows more smoothly.
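The position thresholding and snapping step might look like the sketch below. The board geometry parameters (stanza_left, stanza_width, middle_c_y, step_height), the assumption of eight eighth counts per stanza, and the list of note names are illustrative assumptions rather than values from the project.

```python
# Assumed pitch range: middle C up to high G, one staff step per entry.
NOTE_NAMES = ["C4", "D4", "E4", "F4", "G4", "A4",
              "B4", "C5", "D5", "E5", "F5", "G5"]


def classify_note(x, y, stanza_left, stanza_width, middle_c_y, step_height,
                  eighths_per_stanza=8):
    """Map a board-space position to (note_name, eighth_slot), or None.

    x grows along the stanza (time); y grows with pitch. Both are in
    physical board units after the homography has been applied.
    """
    # Snap the time to the nearest eighth count, clamped to the stanza.
    slot = round((x - stanza_left) / stanza_width * eighths_per_stanza)
    slot = min(max(slot, 0), eighths_per_stanza - 1)
    # Snap the pitch to the nearest staff step (line or space).
    step = round((y - middle_c_y) / step_height)
    if not 0 <= step < len(NOTE_NAMES):
        return None  # outside the supported range
    return NOTE_NAMES[step], slot
```

With a stanza 40 units wide starting at x = 0 and a step height of 1 unit, a note detected at (10.2, 3.1) would be classified as F4 on the third eighth count (slot 2).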
Finally, because our music reader actively reads music in real-time, our system is
sensitive to any changes made to the music note configuration on the music board. The
blob detection algorithm runs continuously and updates its parsed music based on any
changes to the note configuration.
Parsing Music
Once all notes in each stanza are detected, we perform auto-spacing on the notes for
"better sounding" music. More specifically, if there are three or four notes in a stanza,
regardless of the "snapped" times for each note, note times will be adjusted such that the
times are spread evenly across the stanza.
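A minimal sketch of this auto-spacing rule follows. It assumes eight eighth counts per stanza and interprets "spread evenly" as equal intervals starting at the beginning of the stanza; both are assumptions, as is the function name.

```python
def auto_space(notes, eighths_per_stanza=8):
    """Spread three or four notes evenly across a stanza.

    notes: list of (pitch, snapped_time) pairs for one stanza.
    Stanzas with any other number of notes are returned unchanged.
    """
    if len(notes) not in (3, 4):
        return list(notes)
    # Keep the left-to-right order the user laid out on the board.
    ordered = sorted(notes, key=lambda note: note[1])
    interval = eighths_per_stanza / len(ordered)
    return [(pitch, i * interval) for i, (pitch, _) in enumerate(ordered)]
```

Under these assumptions, four notes snapped to counts 0, 1, 5, and 6 would be replayed on counts 0, 2, 4, and 6.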
The music speed is also automatically adjusted for the robot's physical limit. In other
words, the music speed is slowed down enough to allow the robot arms ample time to
play each note. Using the system's GUI, the user can also further slow down the speed of
the music.
GUI and Music Transmission
As the music reader dynamically detects music, a GUI shows the user the currently
detected music notes, a camera feed of the actual music board, and the music currently
being played on the robot arms. This allows the user to adjust the notes on the music
board without interrupting the currently playing music, and then, once the user is satisfied
with the new music, send the new music to the robot arms to be played. Upon sending the
new music to the music player component, a clear message is sent that invalidates
previously sent music notes and clears the queue of current notes to be played (allowing
these notes to be replaced with the updated music). An adjustable bar on the GUI also
allows the user to change the speed of the music.
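The clear-then-send behavior can be illustrated with a small queue sketch. The generation counter used here to reject stale notes is an assumption about how invalidation might be done; the writeup only states that a clear message invalidates earlier notes and empties the queue. The class and method names are hypothetical.

```python
from collections import deque


class NoteQueue:
    """Pending notes on the player side, reset by a clear message."""

    def __init__(self):
        self.generation = 0
        self.pending = deque()

    def clear(self):
        """Handle a clear message: drop queued notes, start a new generation."""
        self.generation += 1
        self.pending.clear()
        return self.generation

    def add(self, generation, note):
        """Queue a note unless it belongs to an invalidated generation."""
        if generation != self.generation:
            return False
        self.pending.append(note)
        return True
```

Tagging each batch of notes with the generation returned by clear() means a late-arriving note from the previous song is ignored rather than mixed into the new one.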
Music Playing
Overview
The music actuation component of the Rage with the Machine project featured four
robotic arms autonomously playing xylophones in conjunction and communication with
the vision component. Our system managed the planning, timing, and mechanical aspects
of playing a desired piece of music and was written to be expandable to meet our
resource budget.
Planning
Our planner served as a scheduler for our robotic arms. Each arm had its 'availability' to
play a note mapped out in what we called a Planner Line: a sequence of
actions to perform, each of which kept track of its start, 'hitting the key', and completion
times. We calibrated these times with trial information from the mechanical component of the
project (see below). The planner would receive the sequence of notes it should play in an
LCM (Lightweight Communications and Marshalling) message and distribute
them to the Planner Lines via our arm allocation algorithm, either appending to the
existing song or erasing it and starting anew. As we continued to work, our algorithm grew
more sophisticated in pursuit of our definition of optimality, which was maximizing the
number of notes we could play in a given period of time. We implemented this in the
following growth stages:
1. Query for the first free arm and use it to play the next note.
2. Query for all free arms and use the closest one of them to the next note to play as
our choice.
3. To improve demoability by increasing the number of 'occupied' arms in a given
period of time, we modified our algorithm to query for all free arms, then all of
those which were tied for being closest to the next note, and finally chose the arm
to use at random from that list.
4. Use method 3 as a quick and almost always successful greedy choice, but upon
failure to add a note, achieve optimality by redistributing future notes via a
branch-and-bound family algorithm.
The planner would then maintain a thread that
queried the Planner Lines to determine whether they should play the note sitting at their
individual playheads and signal the proper arm's state machine to do so.
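The greedy choice from stage 3 can be sketched as below. The arm representation (a dictionary with a current key index and a time at which the arm becomes free) is an assumption made for illustration; the branch-and-bound fallback from stage 4 is not shown.

```python
import random


def choose_arm(arms, note_key, note_time, rng=random):
    """Pick an arm for the next note using the stage 3 greedy rule.

    arms: list of dicts with 'key' (index of the key the arm is over)
          and 'free_at' (time the arm finishes its current action).
    Returns the chosen arm, or None if no arm is free in time.
    """
    # Step 1: all arms that are free when the note must be played.
    free = [arm for arm in arms if arm["free_at"] <= note_time]
    if not free:
        return None  # the full planner would redistribute future notes here
    # Step 2: keep only the arms tied for closest to the target key.
    best = min(abs(arm["key"] - note_key) for arm in free)
    closest = [arm for arm in free if abs(arm["key"] - note_key) == best]
    # Step 3: break ties at random so more arms stay visibly busy.
    return rng.choice(closest)
```

Passing a seeded random.Random instance as rng makes the tie-breaking reproducible, which is convenient when comparing schedules between runs.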
Arm Interface
Our arms were managed via an Arm class, which was responsible for knowing the
configuration space locations of positions corresponding to hitting every key on our
xylophones as well as positions directly above them and conducting smooth transitions
between them. It also abstracted the mirroring of arms and their various positions from
one side of a xylophone to the other.
The process of making these movements smooth and reliably accurate took a number of
design features to achieve. We had to establish a procedure that involved moving our
arms one joint at a time in specific and varying orders to prevent collisions and
unexpected behavior that arose by simply ordering the arm from place to place, a practice
which unfortunately increased the delay between playing notes consecutively. For the
purpose of achieving accuracy, we implemented extremely tight thresholds on what it
meant for an arm joint to have 'arrived' at a location.
The above system performed extremely well after tuning what it meant to be 'above' a
key to decrease playing delays; however, our system encountered issues regarding servo
overloads while attempting to sustain positions above keys at the edge of our arm's safe
operating range which required a combination of tactics to solve. We determined that by
lowering the maximum operating torque for a servo, we could allow it to last longer
under stress, and wrote it into our code to achieve this when an arm was idle for a
sufficiently long period of time. To achieve the best demoability and reliability for our
system, we further determined that it would be safest to not only lower torque after being
idle for long periods of time, but also to move the arm to a known safe position, above
the middle xylophone key. All of these sustainability tactics were conducted blind to
the planner: they were abstracted away and allowed for new instructions at a moment's notice.
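The idle handling described above could be structured roughly as follows. This is a sketch under assumptions: the class name, the timeout, the torque values, and the idea of polling with a tick method are all hypothetical, and no real servo API is called.

```python
class IdleGuard:
    """Track one arm's idle time and decide when to relax it."""

    def __init__(self, idle_timeout_s, normal_torque, reduced_torque, safe_key):
        self.idle_timeout_s = idle_timeout_s
        self.normal_torque = normal_torque
        self.reduced_torque = reduced_torque
        self.safe_key = safe_key
        self.last_command_s = 0.0
        self.torque = normal_torque
        self.parked = False

    def on_command(self, now_s):
        """A new instruction arrived: restore full torque immediately."""
        self.last_command_s = now_s
        self.torque = self.normal_torque
        self.parked = False

    def tick(self, now_s):
        """Return the key to park above when the arm has just gone idle."""
        idle_for = now_s - self.last_command_s
        if not self.parked and idle_for >= self.idle_timeout_s:
            self.torque = self.reduced_torque
            self.parked = True
            return self.safe_key
        return None
```

Because on_command restores the normal torque without consulting the planner, the planner can keep scheduling notes as if the arm were always ready.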
Mechanics
The portion of the system we found we had the greatest difficulty refining was our
mechanical interface with the xylophones. In the course of testing, we determined that in
order to produce a sound from a xylophone, it was necessary not merely to hit a note and
release quickly with some artifact, but to pivot while doing so. After several design
iterations and trials, we developed a rig consisting of a 3" x ¾" hex bolt with one
degree of freedom for striking with a pivot, which produced the appropriate ring from our
xylophones.
Once the rig was constructed, we also had to ensure that we could actuate the servos in
such a fashion that we could strike the key as intended. This required moving several
joints in quick succession. We changed this process several times over the
course of development, and it had the undesirable quality of often needing to be manually
fine-tuned for specific keys rather than being derived from a mathematical model and
adjustments. When we had settled on the motion of the arm, we ran a calibration program
that ran through every possible motion of the arm between our pre-programmed positions
and recorded the times taken over several trials for use in our planner.
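A calibration table of this kind might be assembled as in the sketch below. Averaging the trials is an assumption (the writeup says only that times were recorded over several trials), and the position labels and function name are hypothetical.

```python
from statistics import mean


def build_timing_table(trials):
    """Average measured motion times for each (from, to) position pair.

    trials: iterable of (from_position, to_position, seconds) records.
    Returns a dict mapping (from_position, to_position) to mean seconds.
    """
    grouped = {}
    for src, dst, seconds in trials:
        grouped.setdefault((src, dst), []).append(seconds)
    return {pair: mean(times) for pair, times in grouped.items()}
```

A planner could then look up an entry such as table[("above_C4", "above_G4")] to estimate how long an arm is occupied by that move. A more conservative variant could store the maximum observed time instead of the mean.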
Success
We view our system as a success. We were able to completely abstract the playing of
music in a reliable planner that was able to achieve optimal playing capacity. We were
able to play well known songs at a reasonable tempo with a relatively small number of
robotic arms. The limitations of our system were either due to our resource budget or a
consequence of being pressed for time. While our program could accommodate any
number of robotic arms (and therefore play more complex and interesting music), we
were limited by the number available, the space required for each arm (4 sq. ft.), and the
number of USB ports and number of AC power sockets (1 per arm). We could have
further improved our system by reducing delays between notes, but were constrained by
the amount of time it would've taken to find positions and movement patterns starting
from closer above each note that still produced quality sound, and in the same way were
limited from reducing servo overload issues by decreasing the operating radius of our
arms. That said, we did manage to mitigate both issues in
software by making our planning algorithm optimal and having our arms employ
effective safety protocols. As a result, neither issue affected the system on demonstration day,
and we consider our system a success.