Sunday, May 9, 2010

HCI Remixed

My Vision isn't my vision: Making a Career out of Getting Back Where I Started

This essay was written by William Buxton and follows his experience as a music undergraduate testing the NRC's digital music machine, which was used to study human-computer interaction.

The music machine sported an animation monitor and bi-manual input.

With his left hand, Buxton could enter note duration on a keyboard with 5 keys.
With his right hand, he could enter the pitch of the note using either a primitive version of the mouse or two wheel knobs. Buxton opted for the wheel knobs.

Buxton felt that the machine was very advanced for its time, and he uses it to argue that we should consider the user first when developing systems.

_______________

Drawing on Sketchpad: Reflections on Computer Science and HCI

The author of this short essay, Joseph A. Konstan, discusses Sutherland's Sketchpad, which offered many capabilities that were ahead of their time, such as:
Pointing with a light pen
Rendering lines, circles, and text
Constraints and their graphical display
Data structures, algorithms, and object-oriented programming concepts

Sutherland essentially laid the foundation for graphical displays and interactive drawing, anticipating modern CAD systems.

Konstan says that we should innovate and communicate rather than merely compute, and that we should focus more on systems for experts instead of concentrating so heavily on knowledge workers.
____________________
The mouse, the demo, and the big idea

Stanford's Wendy Ju wrote an article about Engelbart's oN-Line System (NLS) and the big demo that introduced the mouse and shocked the world.

Engelbart's system wasn't immediately accepted because research at the time was focused on office automation and artificial intelligence.

The goal of the demo was to change the way people thought, which it did, but not in the way Engelbart intended. Too many people were focused on the mouse.

Stanford had a "demo or die" culture.
Demonstrations create converts and make a sale rather than merely inform.

Wendy Ju argues that the computer is a tool to enhance and empower humanity rather than replace human input.
________________
My spill:

The articles presented here provided an interesting view into the past of computing and CHI, demonstrating how much the field has changed (a lot) and how much has stayed the same (nearly everything).

It's interesting to note that nothing we research is entirely new; usually it's earlier research brought back into the spotlight, this time with a different light, a different approach, and a different researcher.

Wednesday, April 14, 2010

CHI '08: From meiwaku to tokushita!: lessons for digital money design from Japan

Authors:
Scott Mainwaring Intel Research, Portland, OR, USA
Wendy March Intel Research, Portland, OR, USA
Bill Maurer UC Irvine, Irvine, CA, USA

Paper Link:
http://portal.acm.org/citation.cfm?id=1357054.1357058

Mainwaring et al. discuss the findings of an ethnographic study on the effects of e-money in Japan, particularly Tokyo and Okinawa (the Hawaii of Japan).
The main focus of the ethnographic study was on Near-Field Communication (NFC) chips embedded in cards, passes, and mobile devices. The team chose Japan as the place for the ethnography because it already has a high adoption rate of various forms of e-money. Mainwaring et al. studied 3 different brands of e-money; two of them, Suica and Edy, are discussed below.

The main result of the study was that the high rate of adoption of e-money stems from the deeply ingrained Japanese wish to reduce "meiwaku" 迷惑, which means "nuisance" or "bother." This plays heavily into Japanese society, where the needs of the community often trump individual concerns and where standing out or bothering others is avoided. This sense of meiwaku also helps explain why only a tenth of transactions in Japan are done via credit card, whereas in the U.S. it's a quarter of all transactions.

With Suica, people could simply move past a turnstile and have their card charged automatically without holding up the flow of traffic.

While Edy also offers auto-charging with NFC technology, it could also increase meiwaku by holding up lines when the e-money ran out. Furthermore, putting more money into the account required finding charging stations of the same brand; on top of that, the card could only be used in stores supporting the brand. Finally, by law, money converted into e-money cannot be converted back into regular cash.

The other main theme found in the use of e-money is "tokushita" 得した, roughly "well done" or "advantage gained." This refers to the rewards gained from using e-money through rewards programs, getting "something for nothing" out of using the card. Suica, for example, allowed customers to earn travel miles for a certain amount spent. The study found that people would go out of their way to use their cards out of a sense of tokushita and to gain rewards.

The ethnography suggests that e-cash systems should:
1) Result in a net decrease in commotion before, during, and after the point of sale.
2) Be designed for public use and take into account the environment of the transaction.
3) Support management of one's money without either introducing new burdens or reducing friction to the point of invisible spending.
4) Subtly engage multiple senses, for both practical and aesthetic reasons.
5) Leave room for dreams, irrationality, and tokushita! Money is not just about exactness and frugality; it's also about fun. If e-money brightens your day, then it might also fit into your life.

__________
My spill:

I was interested in this study primarily because I'm currently studying Japanese and thought I'd like to hear about some of the cultural implications of spending. I can't say that I learned a whole lot, but it was interesting. I think we can all appreciate not wanting to be a burden or a nuisance to others, and keeping that in mind when designing any technology is important.

I would like to see future work address the design considerations listed at the end, particularly how NFC can be employed so that people aren't charged accidentally, and how a transaction could be reversed should that happen. Also, I'd like to know whether it's possible in the U.S. to convert e-cash back onto, say, a credit card, or whether that prohibition is just a Japanese law.

Incorporating a rewards system for these kinds of transactions is a smart business move, I think. It keeps people motivated to use your card.

The authors also mentioned that the Japanese really focus on delivering aesthetic satisfaction in their products. I think we should do that more in the States.

Tuesday, April 13, 2010

IUI '08: Designing and assessing an intelligent e-tool for deaf children

Authors:
Rosella Gennari Free University of Bozen-Bolzano, Bolzano, Italy
Ornella Mich Free University of Bozen-Bolzano, Bolzano, Italy

Paper Link:
http://portal.acm.org/citation.cfm?id=1378773.1378821&coll=ACM&dl=ACM&type=series&idx=SERIES823&part=series&WantType=Proceedings&title=IUI&CFID=81639924&CFTOKEN=12013848

In this paper, Gennari and Mich present LODE (LOgic-based e-tool for DEaf children), an intelligent web-based program aimed at cultivating the reading and reasoning skills of deaf children.

The aim of their system is best understood in light of the difficulties that deaf people experience with language. Deaf children have difficulty developing their reading and reasoning skills because they are largely deprived of constant exposure to language. Deaf people encode information differently from those who can hear, and they organize and access knowledge in different ways, focusing on details and images as opposed to relations among concepts.

Specifically, their system focuses on "stimulating global deductive reasoning" over entire narratives. LODE does this by extracting temporally sensitive words and applying a logic system with automated temporal reasoning. The system can logically arrange the input language (Italian in this case) and generate global deductive reasoning questions based on the story.

The architecture of the system is based on a web client-server model composed of several modules:
1) an e-stories database
2) an automated reasoner made up of:
a) ECLiPSe, a constraint-based programming system
b) a knowledge base for ECLiPSe
c) domain knowledge of constraint problems formalizing the temporal information of the e-stories
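
To make the temporal-reasoning idea concrete, here is a toy sketch of the kind of deduction LODE automates. This is a minimal illustration only: the story events are invented, and the real system encodes interval constraints in ECLiPSe rather than this simple before/after closure.

    from itertools import product

    # Invented story events; LODE extracts these from the e-story text.
    before = {("wolf meets Red", "wolf reaches grandma"),
              ("wolf reaches grandma", "hunter arrives")}

    def transitive_closure(pairs):
        """Derive every before-relation implied by the stated ones."""
        closure, changed = set(pairs), True
        while changed:
            changed = False
            for (a, b), (c, d) in product(closure, closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
        return closure

    facts = transitive_closure(before)
    # A "global" question spans the whole narrative, not one sentence:
    print(("wolf meets Red", "hunter arrives") in facts)  # True

A question generator could then turn any derived pair into a yes/no comprehension question about the story's timeline.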



The GUI consists of a simple page framed in yellow (for concentration), with a picture and a sentence from the story on a blue background (for calmness), along with buttons to go to the next and previous pages and a dictionary for looking up difficult words. Temporal words are highlighted in orange to draw attention to the temporal concepts the user should remember.



They tested their system by bringing together LIS (Italian Sign Language) interpreters, a logopaedist, a linguist expert in deaf studies, a cognitive psychologist expert in deaf studies, and two deaf children.

One child, aged 13, completed the stories easily, while the other, aged 8, had trouble navigating the interface. Feedback from the experts was positive.

______________
My Spill:

It's great that they're constructing advanced educational tools for deaf children. It seems like a great system for exceptionally young children, but I would think the stories and questions for older children using the system would need to be crafted by a human.

Their testing of the system was terribly insufficient; they needed to test more children with their system. Of course the experts on deaf studies will approve the system. After all, they are interested in promoting work in their own fields.

IUI '08 (assignment): Temporal semantic compression for video browsing

Authors:
Brett Adams Curtin University of Technology, Perth, W. Australia
Stewart Greenhill Curtin University of Technology, Perth, W. Australia
Svetha Venkatesh Curtin University of Technology, Perth, W. Australia

Paper Link:
http://portal.acm.org/citation.cfm?id=1378773.1378813

Adams et al. set out a video browsing approach known as Temporal Semantic Compression (TSC) that allows unique ways of browsing and playing video based on tempo and interest algorithms.

With interest algorithms, which can be installed into the browser as customizable plug-ins, a video can be filtered in terms of what the user is looking for. An interesting application highlighted in the paper is applying different interest algorithms depending on the genre.

For example, we could use:
excitement algorithms for sports
anxiety algorithms for home surveillance and news story changes
attention algorithms for home videos
etc.



The controls for the temporal-compression video browser use a 2D spatial control on the display screen, where the horizontal axis controls the point in the video and the vertical axis controls the compression. (Compression is the amount of the video remaining from the original; i.e., 20% compression leaves 20% of the shots from the original video, while 100% compression leaves only the "most interesting" frame.)



The main measure used to decide which frames survive compression is tempo. A director shapes tempo through action, music, and dialog to affect the audience's sense of time in the film; this browser estimates tempo from pan, tilt, and audio volume. (The paper gives the exact calculation, omitted here.)
3 timescales:
1) Frame-level features live on the timescale of the original movie and adjust the playback point.
2) Shot-level features live on a timescale that weights all shot durations as equal.
3) The compression level is where the compression functions can be changed.

Example compression functions:

Default (linear) - playback proceeds at a linear pace, much like the regular playback and fast-forward functions.

Midshot - takes a constant amount from each shot (section) chosen by the pacing algorithm.

Pace proportional - uses the tempo to continuously vary the playback speed. When the tempo is low, playback speeds up, so more playback time goes to higher-tempo sections (i.e., the more important sections are favored).

Interesting shots - applies speed-up and compression, and entire shots with low tempo are left out.
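
Here is a toy sketch of two of these ideas, with invented shot durations and tempo scores (the paper derives tempo from pan, tilt, and audio volume, and my compression fraction follows the "20% leaves 20% of the shots" definition above):

    # Each shot: (id, duration in seconds, tempo score in [0, 1]).
    shots = [(0, 12.0, 0.2), (1, 5.0, 0.9), (2, 8.0, 0.5), (3, 20.0, 0.7)]

    def interesting_shots(shots, compression):
        """Keep the highest-tempo shots until only `compression`
        (a fraction of the shots) remains, then play them in order."""
        keep = max(1, round(len(shots) * compression))
        best = sorted(shots, key=lambda s: s[2], reverse=True)[:keep]
        return sorted(best)

    def pace_proportional_speed(tempo, base=1.0, k=4.0):
        """Low tempo -> faster playback, so high-tempo sections
        effectively get more screen time."""
        return base + k * (1.0 - tempo)

    print(interesting_shots(shots, 0.5))  # shots 1 and 3 survive
    print(pace_proportional_speed(0.9))   # 1.4x: near-normal speed
    print(pace_proportional_speed(0.2))   # 4.2x: racing through filler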

Adams et al. tested their system on several movies, news shows, commercials, cartoons, and talk shows and found that their compression algorithm could successfully pull out meaningful and interesting chunks of shots from the clips.



__________
My Spill:

The Temporal Semantic Compression scheme is a great idea from my perspective. Most media players only support regular playback, fast-forward, and scene selection; I've never seen a browsing tool for picking out the interesting parts of a video.
That's really cool.

The pluggable functions could let the user search for different points of interest (maybe I just want to find the action scenes in a movie).

The real improvement to their interface would be to reduce the number of metrics shown so that screen space can be maximized.

IUI '08: Multimodal Chinese text entry with speech and keypad on mobile devices

Authors:
Yingying Jiang Chinese Academy of Sciences, Beijing, China
Xugang Wang Chinese Academy of Sciences, Beijing, China and Ministry of information Industry Software and Integrated Circuit Promotion Center
Feng Tian Chinese Academy of Sciences, Beijing, China
Xiang Ao Ministry of information Industry Software and Integrated Circuit Promotion Center
Guozhong Dai Chinese Academy of Sciences, Beijing, China
Hongan Wang Chinese Academy of Sciences, Beijing, China

Paper Link:
http://portal.acm.org/citation.cfm?id=1378773.1378825&coll=ACM&dl=ACM&type=series&idx=SERIES823&part=series&WantType=Proceedings&title=IUI&CFID=81639924&CFTOKEN=12013848

In this paper, Jiang et al. create a multimodal text entry system that uses both keypad and speech entry to reduce the number of key presses, the time to enter characters, and the number of candidate characters to choose from when using a mobile device.

Jiang et al. identify the problem that Chinese text entry on mobile keypads is slow and arduous, and they set out to improve the input method for these characters. In the current method, called T9, roman phonetic characters (pinyin) corresponding to the sound of the Chinese characters are input, and the desired characters are then selected from a list of homophones. Because this is slow and arduous, Jiang et al. propose a method called "Jianpin," in which the initial sound of each Chinese character the user wants is input via the keypad while the user simultaneously says the word they wish to enter.

For example, if the user wants to enter "wang luo" 网络 (network) into a mobile phone using Jianpin, the user presses "95," which corresponds to "w.l," while saying "wang luo"; then the user selects 网络 from the remaining homophones.
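
To see why the combination shrinks the selection set, here is a toy sketch. The three-word lexicon is made up, and this is not the paper's actual decoder; it just shows keypad filtering plus speech filtering.

    # Standard phone-keypad groups for the initials used below.
    keypad = {"w": "9", "x": "9", "l": "5", "j": "5"}
    lexicon = {"wang luo": "网络", "wan le": "完了", "xiang liang": "向量"}

    def candidates(key_presses, speech_hypothesis=None):
        out = []
        for pinyin, hanzi in lexicon.items():
            initials = [syllable[0] for syllable in pinyin.split()]
            if "".join(keypad[i] for i in initials) == key_presses:
                out.append((pinyin, hanzi))
        if speech_hypothesis:
            # Speech prunes the list further; fall back to keypad-only
            # matches if the recognizer's hypothesis matches nothing.
            out = [c for c in out if c[0] == speech_hypothesis] or out
        return out

    print(candidates("95"))              # three keypad-compatible words
    print(candidates("95", "wang luo"))  # [('wang luo', '网络')]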



(The paper includes a figure giving an overview of the input method.)


A user study was run with 4 college students in which 50 words were input with both the T9 method and the Jianpin method, measuring the number of key presses it took to complete the 50 words with each. (The results table is omitted here; consistent with the goal above, Jianpin reduced the number of key presses.)



_________________
My spill:
The Jianpin input system sounds like a great way to reduce ambiguity in the selection set as well as speed up input.
My only bone to pick is that the input scheme requires voice input. I can imagine being on a crowded street in China with hundreds of people entering voice input into their cell phones just so they can text.
It's just more noise pollution that way.
If they can make a faster system without voice input, I'll be impressed.

Knowing Japanese, I was really interested in how the Chinese enter text, since they don't have a phonetic syllabary like the Japanese kana. In the end, it really isn't all that different.

Monday, April 12, 2010

CHI '08 (assignment): Reality-based interaction: a framework for post-WIMP interfaces

(Comment left on Brandon Jarratt's blog)

Authors:
Robert J.K. Jacob Tufts University, Medford, MA, USA
Audrey Girouard Tufts University, Medford, MA, USA
Leanne M. Hirshfield Tufts University, Medford, MA, USA
Michael S. Horn Tufts University, Medford, MA, USA
Orit Shaer Tufts University, Medford, MA, USA
Erin Treacy Solovey Tufts University, Medford, MA, USA
Jamie Zigelbaum MIT Media Lab, Cambridge, MA, USA

Paper Link:
http://portal.acm.org/citation.cfm?id=1357054.1357089

In this paper, Jacob et al. discuss the emerging methods of human-computer interaction broadly referred to as reality-based interaction (RBI) and identify the unifying themes and concepts of these methods.

The research team first notes that human-computer interaction was initially done via command-line instructions typed in through a keyboard. This method of interaction was cumbersome and relied on knowledge of the commands the computer would accept. It was difficult to use in part because users could not draw on preconceived notions of interaction.

Next they identify the current generation of HCI as direct manipulation of 2D widgets, commonly known as window, icon, menu, pointing device (WIMP) interfaces.



Finally, the emerging methods of interaction are reality-based interactions (RBI), which they define as drawing from four overarching themes:
1) Naive Physics
2) Body Awareness & Skills
3) Environment Awareness & Skills
4) Social Awareness & Skills



The team notes that using RBI themes may enhance or inhibit:
Expressive Power
Efficiency
Versatility
Ergonomics
Accessibility
Practicality

The team uses Superman as an analogy, saying that a strictly reality-based representation of Superman would only allow him to walk and see like a regular man; instead, reality is traded off for the extra functionality of flight and X-ray vision.



The team demonstrates the four themes of RBI and the resulting tradeoffs in several case studies:
1) URP (a tangible user interface for urban planning)
2) Apple iPhone
3) Electronic Tourist Guide
4) Visual-Cliff Virtual Environment

The research team hopes this paper provides a scheme that unites divergent user interfaces into a common framework, one that interface designers will adopt to create better systems in the future, and that their research also provides a method for analyzing future interfaces.

_____________
My Spill:

While their work is an interesting summary of reality-based interfaces, I feel like this research didn't generate anything we didn't already know. Reality is an ever-recurring theme in CHI, and using reality-based interfaces introduces several considerations and tradeoffs.

That's essentially all this paper was. I'd like to see them present a set of ideal interfaces for a system or something.

The Superman analogy was nice.

Sunday, April 11, 2010

Rich interfaces for reading news on the web

Authors:
Earl J. Wagner Northwestern University, Evanston, IL, USA
Jiahui Liu Northwestern University, Evanston, IL, USA
Larry Birnbaum Northwestern University, Evanston, IL, USA
Kenneth D. Forbus Northwestern University, Evanston, IL, USA

paper link:
http://portal.acm.org/citation.cfm?id=1502650.1502658&coll=ACM&dl=ACM&type=series&idx=SERIES823&part=series&WantType=Proceedings&title=IUI&CFID=81639924&CFTOKEN=12013848

In this paper, Wagner et al. present the "Brussell" system, an interface that compiles summary information on a news article.
Brussell also gathers background information on the article from related articles and links and can construct a kind of summary of the information leading up to the main article. It does this by searching for related links and cross-referencing the information against other articles to remove extraneous and possibly erroneous information.

With a summary of an article available at a quick glance, users can quickly assimilate the news, background information, and current information on certain events.
Even more important is the fact that the system can work off a knowledge base to construct a net of references to older material when looking at a current article.

To test the system, the team created templates for several kinds of articles and defined the set of information the system tries to fill in for each template.
They also used a database of older articles as Brussell's knowledge base. The system was then run over 100 different news stories to measure the number of references found, averaging 4.1 references per article.
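
As an illustration of what template filling might look like, here is a rough sketch. The slot names and patterns are invented for the example; the paper's actual templates and extraction machinery are more sophisticated.

    import re

    # Invented template for an "executive departure" story.
    template = {
        "who":   re.compile(r"(CEO \w+ \w+)"),
        "event": re.compile(r"(resigned|was indicted|merged)"),
        "when":  re.compile(r"(on \w+ \d{1,2})"),
    }

    def fill(article_text):
        """Fill whichever slots the article can answer."""
        slots = {}
        for name, pattern in template.items():
            match = pattern.search(article_text)
            if match:
                slots[name] = match.group(1)
        return slots

    story = "CEO Jane Doe resigned on March 3 amid an accounting scandal."
    print(fill(story))
    # {'who': 'CEO Jane Doe', 'event': 'resigned', 'when': 'on March 3'}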

_________
My Spill:

The Brussell system is an interesting addition to the data-mining community, allowing casual users to weave a web of references and background information for news articles. I think the idea for the system is great. Allowing people to have a summarized view of current events could make the general populace more informed on current issues, if the system is strong enough.

But that makes me think that the average user might not be motivated enough to use the system to become more educated on current issues, even though the given implementation may be easy enough to use. If the system could provide a means of rewarding the user for taking advantage of the system and reviewing material, then I think this kind of thing could be revolutionary.

I really wonder how "smart" the system really is...

Saturday, April 10, 2010

Obedience to Authority

Book Title:
Obedience to Authority: The Experiment that Challenged Human Nature

Author:
Stanley Milgram

This book presents the methodology, results, and formulated knowledge from Stanley Milgram's famous shock experiments, along with the thoughts of the man who created them.

Milgram begins by discussing the nature of obedience as it relates to everyday social life, as well as the not-soon-forgotten implications of obedience displayed in the historical debacle of WWII and the Holocaust.

Milgram then sets up a method of inquiry into the nature of obedience with his shock experiments, in which a subject ("teacher"), at the command of an experimenter, administers shocks to a "learner" who must answer a word-pairing task. The experiment is set up so that the learner who gets shocked is a paid actor faking pain and eventual death. The experimenter replies to the subject's complaints only by telling him that "the experiment must continue" and that there is "no permanent tissue damage."

The shocks start small and increase in intensity until they reach fatal levels.
The results of the baseline experiment showed that 65% of the subjects tested were obedient (remained in the experiment until the fatal shocks were delivered).
Most subjects displayed extreme tension and anxiety during the experiment.

Several variations of the experiment were conducted, from one in which the subject had to hold the learner's hands to electrified plates, to one in which the only signal from the learner was a light indicating an answer. The variations showed that increased proximity to the victim increased disobedience, while increased proximity to the experimenter increased obedience.

Variations also included changes of personnel and personality types in the different roles, and even multiple teachers. If the experimenter was not an authoritative professional, disobedience increased. If both the learner and the experimenter were professional authorities, the experiment would halt immediately.

In examining the results, Milgram found that people transition between two states of operations:
1) Autonomous State
2) Agentic State

In the autonomous state, the person is an individual whose motivation and responsibility for his or her actions derive from the self. Here the overriding determinant of morality is the self, which generally means that harming others is avoided at all costs.

However, in the agentic state, the individual relinquishes responsibility for his or her actions to the authority, who issues commands presumably for a justifiable cause that benefits society in some way, whether that society is immediate or at some nebulous level. Since responsibility lies with the authority, judging the morality of one's actions is bypassed and entrusted to the authority. Milgram argues that we are predisposed to obey authority in order to preserve the structure of society. Morality is now viewed in terms of obedience, loyalty, duty, discipline, and self-sacrifice.

Immediate Antecedent Conditions (for entering the agentic state):
Perception of Authority
Entry into Authority System
Coordination of Command with the Function of Authority
Overarching Ideology

Binding Factors:
Sequential Nature of the Action
Situational Obligations

Resolution of Strain:
Avoidance
Subterfuges
Physical Conversion
Dissent
Disobedience

Milgram notes that the steps to disobeying an authority are psychically painful and are taken only as a last resort.
The steps toward disobedience are:
Inner doubt
Externalization of doubt
Dissent
Threat
Disobedience

Milgram also mentions several strain-resolving mechanisms that help an individual remain obedient.
_______________________
My Spill:
Despite being several decades old, Obedience to Authority and its associated research have lost none of their potency to shock (no pun intended) the reader with the apparent hardheartedness of humanity.

Milgram deconstructs and carefully examines each component of his experiment and comes up with a thorough theory of obedience that I think does much to explain the nature of authority.

With this viewpoint on authority, we begin to see man in a different light: a light in which sources of authority should be held in great distrust, as they carry with them the actions of every man under that authority.

The idea that really strikes me is how, in switching to the agentic state, the way one interprets morality changes from one's own morals to the "virtues" (perhaps principles is a better word) of obedience, loyalty, duty, discipline, and self-sacrifice.

However, the agentic virtues are only virtues when the authority aims at benevolent ends pursued through benevolent means.
Malicious ends and malicious means should be rebelled against!

Those in places of authority and power MUST act with morality.
In the end, we come to the quote (of debated origins),
"With great power comes great responsibility."
or as Jesus Christ says in the Gospel of Luke, chapter 12, verse 48: "For unto whomsoever much is given, of him shall be much required: and to whom men have committed much, of him they will ask the more."

Here's some more food for thought:
"...those of us who heedlessly accept the commands of authority cannot yet claim to be civilized men." -Harold J. Laski (Not that I'm a proponent of the Labour Party)

“Unthinking respect for authority is the greatest enemy of truth.” -Albert Einstein

Monday, April 5, 2010

Using salience to segment desktop activity into projects

Authors:
Daniel Lowd - University of Washington, Seattle, WA, USA
Nicholas Kushmerick - Decho Corporation, Seattle, WA, USA

paper link:
http://delivery.acm.org/10.1145/1510000/1502719/p463-lowd.pdf?key1=1502719&key2=9659840721&coll=ACM&dl=ACM&CFID=81639924&CFTOKEN=12013848

This paper outlines research that is part of Smart Desktop, an application for information management. The research itself is concerned with providing functions and algorithms for "predicting the project associated with each action a user performs on a desktop." The main goal of these methods is to incorporate salience, the idea that more recent information is more informative.

Actions performed within the Smart Desktop application are captured by the algorithm, which records the resources and information involved in each operation, including timestamps, the actions taken, and the project the actions and resources belong to.

By capturing and mining these resources for information-management knowledge, users can reach useful data more quickly, which makes them more efficient.

Resource Features: (R)
Resources mined from the Smart Desktop application, including web browsers, email clients, and office applications.

Past Project Features: (P)
Resources mined from the previous project the user was working on. These features help predict the kinds of actions the user plans to perform.

Salience Features: (S)
Information mined from current actions and how they relate to resource features. Salience features define a current relationship between actions, programs, and resources.

Shared Salience Features:
The above features are used to construct a full feature vector with weights tying it to projects. However, that creates large overhead and "overfitting," which prevents generalizing to new projects or different users.
So the algorithms developed here look at the salience features shared between projects.

The algorithms testing the salience metrics were:
Naive Bayes (NB)
Passive Aggressive (PA)
Logistic regression (LR)
Support Vector Machines (SVM)
Expert System (Expert)
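
To ground the salience idea, here is a minimal sketch of one plausible salience feature: exponentially decayed interaction counts, so recent activity outweighs old activity. The events, resources, and half-life are invented, and this is not the paper's exact feature definition.

    import math
    from collections import defaultdict

    def salience_features(events, now, half_life=300.0):
        """events: list of (timestamp_sec, resource). Returns decayed
        interaction counts, one per resource."""
        vec = defaultdict(float)
        for t, resource in events:
            vec[resource] += math.exp(-(now - t) / half_life)
        return dict(vec)

    events = [(0, "budget.xls"), (100, "budget.xls"), (580, "paper.tex")]
    print(salience_features(events, now=600))
    # paper.tex dominates: touched 20 s ago vs. ~10 min for budget.xls

Vectors like this would then be fed to any of the classifiers above to predict the current project.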

The methodology tested several users within several companies. Because the data mined can be very personal, it was obfuscated. Each algorithm was evaluated on the user data with different feature combinations.

Error rates for each algorithm appear in a table in the paper (omitted here).

The results of their study showed that the logistic regression and support vector machine algorithms were the best, with SVMs slightly ahead. Since these algorithms made use of salience, their good performance indicates that salience is an important metric for a smart system to implement.

The passive-aggressive algorithm was more accurate than the naive Bayes algorithm on the salience-based inputs, even though the extra features seemed to distract PA from providing good information.

_____________
My spill:

It was difficult to tell exactly what the paper was aiming to produce within the Smart Desktop application. However, it was clear that efficient prediction methods for supporting information workers are important, and that salience metrics improve most algorithms' performance.

The obvious future work for these metrics would be to train the SVM or logistic-regression algorithm with an expert-like system for each user.

It seems from the data that adding combinations of feature data into the algorithms doesn't help their accuracy.

I very much like the idea of (smart) predictive office applications that lessen the tedium of computer-based office work and enhance decision making.

MediaGLOW: organizing photos in a graph-based workspace

Authors:
Andreas Girgensohn FX Palo Alto Laboratory, Palo Alto, CA, USA
Frank Shipman Texas A&M University, College Station, TX, USA
Lynn Wilcox FX Palo Alto Laboratory, Palo Alto, CA, USA
Thea Turner FX Palo Alto Laboratory, Palo Alto, CA, USA
Matthew Cooper FX Palo Alto Laboratory, Palo Alto, CA, USA

paper link:
http://portal.acm.org/citation.cfm?id=1502650.1502711&coll=ACM&dl=ACM&type=series&idx=SERIES823&part=series&WantType=Proceedings&title=IUI&CFID=81639924&CFTOKEN=12013848



MediaGLOW is a graph-based interactive workspace for organizing photos.
GLOW stands for Graph Layout Organization Workspace. Photos can be organized into stacks based on relatedness and distance, producing glowing areas of relatedness called "neighborhoods." Neighborhoods are indicated by a colored halo around the photos. Photos can belong to only a single neighborhood; however, their relatedness to other neighborhoods can be shown by overlapping neighborhood halos.

Relatedness of the nodes/photos in the graph is indicated by manually entered tag data associated with each photo. Related photos and neighborhoods move along when a related area is moved, to maintain the visual relatedness. Relatedness can also be determined by when the photos were taken or by the geographic location where they were taken (temporal and geographical). Photos anchored to a node by relatedness are connected by blue lines called "springs."
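
The "spring" metaphor is essentially a force-directed layout. Here is one iteration of such a layout in miniature, with invented positions and similarities; MediaGLOW's actual layout algorithm may differ in its details.

    positions  = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (0.0, 4.0)}
    similarity = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.2}

    def spring_step(positions, similarity, rest=2.0, k=0.1):
        """Pull similar photos together, push dissimilar ones apart."""
        new = dict(positions)
        for (p, q), sim in similarity.items():
            (x1, y1), (x2, y2) = positions[p], positions[q]
            dx, dy = x2 - x1, y2 - y1
            dist = max((dx * dx + dy * dy) ** 0.5, 1e-9)
            target = rest / max(sim, 0.05)  # more similar, shorter spring
            f = k * (dist - target) / dist
            px, py = new[p]; qx, qy = new[q]
            new[p] = (px + f * dx, py + f * dy)
            new[q] = (qx - f * dx, qy - f * dy)
        return new

    print(spring_step(positions, similarity))  # repeat until it settles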

The user can zoom in and out of the workspace and perform standard selection gestures, as well as use a "get all" button to select every photo matching given tags, places, or dates.



A user study was conducted using both a traditional photo-organization program and MediaGLOW, in which users had to organize 450 photos into the categories:
nature
cuisine
nightlife
culture
architecture
Then they organized 3 photos from each category into a travel brochure.

The study showed that while the MediaGLOW interface was not as efficient as the traditional program, users stated that MediaGLOW was more fun.
______________________

My spill:

MediaGLOW is a visually interesting program that makes use of some good metrics for organization, but the fact that photos are placed on a blank canvas makes the program LESS organized than other photo organizers that use a grid.

I like the idea of making interfaces more fun, but I think that makes this interface useful mainly for novice photo organizers, who won't appreciate the more advanced metrics as much.

I like the idea of overlapping halos and geographic metrics for photos and having a clear interface probably keeps the workspace from being too obfuscating or overwhelming.

For future MediaGLOW work, I'd like to see relatedness based on inherent photo content, somewhat like what Google image search does.

Tuesday, March 30, 2010

Simplified facial animation control utilizing novel input devices: a comparative study

Authors:
Nikolaus Bee University of Augsburg, Augsburg, Germany
Bernhard Falk University of Augsburg, Augsburg, Germany
Elisabeth André University of Augsburg, Augsburg, Germany

Paper Link:
http://portal.acm.org/citation.cfm?id=1502650.1502680&coll=ACM&dl=ACM&type=series&idx=SERIES823&part=series&WantType=Proceedings&title=IUI&CFID=81639924&CFTOKEN=12013848

Animating facial expressions can be difficult because most facial expressions involve the simultaneous movement of different muscle groups.
Graphic designers can usually move only one muscle group at a time, through a bar slider and a mouse, which makes good facial animation difficult.
So the team used a gamepad and data gloves to allow parallel editing under several different mapping schemes.

The model for facial manipulation that the team used was "Alfred," a FACS-based (Facial Action Coding System) model with 23 action units.
FACS was used for Gollum in The Lord of the Rings, in King Kong, and in Half-Life 2.



For the gamepad, the team chose the XBOX 360 controller because it is ergonomic, cheap, familiar, and easily connected to a Windows PC.
The 360 controller offers 2 analog sticks and 2 analog triggers.
One stick offers x- and y-axis control with both negative and positive values, allowing four different parameters to be controlled by a single stick.
The other stick used a circular, polar-coordinate-based control.

The digital buttons and directional pad were used for other control functions, like switching the current setting and the action-unit mappings.
The gamepad's control mapping included three settings:
1) Upper face, with 7 action units
2) Lower face 1
3) Lower face 2 (inner lips)
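
To make the stick-to-parameter mapping concrete, here is a tiny sketch of how one stick's two signed axes can drive four action units at once. The AU names are placeholders, not the paper's actual assignments.

    def stick_to_aus(x, y):
        """x, y in [-1.0, 1.0] from one analog stick; each half-axis
        drives the intensity of one action unit."""
        return {
            "AU_brow_raise":  max(y, 0.0),   # push up
            "AU_brow_lower":  max(-y, 0.0),  # pull down
            "AU_lid_tighten": max(x, 0.0),   # push right
            "AU_lid_open":    max(-x, 0.0),  # push left
        }

    print(stick_to_aus(0.4, -0.7))
    # {'AU_brow_raise': 0.0, 'AU_brow_lower': 0.7, 'AU_lid_tighten': 0.4, ...}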



For the data glove, the team chose the "P5 Glove" which was originally designed for gaming, making it cheap and widespread. The data glove can provide for 5 simultaneous movements. The data gloves can register one dimensional finger bends and the orientation and positions of the hand making it a near perfect candidate to replace the traditional slider. The P5 glove has the following features:
• absolute position (x,y,z), relative position (x,y,z), and rotation
(yaw, pitch, roll)
• finger bend
• three additional digital buttons
Mapping of the data glove used 6 settings:
1) Brows - 3 AUs
2) Lids - 3 AUs
3) Cheek and Nose - 3 AUs
4) Corners of the Mouth - 4 AUs
5) Chin and Inner Lips - 4 AUs
6) Lips - 3 AUs
Selection among the 6 settings was done by moving the glove horizontally.

After running a correlation analysis of action units against expressions of joy, anger, fear, sadness, disgust, and surprise, and finding how frequently each action unit was used for each emotion, the team devised a context-based control mapping that assigns the most frequently used AUs to each emotion mode.

Professional Study:
They introduced the devices with coaching and listened to the game developers think aloud.

Directly mapped gamepad interface: liked.
Context-mapped gamepad interface: difficult to orient to and get familiar with; less control.

Data glove: less familiar, not accurate, physically tiring, noisy; selecting a setting was difficult.

Formal User Study:
1) How do users get along with the novel input devices versus sliders?
2) Do they enjoy them?
3) Assessment of technical features.

5-point scale; 17 users aged 20 to 40; 76% students.
Training phase: mess around with the device.
Modeling phase: recreate an expression from a photo.
Then a questionnaire.

             Accuracy   Speed      Satisfaction of expression
Gamepad      4.26       148.06 s   3.63
Sliders      4.56       168.29 s   3.84
Data glove   4.94       263.31 s   3.30

Gamepad - best mean scores, reduced production time, no loss of quality; preferred by 49%.

Data glove - lets the user stay focused on the work, but slower, lower quality, low comfort, insufficient accuracy; preferred by 24%.

Sliders - preferred by 27%; the user has to shift focus, but they are accurate, with reasonable satisfaction in the interaction experience.
____________________________________________

My Spill:
I was surprised to learn that gamepads aren't already the standard method of input in facial design. It seems like the ergonomic design and multiple functions would lend itself to that role and the user study seems to reflect that.

That said, I wish the data glove had produced better results; it seems like you could make facial animations quickly with 5 simultaneous levels of control. Maybe they needed a better mapping scheme for it.

I think using sliders sounds boring and too difficult given that you can only control one thing at a time.

I'd like to see more work on the P5 glove to make it as efficient and enjoyable as the other two methods.

Saturday, March 27, 2010

Opening Skinner's Box

(Comment left on Jillian Greczek's blog)

Author: Lauren Slater

In Opening Skinner's Box, Lauren Slater examines 10 of the most influential psychological experiments of the 20th century, applying her own views and interpretations in a nearly lyrical style to both entertain and illuminate readers on topics ranging from philosophy and existentialism to views on the sacredness of life and the human mind.

The 10 psychological Experiments were:
1) Skinner's experiments on rats, showing that autonomous responses are cued by rewards and reinforcement, meaning that simple animals can learn complex tasks and skills and are influenced more by reward than by punishment.

2) Milgram's shock/obedience-to-authority experiments, which put people in a situation where they were instructed to shock another human being as punishment. The results showed that about 65% would deliver "fatal" shocks. The experiments were deemed unethical and dehumanizing, and those involved were clearly changed by them, although a few claim they would not trade the experience.

3) Rosenhan's infiltration of psych wards by 8 normal people, which challenged the foundations of psychoanalysis. 7 of the 8 infiltrators were held in psych wards after complaining that they heard a voice say "thump" in their heads. The experiments made it clear that there was no reliable way to psychoanalytically define mental conditions, and that mental conditions may be more a matter of perception and labels than of actual illness.

4) Darley and Latane's discovery of the "diffusion of responsibility." Their experiments were inspired by the Kitty Genovese case in New York City, in which a woman was raped and killed in a prolonged attack with as many as 38 eyewitnesses who did nothing to stop it. In the experiment itself, a subject sat in a room listening to a recording of a man having a seizure, believing that others were also listening. It took over 6 minutes for most subjects to take action. In response, Darley and Latane developed the five stages of helping behavior:
1 - You, the potential helper, must notice an event is occurring.
2 - You must interpret the event as one in which help is needed.
3 - You must assume personal responsibility
4 - You must decide what action to take
5 - You must then take action

5) Festinger and his "theory of cognitive dissonance," whereby "the psychological opposition of irreconcilable ideas (cognitions) held simultaneously by one individual created a motivating force that would lead, under proper conditions, to the adjustment of one's belief to fit one's behavior" (rather than vice versa).
People alter their beliefs to justify their behavior or their current circumstances. Slater investigated cognitive dissonance through Linda and her daughter, who supposedly took in the pain of others to heal them. Festinger studied a cult that believed aliens would bring a cataclysmic event; when it never happened, he observed the believers' rationalizations and reactions. (They continued to believe despite the evidence, explaining away their own reactions.)

6) Harlow's experiments on macaque monkeys and the nature of love and affection. Harlow deprived monkeys of their mothers and constructed a metallic surrogate with milk and a soft surrogate without; the monkeys clung to the soft surrogate. Harlow found that proximity, touch, play, and affection are needed for primates to develop properly, which changed views on the kind of care infants need. It is ironic that cruelty done to monkeys revealed the nature of love and affection.

7) Alexander's "Rat Park" experiments, showing that addiction is situational and cultural: caged rats and rats in an idyllic rat park were both offered clean and heroin-laced water. Rats in the rat park stayed clean while the caged rats got high. Furthermore, rats forced to get high and then placed in the rat park would overcome their addiction even while going through withdrawal. A highly political finding.

8) Loftus's experiments on the nature of memory, showing that false memories can be created by mere suggestion. The chapter also covers her defense of people accused on the basis of suddenly "remembered" traumatic childhood events that never happened. In her experiments, family members suggested to a subject an episode of being lost in a mall; within 24-48 hours the subject would completely "remember" the fictional incident, down to minute details and feelings about it. This challenged the idea of repression and countered the popular thought and trends of the time.

9) Kandel's experiments on sea slugs, demonstrating the biological nature of learning and memory at the level of the neuron. He discovered CREB, which switches on the genes needed to produce the proteins that create permanent connections between cells; this is how learning and memory are created. Drugs are being developed to exploit this mechanism to enhance learning and recall in humans, which raises ethical questions.

10) Moniz and his lobotomies, which relieved anxiety and severe psychological symptoms. Lobotomies became popular for about two decades thanks to Moniz, but a backlash followed when less precise (though also less invasive and controversial) pharmacological alternatives appeared. Although the brain still holds many mysteries, lobotomies have today become much safer and more precise and may in some cases be preferable to pharmacological alternatives. They remain taboo and hard to find, mostly because of how lobotomies are perceived and the fear that the surgeries may "eliminate the spark" that makes us human. In essence, the brain is sacred.

__________

My spill:
The book itself is an interesting read that raises many intriguing and controversial questions about the nature of the mind and the sanctity of life.
Psychology has historically been a hard field to classify, and I think this book addresses that point, and most of its facets, including its human element, quite well.

The book is written in an artsy style that didn't quite meet my sensibilities about how these kinds of subjects should be addressed. Don't get me wrong, I like the arts and can appreciate cultured expositions, but Slater's presentation of the material felt strained, dishonest, and too skewed toward her own perceptions. I think if I met this woman, she and I would disagree on a large number of issues.

It is interesting to note that East Asians are more comfortable with paradoxes at a biological level.

Tuesday, March 23, 2010

The Inmates are Running the Asylum (part 2)

(Comment left on Aaron Loveall's blog)

The second half of TIARTA is more of a prescriptive fix: how companies could implement effective design in their software development process.

Cooper points out that a company's businessmen are too focused on viability, while its programmers are too focused on capability.
Cooper claims that designers bridge the gap by providing desirability and earning customer loyalty.

Cooper recommends that companies spend more time early in the development process to clearly identify the goals of the system and specific fictional target users that reflect reality (called personas), and then design for those crucial people.

Cooper also said that well-designed software should:
take an interest in me
be forthcoming
have common sense
anticipate my needs
be responsive
be taciturn about its personal problems
be well informed
be perceptive
be self-confident
stay focused
be fudgable
give instant gratification
be trustworthy

After defining personas and their goals, scenarios can be constructed, defined in breadth rather than in depth:
-Daily uses: well designed
-Necessary uses: available
-Edge cases: addressed

Cooper then discusses design-friendly business practices and conceptual integrity in product vision (like letting designers, rather than users or programmers, run the process).

_________

My spill:

The second half of the book seemed more practical and useful since it actually prescribed some action for good business and software development.

All of the recommendations made by Cooper seem reasonable.
Design is important and should come first, and goal-oriented, persona-based design seems quite sensible.

Cooper still seems like he's bashing programmers a little bit though.

Thursday, March 11, 2010

Extending 2D Object Arrangement with Pressure-Sensitive Layering Cues

(Comment left on Jillian Greczek's blog)

Authors:
Philip L. Davidson Perceptive Pixel, Inc, New York, NY, USA
Jefferson Y. Han Perceptive Pixel, Inc, New York, NY, USA

Paper Link:
http://delivery.acm.org/10.1145/1450000/1449730/p87-davidson.pdf?key1=1449730&key2=6198338621&coll=ACM&dl=ACM&CFID=81067528&CFTOKEN=37358406

Davidson and Han present a pressure-sensitive depth-sorting technique built on two-dimensional multi-touch manipulation.

This depth sorting is most commonly known as layering.
Layering of objects in current systems is usually done via mouse input, with a relative control model in which operations happen discretely.
For example, you can click an element and tell it to "go to back" or "bring forward."
However with tabletop systems coming that utilize multi-touch technology, these kinds of commands are awkward.

Davidson's and Han's system allows the user to tilt and uplift objects on a pressure sensitive multi-touch surface and accurately manipulate and sort objects.



The system has several features:
-Windows/objects can be peeled back to uncover objects below
-Pressure Sensing that allows the exact ordering relative to other windows
-Multi-touch commands for resizing, moving, tilting, and rotating objects
-Audio and haptic feedback for overlap and tilting events.

The system also offers the benefit of minimizing the amount of control artifacts relating to depth sorting on the UI.
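
Here is a toy sketch of the core idea of pressure as a depth cue. The pressure readings are invented, and the real system resolves ordering continuously as you press, not in one batch like this.

    # Invented normalized pressure readings for three touched windows.
    windows = {"photo": 0.8, "notes": 0.2, "map": 0.5}

    def z_order(pressures):
        """Press harder to push a window deeper into the stack."""
        return sorted(pressures, key=pressures.get)

    print(z_order(windows))  # ['notes', 'map', 'photo'], top to bottom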

________________

My spill:

This is a very solid piece of research on the layering of objects, and I can definitely see how it will be useful in tabletop systems and projected smart spaces.

I agree with the observations in their future work that permanently curled and folded corners would be a benefit. I'd also like to see how this kind of interaction could be applied to more three-dimensional figures on an interactive surface.

I really don't see too many drawbacks to this work.

Towards More Paper-like Input: Flexible Input Devices for Foldable Interaction Styles

(Comment left on Aaron Loveall's blog)

Authors
David T. Gallant Queen's University, Kingston, ON, Canada
Andrew G. Seniuk Queen's University, Kingston, ON, Canada
Roel Vertegaal Queen's University, Kingston, ON, Canada

Gallant et al. present the Foldable User Interface (FUI), a paper-like input device for more organic, paper-like manipulation of on-screen objects like windows, pages, or three-dimensional models.

The main benefit of their interface is that it is cheap, unlike most similar input devices. It is also fairly robust and accurate.

To implement FUI, they used an IR webcam, an LCD screen, and a foldable input device (FID) made of black cardstock with 25-35 infrared reflectors made of 3M retro-reflective tape.



FUI has several interaction techniques:
Thumb Slide - Select, click, pop-up menus
Scoop Shape
Top Corner Bend - Bookmarking
Hover - Magnify/Zoom
Fold - Helps create 3D models
Leafing - Turning Pages
Shake - triggers discrete events (like sorting)
Squeeze

Navigation is done by moving the FID itself.
________________
My Spill:

While the FUI is an interesting idea with good observations about the properties of paper, I don't see this being a widespread method of everyday human-computer interaction.




However, I could see where this kind of technique might be useful in creating three dimensional models for certain kinds of specialists.

If they presented a literal desktop interface where documents are represented by paper-like objects (I know there are a few out there), this system would be much more appealing. But for current GUIs, this isn't a very useful input system.

Wednesday, March 10, 2010

Annotating Gigapixel Images

(Comment left on ________'s Blog)

Authors:
Qing Luan University of Science and Technology of China, Hefei, China
Steven M. Drucker Microsoft Live Labs Research, Redmond, WA, USA
Johannes Kopf University of Konstanz, Konstanz, Germany
Ying-Qing Xu Microsoft Research Asia, Beijing, China
Michael F. Cohen Microsoft Research, Redmond, WA, USA

Paper Link:
http://portal.acm.org/citation.cfm?id=1449715.1449722

Luan et al.'s system provides a way of annotating gigapixel images with three kinds of annotations:
1) Looping Sounds
2) Triggering Narrations
3) Visual Labels

The system also exhibits hysteresis: sounds persist after you move away, and the strength of a sound increases as you get closer.

Smaller annotations gradually appear as the user stays on a particular part of the image.

The user can add annotations and provide audio files. The size of the annotation marker is referenced against the size of the original image, and that reference value determines the strength of the associated audio files and the size of the annotation labels.
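
Here is a small sketch of the audio hysteresis with invented thresholds: the sound switches on only once you get close enough, scales with proximity, and keeps playing until you move clearly away again.

    class AnnotationSound:
        START, STOP = 0.6, 0.4  # different on/off thresholds = hysteresis

        def __init__(self):
            self.playing = False

        def gain(self, proximity):
            """proximity in [0, 1]; 1.0 means fully zoomed in."""
            if not self.playing and proximity >= self.START:
                self.playing = True
            elif self.playing and proximity < self.STOP:
                self.playing = False
            return proximity if self.playing else 0.0

    s = AnnotationSound()
    for p in (0.3, 0.7, 0.5, 0.35):
        print(p, s.gain(p))  # still audible at 0.5, silent again at 0.35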

_____________

My spill:

This kind of annotation system has obvious applications for things like Google Earth, space maps, and biological systems.

Providing these annotations makes exploring these kinds of systems more fun, engaging, and informative, and I could see this system being used for educational purposes very easily.

The only thing I would want added is a way to jump your point of view to a chosen place within the image while zoomed in or panned out.

Lightweight Material Detection for Placement-Aware Mobile Computing

(Comment left on Randy Ransom's blog)

Authors:
Chris Harrison Carnegie Mellon University, Pittsburgh, PA, USA
Scott E. Hudson Carnegie Mellon University, Pittsburgh, PA, USA

Paper Link:
http://delivery.acm.org/10.1145/1450000/1449761/p279-harrison.pdf?key1=1449761&key2=3175428621&coll=ACM&dl=ACM&CFID=81067528&CFTOKEN=37358406

Harrison and Hudson developed a lightweight, cheap sensor for detecting the placement of mobile devices such as cell phones, iPods, and laptops. The sensor allows a device to detect the context/location it is placed in and react accordingly.

For example, a cell phone placed in a pocket doesn't have to light up its screen to announce an incoming call; it just has to ring or vibrate. By not lighting the screen, the phone can save power and extend its battery life.

The sensor they implemented:
1) provides information on the space surrounding the device;
2) requires no external infrastructure to operate;
3) makes the resulting data available to the device.



The sensor itself is made of a photoresistor, which measures light intensity, and a TSL230 light-to-frequency converter.
The sensor also has light-emitting diodes:
1) Infrared
2) Red
3) Green
4) Blue
5) Ultraviolet
These illuminate the surrounding area so that the sensor can pick up the light reflected back toward the device and deduce what kind of environment it is in.

The sensing routine takes only 25 ms and consumes very little power: 20 mA when active.

They tested 27 sample materials over 6 trials, where the first 5 trials trained a naive Bayes classifier and the 6th determined the accuracy of the sensor. They found that the overall accuracy of the device was 86.9%.
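
As a sketch of the classification step, here is a naive Bayes model over a five-LED reflectance vector. The training numbers and material labels are invented; the paper's real data comes from the sensor itself.

    from sklearn.naive_bayes import GaussianNB

    # Columns: reflectance under IR, red, green, blue, UV illumination.
    X_train = [[0.9, 0.7, 0.6, 0.5, 0.2],   # wooden desk
               [0.2, 0.1, 0.1, 0.1, 0.0],   # trouser pocket
               [0.8, 0.8, 0.8, 0.8, 0.6]]   # white table
    y_train = ["desk", "pocket", "table"]

    clf = GaussianNB().fit(X_train, y_train)
    print(clf.predict([[0.85, 0.72, 0.61, 0.52, 0.25]]))  # ['desk']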

Next they conducted a 16-person survey of the environments in which several mobile devices end up and the materials found in those environments.

Rerunning the previous tests with those materials raised the accuracy to 94.4%.

___________

My spill:

I'm all for making devices smarter by identifying their context. This makes machines more useful and means fewer intentional commands are needed to get what you want out of them.

I think they've succeeded, for the most part, in devising their sensor. Hopefully businesses will pick up on it and build it into their devices.
With a little bit of advertising, we could see a new generation of mobile devices that are smarter and more energy efficient.

Their work on the sensor itself seemed pretty flawless. If they had to make any improvements, I would have expanded the number of survey participants, or taken the time to build the sensor into a number of devices and implement a few uses to showcase the work.

Wednesday, March 3, 2010

Understanding the Intent Behind Mobile Information Needs

(Comment left on Jarratt Brandon's blog)

Authors:
Karen Church Telefonica Research, Barcelona, Spain
Barry Smyth University College Dublin, Dublin, Ireland

Paper Link:
http://delivery.acm.org/10.1145/1510000/1502686/p247-church.pdf?key1=1502686&key2=3607667621&coll=ACM&dl=ACM&CFID=76752576&CFTOKEN=55465958

Church and Smyth conducted a diary-based study of mobile internet usage and compared it to non-mobile internet usage.
They defined "mobile" on a location basis, where anywhere away from home or the office counted as mobile. This allowed them to take all mobile devices into account.



The research team divided the entries into 6 location contexts (enumerated in a figure in the paper).

Their study had 20 participants, with an average age of 31, record diary entries describing all of their information needs met via internet search over a month.
Participants were reminded once a week to enter entries.
Their guiding methodological principle was to minimize interaction so that participants would generate very natural data for analysis.

The study generated 405 entries, of which 67% were made while mobile.
34% of the entries were in the on-the-go location context.

Their study also revealed that, in addition to Broder's classification of search intents,
-navigational
-informational
-transactional

there need to be 2 new categories:

-geographical
-personal information management (PIM) (personal items, tasks, scheduling)

In addition, their study indicates that mobile search engines should be able to pick up on location, time, context, and the user's current activity.
It also shows that a large portion of mobile searches are non-informational and principally geographical.
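Just to make the extended taxonomy concrete, here is a minimal sketch of a crude intent classifier over these five categories. The keyword rules are entirely made up for illustration; nothing like this appears in the paper:

    # Toy intent classifier over Broder's taxonomy plus the two mobile
    # categories Church and Smyth propose. Keyword rules are invented.
    INTENT_RULES = [
        ("geographical", ["near me", "directions to", "nearest", "where is"]),
        ("pim", ["my calendar", "remind me", "todo", "my contacts"]),
        ("transactional", ["buy", "book", "download", "order"]),
        ("navigational", ["facebook", "youtube", "login", ".com"]),
    ]

    def classify_intent(query: str) -> str:
        q = query.lower()
        for intent, keywords in INTENT_RULES:
            if any(k in q for k in keywords):
                return intent
        return "informational"  # default bucket, as in Broder's scheme

    print(classify_intent("directions to the nearest pharmacy"))  # geographical
    print(classify_intent("remind me to call barry"))             # pim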



_________

My spill:
This sounds like an interesting study aimed at making mobile search more effective and more sensitive to user demands in mobile contexts. The study generated pretty much what you would expect from a study like this.

Mobile devices need to be able to search for directions and have an idea of your location to generate useful search results.

I think the biggest improvements in mobile search need to come from interfaces that are easier to view and navigate, and from making the infrastructure for such searches MUCH quicker. In fact, many users in their study cited slowness as a main reason why they didn't search on a mobile device at all.

So I guess I'm saying they're putting the cart before the horse with this study. Then again, maybe I'm being too critical.

I like the idea of making a device more aware of context. Context-blindness is a major flaw in computer systems in more ways than one: language translation, understanding user input... Context awareness would make computers much smarter.

Although I think they did an excellent job with their methodology, I would have liked to have seen more users in this kind of study to build a more accurate picture of user behavior. Future work might try to identify personal-information searches more accurately and tie those searches in with scheduling programs and the like on the mobile device, or maybe interface with your own computer at home.

Saturday, February 27, 2010

Collaborative Translation by Monolinguals with Machine Translators

Authors:
Daisuke Morita Kyoto University, Kyoto, Japan
Toru Ishida Kyoto University, Kyoto, Japan

Paper Link:
http://delivery.acm.org/10.1145/1510000/1502701/p361-morita.pdf?key1=1502701&key2=4634337621&coll=ACM&dl=ACM&CFID=76752576&CFTOKEN=55465958

Morita & Ishida created a collaborative translation process that allows monolingual people to communicate reliably with one another, using a machine translator as the intermediary.

In this system, one person acts as the source language provider, entering sentences in their own language. The source sentence is translated by a machine translator and viewed by the target user, who speaks only the target language.

The target language user modifies the sentence so that it makes sense and sends it back to the source user via the machine translator. If the sentence sent back to the source user has the same meaning as the one originally sent, then the source user accepts it, and the meaning of the message is confirmed for both sides of the conversation. If it does not, the process is repeated.
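A minimal sketch of this confirmation loop, with all function names invented and a stub standing in for the machine translator:

    # Sketch of the collaborative translation loop described above.
    # machine_translate() is a placeholder; a real system would call
    # an actual MT engine here.

    def machine_translate(sentence: str, src: str, dst: str) -> str:
        return sentence  # stub: swap in a real MT service

    def collaborate(source_sentence, ask_target_to_repair, source_accepts,
                    src="en", dst="ja", max_rounds=5):
        """Loop until the source user confirms the round-trip meaning."""
        for _ in range(max_rounds):
            translated = machine_translate(source_sentence, src, dst)
            repaired = ask_target_to_repair(translated)   # human fixes fluency
            back = machine_translate(repaired, dst, src)  # back-translation
            if source_accepts(source_sentence, back):     # same meaning?
                return repaired                           # confirmed
            # Otherwise the target user tries a different repair next round.
        return None  # the meaning never converged

    # Toy usage with trivially agreeable humans standing in for real users:
    result = collaborate("The meeting is at noon.",
                         ask_target_to_repair=lambda s: s,
                         source_accepts=lambda a, b: a == b)
    print(result)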

The collaborative system solves two important problems in machine translation:
1) Mistranslation by the machine translator.
2) Nonsensical translations (usually the result of puns in the source language or extreme differences in sentence structure).

The software associated with the system also provides highlighting to show a sentence's progress through the translation process.

_______________
My spill:

Since I regularly deal with Google Translate and the mangled translations it gives for any Japanese I feed it, I think this is a wonderful tool.

The main problem I see is that communicating this way would be agonizingly slow. I guess the real use for this system is when you really want to ensure there are no miscommunications, like in a business deal.
For casual situations, it would be too slow.

It's also worth noting that this paper had a ton of grammatical errors, which makes it hard to take the work seriously. (But English-Japanese and Japanese-English translation can be difficult, so I'll forgive them. ^_^)

In future work, I'd like to see some way to ensure not only that the meaning is preserved but also that the grammar stays intact and the quality of the sentence is maintained. That would be impressive.

Multi-touch Interaction for Robot Control

(comment left on: Jacob Faire's blog)

Authors:
Mark Micire University of Massachusetts Lowell, Lowell, MA, USA
Jill L. Drury The MITRE Corporation, Bedford, MA, USA
Brenden Keyes The MITRE Corporation, Bedford, MA, USA
Holly A. Yanco University of Massachusetts Lowell, Lowell, MA, USA

Paper Link:
http://delivery.acm.org/10.1145/1510000/1502712/p425-micire.pdf?key1=1502712&key2=3044037621&coll=ACM&dl=ACM&CFID=76752576&CFTOKEN=55465958

The researchers in this paper developed a multi-touch interface to control an urban search-and-rescue (USAR) robot.
Their primary objective was to observe how users would interact with the affordances provided by the control interface and what insights could be drawn from those observations.

In their controller they provided a digital screen that showed:
1) A map generated by the robot as the user explored a space
2) A front-view display
3) A rear-view display
4) A generated display of the area immediately surrounding the robot
5) A control panel with 4 directional arrows, a speed-control slider, and a brake button
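As a hedged sketch (my own, not the authors' code) of how the control panel elements might map to a robot velocity command:

    # Hypothetical mapping from the on-screen control panel to a robot
    # velocity command; not taken from the paper.
    from dataclasses import dataclass

    @dataclass
    class VelocityCommand:
        linear: float   # forward/backward speed
        angular: float  # turn rate

    ARROW_DIRECTIONS = {
        "up": (1.0, 0.0),
        "down": (-1.0, 0.0),
        "left": (0.0, 1.0),
        "right": (0.0, -1.0),
    }

    def panel_to_command(arrow: str, speed_slider: float, brake: bool) -> VelocityCommand:
        """speed_slider in [0, 1]; the brake button overrides everything."""
        if brake or arrow not in ARROW_DIRECTIONS:
            return VelocityCommand(0.0, 0.0)
        linear, angular = ARROW_DIRECTIONS[arrow]
        return VelocityCommand(linear * speed_slider, angular * speed_slider)

    print(panel_to_command("up", 0.5, brake=False))  # VelocityCommand(linear=0.5, angular=0.0)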



With this controller, they had 6 users, each already trained to operate the robot with a joystick, drive the robot using the multi-touch controller.

The results of the study showed that their controller design produced a wide array of emergent behaviors, and they concluded that they needed to provide clearer affordances in the controls as well as separate camera and movement controls.

_____________________

My spill:
I think the idea of using multi-touch for robots makes sense, but I feel their approach wasn't very ambitious or original. First of all, most video game developers could have told them how to make an efficient controller for an entity separated from the user, and that separate camera and movement controls are essential.

For future work, I'd like to see them implement controls for a robot with something more of a human form, where the user could control the arms and legs with their own arms and legs.
That would be cool.

Providing clear affordances is important, and I think they really needed to focus on that a bit more.

Thursday, February 25, 2010

CRAFTing an Environment for Collaborative Reasoning

(comment left on Patrick Webster's blog)

Authors:
Susanne C. Hupfer IBM T.J. Watson Research Center, Cambridge, MA, USA
Steven I. Ross IBM T.J. Watson Research Center, Cambridge, MA, USA
Jamie C. Rasmussen IBM T.J. Watson Research Center, Cambridge, MA, USA
James E. Christensen IBM T.J. Watson Research Center, Cambridge, MA, USA
Stephen E. Levy IBM T.J. Watson Research Center, Cambridge, MA, USA
Daniel M. Gruen IBM T.J. Watson Research Center, Cambridge, MA, USA
John F. Patterson IBM T.J. Watson Research Center, Cambridge, MA, USA

Paper Link:
http://delivery.acm.org/10.1145/1510000/1502704/p379-hupfer.pdf?key1=1502704&key2=4081617621&coll=ACM&dl=ACM&CFID=76752576&CFTOKEN=55465958

This paper covers research on collaborative reasoning and sensemaking for large-scale "wicked" problems.
The authors describe sensemaking as "a motivated continuous effort to understand connections (which can be among people, places, and events) in order to anticipate their trajectories and act effectively."

Their research goals in collaborative reasoning and problem solving led them to focus on semantics, collaboration, and adaptability, and to design a system that guides collaborative problem solving around these goals.

To this end, the IBM team developed CRAFT (the Collaborative Reasoning and Analysis Framework and Toolkit), which offers a generalized, visual way to create an ontological model (basically an object-oriented visualization system) that tracks relationships between entities. This visualization provides a lingua franca (common tongue) for exchanging information among members of an investigative team working a problem.



Alongside entity tracking, the system can continually update and evolve the existing data and metadata on its objects through continued inquiry and searching.

The system also gives users awareness of the other entities and users working in CRAFT, letting them see what inquiries and updates others have made.

Making an inquiry into the system can also uncover inquiries previously made by others, and entities that share a name can be flagged so that a user can either identify them as the same entity or disambiguate the same-named entities.
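As a rough illustration of the same-name flagging idea (my own toy data model, not CRAFT's actual implementation):

    # Toy entity store illustrating CRAFT-style same-name flagging.
    # When a new entity shares a name with an existing one, it is queued
    # for a human to either merge (same entity) or disambiguate.
    from collections import defaultdict

    class EntityStore:
        def __init__(self):
            self.by_name = defaultdict(list)  # name -> list of entity records
            self.flags = []                   # pending merge/disambiguate decisions

        def add(self, name: str, attributes: dict) -> dict:
            record = {"name": name, "attributes": attributes}
            if self.by_name[name]:
                # Name collision: queue it for an analyst to resolve.
                self.flags.append((name, self.by_name[name], record))
            self.by_name[name].append(record)
            return record

    store = EntityStore()
    store.add("Acme Corp", {"type": "company", "hq": "Boston"})
    store.add("Acme Corp", {"type": "company", "hq": "Denver"})
    print(len(store.flags))  # 1 -- same name, possibly different entities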

CRAFT also provides investigation nodes that let the user question, hypothesize, inquire, and gather evidence for a particular model, question, or investigation. These nodes let the user model a particular scenario and collect evidence for or against it.
(For example, an investigation node on a corporation might include stock quotes, relevant information, and expert opinion on the movement of the stock.)

_______________________

My spill:

This sounds like an interesting problem. You always hear in the news about multiple organizations that are unable to cooperate and share critical information, leading to some disaster or another. The CRAFT system seems like it could solve these kinds of problems.

They did mention that they needed to give the system access to the internet. I think that kind of feature is critical for a system like this, both for access to the gigantic amount of data available on the web and for the ability to collaborate with multiple parties across networks.

I could see information security being a big problem for a collaborative system like this, especially in criminal investigations and the like.

Intelligent Wheelchair (IW) Interface using Face and Mouth recognition

(comment left on Kerry Barone's blog)

Authors:
Jin Sun Ju Konkuk University, Seoul, South Korea
Yunhee Shin Konkuk University, Seoul, South Korea
Eun Yi Kim Konkuk University, Seoul, South Korea

Paper Link:
http://delivery.acm.org/10.1145/1510000/1502693/p307-ju.pdf?key1=1502693&key2=2377217621&coll=ACM&dl=ACM&CFID=76752576&CFTOKEN=55465958

Ju et al. developed an intelligent wheelchair (IW) system built around 4 objectives:

1) Make a non-intrusive system for controlling a wheelchair that can be used by those disabled from the neck down.
2) Make the system usable at all times of the day.
3) Make the system accurately discriminate between intentional and unintentional commands, to decrease user frustration and increase system correctness.
4) Make the system able to recognize and avoid obstacles.



Their first objective, a non-intrusive system, meant that they had to avoid anything that touched the face or head to control the chair. So they used a Logitech PC camera to monitor the face's orientation, eye movements, and mouth shape. The user can tilt their face and eyes left or right to steer in those directions, while the mouth shape controls forward movement: a "Go" position signals the IW to move forward and an "uhm" position signals it to stop.

Objective 3 was accomplished by making the system recognize whether the user was facing forward or looking in another direction. If the user was facing forward, commands were accepted; otherwise, they were ignored as unintended.
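A minimal sketch of that gating logic, with an invented yaw threshold (the paper's actual decision rule may differ):

    # Hypothetical command gating: accept face/eye/mouth commands only
    # while the user is facing forward. The threshold is invented.
    from typing import Optional

    FACING_FORWARD_MAX_YAW_DEG = 15.0

    def gate_command(face_yaw_deg: float, command: str) -> Optional[str]:
        """Return the command if it looks intentional, else ignore it."""
        if abs(face_yaw_deg) <= FACING_FORWARD_MAX_YAW_DEG:
            return command   # facing forward: treat as intentional
        return None          # looking away: treat as unintended

    print(gate_command(5.0, "left"))   # 'left'
    print(gate_command(40.0, "left"))  # None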

Their fourth objective was achieved by implementing 10 range sensors (2 ultrasonic and 8 infrared) that monitor the area around the IW. One fault of the system was a few blind spots around the IW that caused it to bump into objects sitting in those spots.





In the first study, they measured the accuracy of the facial recognition interface by placing users in varying lighting conditions and backgrounds. The average time to process a frame was 62ms, which they report as 19 frames processed per second. They also measured the recall and precision of the four commands (left, right, stop, go) and found that average recall was 96.5% and precision was an impressive 100%. Half of the users were able-bodied and the other half had disabilities.

In the second study, 10 able-bodied users (half male, half female) used three kinds of wheelchair control (joystick, headband, and the IW system) to navigate a course while the time to complete the run was measured.

They found that the joystick was the quickest method both before and after training, and that the headband method was about 2 seconds faster than their system. Once users were trained in the control methods, though, the IW system became slightly faster (by a few milliseconds) than the headband method.

________________
My Spill:

I think this system provides a very reliable way for the extremely disabled to navigate a wheelchair, free of the annoyance of intrusive control methods, which is good.
However, I think they need to consider people who can't necessarily control even their neck muscles (of whom I know a few). But then again, you can't please everyone.

The fact that the system works about as well as the headband method is encouraging, and it is interesting that they provide obstacle recognition.

They need to implement better sensing of the surrounding environment to have a truly intelligent chair, but I think that is a relatively minor problem. I am more worried about how much energy it takes to run all these sensors and to power the computer that interprets all this data.

They also need to provide more complex controls to refine the movement of the IW.

Emotional Design

(comment left on Nic Lupfer's blog)

by: Donald A. Norman

In his book, Norman goes beyond the simple usability and affordances of the design of everyday things and analyzes the emotional aspects of design: how we feel about the things we encounter on a day-to-day basis.

Norman identifies three levels of emotional design:

Visceral - Associated with the most primitive, inbuilt instincts and intuitions given to us by nature. This is how we judge things as "pretty" (which concerns symmetry, bright colors, etc...) or as "good tasting" (usually sweet things).

Behavioral - How the product feels and operates. This is the rational level of design and includes how well the instrument fulfills its purpose.

Reflective - Appeals to our emotions and includes the cultural influences that affect how we feel about a product. It is how a product makes us happy on reflection, and how we learn to like bitter or sour things through "acquired taste."

Norman also emphasizes the role of fun in improving the value of a product, and he discusses the role of emotions in communication devices and in devices where attachment comes from interpersonal interactions.

From there, Norman applies his findings to robotics, arguing that an effective artificial intelligence would need emotions reflecting the three levels of emotional design.

__________________
My spill:
Norman's ideas in this book differ greatly from those in his masterpiece "The Design of Everyday Things," but I think they encompass a much more realistic (if less concrete) account of the value and use of everyday things.

Norman's split of emotional design into the visceral, behavioral, and reflective levels follows traditional theories of the self and seems an enlightening way to look at the world.

I thought Norman's ideas were most interesting when applied to the world of robotics. I've come to believe that "emotions" truly are the missing link in creating strong AI.

I will criticize Norman's work in that it doesn't provide clear guidelines for maximizing any of the three aspects of emotional design, although it does provide many examples of good design in each area. This makes it hard for an engineer/designer to consistently make use of his ideas.

Wednesday, February 17, 2010

Learning from IKEA hacking: i'm not one to decoupage a tabletop and call it a day.

(Comment left on: Nicholas Lupfer's Blog)

Researchers:
Daniela Rosner - School of Information - University of California - Berkeley CA
Jonathan Bean - Department of Architecture - University of California - Berkeley CA

Paper Link:
http://delivery.acm.org/10.1145/1520000/1518768/p419-rosner.pdf?key1=1518768&key2=7380346621&coll=&dl=&CFID=76581338&CFTOKEN=95309429

Rosner and Bean conducted a qualitative study of a particular online community whose members identify themselves as IKEA hackers, examining the growing interest in personalization and Do It Yourself (DIY) culture. IKEA hackers represent "an intersection between online culture and the material world of creative practitioners."

IKEA hackers take IKEA products and modify them into unique creations. Examples include the GYNEA chair: two IKEA chairs made into a single gynecology chair with comfortable leg rests.



The study was a simple set of nine half-hour interviews in which the researchers questioned the hackers about their motivations, inspirations, and various creations.
The researchers discovered 3 themes:
1) Identity and Creativity
2) Technology
3) Hacking

Most hackers felt a kind of creative expression in IKEA hacking that made them feel like valued individuals, while simultaneously identifying as part of a community that shared an interest in creating useful and unique products.
One participant labeled this idea "non-concurrent collaboration."

Most hackers also got satisfaction from the haptic sense of physically manipulating objects and noted that they couldn't get this sensation from traditional computer-based hacking.
They also felt that RL (real life) hacking had a constructive feel, as opposed to the destructive feel of computer-based hacking.

____________
My Spill:
One interesting idea to come out of this study was that parties interested in collaborative design should provide tools that encourage the performance of collaborative values, as well as a common medium for the collaboration itself.
There has to be some kind of business idea that could take advantage of that.

It is interesting to see how much web-based culture can intersect with RL.

The main drawback of this study is probably the lack of quantitative data to work with.
But given the aims of the paper, that is probably only a minor drawback.
I would have liked them to discuss a few more IKEA hacking creations to give a better feel for the process.