This report concerns the issues of human-robot interaction, in particular the issues related to using the Internet as a communication medium. Various interface architectures and models are reviewed. The report also covers languages used for robot programming, knowledge representation, and inter-agent communication. Related Internet technologies and standards are outlined. Finally, a human-robot interface via the Internet is proposed for the monitoring and control of an autonomous guided vehicle.
Whether working in a human environment or performing a task at a remote site, most modern robots eventually interact with humans. In the latter case the interaction has a lot in common with human-computer interaction. At first sight the former case is essentially different, because a physical agent is present. However, a human is, to a significant extent, also a physical agent. Modern techniques for human-computer interaction therefore account for this factor by introducing a variety of multimodal interfaces, including (a) virtual reality, where artificial physical agents are created to interact with a human, (b) sensors that facilitate physical interaction with a human, and (c) actuators that give physical feedback. These techniques overlap significantly with the vision of how robots should interact with humans. Indeed, a robot may be seen as a computer with actuators and sensors. The difference from human-computer interfaces is that not all of these actuators and sensors are dedicated exclusively to interfacing with a human.
In this report we restrict our scope to the software-related issues of human-robot interfaces (HRIs), with emphasis on interface architectures and languages.
Since the earliest robots, HRI has concentrated primarily on two areas: offline programming and teleoperation. The interface and language issues of robot programming will be discussed in Section 0.1.2. Teleoperation is not a new issue, but recent developments in communication and computer technologies have brought both new possibilities and new problems. Fast, reliable networks are widely used for advanced teleoperation applications (e.g. remote interaction via ISDN [49]). The low price, wide availability and ever-growing quality of Internet services have made the Internet an attractive medium for teleoperation [4,50,46,9,38]. However, unlike broadband communication channels, the Internet does not guarantee a high quality of service.
The lack of quality-of-service guarantees over the Internet limits the applications of Internet-based telerobotics. It also suggests using more complex interface models, namely interface agents that control the robot at a higher level rather than through direct teleoperation. This relaxes the time constraints imposed by the real-time robotic system. Although there are agent applications on robots [27,39] and agent-based interfaces [34,41], few works combine agents embedded in robots and user interface agents in a single agent space [30,47].
Another problem related to teleoperation is data representation. The works [28,29] show that operating in configuration space significantly improves human performance in the teleoperation of a robot manipulator.
Advances in virtual reality technologies help to cope with problems inherent in teleoperation, such as delays and feedback. The proposed solutions include prediction and estimation techniques using multiple sensors [1,5,25,26,40], macro programming techniques [26,51], and task-oriented concepts [33]. The work in [14] recognizes the need to "give commands to the robot system on different levels of abstraction" to overcome these problems.
Modern robotics targets the service and entertainment markets, so robots should be equipped with more human-friendly interfaces. The two natural ways in which humans communicate with each other are verbal and visual communication.
Early works on verbal communication include task specification in a descriptive manner, similar to the conversational patterns between humans. For example, "turn left at the 3rd intersection and go to the 2nd intersection, which is three-forked" is given as:
<3, *, left> <2, 3, goal>
where "*" means that the attribute is unknown [42]. Judging from the example, each triple gives the ordinal number of the intersection, the number of branches at it, and the action to take there.
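The Python sketch below illustrates this notation. It parses such triples and turns them back into instructions. The reading of the fields (intersection ordinal, number of branches, action) is inferred from the single example above. The names `RouteStep`, `parse_route` and `describe` are hypothetical, and this is not the representation used in [42].

```python
from typing import NamedTuple, Optional


class RouteStep(NamedTuple):
    """One <ordinal, branches, action> triple; None stands for '*' (unknown)."""
    ordinal: int              # which intersection, counted along the route
    branches: Optional[int]   # number of roads meeting there, None if unknown
    action: str               # what to do there: "left", "right", "goal", ...


def parse_route(text: str) -> list[RouteStep]:
    """Parse a string such as '<3, *, left> <2, 3, goal>' into RouteStep triples."""
    steps = []
    for chunk in text.split(">"):
        chunk = chunk.strip().lstrip("<")
        if not chunk:
            continue
        ordinal, branches, action = (part.strip() for part in chunk.split(","))
        steps.append(RouteStep(int(ordinal),
                               None if branches == "*" else int(branches),
                               action))
    return steps


def describe(step: RouteStep) -> str:
    """Render a triple back into an English instruction."""
    shape = "" if step.branches is None else f" ({step.branches}-forked)"
    if step.action == "goal":
        return f"go to intersection #{step.ordinal}{shape}: this is the goal"
    return f"turn {step.action} at intersection #{step.ordinal}{shape}"


route = parse_route("<3, *, left> <2, 3, goal>")
for step in route:
    print(describe(step))
```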
Later works use this kind of language for data representation in a robot knowledge base, but the data is generated through true verbal communication in a human language [7,13,35].
More multimodal communication interfaces have been suggested that use both verbal and nonverbal communication, e.g. gestures [48].
Even more complex models were utilized in MUSIIC (Multimodal User Supervised Interface and Intelligent Control) [32]. There, speech, gestures and the robot's own autonomous planning techniques were integrated in a single multimodal user interface system.
Several common languages have been proposed for agent communication and knowledge representation. The agent communication languages include KQML (Knowledge Query and Manipulation Language) [10], FIPA (Foundation for Intelligent Physical Agents) [11], AOP (Agent-Oriented Programming) [43] and Telescript [52]. KIF (Knowledge Interchange Format) [21] is a language based on first-order predicate calculus for expressing the content of a knowledge base.
This chapter discusses agent languages and agent system architectures. It places particular emphasis on communication and knowledge representation languages and on agent communication scenarios.
The major agent communication languages are AOP (Agent-Oriented Programming) [43], Telescript [52], KQML (Knowledge Query and Manipulation Language) [10] and FIPA (Foundation for Intelligent Physical Agents) [11].
Agent-Oriented Programming (AOP) emphasizes the intentional description of an agent. It comes with a programming language in which one can program agents that communicate and evolve. In AOP, agents are defined as entities whose state is viewed as consisting of mental components such as beliefs, capabilities, choices and commitments. AOP also introduces a formal language, with syntax and semantics, to describe mental states. In addition, it introduces an interpreted programming language, AGENT-0, whose semantics are consistent with the semantics of mental states. AGENT-0 includes a communication language that introduces primitives for the interaction of agents. These primitives are speech acts. Their semantics are given in terms of their execution: communication acts update the belief and commitment spaces of an agent. They may also be executed conditionally, depending on whether certain mental states hold. AGENT-0 was intended as a prototype to illustrate the principles of AOP and is limited in many ways. The facts expressed in AGENT-0 have to be atomic sentences (no logical connectives are allowed), and commitments can be made only to primitive actions. AGENT-K is an AOP language that extends AGENT-0 to use KQML for communication (more on KQML in Section 1.1.3).
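The update mechanism described above can be sketched in Python. An inform adds an atomic fact to the belief space, and a request adds a commitment only when a mental condition holds. This is a hypothetical analogue for illustration, not AGENT-0 syntax or its interpreter. The `friendly_and_capable` condition is an invented example.

```python
from dataclasses import dataclass, field
from typing import Callable

Fact = tuple  # an atomic sentence such as ("friend", "operator"); no logical connectives


@dataclass
class SpeechAct:
    kind: str      # "inform" or "request"
    sender: str
    content: Fact  # an atomic fact (inform) or a primitive action (request)


@dataclass
class Agent:
    name: str
    capabilities: set
    # Mental condition deciding whether a request is accepted (hypothetical hook).
    accept: Callable[["Agent", SpeechAct], bool]
    beliefs: set = field(default_factory=set)
    commitments: list = field(default_factory=list)   # (to whom, action) pairs

    def receive(self, act: SpeechAct) -> None:
        if act.kind == "inform":
            # An inform updates the belief space.
            self.beliefs.add(act.content)
        elif act.kind == "request" and self.accept(self, act):
            # A request updates the commitment space, but only if the condition holds.
            self.commitments.append((act.sender, act.content))


def friendly_and_capable(agent: Agent, act: SpeechAct) -> bool:
    """Commit only to a believed friend, and only to an action the agent is capable of."""
    return ("friend", act.sender) in agent.beliefs and act.content[0] in agent.capabilities


agv = Agent("agv", capabilities={"move_to"}, accept=friendly_and_capable)
agv.receive(SpeechAct("request", "operator", ("move_to", "dock")))  # refused: not a friend yet
agv.receive(SpeechAct("inform", "supervisor", ("friend", "operator")))
agv.receive(SpeechAct("request", "operator", ("move_to", "dock")))  # accepted now
print(agv.commitments)
```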
Telescript was developed by General Magic for electronic commerce applications. It defines an environment for transactions between software applications over a network. Telescript agents travel over the network, carrying both procedures and data, and act on the data at their destination instead of exchanging the data. This paradigm matters for bandwidth usage and for security. Telescript is an interpreted, communication-centric language executed by the Telescript engine, which has access to the application environment through an API. The language has attracted the interest of commercial vendors of electronic commerce applications. One open problem is how Telescript agents can communicate with software built on the traditional client-server paradigm.
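The Python sketch below makes the bandwidth argument concrete by comparing the two styles of interaction. It rests on several simplifying assumptions:

- `RemoteHost` is hypothetical.
- The visiting procedure is simply passed as a function within one process.
- Message size is approximated by the length of the JSON reply.

A real mobile-agent system such as Telescript would also serialize and transport the procedure and its state, which is not modelled here.

```python
import json
from typing import Callable


class RemoteHost:
    """A hypothetical remote site that holds a large data set (e.g. a sensor log)."""

    def __init__(self, readings: list) -> None:
        self.readings = readings

    def fetch_all(self) -> str:
        # Client-server style: the whole data set is serialized and sent back.
        return json.dumps(self.readings)

    def accept_agent(self, procedure: Callable[[list], object]) -> str:
        # Mobile-agent style: the visiting procedure runs next to the data,
        # and only its result is serialized and sent back.
        return json.dumps(procedure(self.readings))


host = RemoteHost([0.1 * i for i in range(10_000)])

# Client-server: download everything, then compute locally.
payload = host.fetch_all()
local_max = max(json.loads(payload))

# Mobile agent: send the computation to the data instead.
reply = host.accept_agent(max)

print(f"client-server reply: {len(payload)} bytes, agent reply: {len(reply)} bytes")
```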
The Knowledge Query and Manipulation Language (KQML) is the result of the work of the External Interfaces Group of the Knowledge Sharing Effort (KSE). The KSE is a consortium founded to facilitate interoperability in today's computing environment. KQML is both a message format and a message-handling protocol to support run-time knowledge sharing among agents. The key features of KQML may be summarized as follows:
KQML is an attempt to separate the knowledge representation language from the communication language. The communication language should define a set of standard message types that are interpreted identically by all interacting parties.
The KQML language may be viewed as consisting of three layers: the content layer, the message layer, and the communication layer.
The content layer bears the actual content of the message, in the agent's own representation language. The content can be expressed in any representation language and written either as ASCII strings or in one of many binary notations. KQML implementations ignore the content portion of the message, except to the extent that they need to recognize where it begins and ends.
The syntax of KQML is based on a balanced parenthesis list. The initial element of the list is the performative and the remaining elements are the performative's arguments as keyword/value pairs. Because the language is relatively simple, the actual syntax is not significant and can be changed if necessary in the future. The syntax reveals the roots of the initial implementations, which were done in Common Lisp, but has turned out to be quite flexible.
The KQML language supports the software that implements message delivery by letting KQML messages carry information that is useful to it. Examples are the names and addresses of the sending and receiving agents, a unique message identifier, and notations by any intervening agents. There are also optional features of the KQML language that describe the content. They can give its language, the ontology it assumes, and a more general description, such as a descriptor naming a topic within the ontology. These optional features make it possible for the supporting environments to analyze, route and deliver messages based on their content, even though the content itself is inaccessible to them.
The set of performatives forms the core of the language. It determines the kinds of interactions one can have with a KQML-speaking agent. The primary function of the performatives is to identify the protocol to be used to deliver the message and to supply a speech act which the sender attaches to the content. The performative signifies that the content is an assertion, a query, a command, or any other mutually agreed upon speech act. It also describes how the sender would like any reply to be delivered, that is, what protocol will be followed.
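The syntax is a balanced parenthesis list whose content is opaque to the transport, so splitting a message needs only a small scanner. The Python sketch below is a hypothetical helper; `parse_kqml` is not part of any KQML distribution. It returns the performative and the keyword/value pairs, and keeps each value, including `:content`, as raw text. It handles nested parentheses and double-quoted strings but is not a complete KQML reader.

```python
def _read_value(text: str, i: int) -> tuple:
    """Return (raw text of the value starting at text[i], index just after it)."""
    if text[i] == "(":
        depth, j, in_string = 0, i, False
        while j < len(text):
            ch = text[j]
            if in_string:
                if ch == "\\":
                    j += 1                  # skip the escaped character
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    return text[i:j + 1], j + 1
            j += 1
        raise ValueError("unbalanced parentheses")
    if text[i] == '"':
        j = i + 1
        while j < len(text):
            if text[j] == "\\":
                j += 2                      # skip the escaped character
            elif text[j] == '"':
                return text[i:j + 1], j + 1
            else:
                j += 1
        raise ValueError("unterminated string")
    j = i
    while j < len(text) and not text[j].isspace() and text[j] not in '()"':
        j += 1
    return text[i:j], j


def parse_kqml(message: str) -> tuple:
    """Split a KQML message into (performative, {keyword: raw value}).

    The parser does not interpret the values; it only finds where each
    value begins and ends, as KQML implementations do with :content.
    """
    text = message.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("a KQML message must be a parenthesized list")
    body = text[1:-1]
    if not body.strip():
        raise ValueError("empty message")

    def skip_spaces(k: int) -> int:
        while k < len(body) and body[k].isspace():
            k += 1
        return k

    performative, pos = _read_value(body, skip_spaces(0))
    arguments = {}
    pos = skip_spaces(pos)
    while pos < len(body):
        keyword, pos = _read_value(body, pos)
        if not keyword.startswith(":"):
            raise ValueError(f"expected a :keyword, found {keyword!r}")
        pos = skip_spaces(pos)
        if pos >= len(body):
            raise ValueError(f"missing value for {keyword}")
        value, pos = _read_value(body, pos)
        arguments[keyword[1:]] = value
        pos = skip_spaces(pos)
    return performative, arguments


message = """(ask-one
   :content (PRICE IBM ?price)
   :receiver stock-server
   :language LPROLOG
   :ontology NYSE-TICKS)"""
performative, arguments = parse_kqml(message)
print(performative, arguments["content"], arguments["ontology"])
```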
Conceptually, a KQML message consists of three parts. The first is a performative. The second is its associated arguments, which include the actual content of the message. The third is a set of optional transport arguments, which describe the content and perhaps the sender and receiver. For example, a message representing a query about the price of a share of IBM stock might be encoded as follows (the example is borrowed from [10]):
(ask-one
   :content (PRICE IBM ?price)
   :receiver stock-server
   :language LPROLOG
   :ontology NYSE-TICKS)
In this message, the KQML performative is ask-one and the content is (PRICE IBM ?price). The ontology assumed by the query is identified by the token NYSE-TICKS. The receiver of the message is a server identified as stock-server, and the query is written in a language called LPROLOG.
A similar query could be conveyed using standard Prolog as the content language, in a form that requests the set of all answers:
(ask-all
   :content "price(IBM, [?price, ?time])"
   :receiver stock-server
   :language standard-prolog
   :ontology NYSE-TICKS)
The first message asks for a single reply; the second asks for a set as a reply.
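The difference between the two performatives can be sketched with a toy stock server in Python. The knowledge base, the flat pattern matcher and the `handle` function are assumptions made for illustration. A real server would parse the content in the declared language and answer with KQML reply messages rather than Python values.

```python
from typing import Optional

# Hypothetical knowledge base of the stock server: (PRICE, symbol, price, time) facts.
KB = [
    ("PRICE", "IBM", 104.5, "10:00"),
    ("PRICE", "IBM", 105.0, "10:05"),
    ("PRICE", "SUNW", 48.0, "10:00"),
]


def match(pattern: tuple, fact: tuple) -> Optional[dict]:
    """Match a flat pattern such as ("PRICE", "IBM", "?price", "?time") against a fact."""
    if len(pattern) != len(fact):
        return None
    bindings = {}
    for p, f in zip(pattern, fact):
        if isinstance(p, str) and p.startswith("?"):
            if bindings.setdefault(p, f) != f:   # a repeated variable must bind consistently
                return None
        elif p != f:
            return None
    return bindings


def handle(performative: str, pattern: tuple):
    """ask-one yields one set of bindings; ask-all yields the list of all of them."""
    answers = [b for fact in KB if (b := match(pattern, fact)) is not None]
    if performative == "ask-one":
        return answers[0] if answers else None   # a KQML agent might reply with sorry
    if performative == "ask-all":
        return answers
    raise ValueError(f"unsupported performative: {performative}")


query = ("PRICE", "IBM", "?price", "?time")
print(handle("ask-one", query))
print(handle("ask-all", query))
```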
Other performatives used in KQML include the following, among others:

- stream-all, which delivers a set of answers as a series of separate replies;
- tell, which conveys a piece of information;
- subscribe, which requests all future changes to the answer to a query;
- advertise, which announces what kinds of KQML messages the agent is willing to handle;
- recruit, which is used to find agents suitable for particular types of information interchange.
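Several of these performatives imply an intermediary that remembers who offers and who wants what. The following Python sketch of such a facilitator is hypothetical and simplified. Subscriptions are keyed by ontology rather than by a query. The result of recruit is returned to the caller instead of being sent by the recruited agent directly to the requester.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, str], str]   # (performative, content) -> reply content


class Facilitator:
    """Hypothetical facilitator supporting advertise, recruit, subscribe and tell."""

    def __init__(self) -> None:
        self.adverts = defaultdict(list)       # (performative, ontology) -> handlers
        self.subscribers = defaultdict(list)   # ontology -> subscriber inboxes

    def advertise(self, handler: Handler, performative: str, ontology: str) -> None:
        # The agent announces which kind of messages it is willing to handle.
        self.adverts[(performative, ontology)].append(handler)

    def recruit(self, performative: str, ontology: str, content: str) -> str:
        # Find a suitable agent and pass the message on; in this sketch the
        # reply is simply returned to the caller.
        candidates = self.adverts.get((performative, ontology))
        return candidates[0](performative, content) if candidates else "sorry"

    def subscribe(self, inbox: list, ontology: str) -> None:
        # Ask to be informed of all future changes concerning the ontology.
        self.subscribers[ontology].append(inbox)

    def tell(self, ontology: str, content: str) -> None:
        # New information is pushed to every subscriber.
        for inbox in self.subscribers[ontology]:
            inbox.append(content)


def price_server(performative: str, content: str) -> str:
    return "(PRICE IBM 105.0)" if content == "(PRICE IBM ?price)" else "sorry"


facilitator = Facilitator()
facilitator.advertise(price_server, "ask-one", "NYSE-TICKS")
print(facilitator.recruit("ask-one", "NYSE-TICKS", "(PRICE IBM ?price)"))

operator_inbox = []
facilitator.subscribe(operator_inbox, "NYSE-TICKS")
facilitator.tell("NYSE-TICKS", "(PRICE IBM 105.5)")
print(operator_inbox)
```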
The first version of the FIPA agent communication language was based on an earlier agent communication language, ARCOL. Unlike KQML, the FIPA specification has a formal semantics. In general, FIPA is a new, still-developing standard that is expected to include the speech acts available in KQML.
KIF is the product of the Interlingua Working Group of the Knowledge Sharing Effort (KSE), chaired by R. Fikes and M. Genesereth. The work on KIF grew out of the recognition that an interlingua needs to be a language with the following general properties:
KIF is an extended version of first-order predicate logic. Version 3.0 has the following features:
Agent system architectures provide for different types of specialized agents that operate in their environments and interact with them and with each other.
One type of agent manages protocols on behalf of applications and resources. In this capacity, agents produce a layer of homogeneity among the heterogeneous components of an environment. This layer can sit at one of three levels:

- a low communication level, such as heads and front-end processors;
- a semantic level, such as knowledge handlers, ontology agents, type brokers, and wrappers;
- an information management level, such as that produced by mediators, routers, intelligent information agents, and facilitators.
Type brokers provide a means to manage the structure and semantics of information and query languages. They define standard types by which computations can communicate. Most of this work pertains to lower-level issues, which typically involve a set of such type brokers and a way to distribute type information. An application queries the broker to find a service and then communicates directly with the desired service.
Other types of agents access information from heterogeneous sources on behalf of users and other agents. For example, a mediator is a simplified agent that acts on behalf of a set of information resources or applications. Mediators come in a wide range of capabilities. They range from database and protocol converters to intelligent modules that capture the semantics of the domain and learn from the data. The basic idea is that a mediator is responsible for mapping the resources or applications to the rest of the world. Mediators thus shield the different components of a system from each other. Constructing mediators effectively requires a common representation of the meanings of the resources and applications they connect, i.e. an interlingua.
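The following minimal Python sketch of a mediator assumes two invented position sources with different formats and units. It shows how the mediator hides that heterogeneity behind one shared representation, here (x, y) in metres. The resource names and formats are hypothetical.

```python
from typing import Optional


def legacy_odometry(robot: str) -> str:
    """Hypothetical old resource: a text record, coordinates in millimetres."""
    return {"agv1": "X=1200;Y=-350"}.get(robot, "")


class TrackingDatabase:
    """Hypothetical newer resource: a dictionary record, coordinates in metres."""

    def __init__(self) -> None:
        self.rows = {"agv2": {"x_m": 4.0, "y_m": 2.5}}

    def lookup(self, robot: str) -> Optional[dict]:
        return self.rows.get(robot)


class PositionMediator:
    """Maps both resources onto one shared representation: (x, y) in metres."""

    def __init__(self, database: TrackingDatabase) -> None:
        self.database = database

    def position(self, robot: str) -> Optional[tuple]:
        record = legacy_odometry(robot)
        if record:
            fields = dict(item.split("=") for item in record.split(";"))
            return float(fields["X"]) / 1000.0, float(fields["Y"]) / 1000.0
        row = self.database.lookup(robot)
        if row is not None:
            return row["x_m"], row["y_m"]
        return None


mediator = PositionMediator(TrackingDatabase())
print(mediator.position("agv1"), mediator.position("agv2"))
```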
An agent environment comprises the set of all agent domains that fall within the range of the agent-to-agent protocol, and is thus potentially unbounded. The agent-to-agent protocol extends beyond a particular domain through the use of proxy agents. Proxy agents are useful when two agent domains share agent-to-agent protocols but cannot communicate because they are implemented within different distributed object environments [8]. Separate agent domains, whether similar or not, may use proxy agents that communicate through sockets or through some other mechanism. By definition, any extension of the range of the agent-to-agent protocol beyond the bounds of an agent domain uses proxy agents.
For an agent domain to become active, two agents must be started. One is the Matchmaker, through which agents access information about services within the domain. The other is the Domain Manager, which controls the entry and exit of agents within the domain and maintains a set of properties on behalf of the domain administrator.
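The division of labour between the two agents can be sketched in Python. The class and method names are hypothetical, not the API of any particular agent platform. Once a requester has found a provider through the Matchmaker, it would communicate with that provider directly, as with the type brokers described earlier.

```python
class DomainManager:
    """Controls entry and exit of agents and keeps domain properties (hypothetical API)."""

    def __init__(self, admin_properties: dict) -> None:
        self.members = set()
        self.properties = dict(admin_properties)   # maintained for the administrator

    def enter(self, agent: str) -> None:
        self.members.add(agent)

    def leave(self, agent: str) -> None:
        self.members.discard(agent)


class Matchmaker:
    """Answers 'which agents offer this service?' for agents inside the domain."""

    def __init__(self, manager: DomainManager) -> None:
        self.manager = manager
        self.services = {}                          # service name -> set of agents

    def register(self, agent: str, service: str) -> None:
        if agent not in self.manager.members:
            raise PermissionError(f"{agent} has not entered the domain")
        self.services.setdefault(service, set()).add(agent)

    def lookup(self, service: str) -> list:
        # Only agents that are still inside the domain are returned.
        return sorted(a for a in self.services.get(service, set())
                      if a in self.manager.members)


manager = DomainManager({"name": "lab-domain"})
matchmaker = Matchmaker(manager)
manager.enter("agv-agent")
matchmaker.register("agv-agent", "navigation")
print(matchmaker.lookup("navigation"))
manager.leave("agv-agent")
print(matchmaker.lookup("navigation"))
```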
Robot systems, including autonomous mobile robots, need to achieve high level goals while remaining reactive to contingencies and new opportunities. They need to recover gracefully from exceptions and effectively manage their resources. These capabilities are often referred to as task-level control, and they form the basis of modern three-tiered robot control architectures, consisting of the planning layer, executive layer and behavior layer. In such architectures, the behavior layer interacts with the physical world, controlling actuators and collecting sensor data. The planning layer specifies, at an abstract level, how to achieve goals and how to deal with goal interactions. The executive layer mediates between the symbolic level of the planner and the continuous level of the behaviors.
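The interplay of the three layers can be sketched in Python with a simulated robot. Everything here is hypothetical. The route table stands in for a planner, and the behavior layer is a stub that fails at blocked waypoints. The executive turns a failed behavior into a replanning request.

```python
from typing import Optional


class BehaviorLayer:
    """Continuous level (simulated here): drives to a waypoint, may fail on an obstacle."""

    def __init__(self, blocked: set) -> None:
        self.blocked = blocked
        self.position = "start"

    def go_to(self, waypoint: str) -> bool:
        if waypoint in self.blocked:
            return False                 # e.g. sensors report an obstacle
        self.position = waypoint
        return True


class PlanningLayer:
    """Symbolic level: chooses a route (sequence of waypoints) that avoids known problems."""

    ROUTES = {"dock": [["hall", "door", "dock"], ["hall", "lab", "dock"]]}

    def plan(self, goal: str, avoid: set) -> Optional[list]:
        for route in self.ROUTES.get(goal, []):
            if not avoid.intersection(route):
                return route
        return None


class ExecutiveLayer:
    """Mediates: runs plan steps as behaviors and reports failures back for replanning."""

    def __init__(self, planner: PlanningLayer, behaviors: BehaviorLayer) -> None:
        self.planner = planner
        self.behaviors = behaviors

    def achieve(self, goal: str) -> list:
        avoid, moves = set(), []
        while (route := self.planner.plan(goal, avoid)) is not None:
            for waypoint in route:
                if self.behaviors.position == waypoint:
                    continue             # already there after an earlier attempt
                if not self.behaviors.go_to(waypoint):
                    avoid.add(waypoint)  # contingency: ask the planner for another route
                    break
                moves.append(waypoint)
            else:
                return moves             # every step of the route succeeded
        raise RuntimeError(f"no route reaches {goal!r}")


robot = ExecutiveLayer(PlanningLayer(), BehaviorLayer(blocked={"door"}))
print(robot.achieve("dock"))
```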
In the rest of this chapter we compare the features of some task-level robot programming languages.
Many robot programming languages are based on, or derived from, first-order predicate logic. Earlier work includes the Task Control Architecture (TCA) [45]. TCA combines task-level control and interprocess communication, using message passing between multiple processes to achieve concurrency. The derived languages retain several aspects of TCA: the underlying concept of a task tree, execution monitors, and the approach to the hierarchical structuring of exception handlers.
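The task-tree idea with hierarchically structured exception handlers can be sketched in Python. Ordinary exception propagation plays the role of passing a failure up the tree until some ancestor task can handle it. The `Task` structure and `run` function are hypothetical and do not reproduce TCA's interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class TaskFailure(Exception):
    """Raised by a leaf task when its action cannot be completed."""


@dataclass
class Task:
    name: str
    action: Optional[Callable[[], None]] = None               # leaf behaviour, if any
    children: list["Task"] = field(default_factory=list)      # subtasks, run in order
    handler: Optional[Callable[[TaskFailure], bool]] = None   # True means recovered


def run(task: Task, log: list) -> None:
    """Execute the tree depth-first; a failure climbs until some handler recovers it."""
    try:
        if task.action is not None:
            task.action()
        for child in task.children:
            run(child, log)
        log.append(f"done {task.name}")
    except TaskFailure as failure:
        if task.handler is not None and task.handler(failure):
            log.append(f"{task.name} recovered from: {failure}")
        else:
            raise                      # let the parent's handler try


def grasp() -> None:
    raise TaskFailure("object slipped")


log = []
fetch = Task("fetch", children=[
    Task("move to shelf", action=lambda: None),
    Task("pick up", children=[Task("grasp", action=grasp)]),
], handler=lambda failure: True)      # the top-level task knows how to recover
run(fetch, log)
print(log)
```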
PRS (Procedural Reasoning System) is built around the concept of a procedural reasoning expert [19]. PRS helps decide which actions an agent should perform at any given time. Both Lisp-based and C-based interpreters for PRS have been implemented. PRS is tightly integrated with a "world model" knowledge base. This knowledge base is used to identify opportunities and exceptions and to decide when to transition between tasks.
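A greatly simplified Python sketch of the procedure-selection cycle follows. Each hypothetical procedure has a context condition evaluated on the world model, and each cycle runs the highest-priority applicable one. Real PRS also maintains goals and intentions and lets procedures post subgoals, none of which is modelled here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Procedure:
    name: str
    priority: int
    applicable: Callable[[dict], bool]   # context condition evaluated on the world model
    body: Callable[[dict], None]         # actions; here they simply update the world model


def step(procedures: list, world: dict) -> Optional[str]:
    """One interpreter cycle: run the highest-priority procedure applicable to the world model."""
    candidates = [p for p in procedures if p.applicable(world)]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda p: p.priority)
    chosen.body(world)
    return chosen.name


procedures = [
    Procedure("avoid obstacle", 3, lambda w: w["obstacle"], lambda w: w.update(obstacle=False)),
    Procedure("recharge", 2, lambda w: w["battery"] < 0.2, lambda w: w.update(battery=1.0)),
    Procedure("deliver", 1, lambda w: w["task"] is not None, lambda w: w.update(task=None)),
]
world = {"battery": 0.15, "obstacle": True, "task": "deliver parcel"}

history = []
while (name := step(procedures, world)) is not None:
    history.append(name)
print(history)
```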
The logic-based action language GOLOG [23,36] was developed for programming navigation, manipulation, perception, and interaction tasks for robots. GOLOG is based on the situation calculus, which is itself a dialect of the predicate calculus with three sorts: ordinary objects, actions, and situations. Situations are action histories constructed from an initial situation S0 and a special two-place function do. Here do(a, s) denotes the successor situation to s that results from performing the action a. What is true in a situation is described in terms of fluents, which are predicates whose last argument is a situation.
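The situation-calculus foundation of GOLOG can be sketched in Python by representing situations literally as action histories. The fluents `holding` and `at` and their successor state axioms are invented examples. The sketch evaluates fluents by regressing over the history. It does not implement GOLOG programs or the GOLOG interpreter.

```python
from typing import Tuple

Action = Tuple[str, ...]          # e.g. ("pickup", "box")
Situation = Tuple[Action, ...]    # a situation is an action history

S0: Situation = ()                # the initial situation


def do(a: Action, s: Situation) -> Situation:
    """do(a, s): the successor situation reached by performing a in s."""
    return s + (a,)


def holding(x: str, s: Situation) -> bool:
    """Fluent defined by the successor state axiom
    holding(x, do(a, s)) <=> a = pickup(x) or (holding(x, s) and a != drop(x))."""
    if s == S0:
        return False                          # initially nothing is held
    *earlier, a = s
    return a == ("pickup", x) or (holding(x, tuple(earlier)) and a != ("drop", x))


def at(place: str, s: Situation) -> bool:
    """at(l, do(a, s)) <=> a = goto(l) or (at(l, s) and a is not a goto action)."""
    if s == S0:
        return place == "base"                # the robot starts at its base
    *earlier, a = s
    return a == ("goto", place) or (at(place, tuple(earlier)) and a[0] != "goto")


s = do(("goto", "base"), do(("pickup", "box"), do(("goto", "room1"), S0)))
print(holding("box", s), at("base", s), at("room1", s))
```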
As the main goal of this work is to investigate Internet-based HRI, we devote this chapter to the standards for generic data representation over the Internet, i.e. the Extensible Markup Language (XML) [53] and XML Schema [54,55].
XML is a method for putting structured data in a text file. XML is a set of rules for designing text formats for such data, in a way that produces files that are easy to generate and read (by a computer), that are unambiguous, and that avoid common pitfalls, such as lack of extensibility, lack of support for internationalization/localization, and platform-dependency. Like HTML, XML makes use of tags (words bracketed by '<' and '>') and attributes (of the form name="value"), but while HTML specifies what each tag and attribute means (and often how the text between them will look in a browser), XML uses the tags only to delimit pieces of data, and leaves the interpretation of the data completely to the application that reads it.
XML has been a World Wide Web Consortium (W3C) Recommendation since February 1998. It was derived from the SGML standard, drawing on the experience of HTML. XML is simpler and more regular than SGML. It is designed to be used with any kind of structured data, whereas SGML is intended mostly for technical documentation.
XML is widely used for the development of derived standards covering particular domains. To mention a few:

- MathML, the Mathematical Markup Language;
- Scalable Vector Graphics (SVG), a language for describing vector graphics;
- XML Query, a language designed to provide flexible query facilities to extract data from real and virtual documents on the Web.
It may be best to illustrate XML with an example from a specific domain. Here we use MathML to represent the equation x^2 + 4x + 4 = 0 in two ways: first with presentation tags, then with semantic (content) tags.
<mrow>
  <mrow>
    <msup> <mi>x</mi> <mn>2</mn> </msup>
    <mo>+</mo>
    <mrow>
      <mn>4</mn>
      <mo>&InvisibleTimes;</mo>
      <mi>x</mi>
    </mrow>
    <mo>+</mo>
    <mn>4</mn>
  </mrow>
  <mo>=</mo>
  <mn>0</mn>
</mrow>
Here the mrow tag represents horizontal groups, msup denotes a superscript, mi an identifier, mn a number, and mo an operator. The entity &InvisibleTimes; denotes an invisible multiplication operator: it is not rendered, but it marks 4x as a product.
The semantic tags take into account concepts such as "times", "power of" and so on:
<apply>
  <eq/>
  <apply>
    <plus/>
    <apply>
      <power/>
      <ci>x</ci>
      <cn>2</cn>
    </apply>
    <apply>
      <times/>
      <cn>4</cn>
      <ci>x</ci>
    </apply>
    <cn>4</cn>
  </apply>
  <cn>0</cn>
</apply>
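The practical benefit of the semantic tags is that software can compute with them. The Python sketch below evaluates content MathML for this equation using the standard `xml.etree.ElementTree` module. It supports only the handful of elements used here, and, as in the text, the MathML namespace is omitted.

```python
import math
import xml.etree.ElementTree as ET

# Content MathML for x^2 + 4x + 4 = 0 (namespace omitted).
EQUATION = """
<apply><eq/>
  <apply><plus/>
    <apply><power/><ci>x</ci><cn>2</cn></apply>
    <apply><times/><cn>4</cn><ci>x</ci></apply>
    <cn>4</cn>
  </apply>
  <cn>0</cn>
</apply>
"""

# Meaning of each operator element, applied to the list of evaluated operands.
OPERATORS = {
    "plus": lambda args: sum(args),
    "times": lambda args: math.prod(args),
    "power": lambda args: args[0] ** args[1],
    "eq": lambda args: all(a == args[0] for a in args[1:]),
}


def evaluate(node: ET.Element, env: dict):
    """Recursively evaluate a content MathML element under the variable bindings env."""
    if node.tag == "cn":
        return float(node.text)
    if node.tag == "ci":
        return env[node.text.strip()]
    if node.tag == "apply":
        operator, *operands = list(node)
        return OPERATORS[operator.tag]([evaluate(arg, env) for arg in operands])
    raise ValueError(f"unsupported element <{node.tag}>")


tree = ET.fromstring(EQUATION)
print(evaluate(tree, {"x": -2.0}))   # True: x = -2 is the (double) root
print(evaluate(tree, {"x": 1.0}))    # False: 1 + 4 + 4 = 9
```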
XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content and semantics of XML documents.
The purpose of a schema is to define a class of XML documents, so the term "instance document" is often used to describe an XML document that conforms to a particular schema. In fact, neither instances nor schemas need to exist as documents per se. They may exist as streams of bytes sent between applications, as fields in a database record, or as collections of XML Infoset "information items". For simplicity, however, we refer to instances and schemas as if they were files.
To demonstrate the use of XML Schema, let us consider an example. Assume that an XML document represents the state of the hardware of a wheeled mobile robot. It may look as follows (the XML document type declaration and annotations are omitted):
<robot name="AGV1">
  <wheel name="1">
    <motor name="steering motor">
      <position>2577</position>
    </motor>
    <motor name="driving motor">
      <velocity>0.2</velocity>
    </motor>
  </wheel>
  <wheel name="2">
    <motor name="steering motor">
      <position>-754</position>
    </motor>
    <motor name="driving motor">
      <velocity>0.35</velocity>
    </motor>
  </wheel>
  <wheel name="3">
    <motor name="steering motor">
      <position>1026</position>
    </motor>
    <motor name="driving motor">
      <velocity>0.21</velocity>
    </motor>
  </wheel>
  <wheel name="4">
    <motor name="steering motor">
      <position>175</position>
    </motor>
    <motor name="driving motor">
      <velocity>0.16</velocity>
    </motor>
  </wheel>
</robot>
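XML leaves the interpretation of the tags to the application. A minimal Python reader for this document, using the standard `xml.etree.ElementTree` module, might look as follows. The function name `read_state` and the choice to key readings by element tag are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

ROBOT_STATE = """
<robot name="AGV1">
  <wheel name="1"><motor name="steering motor"><position>2577</position></motor>
    <motor name="driving motor"><velocity>0.2</velocity></motor></wheel>
  <wheel name="2"><motor name="steering motor"><position>-754</position></motor>
    <motor name="driving motor"><velocity>0.35</velocity></motor></wheel>
  <wheel name="3"><motor name="steering motor"><position>1026</position></motor>
    <motor name="driving motor"><velocity>0.21</velocity></motor></wheel>
  <wheel name="4"><motor name="steering motor"><position>175</position></motor>
    <motor name="driving motor"><velocity>0.16</velocity></motor></wheel>
</robot>
"""


def read_state(xml_text: str) -> dict:
    """Map each wheel name to its readings, e.g. {'position': 2577.0, 'velocity': 0.2}."""
    root = ET.fromstring(xml_text)
    state = {}
    for wheel in root.findall("wheel"):
        readings = {}
        for motor in wheel.findall("motor"):
            # Each motor carries one reading element; its tag says what was measured.
            for reading in motor:
                readings[reading.tag] = float(reading.text)
        state[wheel.get("name")] = readings
    return state


state = read_state(ROBOT_STATE)
print(state["2"])   # {'position': -754.0, 'velocity': 0.35}
```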
The XML Schema corresponding to this XML document may be presented as follows (XML document type declaration, the standard XML Schema wrapping and annotations are omitted):
<xsd:element name="robot" type="Robot"/>
<xsd:complexType name="Robot">
  <xsd:sequence>
    <xsd:element name="wheel" type="Wheel"
                 minOccurs="4" maxOccurs="4"/>
  </xsd:sequence>
  <xsd:attribute name="name" type="xsd:string"/>
</xsd:complexType>
<xsd:complexType name="Wheel">
  <xsd:sequence>
    <!-- sibling "motor" elements must share one type, so both
         motors use Motor; their content tells them apart -->
    <xsd:element name="motor" type="Motor"
                 minOccurs="2" maxOccurs="2"/>
  </xsd:sequence>
  <xsd:attribute name="name" type="xsd:string"/>
</xsd:complexType>
<xsd:complexType name="Motor">
  <xsd:choice>
    <xsd:element name="position" type="xsd:decimal"/>
    <xsd:element name="velocity" type="xsd:decimal"/>
  </xsd:choice>
  <xsd:attribute name="name" type="xsd:string"/>
</xsd:complexType>
In XML Schema, there is a basic difference between complex types and simple types. Complex types allow elements in their content and may carry attributes. Simple types cannot have element content and cannot carry attributes. There is also a major distinction between definitions and declarations. Definitions create new types (both simple and complex). Declarations enable elements or attributes with specific names and types (both simple and complex) to appear in document instances.
New complex types are defined using the complexType element and such definitions typically contain a set of element declarations, element references, and attribute declarations. The declarations are not themselves types, but rather an association between a name and constraints which govern the appearance of that name in documents governed by the associated schema. Elements are declared using the element element, and attributes are declared using the attribute element.
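The Python standard library has no XML Schema validator; third-party packages such as lxml provide one through `lxml.etree.XMLSchema`. The sketch below instead hand-codes the constraints the robot schema is meant to express:

- there are four wheels;
- each wheel has a steering motor reporting a position and a driving motor reporting a velocity;
- all values are decimals.

The `check_robot` function is hypothetical and is no substitute for real schema validation.

```python
import re
import xml.etree.ElementTree as ET

# Lexical form of xsd:decimal: optional sign, digits, optional fraction (no exponent).
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def check_robot(xml_text: str) -> list:
    """Return violations of the robot schema's constraints (an empty list means none)."""
    root = ET.fromstring(xml_text)
    if root.tag != "robot":
        return [f"root element is <{root.tag}>, expected <robot>"]
    errors = []
    wheels = root.findall("wheel")
    if len(wheels) != 4:                                   # minOccurs = maxOccurs = 4
        errors.append(f"expected 4 wheels, found {len(wheels)}")
    for wheel in wheels:
        label = f"wheel {wheel.get('name')}"
        motors = wheel.findall("motor")
        readings = [child for motor in motors for child in motor]
        if len(motors) != 2 or any(len(motor) != 1 for motor in motors):
            errors.append(f"{label}: expected 2 motors with one reading each")
        if sorted(r.tag for r in readings) != ["position", "velocity"]:
            errors.append(f"{label}: expected one position and one velocity")
        for reading in readings:
            if not DECIMAL.fullmatch((reading.text or "").strip()):
                errors.append(f"{label}: {reading.text!r} is not an xsd:decimal")
    return errors


WHEEL = ('<wheel name="{0}"><motor name="steering motor"><position>{1}</position></motor>'
         '<motor name="driving motor"><velocity>{2}</velocity></motor></wheel>')
document = ('<robot name="AGV1">'
            + "".join(WHEEL.format(*w) for w in [(1, 2577, 0.2), (2, -754, 0.35),
                                                 (3, 1026, 0.21), (4, 175, 0.16)])
            + "</robot>")
print(check_robot(document))
print(check_robot(document.replace("0.35", "fast")))
```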
It has been recognized that a sophisticated agent architecture is necessary for an HRI over the Internet, to account for the inherent problems of remote teleoperation of robots, and that a task-level description language should be utilized. The known languages are powerful and convenient for expressing information related to their respective applications. However, an agent-based HRI would require the simultaneous use of a robot programming language, an agent communication language and a knowledge representation language in a single application. This creates significant difficulty for a human operator.
There is a need for a language for the purposes of Internet-based HRI applications that would unify the languages for agent communication, robot programming and knowledge representation, while being transparent for a human with minimal training. Namely, the language should:
We have chosen XML as the basis for developing such a common language for robotic applications.
The implementation of a proxy-mediated, agent-based architecture for HRI via the Internet, and of a common markup language for robotic applications, is proposed and described in detail in a separate research paper: M. Makatchev, S. K. Tso, "Human-Robot Interface Using Agents Communicating in an XML-Based Markup Language," IEEE Intl. Workshop on Robot and Human Communication, September 2000, Osaka, Japan (accepted).
This document was generated using the LaTeX2HTML translator Version 2K.1beta (1.55)
The translation was initiated by Maxim on 2001-06-23