The Engine Framework operates between knowledge and adapters, between an agent's internal representation of the world and its interactions with the external world itself. The word engine is commonly used to describe a broad range of decision-making technologies, such as script interpreters, linguistic analyzers, rule interpreters, and many other information technologies. The Engine Framework provides a structure for all these engine types but remains technology-neutral; it does not require the use of any particular decision technology. However, this release of the toolkit is targeted at supporting rule interpreters.
The Engine Framework is based on the event-condition-action paradigm, which operates across this boundary. The parts of the paradigm are as follows:
Events flow from the adapter into the engine as trigger events. All events generated by adapters have the same format. However, the response to an event is dependent on the engine type. For example, an inference engine will use the event to start an inferencing episode. A learning engine will use the event as a learning opportunity.
When the engine needs more information to handle an event, a sensor call will be made from the engine to the adapter. The adapter will get the requested information and return it to the engine.
As part of processing an event, the engine may decide that it is necessary to change the outside world. To do this, an effector call is made from the engine to the adapter. It is then the adapter's responsibility to carry out the request of the engine.
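To make this flow concrete, here is a minimal C++ sketch. Only the event/sensor/effector pattern comes from the text above; the class shapes and function names are illustrative assumptions, not the toolkit's actual interfaces.

```cpp
#include <iostream>
#include <string>

// The uniform event format that flows from an adapter into an engine.
struct TriggerEvent {
    std::string eventId;
    std::string source;
};

// Stands in for an adapter; sense() and effect() are invented names.
class Adapter {
public:
    std::string sense(const std::string& query) {
        return query == "mail.subject" ? "Lotus quarterly report" : "";
    }
    void effect(const std::string& action) {
        std::cout << "adapter performs: " << action << '\n';
    }
};

// A trivial "engine" that observes an event, senses, then effects.
class Engine {
public:
    explicit Engine(Adapter& a) : adapter(a) {}
    void observe(const TriggerEvent& ev) {
        std::cout << "observed " << ev.eventId << '\n';
        // Sensor call: the engine needs more information from the adapter.
        std::string subject = adapter.sense("mail.subject");
        // Effector call: the engine decides to change the outside world.
        if (subject.find("Lotus") != std::string::npos)
            adapter.effect("file item in the Lotus folder");
    }
private:
    Adapter& adapter;
};

int main() {
    Adapter mail;
    Engine engine(mail);
    engine.observe({"MAILARRIVED-001", "mailAdapter"});
}
```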
These interfaces are paralleled by the Adapter Interface, through which an engine controls applications. In fact, this is the point of the engine interfaces: engines can work with each other just as if they were objects under an Adapter Interface. Each engine can specify its semantic interface, which allows any other engine to process its events and interact with it according to the dynamic configuration allowed by run-time binding. Indeed, most engines will be under the configuration and policy control of rules within an inference engine.
The engine framework in Level 6 of the IBM Agent Building Environment Developer's Toolkit is represented by the most generally required engine type, a rule-based Inferencer, but the framework can be extended to include many other types, such as learning engines, which are also becoming generally required. This section explains the different engine types and how such other engines can be added, so that engine developers can begin working with the other IBM Agent Building Environment Developer's Toolkit components.
Every agent contains a chainer engine, of one sort or another. A chainer observes events and interacts with the environment through Adapters. A rule-based inference engine is the most common example. Although this type is labeled Chainer, there is no presumption that it uses inferential chaining. This engine type simply chains the event to some network of symbols, nodes, other engines, or whatever, eventually chaining back to sense or effect the world through an adapter.
Very often an application will also need an Executive engine, which observes events but does not provide the core processing function of a Chainer. Instead, an Executive manages one or more Chainers, taking responsibility for specific problems outside the Chainer's role. For example, a UserIterator engine within a multiuser system receives a single event from the environment and steps a rule-based inference engine through the rules of each user.
An Executive engine has two sets of interfaces. As an engine that communicates with adapters, it observes events and requests sensors and effectors. As an event source to another engine, it generates events and acts as if it were an adapter to its downstream engines. Specifically, an Executive engine mimics an adapter, taking its single eventId and exploding it into a series of events (such as one for each user in the UserIterator). The Chainer can call any other adapter directly, but it cannot return the eventId back to the original adapter because the Executive has changed it. The Executive must intercept the request to the original adapter, resolve the user-specific eventId back to the original single eventId, and pass the request through to the adapter.
An Executive can control one or more other engines; therefore, it is often useful to intercept the decisions of several different engine types in order to resolve conflicts or eliminate redundant actions on the same event context. An Executive can manage other engines in parallel, or may be designed to serialize them such as when following hypothesis generation (by a generalizing learning engine) by safety checking (rule-based filtering). This complexity is not required in all cases; therefore, the interfaces for such Adapter mimicry are optional.
The internal technology of Executives is not specified. They can be hard-coded, script-based, state-controlled, or whatever.
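As a rough illustration of the UserIterator behavior described above, the sketch below explodes one trigger event into a per-user series and keeps the mapping needed to resolve a user-specific eventId back to the original before passing a request through to the real adapter. Every name here is hypothetical.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct TriggerEvent { std::string eventId; };

// The downstream engine sees only the user-specific eventId.
class Chainer {
public:
    void observe(const TriggerEvent& ev) {
        std::cout << "inferencing episode for " << ev.eventId << '\n';
    }
};

// An Executive that mimics an adapter to its downstream Chainer.
class UserIterator {
public:
    UserIterator(Chainer& c, std::vector<std::string> userList)
        : chainer(c), users(std::move(userList)) {}

    void observe(const TriggerEvent& original) {
        for (const auto& user : users) {
            std::string perUserId = original.eventId + "/" + user;
            idMap[perUserId] = original.eventId;      // remember the mapping
            chainer.observe({perUserId});             // one episode per user
        }
    }

    // Intercept a request bound for the original adapter and resolve the
    // user-specific eventId back to the single original eventId.
    std::string resolve(const std::string& perUserId) const {
        return idMap.at(perUserId);
    }

private:
    Chainer& chainer;
    std::vector<std::string> users;
    std::map<std::string, std::string> idMap;
};

int main() {
    Chainer chainer;
    UserIterator exec(chainer, {"alice", "bob"});
    exec.observe({"MAILARRIVED-001"});
    std::cout << "resolves to " << exec.resolve("MAILARRIVED-001/alice") << '\n';
}
```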
In Adapter terms, Analyzer engines provide sensor functions. Given a set of facts, they translate, filter, map, transform, or otherwise generate new (derived) facts. Just as an Adapter can provide several different sensors, each Analyzer registers and provides one or more analytical procedures. For instance, given a long string of text, a linguistic analyzer might derive the keywords the text contains, or derive a category from those keywords.
With such an Analyzer, a rule's antecedent can be written at a higher level -- "if the article contains keyword 'Lotus'" can be raised to "if the article is about automobiles", whether or not "automobiles" is explicitly contained as a keyword. Analyzers provide a sensing function to other engines, but they are not Adapters because they do not truly interact with the external application world. (There are other complexities, which are mentioned later.) Their mapping or evaluative functions tend to be entirely stateless, self-contained, and wholly internal to the agent composition. Also note that linguistic Analyzers are very sensitive to National Language considerations.
Analyzers can be thought of as sitting between Adapters and Engines, and can serve the needs of either. For instance, category-from-keywords might be geared to the application focus of electronic commerce adapters. On the other hand, some engines might require special transformations such as one-of-N or "thermometer" codings. Analyzers can also be image-based rather than text-based, but in general, their functions cut across the application focus of specific adapters; their analyses are based on their argument types.
Analyzers have a much simpler interface than Adapters. Like Adapters, they must register their procedures when asked to identify themselves, but otherwise they provide only sensor-like, synchronous queries. As with Adapters, Boolean sensing is a special case, allowed for efficiency. Note that the reference to IAEventHeader is still required; Analyzers provide stateless, static functions based only on the binding strings they are given, but the event header contains fields that might become critical, such as request priority during real-time inferencing.
The interface is otherwise identical to the Adapter terminology. The use of "sensor" within function names emphasizes the similarities between the two and will ease migration for engine developers from the Adapter Interface to this one. Analyzers provide internal "sense".
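The following sketch suggests what a text Analyzer might look like. The IAEventHeader reference and the sensor vocabulary come from this section; the class, its identify() registration, and the crude keyword heuristic are invented for illustration.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Carries fields that might become critical, such as request priority.
struct IAEventHeader {
    int priority = 0;
};

// A stateless, self-contained Analyzer offering one analytical procedure.
class KeywordAnalyzer {
public:
    // Registration: identify the procedures this Analyzer provides.
    std::vector<std::string> identify() const {
        return {"keywordsFromText"};
    }

    // A sensor-like, synchronous query: derive keywords from raw text.
    std::string sensor(const IAEventHeader& header,
                       const std::string& procedure,
                       const std::string& text) const {
        (void)header;   // a real Analyzer might honor header.priority
        if (procedure != "keywordsFromText") return "";
        std::istringstream in(text);
        std::string word, keywords;
        while (in >> word)
            if (word.size() > 5)            // crude stand-in for real analysis
                keywords += word + ' ';
        return keywords;
    }
};

int main() {
    KeywordAnalyzer analyzer;
    std::cout << analyzer.sensor({}, "keywordsFromText",
                                 "an article about automobiles from Lotus")
              << '\n';
}
```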
Monitor type engines can range from trivial loggers to the most sophisticated learning algorithms. In general, they attach to and watch an event stream. However, the semantics and control of learning (for example) are much more complicated than the generic IAEngine::observe(TriggerEvent) function. While it is architecturally allowed that any engine can implement itself as a full IAEngine, it is preferred that each Monitor focus on its core functionality and leave all of the application authoring control to rule-based configuration. Not only does this approach provide more application flexibility, it also saves the considerable work of implementing a full IAEngine, effort better spent developing the Monitor's core functions.
A lot of authoring is required around Monitors. For instance, core learning technologies (such as a statistical analysis or neural network) do not always evaluate the relevance of an event or the contiguity of cause and effect in real time. This application-specific knowledge must be authored. Something other than the associative algorithm must specify when to learn and what to learn. Which events are indeed important to the application and would be valuable to predict in the future or merely remember? What are the relevant attributes for attention? In some regards, answering this question is the job of learning technology, but the stripping of extraneous attributes can help the Monitor better focus. This "authoring" is also true of real neural systems, which demonstrate instinctive behaviors and -- even in learning -- a required "preparedness" to learn.
As well, a learning engine would be trivial if it only watched; it should also do something to be of value. Learning engines are most valuable when they generalize their knowledge to new situations -- but this can be a very dangerous benefit! Simply because an agent predicts that a user will delete a mail item does not mean that it should in fact delete it. To the agent, it might as well delete the item as place it in a "Junk Folder". The decision of what to do must be authored by the application developer (configuration rules) or end-user (personal rules).
This is a general principle of knowledge and the control of engines within the IBM Agent Building Environment Developer's Toolkit. Knowing something does not necessarily imply doing something. Predicting an action certainly does not imply doing the identical action. In fact, prediction of an event is often useful for controlling avoidance of the event. Therefore, the control of Monitors must be provided by explicit authoring/instruction, such as through a rule-based Inferencer.
Monitor designers will often also want to generate events, although this is entirely optional. By notifying the engine composition that some confidence threshold has been reached or that a new link has been formed, for example, some consequence can be attached to the event. The user can be asked whether the agent should automate some step in the future, or the agent may pursue any other course of action as it might be instructed.
The Monitor interface is more complex than an Analyzer's and requires almost the entire Adapter Interface. A Monitor will provide write services, which will be used when the Monitor should watch an event stream and record important information. The performAction function allows a Monitor to express its write services. Different forms of writing, even within the same Monitor, might require different arguments; therefore, different effectors should be defined. For instance, some forms of learning differentiate between the simple observation of an object (for sensory correlations) and the learning of cause and effect (for predictive contiguities).
A Monitor will also provide read services, which will be used when the Monitor uses its stored information to respond to queries. The answerQuery function allows the Monitor's read services to be expressed. For instance, the application might require pattern matching; given a partial set of facts, many learning/memory engines can complete a pattern of facts. Predicting an event based on that partial set of facts, on the other hand, is another, different function.
Finally, the notify function is optional, but Monitor engines will tend to implement it. Monitors will often reach internal critical states that are potentially important to the agent system. For instance, as learning slowly develops case by case, reaching some level of confidence might present a valuable opportunity, such as asking the user about automating some action that is now confidently predicted. Again, rule authoring of such agent behavior is suggested. The Monitor's responsibility is to give notice of such an event; the consequence of the event is the responsibility of the application.
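The toy Monitor below ties these three functions together. The names performAction, answerQuery, and notify come from this section; the argument shapes and the three-observation threshold are assumptions made for the sketch.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <vector>

class LoggingMonitor {
public:
    // Write service: record an observed fact from the event stream.
    void performAction(const std::string& effector, const std::string& fact) {
        if (effector == "recordObservation") {
            facts.push_back(fact);
            if (facts.size() >= 3)            // an internal critical state
                notify("CONFIDENCE_THRESHOLD_REACHED");
        }
    }

    // Read service: respond to a query against the stored information.
    std::vector<std::string> answerQuery(const std::string& pattern) const {
        std::vector<std::string> matches;
        for (const auto& f : facts)
            if (f.find(pattern) != std::string::npos)
                matches.push_back(f);
        return matches;
    }

    // Optional: raise an event for the rest of the agent composition.
    std::function<void(const std::string&)> notify =
        [](const std::string& ev) { std::cout << "notify: " << ev << '\n'; };

private:
    std::vector<std::string> facts;
};

int main() {
    LoggingMonitor m;
    m.performAction("recordObservation", "user deleted mail from manager");
    m.performAction("recordObservation", "user deleted mail about lottery");
    m.performAction("recordObservation", "user deleted mail about prizes");
    for (const auto& f : m.answerQuery("deleted"))
        std::cout << f << '\n';
}
```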
These engine types can be configured into several common design patterns. Only a few of the possible configurations are reasonable, ranging from a single Inferencer to a mixture of hybrid types and controls. The composition of these engines requires that the knowledge of one engine be expressed in terms of knowledge and inferential processes of another.
The simple composition is one Chainer, which observes events and calls adapters for sensing and effecting. This is the composition provided in Level 6 of the IBM Agent Building Environment Developer's Toolkit, where the Chainer is a rule-based inference engine. The strongest requirements for IA systems are for rule-based inferencing; therefore, it has been provided first.
The reflective composition is one Chainer, supported by an Analyzer. This composition is called reflective because the agent's decisions are based on the state of another internal component.
The hybrid composition consists of one Chainer, supported by an Analyzer and a Monitor. Although hybrid technology systems are still in the minority, this is an emerging trend. A complete agent is constituted from a set of different, complementary technologies. In most applications, both rule-based and learning-based requirements coexist; users need to give explicit instruction in some cases, while in other cases the user cannot or does not want to give instruction. While the Chainer can control the operation of the Monitor, the value of rule-based instruction is available in its own right. Obviously, such complexities should be made seamless at the user interface so that the hybridization is not a hodge-podge of user controls. This is the responsibility of the View Framework, but rule-based composition and control of these various other engine types preserves the perceived integrity of the agent for the user.
Agents will often be built as multi-user services, which adds complexities for managing different knowledge sets for different users. When the agent receives a single event from an adapter representing a common source and needs to service this event for many users, executive control is required, which is not the responsibility of inference or learning engines.
In this composition, an Executive will control a Chainer. The Executive will receive a trigger event from an adapter and then explode that single trigger event into multiple inferencing episodes, one episode for each user supported by the Executive. For this and other such problems, the Engine Framework will expand and allow for other basic types and more complex compositions.
The Knowledge Framework is not truly an architectural "layer" with a well-defined API and SPI. Knowledge is used by engines to do their processing and is stored by the Library. Because of that role, to understand the Knowledge Framework, you need to look at the Engine Framework and the Library Framework.
Each object in the library store is named and typed for identification. Naming is controlled by the user of the library through the administrative process or rule editor, for example. Persistent library object typing is accomplished automatically and implicitly by the library based on the library object (in memory) and the context that is used to create the persistent storage object. Because the library provides the means for users to scope names, selection of unique names is made simple. Name scopes are based on hierarchical groupings of inferencing objects. These same groupings provide control points for secure access and consistency for concurrent access to persistent objects in the library.
Rules and facts in the library can be selected by name, as described above. Additionally, you can control the state of rules and facts, and this state can affect which rules or facts are loaded for inferencing. Changes in state can also be effected by the consequent of a rule firing; thus one rule can affect the loading of subsequent rules.
Local access to the IBM Agent Building Environment Developer's Toolkit Library is through a simple set of local objects in agent memory. Transparent to the library user, access to local or remote library repositories is accomplished through these local objects. These local library objects are for library access, not directly for inferencing. They are used as a source for building knowledge sets which are used for inferencing.
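A loose illustration of this arrangement follows: local objects hold named, stateful rules, and a knowledge set for inferencing is built from them. None of the types below belong to the toolkit's actual library API.

```cpp
#include <iostream>
#include <string>
#include <vector>

struct Rule {
    std::string name;    // user-controlled, scoped name
    bool active;         // state can affect which rules are loaded
    std::string kif;     // the stored representation
};

// A local library object: a source for building knowledge sets,
// not itself used directly for inferencing.
class RuleSet {
public:
    void add(Rule r) { rules.push_back(std::move(r)); }

    // Build a knowledge set from the rules currently in the active state.
    std::vector<Rule> loadActive() const {
        std::vector<Rule> set;
        for (const auto& r : rules)
            if (r.active) set.push_back(r);
        return set;
    }

private:
    std::vector<Rule> rules;
};

int main() {
    RuleSet mailRules;   // names scoped under a hierarchical grouping
    mailRules.add({"mail/junkFilter", true,  "(=> ...)"});
    mailRules.add({"mail/vacation",   false, "(=> ...)"});
    for (const auto& r : mailRules.loadActive())
        std::cout << r.name << '\n';
}
```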
To allow for multiple instances of metadata for a single library object, metadata can be named. The scope of the name is limited to the scope of the object to which the metadata is associated. Multiple instances of metadata allow you to describe the same inferencing data in different ways. You can use this to set up conventions whereby the same objects can be shared, interpreted, or used in different contexts by different programs or users.
Note: Although multiple instances of metadata are supported at the Collector, RuleSet, and LTFactSet levels, we are still examining the feasibility of multiple instances of metadata at the lowest level, e.g. the Rule or LTFact level. Although it may be nice to allow for multiple rule editors of the same set of rules, it is not clear how these editors would coordinate changes, especially concerning their own versions of metadata. Therefore, while we allow for multiple editors of the same rules, multiple instances of Rule or LTFact-level metadata are not yet permitted. (If we had two rule editors, each with its own metadata for the same set of rules, it is not clear what happens when one of these editors adds a rule to the set. Even though the editor that adds the rule would be able to add its own metadata to the rule, the second editor might not know how to retrofit its different metadata to the new rule. Metadata would tend to fall out naturally from the authoring of a new rule, whereas retrofitting metadata after rule creation would seem to be unnatural.)
The View Framework includes all interactions between the user and the agent (and between the user and the Library, which is not formally contained by the agent). Such views cover a wide scope of issues.
Moreover, the View Framework will address a number of advanced topics.
While the full framework covers this scope, the initial focus of this document is on agent instruction, which includes administration and some underlying services.
The word "editors" is used loosely, because rule editors have mostly failed for common end-users. Rather than have users write rules (which they don't do), some researchers claim that agents should just watch and learn. This approach is very true, but the View Framework is more open to the range of knowledge technologies than such a single paradigmatic claim. For instance, many forms of knowledge, such as corporate or departmental policy, are rule-based by nature. Some instructions to an assistant can be explicit and well stated. Sometimes, there is no other method of instruction; an agent cannot learn how to handle office jobs when a user is on vacation; the agent must be told. On the other hand, learning is in fact a primary form of instruction, and so the View Framework's "editor" must be able to provide and to simplify the underlying complexity -- from single inference engines to the hybrid combination of many engine types.
This framework focuses on the notion of an agent dialog or "smart guide". Similar to the Engine Framework's use of rule-based composition -- even around the inclusion of learning -- the View Framework uses instruction as a primary notion throughout its tasks.
This philosophy of instruction allows the seamless use of smart guide dialogs from application integrator to administrator to end-user -- based on a standard KIF representation language.
Adapters make absolutely no assumption about viewers and editors. They are responsible only for their semantic interface, which is symbolic. Any association between adapter symbols and end-user terminology is provided by the view framework.
Adapters will tend to require some administrative viewing and control. For instance, some adapters such as for e-mail will require the end-user's password (depending on the system's security model) in order to act as an autonomous agent. The collection and maintenance of such data is the responsibility of the View Framework. Installation and removal of agent components in the operating system's registry is also a primary administrative task, which must be addressed.
In future releases of the IBM Agent Building Environment Developer's Toolkit, when learning engines are also provided, adaptive user modeling will require that adapters provide more and more semantic events. For instance, the MAILARRIVED event is driven by the mail system, not the end-user. It is the critical event for agent automation, but other events such as OPENED, CLOSED, and PRINTED would need to be delivered by an adapter for agent learning. Special adapters built specifically for end-user interaction are also required. For instance, a rule might require the "agent" to ask the user a question (a sensor) or deliver a message (an effector). Some special events such as COMMAND can be sent directly to the agent. The user interface to these functions and the semantic interface to the agent are provided by specialized adapters.
These last two requirements for the user-agent runtime dialog are not the focus of this document, however. The initial needs from the View Framework are for rule-based instruction.
For any type of knowledge representation, viewing and explicit editing of knowledge are performed through the Library. As much as possible, the Library is based on standard knowledge representations such as KIF, allowing any viewer or editor to plug-and-play in the View Framework, so long as it works against the KIF format. In the same way that Engines are free to convert KIF into a parochial format for their run-time if they choose, rule editors might choose to map KIF into their own format to best manage the presentation.
As a service to Views, the Library can also store any view-specific metadata associated with a rule. The KIF representation contains user-provided values, but otherwise is entirely symbolic and formal. Unfortunately, some views may lose data when "compiling" to KIF format; therefore, the Library can be used to store such additional data as needed. For instance, the natural language mapping from a rule to a more "common language" expression can be stored as metadata.
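As a rough illustration with invented types (not the Library's real API), a view might store its common language rendering as named metadata beside the rule's formal KIF form:

```cpp
#include <iostream>
#include <map>
#include <string>

struct LibraryRule {
    std::string kif;                                  // formal, symbolic form
    std::map<std::string, std::string> metadata;      // named per view/consumer
};

int main() {
    LibraryRule r;
    r.kif = "(=> (from ?mail \"Pat Smith\") (moveToFolder ?mail \"Junk\"))";
    // A view stores its own common-language mapping as named metadata.
    r.metadata["commonLanguage/en"] =
        "When mail arrives from Pat Smith, move it to the Junk folder.";
    std::cout << r.metadata["commonLanguage/en"] << '\n';
}
```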
Other relationships such as to the Engine Framework are also required and will be elaborated in future releases.
The gamut of design criteria for the View Framework is as large as the framework itself. Of course, user interface design criteria are primary. These include issues of panel design and the special issues involved with user-agent interaction. Also, IA engenders new system design issues: For instance, an agent that polls every five minutes for new mail might be disconcerting to the user who assumes the agent acts immediately when mail arrives; the user might see an example of junk mail that he/she knows should be filtered by the agent -- but the agent has not yet "seen" this item.
Aside from all such other issues, there are the design goals of the View Framework itself.
Examples of these design points are provided in the following components, which are based around the natural metaphor of user-to-agent instruction.
Note: These components are not provided in Level 6 of the IBM Agent Building Environment Developer's Toolkit, but they give some sense of direction for how the IBM Agent Building Environment Developer's Toolkit will grow.
Rule authoring can be done with a composite client program for administrators and end-users. This program will allow creation of rules that configure an agent and creation of rules that instruct the agent. While administrators and end-users will author different sorts of rules, there is little difference between construction and instruction of the agent.
Rule authoring will use a combination of advanced presentation techniques.
Form-based and other graphically based methods of direct rule authoring by common end-users have generally failed. Many rule editors have been very well done. For instance, some graphical editors do not require the explicit expression of "and" or "or", making these functions implicit in the sequential/parallel arrangement of nodes. Some form-based editors try to simplify the problem by disallowing any term nesting, and even disallowing OR between terms (all terms are ANDed; another rule can be made to OR another case). But end-users simply do not want to write rules. Not only is this a secondary task to doing business, it is also the very hardest work in IA.
Common language can be used to suggest rules as templates; users merely fill in the blanks if they like the overall function of the template. However, even this is a hard cognitive load. The View Framework provides both the common language and smart guide methods in concert with each other.
This is the script which runs the dialogs. Dialogs tend to be presentation- and media-independent. The Frame classes mentioned below assume a graphical user interface, but the nature of dialog allows the user-agent interactions to be played over a telephone as well.
Stylistically, each instance of a dialog should be a relatively small, modular set of questions about a specific context, such as deleting junk mail. Each dialog is associated with that context.
InstructionDialogs are controlled by an interpretive engine.
The rule authoring program must be scalable. It must handle the instruction of a simple agent with only one or two adapters as well as a monolithic personal secretary agent that handles virtually all of a user's office objects. To achieve this, each InstructionDialog is modular and associated with a particular object and event. The event type will tend to define the context for the dialog, such as "Let's talk about what you want me to do when mail arrives." "Do you get a lot of junk mail from your manager?" "Who is your manager?" "What is the typical subject of this junk mail?" This list of contexts can be managed in terms of adapters and adapter events, or can otherwise include an item for other tasks such as administration.
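For illustration only, a modular InstructionDialog might be little more than its event context plus a small set of questions; the structure below is hypothetical and simply reuses the example questions above.

```cpp
#include <iostream>
#include <string>
#include <vector>

// A modular dialog keyed to one adapter event context.
struct InstructionDialog {
    std::string eventContext;            // e.g. the MAILARRIVED event
    std::vector<std::string> questions;  // small, modular, context-specific
};

int main() {
    InstructionDialog junkMail{
        "MAILARRIVED",
        {"Let's talk about what you want me to do when mail arrives.",
         "Do you get a lot of junk mail from your manager?",
         "Who is your manager?",
         "What is the typical subject of this junk mail?"}};

    std::cout << "[" << junkMail.eventContext << "]\n";
    for (const auto& q : junkMail.questions)
        std::cout << q << '\n';
}
```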
The context list frame is not only scalable; it is also context sensitive. The starting point can be specified so that any particular dialog can become the initial focus, which allows two flavors of sensitivity.
The modularity of InstructionDialogs allows users to incrementally build the agent's repertoire. It allows the user to incrementally build up trust in the agent. For instance, the user might use the components initially through only one of several dialogs. This is similar to getting a secretary. As this seems to go well, the user can return to give more instructions.
While KIF is used as the standard knowledge format for rules, KIF syntax is very far from the natural language of common end-users. The mapping from a KIF rule to a common language representation is defined by a RuleTemplate. The major part of this mapping is locating the user-specified values in each of the two forms.
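A sketch of the RuleTemplate idea follows: the same user-specified value is located in both the KIF form and the common language form through a shared slot marker. The slot syntax and the fill() helper are assumptions for illustration.

```cpp
#include <iostream>
#include <map>
#include <string>

// Replace each <slot> marker in a template with its user-specified value.
std::string fill(std::string tpl,
                 const std::map<std::string, std::string>& vals) {
    for (const auto& [slot, value] : vals) {
        std::string marker = "<" + slot + ">";
        for (auto pos = tpl.find(marker); pos != std::string::npos;
             pos = tpl.find(marker))
            tpl.replace(pos, marker.size(), value);
    }
    return tpl;
}

int main() {
    // The same user value is located in both the KIF and common language forms.
    std::string kif =
        "(=> (from ?mail \"<sender>\") (moveToFolder ?mail \"Junk\"))";
    std::string commonLanguage =
        "When mail arrives from <sender>, move it to the Junk folder.";

    std::map<std::string, std::string> values{{"sender", "Pat Smith"}};
    std::cout << fill(kif, values) << '\n';
    std::cout << fill(commonLanguage, values) << '\n';
}
```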
As the user works with the dialogs to build rules, the common language representations of those rules can be displayed in an "Agent Instructions" list. In other words, the dialog can display a fair representation of what it "thinks" it needs to do as a result of the dialog. This is one form of confirmation, an important feedback mechanism in any dialog; the confirmation re-expresses the dialog in another format. The dialog itself and the instructions it generates are different but related. The user can edit an instruction by selecting it, which will place the user back in the dialog. Direct viewing of the KIF structure will be allowed through a context menu.
Given the absolute separation of models from views, Adapters and Engines need only specify their symbolic interface. They do not maintain terminology resources. This is the responsibility of the EndUserLabel Dictionary as one of the underlying presentation services of the View Framework. Its use is not required by any view but is helpful to any views that present events, conditions, and actions in lists or other highly structured forms. It manages associations between the adapter/engine symbols and end-user strings (defined by National Language Support or other customization to particular end-user needs).
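A minimal sketch of such a dictionary, with invented symbols and labels, might look like this:

```cpp
#include <iostream>
#include <map>
#include <string>

class EndUserLabelDictionary {
public:
    // Associate an adapter/engine symbol with an end-user string.
    void associate(const std::string& symbol, const std::string& label) {
        labels[symbol] = label;
    }
    // Fall back to the raw symbol if no label has been registered.
    std::string labelFor(const std::string& symbol) const {
        auto it = labels.find(symbol);
        return it != labels.end() ? it->second : symbol;
    }
private:
    std::map<std::string, std::string> labels;  // could be NLS-specific
};

int main() {
    EndUserLabelDictionary dict;
    dict.associate("MAILARRIVED", "When new mail arrives");
    std::cout << dict.labelFor("MAILARRIVED") << '\n';
    std::cout << dict.labelFor("PRINTED") << '\n';   // unregistered symbol
}
```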
Whenever an end-user specifies a particular value such as a boss' name, an important customer project name, even numeric data such as rate limits, these values are assumed to be important and potentially reusable. For instance, a phone number specified in one rule might also be used in another rule. Any editor can use this dictionary for providing these already-used terms as suggestions.
This service leads to CommonTermManagement. For instance, heavily reused literals should be made into variables -- through conversation with the user. Once a variable is defined, such as MYPHONENUMBER is XXX-XXXX, the variable can be edited rather than changing the literal values through all the dialogs. This service will be especially important to learning engines, which can automatically track the relevancy of terms as users go about their business.
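As a toy illustration of this promotion: once a reused literal has become a named variable, a single edit to the variable updates every rule that references it. All names and values below are hypothetical.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // Defined once, in conversation with the user.
    std::map<std::string, std::string> variables{
        {"MYPHONENUMBER", "555-0100"}};

    // Rules reference the variable, not the literal value.
    std::vector<std::string> ruleValues{"MYPHONENUMBER", "MYPHONENUMBER"};

    // One edit to the variable updates all uses, without revisiting dialogs.
    variables["MYPHONENUMBER"] = "555-0199";
    for (const auto& v : ruleValues)
        std::cout << variables[v] << '\n';
}
```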