Imagine a world where every computer is equipped with an artificial intelligence by default. Probably taking the form of a large language model, the artificial intelligence would be installed as part of the operating system and already be set up to help the user work with the computer.
In that type of ecosystem, an application that could provide enabling functionality to the artificial intelligence needs a way to register its capabilities with the operating system. The primary objective of this project is to facilitate this in a way that does not require coding, so that an artificial intelligence enabling application can, during installation, let the operating system know what it is capable of.
Once the AI knows what an application is capable of, it can launch the application and make a call to it when the user is asking for something that the enabling application can do. A simple example of this might be an email client. The user might ask the AI to help him write an email. The AI could then launch the email application and set the email subject; this is an example of taking an action based on a capability.
An AI enabling application has its own user interface. It can appear to the user as just another application installed like any other in the Windows environment. The email client would have a form with recipients, a subject, the email body, and a send button. It can be used directly; the user doesn't need an AI to use it, but the AI can use it too. The AI, however, does not communicate with such an enabling application by the same means the user does. In other words, the AI does not need to see the screen, move the mouse, or type on the keyboard. The AI enabling application provides it with a sort of “backplane” application programming interface that the AI can call.
So now we have closed the loop. The AI is in place and provides a programming interface that an AI enabling application's installer can use to register its capabilities. Then, when the user asks the AI for help with something that the AI enabling application can provide, the AI has a way to make an API call back to the application.
This type of ecosystem is powerful because:
- The artificial intelligence does not have to be installed with or configured into each application.
- Applications can be written assuming that an artificial intelligence is in place and can be leveraged.
- Applications can be written knowing that they will be automated by some other intelligence than the user.
- The ecosystem can continue to grow and evolve as new enabling applications are installed; the AI is given more and more tools to work with over time.
- The AI could even orchestrate multiple applications to do things that none of them could do alone.
Okay, so how might this work? The operating system that hosts the large language model needs to make public an application programming interface of some type that application installers can use to register capabilities. This could be done in a number of different ways; I chose to go with simple HTTP networking. Each program is both a user interface and a web server.
Going forward I will refer to the application that facilitates communication between the large language model and the enabling applications as the Broker.
The Broker has a UI that is sort of like the chat interface we've seen in other large language model applications like ChatGPT. But instead of taking the user's request and simply responding with text, it is designed to turn the user's request into a list of actions to take. The LLM is instructed in its system prompt and through examples what actions it has available and how to use them. When a user requests something, the LLM generates a list of actions to take; the Broker is then able to identify these actions, relate them to registered capabilities, and call the APIs of the enabling applications.
The list of actions that the LLM generates should be concise, but to stay flexible the Broker uses semantic matching between the LLM's response and the registered capabilities. This is accomplished with embeddings, which are generated by another model designed for that purpose. In this implementation the embedding model runs on the AMD NPU, so embeddings are generated very quickly and without impacting the CPU or the GPU; the vectors are then compared on the CPU to resolve which capability should satisfy an action generated by the LLM.
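The comparison code isn't shown here; a common way to compare embedding vectors is cosine similarity, which yields values near 1 for semantically close text. A minimal C# sketch, assuming cosine similarity is the measure (the Broker's actual implementation may differ):

// Sketch of the vector comparison step (assumption: cosine similarity).
public static double CosineSimilarity(double[] a, double[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}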
Capabilities

public enum ActionType
{
LAUNCH,
HTTP,
UI
}
public enum MethodType
{
GET,
POST,
PUT,
DELETE,
NA
}
public class ApplicationCapability
{
public string Action { get; set; }
public string AppClass { get; set; }
public string AppPath { get; set; }
public ActionType ActionType { get; set; }
public string Description { get; set; }
public string Route { get; set; }
public MethodType Method { get; set; }
public string ContentType { get; set; }
public string Contract { get; set; }
public double[]? Vector { get; set; }
}
A capability simply maps an Action (words that together have some semantic meaning) to an API call. Let’s consider the action that sets the subject of an email.
{
"Action": "set email subject to []",
"AppClass": "email",
"AppPath": null,
"ActionType": 1, //HTTP
"Description": "Set the Subject of an EMail",
"Route": "subject",
"Method": 1, //POST
"ContentType": "application/json",
"Contract": "SingleText"
“Vector”: [-0.01321471,-0.0028538294,0.0075712195,-0.023238236, many many more]
}
In the action, the square brackets indicate a place to put a parameter. When the Broker receives a command similar to “set email subject” it can semantically match it to this capability and then execute it. But first we have to look at a launch capability to understand how the Broker knows what API to call.
{
"Action": "start email application",
"appClass": "email",
"AppPath": "C:\\AIE\\AIE_Email\\Email.exe",
"ActionType": 0, //LAUNCH
"Description": "launch an empty email in the email app",
"Route": "",
"Method": 4, //NA
"ContentType": "",
"Contract": ""
}
The LAUNCH ActionType tells the Broker that it needs to launch an application. The Broker dynamically keeps track of the ports that the target applications' web servers run on. This “port map” records which applications have been launched on which ports, and which port to tell the next application to launch on. An artificial intelligence enabling application has to be written to accept a port as an argument, so it knows what port to stand up its web server on.
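The enabling app's server code isn't shown in this section; here is a minimal sketch of how an app might read that port argument and stand its web server up, using HttpListener purely for illustration (the actual apps may host their server differently):

// Minimal sketch: listen on the port the Broker passed on the command line.
// (Assumption: HttpListener and a port in args[0]; the real apps may differ.)
static async Task RunServerAsync(string[] args)
{
    int port = int.Parse(args[0]); // the Broker appends the port as a command line option

    var listener = new System.Net.HttpListener();
    listener.Prefixes.Add($"http://localhost:{port}/");
    listener.Start();

    while (true)
    {
        var context = await listener.GetContextAsync();
        // Dispatch the request (e.g. POST /subject) to the matching capability handler here.
        context.Response.StatusCode = 200;
        context.Response.Close();
    }
}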
Registration

AI enabling applications register their capabilities during installation. I picked port 7770 as the starting port for the project. The embedding model API runs on this port. The next port, 7771, is the port the Broker listens on. It makes available a capability API that the enabling apps can POST a capability to. In this example the Email app will register its capabilities with the Broker if it gets “-install” as argument 1.
// The Broker listens on port 7771; capabilities are POSTed to its "capability" route.
// (Requires System.Net.Http, System.Text and System.Text.Json.)
HttpClient sender = new HttpClient();
sender.BaseAddress = new Uri("http://localhost:7771/");
var capability = new ApplicationCapability()
{
Action = "start email application",
ActionType = ActionType.LAUNCH,
AppClass = "email",
AppPath = Application.ExecutablePath,
ContentType = "",
Description = "launch an empty email in the email app",
Contract = "",
Method = MethodType.NA,
Route = ""
};
var url = "capability";
var json = JsonSerializer.Serialize<ApplicationCapability>(capability);
var content = new StringContent(json, Encoding.UTF8, "application/json");
var response = await sender.PostAsync(url, content);
// Next, register the HTTP capability that sets the email subject.
capability = new ApplicationCapability()
{
Action = "set email subject to []",
ActionType = ActionType.HTTP,
AppClass = "email",
ContentType = "application/json",
Description = "set the subject of the email",
Contract = "SingleText",
Method = MethodType.POST,
Route = Constants.SUBJECT_KEY
};
json = JsonSerializer.Serialize<ApplicationCapability>(capability);
content = new StringContent(json, Encoding.UTF8, "application/json");
response = await sender.PostAsync(url, content);
NOTE: It is important that the LAUNCH capability be registered first.
The Email app will also register a capability to set the recipient and append the email body. After the capabilities are registered the Broker should be able to make semantic matches.
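The Broker's side of that exchange isn't shown above; a minimal sketch of what its capability endpoint might look like, assuming an ASP.NET Core minimal API (the real Broker may host its server differently):

// Sketch of the Broker's capability endpoint (assumption: ASP.NET Core minimal API).
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

var capabilities = new List<ApplicationCapability>();

// Enabling apps POST an ApplicationCapability here during installation.
app.MapPost("/capability", (ApplicationCapability capability) =>
{
    // The real Broker would also obtain a Vector for the Action text from the
    // embedding API (port 7770) before storing the capability for matching.
    capabilities.Add(capability);
    return Results.Ok();
});

app.Run("http://localhost:7771");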
Compilation

In this screenshot the 'Response' text area is where the response from an LLM would be displayed. This response can be edited by the user before the Broker attempts to compile the actions. In this case I just typed some actions into the response. Note how I intentionally typed in actions that are semantically similar to capabilities but not the same syntactically, and the compile step still picks the best capability to execute. "open a new email" was matched to "start email application" and "set [] as the email subject" was matched to "set email subject to []".
During compilation:
- Each line of the response is considered to be an action
- The action is scanned for square brackets "[]", the text inside is removed and saved as a parameter to be reinserted into the capability chosen for execution
- A vector is retrieved for the action from the embedding model
- The vector is compared to the vectors of all capabilities, generating a value from 0 to 1, where 1 is the closest semantic match
- The comparison values are ranked
- The result is classified as a Match (> 0.998), Weak (between 0.5 and 0.998), Fail (< 0.5), or Ambiguous (the text is a close match to more than one capability)
- The compilation results are queued for execution
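Two of those steps, parameter extraction and result classification, might look roughly like this (illustrative only; the names and the Ambiguous rule are assumptions, not the Broker's actual code):

// Illustrative only: pull the [] parameter out of an action line.
static (string Template, string? Parameter) ExtractParameter(string action)
{
    var start = action.IndexOf('[');
    var end = action.IndexOf(']');
    if (start < 0 || end <= start) return (action, null);
    var parameter = action.Substring(start + 1, end - start - 1);
    return (action.Remove(start + 1, end - start - 1), parameter); // leaves "[]" in the template
}

// Illustrative only: classify the best similarity scores using the thresholds above.
static string Classify(double best, double secondBest)
{
    if (best > 0.998 && secondBest > 0.998) return "Ambiguous";
    if (best > 0.998) return "Match";
    if (best > 0.5) return "Weak";
    return "Fail";
}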
When an action is executed, the Broker looks up the properties of the capability and determines whether it needs to launch an app or send a POST over an HTTP connection. (These are the two action types supported at this time.) When launching an application, it looks to the port map to see what the next available port is, then reserves that port for the app class. Following commands that use that app class look to that reservation in the port map to build up the URI, which includes the route.
When the application is launched the Broker will add the port as a command line option. This is how the application knows what port to listen on when it launches.
Execution is serial. When an app launches for an app class, all commands for that app class are sent to that instance. If the same app class is launched again, all subsequent commands for that app class are sent to the new instance.
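Putting those pieces together, executing an HTTP action might look roughly like this (a sketch assuming a simple port map dictionary and a SingleText contract that sends the parameter as a JSON string; not the Broker's actual code):

// Illustrative sketch of executing an HTTP action against an already-launched app.
// _portMap records the port reserved for each app class at LAUNCH time.
Dictionary<string, int> _portMap = new();

async Task ExecuteHttpActionAsync(ApplicationCapability capability, string? parameter)
{
    var port = _portMap[capability.AppClass];
    var uri = $"http://localhost:{port}/{capability.Route}"; // e.g. .../subject

    // Assumption: the SingleText contract sends the extracted [] parameter as a JSON string.
    var json = JsonSerializer.Serialize(parameter);
    var content = new StringContent(json, Encoding.UTF8, capability.ContentType);

    using var client = new HttpClient();
    await client.PostAsync(uri, content);
}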
LLM

To this point we haven't discussed the large language model or how it can generate actions. I interface with the LLM via a NuGet package called Microsoft.ML.OnnxRuntimeGenAI. This package gave me a way of working with generative AI in C#. The model has to be prepared by a utility, but once prepared it is very straightforward to work with.
We would load the model at startup:
_Model = new Model(modelPath);
Then after that we can run prompts:
_Text = string.Empty;

// Tokenize the prompt and configure the generation parameters.
using var tokenizer = new Tokenizer(_Model);
var tokens = tokenizer.Encode(_prompt);
var parms = new GeneratorParams(_Model);
parms.SetSearchOption("max_length", 8192);
parms.TryGraphCaptureWithMaxBatchSize(8192);
parms.SetInputSequences(tokens);

// Generate one token at a time, decoding each piece and appending it to the response text.
using var generator = new Generator(_Model, parms);
using var tokenizerStream = tokenizer.CreateStream();
while (!generator.IsDone())
{
    generator.ComputeLogits();
    generator.GenerateNextToken();
    var tokenId = generator.GetSequence(0)[^1];
    var sentencePiece = tokenizerStream.Decode(tokenId);
    _Text = _Text + sentencePiece;
}
"It would be amazing if the OnnxRuntimeGenAI team or AMD could enhance this package to run inference on NPU. We want to run workloads on NPU in Windows and in Windows the development tools of choice are usually provided by Microsoft and the language of choice is C#." - July 2024Prompt Engineering
The LLM has to be given some pretty specific instructions in order to generate actions that can be matched to capabilities and executed. This is the system prompt:
You are the helpful assistant who generates a list of actions to be executed in the correct order to control a user's personal computer. You only respond with a list of actions with concise syntax based on a list of action templates. You do not have to use all templates, pick the best ones to satisfy the user's request and no more. When generating an action you may substitute a question mark in the action template with a phrase of your own, you can generate creative phrases when doing this. When generating a replacement in the template for a ? surround it with [] punctuation. Do not number the list, the syntax should be concise. The allowed action templates are as follows.
Then the capabilities are listed.
As detailed as this is, with this instruction alone the LLM is rarely able to generate a good list of actions. The parameter substitution inside the square brackets is especially hard for the LLM to understand. Additionally, the format of the parameters can differ from capability to capability.
Examples in the Prompt

The best way I have found to teach the AI how we want its response formatted is by giving it examples in the prompt as if we had been having a conversation and it had generated previous responses. Consider this prompt:
<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
You are the helpful assistant who generates a list of actions to be executed in the correct order to control a user's personal computer. You only respond with a list of actions with concise syntax based on a list of action templates. You do not have to use all templates, pick the best ones to satisfy the user's request and no more. When generating an action you may substitute a question mark in the action template with a phrase of your own, you can generate creative phrases when doing this. When generating a replacement in the template for a ? surround it with [] punctuation. Do not number the list, the syntax should be concise. The allowed action templates are as follows.
respond to user with [?]
start column chart application
set column chart title to [?]
add column chart series [?]
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
create a chart that shows the number of days in each month of the year
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
start column chart application
set column chart title to [Days in Each Month of the Year]
add column chart series [January=31:February=28:March=31:April=30:May=31:June=30:July=31:August=31:September=30:October=31:November=30:December=31]
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
create a chart that shows the planet's sizes
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
start column chart application
set column chart title to [Planets in Our Solar System - Size Comparison in km]
add column chart series [Mercury=4879:Venus=12104:Earth=12742:Mars=6794:Jupiter=142984:Saturn=116464:Uranus=51118:Neptune=49528:Pluto=2374]
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
User's Name = Robert Henry
Current Date = 7/7/2024 1:02:03 AM
chart the population of the continents in millions
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
Given this information, the LLM generates:
start column chart application
set column chart title to [Continents Population in Millions]
add column chart series [Asia=4,600:Africa=1,300:Europe=740:North America=580:South America=430:Antarctica=0:Australia/Oceania=40]
Which gives us a nice chart:
The format of the chart series data is pretty distinct:
[Asia=4,600:Africa=1,300:Europe=740:North America=580:South America=430:Antarctica=0:Australia/Oceania=40]
The LLM knows to generate it this way because of the examples in the prompt.
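For illustration, parsing that series format back into data is straightforward (a sketch; the chart application's actual parser may differ):

using System.Globalization;

// Illustrative parser for the Name=Value:Name=Value series format.
static Dictionary<string, double> ParseSeries(string series)
{
    var result = new Dictionary<string, double>();
    foreach (var pair in series.Trim('[', ']').Split(':'))
    {
        var parts = pair.Split('=');
        // Values can contain thousands separators, e.g. "4,600".
        result[parts[0]] = double.Parse(parts[1], CultureInfo.InvariantCulture);
    }
    return result;
}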
Given the importance of examples, we need a way for AI enabling applications to provide examples of how to use their capabilities. So, the Broker has an example endpoint that the applications can call to register examples during installation, just like it does for capabilities.
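For instance, an installer might register an example roughly like this (the "example" route and the payload shape are assumptions here; check the Broker code for the actual contract):

// Hypothetical sketch: the "example" route and payload shape are assumptions.
var exampleSender = new HttpClient { BaseAddress = new Uri("http://localhost:7771/") };
var example = new
{
    User = "email Dave and ask if he is coming to the party",
    Assistant = "start email application\nset email subject to [The Party]"
};
var exampleJson = JsonSerializer.Serialize(example);
var exampleContent = new StringContent(exampleJson, Encoding.UTF8, "application/json");
var exampleResponse = await exampleSender.PostAsync("example", exampleContent);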
Installing

- Install Ryzen AI Software 1.1 Installation Instructions — Ryzen AI Software 1.1 documentation (amd.com)
- Make sure 'python quicktest.py' works
- Clone the RyzenAI Software github repo amd/RyzenAI-SW (github.com)
- Get the Resnet image classification example working Getting Started Tutorial — Ryzen AI Software 1.1 documentation (amd.com)
- Clone my Embedding repo rhenry74/Embeddings (github.com)
- You can get the model used to generate embeddings here: WhereIsAI/UAE-Large-V1 · Hugging Face
- Quantize the model by running quantize.py
- Carefully read and follow the steps laid out in this closed issue on the RyzenAI Software github pages Will only some models run on IPU? · Issue #92 · amd/RyzenAI-SW (github.com)
- You can download the wheel that AMD made available to me here: AIE
- Also, starting with a directory (on C:) called AIE, recreate the structure and files at this same link: AIE
- You can get the ONNX GenAI Llama 3 8B Instruct model that goes in the LLM folder from here: FusionQuill/Llama-3-8B-Instruct-Onnx at main (huggingface.co)
- Clone my AIE repo rhenry74/AIE: Hackster.IO and AMD Pervasive AI Developer Contest Artifacts (github.com)
- I build the C# code with Visual Studio 2022
- Watch the video below to see how the apps register with the Broker
- When you are ready to use the broker run the Embeddings API as shown in the video
No one said it would be easy ;-)
By providing an ecosystem that supports applications that can be automated by AI, this solution opens the door for users keen to leverage AI in their daily workflow, for vendors that want to create AI enabling applications, and for hardware manufacturers providing AI support, all thriving at the operating system level.
Dial in here for a walk-through: