A Salesforce developer creates a locally running LLM assistant on your device

I have been experimenting with local LLMs inside Salesforce, and I'd like to tell you about the component I developed as a result. It offers the already familiar chat interface and uses Salesforce records for context. It runs locally on your computer, so the data being processed is never sent to any third-party service.

The introduction of Agentforce is what prompted me to develop the component. Agentforce uses agents – systems that can make decisions and perform various actions. An assistant, in contrast, only processes information interactively. Although I think it's possible to build a local agent using picoLLM, it would take a tremendous effort, so I decided to develop an assistant instead.

Features

As you'd expect from an LLM, it generates responses on any topic, because it's trained on a wide range of data. Moreover, it can use Salesforce records as additional context. The component's features are:

  • Supports multiple models. Any open-source model from the picoLLM website can be used, such as Gemma, Llama, or Phi. The only restriction here is the amount of RAM your computer has: the heavier the model, the more RAM it consumes.
  • Works with a single record. When the component is placed on a record page, it can access that record for context. For example, on an Account record detail page, it can generate a response based on the record's field values.
  • Supports related records. When the record has related records, the component can query them and incorporate them into responses.
  • Configurable. The component can be configured on the fly using a settings popup. It allows changing generation options such as the completion token limit, temperature, and top P.

How it works

From the end user's point of view, the process is straightforward. You load a model, select a system prompt, select records, write a user prompt, and watch the result being generated.

What is picoLLM?

Running LLMs in a browser is a notoriously resource-hungry task because of model sizes, compute requirements, and RAM needs. That's why the Picovoice team developed their picoLLM compression technology, which makes running LLMs on consumer computers far more efficient. They offer the picoLLM Inference Engine as a JavaScript SDK, allowing front-end developers to run LLMs locally in the browser. It supports all modern browsers, including Chrome, Safari, Edge, Firefox, and Opera. To learn more about how the picoLLM Inference Engine works, you can read their article.
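To give a feel for the SDK, here is a minimal sketch of the basic flow, assuming the `@picovoice/picollm-web` package and a Picovoice AccessKey. The names follow the SDK's documented usage as I recall it, so treat them as approximate rather than authoritative:

```js
import { PicoLLMWorker } from '@picovoice/picollm-web';

// modelFile is a .pllm file picked by the user, e.g. from an <input type="file">.
async function runLocalLlm(accessKey, modelFile, prompt) {
  // Loads the compressed model and spins up the web workers that run inference.
  const picoLLM = await PicoLLMWorker.create(accessKey, { modelFile });

  // Generation options mirror what the component exposes:
  // completion token limit, temperature, and top P.
  const result = await picoLLM.generate(prompt, {
    completionTokenLimit: 256,
    temperature: 0.7,
    topP: 0.9,
  });

  await picoLLM.release();
  return result.completion;
}
```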

The LWC part

The component acts as a bridge between the user and the picoLLM inference engine. At the heart of the component, a Visualforce page is embedded as an iframe. The page loads the picoLLM SDK and communicates with the LWC, allowing the latter to use the SDK via post messages. Together, the two pieces handle the following (a sketch of the message bridge follows the list):

  • Loading the model. The LWC has a button that lets you load a model of your choice. It triggers a hidden file input inside the iframe. Once the model is loaded, the pico SDK creates web workers, and the component is ready to process the user's input.
  • Setting the system prompt. So you don't have to write a system prompt every time, you can select a record saved in the System_Prompt__c custom object. Once you press the button, a popup displays the existing system prompts to choose from.
  • Accepting user input. There is a resizable text area for collecting user input. When submitted, it is sent to the iframe as a payload and added to the conversation history.
  • Accessing Salesforce records. There are two buttons: Select Fields and Select Related Records. The first collects the field values of the record on whose page the LWC is placed. The second lets you choose a related object and query its records along with specific field values. This information is sent to the iframe as a payload as well.
  • Changing the generation options. If desired, the completion token limit, temperature, and top P can be changed through a settings button on the component. This information is also sent to the iframe as a payload.
  • Generating the result. When the iframe receives a payload, it uses the pico SDK to run the loaded model and generate a result. If generation options were provided, they are taken into account. The dialog is also updated each time, so the LLM remembers the conversation history.
  • Rendering chat messages. The LWC renders outgoing messages, which come from the user. Incoming messages, which contain the generated response, are rendered dynamically whenever the component has something to tell the user, such as generated results or info and error messages.
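To make the bridge more concrete, here is a hypothetical sketch of the LWC side. The message shapes, the `addIncomingMessage` helper, and the iframe targeting are illustrative, not the component's actual code:

```js
import { LightningElement } from 'lwc';

export default class LocalLlmAssistant extends LightningElement {
  connectedCallback() {
    // Listen for results posted back by the Visualforce iframe.
    this.boundHandler = (event) => this.handleMessage(event);
    window.addEventListener('message', this.boundHandler);
  }

  disconnectedCallback() {
    window.removeEventListener('message', this.boundHandler);
  }

  sendPrompt(userPrompt, contextRecords, options) {
    // Post the payload to the page inside the iframe; the page owns the SDK.
    const frame = this.template.querySelector('iframe');
    frame.contentWindow.postMessage(
      { type: 'generate', userPrompt, contextRecords, options },
      '*' // real code should restrict this to the Visualforce origin
    );
  }

  handleMessage(event) {
    if (event.data && event.data.type === 'completion') {
      // Render the generated text as an incoming chat message.
      this.addIncomingMessage(event.data.completion);
    }
  }

  addIncomingMessage(text) {
    // Placeholder for the component's actual chat-rendering logic.
    console.log('assistant:', text);
  }
}
```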

A little bit of Apex code

On the back end of things, there is nothing fancy. The Apex code does all the heavy lifting related to detecting relationships between objects, based on the object of the record page the component is placed on. It also performs a few SOQL queries, and with that its duty is done.
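For illustration, this is how the LWC side might call such an Apex method; `AssistantController.getRelatedRecords` is a hypothetical name standing in for the component's actual controller:

```js
// Hypothetical Apex method exposed to the LWC via @AuraEnabled.
import getRelatedRecords from '@salesforce/apex/AssistantController.getRelatedRecords';

async function loadRelatedRecords(recordId, relatedObjectName, fieldNames) {
  // Apex does the heavy lifting: it resolves the relationship between the
  // record page's object and the related object, then runs the SOQL query.
  return getRelatedRecords({ recordId, relatedObjectName, fieldNames });
}
```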

Development challenges

Web workers

Previously, I used the unpkg tool to run node module code in an LWC component. That approach required additional build steps and was a less secure way of making things work. This time, I wanted to serve the picoLLM module directly from Salesforce – and not only in Experience Cloud, as I had done before, but in the Lightning Experience interface.

Under the hood, picoLLM uses web workers for parallel processing, and the main problem was that they are not allowed to run from LWC. Fortunately, nobody forbids running web workers from a Visualforce page, and that is the approach I used.

I downloaded the raw picoLLM code and added it as a static resource to a Visualforce page. In the LWC, the Visualforce page is embedded in an iframe. This let me communicate between the LWC and the page inside the iframe, which in turn uses web workers. In effect, the page runs the picoLLM code on behalf of the Lightning web component.
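Here is a sketch of the script that could live inside the Visualforce page, with hypothetical message shapes matching the LWC sketch above. Because the picoLLM code is served as a static resource, the web workers it spawns belong to the page's origin rather than to the LWC:

```js
let picoLLM; // created once the user picks a model file (see the SDK sketch above)

window.addEventListener('message', async (event) => {
  const msg = event.data;
  if (!msg || !picoLLM) return;

  if (msg.type === 'generate') {
    // Run inference locally, then hand the completion back to the LWC.
    const res = await picoLLM.generate(msg.userPrompt, msg.options || {});
    event.source.postMessage(
      { type: 'completion', completion: res.completion },
      event.origin
    );
  }
});
```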

Using Salesforce records for context

Copy and paste Salesforce records in JSON or CSV format, throw them into any online LLM, and watch: it will consume the records, use them as additional context, and generate a response. It turns out things aren't that easy when using compressed models for local processing.

Initially, I simply put the records, in JSON format, directly into the user prompt. I expected the model to be smart enough to distinguish the prompt itself from the additional context I had provided. I used different models of different sizes and couldn't understand why they wouldn't use the JSON to generate responses. They mostly either refused to respond to my requests or generated fictional data unrelated to what I had asked for. I started trying different formats for the context data – CSV, JSON, strictly delimited sections – but nothing helped.

I had almost given up on the idea, since the main feature wasn't working. Then, after two months, a stupidly simple thought suddenly hit me: what if I just flipped the order of the prompt parts? Instead of the user prompt first and the context second, put the context first and the user prompt second. To my amazement, it worked, and every model I used immediately began to understand Salesforce records as context.
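In code terms, the fix was as trivial as swapping two string parts. A sketch, with `buildPrompt` as an illustrative helper rather than the component's actual code:

```js
// Before (models ignored the records):
//   `${userPrompt}\n\nContext records:\n${context}`
// After (models picked the records up as context): context goes first.
function buildPrompt(contextRecords, userPrompt) {
  const context = JSON.stringify(contextRecords, null, 2);
  return `Context records:\n${context}\n\n${userPrompt}`;
}
```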

Performance

The component's functionality was tested on these machines:

  • A PC with an AMD Ryzen 9900X processor and 32 GB of RAM (5600 MT/s).
  • A Microsoft Surface Laptop 7 powered by an ARM Snapdragon X Elite processor with 16 GB of RAM (8448 MT/s).

Model loading speed – it's all about memory

The most time-consuming part of using the component is the initial model load. You might expect the 9900X to easily outperform the Snapdragon X Elite, but you'd be wrong. To my amazement, the latter is faster. Since it has faster memory, I assume that the faster your RAM, the faster the model loads. Here is a chart comparing model loading speeds for reference:

Response generation speed

It's the same story with response generation. As I understand it, you need a fast combination of CPU and RAM to get the fastest possible generation. Since generation speed varies even for the same prompt, I didn't perform precise speed tests. Still, the generation speed is very decent – almost as fast as the online alternatives.

What about using the GPU?

Actually, using the GPU to generate responses would be more efficient. Although it's possible to use a GPU with picoLLM, I haven't tested that configuration myself. There are a few reasons for that. First, I believe it relies on the WebGPU feature, which isn't enabled by default in most browsers (except Edge). Second, it likely takes several GB of VRAM to load a model, which I don't have.

Conclusion

Developing this assistant has been a great journey of exploration. From struggling with web worker restrictions to discovering the decisive role of prompt ordering in providing context, the challenges were stimulating and rewarding. The result is a Lightning web component that offers a unique approach to harnessing the power of large language models within the Salesforce ecosystem.

While the initial model load time can be considerable, especially for larger models, the ability to process data locally offers significant advantages in security, responsiveness, and cost-effectiveness. The possible use cases, from content-generation automation to intelligent assistance, are wide open and waiting to be explored.

Check out the GitHub repo.
