The release of the Code Assistant GitHub Copilot to the public in June 2021 marked the beginning of a new kind of helper in the developer's tool belt – alongside established ones such as linters and formatters.
While basic code completion has been on the market for years in varying degrees of sophistication, a tool that understands code and completes it in a meaningful way – beyond simple parameter suggestions – was a novelty.
This blog article shows how to build a state-of-the-art Code Assistant using several open source tools created by Hugging Face 🤗:
- Text Generation Inference, the model inference API
- VSCode extension for TGI, the extension that lets you access the model from Visual Studio Code
- Chat UI, a ChatGPT-like UI for the model
… all via a single docker-compose file 🔥! This file and all the others discussed in this article are available in an accompanying repository.
Wait… Have We Been There Already?
Kite was one of the companies that provided a more advanced variant of code completion and gave up on the task for various reasons. In late 2022 the company gave the following explanation:
First, we failed to deliver our vision of AI-assisted programming because we were 10+ years too early to market, i.e. the tech is not ready yet.
We built the most-advanced AI for helping developers at the time, but it fell short of the 10× improvement required to break through because the state of the art for ML on code is not good enough. You can see this in Github Copilot, which is built by Github in collaboration with Open AI. As of late 2022, Copilot shows a lot of promise but still has a long way to go.
But in "late" 2023 you can run a publicly available model that even beats ChatGPT and older versions of GPT-4 on your personal computer! One year in AI moves blazingly fast and can cover a decade…
Challenge Accepted
Ever since Copilot was released, the open source LLM community has tried its best to replicate its functionality. ChatGPT and GPT-4 raised the bar even higher. The release of StarCoder by the BigCode project was a major milestone for the open LLM community: the first truly powerful large language model for code generation released to the public under a responsible but nonetheless open license. The code wars had begun, and the source was with StarCoder.
While it still performed considerably worse on the HumanEval benchmark than the proprietary, walled-off GPT-4 (67 as of March) and ChatGPT (48.1), scoring 32.9 points, it positioned itself successfully within striking distance.
The releases of Llama 2 and subsequently Code Llama – both by Meta – are also important waypoints. Code Llama achieved an impressive HumanEval pass@1 score of 48.8, beating ChatGPT. A few days later, a new WizardCoder variant built on top of Code Llama achieved 73.2 pass@1, which even surpasses GPT-4's March score!
Why Bother with Self-Hosting?
While Coding Assistant services like GitHub Copilot and tabnine (which allows VPC and air-gapped installs) already exist, there are many reasons to self-host one for your company or even for yourself:
- Full control over all the moving parts, models and software
- The ability to easily fine-tune models on your own data
- No vendor lock-in
- The fact that by now many of the most capable models are public anyway
- Various compliance reasons
On August 22, Hugging Face 🤗 announced an enterprise Code Assistant called SafeCoder, which bundles StarCoder (and other models), an inference endpoint and a VSCode extension into a single managed package. SafeCoder addresses many of the points above, but hides most of its moving parts behind its managed service – by design. Luckily, the main components are open source and readily available. In the following, we will set up everything that is needed to run your very own Coding Assistant, serviced by you.
Prerequisites
The best and most performant way to run LLMs today is by leveraging GPUs or TPUs. This article assumes that you have an NVIDIA GPU with CUDA support and at least 10 gigabytes of VRAM at your disposal. Be sure to install an up-to-date driver and CUDA version. You will also need Docker (or another container engine like Podman) and the NVIDIA Container Toolkit.
First Component: The Inference Engine
The core of the Coding Assistant is the backend that handles the user's completion requests and generates new tokens based on them. For this we will use Hugging Face's Text Generation Inference (TGI), which powers Inference Endpoints and the Inference API – a well-tested and vital part of Hugging Face's infrastructure. Note that the software's license changed recently: from version 1.0 onwards, TGI uses a new license called HFOIL 1.0, which restricts commercial use. Olivier Dehaene, the maintainer of the project, summarises the implications of the license as follows:
building and selling a chat app for example that uses TGI as a backend is ok whatever the version you use
building and selling a Inference Endpoint like experience using TGI 1.0+ requires an agreement with HF
While this summary should give you a basic understanding of what is possible under the license, be sure to consult a lawyer to get a thorough understanding of whether your use case is covered or not.
The Model: WizardCoder
We will use a quantised and optimised version of a SOTA Code Assistant model called WizardCoder. There are several quantisation formats available today: GPTQ, GGML, GGUF… Tom Jobbins aka "TheBloke" gives a good introduction here. Since GGUF is not yet supported by Text Generation Inference, we will stick to GPTQ. For the model to run properly, you will need roughly 10 gigabytes of available VRAM. If you happen to have more than that, feel free to try the 34B model, or the slightly better 34B Phind model, which unfortunately is not yet available in a 13B version. Also, check the "Big Code Models Leaderboard" on Hugging Face regularly to select the best-performing model for your use case.
Setting up Text Generation Inference
Create a docker-compose.yml file with the following contents:
```yaml
version: '3.8'

services:
  text-generation:
    image: ghcr.io/huggingface/text-generation-inference:1.0.3
    environment:
      HUGGING_FACE_HUB_TOKEN: ${HUGGING_FACE_HUB_TOKEN}
    ports:
      - "8080:80"
    volumes:
      - ./data:/data
    command:
      - "--model-id"
      - "${MODEL_ID:-TheBloke/WizardCoder-Python-13B-V1.0-GPTQ}"
      - "--quantize"
      - "${QUANTIZE:-gptq}"
      - "--max-batch-prefill-tokens=${MAX_BATCH_PREFILL_TOKENS:-2048}"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    container_name: text-generation
    restart: always # Ensuring the service always restarts on failure
```
Optionally, create an .env file with:
```bash
# optional, only needed if you want to use a gated model like StarCoder or Code Llama
HUGGING_FACE_HUB_TOKEN=1234
# the model we are going to use
MODEL_ID=TheBloke/WizardCoder-Python-13B-V1.0-GPTQ
# how the model is quantized
QUANTIZE=gptq
MAX_BATCH_PREFILL_TOKENS=2048
```
Finally, use sudo docker compose up -d to run the text generation service. It will now be available at localhost:8080. sudo docker container ls gives you a list of all running container instances. Next, type sudo docker logs text-generation --follow to get live output of the TGI container logs. This is particularly helpful for debugging. As you can see in the logs, TGI will download the model the first time it is run and save it to the data folder that is mounted as a volume inside the container.
To test if everything was set up correctly, try to send the following POST request to your API from a new terminal window/tab:
```bash
curl localhost:8080/generate -X POST -d '{"inputs":"write a python function that gets me all folders in the working directory","parameters":{"max_new_tokens":200}}' -H 'Content-Type: application/json'
```
Now, you should get a response back from the API and also see the request in the container logs! Note that the quality of the response may well be lacking, since we did not configure any generation parameters for our request – this is just to test the basic functionality. You should now have Text Generation Inference up and running on your machine with WizardCoder as the model. Well done!
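If you prefer to script such tests, here is a minimal sketch of the same request in Python, this time with a few explicit generation parameters (the same ones the Chat UI configuration further below uses; the values here are illustrative, not tuned):

```python
# Minimal sketch: query the local TGI /generate endpoint with explicit
# generation parameters. Assumes TGI is listening on localhost:8080 as
# configured in the docker-compose file above.
import requests

payload = {
    "inputs": "write a python function that gets me all folders in the working directory",
    "parameters": {
        "max_new_tokens": 200,
        "temperature": 0.1,
        "top_p": 0.9,
        "repetition_penalty": 1.2,
        "do_sample": True,
    },
}

response = requests.post("http://localhost:8080/generate", json=payload, timeout=120)
response.raise_for_status()

# TGI returns a JSON object containing the generated text
print(response.json()["generated_text"])
```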
Second Component: The VSCode Extension
Next, we will set up a plugin for Visual Studio Code that lets us query TGI conveniently from our IDE! For this we will use Hugging Face's VSCode extension, available from the marketplace. The plugin is actively developed, and a recent update thankfully made it possible to configure the max_new_tokens parameter, which controls how long the model's response can be. A larger value allows longer code to be generated, but also puts more load on the backend.
Setting up the Extension
Once you have installed the plugin, head over to the extension settings. We will need to configure a few parameters:
- First, change the Hugging Face Code: Config Template setting to WizardLM/WizardCoder-Python-34B-V1.0.
- Next, configure the Hugging Face Code: Model ID Or Endpoint setting and change it to http://YOUR-SERVER-ADDRESS-OR-IP:8080/generate (or localhost if TGI runs on the same machine).
To test if everything works as intended, create a new .py file and copy over the following text. Since we are using an instruction model, the model will perform best when prompted properly:
```python
# write a function that lists all text files in a given directory. use type hints and python docstrings
```
Then move your cursor to the end of the comment line and hit enter. You should see a spinning circle at the bottom of the window and shortly after be greeted with some (hopefully functional) code!
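The exact completion differs from run to run, but a successful response should look roughly like this (an illustrative example, not a guaranteed result):

```python
# Illustrative example of the kind of completion to expect – the actual
# output of the model will vary between runs.
from pathlib import Path
from typing import List


def list_text_files(directory: str) -> List[str]:
    """Return the names of all text files in the given directory.

    :param directory: Path of the directory to search.
    :return: A list of file names ending in .txt.
    """
    return [p.name for p in Path(directory).glob("*.txt")]
```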
Third Component: The Chat UI
Would it not be convenient to also be able to access the Code Assistant from your web browser, without needing to open an IDE? Certainly! And this is where another great piece of open source software comes into play: Hugging Face's Chat UI. It is the very same code that drives HuggingChat, a very well put together variant of the familiar ChatGPT UI.
Setting up Chat UI
First, clone the repository and create a file called .env.local in its root directory with the following contents:
```bash
# url to our local mongodb
MONGODB_URL="mongodb://mongo-chatui:27017"

# we don't need authorization for our purposes
REJECT_UNAUTHORIZED=false

# insert your favorite color here
PUBLIC_APP_COLOR=blue

# overwrite the standard model card with the model we serve via tgi
# be sure to edit the 'endpoints' field!
MODELS=`[{"name":"TheBloke/WizardCoder-Python-13B-V1.0-GPTQ", "endpoints":[{"url":"http://text-generation:/generate_stream"}], "description":"Programming Assistant", "userMessageToken":"\n\nHuman: ", "assistantMessageToken":"\n\nAssistant:", "preprompt": "You are a helpful, respectful and honest assistant. Below is an instruction that describes a task. Write a response that appropriately completes the request.", "chatPromptTemplate": "{{preprompt}}\n\n### Instruction:\n{{#each messages}}\n {{#ifUser}}{{@root.userMessageToken}}{{content}}{{@root.userMessageEndToken}}{{/ifUser}}\n {{#ifAssistant}}{{@root.assistantMessageToken}}{{content}}{{@root.assistantMessageEndToken}}{{/ifAssistant}}\n{{/each}}\n{{assistantMessageToken}}\n\n### Response:", "promptExamples":[{"title":"Code a snake game","prompt":"Code a basic snake game in python, give explanations for each step."}], "parameters":{"temperature":0.1,"top_p":0.9,"repetition_penalty":1.2,"top_k":50,"truncate":1000,"max_new_tokens":1024}}]`
```
There is still a lot of room for improvement especially in the chatPromptTemplate section. See here for further information.
Unfortunately, no prebuilt Docker image exists for Chat UI. Thus, we have to build the image ourselves. The .env and .env.local files are needed at build-time, so be sure to have them ready. Run the following command in the root directory of the Chat UI repository:
```bash
sudo docker build . -t chat-ui:latest
```
Next, create a new folder and, inside it, a new docker-compose.yml file with the following contents. It is important that the .env file from Chat UI is not in the same folder hierarchy as this docker-compose.yml (hence the new folder): otherwise, Docker Compose will try to parse and use that .env file, which leads to parsing errors due to the JSON string formatting. We do not need the .env file and its contents at runtime anyway.
```yaml
version: '3.8'

services:
  # The frontend
  chat-ui:
    image: chat-ui
    ports:
      - "3000:3000"
    environment:
      - MONGODB_URL=mongodb://mongo-chatui:27017
    container_name: chatui
    restart: always # Ensuring the service always restarts on failure

  # The database where the history and context are going to be stored
  mongo-chatui:
    image: mongo:latest
    ports:
      - "27017:27017"
    container_name: mongo-chatui
    restart: always # Ensuring the service always restarts on failure
```
Now, we can test-drive Chat UI. To do so, type sudo docker compose up -d in the directory of the docker-compose.yml (as before with TGI) and be sure to also keep an eye on the logs via sudo docker container logs chatui --follow. If all works as expected, you should be able to access the UI on port 3000!
Putting Everything Together
Of course, it is also possible to use a single, combined docker-compose file if you are willing to host the backend, frontend and database on the same machine. Copy the data folder from earlier so the models do not need to be re-downloaded. You might also have to remove the old Chat UI and database containers using sudo docker container remove chatui mongo-chatui.
```yaml
version: '3.8'

services:
  # Text Generation Inference backend
  text-generation:
    image: ghcr.io/huggingface/text-generation-inference:1.0.3
    environment:
      HUGGING_FACE_HUB_TOKEN: ${HUGGING_FACE_HUB_TOKEN}
    ports:
      - "8080:80"
    volumes:
      - ./data:/data
    command:
      - "--model-id"
      - "${MODEL_ID:-TheBloke/WizardCoder-Python-13B-V1.0-GPTQ}"
      - "--quantize"
      - "${QUANTIZE:-gptq}"
      - "--max-batch-prefill-tokens=${MAX_BATCH_PREFILL_TOKENS:-2048}"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    container_name: text-generation
    restart: always # Ensuring the service always restarts on failure

  # The frontend
  chat-ui:
    image: chat-ui
    ports:
      - "3000:3000"
    environment:
      - MONGODB_URL=mongodb://mongo-chatui:27017
    container_name: chatui
    restart: always # Ensuring the service always restarts on failure

  # The database where the history and context are going to be stored
  mongo-chatui:
    image: mongo:latest
    ports:
      - "27017:27017"
    container_name: mongo-chatui
    restart: always # Ensuring the service always restarts on failure
```
Do not forget to change the endpoints parameter in the MODELS variable of Chat UI's .env.local to "endpoints":[{"url":"http://text-generation:/generate_stream"}], since we can now conveniently use the container address of the shared Docker network. Remember, you have to re-build the image after adapting the .env.local file.
Great! Now you can start the backend, the frontend and the database with a single sudo docker compose up -d.
Bonus: Adding HTTPS
Up to this point, the API and UI are served via plain HTTP only. It is therefore advisable to secure our traffic with HTTPS, with the help of a reverse proxy like nginx. Without HTTPS, you will also not be able to access the UI from destinations other than localhost.
Create a new directory called nginx and, inside of it, a new file nginx.conf. The specific settings depend on which registrar you are using for your domain – or on your local setup, in case you only want to make the service available to your local network.
This nginx.conf template can serve as a starting point:
```nginx
events {
    worker_connections 1024;
}

http {
    server_tokens off;
    charset utf-8;

    server {
        listen 80 default_server;
        listen [::]:80 default_server;

        location /nginx_status {
            stub_status on;
        }
    }

    # Frontend
    server {
        listen 443 ssl http2;
        listen [::]:443 ssl http2;

        server_name your.local.address.io;
        client_max_body_size 15G;
        ...

        # reverse proxy
        location / {
            proxy_pass http://chat-ui:3000;
            ...
        }
    }

    # Serving backend
    server {
        listen 443 ssl http2;
        listen [::]:443 ssl http2;

        server_name api.your.local.address.io;
        client_max_body_size 15G;
        ...

        # reverse proxy
        location / {
            proxy_pass http://text-generation:80;
            ...
        }
    }

    # HTTP redirect
    server {
        listen 80;
        listen [::]:80;

        server_name .your.local.address.io;

        return 301 https://your.local.address.io$request_uri;
    }
}
```
You also need to add the nginx service to your existing docker-compose.yml.
```yaml
version: '3.8'

services:
  ...

  # The reverse proxy
  nginx:
    container_name: nginx
    restart: unless-stopped
    image: nginx
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
      - ./certificates:/certificates
```
Now you only need to generate the certificates, save them in the certificates folder and restart everything.
This is it!
Good job! You now have all the components needed to self-host your very own Code Assistant. Thanks to the awesome people at Hugging Face, it is easier than ever, and maybe you even learned a thing or two along the way. Before you put it into production, though, you may want to do a final load test, e.g. via Locust. Doing so gives you an understanding of how many users can use the service at the same time. For this you will need to write a small locustfile.py – and for that you could kindly ask WizardCoder to help you out 🧙♀️.
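As a starting point, a minimal locustfile sketch for hammering the TGI backend could look something like this (the host, endpoint and payload are assumptions based on the setup above; adjust them to your deployment):

```python
# locustfile.py – minimal load-test sketch for the TGI backend.
# Host, endpoint and payload are assumptions based on the setup above.
from locust import HttpUser, task, between


class CodeAssistantUser(HttpUser):
    # simulated users wait between 1 and 5 seconds between requests
    wait_time = between(1, 5)

    @task
    def generate(self):
        self.client.post(
            "/generate",
            json={
                "inputs": "write a python function that reverses a string",
                "parameters": {"max_new_tokens": 200},
            },
        )
```

Run it with locust -f locustfile.py --host http://localhost:8080 and use Locust's web UI to ramp up the number of simulated users until response times become unacceptable.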