SCRP provides access to locally-hosted, state-of-the-art generative AI models. Information passed to the models never leaves the cluster, making it suitable for processing proprietary data.

Web Access

  1. Log in to https://scrp-chat.econ.cuhk.edu.hk with your SCRP credentials.
  2. If necessary, choose a model at the top. The default model is Llama3.1 70B.
  3. To upload files, click the + button to the left of the dialog box at the bottom. The models can parse many file types, including plain text, Excel spreadsheets, CSVs and PDFs.

API Access

SCRP provides an OpenAI-compatible API. To access the API,

  1. Log in to https://scrp-chat.econ.cuhk.edu.hk with your SCRP credentials.
  2. Click the user avatar in the top-right corner, then click Settings.
  3. In Account, click Show next to ‘API Keys’.
  4. Click + Create new secret key.
  5. Copy the created API key and provide it when accessing the API through HTTP requests or the OpenAI Python library. See the examples below.
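A quick way to verify that your key works is to list the models the server exposes. The snippet below is a minimal sketch that assumes the endpoint follows the OpenAI convention of serving a model list under the same base URL; replace your_api_key_here with the key you just created:

from openai import OpenAI

client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

# List the models the server exposes; assumes the OpenAI-style
# /models route is available under the base URL.
for model in client.models.list():
    print(model.id)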

Available Models

SCRP currently provides access to the following models:

Model                    Max. Speed   Quantization   Full Model Name
Llama3.1 70B (default)   15 token/s   Q4_K_M         llama3.1:70b-instruct-q4_K_M
Llama3.1 405B            5 token/s    Q4_K_M         llama3.1:405b-instruct-q4_K_M
Llama3.1 8B              95 token/s   Q5_K_M         llama3.1:8b-instruct-q5_K_M

Larger models are more capable but run slower. The difference is particularly noticeable in fact recall, mathematics and logical reasoning. A comparison of the models’ performance can be found here.

Tips for Getting the Best Result

  • When formulating your prompt, be specific about what you are asking for. For example, if you believe the answer is uncertain, say so in the prompt; otherwise the model will likely give a wrong answer with confidence.
  • Asking the models for concise output degrades its quality. The models do not have the ability to think privately, so asking them for concise output is equivalent to asking them not to think in detail. Instead, ask them to “think step-by-step” and then output a concise answer at the end, as in the sketch after this list.
  • Plain text is preferred to formatted text. For example, if you want the model to edit your manuscript, provide the raw LaTeX source instead of the compiled PDF.
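To illustrate the step-by-step tip, the sketch below asks the model to reason first and finish with a one-line answer. The system prompt wording is only an example, not a format required by SCRP:

from openai import OpenAI

client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

# Ask for visible step-by-step reasoning followed by a concise final answer.
response = client.chat.completions.create(
    model="llama3.1:70b-instruct-q4_K_M",
    messages=[
        {"role": "system",
         "content": "Think step-by-step. After your reasoning, give the "
                    "final answer on one line starting with 'Answer:'."},
        {"role": "user",
         "content": "A trip takes 90 minutes at 80 km/h. How long is the route?"},
    ],
)
print(response.choices[0].message.content)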

Text Generation Settings

Below is a list of common settings for text generation. For web access, these settings appear when you click the control panel icon next to your avatar in the top-right corner.

  • Temperature: A lower temperature sharpens the distribution from which the model samples the next word. Setting the temperature to 0 makes the model’s output deterministic. Exposed as temperature in the OpenAI API.
  • Top K: Only sample from the 𝑘 most likely words. Not available in the OpenAI API, but it can be set through the Ollama endpoint (see the sketch after this list).
  • Top P: Only sample from the smallest set of words whose cumulative probability reaches 𝑝. Exposed as top_p in the OpenAI API.
  • Frequency Penalty: A higher penalty lowers the probability of sampling words that have already been sampled. Exposed as frequency_penalty in the OpenAI API.
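Since Top K is not exposed through the OpenAI API, one way to set it is through the Ollama-style endpoint used in the HTTP example further below. This is a minimal sketch using the requests library; the options field follows Ollama’s convention, and it is an assumption here that SCRP passes it through unchanged:

import requests

# Sketch: set top_k via the Ollama-style endpoint. The "options" field
# follows Ollama's convention; whether SCRP forwards it unchanged is an
# assumption.
response = requests.post(
    'https://scrp-chat.econ.cuhk.edu.hk/ollama/api/generate',
    headers={'Authorization': 'Bearer your_api_key_here'},
    json={
        'model': 'llama3.1:70b-instruct-q4_K_M',
        'prompt': 'Where is Hong Kong?',
        'stream': False,
        'options': {'top_k': 40},  # only sample from the 40 most likely words
    },
)
print(response.json()['response'])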

Examples

Below is an example of using the OpenAI Python library to access Llama3.1 70B on SCRP, with temperature set to 1 and top-p set to 0.5:

from openai import OpenAI

# Point the client at SCRP's OpenAI-compatible endpoint.
client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

# Request a chat completion from Llama3.1 70B with custom sampling settings.
response = client.chat.completions.create(
    model="llama3.1:70b-instruct-q4_K_M",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Where is Hong Kong?"},
    ],
    temperature=1,
    top_p=0.5,
)
print(response.choices[0].message.content)

You can also interact with the models through raw HTTP requests, for example with curl:

curl https://scrp-chat.econ.cuhk.edu.hk/ollama/api/generate \
  -H "Authorization: Bearer your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{ "model": "llama3.1:70b-instruct-q4_K_M",
        "prompt": "Where is Hong Kong?",
        "stream": false }'