SCRP provides access to locally hosted, state-of-the-art generative AI models. Information passed to the models never leaves the cluster, making it suitable for processing proprietary data.

Web Access

  1. Log in to https://scrp-chat.econ.cuhk.edu.hk with your SCRP credentials.
  2. If necessary, choose a model at the top. The default model is Llama 3.1 70B.
  3. Files can be uploaded by clicking the + button on the left of the dialog box at the bottom. The models can parse many types of files, including plain text, Excel spreadsheets, CSVs and PDFs.

API Access

SCRP provides an OpenAI-compatible API. To access the API,

  1. Log in to https://scrp-chat.econ.cuhk.edu.hk with your SCRP credentials.
  2. Click on the user avatar in the top-right corner, then click Settings.
  3. In Account, click Show next to ‘API Keys’.
  4. Click + Create new secret key.
  5. Copy the created API key and provide it along with the full model name when accessing the API through HTTP requests or the OpenAI Python library. See the examples below.
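
To check that your key works, you can list the models visible to your account. Below is a minimal sketch using the OpenAI Python library; it assumes the server exposes the standard OpenAI-compatible model-listing endpoint under the same base URL used in the examples later in this page:

from openai import OpenAI

# Point the client at SCRP's OpenAI-compatible endpoint.
client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

# List the models visible to your account (standard OpenAI-compatible call).
for model in client.models.list():
    print(model.id)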

Available Models

SCRP currently provides access to the following models:

Model                     Max. Throughput   Context Length   Quantization   Full Model Name
Llama 3.1 70B (default)   60 token/s        95K              AWQ INT4       hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4
Llama 3.1 405B            5 token/s         128K             Q4_K_M         llama3.1:405b-instruct-q4_K_M
Llama 3.2 1B              200 token/s       95K              FP16           meta-llama/Llama-3.2-1B-Instruct
Llama 3.2 11B Vision      50 token/s        300              FP8            neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic

Larger models are more capable but run slower. The difference is particularly noticeable in fact recall, mathematics and logical reasoning. A comparison of the models’ performance can be found here.

Tips for Getting the Best Results

  • Models can hallucinate. During text generation, a Large Language Model (LLM) probabilistically chooses the most likely next word, so it can generate convincing-looking but completely fabricated content.
  • When formulating your prompt, be specific about what you are asking for. For example, if you believe the answer is uncertain, tell the model so; otherwise it will likely provide a wrong answer with confidence.
  • Asking the models for concise output will reduce output quality. The models do not have the ability to think privately, so asking them for concise output is equivalent to asking them not to think in detail. Instead, ask them to “think step-by-step”, then output a concise answer at the end (see the example after this list).
  • Pure text is preferred to formatted text. For example, if you were to provide the model with your manuscript for editing, provide the raw LaTeX instead of the generated PDF file.
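
As an illustration of the “think step-by-step” tip above, the sketch below asks the model to reason first and finish with a one-line answer. The instruction wording is only a suggestion, not a required format:

from openai import OpenAI

client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

# Ask for step-by-step reasoning, then a concise final answer on one line.
response = client.chat.completions.create(
    model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    messages=[
        {"role": "system",
         "content": "Think step-by-step, then end with a single line "
                    "starting with 'Answer:' that gives a concise answer."},
        {"role": "user", "content": "Is 1019 a prime number?"},
    ],
)
print(response.choices[0].message.content)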

Text Generation Settings

Below is a list of common settings for text generation. For web access, the settings are shown when you click the control panel icon next to your avatar in the top-right corner.

  • Temperature: a lower temperature sharpens the distribution from which the model samples the next word. Setting the temperature to 0 makes the model’s output deterministic. Exposed as temperature in the OpenAI API.
  • Top K: only sample from the k most likely words. Not available in the OpenAI API.
  • Top P: only sample from the smallest set of words whose cumulative probability reaches p. Exposed as top_p in the OpenAI API.
  • Frequency Penalty: a higher penalty lowers the probability of sampling words that have already been generated. Exposed as frequency_penalty in the OpenAI API. An example of passing these settings is shown below.
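
For API access, these settings are passed directly to the chat completion call. Below is a minimal sketch; note that the top_k pass-through via extra_body is an assumption that depends on the backend accepting non-standard parameters, whereas temperature, top_p and frequency_penalty are standard OpenAI parameters:

from openai import OpenAI

client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

response = client.chat.completions.create(
    model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    messages=[{"role": "user", "content": "Name three uses of panel data."}],
    temperature=0.7,        # lower values sharpen the sampling distribution
    top_p=0.9,              # keep only the top 90% cumulative probability mass
    frequency_penalty=0.5,  # discourage repeating words already generated
    # Top K is not part of the OpenAI API; extra_body forwards it to the
    # backend, assuming the server accepts a top_k parameter.
    extra_body={"top_k": 40},
)
print(response.choices[0].message.content)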

Examples

Below is an example of using the OpenAI Python Library to access Llama 3.1 70B on SCRP, with temperature set to 1 and top-p set to 0.5:

from openai import OpenAI

# Point the client at SCRP's OpenAI-compatible endpoint.
client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

# Use the full model name from the Available Models table above.
response = client.chat.completions.create(
    model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Where is Hong Kong?"},
    ],
    temperature=1,
    top_p=0.5,
)
print(response.choices[0].message.content)
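
Long responses can also be streamed token-by-token with the same library, assuming the endpoint supports streaming as the OpenAI API does. A minimal sketch:

from openai import OpenAI

client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

# stream=True yields chunks as the model generates them.
stream = client.chat.completions.create(
    model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    messages=[{"role": "user", "content": "Where is Hong Kong?"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental piece of the reply (may be empty).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()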

You can also interact with the models through an HTTP request:

curl https://scrp-chat.econ.cuhk.edu.hk/ollama/api/generate \
  -H 'Authorization: Bearer your_api_key_here' \
  -H 'Content-Type: application/json' \
  -d '{ "model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
        "prompt": "Where is Hong Kong?",
        "stream": false }'

Below is an example of using the OpenAI Python Library to access Llama 3.2 11B Vision. The image is referenced by its URL in the image_url field. For accurate reading of the content in the image, a low temperature is recommended.

from openai import OpenAI

client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

# Vision models take a list of content parts: text plus one or more images.
response = client.chat.completions.create(
    model="neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this image:"},
            {"type": "image_url",
             "image_url": {"url": "https://www.example.com/my-picture.png"}},
        ]},
    ],
    # A low temperature helps the model read the image content accurately.
    temperature=0.1,
)
print(response.choices[0].message.content)
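
If the image is a local file rather than a URL, it can be embedded as a base64 data URL, assuming the endpoint accepts data URLs in image_url as the OpenAI API does. A sketch, where my-picture.png is a placeholder path:

import base64

from openai import OpenAI

client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

# Read a local image and encode it as a base64 data URL.
with open('my-picture.png', 'rb') as f:
    b64 = base64.b64encode(f.read()).decode('ascii')

response = client.chat.completions.create(
    model="neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this image:"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]},
    ],
    temperature=0.1,
)
print(response.choices[0].message.content)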