SCRP-Chat Generative AI Service
SCRP provides access to locally-hosted, state-of-the-art generative AI models. Information passed to the models never leaves the cluster, making it suitable for processing proprietary data.
If the model you need is not available on SCRP-Chat, you can consider running it through SCRP’s customized Ollama platform.
Web Access
- Log in to https://scrp-chat.econ.cuhk.edu.hk with your SCRP credentials.
- If necessary, choose a model at the top. The default model is Gemma 3 27B.
- Files can be uploaded by clicking the + button on the left of the dialog box at the bottom. The models are capable of parsing many types of files, including plain text, Excel spreadsheets, CSVs and PDFs.
API Access
SCRP provides an OpenAI-compatible API. To access the API,
- Log in to https://scrp-chat.econ.cuhk.edu.hk with your SCRP credentials.
- Click on the user avatar in the top-right corner, then click Settings.
- In Account, click Show next to ‘API Keys’.
- Click + Create new secret key.
- Copy the created API key and provide it along with the model name when accessing the API through HTTP requests or the OpenAI Python library. See the examples below.
- Use the model’s Stable Model Name if you want to ensure your request is handled by a model, even if the specific version of the model you used previously is no longer supported. E.g. text-large will always be handled by our most capable model, but the specific model used might change as new models become available.
- Use the model’s Original Model Name if you want to make sure your request is handled by that specific model. E.g. mistralai/Pixtral-Large-Instruct-2411 will only use that specific model, and will return an error if the model is no longer supported. A short sketch comparing the two naming schemes follows below.
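For instance, here is a minimal sketch using the OpenAI Python library (the API key and prompt are placeholders), first addressing a Stable Model Name and then an Original Model Name taken from the table below:

from openai import OpenAI

client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

# Stable Model Name: always routed to a currently supported model of this class
response = client.chat.completions.create(
    model="text",
    messages=[{"role": "user", "content": "Where is Hong Kong?"}],
)
print(response.choices[0].message.content)

# Original Model Name: pinned to one specific model; returns an error if it is retired
response = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "Where is Hong Kong?"}],
)
print(response.choices[0].message.content)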
Available Models
SCRP currently provides access to the following models:
Model | Max. Throughput | Context Length | Quantization | Stable Model Name | Original Model Name
---|---|---|---|---|---
Vision, Text and Reasoning (T&R: GPT-OSS-120B, V: Qwen2.5-VL-72B) | 155 token/s (V: 21 token/s) | 128K (V: 32K) | Native (V: AWQ) | default | -
Text and Reasoning (GPT-OSS-120B) | 155 token/s | 128K | Native | text / reasoning | openai/gpt-oss-120b
Vision (Qwen2.5-VL-72B) | 21 token/s | 32K | AWQ | vision | Qwen/Qwen2.5-VL-72B-Instruct-AWQ
Reasoning and Text (GLM4.5-355B) Experimental | 50 token/s | 128K | FP8 | - | zai-org/GLM-4.5-FP8
Reasoning and Text (Qwen3-32B) | 60 token/s | 32K | FP8 | - | Qwen/Qwen3-32B-FP8
Reasoning (DeepSeek-R1-Llama-70B) | 33 token/s | 95K | Native | - | deepseek-ai/DeepSeek-R1-Distill-Llama-70B
Larger models are more capable but run slower. The difference is particularly noticeable when it comes to fact recall, mathematics and logical reasoning.
Due to the hidden reasoning process, there is a noticeable delay before the reasoning model provides a final response. You can inspect the reasoning process in real time by clicking the ‘Thinking…’ or ‘Thought for X seconds’ dropdown menu.
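When you call a reasoning model through the API, the reasoning text is returned separately from the final answer. The sketch below is an illustration only; it assumes the reasoning is exposed as a reasoning_content or reasoning attribute on the message, the same attributes probed in the tool-calling example at the end of this page:

from openai import OpenAI

client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

response = client.chat.completions.create(
    model="reasoning",  # Stable Model Name of the reasoning model
    messages=[{"role": "user", "content": "Is 1013 a prime number?"}],
)

message = response.choices[0].message
# The attribute name may vary; try both variants used elsewhere in this guide
reasoning_text = getattr(message, "reasoning_content", None) or getattr(message, "reasoning", None)
print("Reasoning:", reasoning_text)
print("Answer:", message.content)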
Tools
Models on SCRP-Chat can search the web and generate images.
Web Search
You can turn on web search by clicking the ‘+’ button at the bottom of the prompt text box and switching on ‘Web Search’.
Image Generation
You can turn on image generation by clicking the ‘Image’ button at the bottom of the prompt text box.
Tips for Getting the Best Result
Vision
- Set temperature to 0 if the image contains text. This forces the model to output the most likely next word at every step, which is necessary to ensure the model faithfully reproduces the text in the image.
Text Generation
- Models can hallucinate. During text generation, a Large Language Model (LLM) probabilistically chooses the most likely next word, so it can generate convincing-looking but completely fabricated content. Do not trust the model’s output for specific facts, such as article references or the ranking of a particular university.
- When formulating your prompt, be specific about what you are asking for. For example, if you believe the answer is uncertain, tell the model so; otherwise it will likely provide a wrong answer with confidence.
- Asking non-reasoning models for concise output degrades output quality. Non-reasoning models do not have the ability to think privately, so asking them for concise output is equivalent to asking them not to think in detail. Instead, ask them to “think step-by-step”, then output a concise answer at the end (see the sketch after this list). Reasoning models inherently do this by design.
- Pure text is preferred to formatted text. For example, if you were to provide the model with your manuscript for editing, provide the raw LaTeX instead of the generated PDF file.
- Use the reasoning model only if reasoning is crucial to the task. Because a reasoning model’s reasoning process generates a significant number of hidden tokens, it takes much longer than a normal text model to provide a final response. For tasks that require reasoning, such as mathematics, coding and scientific reasoning, the trade-off is often worthwhile. For tasks that do not require a reasoning process, e.g. text formatting, you will get a much faster response from a normal text model.
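As a sketch of the “think step-by-step” tip above (the prompt wording here is only an illustration), using the OpenAI Python library against the text model:

from openai import OpenAI

client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

response = client.chat.completions.create(
    model="text",  # Stable Model Name of the text model
    messages=[
        {"role": "system",
         "content": "Think step-by-step first, then finish with a one-sentence answer prefixed with 'Answer:'."},
        {"role": "user",
         "content": "A project has three phases lasting 2, 5 and 4 weeks. If it starts on 1 March, roughly when does it end?"},
    ],
    temperature=0,
)
print(response.choices[0].message.content)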
Text Generation Settings
Below is a list of common settings for text generation. For web access, the settings are shown when you click the control panel icon next to your avatar in the top-right corner. A short sketch showing how to pass them through the API follows the list.
- Temperature: A lower temperature sharpens the distribution from which the model samples the next word. Setting temperature to 0 makes the model’s output deterministic. Exposed as temperature in the OpenAI API.
- Top K: Only sample from the top 𝑘 most likely words. Not available in the OpenAI API.
- Top P: Only sample from the words within the top cumulative probability 𝑝. Exposed as top_p in the OpenAI API.
- Frequency Penalty: A higher penalty lowers the probability of sampling words that have already been sampled before. Exposed as frequency_penalty in the OpenAI API.
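The sketch below shows how these settings map onto keyword arguments of the OpenAI Python library; the parameter values are arbitrary and only for illustration:

from openai import OpenAI

client = OpenAI(
    base_url='https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

response = client.chat.completions.create(
    model="text",                  # Stable Model Name of the text model
    messages=[{"role": "user", "content": "Suggest five names for a cat."}],
    temperature=0.7,               # lower values sharpen the sampling distribution
    top_p=0.9,                     # sample only from the top cumulative probability mass
    frequency_penalty=0.5,         # discourage words that have already been sampled
)
print(response.choices[0].message.content)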
Examples
Text
Below is an example of using the OpenAI Python Library to access the medium-sized text model on SCRP, with temperature set to 1 and top-p set to 0.5:
from openai import OpenAI

client = OpenAI(
    base_url = 'https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

response = client.chat.completions.create(
    model="text-medium",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Where is Hong Kong?"},
    ],
    temperature=1,
    top_p=0.5
)
print(response.choices[0].message.content)
You can also interact with the models through HTTP requests:
curl https://scrp-chat.econ.cuhk.edu.hk/ollama/api/generate \
  -H "Authorization: Bearer your_api_key_here" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    "prompt": "Where is Hong Kong?",
    "stream": false
  }'
Vision
Below is an example of using the OpenAI Python Library to access the default vision model. The image is referenced by its URL in the image_url field. For accurate reading of the content in the image, a low temperature is recommended.
from openai import OpenAI

client = OpenAI(
    base_url = 'https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key='your_api_key_here',
)

response = client.chat.completions.create(
    model="vision",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this image:"},
            {"type": "image_url", "image_url": {"url": "https://www.example.com/my-picture.png"}}
        ]}
    ],
    temperature=0.1,
)
print(response.choices[0].message.content)
For local images, first load the image, then encode it in base64 format.
from openai import OpenAI
import base64
import cv2

# Settings
api_key = 'your_api_key_here'
image_file_path = 'path_to_image'

# Create interface to model
client = OpenAI(
    base_url = 'https://scrp-chat.econ.cuhk.edu.hk/api',
    api_key = api_key,
)

# Load the image
img = cv2.imread(image_file_path)

# Encode the NumPy array as PNG bytes
_, buffer = cv2.imencode(".png", img)

# Convert the bytes to base64 encoding
base64_bytes = base64.b64encode(buffer).decode("utf-8")

# Send prompt and image to model
response = client.chat.completions.create(
    model="vision",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Describe this image:"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_bytes}"}}
        ]}
    ],
    temperature=0.1,
)
print(response.choices[0].message.content)
Tool Calling
Tool calling, also known as function calling, allows a model to call Python functions you provide as it deems necessary. Note that what the model actually does is return, in its response, the name of the function and the arguments to pass to it. The actual execution of the function has to be done by the user.
To use tool calling, you need the following components:
- Each tool is a Python function.
- A list with one JSON object for each tool, containing the tool's name, description and parameters.
- A loop that repeatedly calls the model and executes the requested function calls, until the model generates a final response.
Currently only GLM 4.5 and Qwen3 support tool calling on SCRP-Chat.
The following is a simple example:
# Tool calling with Qwen3 through the OpenAI-compatible API
# Settings
model="Qwen/Qwen3-32B-FP8"
base_url = 'https://scrp-chat.econ.cuhk.edu.hk/api'
api_key = 'your_api_key_here'
# Tool Python function
import random

def get_weather(location: str, unit: str):
    # Dummy weather function: returns a random temperature in the requested unit
    return str(round(random.uniform(32, 35), 1)) + unit

tool_functions = {"get_weather": get_weather}
# Tool JSON definition. This is how the model knows about the function
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state, e.g., 'San Francisco, CA'"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location", "unit"]
        }
    }
}]
# Conversation history
messages = [{"role": "user", "content": "What's the weather like in San Francisco and Portland in celsius?"}]
# Imports for OpenAI client and output parsing
from openai import OpenAI
import json
# Client
client = OpenAI(
    base_url=base_url,
    api_key=api_key,
)

tool_calls = [1]  # Dummy value so the loop runs at least once

# Call model as long as there is a tool call
while tool_calls != None and tool_calls != []:
    # Call model
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        tools=tools,
        tool_choice="auto",
        temperature=0,
    )

    # Print the model's reasoning, if the server exposes it
    try:
        print(response.choices[0].message.reasoning_content)
    except:
        try:
            print(response.choices[0].message.reasoning)
        except:
            pass
    print('-' * 30)

    # Add the model's message (including any tool calls) to conversation history
    response_dump = response.choices[0].message.model_dump()
    messages.append(response_dump)

    # Process tool calling
    tool_calls = response_dump.get("tool_calls", None)
    print(tool_calls)
    print('-' * 30)
    if tool_calls is not None:
        for tool_call in tool_calls:
            call_id: str = tool_call["id"]
            if fn_call := tool_call.get("function"):
                fn_name: str = fn_call["name"]
                fn_args: dict = json.loads(fn_call["arguments"])
                fn_res: str = json.dumps(tool_functions[fn_name](**fn_args))
                print("fn_res:", fn_res)
                print('-' * 30)
                # Append tool output to conversation history
                messages.append({
                    "role": "tool",
                    "content": fn_res,
                    "tool_call_id": call_id,
                })

print(response.choices[0].message.content)