Deploying Unsloth Fine-Tunes to Cactus for Phones

Cactus is an inference engine for mobile devices, Macs, and ARM-based boards such as the Raspberry Pi.
At INT8, Cactus runs Qwen3-0.6B and LFM2-1.2B at 60-70 tokens/sec on an iPhone 17 Pro and 13-18 tokens/sec on a budget Pixel 6a.
INT4 quantization reduces memory use by roughly 50% with minimal quality loss.
Task-specific INT8 tunes of Gemma3-270m reach 150 tokens/sec on an iPhone 17 Pro and 23 tokens/sec on a Raspberry Pi.
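
As a rough sanity check on these figures, weight memory scales with bits per weight, and decode time scales inversely with tokens per second. The sketch below assumes a nominal 0.6 billion parameters for Qwen3-0.6B, assumes the ~50% saving is measured against INT8, and counts weights only (no quantization scales, activations, or KV cache). The helper names are illustrative and not part of Cactus.

```python
def weight_memory_mb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to decode num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_sec


params = 0.6e9  # assumption: nominal parameter count for a "0.6B" model
int8_mb = weight_memory_mb(params, 8)
int4_mb = weight_memory_mb(params, 4)
print(f"INT8: {int8_mb:.0f} MB, INT4: {int4_mb:.0f} MB")

# Decode time for a 200-token reply at the throughput ranges quoted above
quoted_ranges = {"iPhone 17 Pro": (60, 70), "Pixel 6a": (13, 18)}
for device, (low, high) in quoted_ranges.items():
    fastest = generation_seconds(200, high)
    slowest = generation_seconds(200, low)
    print(f"{device}: {fastest:.1f}-{slowest:.1f} s")
```

Real on-device memory will be higher than the weight-only estimate, so treat these numbers as a lower bound when sizing a model for a target phone.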

Quick Start

1. Train (Google Colab / GPU)

Use the provided notebook or your own Unsloth training script:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",
)

# ... train with SFTTrainer ...

# Save adapter
model.save_pretrained("my-lora-adapter")
tokenizer.save_pretrained("my-lora-adapter")

# Push to Hub (optional)
model.push_to_hub("username/my-lora-adapter")
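
To get a feel for what r=16 and lora_alpha=16 mean in the configuration above, the following sketch computes how many trainable parameters LoRA adds to a single linear layer and the scaling applied to the low-rank update (lora_alpha / r in standard LoRA). The layer shape is hypothetical and chosen only for illustration; the function names are not part of Unsloth or PEFT.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    LoRA learns A (r x d_in) and B (d_out x r), so the count is r * (d_in + d_out).
    """
    return r * (d_in + d_out)


def lora_scaling(lora_alpha: int, r: int) -> float:
    """Standard LoRA scales the low-rank update by lora_alpha / r."""
    return lora_alpha / r


# Hypothetical layer shape; read the real sizes from the model config.
d_in, d_out = 1024, 1024
added = lora_params(d_in, d_out, r=16)
full = d_in * d_out
print(f"LoRA adds {added} parameters vs {full} in the full layer")
print(f"update scaling: {lora_scaling(lora_alpha=16, r=16)}")
```

With lora_alpha equal to r, the scaling is 1, so the learned update is applied unscaled.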

2. Set Up Cactus

git clone https://github.com/cactus-compute/cactus && cd cactus && source ./setup

3. Convert for Cactus

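# From a local adapter. The first argument must be the base model the adapter was trained on;
# Qwen/Qwen3-0.6B is only an example here, so substitute your own base model.
cactus convert Qwen/Qwen3-0.6B ./my-qwen3-0.6b --lora ./my-lora-adapter

# From the Hugging Face Hub. The same rule applies: pass the adapter's base model.
cactus convert Qwen/Qwen3-0.6B ./my-qwen3-0.6b --lora username/my-lora-adapter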
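
One way to check which base model an adapter expects before converting is to read the adapter_config.json file that PEFT writes next to the adapter weights. This is a minimal sketch and not part of the Cactus CLI; read_adapter_info is a hypothetical helper. Unsloth may record its own mirror of the base model (for example a pre-quantized variant), so compare the model family rather than the exact string.

```python
import json
from pathlib import Path


def read_adapter_info(adapter_dir: str) -> dict:
    """Return the adapter_config.json fields most relevant to conversion."""
    config_path = Path(adapter_dir) / "adapter_config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return {
        "base_model": config.get("base_model_name_or_path"),
        "r": config.get("r"),
        "lora_alpha": config.get("lora_alpha"),
        "target_modules": config.get("target_modules"),
    }


if __name__ == "__main__":
    adapter_dir = "my-lora-adapter"  # directory written by model.save_pretrained(...)
    if Path(adapter_dir, "adapter_config.json").exists():
        info = read_adapter_info(adapter_dir)
        print(f"Adapter was trained on: {info['base_model']}")
        print(f"rank={info['r']}, alpha={info['lora_alpha']}")
    else:
        print(f"No adapter_config.json found in {adapter_dir}")
```

If the printed base model belongs to a different family than the one passed to cactus convert, the adapter weights will not line up with the base weights.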

4. Run

Test your model on a Mac:

cactus run ./my-qwen3-0.6b

5. Use in iOS/macOS App

Build the native library:

cactus build --apple

Build complete!
Total time: 58 seconds
Static libraries:
  Device: /Users/henry/Desktop/cactus/apple/libcactus-device.a
  Simulator: /Users/henry/Desktop/cactus/apple/libcactus-simulator.a
XCFrameworks:
  iOS: /Users/henry/Desktop/cactus/apple/cactus-ios.xcframework
  macOS: /Users/henry/Desktop/cactus/apple/cactus-macos.xcframework
Apple build complete!

Link cactus-ios.xcframework to your Xcode project, then:

import Foundation

// Load model from app bundle
let modelPath = Bundle.main.path(forResource: "my-model", ofType: nil)!
let model = cactus_init(modelPath, nil)

// Run completion
let messages = "[{\"role\":\"user\",\"content\":\"Hello!\"}]"
var response = [CChar](repeating: 0, count: 4096)
cactus_complete(model, messages, &response, response.count, nil, nil, nil, nil)
print(String(cString: response))

// Cleanup
cactus_destroy(model)

You can now build iOS apps using the code above. To measure performance on a real device while testing, plug an iPhone into your Mac and run the Cactus tests:

cactus test --<model-path-or-name> --ios

Cactus demo apps will eventually support your custom fine-tunes. In addition, cactus run will allow plugging in a phone so that the interactive session uses the phone's chip, letting you test on the device before fully building out your app.

6. Use in Android App