Skip to content

Deploying Unsloth Fine-Tunes to Cactus for Phones

  • Cactus is an inference engine for mobile devices, macs and ARM chips like Raspberry Pi.
  • At INT8, Cactus runs Qwen3-0.6B and LFM2-1.2B at 60-70 toks/sec on iPhone 17 Pro, 13-18 toks/sec on budget Pixel 6a.
  • INT4 quantization provides ~50% memory reduction with minimal quality loss.
  • Task-Specific INT8 tunes of Gemma3-270m hit 150 toks/sec on iPhone 17 Pro and 23 toks/sec on Raspberry Pi.

Quick Start

1. Train (Google Colab / GPU)

Use the provided notebook or your own Unsloth training script:

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",
)

# ... train with SFTTrainer ...

# Save adapter
model.save_pretrained("my-lora-adapter")
tokenizer.save_pretrained("my-lora-adapter")

# Push to Hub (optional)
model.push_to_hub("username/my-lora-adapter")

2. Setup Cactus

git clone https://github.com/cactus-compute/cactus && cd cactus && source ./setup
Logo

3. Convert for Cactus

# From local adapter: Use the correct base model!
cactus convert Qwen/Qwen3-0.6B ./my-qwen3-0.6b --lora ./my-lora-adapter 

# From HuggingFace Hub: Use the correct base model!
cactus convert Qwen/Qwen3-0.6B ./my-qwen3-0.6b  --lora username/my-lora-adapter 
Logo

4. Run

Test your model on Mac:

cactus run ./my-qwen3-0.6b
Logo

5. Use in iOS/macOS App

Build the native library:

cactus build --apple
Build complete!
Total time: 58 seconds
Static libraries:
  Device: /Users/henry/Desktop/cactus/apple/libcactus-device.a
  Simulator: /Users/henry/Desktop/cactus/apple/libcactus-simulator.a
XCFrameworks:
  iOS: /Users/henry/Desktop/cactus/apple/cactus-ios.xcframework
  macOS: /Users/henry/Desktop/cactus/apple/cactus-macos.xcframework
Apple build complete!
(venv) henry@Henrys-MacBook-Air cactus % 

Link cactus-ios.xcframework to your Xcode project, then:

import Foundation

// Load model from app bundle
let modelPath = Bundle.main.path(forResource: "my-model", ofType: nil)!
let model = cactus_init(modelPath, nil)

// Run completion
let messages = "[{\"role\":\"user\",\"content\":\"Hello!\"}]"
var response = [CChar](repeating: 0, count: 4096)
cactus_complete(model, messages, &response, response.count, nil, nil, nil, nil)
print(String(cString: response))

// Cleanup
cactus_destroy(model)
Logo

You can now build iOS apps using the following code, but to see performance on any device while testing, run cactus tests by plugging any iphone to your Mac then running:

cactus test --<model-path-or-name> --ios 

Cactus demo apps will eventually expand to using your custom fine-tunes. Also, cactus run will allow plugging in a phone, such that the interactive session use the phone chips, this way you can test before fully building out your apps.

6. Use in Android App

Build the native library:

cactus build --android
Build complete!
Shared library location: /Users/henry/Desktop/cactus/android/libcactus.so
Static library location: /Users/henry/Desktop/cactus/android/libcactus.a
Android build complete!
(venv) henry@Henrys-MacBook-Air cactus % 

Copy libcactus.so to app/src/main/jniLibs/arm64-v8a/, then:

class CactusWrapper {
    init { System.loadLibrary("cactus") }

    external fun init(modelPath: String, contextSize: Long, corpusDir: String?): Long
    external fun complete(model: Long, messagesJson: String, bufferSize: Int): String
    external fun destroy(model: Long)
}

// Usage
val cactus = CactusWrapper()
val model = cactus.init("/data/local/tmp/my-model", 2048, null)
val response = cactus.complete(model, """[{"role":"user","content":"Hello!"}]""", 4096)
cactus.destroy(model)

You can now build ANdroid apps using the following code, but to see performance on any device while testing, run cactus tests by plugging any android phone to your Mac then running:

cactus test --<model-path-or-name> --android 

Cactus demo apps will eventually expand to using your custom fine-tunes. Also, cactus run will allow plugging in a phone, such that the interactive session use the phone chips, this way you can test before fully building out your apps.

Resources

  • Supported Base Models: Qwen3, Gemma3, LFM2, SmolLM2
  • Full API reference: Cactus Engine
  • Learn more and report bugs: Cactus

See Also