Deploying Unsloth Fine-Tunes to Cactus for Phones¶
- Cactus is an inference engine for mobile devices, Macs, and ARM boards such as the Raspberry Pi.
- At INT8, Cactus runs Qwen3-0.6B and LFM2-1.2B at 60-70 toks/sec on iPhone 17 Pro, and 13-18 toks/sec on a budget Pixel 6a.
- INT4 quantization provides ~50% memory reduction with minimal quality loss.
- Task-specific INT8 tunes of Gemma3-270m hit 150 toks/sec on iPhone 17 Pro and 23 toks/sec on Raspberry Pi.
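The ~50% memory-reduction figure follows directly from weight-size arithmetic. A back-of-envelope sketch (weights only; it ignores KV cache, activations, and runtime overhead):

```python
# Approximate on-device weight footprint at different quantization widths.
# Weights only -- KV cache, activations, and runtime overhead are ignored.
def weight_gib(n_params: float, bits: int) -> float:
    return n_params * bits / 8 / 2**30

params = 0.6e9  # e.g. Qwen3-0.6B
int8_gib = weight_gib(params, 8)
int4_gib = weight_gib(params, 4)
print(f"INT8 ~{int8_gib:.2f} GiB, INT4 ~{int4_gib:.2f} GiB")
```

Halving the bits per weight halves the weight footprint, which is where the ~50% comes from.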
Quick Start¶
1. Train (Google Colab / GPU)¶
Use the provided notebook or your own Unsloth training script:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-4b-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",
)

# ... train with SFTTrainer ...

# Save adapter
model.save_pretrained("my-lora-adapter")
tokenizer.save_pretrained("my-lora-adapter")

# Push to Hub (optional)
model.push_to_hub("username/my-lora-adapter")
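The `# ... train with SFTTrainer ...` step expects the dataset in chat-message format. A minimal sketch of that mapping (the `instruction`/`response` field names are illustrative assumptions, not a fixed schema):

```python
# Map a generic instruction-tuning row into the chat-message format
# that chat templates and conversational SFT expect.
# The "instruction" and "response" field names are illustrative only.
def to_messages(example: dict) -> dict:
    return {
        "messages": [
            {"role": "user", "content": example["instruction"]},
            {"role": "assistant", "content": example["response"]},
        ]
    }

row = {"instruction": "Say hi.", "response": "Hi!"}
formatted = to_messages(row)
```

With a Hugging Face dataset this would typically be applied via `dataset.map(to_messages)` before handing the result to the trainer.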
2. Setup Cactus¶
3. Convert for Cactus¶
# From local adapter: Use the correct base model!
cactus convert Qwen/Qwen3-0.6B ./my-qwen3-0.6b --lora ./my-lora-adapter
# From HuggingFace Hub: Use the correct base model!
cactus convert Qwen/Qwen3-0.6B ./my-qwen3-0.6b --lora username/my-lora-adapter
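The "Use the correct base model!" warning matters: converting against a base model other than the one the adapter was trained on produces broken weights. One sanity check is to read `base_model_name_or_path` from the adapter's `adapter_config.json` (standard PEFT metadata); this sketch fakes an adapter directory for demonstration:

```python
import json
import tempfile
from pathlib import Path

def adapter_base_model(adapter_dir: str) -> str:
    """Read base_model_name_or_path from a PEFT adapter_config.json."""
    cfg = json.loads((Path(adapter_dir) / "adapter_config.json").read_text())
    return cfg["base_model_name_or_path"]

# Demo with a synthetic adapter directory; a real adapter directory
# (saved by model.save_pretrained) already contains this file.
demo_dir = tempfile.mkdtemp()
(Path(demo_dir) / "adapter_config.json").write_text(
    json.dumps({"base_model_name_or_path": "unsloth/gemma-3-4b-it"})
)
base = adapter_base_model(demo_dir)
print(base)
```

Compare the returned name against the base model you pass to `cactus convert` before converting.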
4. Run¶
Test your model on a Mac:
5. Use in iOS/macOS App¶
Build the native library:
Build complete!
Total time: 58 seconds
Static libraries:
Device: /Users/henry/Desktop/cactus/apple/libcactus-device.a
Simulator: /Users/henry/Desktop/cactus/apple/libcactus-simulator.a
XCFrameworks:
iOS: /Users/henry/Desktop/cactus/apple/cactus-ios.xcframework
macOS: /Users/henry/Desktop/cactus/apple/cactus-macos.xcframework
Apple build complete!
Link cactus-ios.xcframework to your Xcode project, then:
import Foundation
// Load model from app bundle
let modelPath = Bundle.main.path(forResource: "my-model", ofType: nil)!
let model = cactus_init(modelPath, nil)
// Run completion
let messages = "[{\"role\":\"user\",\"content\":\"Hello!\"}]"
var response = [CChar](repeating: 0, count: 4096)
cactus_complete(model, messages, &response, response.count, nil, nil, nil, nil)
print(String(cString: response))
// Cleanup
cactus_destroy(model)
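The `messages` argument to `cactus_complete` above is a JSON array of role/content objects. Hand-escaping that string, as in the Swift literal, breaks easily once content contains quotes; building it with a JSON serializer is safer. A sketch in Python (the same shape applies with `JSONEncoder` in Swift or a JSON library in Kotlin):

```python
import json

def build_messages(*turns: tuple) -> str:
    """Serialize (role, content) pairs into the JSON array the engine expects."""
    return json.dumps([{"role": role, "content": content} for role, content in turns])

# Quotes in the content are escaped automatically.
payload = build_messages(("user", 'He said "Hello!"'))
print(payload)
```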
You can now build iOS apps using the code above. To see performance on a real device while testing, run the Cactus tests by plugging an iPhone into your Mac and running:
Cactus demo apps will eventually expand to using your custom fine-tunes. Also, cactus run will allow plugging in a phone so that the interactive session uses the phone's chip, letting you test before fully building out your apps.
6. Use in Android App¶
Build the native library:
Build complete!
Shared library location: /Users/henry/Desktop/cactus/android/libcactus.so
Static library location: /Users/henry/Desktop/cactus/android/libcactus.a
Android build complete!
Copy libcactus.so to app/src/main/jniLibs/arm64-v8a/, then:
class CactusWrapper {
    init { System.loadLibrary("cactus") }

    external fun init(modelPath: String, contextSize: Long, corpusDir: String?): Long
    external fun complete(model: Long, messagesJson: String, bufferSize: Int): String
    external fun destroy(model: Long)
}
// Usage
val cactus = CactusWrapper()
val model = cactus.init("/data/local/tmp/my-model", 2048, null)
val response = cactus.complete(model, """[{"role":"user","content":"Hello!"}]""", 4096)
cactus.destroy(model)
You can now build Android apps using the code above. To see performance on a real device while testing, run the Cactus tests by plugging an Android phone into your Mac and running:
Cactus demo apps will eventually expand to using your custom fine-tunes. Also, cactus run will allow plugging in a phone so that the interactive session uses the phone's chip, letting you test before fully building out your apps.
Resources¶
- Supported Base Models: Qwen3, Gemma3, LFM2, SmolLM2
- Full API reference: Cactus Engine
- Learn more and report bugs: Cactus
See Also¶
- Cactus Engine API — Full C API reference for inference, streaming, and tool calling
- Runtime Compatibility — Ensure your weights match your Cactus runtime version
- Python SDK — Use fine-tuned models from Python
- Swift SDK — Deploy fine-tuned models in iOS/macOS apps
- Kotlin/Android SDK — Deploy fine-tuned models in Android apps