Build Your Own Cuely-Style AI Assistant on Windows
Cuely is an impressive AI assistant for macOS: you summon it with a simple keyboard shortcut, it reads your screen content, and it provides contextual assistance. But what if you're on Windows? In this guide, I'll show you how I built a similar AI assistant for Windows using Electron, Tesseract.js for OCR, and the GPT-4o API.
What We're Building
Our Windows AI assistant will have these key features:
- Global hotkey activation: Press Ctrl+\ anywhere to bring up the assistant
- Screen OCR capability: Read and understand text on your screen
- GPT-4o integration: Powered by OpenAI's multimodal GPT-4o model for intelligent responses
- Quick dismissal: Press Esc or click outside to close instantly
- Minimal UI: A clean interface that doesn't get in the way
- System tray integration: Runs quietly in the background
Prerequisites
Before we begin, make sure you have:
- Windows 10 or 11: The operating system we're targeting
- Node.js: Download from nodejs.org (LTS version recommended)
- OpenAI API key: Sign up at platform.openai.com and create an API key
- Basic JavaScript knowledge: Understanding of async/await and promises helps
- Text editor: VS Code, Sublime Text, or any editor you prefer
Technology Stack
Here's what powers our AI assistant:
- Electron: Framework for building cross-platform desktop apps with web technologies
- Tesseract.js: JavaScript OCR library for reading screen text
- OpenAI API: GPT-4o model for natural language understanding and generation
- Electron's globalShortcut module: For registering system-wide keyboard shortcuts (built into Electron, no extra package needed)
- screenshot-desktop: Capturing screen content programmatically
Step 1: Project Setup
Create a new directory and initialize the project:
mkdir windows-ai-assistant
cd windows-ai-assistant
npm init -y
Install required dependencies:
npm install electron tesseract.js openai screenshot-desktop electron-store
Step 2: Basic Electron Structure
Create the main process file (main.js) that handles:
- Creating the application window
- Registering global shortcuts
- Managing system tray icon
- Handling window visibility
The window should be frameless, always on top, and positioned near the cursor when activated.
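As a minimal sketch of that main process (assuming an index.html renderer, an optional preload.js, and an assets/icon.ico tray icon exist in your project):

```javascript
// main.js — minimal sketch; file names and window dimensions are illustrative
const { app, BrowserWindow, Tray, Menu, globalShortcut, screen } = require('electron');

let mainWindow;
let tray;

function createWindow() {
  mainWindow = new BrowserWindow({
    width: 480,
    height: 320,
    frame: false,        // frameless
    alwaysOnTop: true,
    show: false,         // start hidden; the hotkey reveals it
    skipTaskbar: true,
    webPreferences: { preload: `${__dirname}/preload.js` },
  });
  mainWindow.loadFile('index.html');
}

function showNearCursor() {
  // Position the window near the mouse pointer before showing it
  const { x, y } = screen.getCursorScreenPoint();
  mainWindow.setPosition(x, y);
  mainWindow.show();
  mainWindow.focus();
}

app.whenReady().then(() => {
  createWindow();
  tray = new Tray('assets/icon.ico');
  tray.setContextMenu(Menu.buildFromTemplate([
    { label: 'Show', click: showNearCursor },
    { label: 'Quit', click: () => app.quit() },
  ]));
  globalShortcut.register('CommandOrControl+\\', showNearCursor);
});

app.on('will-quit', () => globalShortcut.unregisterAll());
```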
Step 3: Configure OpenAI API
Store your API key securely using electron-store or environment variables. Never hardcode API keys in your source code!
Create a settings panel where users can input their API key on first run. The key should be encrypted and stored locally.
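One way to sketch this with electron-store: its built-in `encryptionKey` option obfuscates the config file at rest. Note this deters casual snooping but is not true secrecy, since the key ships with the app. The helper names here are illustrative:

```javascript
const Store = require('electron-store');

// encryptionKey obfuscates the config file on disk; it is not strong
// security because the key is bundled with the application itself
const store = new Store({ encryptionKey: 'replace-with-app-specific-key' });

function saveApiKey(key) {
  store.set('openaiApiKey', key);
}

function getApiKey() {
  return store.get('openaiApiKey'); // undefined until first-run setup
}
```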
Step 4: Implement Screen OCR
The OCR functionality involves:
- Capture a screenshot: Use screenshot-desktop to grab the current screen
- Process with Tesseract: Extract text from the image
- Send to GPT-4o: Include the extracted text as context
- Display the response: Show the AI's answer in the interface
Tesseract.js can take a few seconds to process images, so include a loading indicator to show progress.
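The four steps above can be sketched as a single async function (assuming both packages are installed; `readScreen` is an illustrative name):

```javascript
const screenshot = require('screenshot-desktop');
const Tesseract = require('tesseract.js');

async function readScreen() {
  // 1. Grab the primary display as a PNG buffer
  const imageBuffer = await screenshot({ format: 'png' });

  // 2. Run OCR over the image (English language data)
  const { data } = await Tesseract.recognize(imageBuffer, 'eng');

  // 3. The extracted text is what gets sent to GPT-4o as context
  return data.text;
}
```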
Step 5: Create the User Interface
Build a minimal, clean interface with:
- Input field: Where users type their questions
- Screen read button: Triggers OCR to read the current screen
- Response area: Displays the AI's answers with markdown support
- Settings button: Access to API key configuration
- Loading animation: Shown while processing requests
Use CSS to create a modern, semi-transparent window with rounded corners and smooth animations.
Step 6: Hotkey Implementation
Register the global shortcut in the main process:
globalShortcut.register('CommandOrControl+\\', () => {
  mainWindow.show();
  mainWindow.focus();
});
Also implement:
- Esc key: Hide the window
- Click outside: Dismiss the assistant
- Blur event: Auto-hide when focus is lost
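A sketch of those dismissal behaviors; the `hide-window` channel name and the `electronAPI` preload bridge are arbitrary choices for illustration:

```javascript
// Main process: hiding on blur also covers clicks outside the window
const { ipcMain } = require('electron');

mainWindow.on('blur', () => mainWindow.hide());
ipcMain.on('hide-window', () => mainWindow.hide());

// Renderer (via a preload script that exposes ipcRenderer.send):
// window.addEventListener('keydown', (e) => {
//   if (e.key === 'Escape') window.electronAPI.hideWindow();
// });
```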
Step 7: Integrate GPT-4o
Create an API wrapper that:
- Sends user queries to OpenAI's GPT-4o endpoint
- Includes screen text as context when OCR is used
- Handles rate limiting and errors gracefully
- Streams responses for better UX (show text as it generates)
- Maintains conversation history for context
Example API call structure:
const messages = [
  { role: "system", content: "You are a helpful AI assistant." }
];

// Provide screen context before the question so the model reads it first
if (screenText) {
  messages.push({ role: "user", content: `Screen content:\n${screenText}` });
}

messages.push({ role: "user", content: userQuestion });
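A streaming call with the official openai Node SDK (v4) might look like the following; the `ask` function name and `onToken` callback are illustrative:

```javascript
const OpenAI = require('openai');

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function ask(messages, onToken) {
  const stream = await client.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true, // tokens arrive as they are generated
  });

  let full = '';
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? '';
    full += token;
    onToken(token); // render incrementally for better perceived latency
  }
  return full;
}
```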
Step 8: Add Advanced Features
Conversation History
Store recent conversations so the AI can reference previous exchanges. Clear history when the window is closed or after a timeout.
Quick Actions
Implement shortcuts for common tasks:
- "Explain this" Quick explanation of selected screen text
- "Summarize" Condense long screen content
- "Translate" Convert text to another language
- "Fix grammar" Correct writing mistakes
Custom Prompts
Allow users to create custom prompt templates they use frequently.
Clipboard Integration
Option to automatically copy AI responses to clipboard for easy pasting.
Step 9: Optimize Performance
OCR Optimization
- Only process the relevant portion of the screen, not the entire display
- Cache Tesseract worker to avoid initialization on every use
- Compress screenshots before processing
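Worker caching can be sketched like this (using the tesseract.js v5 `createWorker` API, where the worker loads its language data once and is then reused):

```javascript
const Tesseract = require('tesseract.js');

let workerPromise = null;

// Create the worker once; subsequent calls reuse the same instance
function getWorker() {
  if (!workerPromise) {
    workerPromise = Tesseract.createWorker('eng'); // loads language data once
  }
  return workerPromise;
}

async function ocr(imageBuffer) {
  const worker = await getWorker();
  const { data } = await worker.recognize(imageBuffer);
  return data.text;
}
```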
API Efficiency
- Implement request queuing to avoid simultaneous API calls
- Add response caching for identical queries
- Use shorter context windows when appropriate
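Response caching for identical queries needs little more than a Map with a time-to-live; a minimal sketch (the factory name and default TTL are arbitrary):

```javascript
// Simple in-memory cache; entries older than ttlMs are treated as expired
function createResponseCache(ttlMs = 5 * 60 * 1000) {
  const cache = new Map();
  return {
    get(key) {
      const entry = cache.get(key);
      if (!entry) return undefined;
      if (Date.now() - entry.time > ttlMs) {
        cache.delete(key); // evict stale entry
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      cache.set(key, { value, time: Date.now() });
    },
  };
}
```

Check the cache before calling the API, and store the response after a successful call.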
Resource Management
- Hide window instead of closing to keep it ready
- Lazy load heavy components
- Release resources when window is hidden
Step 10: Package and Distribute
Use electron-builder to create an installer:
npm install --save-dev electron-builder
Configure in package.json:
"build": {
"appId": "com.yourname.ai-assistant",
"productName": "AI Assistant",
"win": {
"target": "nsis",
"icon": "assets/icon.ico"
}
}
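Note that `npm run build` assumes a build script is defined; one way to wire it up in package.json:

```json
"scripts": {
  "start": "electron .",
  "build": "electron-builder"
}
```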
Build the installer:
npm run build
Usage Tips
Screen Reading Best Practices
- Works best with clear, high-contrast text
- May struggle with handwritten text or unusual fonts
- Give Tesseract 2-3 seconds to process accurately
Effective Prompting
- Be specific about what you want from screen content
- Use context: "Based on the code shown, suggest improvements"
- Combine screen reading with specific questions
API Cost Management
- GPT-4o is powerful but costs more than GPT-3.5
- Set token limits to control costs
- Consider using GPT-3.5 for simple queries
- Monitor usage through OpenAI dashboard
Troubleshooting Common Issues
Hotkey Not Working
- Check if another app is using the same shortcut
- Try running the app with administrator privileges
- Try a different key combination
OCR Producing Gibberish
- Ensure screenshot is captured correctly
- Check image quality (resolution, clarity)
- Try preprocessing image (contrast adjustment, noise reduction)
Slow Response Times
- Check internet connection
- Reduce context size sent to API
- Use streaming responses to show progress
Privacy and Security Considerations
- Screen content: OCR reads everything on screen, including sensitive information
- API transmission: Screen text is sent to OpenAI's servers
- Local storage: Conversation history is stored on your machine
- API key security: Keep your key encrypted and never share it
Consider adding:
- Option to exclude certain apps from screen reading
- Clear button to delete conversation history
- Local-only mode without cloud AI (using local models)
- Encryption for stored conversations
Future Enhancements
Ideas to expand functionality:
- Multi-monitor support: Choose which screen to read
- Selection mode: Draw a box around a specific screen area to analyze
- Voice input: Speak your questions instead of typing
- Plugin system: Extend with custom actions and integrations
- Cross-platform: Make it work on macOS and Linux too
- Local LLM support: Use Ollama or similar for offline AI
- Image understanding: Analyze charts, diagrams, and UI elements visually
Learning Resources
To dive deeper:
- Electron documentation: electronjs.org/docs
- Tesseract.js guide: github.com/naptha/tesseract.js
- OpenAI API reference: platform.openai.com/docs
- Electron sample apps: github.com/electron/electron-quick-start
Conclusion
Building an AI assistant like Cuely for Windows is surprisingly achievable with modern web technologies. Electron makes desktop development accessible to web developers, Tesseract.js brings powerful OCR capabilities, and GPT-4o provides the intelligence.
This project taught me about:
- Desktop application architecture
- Working with system-level APIs (screenshots, global shortcuts)
- Integrating AI models into real applications
- Optimizing for performance and user experience
- Handling sensitive data responsibly
The best part? Once you build it, you have a personalized AI assistant that works exactly how you want it to. You can customize the UI, add features specific to your workflow, and extend it indefinitely.
Start building, experiment, and create your own AI-powered productivity tool. The code is yours; make it perfect for your needs.