You point your camera at a plate of pasta. Two seconds later, you've got a calorie count, protein, carbs, and fat. No barcode, no typing, no measuring cups. How is that even possible?
Short answer: vision models got really, really good in the last couple of years, and food is one of the most photographed subjects on the entire internet. So the AI has seen a lot of dinners. Here's what actually happens between the snap and the number.
Step 1: figuring out what's on the plate
When you take a photo, it gets sent to a multimodal AI model, the same family of models that powers stuff like ChatGPT's vision features. These models have been trained on hundreds of millions of food images and can recognize thousands of dishes, from "pad thai" to "sourdough toast with avocado and a fried egg."
The trick is that the model doesn't just classify the whole dish. It breaks it down into components: a burrito bowl becomes rice + black beans + grilled chicken + salsa + guac + cheese, each one estimated separately.
Step 2: how much of each thing is there (this is the hard part)
Recognizing the food is the easy bit. Estimating portion size is where the magic actually lives. The AI uses a few signals:
- Scale references: utensils, plates, hands, packaging
- Density: denser foods (steak, cheese) pack more calories per visible volume than lighter ones (lettuce, broth)
- Depth cues: modern phones capture enough depth info that the model can estimate volume, not just area
For everyday meals, this gets you within roughly ±15% of the actual calorie count, which is honestly about the same accuracy as careful manual logging, because most database entries are themselves estimates.
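To make the density idea concrete, here's a toy sketch of the portion math. The density and calorie figures below are illustrative numbers, not real app values, and a real system would estimate volume from depth data rather than take it as an input.

```python
# Rough energy density (kcal per gram) and mass density (g per ml).
# These numbers are illustrative only.
FOOD_PROFILES = {
    "grilled_chicken": {"kcal_per_g": 1.65, "g_per_ml": 1.05},
    "lettuce":         {"kcal_per_g": 0.15, "g_per_ml": 0.25},
}

def estimate_calories(food: str, volume_ml: float) -> float:
    """Turn an estimated visible volume into a calorie estimate."""
    profile = FOOD_PROFILES[food]
    grams = volume_ml * profile["g_per_ml"]      # volume -> mass
    return grams * profile["kcal_per_g"]         # mass -> energy

# The same visible volume yields wildly different calorie counts,
# which is why density matters as much as size:
chicken = estimate_calories("grilled_chicken", 200)
salad = estimate_calories("lettuce", 200)
print(chicken, salad)
```

Two hundred milliliters of chicken comes out dozens of times more caloric than the same visible volume of lettuce, which is exactly the gap the density signal has to close.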
Step 3: looking up the numbers
Once the AI knows roughly "200g of grilled chicken + 150g of brown rice + 50g of avocado," it pulls per-100g nutrition data for each piece (often from public databases like the USDA's FoodData Central) and adds them up. Boring but important.
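The lookup-and-sum step really is that boring. Here's a minimal sketch of it; the per-100g values are illustrative placeholders, not actual USDA FoodData Central entries.

```python
# Illustrative per-100g nutrition data (not real database entries).
PER_100G = {
    "grilled_chicken": {"kcal": 165, "protein": 31.0, "carbs": 0.0,  "fat": 3.6},
    "brown_rice":      {"kcal": 112, "protein": 2.3,  "carbs": 24.0, "fat": 0.8},
    "avocado":         {"kcal": 160, "protein": 2.0,  "carbs": 9.0,  "fat": 15.0},
}

def total_nutrition(portions: dict) -> dict:
    """Scale each component's per-100g entry by its estimated grams, then sum."""
    totals = {"kcal": 0.0, "protein": 0.0, "carbs": 0.0, "fat": 0.0}
    for food, grams in portions.items():
        for nutrient, per_100g in PER_100G[food].items():
            totals[nutrient] += per_100g * grams / 100
    return totals

# The meal from the example above: 200g chicken + 150g rice + 50g avocado.
meal = {"grilled_chicken": 200, "brown_rice": 150, "avocado": 50}
print(total_nutrition(meal))
```

Each component is just its per-100g row scaled by the estimated grams; the totals are a straight sum across components.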
Where it still gets tripped up
- Hidden ingredients: the butter or oil that's already cooked into a dish is invisible
- Mixed dishes photographed from above: a casserole or stew is way harder to break into components than a clearly plated meal
- Opaque drinks: smoothies and lattes are basically guesses without a label
- Restaurant variability: the same "chicken burrito bowl" can be 600 or 1,200 calories depending on how heavy the cook is with rice and oil that day
The thing that actually closes the gap
The best AI calorie trackers don't try to be perfect on the first guess. They make it stupid-easy to correct the estimate. In Calchi, you can just chat with Mochi: "actually that bowl was huge, like double the size", and it recalculates on the spot. That feedback loop is what makes AI logging more accurate over time than rigid manual entry, where most people give up and stop logging at all.
Snap a photo of your meal. We'll do the math.
Is it accurate enough for weight loss?
For weight loss and general health goals, yes, easily. The energy gap that drives weight change (usually 300-500 calories a day) is way wider than the AI's typical error margin. If you're a competitive bodybuilder cutting for stage, you'll still want a food scale for key meals. For the other 99% of us, photo-based AI logging is the first thing in years that makes calorie tracking actually sustainable.
The TL;DR
AI calorie tracking works because vision models got incredibly good fast, and food is everywhere on the internet. The result is an app you can actually use every day without feeling like it's a chore, which is the only kind of calorie tracking that ever produces results in the long run.