Wan 2.2 Famous Landmarks Test - Does AI Know World Sites? 2025 | Apatero Blog - Open Source AI & Programming Tutorials

How Well Does Wan 2.2 Know Famous Landmarks? A Comprehensive Test

Testing Wan 2.2's knowledge of world-famous landmarks. Does it accurately render the Eiffel Tower, Taj Mahal, and other iconic sites?

I had a hypothesis. If Wan 2.2 can generate realistic humans and dynamic scenes, surely it knows what the Eiffel Tower looks like. Right? So I spent a weekend systematically testing famous landmarks from around the world. The results were fascinating, sometimes impressive, and occasionally hilarious.

Quick Answer: Wan 2.2 handles globally iconic landmarks well (Eiffel Tower, Great Wall, Statue of Liberty) but struggles with less famous sites and often gets architectural details wrong. It knows the vibe but not always the specifics.

Key Takeaways:
  • Top-tier landmarks (Eiffel Tower, Taj Mahal) render accurately 80%+ of the time
  • Second-tier landmarks are recognizable but often have detail errors
  • Lesser-known sites are frequently invented or blended with other architecture
  • Adding location context improves accuracy significantly
  • Motion quality remains excellent regardless of landmark accuracy

The Test Methodology

I approached this systematically. No cherry-picking best results. For each landmark, I generated:

  • 10 text-to-video outputs with the same prompt
  • 5 image-to-video outputs with reference images
  • Various prompt formulations (simple vs detailed)

I then rated outputs on:

  • Accuracy: Does it look like the real landmark?
  • Quality: Is the video technically good?
  • Consistency: Does it stay accurate throughout?
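
The generation plan above can be sketched as a simple job list. This is only an illustration of the test matrix (10 text-to-video and 5 image-to-video runs per landmark); the `Job` class and `build_test_plan` function are hypothetical names, and nothing here calls Wan 2.2 itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    landmark: str
    mode: str  # "t2v" (text-to-video) or "i2v" (image-to-video)
    run: int   # 1-based repeat index

# Repeat counts taken from the methodology: 10 t2v and 5 i2v per landmark.
RUNS_PER_MODE = {"t2v": 10, "i2v": 5}

def build_test_plan(landmarks):
    """Enumerate every generation job for the given landmarks."""
    jobs = []
    for landmark in landmarks:
        for mode, runs in RUNS_PER_MODE.items():
            for run in range(1, runs + 1):
                jobs.append(Job(landmark, mode, run))
    return jobs

plan = build_test_plan(["Eiffel Tower", "Taj Mahal"])
print(len(plan))  # 30 jobs: 15 per landmark
```

Keeping the plan as data makes it easy to confirm that every landmark gets the same number of attempts before any ratings are compared.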

Let's explore the results.

Tier 1: Globally Iconic Landmarks

These are the sites everyone knows. The ones on postcards, in movies, and in travel ads.

Eiffel Tower (Paris, France)

Figure: AI-generated Eiffel Tower at sunset. Wan 2.2 excels at the Eiffel Tower, capturing its distinctive lattice structure accurately.

Accuracy: 9/10 Quality: 9/10 Consistency: 8/10

Wan 2.2 knows the Eiffel Tower. The lattice structure is correct, the proportions are right, and the distinctive shape is unmistakable. Minor issues: the legs were occasionally positioned wrong, and the top antenna was sometimes missing or oddly shaped.

Best prompt that worked:

Cinematic shot of the Eiffel Tower at sunset, Paris cityscape in background,
warm golden lighting, camera slowly panning upward

Taj Mahal (Agra, India)

Figure: AI-generated Taj Mahal with reflecting pool. The Taj Mahal's symmetry and white marble dome render beautifully in Wan 2.2.

Accuracy: 9/10 Quality: 9/10 Consistency: 9/10

Surprisingly excellent. The white marble dome, the four minarets, the reflecting pool. Wan 2.2 captured the symmetry beautifully. The ornamental details weren't always right, but the overall impression was authentic.

Great Wall of China

Accuracy: 8/10 Quality: 9/10 Consistency: 7/10

The wall itself was accurate. The problem was context. Sometimes the wall was in clearly wrong terrain, or watchtowers appeared at odd intervals. But as a video of "the Great Wall," it was convincing.

Statue of Liberty (New York, USA)

Accuracy: 9/10 Quality: 9/10 Consistency: 8/10

The torch, the crown, the robes. All correct. The face was occasionally slightly off, but you'd never mistake it for anything else. Harbor context was excellent.

Pyramids of Giza (Egypt)

Accuracy: 8/10 Quality: 9/10 Consistency: 8/10

The pyramids themselves were fine. The Sphinx occasionally appeared when not prompted. Desert context was appropriate. Main issue: sometimes the pyramids were the wrong relative sizes.

Tier 2: Well-Known but Less Iconic

Colosseum (Rome, Italy)

Accuracy: 7/10 Quality: 8/10 Consistency: 6/10

This is where things got interesting. Wan 2.2 knows it's an ancient Roman arena, oval shaped, with arches. But the specific Colosseum details varied. Sometimes it looked more like a generic Roman amphitheater. The interior was rarely accurate.

Big Ben (London, UK)

Accuracy: 8/10 Quality: 9/10 Consistency: 7/10

The clock tower was generally correct. Issues arose with the clock faces themselves, which were sometimes blank or showed wrong times. The Gothic Revival style was captured well.

Sydney Opera House (Australia)

Accuracy: 7/10 Quality: 8/10 Consistency: 6/10

The distinctive shell roofs were there, but the exact configuration was often wrong. Sometimes extra shells appeared, or they were positioned incorrectly. Harbor context helped a lot.

Burj Khalifa (Dubai, UAE)

Accuracy: 7/10 Quality: 9/10 Consistency: 8/10

Wan 2.2 knew it was a very tall, tapered skyscraper. But the specific silhouette and tier structure weren't always right. Sometimes it looked more like a generic supertall tower.

Christ the Redeemer (Rio de Janeiro, Brazil)

Accuracy: 8/10 Quality: 8/10 Consistency: 7/10

The posed figure with outstretched arms was correct. The face detail and robe folds were sometimes off. Mountain context helped accuracy significantly.

Tier 3: Where Things Get Creative

Sagrada Familia (Barcelona, Spain)

Accuracy: 5/10 Quality: 8/10 Consistency: 5/10

This is where Wan started improvising. It knew "ornate cathedral with tall spires," but Gaudi's distinctive style was rarely captured. Sometimes it looked like a generic Gothic cathedral. The organic, flowing architecture that makes Sagrada Familia unique was usually missing.

Angkor Wat (Cambodia)

Accuracy: 6/10 Quality: 8/10 Consistency: 5/10

Temple complex vibes were there. Jungle setting was appropriate. But the specific Angkor Wat layout and its distinctive five towers were rarely accurate. It felt more like "generic ancient Southeast Asian temple."

Machu Picchu (Peru)

Accuracy: 6/10 Quality: 9/10 Consistency: 6/10

Mountain setting was beautiful. Ancient stone terraces appeared. But the specific Machu Picchu layout that's so recognizable was usually wrong. It captured "Incan mountain ruins" but not the specific site.

Neuschwanstein Castle (Germany)

Accuracy: 5/10 Quality: 8/10 Consistency: 4/10

This fairy tale castle is visually distinctive. Wan 2.2 generated beautiful fairy tale castles in Bavarian settings. But they weren't specifically Neuschwanstein. Wrong tower configurations, different details.
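
To make the drop-off between tiers concrete, here is a small sketch that averages the accuracy scores reported above for each tier. The scores are copied from the ratings in this article; the `tier_averages` helper is a name introduced for illustration.

```python
from statistics import mean

# Accuracy scores (out of 10) as reported for each landmark above.
ACCURACY = {
    "Tier 1": {
        "Eiffel Tower": 9, "Taj Mahal": 9, "Great Wall of China": 8,
        "Statue of Liberty": 9, "Pyramids of Giza": 8,
    },
    "Tier 2": {
        "Colosseum": 7, "Big Ben": 8, "Sydney Opera House": 7,
        "Burj Khalifa": 7, "Christ the Redeemer": 8,
    },
    "Tier 3": {
        "Sagrada Familia": 5, "Angkor Wat": 6, "Machu Picchu": 6,
        "Neuschwanstein Castle": 5,
    },
}

def tier_averages(scores):
    """Return the mean accuracy per tier, rounded to two decimals."""
    return {tier: round(mean(vals.values()), 2) for tier, vals in scores.items()}

averages = tier_averages(ACCURACY)
print(averages)  # {'Tier 1': 8.6, 'Tier 2': 7.4, 'Tier 3': 5.5}
```

Each tier loses roughly one to two points of average accuracy relative to the one above it, which matches the qualitative impression from the individual results.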

The Surprise Failures

Mount Rushmore (USA)

Accuracy: 4/10

I expected this to be easy. Four presidents carved into a mountain. Instead, I got generic mountain faces, sometimes the wrong number, sometimes completely wrong people, once even what looked like George Washington three times.

Leaning Tower of Pisa (Italy)

Accuracy: 6/10

It leaned. That's the good news. But the architectural details were often wrong, and sometimes it was leaning the wrong direction. The surrounding baptistery and cathedral were rarely present.

Stonehenge (UK)

Accuracy: 4/10

This should be simple. Big rocks in a circle. Yet Wan 2.2 frequently got the arrangement wrong, added extra stones, or made them the wrong shape. The most common error was making them too uniform and orderly.

Why Does Accuracy Vary?

After analyzing the results, I have some theories:

1. Training Data Distribution

The Eiffel Tower appears in millions of images online. Sagrada Familia appears in far fewer. The model's knowledge reflects what it's seen.

2. Visual Distinctiveness

Landmarks with unique silhouettes (Eiffel Tower, Sydney Opera House) are easier to recognize and generate than those defined by details (Sagrada Familia, Angkor Wat).

3. Context Dependency

Some landmarks are strongly associated with specific contexts (Taj Mahal with its reflecting pool). Without that context in the prompt, accuracy drops.

4. Temporal Drift

Video models sometimes drift. A landmark might start accurate and become less so as the video progresses.
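
One way to make this kind of drift measurable is to score each frame against the real landmark and find where the score first drops. The sketch below assumes you already have per-frame scores between 0 and 1 (from manual rating or an image-similarity tool of your choice); the function name, the 0.7 threshold, and the sample scores are all illustrative placeholders, not measurements from my tests.

```python
def first_drift_frame(scores, threshold=0.7):
    """Return the index of the first frame scoring below threshold, or None."""
    for index, score in enumerate(scores):
        if score < threshold:
            return index
    return None

# Placeholder scores for a clip that starts accurate and degrades.
example_scores = [0.92, 0.90, 0.88, 0.81, 0.69, 0.60]
drift_at = first_drift_frame(example_scores)
print(drift_at)  # 4
```

A clip that never falls below the threshold returns `None`, which is a convenient way to flag generations that stayed consistent throughout.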

How to Improve Landmark Accuracy

Based on my testing, here's what helps:

1. Add Location Context

Bad: "The Colosseum"
Better: "The Colosseum in Rome, Italy, surrounded by ancient Roman ruins"

2. Include Distinctive Features

Bad: "Sydney Opera House"
Better: "Sydney Opera House with white sail-shaped roof shells, Sydney Harbour Bridge visible in background"

3. Reference Time Period

"The Eiffel Tower lit up at night with golden lights, modern Paris traffic below"

4. Use Image-to-Video

Starting with an accurate reference image dramatically improves results. I wrote about image-to-video techniques in my Wan 2.2 guide.

5. Specify Camera Angles

Certain angles are probably better represented in the training data, so asking for a common postcard view tends to help:

"Wide shot of the Taj Mahal from the front entrance gates"

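The prompt tips above (location context, distinctive features, camera angle) can be combined in a small helper. This is a minimal sketch for assembling prompt text only; `build_landmark_prompt` and its parameters are hypothetical names, and it does not talk to Wan 2.2 or any other tool.

```python
def build_landmark_prompt(name, location=None, features=(), camera=None):
    """Assemble a comma-separated prompt from optional context pieces."""
    head = f"{camera} of {name}" if camera else name
    if location:
        head += f" in {location}"
    parts = [head, *features]
    # Drop empty pieces so optional arguments never leave stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())

prompt = build_landmark_prompt(
    "the Sydney Opera House",
    location="Sydney, Australia",
    features=[
        "white sail-shaped roof shells",
        "Sydney Harbour Bridge visible in background",
    ],
    camera="Wide shot",
)
print(prompt)
```

Building prompts this way keeps the simple and detailed variants comparable, since the only difference between runs is which optional pieces are filled in.
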
Practical Implications

For Travel Content Creators

If you're making travel content with Wan 2.2, stick to Tier 1 landmarks for text-to-video. Use reference images for anything else.

For Documentary Work

Don't trust Wan 2.2 for accuracy. Always verify against real references. Consider using it for b-roll only.

For Educational Content

Tier 1 landmarks are safe. Beyond that, the AI might teach wrong information about what places actually look like.

For Creative Projects

Embrace the imperfection. AI-generated landmarks have their own aesthetic. If strict accuracy isn't required, the results are often visually stunning even when architecturally wrong.

The Motion Quality Is Consistent

Here's what's interesting: regardless of landmark accuracy, the motion quality remained excellent. Camera movements were smooth, lighting changes were natural, and videos felt cinematic.

This suggests that architectural knowledge and motion generation are somewhat independent capabilities. The model knows how to make beautiful video even when it doesn't quite know what it's filming.

Frequently Asked Questions

Why does Wan 2.2 know some landmarks better than others?

Training data. More famous landmarks appear more often in training images and videos.

Can I train a LoRA for specific landmarks?

Yes, though it requires substantial reference material. Location-specific LoRAs could significantly improve accuracy.

Does image-to-video always produce accurate results?

Much better, but the model can still drift from the reference. For maximum accuracy, use short clips and multiple generations.

How does this compare to other video models?

Similar patterns. Kling and Runway also know Tier 1 landmarks well and struggle with lesser-known sites.

Will this improve in future versions?

Likely. As training data expands and diversifies, landmark knowledge should improve.

Can I use these videos for commercial projects?

Yes, but verify accuracy for any educational or documentary use. Creative projects have more flexibility.

What about modern architecture?

Results vary. Very recent buildings may not be in the training data at all. Older, famous modern buildings like the Guggenheim Bilbao had mixed results.

Does the prompt language matter?

Using the local language name sometimes helps. "Torre Eiffel" occasionally gave better results than "Eiffel Tower."

What's the best landmark to test a new model with?

The Eiffel Tower. It's distinctive, widely known, and likely to show any model at its best. If a model can't do the Eiffel Tower, it probably won't handle other landmarks either.

How long should landmark videos be?

Shorter is safer for accuracy. 2-3 second clips maintain consistency better than 10-second videos.

Wrapping Up

Wan 2.2's landmark knowledge is impressive for global icons and spotty for everything else. It's not a virtual tour guide. It's more like a well-traveled friend who remembers the general vibe of a place but is fuzzy on the specifics.

For practical use, understand its limitations. Tier 1 landmarks in text-to-video, reference images for everything else, and always verify when accuracy matters.

The silver lining? Even when the landmarks are wrong, the videos are beautiful. That counts for something.
