Graphics programming • Re: Performance of rotating video by 90/270 degree on Pi3

I believe VC has a 256-bit wide path to memory, so 16-bit loads/stores are optimal.
With a 16-bit load, the ls-byte is loaded to H(0,0) and ms-byte to H(0,16), so you need
to shuffle before writing back.

I think you want the vinterl/vinterh pair for getting the bytes into natural order in VRF after the v16ld.
Then the veven/vodd pair for getting the bytes back into a format suitable for a v16st.
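To make the shuffle concrete, here is a minimal scalar sketch in plain C of what the two steps amount to. It is not VPU code: the function names are made up for illustration, and the assumption that the load leaves the low bytes in one plane and the high bytes in another is just my reading of the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model only (hypothetical helper, not a VPU instruction).
 * lo[i] holds the least significant byte of 16-bit element i and hi[i]
 * the most significant byte, i.e. the two planes left after the load.
 * Interleaving them restores the natural byte order. */
static void interleave_bytes(const uint8_t *lo, const uint8_t *hi,
                             uint8_t *out, size_t n)
{
    assert(lo != NULL && hi != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = lo[i];
        out[2 * i + 1] = hi[i];
    }
}

/* Inverse step: split even and odd bytes back into two planes, which is
 * the layout assumed to be needed before the 16-bit store. */
static void deinterleave_bytes(const uint8_t *in, uint8_t *lo, uint8_t *hi,
                               size_t n)
{
    assert(in != NULL && lo != NULL && hi != NULL);
    for (size_t i = 0; i < n; i++) {
        lo[i] = in[2 * i];
        hi[i] = in[2 * i + 1];
    }
}
```

Running the second function on the output of the first gives back the original two planes, which is the round trip the load/shuffle/store sequence relies on.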

Thanks. I'll see if it's worth getting that done too. This is all quite a deep rabbit hole to get lost in while optimizing stuff. My brain is melting, but the instruction set is quite fun. Right now I'm dynamically using both 16x16 and 16x64 transposers. A transpose now takes between 10 and 14ms, so it's fast enough to do 1080p60.
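For reference, the tile-by-tile idea can be sketched on the CPU side in plain C. This is not the VPU code and says nothing about its speed: it is a scalar model of a 90 degree clockwise rotation of an 8-bit plane done in 16x16 tiles, with a made-up function name and tightly packed rows assumed (no stride padding).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TILE 16 /* tile edge, matching the 16x16 transposer size */

/* Rotate a w x h 8-bit plane by 90 degrees clockwise into dst, which is
 * h pixels wide and w pixels high. Source pixel (x, y) lands at column
 * h - 1 - y of destination row x. Working tile by tile keeps both the
 * reads and the writes inside a small block. */
static void rotate90_cw_tiled(const uint8_t *src, uint8_t *dst,
                              size_t w, size_t h)
{
    assert(src != NULL && dst != NULL && src != dst);
    for (size_t ty = 0; ty < h; ty += TILE) {
        size_t y_end = (ty + TILE < h) ? ty + TILE : h;
        for (size_t tx = 0; tx < w; tx += TILE) {
            size_t x_end = (tx + TILE < w) ? tx + TILE : w;
            for (size_t y = ty; y < y_end; y++)
                for (size_t x = tx; x < x_end; x++)
                    dst[x * h + (h - 1 - y)] = src[y * w + x];
        }
    }
}
```

A 270 degree rotation would only change the index expression to `dst[(w - 1 - x) * h + y]`; the tiling stays the same.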

And in this case you should be using the 0xCxxxxxxx alias to bypass the cache anyway (a cache miss is slower than an access through a cache-bypass address).

I've seen this mentioned a few times. Is the physical memory aliased and some addresses don't use the cache? But that's only possible on the Pi3 with its 1GB memory, correct? From looking at the addresses, right now it seems the physical addresses I get are in the 0xeXXXXXXX range. What's with those? None of the vc_sm_cma_ioctl_import_dmabuf.cached settings seem to make a difference.
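A small sketch of how I currently read those addresses, assuming the usual VideoCore IV convention that the top two bits of a 32-bit bus address select the cache alias (0x0, 0x4, 0x8 or 0xC) and the lower 30 bits are the offset into memory. Whether that convention applies unchanged to the addresses I get back here is an assumption, and the macro and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the top two bits of a bus address select the cache alias. */
#define VC_ALIAS_MASK   0xC0000000u
#define VC_ALIAS_DIRECT 0xC0000000u /* the 0xCxxxxxxx cache-bypass alias */

/* Alias selector bits of a bus address. */
static uint32_t vc_alias_bits(uint32_t bus)
{
    return bus & VC_ALIAS_MASK;
}

/* Offset into memory with the alias bits stripped. */
static uint32_t vc_offset(uint32_t bus)
{
    return bus & ~VC_ALIAS_MASK;
}

/* Same memory location, addressed through a different alias. */
static uint32_t vc_with_alias(uint32_t bus, uint32_t alias)
{
    assert((alias & ~VC_ALIAS_MASK) == 0); /* only the top two bits allowed */
    return vc_offset(bus) | alias;
}
```

Under that assumption an address like 0xE1234560 would already be in the 0xC alias, just with an offset above 512MB: `vc_alias_bits(0xE1234560u)` gives 0xC0000000 and `vc_offset(0xE1234560u)` gives 0x21234560. That would also fit with the cached settings not changing anything I can see.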

Pi5 has scaled back the VPU significantly. Getting hold of VPU addresses is trickier. vc_sm_cma is scaling back what is available to userspace, particularly as it gets upstreamed.

That's fine for me. It's only on the Pi3 that I struggle to get the performance back to the old firmware levels. With this VPU transpose, I think I'm basically there, and on the Pi4 and 5 the GPU is fast enough to do the rotation for me.

Implementing a V4L2 rotation M2M device using the VPU for any 8bpc symmetrical (ie not YUV422) colour format probably wouldn't be too difficult. As Dom says, the VPU can load the VRF in nice efficient bursts, and save out with rotation with equally efficient bursts. I haven't messed with the firmware for a while, and other priorities may mean it takes a while to get implemented.

Might be interesting to have an official alternative that helps with implementing a 90/270 degree rotation for DRM planes. For now I no longer immediately need that, so I'm good.

I also noticed that setting the scaling governor to 'performance' is a lot better than 'ondemand'. I guess the VPU shares a clock somewhere and is faster if the ARM side is clocked higher?

Statistics: Posted by dividuum — Fri Feb 21, 2025 7:27 pm
