It's incredible what lengths people go to to avoid memorizing basic ffmpeg usage. It's really not that hard, and the (F.) manual explains the basic concepts fairly well.
Now, granted, ffmpeg's defaults (reencoding by default and only keeping one stream of each type unless otherwise specified) aren't great, which can create some footguns, but as long as you remember to pass `-c copy` by default you should be fine.
Also, hiding those footguns is likely to create more harm than it fixes. Case in point: "ff convert video.mkv to mp4" (an extremely common usecase) maps to `ffmpeg -i video.mkv -y video.mp4` here, which does a full reencode (losing quality and wasting time) for what can usually just be a simple remux.
Similarly, "ffmpeg extract audio from video.mp4" will unconditionally reencode the audio to mp3, again losing quality. The quality settings are also hardcoded and hidden from the user.
I can sympathize with ffmpeg syntax looking complicated at first glance, but the main reason for this is just that multimedia is really complicated and that some of this complexity is necessary in order to not make stupid mistakes that lose quality or waste CPU resources. I truly believe that these ffmpeg wrappers that try to make it seem overly simple (at least when it's this simple, i.e. not even exposing quality settings or differentiating between reencoding and remuxing) are more hurtful than helpful. Not only can they give worse results, but by hiding this complexity from users they also give users the wrong ideas about how multimedia works. "Abstractions" like this are exactly how beliefs like "resolution and quality are the same thing" come to be. I believe the way to go should be educating users about video formats and proper ffmpeg usage (e.g. with good cheat sheets), not by hiding complexity that really should not be hidden.
Edit: Reading through my comment again, I have to apologize for the slightly facetious opening statement, even if I quality it later on. The fact that so many ffmpeg wrappers exists is saying something about its apparent difficulty, but as I argue above, a) there are reasons for this (namely, multimedia itself just being complicated), and b) I believe there are good and bad ways to "fix" this, with oversimplified wrappers being more on the "bad" side.
I'm not saying it couldn't be better (and I even gave examples), my point is that the drawbacks of such a wrapper outweigh the benefits, at least when it's such an oversimplified one. I've said in other replies how I'd be very interested in e.g. an alternative libav* frontend with better defaults and more consistent argument syntax, but I don't think that this invalidates my criticism of the linked project.
Some people just want to use an intuitive tool with better QoL, even if it leads to compromises, to do a job swiftly without going over documentation/learning a ton of new things. Not everything has to be an educational experience. ffmpeg exists in its original form like you prefer, but some folks want to use lossless cut. Nothing wrong with that IMO.
Personally I think it’s great that it’s such a universally useful tool that it has been deployed in so many different variations.
> Some people just want to use a tool to do a job swiftly. Not everything has to be educational.
> some folks want to use lossless cut
In that case I would encourage you to ruminate on what the following in the post you're replying to means and what the implications are:
> "ff convert video.mkv to mp4" (an extremely common usecase) maps to `ffmpeg -i video.mkv -y video.mp4` here, which does a full reencode (losing quality and wasting time) for what can usually just be a simple remux
Depending on the size of the video, the time it would take you to "do the job swiftly" (i.e. not caring about how the tools you are using actually work) might be more than just reading the ffmpeg manual, or at the very least searching for some command examples.
The thing is that when a video is being re-encoded, so long as I'm not trying to play games on my computer at the same time, I'm free to go do something else. It does not command any of my attention while it's working, whereas sitting and reading the man pages commands my attention absolutely.
> > some folks want to use lossless cut
> In that case I would encourage you to ruminate on what the following in the post you're replying to means and what the implications are:
You may have misunderstood the comment: "lossless cut" is the name of an ffmpeg GUI front end. They're not discussing which exact command line gives lossless results.
Yes, I am not opposed to ffmpeg wrappers in and of themselves. Some decent ffmpeg wrappers definitely exist. But I argue in my comment above that this specific tool does not have better QoL - again, since it reencodes unconditionally with quality settings that are usually not configurable.
no the point is that there are some things I've done a hundred times and I never remember it because it's designed in a wildly bad way. ffmpeg, gpg, openssl and git has those things all over the place. Is it -c:v or -v:c? I don't know. used to be -vcodec so it's -v:c now? no it's -c:v I think because they swapped it?
There isn't internal consistency to really hold on to ... it's just a bunch of seemingly independent options.
The biggest problem is open source teams really don't get people on board that focus on customer and product the way commercial software does. This is what we get as a result
Sure, I agree with all of this. Like I said above, the syntax (and, even more, the defaults) isn't great. I'm just arguing that "improving the syntax" should not mean "hiding complexity that should not be hidden", as the linked project does. An alternative ffmpeg frontend (i.e. a new CLI frontend using the libav* libraries like ffmpeg is, not a wrapper for the ffmpeg CLI program) with better syntax and defaults but otherwise similar capabilities would be a very interesting project.
(The answer to your question is that both -vcodec and -c:v are valid, but I imagine that's not the point.)
> The biggest problem is open source teams really don't get people on board that focus on customer and product the way commercial software does.
I believe in this case it may be more of a case of backwards compatibility, with options being added incrementally over time to add what was needed at the moment. Though that's just my guess.
I've been thinking of integrating pngquant as an ffmpeg filter, it would make it possible to generate even better pallettes. That would get ffmpeg on par with gifski.
Does ffmpeg's gif processing support palette-per-frame yet? Last time I compared them (years ago, maybe not long after that blog post), this was a key benefit of gifski allowing it to get better results for the same filesize in many cases (not all, particularly small images, as the total size of the palette information can be significant).
I would definitely use an LLM, to see what the suggested options do and tweak them.
Using a different package name could be helpful. I searched for ezff docs and found a completely different Python library. Also ez-ffmpeg turns up a Rust lib which looks great if calling from Rust.
The one good usecase I've found for AI chatbots, is writing ffmpeg commands. You can just keep chatting with it until you have the command you need. Some of them I save as an executable .command, or in my .txt note.
As pessimistic about it as I am, I do think LLMs have a place in helping people turn their text description into formal directives. (Search terms, command-line, SQL, etc.)
... Provided that the user sees what's being made for them and can confirm it and (hopefully) learn the target "language."
I agree apart from the learning part. The thing is unless you have some very specific needs where you need to use ffmpeg a lot, there’s just no need to learn this stuff. If I have to touch it once a year I have much better things to spend my time learning than ffmpeg command
Agreed. I have a bunch of little command-line apps that I use 0.3 to 3 times a year* and I'm never going to memorize the commands or syntax for those. I'll be happy to remember the names of these tools, so I can actually find them on my own computer.
* - Just a few days ago I used ImageMagick for the first time in at least three years. I downloaded it just to find that I already had it installed.
Do most devs even look at the source code for packages they install? Or the compiled machine code? I think of this as just a higher level of abstraction. Confirm it works and not worry about the details of how it works
For the kinds of things you’d need to reach for an LLM, there’s no way to trust that it actually generated what you actually asked for. You could ask it to write a bunch of tests, but you still need to read the tests.
It isn’t fair to say “since I don’t read the source of the libraries I install that are written by humans, I don’t need to read the output of an llm; it’s a higher level of abstraction” for two reasons:
1. Most Libraries worth using have already been proven by being used in actual projects. If you can see that a project has lots of bug fixes, you know it’s better than raw code. Most bugs don’t show up unless code gets put through its paces.
2. Actual humans have actual problems that they’re willing to solve to a high degree of fidelity. This is essentially saying that humans have both a massive context window and an even more massive ability to prioritize important things that are implicit. LLMs can’t prioritize like humans because they don’t have experiences.
I don't install 3rd party dependencies if I can avoid them. Why? Because although someone could have verified them, there's no guarantee that anybody actually did, and this difference has been exploited by attackers often enough to get its own name, a "supply-chain attack".
With an LLM’s output, it is short enough that I can* put in the effort to make sure it's not obliviously malicious. Then I save the output as an artefact.
* and I do put in this effort, unless I'm deliberately experimenting with vibe coding to see what the SOTA is.
This seems like a glib one liner but I do think it is profoundly insightful as to how some people approach thinking about LLMs.
It is almost like there is hardwiring in our brains that makes us instinctively correlate language generation with intelligence and people cannot separate the two.
It would be like if for the first calculators ever produced instead of responding with 8 to the input 4 + 4 = printed out "Great question! The answer to your question is 7.98" and that resulted in a slew of people proclaiming the arrival of AGI (or, more seriously, the ELIZA Effect is a thing).
But doesnt something like this interface kind of show the inefficiency of this? Like we can all agree ffmpeg is somewhat esoteric and LLMs are probably really great at it, but at the end of the day if you can get 90% of what you need with just some good porcelain, why waste the energy spinning up the GPU?
Because FFmpeg is a swiss army knife with a million blades and I don't think any easy interface is really going to do the job well. It's a great LLM use case.
But you only need to find the correct tool once and mark it in some way. Aka write a wrapper script, jot down some notes. You are acting like you’re forced to use the cli each time.
Because the porcelain is purpose built for a specific use case. If you need something outside of what its author intended, you'll need to get your hands dirty.
And, realistically, compute and power is cheap for getting help with one-off CLI commands.
I can only speak to my experience, but I spent a long time being puzzled by video editor user interfaces, until I ran into ScreenFlow about ten years ago. For whatever reason, the UI clicked, and I've used it ever since. It's a single purchase, not monthly, and relatively affordable. https://www.telestream.net/screenflow/overview.htm
That's beautiful! I see a .claude folder in your code, I am curious if you've "vibecoded" the whole project or just had claude there for some tasks! not that it matters or takes away from your work but just pure curiosity as someone who enjoys betting on the LLM output XD
That's the problem ideally solved by typed data, i.e., some UI where instead of trying to memorize whether it's thumb/s/nails you can read the closed list of alternatives, read contextual help and pick one
Yeah, no, that's a pale imitation that only addresses the one specific example given. But, like, how would you even know what target formats are supported? Break the flow and look it up or simply read the drop-down list?
The free type-any-text interface with poor helpers is the worst in accessibility
Which format is the default if no argument is given?
Or more complicated contextual knowledge - if you cut 1sec of a video file, does fish autocomplete to tell you whether the video is reencoded or cut (otherwise) losslessly
the flow that doesn't require you to open a different tab or cancel a command to `man` your way through dozens of poorly searchable pages of documentation, but allows you to continue translating what you want in your mind into the interface command with delay potentially subsecond interrupts
Is there kind of rewards for speed running typing ffmpeg flags? Like an advent of ffmpeg?
I know what I want to do, I don't know how it's being done, but there's a wealth of information that is very accessible. So I just read it.
It's very easy to type `apropos ffmpeg`. And even if you typed `man ffmpeg`, if you go to the end, you will find related manuals name for more information. And you can always use the pager (`less` in most case) facility for quick search.
I believe that a lot of frustration comes from people unwilling to learn the conceptual basis of the tools they are using.
What's the reward for trivializing real issues and coming up with broken "solutions"?
> It's very easy to type `apropos ffmpeg`
No it's not. First, that's not a Windows command, so right off the bat you've cut off the largest OS.
Second, your command is naively empty and it's telling that you've given it instead of an actual search query because you wouldn't be able to come up with a great one right away that would result in the correct result at the top - while the correct resuls is "hardcoded" in the field type in the UI.
So yeah, go on, find that perfect query and then explain why you think every single user should be able to do the same quickly. Then you can think about how justified your other beliefs are about basic workflow issues you don't understand
i figure out the niche ffmpeg commands various chain filters, etc
then expose them from my python cli tool with words similar to what this gentleman above has done.
If one has fewer such commands its as simple as just bash aliases and just adding it to ~/.bashrc
alias convertmkvtomp4='ffmpeg command'
then just run it anytime with just that alias phrase
i use ffmpeg a lot so i have my own dedicated cli snippet tool for me, to quickly build out complex pipeline in easier language
the best part is i have --dry-run then exposes the flow + explicit commands being used at each step, if i need details on whats happening and verbose output at each step
Very cool idea since ffmpeg is one of those tools that has a few common tasks but most users would need to look up the syntax every time to implement them (or make an alias). In line with the ease of use motivation, you might consider supporting tab completion.
I have a little script that I use on the CLI to do this kind of stuff (calls an LLM to figure out how to do CLI stuff) but you can just as easily now use any of the coding agents.
I agree. Apart from having to use npm (and its package repository being susceptible to security issues), I’d prefer something a lot simpler. Could’ve been a Rust program or a Go program (a single executable) that could be built locally or installed (using several different methods and offering a choice).
interesting approach, i solved similar problem by creating visual tool to generate ffmpeg commands but its not the same(it cant do conversion etc.)
I like that you took no AI approach, i am looking for something like this i.e. understanding intent and generating command without using AI but so far regex based approaches have proved to be inadequate. I also tried indexing keywords and creating index of keywords with similar meaning that improved the situation a bit but without something heavy like bert its always a subpar experience.
Sometimes an idea comes along thats so obvious it makes me angry. I have been struggling with ffmpeg commands for over well a decade. All the time I wasted googling and creating scripts so I wouldn't have to regoogle and this could have existed literally from day one
I was surprised that macOS (QuickTime/Preview, iMovie) can't read .mp4 files. Not sure if it was due to H.265 or the audio codec. I tried using ffmpeg to convert to .mov but that also failed to open, since I guess MOV is just another container format.
MP4 is container, not format, so if you have unsupported format packed into MP4 container it won’t be played. Example is trying to play AV1 video codec on devices with M2 chip or older. It won’t play. But it will play on devices with M3 chip and newer. Easiest solution is to use other player so that you can watch any MP4 file but with software decoding where hardware decoding is not available. Examples of such players are MPV or VLC.
Yes, VLC works fine for playing. The user wanted to edit some mp4 videos with iMovie (vs ffmpeg).
I think it was an M4 Mac. Does iMovie need a codec pack? I know some PC OEMs don't ship an h.265 codec, pointing users to a $0.99 download. Thought Mac would include it, being promoted as for content creators.
It's incredible what lengths people go to to avoid memorizing basic ffmpeg usage. It's really not that hard, and the (F.) manual explains the basic concepts fairly well.
Now, granted, ffmpeg's defaults (reencoding by default and only keeping one stream of each type unless otherwise specified) aren't great, which can create some footguns, but as long as you remember to pass `-c copy` by default you should be fine.
Also, hiding those footguns is likely to create more harm than it fixes. Case in point: "ff convert video.mkv to mp4" (an extremely common usecase) maps to `ffmpeg -i video.mkv -y video.mp4` here, which does a full reencode (losing quality and wasting time) for what can usually just be a simple remux.
Similarly, "ffmpeg extract audio from video.mp4" will unconditionally reencode the audio to mp3, again losing quality. The quality settings are also hardcoded and hidden from the user.
I can sympathize with ffmpeg syntax looking complicated at first glance, but the main reason for this is just that multimedia is really complicated and that some of this complexity is necessary in order to not make stupid mistakes that lose quality or waste CPU resources. I truly believe that these ffmpeg wrappers that try to make it seem overly simple (at least when it's this simple, i.e. not even exposing quality settings or differentiating between reencoding and remuxing) are more hurtful than helpful. Not only can they give worse results, but by hiding this complexity from users they also give users the wrong ideas about how multimedia works. "Abstractions" like this are exactly how beliefs like "resolution and quality are the same thing" come to be. I believe the way to go should be educating users about video formats and proper ffmpeg usage (e.g. with good cheat sheets), not by hiding complexity that really should not be hidden.
Edit: Reading through my comment again, I have to apologize for the slightly facetious opening statement, even if I quality it later on. The fact that so many ffmpeg wrappers exists is saying something about its apparent difficulty, but as I argue above, a) there are reasons for this (namely, multimedia itself just being complicated), and b) I believe there are good and bad ways to "fix" this, with oversimplified wrappers being more on the "bad" side.
I’m going to guess your job does not involve much UX design?
Personally I think it’s great that it’s such a universally useful tool that it has been deployed in so many different variations.
> some folks want to use lossless cut
In that case I would encourage you to ruminate on what the following in the post you're replying to means and what the implications are:
> "ff convert video.mkv to mp4" (an extremely common usecase) maps to `ffmpeg -i video.mkv -y video.mp4` here, which does a full reencode (losing quality and wasting time) for what can usually just be a simple remux
Depending on the size of the video, the time it would take you to "do the job swiftly" (i.e. not caring about how the tools you are using actually work) might be more than just reading the ffmpeg manual, or at the very least searching for some command examples.
You may have misunderstood the comment: "lossless cut" is the name of an ffmpeg GUI front end. They're not discussing which exact command line gives lossless results.
There isn't internal consistency to really hold on to ... it's just a bunch of seemingly independent options.
The biggest problem is open source teams really don't get people on board that focus on customer and product the way commercial software does. This is what we get as a result
Sure, I agree with all of this. Like I said above, the syntax (and, even more, the defaults) isn't great. I'm just arguing that "improving the syntax" should not mean "hiding complexity that should not be hidden", as the linked project does. An alternative ffmpeg frontend (i.e. a new CLI frontend using the libav* libraries like ffmpeg is, not a wrapper for the ffmpeg CLI program) with better syntax and defaults but otherwise similar capabilities would be a very interesting project.
(The answer to your question is that both -vcodec and -c:v are valid, but I imagine that's not the point.)
> The biggest problem is open source teams really don't get people on board that focus on customer and product the way commercial software does.
I believe in this case it may be more of a case of backwards compatibility, with options being added incrementally over time to add what was needed at the moment. Though that's just my guess.
[1] https://blog.pkh.me/p/21-high-quality-gif-with-ffmpeg.html
/s
Using a different package name could be helpful. I searched for ezff docs and found a completely different Python library. Also ez-ffmpeg turns up a Rust lib which looks great if calling from Rust.
... Provided that the user sees what's being made for them and can confirm it and (hopefully) learn the target "language."
Tutor, not a do-for-you assistant.
* - Just a few days ago I used ImageMagick for the first time in at least three years. I downloaded it just to find that I already had it installed.
It isn’t fair to say “since I don’t read the source of the libraries I install that are written by humans, I don’t need to read the output of an llm; it’s a higher level of abstraction” for two reasons:
1. Most Libraries worth using have already been proven by being used in actual projects. If you can see that a project has lots of bug fixes, you know it’s better than raw code. Most bugs don’t show up unless code gets put through its paces.
2. Actual humans have actual problems that they’re willing to solve to a high degree of fidelity. This is essentially saying that humans have both a massive context window and an even more massive ability to prioritize important things that are implicit. LLMs can’t prioritize like humans because they don’t have experiences.
You can’t verify LLM’s output. And thus, any form of trust is faith, not rational logic.
With an LLM’s output, it is short enough that I can* put in the effort to make sure it's not obliviously malicious. Then I save the output as an artefact.
* and I do put in this effort, unless I'm deliberately experimenting with vibe coding to see what the SOTA is.
> Write an ffmpeg command that implements the "bounce" effect: play from 0:00 to 0:03, then backwards from 0:03 to 0:00, then repeat 5 times.
The problem is someone decided that and the contents of Wikipedia was all something needs to be intelligent haha
It is almost like there is hardwiring in our brains that makes us instinctively correlate language generation with intelligence and people cannot separate the two.
It would be like if for the first calculators ever produced instead of responding with 8 to the input 4 + 4 = printed out "Great question! The answer to your question is 7.98" and that resulted in a slew of people proclaiming the arrival of AGI (or, more seriously, the ELIZA Effect is a thing).
And, realistically, compute and power is cheap for getting help with one-off CLI commands.
Could you elaborate on this? I see a lot of AI-use and I'm wondering if this is claude speaking or you
Which format is the default if no argument is given?
Or more complicated contextual knowledge - if you cut 1sec of a video file, does fish autocomplete to tell you whether the video is reencoded or cut (otherwise) losslessly
Also, what does fish complete to on Windows?
I know what I want to do, I don't know how it's being done, but there's a wealth of information that is very accessible. So I just read it.
It's very easy to type `apropos ffmpeg`. And even if you typed `man ffmpeg`, if you go to the end, you will find related manuals name for more information. And you can always use the pager (`less` in most case) facility for quick search.
I believe that a lot of frustration comes from people unwilling to learn the conceptual basis of the tools they are using.
> It's very easy to type `apropos ffmpeg`
No it's not. First, that's not a Windows command, so right off the bat you've cut off the largest OS. Second, your command is naively empty and it's telling that you've given it instead of an actual search query because you wouldn't be able to come up with a great one right away that would result in the correct result at the top - while the correct resuls is "hardcoded" in the field type in the UI. So yeah, go on, find that perfect query and then explain why you think every single user should be able to do the same quickly. Then you can think about how justified your other beliefs are about basic workflow issues you don't understand
Quite telling that these tools need to exist to make ffmpeg actually usable by humans (including very experienced developers).
If one has fewer such commands its as simple as just bash aliases and just adding it to ~/.bashrc
alias convertmkvtomp4='ffmpeg command'
then just run it anytime with just that alias phrase i use ffmpeg a lot so i have my own dedicated cli snippet tool for me, to quickly build out complex pipeline in easier language
the best part is i have --dry-run then exposes the flow + explicit commands being used at each step, if i need details on whats happening and verbose output at each step
But yea ffmpeg is awesome software, one of the great oss projects imo. working with video is hellish and it makes it possible.
Has anyone else been avoiding typing FFmpeg commands by using file:// URLs with yt-dlp
I like that you took no AI approach, i am looking for something like this i.e. understanding intent and generating command without using AI but so far regex based approaches have proved to be inadequate. I also tried indexing keywords and creating index of keywords with similar meaning that improved the situation a bit but without something heavy like bert its always a subpar experience.
Is there an easier way?
I think it was an M4 Mac. Does iMovie need a codec pack? I know some PC OEMs don't ship an h.265 codec, pointing users to a $0.99 download. Thought Mac would include it, being promoted as for content creators.
To re-encode the content into H.264+AAC, rather than simply "muxing" the encoded bitstreams from the MP4 container into a new MOV container.