Microcontrollers

Thursday, December 7, 2017

Many different cryptocurrencies

Looks like my last post about the basics of cryptocurrencies sparked some attention. As I am writing, 1 (one!) Bitcoin is trading at around 14K USD and the percentage increase since the beginning of the year is astronomical, so that might be a good part of the reason.

However, there are many different cryptocurrencies and, while they all share the basic concepts, they are designed in slightly different ways which, by design, affect their markets.

They all have a blockchain as described in my previous article, they have wallets, miners etc.
Some were created with slightly different goals in mind, but the main differentiation is how the mining is done.

To understand how we got here, we need to see what happened with Bitcoin.

At the beginning, coding the software for the mining process was pretty much standard: it would involve some routine (most likely in C) running on the most common operating systems (Windows, Linux, macOS etc.).
This routine would use the CPU to run the loop and calculate the hashcodes.
It worked, but we quickly realized that common CPUs have 4 cores, meaning they can run 4 routines in parallel, so we moved to a multi-threaded approach: each routine ran at the same speed, but 4 of them ran at the same time.
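As a rough sketch of the multi-threaded idea, the search space can simply be cut into one slice per core. The function name `split_nonce_space` is hypothetical, and the 32 bit nonce size is an assumption taken from the Bitcoin description in my previous post; real miners organize their work differently.

```python
def split_nonce_space(workers, total=2**32):
    """Return half-open (start, stop) ranges that cover [0, total) exactly once."""
    base, extra = divmod(total, workers)
    ranges = []
    start = 0
    for index in range(workers):
        # The first `extra` workers take one more nonce when the split is uneven.
        stop = start + base + (1 if index < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


for core, (start, stop) in enumerate(split_nonce_space(4)):
    print(f"core {core}: nonces {start} to {stop - 1}")
```

Each core then runs the same loop on its own slice, so no two cores ever test the same value.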

By design, all these algorithms must be relatively simple to compute: we need many iterations to find the solution to the puzzle (finding the correct nonce), but verifying the correctness of such a nonce must be quick, or it would generate a lot of work for non-miners.

Let me expand on this.
Say that on average you need to run a loop 10,000 times before you find the solution.
Verifying such a solution would require one single iteration, therefore 1/10,000 of the time of the mining process.
Typically, when you mine, you are fed with a number of tasks by "the network" (normally a mining pool) and then you dispatch back the solution.
The pool must verify all your solutions, and also all the solutions of the other peers connected to the pool, therefore the verification process must be fast.
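To put numbers on that, here is a small back-of-the-envelope sketch. The 10,000 iterations come from the example above; the pool size and the per-miner hashrate are made-up values, used only for illustration.

```python
AVG_ITERATIONS_PER_SOLUTION = 10_000  # figure used in the example above


def verification_fraction(avg_iterations):
    """Share of the mining effort needed to verify one solution (one hash)."""
    return 1 / avg_iterations


def pool_verification_load(miners, hashrate_per_miner, avg_iterations):
    """Hashes per second the pool spends checking the submitted solutions."""
    solutions_per_second = miners * hashrate_per_miner / avg_iterations
    return solutions_per_second  # one hash per solution


print(verification_fraction(AVG_ITERATIONS_PER_SOLUTION))               # 0.0001
print(pool_verification_load(1_000, 400, AVG_ITERATIONS_PER_SOLUTION))  # 40.0
```

With these assumed numbers, a thousand miners doing 400 hashes per second each would cost the pool only about 40 hashes per second of verification.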

Finally this boils down to the fact that we need a computation which is relatively simple, but can be executed many times in order to mine.

This is the best scenario for parallel computing : simple tasks to be executed many times.

Turns out that modern graphics cards have processors (GPUs) very well equipped for that: their cores are way less sophisticated than those in our main CPU, however they are good enough to run those basic calculations.

Probably the most desired card for mining at the moment is the NVIDIA GTX 1080 Ti. To put things in perspective, this card has 3,584 cores (@1.48GHz) while a normal PC CPU usually has 4 cores (@2 to 4 GHz).

So, the GPU cores are still slower than the main CPU cores, but you get PLENTY of them, plus it is possible (with dedicated motherboards) to install 8 of them on a single PC, that makes 28K+ cores!
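The arithmetic behind those figures, as a quick sketch. The "core-GHz" number is a deliberately crude yardstick (cores multiplied by clock) that I am assuming just for comparison; it ignores everything that makes a CPU core more capable than a GPU core.

```python
GPU_CORES = 3_584        # GTX 1080 Ti, figure from the text
GPU_CLOCK_GHZ = 1.48
CPU_CORES = 4
CPU_CLOCK_GHZ = 4.0      # upper end of the 2 to 4 GHz range
GPUS_PER_RIG = 8


def rig_cores(gpus, cores_per_gpu):
    """Total GPU cores in a rig with several identical cards."""
    return gpus * cores_per_gpu


def core_ghz(cores, clock_ghz):
    """Crude aggregate: number of cores multiplied by their clock."""
    return cores * clock_ghz


print(rig_cores(GPUS_PER_RIG, GPU_CORES))  # 28672
ratio = core_ghz(GPU_CORES, GPU_CLOCK_GHZ) / core_ghz(CPU_CORES, CPU_CLOCK_GHZ)
print(f"one GPU vs one CPU, in core-GHz: about {ratio:.0f}x")
```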

So, GPU mining was immediately a huge leap forward, but it did not disrupt the market too much because GPUs are consumer products anyway. Sure, buying eight 1080 Ti cards is quite an investment, but it is still feasible and you could start with one and then expand (don't do it, not for Bitcoin at least, it would not pay off, keep reading).

But GPUs are not the only computing hardware that deals well with parallel computing. FPGAs (Field Programmable Gate Arrays) are extremely interesting devices that allow the designer to create computing logic (including simple CPU cores) designed for a specific task.

Now, GPUs were not designed to mine currencies, they were repurposed for that, while FPGAs are a sort of blank canvas that you can arrange pretty much the way you like (I played with them, you can see how I created, ironically, a VGA interface).

The problem with FPGAs is that their price increases steeply when you need many logic cells (to accommodate many computing units) and high performance.

However, FPGAs also allow prototyping of solutions: whatever you can do with them can be transferred to ASICs.
ASICs are Application Specific Integrated Circuits: they are designed to perform a single task, so they are not as versatile as CPUs.
While you can configure an FPGA (technically you synthesize a circuit) to do a task and then erase it and repurpose it for something else, an ASIC is the implementation of one single circuit.

ASICs are not a good solution unless you plan mass production: typically they have a very high production setup cost and a low cost per item afterwards.

When mining and Bitcoin became mainstream, the market was big enough to justify mass productions and that's when ASIC miners changed everything.

Since they are designed for that specific role only, they are "cheap" and power efficient (that's why you should not mine bitcoins with your GPU), allowing massive parallel computing.

Technically this changed the market because these are not typical consumer products: while you probably have at least one GPU in your house, chances are that you don't have an ASIC miner (you can buy one if you like, even though its profitability is going to drop quite quickly. Usually only the latest model is profitable, so get ready to change them often).

You can actually buy one, but again, unless your electricity is really cheap, you may waste money.

This means that specialized companies were created. Usually they have access to cheap electricity (e.g. solar or geothermal) and always keep up to date with the latest ASIC miner model.

The reward of the mining process is linked to the average time effort of mining a coin, so those companies managed to blow out of the water all the home miners.

At this point the cryptocurrencies, that were designed to be controlled by an extremely wide segment of the population, started to look like common valuable resources : controlled by big investors.

To (try to) avoid that, new currencies and markets were created, such as Ethereum or Monero.

They have many different peculiarities but, from the mining point of view, they tried to make the calculation less effective on ASICs.

How to achieve that?
We saw that ASICs are great for simple multi-threaded calculations (they run in parallel), so the approach was to make it more efficient for single threaded architectures.

One path was to increase the complexity of the hashing algorithm, or at least to make it less efficient by design.
This favors cores with a high frequency (such as CPUs), but running many computations simultaneously would still be more efficient.
So the trick was to artificially create a slow hashing algorithm that would need to use a lot of memory.
Monero implemented the CryptoNight algorithm (for those who like technical documentation, check here) which requires about 2MB of temporary buffer (the "scratchpad") to compute the hash.

Why is this usually not a good fit for parallel computing?
For every instance of the process you are running, you need to allocate a separate buffer so, if you have a 10,000 core machine, you need enough memory to allocate 10,000 buffers.
Memory is still relatively expensive and typically ASICs cannot have all that RAM available per core if we want them to have an acceptable price.
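A quick sketch of that memory arithmetic, using the roughly 2MB scratchpad mentioned above. The function names are mine, and the instance counts are upper bounds: they assume the whole memory is available for scratchpads, which is never quite true.

```python
SCRATCHPAD_MB = 2  # approximate CryptoNight scratchpad size, from the text


def scratchpad_memory_gb(instances):
    """Memory needed to run `instances` hashes in parallel, in GB."""
    return instances * SCRATCHPAD_MB / 1024


def max_instances(memory_gb):
    """Upper bound on the parallel hashes that fit in `memory_gb` of memory."""
    return int(memory_gb * 1024 // SCRATCHPAD_MB)


print(scratchpad_memory_gb(10_000))  # 19.53125
print(max_instances(2))              # 1024
print(max_instances(11))             # 5632
```

So a 10,000 core machine would need close to 20GB just for the buffers, while a 2GB card tops out at about a thousand parallel hashes.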

A PC normally has a lot of RAM because we use it for many different tasks, plus, PC RAM modules are widely used so this brings their average cost pretty low.

This worked when a top of the line graphics card had 2GB of memory: it was still quite good for mining, but not much better than a CPU.

Just to give you a feeling, this is a benchmark I just ran on my PC mining Monero (XMR):

(my GPU is an old trusty Radeon HD 7870 with 2GB and my CPU is an Intel Core i5 4670K, both quite outdated now. The CPU is mining with 3 cores at about 1/3 of the rate of my GPU)

Now the 1080 Ti sports 11GB of RAM, which makes it pretty good at memory intensive hashing.

In my example, I would currently be mining about 400 hashes / second of Monero, which would yield a whopping 1.21$ / day of reward (you can check it here, it varies a lot with exchange rates and it's pretty high now), which is probably less than the cost of the power consumption of my PC.
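To check that last claim, here is a minimal sketch of the sums. The 1.21$ per day is the figure above; the 300W power draw and the 0.20$ per kWh price are pure assumptions (I did not measure my PC), so plug in your own values.

```python
REWARD_USD_PER_DAY = 1.21     # reward figure quoted above
ASSUMED_POWER_WATTS = 300     # assumption: whole-PC draw while mining
ASSUMED_USD_PER_KWH = 0.20    # assumption: household electricity price


def daily_power_cost(watts, usd_per_kwh):
    """Cost of running a constant load for 24 hours."""
    return watts / 1000 * 24 * usd_per_kwh


def daily_profit(reward_usd, watts, usd_per_kwh):
    """Reward minus electricity; hardware cost is ignored."""
    return reward_usd - daily_power_cost(watts, usd_per_kwh)


def break_even_price(reward_usd, watts):
    """Electricity price (USD per kWh) at which the profit is exactly zero."""
    return reward_usd / (watts / 1000 * 24)


profit = daily_profit(REWARD_USD_PER_DAY, ASSUMED_POWER_WATTS, ASSUMED_USD_PER_KWH)
print(f"profit per day: {profit:.2f} USD")
print(f"break-even price: {break_even_price(REWARD_USD_PER_DAY, ASSUMED_POWER_WATTS):.3f} USD/kWh")
```

Under these assumptions the PC loses a few cents per day, and it would only break even with electricity below roughly 0.17$ per kWh.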

Many mining companies currently have mining rigs based on GPUs alongside their ASIC miners, so they can mine these currencies too.

Still GPUs are a consumer product so they don't have a huge advantage versus home miners as their hash / watt rate is comparable.

Is it going to last?
Probably not, in fact the cost of the memory is merely a matter of  scale : if those currencies become so attractive to justify bigger investments, then scale production of dedicated hardware with enough memory will become financially viable.

However, this memory inefficient algorithm is still a good solution: currencies are usually profitable to mine when they start (profit is balanced by the risk that the currency itself will not be successful and will eventually fade into oblivion), then they usually pay less over time, once they are established.
The Bitcoin millionaires are those who had the coins early on: they gambled, they won, fair enough.
So, if this beginning phase does not favor big companies, but spreads the mining rewards over a wider population, in my book it's already a success.

Sunday, December 3, 2017

Making sense of cryptocurrencies

There is a lot of talk about bitcoins and the other crypto-currencies lately.
"BlockChain" is a hype word and a sort of "myth" has been created around it, but in reality it is something quite simple and not really new as a concept.

What is "new" (ahem... for a few years at least) is the fact that it is applied to a digital currency.

I have been discussing this topic with a few friends lately, so I thought some basic explanations might be generally useful, since my impression is that most people think it is more complex than it actually is.

If you don't quite understand how it works, I will try to explain it in this post, in a way that should not assume any particular technical skills.

Ingredients : Blocks , hashcodes, wallets, distributed ledger

Let's start from the last one, the Distributed Ledger.

First off: it is a public ledger, imagine a spreadsheet with lines that record money movements from and to all the people (who use the cryptocurrency) in the world.

John gives Martha 0.001 coins  , that's your first ledger entry, everybody can see it.

Now, how do we identify John and Martha? We give them a unique address, in the form of a long string such as xxo6YxxL19qs5JxxxxMaea4xxxxL55vmNr
That's my bitcoin receiving address (in which I replaced a few letters with "x" so that you don't send me money by mistake :) ).

The ledger is public, but the entities mentioned in it are only identified by an address (which somewhat prevents governments from knocking on your door asking for taxes on them).

Why can't you simply get my bitcoins at my address? Well, turns out that my balance goes into a wallet, to open it you must be able to decrypt it with a  proper key that I (and everybody else having a digital wallet) created.
Provided that balances can be attributed to a given address, only the person being able to open the associated wallet will be able to claim them.
There is a bit more to it, but let's leave it like that.

Since the ledger is distributed, many million copies exist, so the likelihood of them being lost is pretty low.
Also, this prevents people from spending coins they don't have, as transactions are verified (if you are sending x coins, your running balance should be at least x) before a block is inserted in the ledger.
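The balance check can be sketched in a few lines. This is only a toy model of the rule "you cannot send more than you have": the names are hypothetical, amounts are whole units to avoid rounding issues, and real currencies track and verify funds in a more elaborate way.

```python
from collections import defaultdict


def apply_transactions(opening_balances, transactions):
    """Apply (sender, receiver, amount) entries, skipping any the sender cannot cover."""
    balances = defaultdict(int, opening_balances)
    accepted = []
    for sender, receiver, amount in transactions:
        if amount <= 0 or balances[sender] < amount:
            continue  # rejected: the running balance must be at least the amount sent
        balances[sender] -= amount
        balances[receiver] += amount
        accepted.append((sender, receiver, amount))
    return dict(balances), accepted


ledger = [
    ("john", "martha", 3),   # fine, John has 5
    ("john", "martha", 3),   # rejected, John only has 2 left
    ("martha", "john", 1),   # fine, Martha has 3
]
balances, accepted = apply_transactions({"john": 5}, ledger)
print(balances)        # {'john': 3, 'martha': 2}
print(len(accepted))   # 2
```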

Blocks :
instead of adding single rows to the ledger, like you would do in a spreadsheet, transactions are grouped in blocks, imagine a list of 20 spreadsheet rows.

The tricky part here is that the blocks are "chained" into the ledger, meaning to insert block n you need to know block n-1.
This is where hashcodes enter in the scene :
A hash is an algorithm that takes a bunch of data and outputs an identifying code for it.
This is a destructive transformation, meaning that you take the input data and you get a deterministic result (if you apply the same algorithm n times to the same input you always get the same result), BUT knowing the result does not allow you to compute the original data.
There are plenty of hashing algorithms, Bitcoin uses SHA-256 which outputs a 256 bit value (a 32 byte binary code).
You may have a block of 1Megabyte or 1Kilobyte worth of data, but their hash will always be 32 bytes.
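These properties are easy to check with Python's standard `hashlib` module. The input data below is arbitrary filler, chosen only to show the sizes.

```python
import hashlib

one_kilobyte = b"x" * 1024
one_megabyte = b"x" * 1024 * 1024

small_digest = hashlib.sha256(one_kilobyte).digest()
large_digest = hashlib.sha256(one_megabyte).digest()

# Whatever the input size, the output is always 32 bytes (256 bits).
print(len(small_digest), len(large_digest))  # 32 32

# Deterministic: hashing the same input again gives the same result.
print(hashlib.sha256(one_kilobyte).digest() == small_digest)  # True

# Changing a single byte of the input produces a different hash.
tweaked = b"y" + one_kilobyte[1:]
print(hashlib.sha256(tweaked).digest() == small_digest)  # False
```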

Blocks, before the list of transactions, contain the hashcode of the previous block, this means that if any block is changed in the ledger, all the subsequent blocks will have a wrong hashcode, this prevents tampering with past transactions (and the resulting balances).
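Here is a toy sketch of that chaining. The block layout (a dictionary with `prev_hash` and `transactions`) is invented for the example and is far simpler than a real Bitcoin block.

```python
import hashlib


def block_hash(prev_hash, transactions):
    """Hash of a block: the previous block's hash followed by the transaction list."""
    return hashlib.sha256(prev_hash + "\n".join(transactions).encode()).digest()


def build_chain(batches):
    """Chain blocks so that each one stores the hash of the block before it."""
    chain = []
    prev = bytes(32)  # the first block has no predecessor: use 32 zero bytes
    for transactions in batches:
        chain.append({"prev_hash": prev, "transactions": list(transactions)})
        prev = block_hash(prev, transactions)
    return chain


def first_broken_link(chain):
    """Index of the first block whose stored prev_hash does not match, else None."""
    prev = bytes(32)
    for index, block in enumerate(chain):
        if block["prev_hash"] != prev:
            return index
        prev = block_hash(block["prev_hash"], block["transactions"])
    return None


chain = build_chain([["john>martha 3"], ["martha>john 1"], ["john>bob 2"]])
print(first_broken_link(chain))   # None

chain[0]["transactions"][0] = "john>martha 300"   # tamper with an old block
print(first_broken_link(chain))   # 1
```

Note that this simple check needs a following block to notice the change, so tampering with the very last block goes undetected here.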

If you have a hard time visualizing this process, just think about prime numbers: they are actually a chain, since you cannot compute the nth one unless you know the n-1 before it.

At this point everything would be easy: get a new block of transactions, calculate the hashcode of the last block, add the value to the current block, append. Done.

Actually not quite, this is where mining becomes part of the game.
Calculating a SHA-256 hashcode is something that modern computers manage quite easily, but then there would be no effort to run the operation, therefore it would be impossible to attribute a reward to it.

To artificially generate an effort, a condition is added: the resulting hashcode must start with a given number of 0 bits.
This number varies and sets the "difficulty" of the mining for a particular currency. As computing power increases, so does the difficulty, to avoid uncontrolled inflation of the rewarded values.

But we said that a given block of data can produce a single hash with a given algorithm, how can we ask it to start with a number of leading zeros?
We can't, this is why we change the data of the block, but we cannot change the transactions, so there is a field called "nonce" in the blocks, consisting of four bytes (32 bits) for us to change.

Altering the nonce field and computing the SHA-256 hashcode until we get a "golden" hash (with the required leading zeros) is the mining process; it usually requires many iterations.
Since it is impossible to start from a hash and get the input data, we have to run a loop with trial and error, until we find the desired hash.

Once the golden hash has been found, the "proof of work" is simply the nonce used to alter the data block. Everybody can verify the resulting hash conforms to the difficulty requirements.
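The whole loop fits in a short sketch. It is a simplification under stated assumptions: the block layout (`prev_hash`, a `payload` of transactions, then a 4 byte nonce) is invented for the example, and the difficulty is kept tiny so it runs in a blink. Real Bitcoin mining differs in the details (for instance it hashes twice and compares against a target value).

```python
import hashlib
import struct


def block_bytes(prev_hash, payload, nonce):
    """Previous block hash, transaction data, then the 4 byte nonce field."""
    return prev_hash + payload + struct.pack("<I", nonce)


def leading_zero_bits(digest):
    """Number of 0 bits at the start of a 32 byte hash."""
    return 256 - int.from_bytes(digest, "big").bit_length()


def mine(prev_hash, payload, difficulty_bits):
    """Trial and error: try nonces until the hash has enough leading zero bits."""
    for nonce in range(2**32):
        digest = hashlib.sha256(block_bytes(prev_hash, payload, nonce)).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce
    return None  # the nonce space is exhausted without finding a golden hash


def verify(prev_hash, payload, nonce, difficulty_bits):
    """Anybody can check the proof of work with one single hash."""
    digest = hashlib.sha256(block_bytes(prev_hash, payload, nonce)).digest()
    return leading_zero_bits(digest) >= difficulty_bits


golden_nonce = mine(bytes(32), b"john>martha 0.001", 12)
print(golden_nonce, verify(bytes(32), b"john>martha 0.001", golden_nonce, 12))
```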

When you hear about services, hardware etc. that can generate a given number of hashes / second (aka hashrate: could be Mega hashes /s, Giga hashes /s, Tera hashes /s), that's a measure of how many attempts per second that particular hardware or service is able to deliver.
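The hashrate tells you how long you should expect to wait for a golden hash. As a sketch, if each attempt is treated as an independent coin toss that succeeds with probability 1 in 2^n (n being the required number of leading zero bits), the average number of attempts is 2^n. The difficulty and hashrate values below are illustrative only.

```python
def expected_attempts(difficulty_bits):
    """Average number of hashes needed to get `difficulty_bits` leading zero bits."""
    return 2 ** difficulty_bits


def expected_seconds(difficulty_bits, hashes_per_second):
    """Average time to find a golden hash at a given hashrate."""
    return expected_attempts(difficulty_bits) / hashes_per_second


# Illustrative values: 32 zero bits at 400 hashes/s versus 1 Giga hash/s.
for rate in (400, 1_000_000_000):
    days = expected_seconds(32, rate) / 86_400
    print(f"{rate} H/s: about {days:.5f} days on average")
```

This is why hashrate matters so much: the same puzzle that takes months at a few hundred hashes per second takes seconds at Giga hash rates.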

I will not enter into technological or financial considerations here (maybe in a future post, who knows?), but I hope you understand the basic concept behind blockchain is quite simple... and probably that's the beauty of it.

P.S. : No, in general it is not worth mining currencies on your computer as an individual or as a part of a pool. Money can be made, but usually at a larger scale.
There are better options to gain money (like doing some actual work and getting paid for it :) ). Cloud mining can be an option too (major players are Genesis Mining and HashFlare), but be aware that in general these are high risk investments.
I have a small investment with HashFlare, not really planning to make any money out of it, but that gives me the opportunity to experiment and better understand how the whole cryptocurrency marketplace works.

Saturday, May 27, 2017

STM32 Graphic capabilities

I had a great opportunity yesterday, the kind of opportunities I usually try not to waste: I was offered to learn something!
Never pass an opportunity like that!

But that was just the beginning of my luck.
What if I tell you the topic was incredibly cool and that the instructor definitely knew his stuff, the same stuff I had been struggling with for a while before getting invited to this free 1 day training?

Does it get any better? Actually it does, the whole thing came with free pizza (and a STM32F746 discovery board too!).

(Sorry, don't have an image of the pizza)

Jokes aside, thanks ST Microelectronics for this opportunity.

As usual, one of my preferred ways of writing down notes about something I learnt or I am learning is to write a blog post: it definitely helps me in the process of digesting the information and, from some feedback I received, it sometimes turns out to be helpful to others too.

Ok, enough blah blah, down to business.
Graphics, oh yeah!

Let's start from the basics.

We live in a world where even toilets might have some GUI (not joking, see for yourself..), so adding graphics capabilities to MCUs seems just natural.
You can find several projects on the web even with Arduino Unos and some SPI screen, and some are pretty damn cool too.
However, if you want something a bit more performant, you need horsepower.
Cortex cores (from M3 upwards) start to get you in the ballpark, but for complex applications, you may need to scale up to M4s and M7s.
A 32 bit RISC processor running at 216MHz such as the STM32F746 has the needed computational power to support a modern interactive GUI.

Turns out the horsepower is not all, there is more because we need more.

Ideally you want your GUI to
1) Run on a screen with decent definition (the bigger the screen, the higher the definition needed to maintain an acceptable DPI resolution)
2) Have a high refresh rate (60Hz would do)
3) Have decent color depth, such as 16 or 24 bits
4) Avoid draining your CPU, as it usually needs to do other stuff in the meanwhile, like reacting to your inputs.

If you put all that together you discover that you need a fast processor, with quite a bit of ram and maybe some easy way to expand it with fast external SDRAM, you need maybe some fast storage for your images, and ideally something that deals with the complexity of the screen interface for you.
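To get a feeling for the "quite a bit of RAM" part, here is a small sizing sketch. The 480x272 resolution is an assumption (it is a common size for small 4.3 inch panels), and only the visible pixels are counted.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Memory needed to hold one full frame."""
    return width * height * bits_per_pixel // 8


def scanout_bytes_per_second(width, height, bits_per_pixel, refresh_hz):
    """Data that must be read from the frame buffer every second to refresh the screen."""
    return framebuffer_bytes(width, height, bits_per_pixel) * refresh_hz


for bpp in (16, 24):
    size = framebuffer_bytes(480, 272, bpp)
    rate = scanout_bytes_per_second(480, 272, bpp, 60)
    print(f"480x272 @ {bpp} bits: {size} bytes per frame, {rate / 1e6:.1f} MB/s at 60Hz")
```

Even this small screen needs a few hundred kilobytes per frame, read sixty times per second, which is why memory size and bandwidth matter as much as raw CPU speed.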

The thing is, it is not enough to solve ONE problem, you need to solve them all... and that's the kind of thing where MCUs excel: they provide you with a set of integrated peripherals designed specifically to tackle all the needs of an application.
I believe the STM32F7 is one remarkable example of that (Note : I am not sponsored by ST, although, technically... they gave me a free pizza so... I might be biased :) ).

To explain why I believe the F4 and F7 families have very good features to support a gui, let's analyze what are the things that are needed.

If you are interested in getting the details, ST has great documents such as the Application Note AN4861, which I strongly encourage you to check.

When you interface a screen you can choose different devices, but there are mainly two kinds of screens (more actually, read the AN4861 for that):
- Those that have memory and timing controller onboard
- Those that don't

The TFT screens you see around connected to Arduinos or even Cortex M3s in projects are usually of the first kind.
You can find them on eBay for a few bucks so, why would you even consider the second kind?
Typically, if you need a higher resolution than 320x240 with a decent color depth and refresh rate, you need to manage the controller yourself, and that adds quite a bit of complexity.
You need to manage the timing for the sync signals (see my posts here and here if you understand a bit of Verilog and FPGAs) and a 24 bit parallel interface for the colors, but the worst part is that the two must be precisely synchronized.
Say that you are refreshing a VGA screen at 60Hz: that means that at any specific instant you have to send the exact RGB values to correctly draw the current pixel, over and over.
If your CPU is dealing with that, it is probably not going to be able to do much more.
A Cortex M7 @216MHz might be able to do that, but it does not have to, because you want to save your CPU clocks for your application, besides the GUI.
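A quick calculation shows how tight that would be. The 800x525 total frame size (the 640x480 visible area plus the blanking intervals) is the commonly quoted VGA timing and is an assumption here, as are the function names.

```python
H_TOTAL = 800   # assumed: 640 visible pixels plus horizontal blanking
V_TOTAL = 525   # assumed: 480 visible lines plus vertical blanking
REFRESH_HZ = 60
CPU_HZ = 216_000_000  # Cortex M7 clock from the text


def pixel_clock_hz(h_total, v_total, refresh_hz):
    """Pixels (visible and blanking) that must be clocked out every second."""
    return h_total * v_total * refresh_hz


def cpu_cycles_per_pixel(cpu_hz, pixel_hz):
    """CPU clock cycles available for each pixel."""
    return cpu_hz / pixel_hz


clock = pixel_clock_hz(H_TOTAL, V_TOTAL, REFRESH_HZ)
print(clock)  # 25200000
print(f"{cpu_cycles_per_pixel(CPU_HZ, clock):.1f} CPU cycles per pixel")
```

Fewer than nine clock cycles per pixel, with no room for anything else: clearly a job for dedicated hardware.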

ST added to some F4, F7 and L4 devices a peripheral called LCD-TFT Display Controller (LTDC).
It can interface with several kinds of displays (note: depending on the device itself, some recent interface technologies might or might not be supported), it is integrated in the MCU and can access the MCU's DMA channels.
Basically you define a buffer in memory (internal or external memory, it works just the same) and your code populates this buffer with your graphics; the LTDC reads it autonomously and provides the needed signals for the display, no CPU needed once you have started it.
And that's your RAM you are dealing with, the fastest thing you can read and write to from your MCU: an SPI interface is no match for it.
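On the MCU you would do this in C, writing to the memory area the LTDC scans; the Python sketch below only models the buffer layout so you can see what "populating the buffer" means. The 480x272 size, the RGB565 pixel format and the helper names are all assumptions made for the example.

```python
import struct

WIDTH, HEIGHT = 480, 272           # assumed display size
BYTES_PER_PIXEL = 2                # assumed RGB565 format: 16 bits per pixel

framebuffer = bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)


def rgb565(red, green, blue):
    """Pack 8 bit color components into 5 + 6 + 5 bits."""
    return ((red & 0xF8) << 8) | ((green & 0xFC) << 3) | (blue >> 3)


def put_pixel(buffer, x, y, color):
    """Write one 16 bit pixel (little-endian) at column x, row y."""
    offset = (y * WIDTH + x) * BYTES_PER_PIXEL
    struct.pack_into("<H", buffer, offset, color)


# Draw a red horizontal line across the middle of the screen.
for x in range(WIDTH):
    put_pixel(framebuffer, x, HEIGHT // 2, rgb565(255, 0, 0))

print(hex(rgb565(255, 0, 0)), hex(rgb565(0, 255, 0)), hex(rgb565(0, 0, 255)))
# 0xf800 0x7e0 0x1f
```

The code only touches memory; on the real hardware the display controller would pick up the new pixels at the next refresh without any further action from the CPU.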

During a hands-on lab in the training I was debugging the code and the execution was halted at a breakpoint, so the CPU was actually waiting, not doing anything. Still the internal LTDC was refreshing the screen, completely autonomously, just as if the screen had its own controller and RAM buffer onboard (and it did not: the screen is a RK043FN48H-CT672B, it has an RGB parallel interface, no controller).

So, the LTDC plays an important role in enabling graphics capabilities and it does it allowing quite a bit of flexibility.

A second key component for the solution is RAM : GUIs in high resolution are memory hungry and this requires two things :
1) A decent amount of internal memory and / or an easy way to integrate inexpensive and powerful external memories
2) An efficient high bandwidth DMA

The F7 shines there, I suggest you check the schematics for the F7Discovery board, it is surprisingly (to me) quite readable and extremely informative.
The beauty is that the LTDC can tap into that DMA with no issues, in fact it is a master device on the AHB.

 from the AN4861 @ST Microelectronics

If you want to learn more about the F7  architecture, ST has a nice MOOC training about it, you can search it on their website.

The potential drawback of this kind of solution is that you need a high pin count to connect to the display (24 for the RGB, 3 for the sync signals, something to control the backlight, possibly an I2C for the touch panel...) which calls for some "interesting" PCB layout. You should also expect to deal with high pin count devices, typically BGAs... maybe not all of you (nor me anyway) will be able to solder those in your kitchen.
Some new technologies such as MIPI-DSI , supported in some STM32 devices, solve that problem, I will not enter in details here.

So, now we can update the screen and we have fast access to the frame buffer, be it internal or external. That's a lot already, but it's not all.

Chrom-ART, aka DMA2D.
This is another important component for graphics of the STM32 architecture, it actually helps in populating your frame buffer.
Imagine your gui has a background image and some buttons with icons.
What you do is get the background image, maybe from a QSPI memory, copy it to the frame buffer, then draw the buttons: back to the QSPI to fetch the icons, and finally copy them to the frame buffer as well.
It works, but it requires quite some effort from the CPU to handle all those memory transfers... unless you have a DMA2D, which gladly takes care of that for you.
It's pretty cool, read more about it in the AN4943 application note document :)
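To show the kind of work being offloaded, here is a software model of a rectangular copy (a "blit") of an icon into a larger frame buffer. It is only an illustration of the memory transfers involved, with a hypothetical `blit` function; it is not how the DMA2D is programmed.

```python
def blit(dst, dst_width, src, src_width, src_height, x, y, bytes_per_pixel=2):
    """Copy a src_width x src_height image into dst at column x, row y."""
    dst_height = len(dst) // (dst_width * bytes_per_pixel)
    if x < 0 or y < 0 or x + src_width > dst_width or y + src_height > dst_height:
        raise ValueError("image does not fit in the destination buffer")
    row_bytes = src_width * bytes_per_pixel
    for row in range(src_height):
        src_offset = row * row_bytes
        # After each icon row, skip ahead to the same column on the next screen row.
        dst_offset = ((y + row) * dst_width + x) * bytes_per_pixel
        dst[dst_offset:dst_offset + row_bytes] = src[src_offset:src_offset + row_bytes]


# Tiny example with 1 byte per pixel: a 2x2 icon copied into a 4x3 screen at (1, 1).
screen = bytearray(4 * 3)
icon = bytes([1, 2, 3, 4])
blit(screen, 4, icon, 2, 2, x=1, y=1, bytes_per_pixel=1)
print(list(screen))  # [0, 0, 0, 0, 0, 1, 2, 0, 0, 3, 4, 0]
```

Every icon and every background refresh means loops like this one, so handing them to a dedicated peripheral frees the CPU for the application.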

Finally, all this is indeed amazing, but how does it all come together when you are writing an application?
The keyword here is : Software libraries.
By all means, you can fire up your STM32CubeMX, activate LTDC, DMA2D, QSPI and whatever you need, then use HAL drivers to do your magic.
I tried and failed, maybe now I might have a better chance at it as back then I had very little understanding about what I just wrote in this post.
Is there another way?
Yes, as I said, software libraries.
We played with 3 of them during the training; all 3 looked pretty good to me, even if you may want to use them for slightly different purposes.
We tried Embedded Wizard, TouchGFX and STemWin (Segger).
Those libraries integrate smoothly and seamlessly with the STM32 hardware: you don't even need to care about LTDC, DMA2D etc... they take care of them for you, and they do way more.
They have a PC based designer where you graphically build your GUI, including interaction with the touch screen, and then C code is generated for your device.
They work in slightly different ways: EW and tGFX have a well integrated environment and deal with most of the tasks for you, while emWin requires a bit more coding.
I personally prefer the emWin approach because I feel it gives me better control over the code, but you pay that with more effort in most cases.
Also the PC tools are a bit less polished, but again, that's not a main concern for me.
One good thing about STemWin is that it comes for free with STM32 devices since ST made an agreement with Segger, customized/optimized the code for its devices and provided licenses for free to its customers.
If you have a medium to big sized project, you probably are not going to decide based on library license costs anyways.
My impression (but I still need to play more with those libraries) is that EW and tGFX may provide a faster time to market option and ensure good performance.
With STemWin I think you can achieve good performance, but it is up to you to optimize the process.

To wrap it up, ST seems to be quite committed in supporting graphics capabilities by :
- Providing fast MCUs (the STM32H7 will run at 400MHz!)
- Providing peripherals to remove load from the MCU and to ease the integration with memory and displays
- Working with partners to boost the software ecosystem
- Supporting customers with good documentation, examples and training

I may write more on this topic once I played a bit more with the libraries, maybe with some code examples.

Friday, February 10, 2017

STM32 Programming Ecosystem

Not long ago I started playing with STM32CubeMX and Eclipse to do some experiments with the STM32 ARM Cortex M3 processors.

Setting up the toolchain, the IDE etc was a bit complex, so I decided to create a youtube video about it, thinking it might be useful for others going through the same thing.

The reason why I did things that way was that with my Eclipse/ARM setup I was planning to use also other (non ST) devices, so it made sense not to use the ST specific version of the tools, which was also bound to the Mars eclipse version while I normally use Neon now.

I was wrong.

I mean yes, the intent made sense, but honestly all the additional hassle to avoid installing a new Eclipse instance was not worth it.

A couple of days ago I was lucky enough to participate in an extremely interesting workshop at the ST headquarters in Geneva (Switzerland).
They explained how to set up the tools, provided a few tips on how to best use them and provided extremely valuable information.
The workshop was engaging, well paced and indeed informative, kudos to ST for it and thanks again for the invitation!

The workshop will be held in various cities in the next days (at the time I am writing), I strongly encourage you to participate if you are interested (it  is free).
This is the link for Europe, you might need to search around their website if you are interested in other regions, there might be something available, not sure

Now I need to capture in a new video the “standard/correct way” of doing things, I do it mainly because it is a sort of collection of minutes for myself, but then again, others might benefit from it.

ST uses a proprietary, very low cost interface to allow you to program and debug its chips. This interface is called ST-Link and is basically an alternative to a standard JTAG probe (I normally use the J-Link from Segger).
All the official boards include this interface, which allows you to plug in a USB cable and do all the programming/debugging thanks to a Windows driver.
No need for additional hardware.
However, should you have a non official board (one that has no USB debugging) with one of ST's STM32 chips on it, chances are that it exposes the pins needed for the ST-Link interface, which you can grab for a few bucks (less than 3$ shipped on eBay).

While a full blown JTAG device such as the J-Link might provide some more functionality/speed, I have to admit that the cheap ST-Link will probably get the job done for everybody.

There is an (optional) utility you can use to upload the binary file on the STM32 flash, using ST-Link, this is called ST-Link utility.
I am saying “optional” because usually your programming IDE will be able to do that too, interfacing directly with the ST-Link driver.

When it comes to the IDE, while there are many different valid options, ST proposes a free solution based on Eclipse (Mars 2 at the moment).

If you followed my previous video, you saw that you need to install three main components with your IDE :
1) The IDE / Code Editor itself
2) The ARM toolchain (compiler, builder, linker, make…)
3) The Debugger interface

The good news is that if you choose to go with the ST standard IDE (System Workbench – SW4) this is all taken care of, since ST packaged an Eclipse environment that contains all the needed components.
I strongly recommend this approach, makes things WAY easier.
System Workbench comes with the Ac6 STM32 MCU GCC toolchain.

I like Eclipse, I admit it might be a bit “scary” at the beginning, but it is well worth spending a little bit of time to learn it since it can be used in so many different solutions (Coding any language / platform, ETL, Data Mining…).

With System Workbench (SW4) you can create your projects, but creating a Cortex M Project requires a few steps which include adding the relevant libraries / header files for your specific devices (CMSIS and additional stuff).
Like most IDEs, SW4 takes care of that, it will simply ask you which device or board you are targeting.
But it does more than that: it will also let you choose which additional libraries to include, or even which middleware (such as FreeRTOS).

… but you will probably not even use those features.
Why? Don’t get me wrong, they provide tremendous help, but the reason why you may not want to use them is that you can do the same in an even better and easier way!

Imagine you were able to add all the libraries, middlewares, set up the correct stack and heap map etc, what would be your next step in the project?
These MCUs have an incredible number of peripherals, you will probably use a few of those in your project, so the first step is usually to set up the clock(s) and then enable and configure the multiplexer for the different pins and the peripherals you need to use.
While it obviously depends on the complexity of your project, this usually requires quite a bit of work, what if you could skip most of that?

Actually you can, let me introduce you to STM32CubeMX.
You are not obliged to use it, but I cannot imagine a reason why you would not.
For starters: it's free and it nicely exchanges information with your IDE (SW4 is obviously supported, but so are Keil and IAR).
What the Cube does is help you set up your project by doing pretty much the same thing I said you would skip in the project creation in the IDE, PLUS it guides you in setting up your peripherals and the clocks.
Once you are finished, it generates a project for your IDE with all the configuration set correctly for you and with a code skeleton that configures and initializes the peripherals you selected.
Usually you decide upfront which peripherals you need and how they should be used, and add your code later, but CubeMX allows you to change your mind.
In fact, when it regenerates the skeleton code, it uses some specific comment tags to preserve the code that you may have added.
You need to be careful, then, about where you write your code: in the skeleton files Cube will add lines like these

/* USER CODE BEGIN 1 */
/* USER CODE END 1 */

That means that if you need to add your code, it should be placed between those two lines, there are many different sections, in different places, where you can place your code.
As long as you respect this rule, you can go back to CubeMX, change whatever you need to change there and regenerate the code, the code you added will be kept in the new skeleton.
That DID NOT work in my previous setup (using Eclipse with the ARM GNU toolchain set up manually, instead of using SW4), but maybe it was just me messing things up.
It definitely works smoothly using SW4.
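To make the mechanism concrete, here is a toy model of the idea: text found between a pair of `/* USER CODE BEGIN x */` and `/* USER CODE END x */` comment tags in the old file is carried over into the freshly generated skeleton, and everything else comes from the new skeleton. This is only my sketch of the principle, with a hypothetical `regenerate` function; it is not how CubeMX is actually implemented.

```python
import re

# One user section: BEGIN tag, the user's lines, END tag with the same name.
SECTION = re.compile(
    r"/\* USER CODE BEGIN (\w+) \*/\n(.*?)/\* USER CODE END \1 \*/",
    re.DOTALL,
)


def regenerate(old_source, new_skeleton):
    """Return new_skeleton with the user sections of old_source put back in."""
    kept = {name: body for name, body in SECTION.findall(old_source)}

    def restore(match):
        name = match.group(1)
        body = kept.get(name, match.group(2))
        return f"/* USER CODE BEGIN {name} */\n{body}/* USER CODE END {name} */"

    return SECTION.sub(restore, new_skeleton)


old_file = (
    "int main(void)\n{\n"
    "/* USER CODE BEGIN 1 */\n  blink();\n/* USER CODE END 1 */\n"
    "  old_init();\n}\n"
)
new_skeleton = (
    "int main(void)\n{\n"
    "/* USER CODE BEGIN 1 */\n/* USER CODE END 1 */\n"
    "  new_init();\n}\n"
)
print(regenerate(old_file, new_skeleton))
```

In this model the `blink();` call survives the regeneration, while `old_init();`, which was written outside the tags, is lost. That is exactly why it matters where you put your code.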

The final component is STMStudio.
You can use it to debug your application. This is not the only option obviously, since the IDE already includes a pretty good debugger, but STMStudio gives you a nice and simple way of monitoring variables (with graphical output if needed, which is sometimes useful).

Indeed there are many ways of customizing this ecosystem, but I found that, particularly if you are not an expert, it really helps to stick to the standards: SW4, STM32CubeMX, ST-Link and STMStudio seem to work very well together.

Here are the links to download them :

Happy coding!

The new video is here