- Google Cloud computing is the use of the internet to get resources for developing, deploying, and managing applications on the web.
- As the name suggests, Google Cloud Status means the status of Google Cloud services at a given time. The status is provided in the Google Cloud Service Health Dashboard.
- When a Google Cloud incident occurs, it goes through several steps before it is resolved, which we detail below.
Although Google Cloud was a latecomer to the cloud market, it has become one of the top cloud providers, as its competitive features have won over many customers.
Let’s dive into Google Cloud and Google Cloud Status to understand exactly how these services are used and what they can do for you.
History of Google Cloud
Google Cloud was launched in 2008, two years after Amazon launched its cloud service in 2006. It started with App Engine, a tool that let users run their web apps on Google’s infrastructure and made it easy to start a new one. App Engine was also designed to make scaling easy once an app attracted a significant number of users and traffic.
To gather feedback, Google gave a preview of App Engine to 10,000 developers, who could run apps within specific limits: 200 million megacycles of CPU per day, 500 MB of storage, and 10 GB of bandwidth per day. The preview was a great success. Within a short time, there were about 75,000 developers using it and more than 80,000 on the waitlist, which led Google to open the service to everyone.
After the successful trial, Google made App Engine a fully supported product, although it kept the preview label until November 2011. Google continued improving App Engine and launched its compute cloud, Compute Engine, in 2012.
What Exactly is Google Cloud Computing?
Google Cloud computing is the use of the internet to get resources for developing, deploying, and managing applications on the web. It can also mean using Google’s physical infrastructure, such as its servers, to run workloads over the internet.
Google Cloud is also known as the Google Cloud Platform.
What is Google Cloud’s Status?
As the name suggests, Google Cloud Status means the status of Google Cloud services at a given time. The status is provided in the Google Cloud Service Health Dashboard. This dashboard was called the Google Cloud Status Dashboard until it was renamed in March 2022.
The Google Cloud Status is up when all Google Cloud services are functioning and down when there is an incident. An incident is major if it affects many services, regions, and customers or if it remains unresolved for many hours. When a major incident brings the Google Cloud Status down, Google treats resolving it as urgent.
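The up/down rule and the definition of a major incident above can be sketched in a few lines of Python. This is only an illustration: the `Incident` class, the function names, and every numeric cutoff are hypothetical, since the article does not say what counts as "many" services, regions, customers, or hours.

```python
from dataclasses import dataclass

# Hypothetical cutoffs: the article does not say what counts as "many".
MANY_SERVICES = 3
MANY_REGIONS = 2
MANY_CUSTOMERS = 1000
MANY_HOURS = 4.0


@dataclass
class Incident:
    services: int       # number of affected services
    regions: int        # number of affected regions
    customers: int      # number of affected customers
    hours_open: float   # time the incident has gone unresolved
    resolved: bool = False


def is_major(incident: Incident) -> bool:
    """Major if widespread (services, regions, and customers) or long-running."""
    widespread = (
        incident.services >= MANY_SERVICES
        and incident.regions >= MANY_REGIONS
        and incident.customers >= MANY_CUSTOMERS
    )
    return widespread or incident.hours_open >= MANY_HOURS


def cloud_status(incidents: list[Incident]) -> str:
    """Return 'down' while any incident is unresolved, otherwise 'up'."""
    return "down" if any(not i.resolved for i in incidents) else "up"
```

Under these assumed cutoffs, a small incident that stays open for six hours still counts as major, because the long-running condition alone is enough.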
When an incident occurs, Google informs those affected through the Google Workspace Status Dashboard, which shows status information for the services that are part of Google Workspace. The page also lists any recent disruptions or outages.
On this dashboard, a major incident is marked as ×. After resolving the issue, Google publishes a report on the factors that led to the incident and the steps it plans to take to prevent a recurrence. If the incident was not major and affected only a few customers, Google instead sends a nonpublic report to the affected customers.
What is the Life Cycle of an Incident?
When a Google Cloud incident occurs, it goes through several steps before it is resolved, which we detail below.
Google is fast at detecting and reacting to incidents. It uses internal and black box monitoring to detect them. Black box monitoring tests the externally visible behavior of a service, as it is experienced on the user’s end.
Customers with Premium, Standard, or Enhanced Support can also report incidents they notice. To report an incident, you need to create a support case in the Google Cloud Console.
The first response after the detection of an incident is communication. The Customer Care team informs you that Google is aware of the malfunction and is working to rectify it. This initial response is usually not very detailed, because the team focuses on rectifying the issue before digging into the causes. You get details on the causes of the failure in later updates.
Google uses different communication channels to report on incidents. If you notice that there may be an issue, the first place to check is the Google Cloud Service Health Dashboard. On this dashboard, incidents are marked as either an outage or a disruption to show their severity. An incident that was minor but affected many users may instead be marked as a temporary notice.
When a relevant Google Cloud product reports an issue in the Cloud Service Health Dashboard, an outage notice is displayed. You can click on the notice to learn more about the issue’s status. To receive notifications about new incidents, you can subscribe to the Google Groups that some Google Cloud products provide.
The Google Cloud Support Center also has a page called Known Issues. It gives a comprehensive view of incidents, including those that affected only a few people and were not shown on the dashboard. Additionally, on the Known Issues page, you can create a case from a posted incident, talk to support staff, and get regular updates on incidents.
Investigating the Incident
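The dashboard check described above can also be scripted. The sketch below assumes that the Service Health Dashboard publishes a JSON feed of incidents at the URL shown and that each record carries `service_name`, `status_impact`, `external_desc`, and `end` fields. None of this comes from the article, so verify both the URL and the field names against the live feed before relying on them. The sample records are made up so that the filter can run offline.

```python
import json
import urllib.request

# Assumed feed location and field names; check them against the live feed.
FEED_URL = "https://status.cloud.google.com/incidents.json"


def fetch_feed(url: str = FEED_URL) -> list:
    """Download and decode the incident feed (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def open_incidents(records: list) -> list:
    """Return (service, impact, summary) for incidents with no end time."""
    return [
        (
            rec.get("service_name", "unknown"),
            rec.get("status_impact", "unknown"),
            rec.get("external_desc", ""),
        )
        for rec in records
        if not rec.get("end")
    ]


# Made-up sample records in the assumed shape, so the filter runs offline.
SAMPLE = [
    {"service_name": "Example Service A", "status_impact": "SERVICE_OUTAGE",
     "external_desc": "Elevated error rates", "end": None},
    {"service_name": "Example Service B", "status_impact": "SERVICE_DISRUPTION",
     "external_desc": "Increased latency", "end": "2024-01-01T10:00:00+00:00"},
]

for service, impact, summary in open_incidents(SAMPLE):
    print(f"{service}: {impact} ({summary})")
```

To run it against live data, replace `SAMPLE` with `fetch_feed()`. An incident is treated as still open when its `end` field is missing or empty, which is an assumption of this sketch and not documented behavior.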
After an incident reaches the Google Cloud team, it is investigated by software engineers or Site Reliability Engineers. Other engineers may also deal with the issue depending on the situation and product that has a problem.
Fixing the Incident
After identifying the cause, the team works to mitigate it. This may mean either reducing or completely removing whatever is causing the problem. In other cases, such as an overload, they may need to add resources to the service.
If no direct fix is evident, the team looks for workarounds. These are steps that restore normal functioning even though the underlying problem has not yet been fixed.
As the Customer Care team works towards resolving the issue, regular updates provide information on the progress towards fixing it. Moreover, they give more details on the incident, such as affected areas, features affected, percentage of impact, and error messages for people trying to use the affected services. There is also communication after a change of status, for instance, after an incident is fixed.
A postmortem usually happens after the issue has been resolved. The teams use it to get to the root cause of the incident and to identify steps that Google can take to ensure it does not happen again. The resulting report is then given to the public.
However, a postmortem is not done after every incident. Conditions that trigger a postmortem include data loss, a resolution time above the threshold, a monitoring failure, on-call engineer intervention, and user-visible downtime or degradation beyond a certain threshold.
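The trigger list above amounts to a simple checklist, sketched below. The `IncidentFacts` class, the function names, and both numeric thresholds are hypothetical, since the article does not state the values Google uses.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the article does not give the actual values.
RESOLUTION_LIMIT_MIN = 120.0
DOWNTIME_LIMIT_MIN = 10.0


@dataclass
class IncidentFacts:
    data_lost: bool = False
    resolution_minutes: float = 0.0
    monitoring_failed: bool = False
    on_call_intervened: bool = False
    user_visible_downtime_minutes: float = 0.0


def postmortem_triggers(facts: IncidentFacts) -> list[str]:
    """List every trigger from the article that this incident satisfies."""
    checks = [
        (facts.data_lost, "data loss"),
        (facts.resolution_minutes > RESOLUTION_LIMIT_MIN,
         "resolution time above threshold"),
        (facts.monitoring_failed, "monitoring failure"),
        (facts.on_call_intervened, "on-call engineer intervention"),
        (facts.user_visible_downtime_minutes > DOWNTIME_LIMIT_MIN,
         "user-visible downtime or degradation beyond threshold"),
    ]
    return [reason for hit, reason in checks if hit]


def needs_postmortem(facts: IncidentFacts) -> bool:
    """A postmortem is warranted when at least one trigger applies."""
    return bool(postmortem_triggers(facts))
```

Returning the list of matched reasons, not just a yes or no, makes it possible to record why a postmortem was opened.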
An incident report is the last step of the lifecycle after the incident has been resolved and functions are back to normal. The report provided shows the incident’s symptoms, impact, root cause, remedy, and future prevention measures. This is done so that Google can be transparent with its customers and show its commitment to delivering reliable services.
Over the years, Google has managed to earn its customers’ trust by offering reliable services. However, the unexpected happens sometimes, causing the Google Cloud Status to go down.
If you notice an issue, you can check the Google Cloud Service Health Dashboard to see whether Google is working on it. Thankfully, Google works fast to bring the Google Cloud Status back up, and you will soon get a report of the incident.
The image featured at the top of this post is ©DANIEL CONSTANTE/Shutterstock.com.