Oracle Enterprise Manager Concepts Guide
CHAPTER 3. Jobs and Events
This chapter describes the following topics:
Job Control System
The Job Control System allows you to schedule and manage operations on remote sites.
1. You submit a job from the Console.
2. The communication daemon sends the job information to the appropriate intelligent agent(s).
3. The agent executes the job on schedule.
4. The agent returns any related job messages to the daemon for display in the Console.
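The four steps above can be pictured with a small Python sketch. It is only an illustration of the flow: the `Agent` and `Daemon` classes, their methods, and the node names are hypothetical and are not part of Enterprise Manager.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for an intelligent agent on one node."""
    node: str
    scheduled: list = field(default_factory=list)

    def receive(self, job_name):
        # Step 3: the agent, not the Console, holds the job until it runs.
        self.scheduled.append(job_name)

    def run_all(self):
        # Step 4: each run yields a status message for the daemon.
        return [f"{self.node}: {job} completed" for job in self.scheduled]


@dataclass
class Daemon:
    """Hypothetical stand-in for the communication daemon."""
    agents: dict

    def submit(self, job_name, nodes):
        # Step 2: forward the job to the agent on every target node.
        for node in nodes:
            self.agents[node].receive(job_name)

    def collect_messages(self):
        # Gather the agents' status messages for display in the Console.
        messages = []
        for agent in self.agents.values():
            messages.extend(agent.run_all())
        return messages


daemon = Daemon({"node_a": Agent("node_a"), "node_b": Agent("node_b")})
daemon.submit("backup", ["node_a", "node_b"])  # Step 1: submit once
messages = daemon.collect_messages()
```

In this sketch the job is submitted once and each agent reports its own result, which mirrors the division of work described in the steps.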
Enterprise Manager includes a variety of pre-defined jobs for you to select from. Some examples of pre-defined jobs are:
- starting up or shutting down
You can also submit your own custom jobs to the Job Control System.
Centralized Job Control
The Job Control System is simple to use, because the task of submitting and managing jobs is centralized in the Console. You only need to submit a job once, regardless of the number of nodes on which the job will run or how often it needs to be executed.
To schedule a job, you need not connect to the node on which the job will be run. Instead, you submit the job from the Console and specify the nodes or services on which it should run.
Automating Tasks
The Job Control System allows you to automate repetitive and periodic tasks (such as backup). The Job Control System, communication daemon, and agents work in unison to schedule and execute the job. If the job has to be executed periodically, the agents automatically reschedule the job without your intervention. Messages about a job's status are reported back to the Console.
Fix-It Jobs
The Job Control System can be used with the Event Management System to automate problem correction. When you register an event to be monitored by Enterprise Manager, you have the option of specifying a fix-it job, which will be executed to correct the problem if the event occurs.
Parallelizing Tasks
Because the intelligent agents are responsible for scheduling and executing jobs, you can use the Job Control System to perform asynchronous tasks on multiple sites without having to maintain connections to all those sites. In addition, jobs can run simultaneously on different nodes in the system.
Job Scripts
Jobs are implemented as Tool Command Language (Tcl) scripts. Tcl is a scripting language that is used to write both job and event scripts. Oracle has also extended Tcl (OraTcl) to include database-specific commands.
OraTcl can be used to
- invoke operating system facilities, such as programs or shell scripts
- execute SQL and PL/SQL scripts
- start up and shut down Oracle databases
- directly access an intelligent agent's cache of MIB variables
- use SNMP to access host, network, and other products' MIBs
- communicate with the intelligent agent
Any Tcl script can be submitted to the Job Control System. Therefore, any operation that can be implemented in OraTcl can be scheduled as a job.
Stored on Agent Nodes
Although you submit a job from the Console, the job scripts themselves reside on the agent nodes. Because the manner in which a job is implemented may depend on the platform, each agent keeps its own set of job scripts.
This allows you to submit a job (such as backing up a database) without worrying about specifics of the platform. For example, you can select a group of databases residing on UNIX and VMS machines, and send one backup job request to back up the databases. The agents on those nodes run backup job scripts that are specific to their platforms.
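As a rough illustration of this arrangement, the Python sketch below resolves one job request to a different script on each node. The node names, the script file names, and the `resolve` function are assumptions made for the example, not actual agent contents.

```python
# Hypothetical per-agent script tables: each agent keeps its own
# implementation of the same logical job.
AGENT_SCRIPTS = {
    "unix_node": {"backup": "backup_unix.tcl"},
    "vms_node": {"backup": "backup_vms.tcl"},
}


def resolve(job_name, nodes):
    """Map one job request to the script each node's agent would run."""
    return {node: AGENT_SCRIPTS[node][job_name] for node in nodes}


# One request from the Console, two platform-specific scripts.
plan = resolve("backup", ["unix_node", "vms_node"])
```

The submitter names only the job and the target nodes; the choice of script stays with each agent.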
Composite Jobs
Some DBA jobs involve more than one task. For example, before making schema changes to a database, you may want to back up the database. To accommodate these types of jobs, the Job Control System allows you to combine two or more pre-defined jobs into one job. This is called a composite job. Each of the pre-defined jobs contained in the composite job is called a task.
Composite jobs can contain test conditions based on the success of a task. For example, if a composite job consists of two tasks, starting up a database and then running a SQL script, you can specify that the script be run only if the database was successfully started.
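The conditional behavior of a composite job can be sketched in Python as follows. This is a minimal model, assuming each task is a callable that returns True on success; the `run_composite` function and the tuple layout are hypothetical and do not reflect how the Job Control System stores composite jobs.

```python
def run_composite(tasks):
    """Run tasks in order.

    Each entry is (name, action, requires_previous_success), where
    `action` is a callable returning True on success.
    """
    results = {}
    previous_ok = True
    for name, action, requires_previous_success in tasks:
        if requires_previous_success and not previous_ok:
            # The test condition failed, so this task is not run.
            results[name] = "skipped"
            continue
        previous_ok = bool(action())
        results[name] = "succeeded" if previous_ok else "failed"
    return results


# Two tasks: start the database, then run a SQL script only if the
# startup succeeded. Here the startup is simulated as failing.
outcome = run_composite([
    ("startup_database", lambda: False, False),
    ("run_sql_script", lambda: True, True),
])
```

With the simulated startup failure, the script task is skipped rather than run against a database that is not available.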
Scalability
The Job Control System allows you to run jobs efficiently on multiple remote nodes. When you submit a job to run on a remote node, all the information needed to run the job is transferred to the agent servicing the node. When the job is executed, it is run by the agent on that node. Therefore, network traffic between the remote node and the Console and daemon is minimized. The only communication between the agents and the Console and daemon is the initial transmission of job information and any subsequent messages about job status changes.
Because jobs are run independently by agents, you can submit any number of jobs on multiple nodes without affecting the Console. For example, you can submit several jobs and then immediately start another task without waiting for the agents to schedule the jobs.
In addition, because there is an intelligent agent residing on each managed node, jobs can be run on multiple nodes simultaneously. For example, you can submit a job to run a report on multiple databases worldwide. The job is scheduled and run independently by the agent that services each database. Therefore, the jobs can be executed by their respective agents at the same time.
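The idea of independent, simultaneous execution can be approximated in Python with a thread pool, where each worker thread stands in for the agent that services one database. The database names and the `run_report` function are hypothetical; real agents are separate processes on separate nodes, not threads.

```python
from concurrent.futures import ThreadPoolExecutor


def run_report(database):
    """Stand-in for the work one agent does for its own database."""
    return f"report for {database} done"


databases = ["db_london", "db_tokyo", "db_new_york"]

# One worker per database, so no report waits for another to finish.
with ThreadPoolExecutor(max_workers=len(databases)) as pool:
    results = list(pool.map(run_report, databases))
```

The submitter issues one request and collects the results; the work itself proceeds in parallel.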
Job Queuing
When you submit a job to one or more destinations, it is possible that any one of those sites may be down. If a site or its agent is down, the communication daemon queues job requests that could not be delivered to the site. Once the site can be contacted, the daemon will submit the queued job to the agent.
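A minimal Python sketch of this queuing behavior is shown below. The `DaemonQueue` class and its method names are assumptions for illustration only; the source does not describe how the daemon stores or retries queued requests.

```python
from collections import defaultdict, deque


class DaemonQueue:
    """Hypothetical model of the daemon's handling of unreachable sites."""

    def __init__(self):
        self.reachable = set()
        self.pending = defaultdict(deque)  # site -> undelivered jobs
        self.delivered = []                # (site, job) in delivery order

    def submit(self, job, site):
        if site in self.reachable:
            self.delivered.append((site, job))
        else:
            # Site or agent is down: hold the request for later.
            self.pending[site].append(job)

    def site_up(self, site):
        # Once the site can be contacted, deliver everything held for it.
        self.reachable.add(site)
        queue = self.pending.pop(site, deque())
        while queue:
            self.delivered.append((site, queue.popleft()))


q = DaemonQueue()
q.reachable.add("site_a")
q.submit("report", "site_a")   # delivered immediately
q.submit("report", "site_b")   # queued, site_b is down
q.site_up("site_b")            # queued job is now delivered
```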
Security and Jobs
Jobs are normally run with your preferred credentials. Therefore, jobs cannot be used to perform functions that you could not perform if logged into the machine directly.
Because jobs are categorized by the type of service they act on, the Job Control System knows which credentials to pass to the agent. If the job runs on a node, the Job Control System passes either your preferred credentials for the node or, if none are specified, the username and password you used when you logged into the Console. If the job runs on a service (such as a database), the Job Control System also passes your preferred credentials for the service.
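The selection rule described above can be written out as a short Python sketch. The `credentials_for_job` function, the dictionary layout, and the sample names are hypothetical. The source does not say what happens when no preferred credentials exist for a service, so this sketch simply omits them in that case.

```python
def credentials_for_job(preferred, console_login, node, service=None):
    """Pick the credentials to pass to the agent for one job.

    `preferred` maps a node or service name to a (username, password)
    pair; `console_login` is the pair used to log into the Console.
    """
    # Node: preferred credentials, falling back to the Console login.
    passed = {"node": preferred.get(node, console_login)}
    # Service: preferred credentials are passed in addition.
    # Assumption: nothing is passed if none are defined.
    if service is not None and service in preferred:
        passed["service"] = preferred[service]
    return passed


preferred = {
    "node_a": ("oracle", "node_secret"),
    "payroll_db": ("system", "db_secret"),
}
console_login = ("admin", "console_secret")

node_job = credentials_for_job(preferred, console_login, "node_b")
db_job = credentials_for_job(preferred, console_login, "node_a", "payroll_db")
```

Here `node_job` falls back to the Console login because `node_b` has no preferred credentials, while `db_job` carries both the node and the service credentials.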
A job can also be run with the agent's credentials. This flexibility allows a site to easily incorporate the Job Control System's authentication methods with existing security policies.
Event Management System
The Event Management System allows you to monitor for specified events throughout the network (such as problems on a node, database, or other service) and optionally to execute a job to fix the problem. The Event Management System, therefore, automates problem detection and correction.
1. From the Console, you register an event set.
2. The communication daemon sends the event information to the appropriate intelligent agent(s).
3. The agent does the monitoring and alerts you if the event occurs.
4. Optionally, you can specify a fix-it job to execute if the event occurs.
Event Sets
Pre-defined event sets come standard with Enterprise Manager. Advanced event sets are included with the Performance Pack. You can also submit your own custom events.
The standard pre-defined events are:
Some of the pre-defined event sets that come with the Performance Pack are:
- space and resource management events, such as a disk becoming too full or a tablespace running out of extents
- performance management events, such as CPU load abnormal or a database system statistic too high
Notification
When an event occurs, you can be notified in various ways, such as electronic mail or paging. Also, events are always logged in the repository and can be viewed in the Console.
Event Scripts
As with jobs, events are OraTcl scripts that are stored on the agent node. Refer to the section "Job Scripts" for a description of OraTcl scripts.
Event scripts can save state information. Saving a state between executions of an event script allows the agent to remember if it has detected a certain event already and eliminates redundant event messages to the Console. It also allows event scripts to keep a history of a database and adjust to behavior that is typical.
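The effect of saved state on redundant messages can be sketched in Python. The `EventState` class is a hypothetical model, not the agent's actual mechanism: it remembers whether the condition was already detected and reports only when the condition newly appears.

```python
class EventState:
    """Hypothetical state kept between runs of one event test."""

    def __init__(self):
        self.already_reported = False

    def evaluate(self, detected):
        """Return True only when the condition is newly detected."""
        should_report = detected and not self.already_reported
        # Remember the latest observation for the next execution.
        self.already_reported = detected
        return should_report


state = EventState()
readings = [False, True, True, False, True]
reports = [state.evaluate(r) for r in readings]
```

In this run the second consecutive detection produces no message, and a new message is sent only after the condition has gone away and come back.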
Note: Unlike job scripts, event scripts are run with the permissions of the agent.
Extensibility
The Event Management System does not require that the intelligent agent be the only mechanism for error detection. The Event Management System can include other tools and applications that detect events independent of the intelligent agents. These tools and applications can be integrated into the Event Management System and communicate directly with the intelligent agents.
For example, a third-party application can detect an event on a node and report that event to the intelligent agent on that node. The agent then sends the message back to the Console as usual.
Fix-it Jobs
When an event set is registered, you have the option of specifying a fix-it job for the event. A fix-it job is simply a job that the agent runs if it detects the event. Events and fix-it jobs used together allow you to automate problem detection and correction.
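The pairing of an event with a fix-it job can be sketched in Python as follows. The `check_event` function, the disk-usage example, and the 90 percent threshold are assumptions chosen for illustration; they do not correspond to any pre-defined event or job.

```python
def check_event(event_test, fix_it_job=None):
    """Run one event test; if it fires, run the optional fix-it job."""
    if not event_test():
        return "ok"
    if fix_it_job is None:
        return "event detected"
    fix_it_job()
    return "event detected, fix-it job run"


# Hypothetical monitored value and corrective action.
disk = {"used_percent": 95}


def disk_too_full():
    return disk["used_percent"] > 90


def purge_old_logs():
    disk["used_percent"] = 60


status = check_event(disk_too_full, purge_old_logs)
```

After the fix-it job runs, the same event test no longer fires, which is the automated detect-and-correct cycle the text describes.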
Scalability
The Event Management System allows one person to monitor a large system. For example, if you are responsible for 100 databases, you cannot connect to each database every day to check on its performance. However, using the Event Management System you can effectively monitor all the databases 24 hours a day, and be alerted if a problem is detected.
The Event Management System also allows you to focus on select systems and events. This control is vital in a large system. Rather than monitor all sites or a large number of sites, you can pinpoint only those services you wish to monitor.
On the other hand, an administrator can monitor a large number of sites, with minimal performance impact on the Console. Because the intelligent agents perform the monitoring independent of the Console, an administrator has the option of monitoring many sites without slowing other tasks.
Minimal Performance Impact
The intelligent agent has been optimized to monitor large numbers of systems and events efficiently. Event tests are generally executed by the agent process directly, and can be run quickly.
The agent has direct access to the database system global area (SGA), so event tests based on SGA information execute quickly. In addition, the agent keeps a cache of database MIB variables, so any event tests using these variables can be run quickly as well.